Reconstructing 6-hourly PM2.5 datasets from 1960 to 2020 in China
- 1State Key Laboratory of Severe Weather & Key Laboratory of Atmospheric Chemistry of CMA, Chinese Academy of Meteorological Sciences, Beijing, 100081, China
- 2National Meteorological Information Center, Beijing, 100081, China
- 3Earth System Numerical Prediction Center, Beijing, 100081, China
- 4Center for Excellence in Regional Atmospheric Environment, IUE, Chinese Academy of Sciences, Xiamen, 361021, China
- 5Department of Earth System Science, Tsinghua University, Beijing 100084, China
Abstract. Fine particulate matter (PM2.5) has altered the Earth's radiation balance and raised environmental and health risks for decades, but it has been monitored widely in China only since 2013. Historical long-term PM2.5 records with high temporal resolution are essential for both research and environmental management but are lacking. Here, we reconstruct a site-based PM2.5 dataset at 6-hour intervals from 1960 to 2020 that combines long-term visibility, conventional meteorological observations, emissions, and elevation. The PM2.5 concentration at each site is estimated with an advanced machine learning model, LightGBM, that takes advantage of spatial features from 20 surrounding meteorological stations. Our model's performance is comparable to or even better than that of previous studies in by-year cross validation (CV) (R2=0.7) and spatial CV (R2=0.76), and it has the added advantages of a long-term record and high temporal resolution. This model also reconstructs a 0.25°×0.25°, 6-hourly, gridded PM2.5 dataset by incorporating spatial features. The results show that, on an interdecadal scale, PM2.5 pollution worsened gradually or remained stable before 2010 and eased in the following decade. Although the turning points vary among regions, PM2.5 mass concentrations in key regions decreased significantly after 2013 due to clean air actions. In particular, the annual average PM2.5 concentration in 2020 is close to the lowest value since 1960. These two PM2.5 datasets (publicly available at https://doi.org/10.5281/zenodo.6372847) provide spatiotemporal variations at high resolution, which lay the foundation for research on air pollution, climate change, and atmospheric chemical reanalysis.
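The two validation schemes named in the abstract can be illustrated with a grouped split. The sketch below is not the authors' code: it assumes that by-year CV holds out whole years and spatial CV holds out whole sites, and the helper `grouped_folds` and the toy records are hypothetical.

```python
def grouped_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs in which every group value
    (a year or a site ID) appears in exactly one test fold."""
    unique = sorted(set(groups))
    # Assign each distinct group to a fold in round-robin order.
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    for k in range(n_folds):
        test = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        yield train, test

# Hypothetical toy records: (site_id, year) for each sample.
records = [("A", 2014), ("A", 2015), ("B", 2014),
           ("B", 2015), ("C", 2014), ("C", 2015)]
sites = [s for s, _ in records]
years = [y for _, y in records]

by_year_folds = list(grouped_folds(years, n_folds=2))  # hold out whole years
spatial_folds = list(grouped_folds(sites, n_folds=3))  # hold out whole sites
```

Splitting by group rather than by individual sample means the test fold contains years or sites the model has never seen, so the score reflects how well the model extrapolates in time or space.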
Junting Zhong et al.
Status: open (until 09 Jun 2022)
RC1: 'Comment on essd-2022-110', Anonymous Referee #1, 22 Apr 2022
This paper constructed a long-term PM2.5 dataset in China based on visibility measurements and meteorological datasets. The quality of the data is well validated, and the trend and spatial variability of PM2.5 in China are also examined. This dataset is very useful for studying long-term changes in particulate matter pollution as well as aerosol radiative effects in China. The paper is also clearly presented and well written. I have only two comments:
1. In constructing the PM2.5 model, the authors used a rigorous variable-selection strategy and selected a set of variables for prediction. I wonder if the authors could also provide the relative importance of these variables. Is visibility the most important feature, or are the emissions?
2. I think that the organization of the results section could be changed a bit for better logical flow. I understand the authors' logic in separating the discussion into temporal and spatial parts and splitting the validation between them. But readers would typically expect the results to be validated in full and then analyzed. So it might be better to move all the validation into a separate "Validation" section, and combine all trends and spatial distributions into a "Spatial-temporal variability" section. This is only a suggestion; the authors can decide whether to change it.
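On the first comment, one model-agnostic way to report relative importance is permutation importance: shuffle one input at a time and record how much the skill score drops. The sketch below is an illustration only. It uses synthetic data, hypothetical feature names, and a least-squares linear model as a stand-in for LightGBM, so it says nothing about the actual ranking in the manuscript.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, rng, n_repeats=10):
    """Mean drop in R2 when each column of X is shuffled in turn."""
    base = r2_score(y, predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += base - r2_score(y, predict(X_perm))
    return drops / n_repeats

rng = np.random.default_rng(0)
names = ["visibility", "humidity", "emission"]  # hypothetical feature names
X = rng.normal(size=(500, 3))
# Synthetic target in which the first column dominates by construction.
y = -3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.1, size=500)

# Stand-in model: ordinary least squares with an intercept (not LightGBM).
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(Z):
    return np.column_stack([Z, np.ones(len(Z))]) @ coef

importance = permutation_importance(predict, X, y, rng)
ranking = [names[i] for i in np.argsort(importance)[::-1]]
```

For brevity the importance is computed on the same sample used for fitting; held-out data would be the usual choice.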
RC2: 'Comment on essd-2022-110', Anonymous Referee #2, 28 Apr 2022
This manuscript reconstructed site-based and gridded PM2.5 datasets at six-hour intervals from 1960 to 2020 using visibility, traditional meteorological factors, and other variables based on machine learning methods. These two datasets’ quality was well evaluated using 10-fold CV, by-year CV, spatial CV, and independent validation and compared with other available datasets. It shows that the two PM2.5 datasets are more advantageous in long-term records and high temporal resolution, which would be of great value for evaluating long-term variations, radiative effects, and health impacts of PM2.5 in China. I suggest that this manuscript be published after addressing the following issues:
- There have been studies on hourly PM2.5 estimation based on AOD data from geostationary satellites, such as Himawari-8. However, it needs to be acknowledged that AOD from geostationary satellites is only available during the daytime and the time series is relatively short. I suggest adding related studies and pointing out their strengths and weaknesses in the Introduction section. Also, relationships between PM2.5 and visibility, together with other meteorological variables, have been widely documented in previous studies but are not discussed in this manuscript; it would be better to add relevant studies to make this section more complete.
- It is mentioned in the manuscript that extracting spatial features can significantly improve the prediction accuracy of the model, but this is not verified in the manuscript. Adding some sensitivity experiments by setting two groups with/without extracted features will serve to demonstrate their impacts.
- In Section 3.3, the authors found large biases among different publicly available PM2.5 datasets and proposed applying an ensemble average to multiple datasets. I'm curious whether the authors have considered a specific approach to fusing different PM2.5 datasets and how to evaluate the accuracy of the fused dataset.
- The authors specify the spatial resolution of the input data for constructing grid points in the text, and the current grid resolution is 0.25°. Is it possible to further improve the resolution while ensuring accuracy?
- What is the duration for the hourly meteorological records mentioned in the manuscript (L139)? Did they start in 1960 or in recent years? Please point it out.
- Are the CV results in Fig. 2 hourly, 6-hourly, or daily? It’s better to point out the time resolution in the title of Fig. 2.
- L423: The word “The” in “For by-year CV, The…” should be lowercase.
- L416: The verb be in “The sited-based PM2.5 dataset are in the CSV format, and the gridded dataset PM2.5 are…” should be singular.
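The question about fusing datasets and evaluating the fused product can be made concrete with the simplest case: an unweighted ensemble mean scored against observations that none of the products used. This is a sketch under stated assumptions, not a method taken from the manuscript; the product names and all numbers are made up.

```python
import math

def ensemble_mean(datasets):
    """Unweighted mean across products, value by value."""
    return [sum(vals) / len(vals) for vals in zip(*datasets)]

def rmse(pred, obs):
    """Root-mean-square error against reference observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Hypothetical values (ug/m3) at three independent validation sites.
obs = [40.0, 55.0, 30.0]
product_a = [46.0, 60.0, 33.0]  # made-up product that is biased high
product_b = [36.0, 50.0, 24.0]  # made-up product that is biased low

fused = ensemble_mean([product_a, product_b])
scores = {
    "product_a": rmse(product_a, obs),
    "product_b": rmse(product_b, obs),
    "fused": rmse(fused, obs),
}
```

In this toy case the opposite biases partly cancel, so the fused series has the lowest RMSE. Real products may share biases, in which case an unweighted mean would not help and the independent validation would show it.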
Data sets
Reconstructing 6-hourly PM2.5 datasets from 1960 to 2020 in China Zhong, Junting, Zhang, Xiaoye, Gui, Ke, Liao, Jie, Fei, Ye, Jiang, Lipeng, Guo, Lifeng, Liu, Liangke, Che, Huizheng, Wang, Yaqiang, Wang, Deying, & Zhou, Zijiang https://doi.org/10.5281/zenodo.6372846