the Creative Commons Attribution 4.0 License.
PM2.5 concentrations based on near-surface visibility at 4011 sites in the Northern Hemisphere from 1959 to 2022
Abstract. Long-term PM2.5 data are needed to study the atmospheric environment, human health, and climate change. However, PM2.5 measurements are sparsely distributed and cover only short periods. In this study, daily PM2.5 concentrations from 1959 to 2022 are estimated with a machine learning method at 4011 terrestrial sites in the Northern Hemisphere, using hourly atmospheric visibility data extracted from Meteorological Terminal Aviation Routine Weather Reports (METAR). Monitored PM2.5 concentrations serve as the training target, and atmospheric visibility and other related variables serve as the inputs. On the training data, the slope between the estimated and monitored PM2.5 concentrations is 0.946 ± 0.0002 (95 % confidence interval, CI), the coefficient of determination (R²) is 0.95, the root mean square error (RMSE) is 7.0 μg/m³, and the mean absolute error (MAE) is 3.1 μg/m³. On the test data, the slope between the predicted and monitored PM2.5 concentrations is 0.862 ± 0.0010 (95 % CI), the R² is 0.80, the RMSE is 13.5 μg/m³, and the MAE is 6.9 μg/m³. The multiyear mean PM2.5 concentrations from 1959 to 2022 in the United States, Canada, Europe, China, and India are 11.2, 8.2, 20.1, 51.3, and 88.6 μg/m³, respectively. PM2.5 in Canada is low and continues to decrease from 1959 to 2022. PM2.5 in the United States increases slightly (0.38 μg/m³/decade) from 1959 to 1990 and decreases (-1.32 μg/m³/decade) from 1991 to 2022. Trends in Europe are positive (5.69 μg/m³/decade) from 1959 to 1972 and negative (-1.91 μg/m³/decade) from 1973 to 2022. In China and India, PM2.5 increases (3.04 and 3.35 μg/m³/decade, respectively) from 1959 to 2012 and decreases (-38.82 and -42.84 μg/m³/decade, respectively) from 2013 to 2022.
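The abstract reports the slope (with a 95 % CI), R², RMSE, and MAE between estimated and monitored PM2.5. The sketch below shows one common way to compute these statistics with NumPy. The regression form (ordinary least squares of estimated on monitored, with an intercept), the use of the squared Pearson correlation for R², and the normal-approximation CI are assumptions, not details confirmed by the preprint; the function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(monitored, estimated):
    """Slope (with ~95 % CI half-width), R², RMSE and MAE of estimated vs. monitored PM2.5.

    Assumptions (not confirmed by the preprint): OLS of estimated on monitored
    with an intercept; R² is the squared Pearson correlation; the CI uses the
    normal approximation (1.96 standard errors), adequate for large samples.
    """
    x = np.asarray(monitored, dtype=float)
    y = np.asarray(estimated, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit: [slope, intercept]
    residuals = y - (slope * x + intercept)
    # Standard error of the OLS slope: sqrt(s^2 / Sxx)
    s2 = np.sum(residuals ** 2) / (n - 2)
    se_slope = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
    r = np.corrcoef(x, y)[0, 1]
    err = y - x  # errors relative to the monitored values
    return {
        "slope": float(slope),
        "slope_ci95": float(1.96 * se_slope),
        "r2": float(r ** 2),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.gamma(2.0, 15.0, size=1000)               # synthetic "monitored" PM2.5
    est = 0.9 * obs + rng.normal(0.0, 5.0, size=1000)   # synthetic "estimated" PM2.5
    print(evaluate(obs, est))
```

Note that RMSE and MAE here are computed against the monitored values directly, so a systematic offset raises them even when R² is high.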
The dataset is available at the National Tibetan Plateau / Third Pole Environment Data Center (https://doi.org/10.11888/Atmos.tpdc.301127) (Hao et al., 2024).
Status: open (until 09 May 2024)
RC1: 'Comment on essd-2024-96', Anonymous Referee #1, 24 Apr 2024
Hao et al. used visibility to estimate historical PM2.5 concentrations in the Northern Hemisphere over the past six decades. Overall, the topic is very interesting and the manuscript is well organized. However, the manuscript still suffers from some major flaws, and I therefore recommend it for publication in ESSD only after the following comments have been addressed.
- Visibility is a useful proxy for estimating PM2.5 concentrations over long periods. However, visibility-based estimates are generally less accurate than AOD-based estimates. Why not use a combination of AOD and visibility? For instance, you could use AOD for 2000-2022 and visibility before 2000. I think you should evaluate the performance of the two schemes and compare the differences in your study.
- Visibility stations are scattered around the world. Why do you focus only on China, Europe, the US, and India? I think long-term PM2.5 estimates across the entire Northern Hemisphere might be more valuable. You could even construct a full-coverage, grid-based PM2.5 dataset across the Northern Hemisphere.
- Section 3.2.2: Validating the constructed PM2.5 dataset only in recent years might not be sufficient, because the main novelty of this study is the long-term estimate. The authors should therefore add more evaluations of PM2.5 estimates before 2010, especially in China and India. Many previous studies report such ground-level observations, which the authors could collect from the literature.
- I think the comparison of your dataset with other reanalysis data might not be very necessary, because the dataset in this study is site-based rather than grid-based. If you want to make this comparison, you must confirm that your dataset is clearly superior to the previous reanalysis datasets.
- Figure 14: Why does PM2.5 in India decrease so dramatically from 2010 to 2022? I think India proposed its clean air policy only in 2019. The authors should check the estimate against observations to examine whether it is correct.
Citation: https://doi.org/10.5194/essd-2024-96-RC1
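Referee #1's first comment suggests a hybrid scheme: AOD-based PM2.5 from 2000 onward and visibility-based PM2.5 before that, with both schemes evaluated. The sketch below shows how such a merged series could be assembled and scored with NumPy. The function names, the `switch_year` parameter, and the fallback rule (use the visibility estimate on days without a valid AOD-based value) are hypothetical illustrations of the referee's idea, not the authors' method.

```python
import numpy as np


def combine_estimates(years, pm_vis, pm_aod, switch_year=2000):
    """Hybrid series (hypothetical scheme): AOD-based PM2.5 from switch_year
    onward where it exists, visibility-based PM2.5 otherwise."""
    years = np.asarray(years)
    pm_vis = np.asarray(pm_vis, dtype=float)
    pm_aod = np.asarray(pm_aod, dtype=float)
    # AOD value is used only in the AOD era and only when it is not missing
    use_aod = (years >= switch_year) & ~np.isnan(pm_aod)
    return np.where(use_aod, pm_aod, pm_vis)


def rmse(observed, estimated):
    """RMSE over days where both the observation and the estimate exist."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    ok = ~np.isnan(observed) & ~np.isnan(estimated)
    return float(np.sqrt(np.mean((estimated[ok] - observed[ok]) ** 2)))


# Comparing the two schemes on the same held-out days (hypothetical arrays):
# rmse(pm_obs, pm_vis)  vs.  rmse(pm_obs, combine_estimates(years, pm_vis, pm_aod))
```

Falling back to the visibility estimate on days without a valid AOD retrieval (for example, cloudy days) keeps the series gap-free. A switch between methods can itself introduce a step change in a long-term series, which is one reason to compare both schemes over their overlap period.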
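Referee #1 also suggests constructing a full-coverage, grid-based dataset from the site estimates. A minimal way to illustrate site-to-grid mapping is inverse distance weighting (IDW) with great-circle distances, sketched below. IDW is a swapped-in illustrative technique, not something the preprint uses; the exponent `power=2.0`, the 300 km search radius `max_km`, and the function names are hypothetical choices. Cells with no site inside the radius are left as NaN rather than extrapolated.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (inputs in degrees, broadcastable arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def idw_grid(site_lat, site_lon, site_pm, grid_lat, grid_lon, power=2.0, max_km=300.0):
    """IDW-interpolated PM2.5 on a lat/lon grid; NaN where no site lies within max_km."""
    site_lat = np.asarray(site_lat, dtype=float)
    site_lon = np.asarray(site_lon, dtype=float)
    site_pm = np.asarray(site_pm, dtype=float)
    glat, glon = np.meshgrid(grid_lat, grid_lon, indexing="ij")
    # Distance from every grid cell to every site: shape (n_lat, n_lon, n_sites)
    d = haversine_km(glat[..., None], glon[..., None], site_lat, site_lon)
    d = np.maximum(d, 1e-6)  # avoid division by zero when a cell sits on a site
    w = np.where(d <= max_km, d ** -power, 0.0)
    wsum = w.sum(axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        grid = (w * site_pm).sum(axis=-1) / wsum
    return np.where(wsum > 0, grid, np.nan)
```

IDW ignores meteorology and terrain, so a production gridded product would more likely combine the site estimates with gridded covariates; the sketch only shows the geometric step.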
RC2: 'Comment on essd-2024-96', Anonymous Referee #2, 28 Apr 2024
Hao et al. used a machine learning method to estimate a long-term, site-scale PM2.5 dataset for the Northern Hemisphere from visibility data. Comprehensive validation and analysis have confirmed the reliability and value of this dataset. However, some major issues must be addressed before the manuscript can be considered for publication. The specific comments are as follows.
- L23-31: Whether sparse, spatially uneven station data are representative of national-mean concentrations needs careful consideration. In China, PM2.5 monitoring stations are predominantly located in urban areas, where concentrations tend to be higher than in rural areas. Additionally, the methodology for calculating trends warrants clarification. Calculating regional trends across these locations is challenging because the monitoring sites are unevenly distributed. Chang et al. (2017) noted that the European network is sparser across its northern and eastern regions and that, therefore, a simple average of the individual trends at each site does not yield an accurate regional trend. More robust conclusions could be drawn from a spatiotemporally full-coverage dataset. Reference: Chang, K.-L., Petropavlovskikh, I., Cooper, O. R., Schultz, M. G., and Wang, T.: Regional trend analysis of surface ozone observations from monitoring networks in eastern North America, Europe and East Asia, Elementa: Science of the Anthropocene, 5, 50, https://doi.org/10.1525/elementa.243, 2017.
- L39-141: The content is repeated in the caption of Figure 1.
- L197: Does “2000” in “sites as of 2000” refer to 2022 or 2020? Figure 1 indicates that the sites in China have existed for only about ten years.
- L332: Please provide the full name of the abbreviation “CART”.
- How are the PM2.5, visibility, and meteorological data matched spatially, and what is the distance between the PM2.5 and meteorological monitoring stations? Are there multiple PM2.5 sites matched to the same meteorological and visibility station, so that samples from these sites share identical features but have different labels? Such samples would be contradictory for model training.
- The validation of the machine learning model may not be convincing, even though sample-based cross-validation was used. Given that the study aims to establish a long-term PM2.5 dataset, especially for historical periods without surface monitoring, the temporal generalization performance of the model is crucial. The performance should be evaluated on data from periods not included in the training dataset. For instance, the model could be trained on data from before 2020 and tested on data from after 2020.
- L615: “Elevation of Meteorological Station” should be corrected to “Elevation of Visibility Station” in Figure 9. The same problem occurs in Figure 10.
- L805: There is no Section 2.6.3; please check the full text.
Citation: https://doi.org/10.5194/essd-2024-96-RC2
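Referee #2's comment on L23-31 notes that a simple average of site trends is biased where sites cluster. One simple mitigation, sketched below, is to average site trends within latitude-longitude cells first and then weight the cells by cos(latitude) as a proxy for cell area. This is an illustrative alternative, not the method of Chang et al. (2017) or of the preprint; the 2° cell size and the function names are hypothetical, and the per-site trend is assumed to be the OLS slope of annual means, expressed per decade as in the abstract.

```python
import numpy as np
from collections import defaultdict


def site_trend_per_decade(years, annual_pm):
    """OLS slope of annual-mean PM2.5 against year, in μg/m³ per decade (assumed method)."""
    slope = np.polyfit(np.asarray(years, dtype=float), np.asarray(annual_pm, dtype=float), 1)[0]
    return 10.0 * slope


def regional_trend(site_lats, site_lons, site_trends, cell_deg=2.0):
    """Average trends within lat/lon cells, then combine cells with cos(latitude)
    area weights so that dense clusters of sites do not dominate."""
    cells = defaultdict(list)
    for lat, lon, trend in zip(site_lats, site_lons, site_trends):
        key = (int(np.floor(lat / cell_deg)), int(np.floor(lon / cell_deg)))
        cells[key].append(trend)
    cell_means, weights = [], []
    for (i_lat, _), trends in cells.items():
        centre_lat = (i_lat + 0.5) * cell_deg
        cell_means.append(np.mean(trends))
        weights.append(np.cos(np.radians(centre_lat)))
    return float(np.average(cell_means, weights=weights))
```

For example, three clustered sites with a trend of -2 and one isolated site with a trend of +4 at a similar latitude give a plain site average of -0.5 but a cell-weighted regional trend of +1, because the cluster counts once.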
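Referee #2 asks how PM2.5 sites are matched to visibility and meteorological stations and whether several PM2.5 sites share one station. The sketch below shows a nearest-station match with a distance cap and a check for shared stations. The 25 km threshold and all function names are hypothetical; the preprint's actual matching rule is not stated on this page.

```python
import numpy as np
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (inputs in degrees, broadcastable arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def match_nearest(pm_lat, pm_lon, met_lat, met_lon, max_km=25.0):
    """Index of the nearest visibility/met station for each PM2.5 site and its
    distance in km; the index is -1 when no station lies within max_km."""
    d = haversine_km(np.asarray(pm_lat, dtype=float)[:, None],
                     np.asarray(pm_lon, dtype=float)[:, None],
                     np.asarray(met_lat, dtype=float)[None, :],
                     np.asarray(met_lon, dtype=float)[None, :])
    idx = d.argmin(axis=1)
    dist = d[np.arange(d.shape[0]), idx]
    return np.where(dist <= max_km, idx, -1), dist


def shared_stations(idx):
    """Stations matched by more than one PM2.5 site: {station_index: n_sites}."""
    counts = Counter(int(i) for i in idx if i >= 0)
    return {station: n for station, n in counts.items() if n > 1}
```

Where `shared_stations` is non-empty, one option is to average the PM2.5 labels of the sites sharing a station, so that each station-day contributes a single sample instead of identical features with conflicting labels.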
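Referee #2 also asks for a temporal hold-out test instead of only sample-based cross-validation. The sketch below splits samples by year (train before the cutoff, test from the cutoff onward) and scores the out-of-period error. A linear least-squares fit stands in for the study's actual machine learning model purely for illustration; the function names and the 2020 default cutoff (taken from the referee's example) are hypothetical.

```python
import numpy as np


def temporal_split(years, cutoff_year=2020):
    """Boolean masks: training samples before cutoff_year, test samples from it onward."""
    years = np.asarray(years)
    train = years < cutoff_year
    return train, ~train


def fit_linear(X, y):
    """Least-squares stand-in for the real ML model (hypothetical)."""
    A = np.column_stack([X, np.ones(len(X))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def out_of_period_rmse(years, X, y, cutoff_year=2020):
    """Train on earlier years only, then report RMSE on the later, unseen years."""
    train, test = temporal_split(years, cutoff_year)
    coef = fit_linear(X[train], y[train])
    pred = predict_linear(coef, X[test])
    return float(np.sqrt(np.mean((pred - y[test]) ** 2)))
```

Because the test years never enter training, this score reflects how the model extrapolates in time, which is closer to the historical reconstruction task than a random sample split.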
Data sets
Daily PM2.5 concentration data at more than 4000 sites in the Northern Hemisphere from 1959 to 2022. Hongfei Hao, Kaicun Wang, Guocan Wu, Jianbao Liu, and Jing Li. https://doi.org/10.11888/Atmos.tpdc.301127