the Creative Commons Attribution 4.0 License.
Retrieving Ground-Level PM2.5 Concentrations in China (2013–2021) with a Numerical Model-Informed Testbed to Mitigate Sample Imbalance-Induced Biases
Abstract. Ground-level PM2.5 data derived from satellites with machine learning are crucial for health and climate assessments; however, uncertainties persist due to the absence of spatially complete observations. To address this, we propose a novel testbed using unconventional numerical simulations to evaluate PM2.5 estimation across the entire spatial domain. The testbed emulates the general machine-learning approach by training the model with grids corresponding to ground monitor sites and subsequently testing its predictive accuracy for other locations. Our approach enables, for the first time, a comprehensive evaluation of the performance of various machine-learning methods in estimating PM2.5 across the spatial domain. Applying the testbed to China yields unexpected results: all benchmark models show larger PM2.5 biases in densely populated regions with abundant ground observations, a finding that challenges conventional expectations and has not been explored in the recent literature. The main cause is the imbalance in training samples, which come mostly from urban areas with high emissions; the lack of monitors in downwind areas, where PM2.5 is transported from urban areas with varying vertical profiles, leads to significant overestimation. Our proposed testbed also provides an efficient strategy for optimizing model structure or training samples to enhance satellite-retrieval model performance. Integration of spatiotemporal features, especially with CNN-based deep-learning approaches like the ResNet model, successfully mitigates PM2.5 overestimation (by 5–30 µg m⁻³) and corresponding exposure (by 3 million people · µg m⁻³) in the downwind area over the past nine years (2013–2021) compared to the traditional approach.
Furthermore, the incorporation of 600 strategically positioned ground-measurement sites identified through the testbed is essential to achieve a more balanced distribution of training samples, thereby ensuring precise PM2.5 estimation and facilitating the assessment of associated impacts in China. In addition to presenting the retrieved surface PM2.5 concentrations in China from 2013 to 2021, this study provides a testbed dataset derived from physical modeling simulations which can serve to evaluate the performance of data-driven methodologies, such as machine learning, in estimating spatial PM2.5 concentrations for the community.
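The train-at-sites, test-everywhere logic described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' code or data: the synthetic domain, the single column-type predictor, the assumed surface fractions for urban and non-urban cells, and the helper names `fit_line` and `run_testbed` are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def run_testbed(n_cells=400, n_sites=40, seed=0):
    """Train on monitored cells only, then score every unmonitored cell."""
    rng = random.Random(seed)
    n_urban = n_cells // 4
    cells = []
    for i in range(n_cells):
        urban = i < n_urban
        column = rng.uniform(0.2, 1.0)  # hypothetical column-type predictor
        # Assumption: urban cells hold more of the column near the surface.
        surface_fraction = 0.9 if urban else 0.5
        pm_true = 100.0 * column * surface_fraction + rng.gauss(0.0, 1.0)
        cells.append((urban, column, pm_true))

    # Sample imbalance: monitors exist in urban cells only.
    site_idx = set(rng.sample(range(n_urban), n_sites))
    train = [cells[i] for i in site_idx]
    a, b = fit_line([c[1] for c in train], [c[2] for c in train])

    def mean_bias(subset):
        return sum(a + b * c[1] - c[2] for c in subset) / len(subset)

    unmonitored_urban = [c for i, c in enumerate(cells)
                         if c[0] and i not in site_idx]
    non_urban = [c for c in cells if not c[0]]
    return mean_bias(unmonitored_urban), mean_bias(non_urban)


urban_bias, non_urban_bias = run_testbed()
print(f"unmonitored urban bias: {urban_bias:.1f}")
print(f"non-urban bias:         {non_urban_bias:.1f}")
```

Because the simulated field is known in every cell, the bias can be scored where no monitor exists, which is the property the testbed relies on. In this toy setup the model trained on urban cells overestimates the non-urban cells, since it assumes the urban column-to-surface relationship everywhere.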
Status: final response (author comments only)
RC1: 'Comment on essd-2024-170', Anonymous Referee #1, 05 Jun 2024
This is an important paper for the community working on air quality estimation from satellite remote sensing and machine learning: it assesses the uncertainties that the placement of ground-based observations introduces into gapless PM2.5 estimates based on such methods. To do so, the authors use synthetically modeled PM2.5 from a state-of-the-art high-resolution air quality model in China and test the biases of PM2.5 estimates under the current site placement. The data processing, assumptions, and evaluation logic are clearly described. The analysis is sound overall and described in detail. The findings are important and have implications for the development of PM2.5 estimates and for the future placement of PM2.5 monitoring sites in China and other polluted regions of the world. More work is needed to improve the applicability of the derived data, especially with regard to the journal ESSD. I support the publication of this manuscript provided that the following comments can be addressed.
Major comments:
1) The main conclusion "Larger PM2.5 biases found in densely populated regions" needs more consideration. This statement is based on ABSOLUTE biases. However, these regions are associated with high PM2.5, so larger absolute PM2.5 biases are expected. Normalized (concentration-independent) bias statistics should also be provided to better discuss the results and guide the analysis.
2) It is unclear in the current manuscript whether the input data (predictors of PM2.5) are identical between the tree-based methods and the ResNet in Section 2. To make a fair comparison, please clarify this issue. If they are not consistent, please explain why.
3) The machine-learning estimates produced in this paper are annual averages at 27 km resolution. PM2.5 air quality applications usually require higher spatial resolution due to the strong heterogeneity of PM2.5 and its co-variability with population. For an ESSD paper, please add a paragraph to explain and discuss the potential applications of such data, since such applications are unclear to me. Also, the current form of the data (Python .npy format) is very hard to use, and the data description in the README is not sufficient for users to work with the data. Please consider converting the files to a more widely used format (e.g., NetCDF) with geo-references, and providing more detailed descriptions. If the applicability of the generated data is not well justified and improved, I personally think this paper is better suited to other journals such as ACP, but I am open to leaving the judgement of suitability to the editor.
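The distinction raised in major comment 1 between absolute and normalized bias can be made concrete with a minimal sketch. The concentrations below are made-up illustrative values, not results from the manuscript, and the function names are hypothetical; normalized mean bias is taken here as the summed error divided by the summed reference values.

```python
def mean_bias(pred, obs):
    """Mean bias, in the same units as the concentrations."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def normalized_mean_bias(pred, obs):
    """Normalized mean bias: total error relative to total reference value."""
    return sum(p - o for p, o in zip(pred, obs)) / sum(obs)


# Hypothetical values (µg m-3), chosen only to illustrate the point.
polluted_obs = [80.0, 100.0, 120.0]
polluted_pred = [88.0, 110.0, 132.0]
clean_obs = [8.0, 10.0, 12.0]
clean_pred = [12.0, 15.0, 18.0]

for name, pred, obs in [("polluted", polluted_pred, polluted_obs),
                        ("clean", clean_pred, clean_obs)]:
    print(name, mean_bias(pred, obs), normalized_mean_bias(pred, obs))
```

In this example the polluted region has the larger absolute bias (10 versus 5 µg m-3) but the smaller normalized bias (0.1 versus 0.5), which is why reporting both statistics can change how a regional comparison reads.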
Other comments:
1) Line 19-21: As outlined before, I believe relative errors should be discussed apart from absolute errors. The current statement might be misleading.
2) Line 64: Based on my experience with PM2.5 data in China, many sites are less than 20 km apart. How did you deal with a 27-km grid cell containing more than one site? Furthermore, how would the 27 km model resolution affect the evaluation of conventional PM2.5 estimation approaches, considering that the existing data in the literature are estimated at finer (e.g., 1 km) resolution? Overall, more discussion should be provided about model versus desirable resolution, and about how the insights from this paper could guide the evaluation of PM2.5 estimates at finer resolution.
3) Line 125: The importance of representing varying aerosol vertical profiles in source and downwind areas should first be discussed in the Introduction section.
4) Section 3.2 and Figure 3: The tree-based methods seem to show improved performance after adding surrounding features, and the “-new” RMSE values are comparable to ResNet-time. For example, the xgboost-new RMSE looks better than ResNet-time at distances >2 grids. Overall, I do not agree that the ResNet and ResNet-time approaches are so overwhelmingly superior. Please discuss these features more accurately and revise the evaluation of these approaches accordingly. Also, why are the RF-new or XgBoost-new results not shown in Figure 3b and 3c?
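On the question in comment 2 about grid cells that contain more than one site, one common option is to average all sites that fall in the same cell. The sketch below assumes this choice purely for illustration; how the manuscript handles the case is not stated in this discussion. The projected coordinates in km, the cell origin at (0, 0), and the function names are hypothetical.

```python
from collections import defaultdict


def grid_index(x_km, y_km, cell_km=27.0):
    """Map projected coordinates (km) to integer (column, row) cell indices."""
    return int(x_km // cell_km), int(y_km // cell_km)


def average_sites_per_cell(sites, cell_km=27.0):
    """Average PM2.5 over all sites that share a grid cell.

    `sites` is an iterable of (x_km, y_km, pm25) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for x_km, y_km, pm in sites:
        key = grid_index(x_km, y_km, cell_km)
        sums[key][0] += pm
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Two sites less than 20 km apart share cell (0, 0); the third is alone.
sites = [(5.0, 5.0, 60.0), (20.0, 10.0, 80.0), (40.0, 5.0, 30.0)]
gridded = average_sites_per_cell(sites)
print(gridded)
```

Averaging keeps one training sample per cell, so densely monitored cities do not contribute duplicated grid cells, at the cost of smoothing sub-grid variability.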
Citation: https://doi.org/10.5194/essd-2024-170-RC1
RC2: 'Comment on essd-2024-170', Anonymous Referee #2, 07 Jun 2024
The authors generated a new PM2.5 concentration dataset in China with a new PM2.5 modeling framework, aiming at mitigating sample imbalance-induced biases. The topic is of interest and is important for improving satellite-based PM2.5 modeling. The following flaws should be addressed to improve the quality of this manuscript.
1. The introduction section reads more like a technical report, since the authors use three paragraphs to describe the modeling method.
2. Section 2.1: the authors stated that the emission data of ABaCAS-EI has a spatial resolution of 1km-by-1km and a temporal resolution of 1 hour. Please double check the resolution. As reported by the data producer, the temporal resolution of this dataset is annual.
3. Line 99-100: "The remaining grid cells encompass the surrounding PM2.5, which has not been previously evaluated in other studies". Validation schemes for PM2.5 have been intercompared in previous studies, including how to evaluate a model's extrapolation capacity.
4. How was the accuracy of the CTM results in rural areas (without CNEMC monitors) evaluated?
5. The NO2 column density from satellite suffers from significant data gaps. How did the authors account for this issue in their study?
6. Line 120-121: "the model is trained using data from only the first 25 days of each month,". Why were the training days not chosen randomly?
7. What parameters were used in the machine-learning models? In Section 2.2 the authors mention meteorological factors, land use, and NO2 density. However, in Section 2.3, the authors mention "Beyond simply including corresponding features from the surrounding neighborhood grid cells as additional predictors for predicting PM2.5 concentration at the target grid cells in decision tree-based methods". The overall logic flow is somewhat confusing and needs to be improved to make the manuscript easier to follow.
8. Line 156: "Therefore, the model, trained with urban sites, attributes more pollution to the ground level from the AOD", was AOD used as an explanatory variable?
9. Line 165-166: "we can conclude that the uneven distribution of sites introduces considerable biases in PM2.5 estimation within traditional methods that rely on local features.". Previous machine-learning PM2.5 modeling relies largely on satellite AOD. The method used in this study relies largely on NO2 density, which is high in urban areas and low in rural areas. Could this be a reason for the difference in modeling results?
10. The dataset presented in the study is good for evaluating existing or new methods for PM2.5 retrievals from satellites, but its spatial resolution is only 27 km by 27 km. Can it also be applied to studies using high-resolution datasets?
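Comment 6 contrasts the quoted scheme, which trains on the first 25 days of each month, with a random choice of days. The sketch below shows both splits side by side for one year. It assumes, for illustration only, that the remaining days of each month form the test set; the function names are hypothetical and this is not the authors' code.

```python
import random
from datetime import date, timedelta


def split_first_25_days(days):
    """Train on days 1-25 of each month; hold out the remaining days."""
    train = [d for d in days if d.day <= 25]
    test = [d for d in days if d.day > 25]
    return train, test


def split_random(days, test_fraction, seed=0):
    """Hold out a random subset of days of the requested size."""
    rng = random.Random(seed)
    shuffled = list(days)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


days_2013 = [date(2013, 1, 1) + timedelta(days=i) for i in range(365)]

block_train, block_test = split_first_25_days(days_2013)
rand_train, rand_test = split_random(days_2013, len(block_test) / len(days_2013))
print(len(block_train), len(block_test), len(rand_train), len(rand_test))
```

The two schemes hold out the same number of days but test different things: the block split keeps held-out days contiguous at the end of each month, while the random split interleaves them with training days, which generally makes held-out days more similar to their training neighbors.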
Citation: https://doi.org/10.5194/essd-2024-170-RC2
Data sets
Numerical model-informed testbed for surface PM2.5 concentration over China and its estimates during 2013-2021 S. Li et al. https://doi.org/10.5281/zenodo.11122294
Model code and software
Numerical model-informed testbed for surface PM2.5 concentration over China and its estimates during 2013-2021 S. Li et al. https://doi.org/10.5281/zenodo.11122294
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
249 | 48 | 16 | 313 | 26 | 10 | 10 |