the Creative Commons Attribution 4.0 License.
A continuous 2011–2022 record of fine particulate matter (PM2.5) in East Asia at daily 2-km resolution from geostationary satellite observations: population exposure and long-term trends
Abstract. We construct a continuous 24-h daily fine particulate matter (PM2.5) record with 2×2 km² resolution over eastern China, South Korea, and Japan for 2011–2022 by applying a random forest (RF) algorithm to aerosol optical depth (AOD) observations from the Geostationary Ocean Color Imager (GOCI) I and II satellite instruments. The RF uses PM2.5 observations from the national surface networks as training data. PM2.5 network data starting in 2015 in South Korea are extended to pre-2015 with a RF trained on other air quality data available from the network including PM10. PM2.5 network data starting in 2014 in China are supplemented by pre-2014 data from the US embassy and consulates. Missing AODs in the GOCI data are gap-filled by a separate RF fit. We show that the resulting GOCI PM2.5 dataset is successful in reproducing the surface network observations including extreme events, and that the network data in the different countries are representative of population-weighted exposure. We find that PM2.5 peaked in 2014 (China) and 2013 (South Korea, Japan), and has been decreasing steadily since with no region left behind. We quantify the population in each country exposed to annual PM2.5 in excess of national ambient air quality standards and how this exposure evolves with time. The long record for the Seoul Metropolitan Area (SMA) shows a steady decrease from 2013 to 2022 that was not present in the first five years of AirKorea network PM2.5 measurements. Mapping of an extreme pollution event in Seoul with GOCI PM2.5 shows a predicted distribution indistinguishable from the dense urban network observations, while our previous 6×6 km² product smoothed local features. Our product should be useful for public health studies where long-term spatial continuity of PM2.5 information is essential.
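The two exposure quantities named in the abstract, population-weighted PM2.5 and the population living above an annual standard, can be illustrated with a minimal sketch. The function names, the four-cell grid, and the 25 µg/m³ threshold are assumptions chosen for illustration only; they are not taken from the preprint, its dataset, or any national standard.

```python
def population_weighted_mean(pm25, population):
    """Population-weighted mean PM2.5 over grid cells (same units as pm25)."""
    total_pop = sum(population)
    if total_pop == 0:
        raise ValueError("total population is zero")
    return sum(c * p for c, p in zip(pm25, population)) / total_pop


def population_above(pm25, population, standard):
    """Number of people living in grid cells whose PM2.5 exceeds the standard."""
    return sum(p for c, p in zip(pm25, population) if c > standard)


# Hypothetical annual-mean PM2.5 (ug/m3) and population for four grid cells.
pm25 = [10.0, 20.0, 30.0, 40.0]
population = [1000, 3000, 4000, 2000]

pw_mean = population_weighted_mean(pm25, population)
# Illustrative threshold only, not a national air quality standard.
exposed = population_above(pm25, population, standard=25.0)

print(f"population-weighted mean: {pw_mean:.1f} ug/m3")
print(f"population above threshold: {exposed}")
```

Because each cell is weighted by the number of residents, densely populated polluted cells dominate the average, which is why a population-weighted value can differ from a simple area mean.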
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on essd-2024-172', Anonymous Referee #1, 12 Jun 2024
Pendergrass et al. constructed a random forest model to estimate continuous daily PM2.5 data with 2×2 km² resolution over eastern China, South Korea, and Japan for 2011-2022 with AOD observations from the GOCI I and II satellite instruments. Based on this dataset, long-term PM2.5 trends and population exposure were analyzed. Overall, the methods used for filling in missing AOD information and estimating PM2.5 concentrations are common and do not exhibit strong novelty. However, this paper reports the generation of long-term PM2.5 concentrations for East Asia based on GOCI AOD, which is a supplement to this field. I think it can be given a chance for revision, with specific comments as follows:
1. In recent years, studies have generated global 1-km PM2.5 concentration data, such as: https://doi.org/10.1038/s41467-023-43862-3, https://doi.org/10.5194/essd-16-2425-2024. The PM2.5 data generated here should be compared with existing products to elucidate the advantages and disadvantages of the PM2.5 data product in this study.
2. The preparation of this manuscript is not serious and meticulous enough. All the text in the figures is particularly large, which is unfriendly to readers. Some results can also be presented better, such as Figure 3, which looks very casual. There are also some model names, such as “AirKorea PM2.5 RF”.
3. The description of the methods needs improvement. First, it lacks an overall framework flowchart to clearly explain each process; secondly, each sub-process is not well understood. Another highly important point is that a lot of results are described in the methods section, which I think is inappropriate; the results should be summarized in the results section.
4. The experiments in the current version of this manuscript are insufficient, mainly reflected in: 1) The reconstruction process of AOD data only used cross-validation for accuracy verification, but in fact, it should also include validation with ground-based AERONET measurements to verify the reconstruction accuracy; 2) Figure 4 gives the overall accuracy of PM2.5 estimates, which is not enough. Further analysis of accuracy differences in different regions and different years can be done, as site data come from different regions and years (some years’ ground PM2.5 are even estimated). Therefore, the PM2.5 accuracy evaluation is insufficient.
5. Lines 52-61: The summary of PM2.5 remote sensing estimation methods is insufficient, with many methods not mentioned.
6. Table 1: Looks very cumbersome, and the presentation of this table needs improvement. There are too many annotations; can they be directly placed in the table?
7. Lines 157-158: What does “first aggregate GOCI I AOD into an 8-h average (0:30-7:30 UTC) and GOCI II AOD into a 10-h average (23:15-8:15 UTC)” mean? This looks very confusing.
8. Figure 2: Why are the annual results worse than the 24-h results? This is puzzling. The sample size may be insufficient from 2011-2014, but the cross-validation for 2015-2020 should not be like this, which is inconsistent with previous studies.
9. The “3 Results and discussion” section can actually be divided into subsections for display. The current version looks very chaotic.
10. The presentation of PM2.5 estimation results needs improvement. Currently, only Figure 6 shows the PM2.5 concentration for three years. Consider adding seasonal distribution, multi-year average distribution, etc.
11. Figure 6: There are still white regions in the results for the year 2012. Is this data missing? The meaning is unclear. If so, AOD is filled in for missing data; why is there still missing data?
12. Lines 327-338: The explanation of the changes in PM2.5 concentration trends is insufficient; explore the underlying reasons.
13. Lines 399-406: The explanation of the pollution event is too brief and needs to be supplemented.
14. This study still has some points worth discussing, such as the simplicity of the methods used and the sources of errors in PM2.5 estimates. It is recommended to supplement a ‘Discussion’ section to analyze these points.
15. A curious question: since there is hourly AOD data and hourly PM2.5 site data, why not consider estimating hourly PM2.5?
Citation: https://doi.org/10.5194/essd-2024-172-RC1
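Comment 7 above asks what aggregating AOD into an 8-h average (0:30-7:30 UTC) or a 10-h average (23:15-8:15 UTC) means. One possible reading, which is an assumption here and not confirmed by the quoted manuscript text, is that all valid hourly retrievals whose scan time falls inside the window are averaged into a single daytime value. The sketch below shows that reading, including a window that crosses midnight UTC. The function names and sample values are hypothetical.

```python
from datetime import time


def in_window(t, start, end):
    """True if time-of-day t lies in [start, end]; handles windows crossing midnight."""
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end


def window_mean(retrievals, start, end):
    """Mean of non-missing AOD retrievals whose UTC scan time falls in the window.

    `retrievals` is a list of (scan_time, aod) pairs for one observation day,
    with aod set to None where the retrieval is missing (e.g. cloud).
    Returns None if no valid retrieval falls in the window.
    """
    vals = [aod for t, aod in retrievals
            if aod is not None and in_window(t, start, end)]
    return sum(vals) / len(vals) if vals else None


# Hypothetical scans for one pixel: three valid values inside the
# 23:15-08:15 UTC window, one missing, one outside the window.
scans = [
    (time(23, 15), 0.4),
    (time(0, 15), None),   # missing retrieval
    (time(1, 15), 0.6),
    (time(8, 15), 0.5),
    (time(12, 0), 9.9),    # outside the window, ignored
]
daytime_aod = window_mean(scans, time(23, 15), time(8, 15))
print(daytime_aod)
```

Under this reading the "8-h" and "10-h" labels would describe the length of the scan window, not a requirement that every hour contain a valid retrieval.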
RC2: 'Comment on essd-2024-172', Donghyun Lee, 19 Jun 2024
This study is significant as it continuously records and analyzes air quality changes in East Asia (eastern China, South Korea, Japan) from 2011 to 2022 by high-resolution measurement of fine particulate matter (PM2.5). The researchers integrated Aerosol Optical Depth (AOD) data from the GOCI I and GOCI II satellites with PM2.5 data from ground observation networks using a Random Forest (RF) algorithm to estimate daily PM2.5 concentrations at a 2×2 km² resolution.
The study provides a foundation for analyzing long-term air quality changes through continuous data over 12 years. The 2×2 km² resolution data enables detailed analysis of PM2.5 concentration changes at urban and regional levels, contributing significantly to evaluating air pollution control policies and public health research. Additionally, the use of satellite observation data allows for PM2.5 estimation in areas lacking ground observation data. The use of the Random Forest model improves data accuracy, filling in missing AOD data to create consistent PM2.5 concentration data. Notably, the model maintained relatively high prediction accuracy even during high pollution events.
However, several limitations exist. AOD data inherently has uncertainties due to satellite observation characteristics, especially under weather conditions like clouds or snow cover, which can cause AOD data to be missing or less accurate. GOCI II's AOD data tends to have a lower bias over land, affecting PM2.5 predictions. There is also uncertainty in adjusting discrepancies between GOCI I and GOCI II data. These limitations need to be clearly addressed.
Additionally, the Random Forest model itself has limitations. While it performs well for specific regions or periods, generalizing to other regions or periods can be challenging. High pollution events are particularly difficult to predict accurately. Variability in weather conditions also significantly impacts PM2.5 predictions. Seasonal changes or weather anomalies can reduce the model's accuracy.
Detailed information on the statistical properties (e.g., a descriptive statistics table) and missing values of both AOD and meteorological observation data is necessary. This information will clarify the current state and limitations of the data. Furthermore, providing detailed descriptions and diagrams of the data integration and preprocessing methods would enhance readers' understanding.
This study's integration of satellite and ground observation data to create long-term, high-resolution PM2.5 concentration data is valuable for air quality assessment and public health research. Addressing these limitations through further research will enable more precise and reliable air quality assessments.
Citation: https://doi.org/10.5194/essd-2024-172-RC2
RC3: 'Comment on essd-2024-172', Anonymous Referee #3, 25 Jun 2024
Over the past decade, many researchers have used aerosol optical depth (AOD) data obtained from polar-orbiting or geostationary satellite sensors to estimate near-surface PM2.5 concentrations at various temporal and spatial resolutions. This has become a widely discussed research direction in atmospheric environmental science. The study by Pendergrass et al. is aligned with this trend; they used 12 years of continuous AOD data from the GOCI I and II satellite instruments, employing a random forest (RF) model to estimate the daily PM2.5 concentrations over parts of the East Asian land area (mainly the area covered by GOCI). The authors used this dataset to analyze long-term trends of PM2.5 and its impact on population exposure. Overall, this paper extends the authors’ previous research by expanding the temporal length and filling in spatial gaps. However, compared to existing global daily PM2.5 datasets based on satellite AOD, the dataset generated in this study does not show significant advantages in spatial and temporal resolution. Therefore, unless this study undergoes significant modifications, I do not consider that it meets the requirements for publication in ESSD. Below are my specific suggestions and comments for improving this work:
1. As mentioned by the authors, GOCI provides hourly AOD observations. I hope the authors will extend the daily scale to hourly scale PM2.5 retrievals, which would be more attractive.
2. Considering that the authors use a trained model to hindcast historical data, site-based 10-fold cross-validation (CV) alone is insufficient. I suggest using hindcast-validation (e.g., using 2012-2014 data to predict 2011) to further assess the predictive capability of the model.
3. In recent years, machine learning models with higher computational efficiency and accuracy than RF (such as LGB, CatBoost) have been widely used. Please explain the reason for choosing RF and whether other models would improve prediction accuracy.
4. Many scholars have produced global and regional daily PM₂.₅ datasets (e.g., CHAP, TAP, LGHAP) in recent years. I suggest comparing the dataset produced in this study with them for mutual validation.
5. Using GEOS-Chem simulated monthly AOD for 2016 in the model might not be advantageous. If background AOD information is needed, I recommend using hourly assimilated AOD products provided by MERRA-2 or CAMS. Please explain the advantage of using GEOS-Chem’s simulated monthly AOD for a fixed year over reanalysis AOD products.
6. In addition to annual averages, adding seasonal trend analyses and exposure assessments of PM2.5 would benefit readers.
7. The authors mention a “24-hour resolution” multiple times, which could be mistaken for an hourly retrieval. I suggest replacing it with “daily average” or “24-hour daily average”.
8. Figure 6 has three sets of colors but only one legend. Please complete the legend information.
Citation: https://doi.org/10.5194/essd-2024-172-RC3
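Comment 2 above contrasts site-based cross-validation with hindcast validation. The difference lies in how records are held out: site-based folds withhold whole monitoring sites, while a hindcast split withholds a whole year that precedes the training period. The sketch below illustrates the two splitting schemes only; the record layout and function names are assumptions, and no model is fitted.

```python
def site_based_folds(records, k):
    """Yield (train, test) splits where each site falls in exactly one test fold.

    Whole sites are withheld, so no site contributes to both train and test.
    """
    sites = sorted({r["site"] for r in records})
    fold_of = {s: i % k for i, s in enumerate(sites)}
    for fold in range(k):
        train = [r for r in records if fold_of[r["site"]] != fold]
        test = [r for r in records if fold_of[r["site"]] == fold]
        yield train, test


def hindcast_split(records, train_years, test_year):
    """Train on later years and test on an earlier, fully withheld year."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] == test_year]
    return train, test


# Hypothetical records: four sites observed in each of 2011-2014.
records = [{"site": s, "year": y} for s in "ABCD" for y in range(2011, 2015)]

folds = list(site_based_folds(records, k=2))
hind_train, hind_test = hindcast_split(records, {2012, 2013, 2014}, 2011)

for train, test in folds:
    print(sorted({r["site"] for r in train}), sorted({r["site"] for r in test}))
print(len(hind_train), len(hind_test))
```

A site-based fold still trains on data from the same years as the test sites, so it checks spatial transfer; the hindcast split checks whether the model holds up in a period it has never seen, which is the concern raised by the referee.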
Data sets
Continuous 2011-2022 record of fine particulate matter (PM2.5) in East Asia at daily 2-km resolution from GOCI I and II satellite observations
Drew C. Pendergrass, Daniel J. Jacob, Yujin J. Oak, Jeewoo Lee, Minseok Kim, Jhoon Kim, Seoyoung Lee, Shixian Zhai, Hitoshi Irie, and Hong Liao
https://doi.org/10.7910/DVN/0GO7BS