A continuous 2011–2022 record of fine particulate matter (PM₂.₅) in East Asia at daily 2-km resolution from geostationary satellite observations: population exposure and long-term trends

Pendergrass, Drew C.; Jacob, Daniel J.; Oak, Yujin J.; Lee, Jeewoo; Kim, Minseok; Kim, Jhoon; Lee, Seoyoung; Zhai, Shixian; Irie, Hitoshi; Liao, Hong

doi:https://doi.org/10.5194/essd-2024-172

Preprints

21 May 2024

Status: this preprint has been withdrawn by the authors.

Abstract. We construct a continuous 24-h daily fine particulate matter (PM₂.₅) record with 2×2 km² resolution over eastern China, South Korea, and Japan for 2011–2022 by applying a random forest (RF) algorithm to aerosol optical depth (AOD) observations from the Geostationary Ocean Color Imager (GOCI) I and II satellite instruments. The RF uses PM₂.₅ observations from the national surface networks as training data. PM₂.₅ network data starting in 2015 in South Korea are extended to pre-2015 with a RF trained on other air quality data available from the network including PM₁₀. PM₂.₅ network data starting in 2014 in China are supplemented by pre-2014 data from the US embassy and consulates. Missing AODs in the GOCI data are gap-filled by a separate RF fit. We show that the resulting GOCI PM₂.₅ dataset is successful in reproducing the surface network observations including extreme events, and that the network data in the different countries are representative of population-weighted exposure. We find that PM₂.₅ peaked in 2014 (China) and 2013 (South Korea, Japan), and has been decreasing steadily since with no region left behind. We quantify the population in each country exposed to annual PM₂.₅ in excess of national ambient air quality standards and how this exposure evolves with time. The long record for the Seoul Metropolitan Area (SMA) shows a steady decrease from 2013 to 2022 that was not present in the first five years of AirKorea network PM₂.₅ measurements. Mapping of an extreme pollution event in Seoul with GOCI PM₂.₅ shows a predicted distribution indistinguishable from the dense urban network observations, while our previous 6×6 km² product smoothed local features. Our product should be useful for public health studies where long-term spatial continuity of PM₂.₅ information is essential.
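
The population-weighted exposure and the population above a standard mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the four-cell grid, the population counts, and the 25 µg m⁻³ threshold are all hypothetical values chosen for illustration, and the threshold is not any country's actual standard.

```python
def population_weighted_mean(pm25, population):
    """Population-weighted mean PM2.5 over grid cells (flat, same-length lists)."""
    total_pop = sum(population)
    if total_pop == 0:
        raise ValueError("total population is zero")
    return sum(c * p for c, p in zip(pm25, population)) / total_pop


def population_above(pm25, population, standard):
    """Population living in grid cells whose annual-mean PM2.5 exceeds `standard`."""
    return sum(p for c, p in zip(pm25, population) if c > standard)


# Hypothetical annual-mean PM2.5 (ug/m3) and population for four 2x2 km cells.
pm25 = [10.0, 20.0, 30.0, 40.0]
population = [1000, 3000, 4000, 2000]

print(population_weighted_mean(pm25, population))
# `standard` is an illustrative threshold, not a real national standard.
print(population_above(pm25, population, standard=25.0))
```

Weighting by population rather than by area means that a densely populated polluted cell contributes more to the exposure estimate than a sparsely populated clean one.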

Received: 10 May 2024 – Discussion started: 21 May 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Interactive discussion

Status: closed

RC1: 'Comment on essd-2024-172', Anonymous Referee #1, 12 Jun 2024

Pendergrass et al. constructed a random forest model to estimate continuous daily PM2.5 data with 2×2 km² resolution over eastern China, South Korea, and Japan for 2011-2022 with AOD observations from the GOCI I and II satellite instruments. Based on this dataset, long-term PM2.5 trends and population exposure were analyzed. Overall, the methods used for filling in missing AOD information and estimating PM2.5 concentrations are common and do not exhibit strong novelty. However, this paper reports the generation of long-term PM2.5 concentrations for East Asia based on GOCI AOD, which is a supplement to this field. I think it can be given a chance for revision, with specific comments as follows:
1. In recent years, studies have generated global 1-km PM2.5 concentration data, such as: https://doi.org/10.1038/s41467-023-43862-3, https://doi.org/10.5194/essd-16-2425-2024. The PM2.5 data generated here should be compared with existing products to elucidate the advantages and disadvantages of the PM2.5 data product in this study.
2. The preparation of this manuscript is not serious and meticulous enough. All the text in the figures is particularly large, which is unfriendly to readers. Some results can also be presented better, such as Figure 3, which looks very casual. There are also some model names, such as “AirKorea PM2.5 RF”.
3. The description of the methods needs improvement. First, it lacks an overall framework flowchart to clearly explain each process; secondly, each sub-process is not well understood. Another highly important point is that a lot of results are described in the methods section, which I think is inappropriate; the results should be summarized in the results section.
4. The experiments in the current version of this manuscript are insufficient, mainly reflected in: 1) The reconstruction process of AOD data only used cross-validation for accuracy verification, but in fact, it should also include validation with ground-based AERONET measurements to verify the reconstruction accuracy; 2) Figure 4 gives the overall accuracy of PM2.5 estimates, which is not enough. Further analysis of accuracy differences in different regions and different years can be done, as site data come from different regions and years (some years’ ground PM2.5 are even estimated). Therefore, the PM2.5 accuracy evaluation is insufficient.
5. Lines 52-61: The summary of PM2.5 remote sensing estimation methods is insufficient, with many methods not mentioned.
6. Table 1: Looks very cumbersome, and the presentation of this table needs improvement. There are too many annotations; can they be directly placed in the table?
7. Lines 157-158: What does “first aggregate GOCI I AOD into an 8-h average (0:30-7:30 UTC) and GOCI II AOD into a 10-h average (23:15-8:15 UTC)” mean? This looks very confusing.
8. Figure 2: Why are the annual results worse than the 24-h results? This is puzzling. The sample size may be insufficient from 2011-2014, but the cross-validation for 2015-2020 should not be like this, which is inconsistent with previous studies.
9. The “3 Results and discussion” section can actually be divided into subsections for display. The current version looks very chaotic.
10. The presentation of PM2.5 estimation results needs improvement. Currently, only Figure 6 shows the PM2.5 concentration for three years. Consider adding seasonal distribution, multi-year average distribution, etc.
11. Figure 6: There are still white regions in the results for the year 2012. Is this data missing? The meaning is unclear. If so, AOD is filled in for missing data; why is there still missing data?
12. Lines 327-338: The explanation of the changes in PM2.5 concentration trends is insufficient; explore the underlying reasons.
13. Lines 399-406: The explanation of the pollution event is too brief and needs to be supplemented.
14. This study still has some points worth discussing, such as the simplicity of the methods used and the sources of errors in PM2.5 estimates. It is recommended to supplement a ‘Discussion’ section to analyze these points.
15. A curious question: since there is hourly AOD data and hourly PM2.5 site data, why not consider estimating hourly PM2.5?
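
One possible reading of the sentence quoted in comment 7 can be sketched as follows. This assumes, without confirmation from the manuscript text quoted here, that the hourly AOD retrievals within each instrument's daytime window are simply averaged per pixel and that missing retrievals are skipped. The function name and the sample values are hypothetical.

```python
def daily_mean_aod(hourly_aod):
    """Average the valid hourly AOD retrievals for one pixel on one day.

    Missing retrievals (for example under cloud) are passed as None and skipped.
    Returns None when no retrieval is valid, leaving a gap for later gap-filling.
    """
    valid = [a for a in hourly_aod if a is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Hypothetical pixel with eight hourly slots (an 8-h window such as 0:30-7:30 UTC).
goci1_day = [0.5, None, 0.75, 1.0, None, None, 0.25, 0.5]
print(daily_mean_aod(goci1_day))

# A fully cloudy day yields no average at all.
print(daily_mean_aod([None] * 8))
```

Under this reading, a 10-h window differs only in the number of hourly slots passed in.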

Citation: https://doi.org/10.5194/essd-2024-172-RC1

RC2: 'Comment on essd-2024-172', Donghyun Lee, 19 Jun 2024

This study is significant as it continuously records and analyzes air quality changes in East Asia (eastern China, South Korea, Japan) from 2011 to 2022 by high-resolution measurement of fine particulate matter (PM2.5). The researchers integrated Aerosol Optical Depth (AOD) data from the GOCI I and GOCI II satellites with PM2.5 data from ground observation networks using a Random Forest (RF) algorithm to estimate daily PM2.5 concentrations at a 2×2 km² resolution.
The study provides a foundation for analyzing long-term air quality changes through continuous data over 12 years. The 2×2 km² resolution data enables detailed analysis of PM2.5 concentration changes at urban and regional levels, contributing significantly to evaluating air pollution control policies and public health research. Additionally, the use of satellite observation data allows for PM2.5 estimation in areas lacking ground observation data. The use of the Random Forest model improves data accuracy, filling in missing AOD data to create consistent PM2.5 concentration data. Notably, the model maintained relatively high prediction accuracy even during high pollution events.
However, several limitations exist. AOD data inherently has uncertainties due to satellite observation characteristics, especially under weather conditions like clouds or snow cover, which can cause AOD data to be missing or less accurate. GOCI II's AOD data tends to have a lower bias over land, affecting PM2.5 predictions. There is also uncertainty in adjusting discrepancies between GOCI I and GOCI II data. These limitations need to be clearly addressed.
Additionally, the Random Forest model itself has limitations. While it performs well for specific regions or periods, generalizing to other regions or periods can be challenging. High pollution events are particularly difficult to predict accurately. Variability in weather conditions also significantly impacts PM2.5 predictions. Seasonal changes or weather anomalies can reduce the model's accuracy.
Detailed information on the statistical properties (e.g., a descriptive statistics table) and missing values of both AOD and meteorological observation data is necessary. This information will clarify the current state and limitations of the data. Furthermore, providing detailed descriptions and diagrams of the data integration and preprocessing methods would enhance readers' understanding.
This study's integration of satellite and ground observation data to create long-term, high-resolution PM2.5 concentration data is valuable for air quality assessment and public health research. Addressing these limitations through further research will enable more precise and reliable air quality assessments.

Citation: https://doi.org/10.5194/essd-2024-172-RC2

RC3: 'Comment on essd-2024-172', Anonymous Referee #3, 25 Jun 2024

Over the past decade, many researchers have used aerosol optical depth (AOD) data obtained from polar-orbiting or geostationary satellite sensors to estimate near-surface PM₂.₅ concentrations at various temporal and spatial resolutions. This has become a widely discussed research direction in atmospheric environmental science. The study by Pendergrass et al. is aligned with this trend; they used 12 years of continuous AOD data from the GOCI I and II satellite instruments, employing a random forest (RF) model to estimate the daily PM₂.₅ concentrations over parts of the East Asian land area (mainly the area covered by GOCI). The authors used this dataset to analyze long-term trends of PM₂.₅ and its impact on population exposure. Overall, this paper extends the authors’ previous research by expanding the temporal length and filling in spatial gaps. However, compared to existing global daily PM₂.₅ datasets based on satellite AOD, the dataset generated in this study does not show significant advantages in spatial and temporal resolution. Therefore, unless this study undergoes significant modifications, I do not consider that it meets the requirements for publication in ESSD. Below are my specific suggestions and comments for improving this work:
1. As mentioned by the authors, GOCI provides hourly AOD observations. I hope the authors will extend the daily-scale product to hourly-scale PM₂.₅ retrievals, which would be more attractive.
2. Considering that the authors use a trained model to hindcast historical data, site-based 10-fold cross-validation (CV) alone is insufficient. I suggest using hindcast-validation (e.g., using 2012-2014 data to predict 2011) to further assess the predictive capability of the model.
3. In recent years, machine learning models with higher computational efficiency and accuracy than RF (such as LGB, CatBoost) have been widely used. Please explain the reason for choosing RF and whether other models would improve prediction accuracy.
4. Many scholars have produced global and regional daily PM₂.₅ datasets (e.g., CHAP, TAP, LGHAP) in recent years. I suggest comparing the dataset produced in this study with them for mutual validation.
5. Using GEOS-Chem simulated monthly AOD for 2016 in the model might not be advantageous. If background AOD information is needed, I recommend using hourly assimilated AOD products provided by MERRA-2 or CAMS. Please explain the advantage of using GEOS-Chem’s simulated monthly AOD for a fixed year over reanalysis AOD products.
6. In addition to annual averages, adding seasonal trend analyses and exposure assessments of PM₂.₅ would benefit readers.
7. The authors mention a “24-hour resolution” multiple times, which can be misleading because it suggests an hourly inversion. I suggest replacing it with “daily average” or “24-hour daily average”.
8. Figure 6 has three sets of colors but only one legend. Please complete the legend information.
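
The contrast drawn in comment 2 between site-based cross-validation and hindcast validation can be sketched as two ways of splitting the same records. This is an illustrative sketch only: the record layout (`site`, `year`), the function names, and the four-site example are hypothetical, and a real record would also carry predictors and the observed PM₂.₅ value.

```python
def site_based_folds(records, n_folds=10):
    """Yield (train, test) pairs in which each site appears in exactly one test fold."""
    sites = sorted({r["site"] for r in records})
    for k in range(n_folds):
        test_sites = set(sites[k::n_folds])  # every n_folds-th site goes to fold k
        train = [r for r in records if r["site"] not in test_sites]
        test = [r for r in records if r["site"] in test_sites]
        if test:  # skip empty folds when there are fewer sites than folds
            yield train, test


def hindcast_split(records, train_years, test_year):
    """Train on some years and test on a held-out year outside the training period."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] == test_year]
    return train, test


# Hypothetical records: four sites, four years each.
records = [
    {"site": s, "year": y}
    for s in ("A", "B", "C", "D")
    for y in (2011, 2012, 2013, 2014)
]

folds = list(site_based_folds(records, n_folds=2))
train, test = hindcast_split(records, train_years={2012, 2013, 2014}, test_year=2011)
print(len(folds), len(train), len(test))
```

Site-based folds test how well a model transfers to unmonitored locations within the training period, while the hindcast split tests how well it transfers to a year it has never seen, which is the situation the referee raises for the pre-network years.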

Citation: https://doi.org/10.5194/essd-2024-172-RC3

Data sets

Continuous 2011-2022 record of fine particulate matter (PM2.5) in East Asia at daily 2-km resolution from GOCI I and II satellite observations
Drew C. Pendergrass, Daniel J. Jacob, Yujin J. Oak, Jeewoo Lee, Minseok Kim, Jhoon Kim, Seoyoung Lee, Shixian Zhai, Hitoshi Irie, and Hong Liao
https://doi.org/10.7910/DVN/0GO7BS

Viewed

Total article views: 1,690 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,324	323	43	1,690	58	81

Views and downloads (calculated since 21 May 2024)

Month	HTML	PDF	XML	Total
May 2024	164	61	7	232
Jun 2024	170	34	9	213
Jul 2024	86	13	3	102
Aug 2024	60	13	3	76
Sep 2024	46	3	2	51
Oct 2024	31	1	2	34
Nov 2024	34	5	1	40
Dec 2024	41	18	0	59
Jan 2025	40	22	0	62
Feb 2025	33	17	2	52
Mar 2025	16	6	1	23
Apr 2025	19	28	1	48
May 2025	30	13	2	45
Jun 2025	23	34	0	57
Jul 2025	29	10	2	41
Aug 2025	82	12	1	95
Sep 2025	379	15	6	400
Oct 2025	41	18	1	60

Viewed (geographical distribution)

Total article views: 1,633 (including HTML, PDF, and XML). Of these, 1,633 have geography defined and 0 have unknown origin.

Latest update: 29 Oct 2025

Short summary

Fine particles suspended in the atmosphere are a major form of air pollution and an important public health burden. However, measurements of particulate matter are sparse in space, and in places like East Asia monitors are established after regulatory policies to reduce pollution have changed. In this paper, we use machine learning to fill in the gaps. We train an algorithm to predict pollution at the surface from the atmosphere’s opacity, then produce high-resolution maps of data without gaps.

