This work is distributed under the Creative Commons Attribution 4.0 License.
A Pan-European, High-Resolution, Daily Total, Fine-Mode and Coarse-Mode Aerosol Optical Depth dataset based on Quantile Machine Learning
Abstract. Ambient particulate matter (PM) is a widespread air pollutant that negatively affects human health; it consists of a mixture of different particle species suspended in the air. Because in-situ PM measurement networks are generally sparse, spatially resolved PM estimates are typically derived from satellite Aerosol Optical Depth (AOD). However, satellite AOD over land is affected by several limitations (e.g., data gaps, coarser resolution, higher uncertainty, and unavailable or unreliable size-fraction information), which weaken the relationship between AOD and PM. We have developed a daily AOD dataset at 0.1° resolution over Europe for the period 2003–2020, derived with new Quantile Machine Learning (QML) models. Based on AERONET (AErosol RObotic NETwork) site observations and climate and air quality reanalyses, the dataset provides reliable full-coverage total AOD along with Fine-mode AOD (fAOD) and Coarse-mode AOD (cAOD). The three QML products show better quality, with out-of-sample R² values of 0.68 for AOD, 0.66 for fAOD and 0.65 for cAOD, which are 23–92 %, 11–13 % and 115–132 % higher, respectively, than those of the corresponding satellite or reanalysis products. Over 88.8 %, 80.5 % and 88.6 % of the QML AOD, fAOD and cAOD predictions, respectively, fall within the ± 20 % Expected Error (EE) envelopes. Previous studies reported that Europe is one of the regions with the poorest correlation between satellite AOD and PM (Pearson correlation coefficient (PCC) around 0.1). The three QML products are more strongly correlated with ground-level PM, especially when each is paired with the PM fraction of corresponding size: AOD with PM10, fAOD with PM2.5 and cAOD with coarse PM (R = 0.41, 0.45 and 0.26, respectively). These results indicate that different PM size fractions may be better predicted using different AOD size fractions instead of total AOD.
The long-term QML aerosol dataset (and associated models) not only addresses some limitations of existing AOD data but also provides better tools to monitor and analyse fine-mode and coarse-mode aerosols in space and time, and to further investigate their impacts on human health, climate, visibility and biogeochemical cycling. The QML datasets can be downloaded from https://doi.org/10.5281/zenodo.7756570 (Chen et al., 2023).
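The abstract reports out-of-sample R², Pearson correlations and the share of predictions inside an Expected Error (EE) envelope. The sketch below shows how such scores are commonly computed. It assumes that R² is the coefficient of determination (1 − SS_res/SS_tot) and that the envelope has the form ±(a + b·AOD_obs), with a = 0.025 and b = 0.20 taken from the envelope questioned by Referee #2 below; the preprint may define these scores differently, and the sample values are made up.

```python
import math


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fraction_within_ee(obs, pred, abs_term=0.025, rel_term=0.20):
    """Share of predictions with |pred - obs| <= abs_term + rel_term * obs.

    The default terms are an assumption; many AOD studies use abs_term=0.05.
    """
    inside = sum(1 for o, p in zip(obs, pred) if abs(p - o) <= abs_term + rel_term * o)
    return inside / len(obs)


# Made-up AERONET-like observations and model predictions.
obs = [0.10, 0.20, 0.30, 0.40]
pred = [0.12, 0.18, 0.33, 0.55]
print(round(r_squared(obs, pred), 3))  # 0.516
print(round(pearson_r(obs, pred), 3))
print(fraction_within_ee(obs, pred))  # 0.75
```

Keeping the absolute and relative EE terms as parameters makes it easy to compare the 0.025 and 0.05 conventions discussed by the referees.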
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on essd-2023-104', Anonymous Referee #1, 25 Apr 2023
The manuscript describes the application of a supervised machine learning algorithm (LightGBM) to retrieve AOD, fAOD and cAOD over Europe. However, the aerosol retrieval method presented is not new, and I have several major concerns about this study. First, the claim of a high-resolution (0.1 degree) aerosol product is questionable. Second, the validation of the proposed model shows severe overfitting.
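The title describes the models as Quantile Machine Learning, and Referee #1 identifies the algorithm as LightGBM. A quantile model is typically fitted by minimising the pinball (quantile) loss; LightGBM exposes this through its `quantile` objective and `alpha` parameter. The sketch below only illustrates the loss itself in plain Python, not the authors' training setup, which is not described on this page. The helper `best_constant` and the toy AOD values are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss at quantile level tau, 0 < tau < 1."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau per unit; over-prediction costs (1 - tau).
        total += max(tau * diff, (tau - 1) * diff)
    return total / len(y_true)


def best_constant(y, tau):
    """Constant prediction (searched among the observed values) that minimises the loss."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Made-up daily AOD values with one extreme episode.
aod = [0.05, 0.10, 0.15, 0.20, 1.00]
print(best_constant(aod, 0.5))  # 0.15, the median
print(best_constant(aod, 0.9))  # 1.0, an upper quantile
```

The example shows why this loss yields quantile estimates: at tau = 0.5 the best constant is the median, which ignores the single extreme value, while at tau = 0.9 the optimum moves towards the upper tail.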
Major concerns:
- The study claims that the AOD products were generated at a spatial resolution of 0.1 degrees. However, the only key input variable with a fine spatial resolution, MAIAC AOD (1 km), was eventually excluded from the models, and the other variables used in the study have spatial resolutions coarser than 0.1 degrees. It is therefore questionable whether the resulting product is truly a 0.1 degree product. In addition, Figure 11 shows that the developed AOD (B1) does not provide finer detail than the CAMS AOD (0.75 degrees) and MERRA-2 AOD (0.625 degrees × 0.5 degrees).
- Based on the input variables listed in Table S2, only the CAMS reanalysis appears to provide information related to aerosol size. The study seems simply to use the LightGBM algorithm to correct the CAMS-based fAOD and cAOD using meteorological data.
- It is unclear how well the developed fAOD and cAOD models perform at locations where no AERONET data are available. It is also unclear whether the study used completely independent ground-based data to test the results, for example test sites that were not used in the training process. If Table S3 is intended to show this validation, the R² of fAOD decreases markedly from 0.68 to 0.56 in M3, suggesting that the model may suffer from severe overfitting.
- For the LightGBM-based training of fAOD and cAOD, AERONET provides fAOD and cAOD data only at 500 nm. However, it is unclear how the model was trained to estimate fAOD and cAOD at 550 nm, a crucial issue that the paper does not address.
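Referees #1 and #2 both ask how the 500 nm AERONET fine- and coarse-mode AOD were brought to 550 nm. A common approach, which may or may not be what the authors did, is to apply the Ångström power law τ(λ) = τ(λ_ref)·(λ/λ_ref)^(−α) separately to each mode. The sketch assumes that a mode-specific Ångström exponent α is known or assumed; all numbers are hypothetical.

```python
import math


def angstrom_exponent(tau_1, tau_2, wl_1_nm, wl_2_nm):
    """Angstrom exponent from AOD at two wavelengths: -ln(tau_1/tau_2) / ln(wl_1/wl_2)."""
    return -math.log(tau_1 / tau_2) / math.log(wl_1_nm / wl_2_nm)


def aod_at_wavelength(tau_ref, wl_ref_nm, wl_target_nm, alpha):
    """Shift AOD to another wavelength with tau(wl) = tau_ref * (wl / wl_ref) ** -alpha."""
    return tau_ref * (wl_target_nm / wl_ref_nm) ** (-alpha)


# Hypothetical 500 nm values and mode-specific exponents (not taken from the preprint).
f_aod_500, alpha_fine = 0.20, 1.5
c_aod_500, alpha_coarse = 0.08, 0.1

f_aod_550 = aod_at_wavelength(f_aod_500, 500, 550, alpha_fine)
c_aod_550 = aod_at_wavelength(c_aod_500, 500, 550, alpha_coarse)
print(round(f_aod_550, 3))  # 0.173
print(round(c_aod_550, 3))  # 0.079
```

Because fine-mode particles usually have a much larger α than coarse-mode particles, the shift from 500 to 550 nm matters far more for fAOD than for cAOD, which is why the choice of α per mode deserves explicit documentation.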
Specific concerns:
- In Figure 1, it is not clear how Boruta is used to select the variables.
- The caption of Figure 3 reads “Spatial and temporal distribution of the median value of AERONet (a) AOD, (b) fAOD and (c) cAOD data”. It is unclear to me how panels (a), (b) and (c) reveal any temporal information.
- AERONET is written “AERONet” in the figure caption but “AERONET” in the text.
- Typographical error: P10, L285, “(Levy et al., 2010; Xiao et al., 2016; Yan et al., 2022))” has an extra closing parenthesis.
Citation: https://doi.org/10.5194/essd-2023-104-RC1
- AC1: 'Reply on RC1', Zhaoyue Chen, 23 May 2023
RC2: 'Comment on essd-2023-104', Anonymous Referee #2, 25 Apr 2023
This study presents a daily AOD dataset over Europe for the period 2003–2020, derived by post-processing current satellite and reanalysis products with a machine learning method. The accuracy of the total AOD in this dataset is greatly improved. The dataset also provides fine- and coarse-mode AOD, which are relatively reliable and will be very helpful for particulate matter (PM) prediction. The dataset will be of interest to the scientific community. I have some comments that should be addressed before it can be accepted for publication.
Major comments:
- For the route without satellite data, all input reanalysis AOD data (e.g. MERRA-2, CAMS) have relatively coarse spatial resolutions, coarser than 0.1 degrees. It is not appropriate to increase the spatial resolution of the final AOD product to 0.1 degrees through interpolation, as simple interpolation cannot add spatial detail to the AOD field. I think the spatial resolution of the final AOD product should not be finer than the finest spatial resolution among the input reanalysis data.
- For the correction of total AOD, it is understandable that the AOD information mainly comes from the AOD of the reanalysis products. For fine and coarse AOD, however, the study should clarify which input data play the dominant role.
- I am also curious: what would happen to the QML AOD if the two reanalysis datasets, MERRA-2 and CAMS, were not used simultaneously as input data?
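The resolution concern raised here (and by Referee #1) can be made concrete: when a coarse reanalysis field is interpolated to a finer grid, every fine-grid value is a weighted average of the surrounding coarse values, so the result cannot contain features absent from the coarse field. The sketch uses plain bilinear interpolation on a hypothetical 0.5 degree field regridded to 0.1 degrees; it is not the regridding scheme of the preprint, which is not described on this page, and `bilinear_regrid` is a hypothetical helper.

```python
def bilinear_regrid(field, coarse_step, fine_step):
    """Bilinearly interpolate field[i][j], given at (i * coarse_step, j * coarse_step),
    onto a grid with spacing fine_step covering the same extent."""
    n_rows, n_cols = len(field), len(field[0])  # needs at least 2 x 2 values
    n_fine_rows = int((n_rows - 1) * coarse_step / fine_step + 1e-9) + 1
    n_fine_cols = int((n_cols - 1) * coarse_step / fine_step + 1e-9) + 1
    out = []
    for a in range(n_fine_rows):
        y = min(a * fine_step / coarse_step, n_rows - 1)  # row position in coarse-index units
        i0 = min(int(y), n_rows - 2)
        ty = y - i0
        row = []
        for b in range(n_fine_cols):
            x = min(b * fine_step / coarse_step, n_cols - 1)
            j0 = min(int(x), n_cols - 2)
            tx = x - j0
            # Non-negative weights summing to 1: a convex combination of 4 coarse cells.
            top = (1 - tx) * field[i0][j0] + tx * field[i0][j0 + 1]
            bottom = (1 - tx) * field[i0 + 1][j0] + tx * field[i0 + 1][j0 + 1]
            row.append((1 - ty) * top + ty * bottom)
        out.append(row)
    return out


# Hypothetical 2 x 3 coarse AOD field on a 0.5 degree grid.
coarse = [[0.10, 0.30, 0.20],
          [0.20, 0.60, 0.40]]
fine = bilinear_regrid(coarse, 0.5, 0.1)
values = [v for row in fine for v in row]
print(len(fine), len(fine[0]))  # 6 11
print(min(values), max(values))  # never outside the coarse range
```

The finer grid has 66 values instead of 6, but none of them carries information beyond the 6 coarse values, which is the substance of the referees' objection.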
Minor comments:
- Section 2 should introduce basic information on the PM data, as they are used in subsequent experiments.
- Line 105: how were fAOD and cAOD interpolated to 550 nm?
- Line 108: I believe the MODIS MAIAC data used in the manuscript are Collection 6 (C6), not v6.1, as production of the C6.1 product (MCD19A2) has not yet been completed.
- Line 155: how was the MODIS 1 km AOD product converted to 0.1 degrees?
- Line 269: the description of the “Sat scenario” and the “Non-Sat scenario” is unclear. What do these two terms mean, and how are the two scenarios distinguished?
- Line 391: how was EE = ±0.025 ± 20 %/40 % determined? I think most of the literature uses 0.05 instead of 0.025.
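One simple way to bring a 1 km product such as MAIAC to a 0.1 degree grid, offered here only as an illustration of the question at Line 155 and not as the authors' procedure, is to average all valid pixels whose centres fall inside each grid cell. The sketch assumes pixel-centre latitudes and longitudes are already available; `aggregate_to_grid`, its `min_count` threshold and the sample values are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_to_grid(lats, lons, values, cell_deg=0.1, min_count=1):
    """Average fine-resolution AOD retrievals into regular lat/lon cells.

    Cells are keyed by (floor(lat / cell_deg), floor(lon / cell_deg)).
    Missing retrievals (None or NaN) are skipped, and cells with fewer than
    min_count valid pixels are dropped (a hypothetical quality threshold).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in zip(lats, lons, values):
        if value is None or math.isnan(value):
            continue
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums if counts[key] >= min_count}


# Hypothetical 1 km pixel centres and AOD values (NaN = no retrieval).
lats = [41.23, 41.27, 41.31, 41.35]
lons = [2.15, 2.18, 2.12, 2.15]
aod = [0.10, 0.20, float("nan"), 0.30]
print(aggregate_to_grid(lats, lons, aod))
print(aggregate_to_grid(lats, lons, aod, min_count=2))
```

The minimum-count threshold is a design choice: raising it discards cells represented by only a few cloud-free pixels, trading coverage for robustness.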
Citation: https://doi.org/10.5194/essd-2023-104-RC2
- AC2: 'Reply on RC2', Zhaoyue Chen, 23 May 2023
RC3: 'Comment on essd-2023-104', Anonymous Referee #3, 07 Jun 2023
The AOD, fAOD and cAOD dataset over Europe has application value for environmental analysis. A machine learning method was used to produce daily AOD. The manuscript should be revised before it is considered for publication.
General comments:
1 The spatial and temporal resolutions of all input and output data of the machine learning model should be listed. Because the datasets have different resolutions, the spatio-temporal matching method should be clarified.
2 As the satellite AOD was discarded, I think all inputs are reanalysis data, so the native temporal resolution of AOD, fAOD and cAOD is not necessarily daily. Which time or times of day were selected to produce the daily AOD, fAOD and cAOD, and why?
3 Why was LightGBM chosen among the many available machine learning methods? Decision-tree-based methods adopt fixed thresholds, which may create systematic "boundaries" in the product. For example, if latitude is included in the input data, a systematic AOD boundary may appear along a line of latitude. Other parameters may have similar effects.
4 Regarding the spatial validation, I am not sure whether data from some AERONET sites were excluded from training and used only for testing. If so, that is a truly spatially independent validation. If not, the accuracy over locations without an AERONET site cannot be assessed.
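The spatially independent validation that Referees #1 and #3 ask about is usually done by grouping samples by station, so that every test site is entirely unseen during training. The sketch below builds such leave-sites-out folds in plain Python (scikit-learn's `GroupKFold` provides the same grouping); `leave_sites_out_folds` and the site labels are hypothetical, and this is not necessarily how Table S3 was produced.

```python
import random


def leave_sites_out_folds(site_ids, n_folds=5, seed=0):
    """Build (train, test) index lists so that no site appears in both sets of a fold.

    site_ids[i] is the station that sample i comes from. Sites are shuffled
    and assigned round-robin to folds, so each site is tested exactly once.
    """
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    fold_of_site = {site: k % n_folds for k, site in enumerate(sites)}
    folds = []
    for k in range(n_folds):
        test = [i for i, s in enumerate(site_ids) if fold_of_site[s] == k]
        train = [i for i, s in enumerate(site_ids) if fold_of_site[s] != k]
        folds.append((train, test))
    return folds


# Hypothetical samples from five stations.
site_ids = ["site_A", "site_A", "site_B", "site_C", "site_C", "site_D", "site_E", "site_E"]
for train, test in leave_sites_out_folds(site_ids, n_folds=3):
    test_sites = {site_ids[i] for i in test}
    train_sites = {site_ids[i] for i in train}
    print(sorted(test_sites), test_sites.isdisjoint(train_sites))
```

With sample-wise random splits, other days from the same station can fall into the training set, which can inflate scores; grouping by site removes that leakage, so a drop in R² under this scheme is the expected signature of limited spatial generalisation.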
Minor comments:
1 Abbreviations should be defined at first use, such as "NMB" in the supplement.
2 The section numbers in chapter 4 are wrong.
Citation: https://doi.org/10.5194/essd-2023-104-RC3
- AC3: 'Reply on RC3', Zhaoyue Chen, 17 Jun 2023
Data sets
A Pan-European, Quantile Machine learning (QML) based, Total, Fine-Mode and Coarse-Mode Aerosol Optical Depth dataset (QML AOD). Zhao-yue Chen, Raul Méndez, Hervé Petetin, Aleksander Lacima, Carlos Pérez García-Pando, and Joan Ballester, https://doi.org/10.5281/zenodo.7756570
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
1,085 | 221 | 57 | 1,363 | 96 | 50 | 52 |
Raul Méndez
Hervé Petetin
Aleksander Lacima
Carlos Pérez García-Pando
Joan Ballester