the Creative Commons Attribution 4.0 License.
A Global Daily High Spatial-temporal Coverage Merged Tropospheric NO2 dataset (HSTCM-NO2) from 2007 to 2022 based on OMI and GOME-2
Abstract. Satellite remote sensing can provide long-term, consistent, and global coverage of NO2 (an important atmospheric pollutant) as well as other trace gases. However, satellite records often contain gaps due to factors including, but not limited to, clouds, surface features, and aerosols. Moreover, OMI, one of the longest continuous platforms observing NO2 from space, has suffered from missing data over certain rows since 2007, significantly reducing its spatial coverage. This work uses the OMI-based OMNO2 product and an NO2 product from GOME-2, in combination with machine learning (XGBoost) and spatial interpolation (DINEOF) methods, to produce a 16-year global daily high spatial-temporal coverage merged tropospheric NO2 dataset (HSTCM-NO2, https://doi.org/10.5281/zenodo.10968462, Qin et al., 2024), which increases the global spatial coverage of NO2 by ~60 % compared to the original OMNO2 data. The HSTCM-NO2 dataset is validated against upward-looking observations of NO2 (MAX-DOAS), other satellite data (TROPOMI), and reanalysis products. The comparisons show that HSTCM-NO2 correlates well in magnitude with the other observational datasets, except under heavily polluted conditions (>6×10^15 molec. cm^-2). This work also introduces a new validation technique based on coherent spatial and temporal signals (EOF), which shows that HSTCM-NO2 is not only consistent with the original OMNO2 data but, in some parts of the world, effectively fills in missing gaps and yields a superior result when analyzing long-range atmospheric transport of NO2. The few differences are found in areas where the original OMNO2 signal was very low, which has been shown elsewhere but not from this perspective, further supporting the conclusion that applying a minimum cutoff to retrieved NO2 data is essential.
The reconstructed data product effectively extends the usefulness of the original OMNO2 data, and the data quality of HSTCM-NO2 meets the needs of scientific research.
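The gap-filling idea behind DINEOF can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple (time, space) matrix with NaN gaps, a fixed number of EOF modes, and a hypothetical function name `dineof_fill`. A full DINEOF would normally choose the number of modes by cross-validation, and the XGBoost step described in the abstract is not shown.

```python
import numpy as np


def dineof_fill(field, n_modes=2, max_iter=200, tol=1e-8):
    """Fill NaN gaps in a (time, space) matrix by iterating a truncated SVD.

    Illustrative only: n_modes is fixed here, whereas DINEOF usually
    selects it by cross-validation.
    """
    data = np.asarray(field, dtype=float)
    missing = np.isnan(data)
    # Work with anomalies about the mean of the observed values;
    # gaps start from a zero anomaly as the first guess.
    mean = data[~missing].mean()
    filled = np.where(missing, 0.0, data - mean)
    for _ in range(max_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        # Reconstruct the field from the leading n_modes EOFs only.
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
        if missing.any():
            change = np.sqrt(np.mean((recon[missing] - filled[missing]) ** 2))
        else:
            change = 0.0
        # Only the gaps are updated; observed values are never overwritten.
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled + mean


# Usage on a synthetic field (values are illustrative, not NO2 data).
rng = np.random.default_rng(0)
truth = 5.0 + np.outer(np.sin(np.linspace(0, 6, 30)), np.linspace(1, 2, 12))
gappy = np.where(rng.random(truth.shape) < 0.2, np.nan, truth)
filled = dineof_fill(gappy, n_modes=2)
print("remaining gaps:", int(np.isnan(filled).sum()))
```

The sketch keeps observed values untouched and replaces only the gaps, which is the property that lets a filled product stay consistent with the original retrievals where they exist.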
Status: closed
RC1: 'Comment on essd-2024-146', Anonymous Referee #1, 07 Jun 2024
Comments for essd-2024-146
Satellite remote sensing can provide a large amount of data for air pollution research. However, data are often missing because of clouds and other factors, which has hindered the application of satellite data. Filling these gaps is therefore of great significance. This paper merges OMI and GOME-2 NO2 data to produce a global HSTCM-NO2 dataset from 2007 to 2022, which can facilitate scientific research on NO2 pollution. I have only some moderate comments.
1 In the abstract, the authors should report the model performance, such as the cross-validation and external validation results.
2 Table 1. I think the figures in the table are not necessary. Please delete them to make the table more concise.
3 What is the purpose of Lines 108-112? They do not seem relevant to Sections 2.1.1-2.1.3.
4 Section 2.5 should be simplified. There is no need to provide the equations of R2, RMSE, etc. Most people know them.
5 Delete “2.6 Empirical Orthogonal Functions” and change 2.7 to 2.6.
6 The validation method is not clear. I suggest that the authors add a section introducing their validation strategy, including cross-validation and external validation using MAX-DOAS, other satellites (TROPOMI), and reanalysis products.
7 Figure 9: please add the time period covered.
8 Some previous studies have also filled OMI NO2 gaps in individual countries, such as China. Please introduce them in the introduction section if necessary. E.g., Shao et al., 2023, Estimation of daily NO2 with explainable machine learning model in China, 2007–2020; Wu et al., 2023, A robust approach to deriving long-term daily surface NO2 levels across China: Correction to substantial estimation bias in back-extrapolation.
9 HSTCM-NO2 can improve the data to full coverage. This should be mentioned in the abstract. In addition, in “which increases the global spatial coverage of NO2 by ~60% compared to the original OMINO2 data”, the 60% is ambiguous. I believe it refers to absolute coverage, but it could be misread as 60% of the original OMI coverage. Please also revise the relevant statements in the main text.
10 The method of SHAP should be moved to the method section.
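The ambiguity raised in comment 9 can be made concrete with a small worked example. The sketch below is purely illustrative: the starting coverage of 30 % is a hypothetical number chosen for the arithmetic, not a value from the manuscript, and `coverage_after` is a made-up helper name.

```python
def coverage_after(original, increase, mode):
    """Return the coverage fraction after an increase, capped at 1.0.

    mode="absolute": `increase` is added as a fraction of the globe
                     (percentage points).
    mode="relative": `increase` is a fraction of the original coverage.
    """
    if mode == "absolute":
        new = original + increase
    elif mode == "relative":
        new = original * (1.0 + increase)
    else:
        raise ValueError("mode must be 'absolute' or 'relative'")
    return min(new, 1.0)


# Hypothetical starting coverage of 30 % (an assumption for illustration).
original = 0.30
print(round(coverage_after(original, 0.60, "absolute"), 2))  # 60 percentage points added
print(round(coverage_after(original, 0.60, "relative"), 2))  # 60 % more than the original
```

Under this assumed starting point the two readings give 90 % and 48 % final coverage respectively, which is why stating the unit (percentage points or percent of the original) matters.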
Citation: https://doi.org/10.5194/essd-2024-146-RC1
RC2: 'Comment on essd-2024-146', Anonymous Referee #2, 27 Jun 2024
The paper “A Global Daily High Spatial-temporal Coverage Merged Tropospheric NO2 dataset (HSTCM-NO2) from 2007 to 2022 based on OMI and GOME-2”, by Qin and co-authors, presents a satellite NO2 dataset based on machine learning and spatial interpolation. I would recommend publication after a few modifications.
As the study combines satellite data with morning and afternoon overpass time, additional recommendations for data use, such as data assimilation and model comparison, are suggested.
Second, polluted scenes typically draw more attention but perform less well in this work; comments on how to improve the data for such scenes are therefore recommended.
In addition, titles, dates, and/or colorbars in some figures are difficult to recognize. Please enlarge them for maps.
Content of Sect. 2.6 is missing.
Line 93 What are the advantages of machine learning and pattern recognition specifically? How do you compare the methods and results to previous works in terms of consistency and difference?
Line 131 How do you combine the three datasets and deal with their differences in instrument and algorithm?
Line 156 What is the reason to select these 3 stations?
Line 219 was -> is
Line 230 The slope and intercept deserve some discussion, as method 1 shows a reduced performance.
Line 244 1952 -> 1952)
Line 310 .,->.
Line 325 VCD and vertical column concentration are used interchangeably; it would be better to be consistent. Please also be consistent with MAXDOAS or MAX-DOAS, machine learning or machine-learning, etc.
Line 394 Which color shows the results using both XGBoost and DINEOF? What does the red line show in the figure?
Line 420 Define the abbreviation RA first.
Citation: https://doi.org/10.5194/essd-2024-146-RC2
RC3: 'Comment on essd-2024-146', Anonymous Referee #2, 27 Jun 2024
The paper “A Global Daily High Spatial-temporal Coverage Merged Tropospheric NO2 dataset (HSTCM-NO2) from 2007 to 2022 based on OMI and GOME-2”, by Qin and co-authors, presents a satellite NO2 dataset based on machine learning and spatial interpolation. I would recommend publication after a few modifications.
As the study combines satellite data with morning and afternoon overpass time, additional recommendations for data use, such as data assimilation and model comparison, are suggested.
Second, polluted scenes typically draw more attention but perform less well in this work; comments on how to improve the data for such scenes are therefore recommended.
In addition, titles, dates, and/or colorbars in some figures are difficult to recognize. Please enlarge them for maps.
Content of Sect. 2.6 is missing.
Line 93 What are the advantages of machine learning and pattern recognition specifically? How do you compare the methods and results to previous works in terms of consistency and difference?
Line 131 How do you combine the three datasets and deal with their differences in instrument and algorithm?
Line 156 What is the reason to select these 3 stations?
Line 219 was -> is
Line 230 The slope and intercept deserve some discussion, as method 1 shows a reduced performance.
Line 244 1952 -> 1952)
Line 310 .,->.
Line 325 VCD and vertical column concentration are used interchangeably; it would be better to be consistent. Please also be consistent with MAXDOAS or MAX-DOAS, machine learning or machine-learning, etc.
Line 394 Which color shows the results using both XGBoost and DINEOF? What does the red line show in the figure?
Line 420 Define the abbreviation RA first.
Citation: https://doi.org/10.5194/essd-2024-146-RC3
AC1: 'Comment on essd-2024-146', Jason Cohen, 22 Jul 2024
Data sets
A Global Daily High Spatial-temporal Coverage Merged Tropospheric NO2 dataset (HSTCM-NO2) from 2007 to 2022 based on OMI and GOME-2
Kai Qin, Hongrui Gao, Xuancen Liu, Qin He, and Jason Blake Cohen
https://doi.org/10.5281/zenodo.10968462
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
587 | 104 | 30 | 721 | 22 | 22 |