the Creative Commons Attribution 4.0 License.
Recurrent mapping of Hourly Surface Ozone Data (HrSOD) across China during 2005–2020 for ecosystem and human health risk assessment
Abstract. Surface ozone is an important air pollutant that is detrimental to human health and vegetation productivity. Despite its short atmospheric lifetime, surface ozone has increased significantly since the 1970s across the Northern Hemisphere, particularly in China. However, surface ozone concentration data at high temporal resolution are still lacking in China, which largely hinders accurate assessment of the associated environmental and human health impacts. Here, we collected hourly ground ozone observations (over 6 million records), meteorological data, remote sensing products, and socioeconomic information, and applied Long Short-Term Memory (LSTM) recurrent neural networks to map hourly surface ozone data (HrSOD) at 0.1° × 0.1° resolution across China during 2005–2020. Benefiting from its strength in time-series prediction, the LSTM model captured the spatiotemporal dynamics of observed ozone concentrations well: the sample-based, site-based, and by-year cross-validation coefficients of determination (R²) were 0.72, 0.65, and 0.71, and the corresponding root mean square errors (RMSE) were 11.71 ppb (mean = 30.89 ppb), 12.81 ppb (mean = 30.96 ppb), and 11.14 ppb (mean = 31.26 ppb), respectively. Air temperature, atmospheric pressure, and relative humidity were found to be the primary influencing factors. Spatially, surface ozone concentrations were high in northwestern China and low in the Sichuan Basin and northeastern China. Among the four megacity clusters in China, namely the Beijing-Tianjin-Hebei region, the Pearl River Delta, the Yangtze River Delta, and the Sichuan Basin, surface ozone concentrations kept decreasing before 2016. Thereafter, they tended to increase in the first three regions, although an abrupt decrease in surface ozone concentrations occurred in 2020. Overall, HrSOD provides critical information on surface ozone pollution dynamics in China and can support fine-resolution environmental impact and human health risk assessments.
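The three cross-validation schemes named in the abstract differ only in which records are held out together. The sketch below shows one way to build such folds, assuming each record carries a station identifier and a year under the hypothetical keys `site` and `year`. The leave-one-year-out reading of "by-year" and the default of 10 folds are assumptions, not the authors' documented settings.

```python
import random


def make_folds(records, strategy, k=10, seed=0):
    """Split record indices into (train, test) folds for cross-validation.

    records  : list of dicts with hypothetical keys "site" and "year"
    strategy : "sample" - records assigned to folds at random
               "site"   - all records of a station held out together
               "year"   - one calendar year held out per fold (leave-one-year-out)
    """
    if strategy not in ("sample", "site", "year"):
        raise ValueError(f"unknown strategy: {strategy}")
    # Group label of each record: its own index, its station, or its year.
    labels = [i if strategy == "sample" else r[strategy] for i, r in enumerate(records)]
    groups = sorted(set(labels))
    if strategy == "year":
        fold_groups = [[g] for g in groups]
    else:
        random.Random(seed).shuffle(groups)
        fold_groups = [groups[f::k] for f in range(k)]  # deal groups into k folds
    folds = []
    for groups_in_fold in fold_groups:
        held = set(groups_in_fold)
        test = [i for i, g in enumerate(labels) if g in held]
        train = [i for i, g in enumerate(labels) if g not in held]
        folds.append((train, test))
    return folds
```

Because site-based folds keep every record of a station on the same side of the split, they test spatial generalization. This is typically the hardest case, which is consistent with the site-based R² being the lowest of the three reported values.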
The data set is available at https://doi.org/10.5281/zenodo.7415326 (Zhang et al., 2022).
Status: closed
CC1: 'Comment on essd-2022-428', Hui Zhang, 28 Dec 2022
The manuscript titled “Recurrent mapping of Hourly Surface Ozone Data (HrSOD) across China during 2005–2020 for ecosystem and human health risk assessment” by Zhang et al. generates surface ozone data across China. My biggest concern is that all of your observed air quality monitoring station data cover the period 2015–2020. How did you predict surface ozone before 2015? Did you build the LSTM model on data from 2015–2020 and use it to predict results before 2015? If so, how do you assess the uncertainty in the data before 2015?
Citation: https://doi.org/10.5194/essd-2022-428-CC1
CC2: 'Reply on CC1', Wenxiu Zhang, 17 Jan 2023
CC3: 'Comment on essd-2022-428', Ningpeng Dong, 23 Jan 2023
This is a nice paper, and I have a few questions regarding the technical details. 1) Did the authors use only the grid cells with ozone observations for training, and how many such grid cells are there? 2) Why did the authors choose a 24-hour time window for training? Are other time windows possible? 3) The authors carried out a 10-fold cross-validation for hyperparameter optimization, which should be followed by a model performance evaluation on the testing data. I might have missed the information on model testing, but how does the model perform on the testing data?
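The second question concerns the 24-hour input window. The sketch below shows how such windows could be built, assuming a sequence-to-one setup in which the predictors for the preceding 24 hours (including the current hour) are used to estimate ozone at the last hour. `make_windows` and its arguments are hypothetical, and the authors' exact alignment may differ. Other window lengths could be compared simply by changing `window`.

```python
def make_windows(features, target, window=24):
    """Pair each hour t with the predictor rows for hours t-window+1 .. t.

    features : list of per-hour predictor vectors (e.g. temperature, pressure, RH)
    target   : list of per-hour observed ozone (ppb); None where missing
    Returns (X, y): X[n] is a window x n_features nested list, y[n] the ozone
    observed at the last hour of that window.
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    X, y = [], []
    for t in range(window - 1, len(features)):
        if target[t] is None:  # no label for this hour, so no training sample
            continue
        X.append(features[t - window + 1 : t + 1])
        y.append(target[t])
    return X, y
```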
Citation: https://doi.org/10.5194/essd-2022-428-CC3
CC4: 'Reply on CC3', Wenxiu Zhang, 07 Feb 2023
RC1: 'Comment on essd-2022-428', Anonymous Referee #1, 16 Mar 2023
Zhang et al. have attempted to estimate missing hourly data in surface ozone time series using a machine learning method. While I appreciate their efforts, I believe that this study lacks novelty. The authors reconstructed the missing surface ozone data using meteorological reanalysis, a method that has been used many times before in studies of surface ozone and other pollutants. In my opinion, the authors have been hasty both in developing the model and in interpreting the results. They used inputs that are undoubtedly intercorrelated and have not provided a sufficient explanation for the observed results. There are also several other issues that need to be addressed in the manuscript. For example, during model development, the authors used inputs (such as temperature and solar radiation, or surface ozone and ozone in the atmospheric column) that are known to be highly intercorrelated. Furthermore, the authors' explanation of their results appears superficial and pays little attention to the relationship between ozone and its precursors. For instance, it would have been beneficial if the authors had provided concrete reasons for the differences in the seasonal trends of ozone observed in various regions of China, considering both atmospheric and chemical factors, and supported them with solid documentation. Likewise, for other findings, the authors only presented their interpretations without satisfactorily explaining why these results were observed in each region. I find it difficult to understand the connection between particulate matter and the variations in ozone levels discussed in lines 369 to 372. There may be additional shortcomings in the manuscript that I have not been able to identify. Given these issues, I am of the opinion that the article may not be suitable for publication.
Other points:
Line 37: It appears that the model relies on a subsequent input step (line 211) and has not made a prediction, but rather has estimated past ozone data.
Line 107: This part raises a valid concern regarding the need for a model that can spatially estimate ozone levels given the limited coverage of monitoring stations in China. The manuscript's results also suggest a weakness in spatial estimation of surface ozone, which may result in significant uncertainties in the estimated ozone levels in regions with limited monitoring data, such as western China.
Line 157: Column ozone data are mostly less sensitive to surface values and can weaken model performance. You could compare these data with surface ozone observations.
Line 163: It would be useful to compare the meteorological data used in the model with surface meteorological observations to ensure the accuracy and reliability of the model.
Line 174: Are you referring to the total amount of ozone in the atmospheric column? If so, using both surface ozone and atmospheric column ozone simultaneously may lead to collinearity issues. It may be worth considering whether there is a scientific justification for using both types of data in the model.
Line 203: You used the nearest-neighbor method to generate a network of observational data. Did you consider a distance limit for the nearest neighbor? Otherwise, data from distant stations may be assigned to grid cells in western China, where there are few observations.
Line 253: It seems that there is no significant improvement in the model's performance in terms of correlation.
Line 254: Could you clarify whether, when the data were divided by year, 90% of the data were used for model training and 10% for model testing?
Line 287: How is the trend calculated?
Lines 294 to 303: The authors need to provide a more detailed explanation for the observed results in relation to the factors affecting the production and destruction of ozone. Section 4.3 requires a significant revision to address this issue adequately.
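The distance limit raised in the Line 203 comment can be sketched as a nearest-neighbour search that refuses to assign a station beyond a cut-off. In the sketch below, the approximate station coordinates and the `max_km` threshold are hypothetical. A brute-force search is used for clarity, whereas a national 0.1° grid would normally call for a spatial index such as a k-d tree.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(cell, stations, max_km):
    """Return the id of the station closest to a grid-cell centre (lat, lon),
    or None if no station lies within max_km (a hypothetical cut-off)."""
    best_id, best_d = None, float("inf")
    for sid, (lat, lon) in stations.items():
        d = haversine_km(cell[0], cell[1], lat, lon)
        if d < best_d:
            best_id, best_d = sid, d
    return best_id if best_d <= max_km else None


# Approximate coordinates, for illustration only.
stations = {"Beijing": (39.9, 116.4), "Shanghai": (31.2, 121.5)}
print(nearest_station((39.95, 116.35), stations, max_km=50))  # nearby cell
print(nearest_station((36.0, 90.0), stations, max_km=50))     # remote western cell
```

With a cut-off, remote western cells stay unassigned instead of inheriting values from stations thousands of kilometres away.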
Citation: https://doi.org/10.5194/essd-2022-428-RC1
RC2: 'Comment on essd-2022-428', Anonymous Referee #2, 20 Mar 2023
The authors developed a Long Short-Term Memory (LSTM) recurrent neural network model to map hourly surface ozone data (HrSOD) at 0.1° × 0.1° resolution across China during 2005–2020. However, the method shows no significant advantage over those used in previous research. On the whole, the paper cannot be accepted due to a lack of novelty.
Major comments:
1. Section 2.1.2: The choice of relevant variables is unreasonable. First, the purpose of this study is to obtain hourly surface ozone concentrations based on satellite remote sensing data. However, the satellite data used by the authors have a daily temporal resolution, which makes it difficult to meet the hourly resolution requirement. In general, only the meteorological data meet this temporal resolution requirement. In other words, the core of the article seems to be estimating near-surface ozone concentrations from hourly meteorological data. In addition, for ozone column concentrations, although you selected the first layer to represent surface ozone, it is still difficult to distinguish between O3 concentrations at the upper and near-surface levels.
2. For the socioeconomic data, why not use nighttime light data? As you described, the GDP and population data have a temporal resolution of 5 years. However, following the algorithm in the cited reference, they could be replaced by nighttime light data. Meanwhile, emission data, especially for nitrogen oxides and VOCs, should also be considered as necessary inputs for model building.
3. For model validation, the authors only calculated the hourly, daily, and monthly R² and RMSE under three CV sampling strategies. This is not sufficient; a spatiotemporal accuracy evaluation should also be added. First, the spatial distributions of CV-R², RMSE, and MAE at each site should be mapped. Second, a comparison of the hourly and daily variations at some sites should be added. Finally, the O3 trends in the estimated results and in the observations should be compared.
4. Section 4.1: Table 3 compares the model performance with that of previous studies predicting surface ozone in China. However, the predictive performance seems to be worse than that of other models. More importantly, some hourly O3 prediction studies were ignored. In addition, the performance of your model should be compared here with that of other widely applied traditional models for O3 concentration estimation using the same training data.
5. I agree with the other reviewer and also think that the paper cannot be accepted due to a lack of novelty.
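The per-site evaluation requested in major comment 3 (CV-R², RMSE, and MAE at each site) amounts to grouping paired observations and estimates by station and computing each metric within the group. The sketch below assumes rows of `(site_id, observed, estimated)` and defines R² as 1 − SS_res/SS_tot. Some studies instead report the squared Pearson correlation, which can differ when estimates are biased.

```python
import math
from collections import defaultdict


def metrics(obs, pred):
    """R^2 (1 - SS_res/SS_tot), RMSE and MAE for paired observations/estimates."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
    }


def per_site_metrics(rows):
    """rows: iterable of (site_id, observed_ppb, estimated_ppb) tuples."""
    by_site = defaultdict(lambda: ([], []))
    for site, o, p in rows:
        by_site[site][0].append(o)
        by_site[site][1].append(p)
    return {site: metrics(o, p) for site, (o, p) in by_site.items()}
```

Unlike a mean bias, MAE averages absolute errors, so over- and under-estimates cannot offset each other. The per-site values can then be mapped at station locations.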
Minor comments:
1. The O3 estimation accuracy of previous studies should be summarized and described in the first section.
2. Section 2.1.1: If a 0.1° × 0.1° grid cell contains multiple sites, how are they processed? Are they averaged?
3. Section 2.2.2: The mean absolute error (MAE) is also an important evaluation metric, as it avoids the mutual cancellation of errors.
4. Table 1: The temporal resolution of SFO3 should be daily.
5. The fonts in Figures 2 and 3 are too small.
6. The conclusion section needs to be expanded. The current version is too short.
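Minor comment 2 asks how several stations inside one grid cell are handled. One common option, shown here only as a sketch and not as the authors' documented procedure, is to average all valid station values per cell and hour. The grid is assumed to be aligned to multiples of 0.1° from (0°, 0°), and `cell_index` and `grid_average` are hypothetical names.

```python
import math
from collections import defaultdict


def cell_index(lat, lon, res=0.1):
    """(row, col) index of the res-degree grid cell containing (lat, lon).
    Rounding before floor() guards against binary floating-point error,
    since a quotient such as 39.9 / 0.1 can land just below the true value."""
    return (math.floor(round(lat / res, 9)), math.floor(round(lon / res, 9)))


def grid_average(obs, res=0.1):
    """obs: iterable of (lat, lon, hour, ozone_ppb); None marks a missing value.
    Returns {(cell, hour): mean ozone over all stations in that cell}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, hour, o3 in obs:
        if o3 is None:  # skip missing observations rather than treating them as 0
            continue
        key = (cell_index(lat, lon, res), hour)
        sums[key] += o3
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```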
Citation: https://doi.org/10.5194/essd-2022-428-RC2
Data sets
Hourly Surface Ozone data (HrSOD) across China during 2005-2020 Wenxiu Zhang; Di Liu; Hao Shi https://doi.org/10.5281/zenodo.7415326