Recurrent mapping of Hourly Surface Ozone Data (HrSOD) across China during 2005–2020 for ecosystem and human health risk assessment

Zhang, Wenxiu; Liu, Di; Tian, Hanqin; Pan, Naiqin; Yang, Ruqi; Tang, Wenhan; Yang, Jia; Lu, Fei; Dayananda, Buddhi; Mei, Han; Wang, Siyuan; Shi, Hao

doi:https://doi.org/10.5194/essd-2022-428

21 Dec 2022

Status: this discussion paper is a preprint. It has been under review for the journal Earth System Science Data (ESSD). The manuscript was not accepted for further review after discussion.

Abstract. Surface ozone is an important air pollutant detrimental to human health and vegetation productivity. Despite its short atmospheric lifetime, surface ozone has increased significantly since the 1970s across the Northern Hemisphere, particularly in China. However, high-temporal-resolution surface ozone concentration data are still lacking in China, which largely hinders accurate assessment of the associated environmental and human health impacts. Here, we collected hourly ground ozone observations (over 6 million records), meteorological data, remote sensing products, and socioeconomic information, and applied Long Short-Term Memory (LSTM) recurrent neural networks to map hourly surface ozone data (HrSOD) at a 0.1° × 0.1° resolution across China during 2005–2020. Benefiting from its advantage in time-series prediction, the LSTM model captured the spatiotemporal dynamics of observed ozone concentrations well, with sample-based, site-based, and by-year cross-validation coefficient of determination (R²) values of 0.72, 0.65 and 0.71, and root mean square error (RMSE) values of 11.71 ppb (mean = 30.89 ppb), 12.81 ppb (mean = 30.96 ppb) and 11.14 ppb (mean = 31.26 ppb), respectively. Air temperature, atmospheric pressure, and relative humidity were found to be the primary influencing factors. Spatially, surface ozone concentrations were high in northwestern China and low in the Sichuan Basin and northeastern China. Among the four megacity clusters in China, namely the Beijing-Tianjin-Hebei region, the Pearl River Delta, the Yangtze River Delta, and the Sichuan Basin, surface ozone concentrations kept decreasing before 2016. However, they tended to increase thereafter in the first three regions, though an abrupt decrease in surface ozone concentrations occurred in 2020. Overall, the HrSOD provides critical information on surface ozone pollution dynamics in China and can support fine-resolution environmental impact and human health risk assessment.
The data set is available at https://doi.org/10.5281/zenodo.7415326 (Zhang et al., 2022).
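
The abstract reports R² and RMSE under sample-based, site-based, and by-year cross-validation. The sketch below is only an illustration of how those three splitting strategies differ, not the authors' code: the function names, the fold count, and the use of R² = 1 − SS_res/SS_tot (rather than a squared correlation) are assumptions.

```python
import numpy as np

def r2_score(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(obs, pred):
    """Root mean square error, in the units of the observations (e.g. ppb)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def kfold_groups(groups, n_splits, seed=0):
    """Yield (train_idx, test_idx); records sharing a group label stay in one fold.

    Hypothetical helper:
      sample-based CV -> groups = np.arange(n_records)  (every record is its own group)
      site-based CV   -> groups = station identifiers   (whole stations are held out)
      by-year CV      -> groups = calendar years        (whole years are held out)
    """
    groups = np.asarray(groups)
    unique = np.unique(groups)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)
    for held_out in np.array_split(unique, n_splits):
        test_mask = np.isin(groups, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

Holding out whole stations or whole years tests the model on locations or periods it has never seen, which is consistent with the site-based R² (0.65) being lower than the sample-based value (0.72).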

Received: 09 Dec 2022 – Discussion started: 21 Dec 2022

Competing interests: At least one of the (co-)authors is a member of the editorial board of Earth System Science Data.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Download & links

Preprint (PDF, 1992 KB)

Supplement (570 KB)

Status: closed

CC1: 'Comment on essd-2022-428', Hui Zhang, 28 Dec 2022

The manuscript titled “Recurrent mapping of Hourly Surface Ozone Data (HrSOD) across China during 2005–2020 for ecosystem and human health risk assessment” by Zhang et al. generates surface ozone data across China. My biggest concern is that all of your air quality monitoring station data cover the period 2015–2020, so how did you predict surface ozone before 2015? Did you build the LSTM model on data from 2015–2020 and use it to predict results before 2015? If so, how do you assess the uncertainty in the data before 2015?

Citation: https://doi.org/10.5194/essd-2022-428-CC1
- CC2: 'Reply on CC1', Wenxiu Zhang, 17 Jan 2023
  
  We thank the reviewers for their helpful comments. Our responses to each reviewer are provided in the pdf attached.
  
  Citation: https://doi.org/10.5194/essd-2022-428-CC2
CC3: 'Comment on essd-2022-428', Ningpeng Dong, 23 Jan 2023

This is a nice paper, and I have a few questions regarding the technical details. 1) Did the authors use only those grid cells with ozone observations for training, and how many such grid cells are there? 2) Why did the authors choose a time window of 24 hours for training? Are other time windows possible? 3) The authors carried out a 10-fold cross-validation for hyperparameter optimization, which should be followed by a model performance evaluation with the testing data. I might have missed the information on model testing, but how does the model perform with the testing data?
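
To make the "time window of 24 hours" in question 2 concrete, here is a minimal sketch of how hourly predictor series are commonly cut into fixed-length sequences for a recurrent model. It is an assumption about the general technique, not a description of the manuscript's pipeline; `make_windows` and its alignment (the target is the ozone value at the last hour of each window) are hypothetical.

```python
import numpy as np

def make_windows(features, target, window=24):
    """Cut an hourly series into overlapping sequences of `window` time steps.

    features: array of shape (n_hours, n_features), hourly predictors
    target:   array of shape (n_hours,), hourly ozone at the same times
    Returns X of shape (n_samples, window, n_features) and y of shape
    (n_samples,), where y[i] is the target at the final hour of X[i].
    """
    features = np.asarray(features, float)
    target = np.asarray(target, float)
    n_hours = features.shape[0]
    if n_hours < window:
        # Not enough data for even one full window.
        return np.empty((0, window, features.shape[1])), np.empty((0,))
    starts = np.arange(n_hours - window + 1)
    X = np.stack([features[s:s + window] for s in starts])
    y = target[starts + window - 1]
    return X, y
```

Changing `window` to 12 or 48 would give the alternative window lengths the comment asks about; the trade-off is between how much history the model sees and how many complete sequences can be formed around gaps.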

Citation: https://doi.org/10.5194/essd-2022-428-CC3
- CC4: 'Reply on CC3', Wenxiu Zhang, 07 Feb 2023
  
  We thank the reviewer for the positive and constructive comments. Our responses are provided in the pdf attached.
  
  Citation: https://doi.org/10.5194/essd-2022-428-CC4
RC1: 'Comment on essd-2022-428', Anonymous Referee #1, 16 Mar 2023

Zhang et al. have made an attempt to estimate the missing hourly data of surface ozone time series using a machine learning method. While appreciating their efforts, I believe that this study lacks novelty. The authors have reconstructed the missing surface ozone data using meteorological reanalysis, which is a method that has been used many times before in the study of surface ozone and other pollutants. In my opinion, the authors have been hasty both during the development of the model and in the interpretation of the results. They have used inputs that are undoubtedly intercorrelated and have not provided a sufficient explanation for the observed results. Additionally, there are several other issues that need to be addressed in the manuscript. For example, during the model development phase, the authors utilized inputs (such as temperature and solar radiation, or surface ozone and ozone in the atmospheric column) that are known to be highly intercorrelated. Furthermore, the authors' explanation of their results appears to be superficial and lacks attention to the relationship between ozone and its precursors. For instance, it would have been beneficial if the authors had provided concrete reasons for the differences in the seasonal trends of ozone observed in various regions of China, considering both atmospheric and chemical factors, and supported them with solid documentation. Additionally, in the case of other findings, the authors only presented their interpretations without providing a satisfactory explanation of why these results were observed in each region. I find it difficult to comprehend the connection between particulate matter and the variations in ozone levels discussed in lines 369 to 372. Unfortunately, there may be additional shortcomings in the manuscript that I have not been able to identify. Given these issues, I am of the opinion that the article may not be suitable for publication.
Other points:
Line 37: It appears that the model relies on a subsequent input step (line 211) and has not made a prediction, but rather has estimated past ozone data.
Line 107: This part raises a valid concern regarding the need for a model that can spatially estimate ozone levels given the limited coverage of monitoring stations in China. The manuscript's results also suggest a weakness in spatial estimation of surface ozone, which may result in significant uncertainties in the estimated ozone levels in regions with limited monitoring data, such as western China.
Line 157: Column ozone data are mostly less sensitive to surface values and can weaken model performance. You could compare this data with observations of surface ozone.
Line 163: It would be useful to compare the meteorological data used in the model with surface meteorological observations to ensure the accuracy and reliability of the model.
Line 174: Are you referring to the total amount of ozone in the atmospheric column? If so, using both surface ozone and atmospheric column ozone simultaneously may lead to issues with collinearity. It may be worth considering whether there is a scientific justification for using both types of data in the model.
Line 203: You have used the nearest neighbor method to generate a network of observational data. Did you consider a distance limit for the nearest neighbor? Otherwise, the data with large spatial distance are assigned to grids located in western China where there are few observations.
Line 253: It seems that there is no significant improvement in the model's performance in terms of correlation.
Line 254: Could you clarify whether, when the data are divided by year, 90% of the data was used for model training and 10% for model testing?
Line 287: How is the trend calculated?
Lines 294 to 303: The authors need to provide a more detailed explanation for the observed results in relation to the factors affecting the production and destruction of ozone. Section 4.3 requires a significant revision to address this issue adequately.
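
The distance-limit question raised for Line 203 can be illustrated with a short sketch. It assumes, hypothetically, that each grid cell takes the value of its nearest station unless that station lies farther than a chosen cutoff; the function names, the haversine distance, and any cutoff value are illustrative choices, not details taken from the manuscript.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def nearest_station(grid_lat, grid_lon, st_lat, st_lon, max_km):
    """For each grid cell, return the nearest station index and its distance.

    The index is -1 when the nearest station is farther than max_km, so the
    cell is left unassigned instead of inheriting a distant observation.
    """
    grid_lat = np.asarray(grid_lat, float)[:, None]
    grid_lon = np.asarray(grid_lon, float)[:, None]
    st_lat = np.asarray(st_lat, float)[None, :]
    st_lon = np.asarray(st_lon, float)[None, :]
    dist = haversine_km(grid_lat, grid_lon, st_lat, st_lon)  # (n_cells, n_stations)
    idx = dist.argmin(axis=1)
    nearest = dist[np.arange(dist.shape[0]), idx]
    return np.where(nearest <= max_km, idx, -1), nearest
```

Without the cutoff, a cell in sparsely monitored western China would be matched to a station hundreds or thousands of kilometres away, which is the situation the comment warns about.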

Citation: https://doi.org/10.5194/essd-2022-428-RC1
RC2: 'Comment on essd-2022-428', Anonymous Referee #2, 20 Mar 2023

The authors developed a Long Short-Term Memory (LSTM) recurrent neural network model to map hourly surface ozone data (HrSOD) at a 0.1° × 0.1° resolution across China during 2005–2020. However, this study shows no significant methodological advantage over the methods used in previous research. On the whole, the paper cannot be accepted due to lack of novelty.

Major comments:
1. In Section 2.1.2, the choice of relevant variables is unreasonable. Firstly, the purpose of this study is to obtain hourly surface ozone concentrations based on satellite remote sensing data. However, the temporal resolution of the satellite data used by the authors is daily, which makes it difficult to meet the hourly resolution requirement. In general, only the meteorological data can meet the temporal resolution needed. In other words, the core of the article seems to be estimating near-surface ozone concentrations from hourly meteorological data. In addition, for ozone column concentrations, although you select the first layer to represent surface ozone, it is still difficult to distinguish the O₃ concentration between the upper and near-surface levels.
2. For the socioeconomic data, why not use night-light data? As you described, the temporal resolution of the GDP and population data is 5 years. However, following the algorithm in the reference, these can be replaced by night-light data. Meanwhile, emission data, especially for nitrogen oxides and VOCs, should also be considered a necessary input in model building.
3. For model validation, the authors only calculated the hourly, daily and monthly R² and RMSE under three CV sampling strategies. This is not sufficient; a spatiotemporal accuracy evaluation should also be added. Firstly, the spatial distributions of CV-R², RMSE and MAE for each site should be drawn. Secondly, a comparison of the hourly and daily variations at some sites should be added. Then, the trends in O₃ between the estimated results and the observations should be compared.
4. In Section 4.1, Table 3 compares the model performance with that of previous studies predicting surface ozone in China. However, the predictive performance seems to be worse than that of other models. More importantly, some hourly O₃ prediction studies were ignored. In addition, a comparison between your model and other widely applied traditional models for O₃ concentration estimation, using the same training data, should also be described here.
5. I agree with the other reviewer, and I also think the paper cannot be accepted due to lack of novelty.
Minor comments:
1. The O₃ estimation accuracy of previous studies should be summarized and described in the first section.
2. Section 2.1.1. If a 0.1° × 0.1° grid cell contains multiple sites (>1), how has this been handled? Was the average taken?
3. In Section 2.2.2, the mean absolute error (MAE) is also an important evaluation indicator, which can avoid the mutual cancellation of errors.
4. Table 1. The temporal resolution of SFO3 should be daily.
5. The fonts in Figures 2 and 3 are too small.
6. The conclusion section needs to be expanded. The current version is too short.
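
Minor comment 3 on MAE can be illustrated with a small sketch. The numbers below are synthetic values chosen for the example, not results from the manuscript. The contrast shown is with the signed mean bias, where positive and negative errors cancel; RMSE, like MAE, does not cancel but weights large errors more heavily.

```python
import numpy as np

def mean_bias(obs, pred):
    """Signed mean error: over- and under-estimates cancel each other."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(pred - obs))

def mae(obs, pred):
    """Mean absolute error: every miss counts, regardless of its sign."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - obs)))

# Synthetic example (ppb): two over-estimates and two under-estimates.
obs = [30.0, 30.0, 30.0, 30.0]
pred = [20.0, 40.0, 25.0, 35.0]
print(mean_bias(obs, pred))  # 0.0, the errors cancel
print(mae(obs, pred))        # 7.5, the average size of the misses
```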

Citation: https://doi.org/10.5194/essd-2022-428-RC2

Supplement

https://doi.org/10.5194/essd-2022-428-supplement

Data sets

Hourly Surface Ozone data (HrSOD) across China during 2005-2020, Wenxiu Zhang, Di Liu, and Hao Shi, https://doi.org/10.5281/zenodo.7415326

Viewed

Total article views: 1,709 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,207	428	74	1,709	128	71	80

Views and downloads (calculated since 21 Dec 2022)

Month	HTML	PDF	XML	Total
Dec 2022	165	40	7	212
Jan 2023	149	47	5	201
Feb 2023	85	34	2	121
Mar 2023	98	49	5	152
Apr 2023	49	22	3	74
May 2023	27	6	0	33
Jun 2023	29	8	2	39
Jul 2023	20	19	0	39
Aug 2023	24	14	0	38
Sep 2023	38	21	1	60
Oct 2023	23	11	1	35
Nov 2023	11	4	1	16
Dec 2023	30	11	1	42
Jan 2024	21	7	0	28
Feb 2024	18	4	1	23
Mar 2024	35	16	6	57
Apr 2024	36	5	5	46
May 2024	45	6	5	56
Jun 2024	60	4	3	67
Jul 2024	16	8	6	30
Aug 2024	26	4	4	34
Sep 2024	20	5	0	25
Oct 2024	17	3	1	21
Nov 2024	18	5	1	24
Dec 2024	16	3	0	19
Jan 2025	17	11	6	34
Feb 2025	18	4	2	24
Mar 2025	25	9	2	36
Apr 2025	20	11	2	33
May 2025	21	11	2	34
Jun 2025	30	26	0	56

Viewed (geographical distribution)

Total article views: 1,636 (including HTML, PDF, and XML). Of these, 1,636 have a defined geography and 0 are of unknown origin.

Latest update: 30 Jun 2025

Short summary

High temporal resolution surface ozone concentration data is still lacking in China, so we used deep learning to generate hourly surface ozone data (HrSOD) during 2005–2020 across China. HrSOD showed that surface O₃ in China tended to increase from 2016 to 2019, despite a decrease in 2020. HrSOD had high spatial and temporal accuracies, long time ranges and high temporal resolution, enabling it to be easily converted to various evaluation indicators for ecosystem and human health assessments.

