the Creative Commons Attribution 4.0 License.
A 10 km daily-level ultraviolet radiation predicting dataset based on machine learning models in China from 2005 to 2020
Abstract. Ultraviolet (UV) radiation is closely related to health, but limited measurements have hindered further investigation of its health effects in China. Machine learning algorithms have been widely used to predict environmental factors with high accuracy, but few studies have applied them to UV radiation. This study aimed to develop a UV radiation prediction model based on the random forest method and to predict UV radiation at the daily level and 10 km resolution in mainland China from 2005 to 2020. The random forest model integrated ground UV radiation measurements from monitoring stations with multiple predictors, such as satellite-based UV radiation data. Missing satellite-based UV radiation data were filled using a three-day moving average method. The model's performance was evaluated through multiple cross-validation (CV) methods. The overall R² (root mean square error, RMSE) between measured and predicted UV radiation at the daily level was 0.97 (15.64 W m⁻²) for model development and 0.83 (37.44 W m⁻²) for 10-fold CV. The model with OMI EDD achieved higher prediction accuracy than the one without it. Based on daily predictions of UV radiation at 10 km spatial resolution with nearly 100 % spatiotemporal coverage, we found that UV radiation increased by 4.20 %, while PM2.5 levels decreased by 48.51 % and O3 levels rose by 22.70 % from 2013 to 2020, suggesting a potential correlation among these environmental factors. The uneven spatial distribution of UV radiation was found to be associated with factors such as latitude, elevation, meteorological factors and seasons. The eastern areas of China posed a higher risk, with both high population density and high UV radiation intensity.
Based on a machine learning algorithm, this study generated a gridded UV radiation dataset characterized by relatively high precision and extensive spatiotemporal coverage, which demonstrates the spatiotemporal variability of UV radiation levels in China and can facilitate health-related research in the future. This dataset is currently freely available at https://doi.org/10.5281/zenodo.10884591 (Jiang et al., 2024).
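The abstract reports model skill as R² and RMSE between measured and predicted daily UV radiation. As a minimal sketch of how such metrics are commonly computed, the Python below assumes R² means the coefficient of determination (the abstract does not say whether it is this or a squared correlation). The function names and the sample values are illustrative only and are not taken from the manuscript.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the same units as the inputs (e.g. W m-2)."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Made-up daily values, purely for illustration.
measured = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(f"RMSE = {rmse(measured, predicted):.2f}")
print(f"R2   = {r2(measured, predicted):.3f}")
```

The same two functions would be applied once to the in-sample fit and once to the held-out CV predictions, which is why the abstract lists two R² (RMSE) pairs.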
Status: final response (author comments only)
RC1: 'Comment on essd-2024-111', Anonymous Referee #1, 27 Jun 2024
This is a good paper that provides a valuable UV dataset for epidemiological studies. The authors performed very strict validations using spatial, temporal, and by-year cross-validation methods, indicating the high accuracy of their reconstructed UV dataset. I believe this dataset is valuable for environmental health studies of UV in China. I have some comments for the authors to improve the manuscript.
1. It is not clear why the number of missing values in the OMI EDD data has increased greatly since 2008. Please explain this.
2. The method used to fill the missing OMI EDD values is not clear. Specifically, what is the three-day moving average method? Is this method accurate enough to fill the missing values? If values are missing on many consecutive days, how is this addressed?
3. The method for comparing the long-term trends of UV radiation and air pollution should be described in an independent section. It should not be included in Section 2.1.4, Other predictor variables.
4. More analyses of the relationship between PM2.5/O3 and UV should be conducted. Although the authors report the importance of the predictor variables, which shows that AOD and O3 are important, they should also perform a SHAP analysis to show the direction of the impact of AOD/O3 on UV. This could further demonstrate the impacts of PM2.5/O3 on the UV increase.
5. Table 1 is not necessary in the main text. I recommend combining Table 1 into Table A1.
Citation: https://doi.org/10.5194/essd-2024-111-RC1
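Referee #1's second comment asks what the three-day moving average is and what happens over long runs of missing days. This page does not give the manuscript's exact definition, so the following Python is only a sketch under one assumed reading: a missing day is replaced by the mean of whatever observed values fall in the window centred on it (the day before and the day after). The function name `fill_three_day` and the sample series are hypothetical.

```python
from typing import List, Optional


def fill_three_day(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps with the mean of observed values in a centred 3-day window.

    Assumption: only originally observed values are used, so a filled value
    never feeds into another fill. Days with no observed neighbour stay None.
    """
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        window = values[max(0, i - 1): i + 2]  # day-1, day, day+1
        observed = [x for x in window if x is not None]
        if observed:
            filled[i] = sum(observed) / len(observed)
    return filled


# Made-up series with one isolated gap and one three-day gap.
series = [10.0, None, 14.0, None, None, None, 20.0]
result = fill_three_day(series)
print(result)
```

Under this reading the isolated gap is filled from both neighbours, the edges of the three-day gap are filled from a single neighbour, and the middle day of that gap stays missing, which is the situation the referee's question about consecutive missing days points at.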
RC2: 'Comment on essd-2024-111', Anonymous Referee #2, 18 Jul 2024
The study developed a machine learning model to predict UV radiation and highlighted the model performance. This research topic is very important given the rise in UV radiation in recent years.
Overall comments: Technically the manuscript seems strong; however, the writing can be improved.
I highly suggest that the authors go through the language and make changes wherever necessary throughout the manuscript.
Comments:
Line 14: Seems grammatically incorrect. Reword it to "but limited studies have implemented it for UV radiation".
Line 14-15: The language can be improved. Reword these lines to "The main aim of this study is to develop UV radiation prediction model using the random forest approach and predict the UV radiation at daily and 10km resolution in mainland China from 2005 to 2020".
Line 16: It is already mentioned above that a random forest model was employed to predict UV radiation. Reword this line.
Line 21: OMI EDD is used here for the first time; write out the full form of EDD before introducing the acronym.
Line 26: Consider rewording this line, as it does not flow well. Maybe change it to something like this: "Using machine learning this study generated gridded UV radiation dataset with extensive spatiotemporal coverage which can be utilized for future health-related research".
Line 35 - 36: Please consider rewording these lines.
Line 43: Remove "stands" and change it to "despite being".
Line 70: Remove "What's more, missingness of satellite-based" and change it to "The missing satellite-based".
Line 108: Why did the authors use O3 concentrations predicted by a random forest model rather than the monitoring data directly? Clarify this and explain it clearly in the text.
Line 139: Provide a reference for 10-fold cross-validation if it was used in previous studies, and explain the details of the cross-validation process and the differences between the various (temporal, spatial and by-year) 10-fold CV methods.
Line 123: Why did the authors use random forest rather than other machine learning algorithms? Include the necessary information to support this choice.
Line 218: Fix the typo. It is Figure 5 not 3.
Line 269: reword the line to "there is no atmospheric UV standards"
Citation: https://doi.org/10.5194/essd-2024-111-RC2
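Referee #2's comment on Line 139 asks how the spatial, temporal and by-year CV variants differ. The manuscript's definitions are not reproduced on this page, so the Python below is only a sketch of one common convention: each variant holds out whole groups rather than random rows, with stations as the groups for spatial CV and calendar years as the groups for by-year CV. The record layout, the helper `group_folds`, and the round-robin fold assignment (instead of a random shuffle) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Record = Tuple[str, str]  # (station_id, ISO date "YYYY-MM-DD"), hypothetical layout


def group_folds(
    records: List[Record],
    key: Callable[[Record], Hashable],
    n_folds: int,
) -> Dict[int, List[int]]:
    """Assign record indices to folds so that each group lands in exactly one fold."""
    groups = sorted({key(r) for r in records})
    # Deterministic round-robin assignment; a real study would likely shuffle.
    fold_of_group = {g: i % n_folds for i, g in enumerate(groups)}
    folds: Dict[int, List[int]] = defaultdict(list)
    for idx, record in enumerate(records):
        folds[fold_of_group[key(record)]].append(idx)
    return dict(folds)


records = [
    ("A", "2005-01-01"),
    ("A", "2006-01-01"),
    ("B", "2005-01-01"),
    ("B", "2006-01-01"),
]

# Spatial CV: all days from a held-out station are predicted without that station.
spatial = group_folds(records, key=lambda r: r[0], n_folds=2)
# By-year CV: all stations in a held-out year are predicted without that year.
by_year = group_folds(records, key=lambda r: r[1][:4], n_folds=2)

print(spatial)
print(by_year)
```

A temporal variant could reuse the same helper with a date-based key. In each case the held-out fold tests a different kind of extrapolation: to unmonitored locations, to unobserved periods, or to entire years.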
Data sets
A database of 10 km Ultraviolet Radiation Product over mainland China: 2005-2020 Yichen Jiang, Su Shi, Xinyue Li, Chang Xu, Haidong Kan, Bo Hu, and Xia Meng https://doi.org/10.5281/zenodo.10884590