GPRChinaTemp1km: a high-resolution monthly air temperature dataset for China (1951–2020) based on machine learning
- 1Academy of Disaster Reduction and Emergency Management, Beijing Normal University, 100875 Beijing, China
- 2Faculty of Geographical Science, Beijing Normal University, 100875 Beijing, China
- 3The School of National Safety and Emergency Management, Beijing Normal University, 100875 Beijing, China
Abstract. An accurate spatially continuous air temperature dataset is crucial for multiple applications in environmental and ecological sciences. Existing spatial interpolation methods have relatively low accuracy and the resolution of available long-term gridded products of air temperature for China is coarse. Point observations from meteorological stations can provide long-term air temperature data series but cannot represent spatially continuous information. Here, we devised a method for spatial interpolation of air temperature data from meteorological stations based on powerful machine learning tools. First, to determine the optimal method for interpolation of air temperature data, we employed three machine learning models: random forest, support vector machine, and Gaussian process regression. Comparison of the mean absolute error, root mean square error, coefficient of determination, and residuals revealed that Gaussian process regression had high accuracy and clearly outperformed the other two models regarding interpolation of monthly maximum, minimum, and mean air temperatures. The machine learning methods were compared with three traditional methods used frequently for spatial interpolation: inverse distance weighting, ordinary kriging, and ANUSPLIN. Results showed that the Gaussian process regression model had higher accuracy and greater robustness than the traditional methods regarding interpolation of monthly maximum, minimum, and mean air temperatures in each month. Comparison with the TerraClimate, FLDAS, and ERA5 datasets revealed that the accuracy of the temperature data generated using the Gaussian process regression model was higher. Finally, using the Gaussian process regression method, we produced a long-term (January 1951 to December 2020) gridded monthly air temperature dataset with 1 km resolution and high accuracy for China, which we named GPRChinaTemp1km. 
The dataset consists of three variables: monthly mean air temperature, monthly maximum air temperature, and monthly minimum air temperature. The obtained GPRChinaTemp1km data were used to analyse the spatiotemporal variations of air temperature using Theil–Sen median trend analysis in combination with the Mann–Kendall test. It was found that the monthly mean and minimum air temperatures across China were characterized by a significant trend of increase in each month, whereas monthly maximum air temperature showed a more spatially heterogeneous pattern with significant increase, non-significant increase, and non-significant decrease. The GPRChinaTemp1km dataset is publicly available at https://doi.org/10.5281/zenodo.5112122 (He et al., 2021a) for monthly maximum air temperature, at https://doi.org/10.5281/zenodo.5111989 (He et al., 2021b) for monthly mean air temperature and at https://doi.org/10.5281/zenodo.5112232 (He et al., 2021c) for monthly minimum air temperature.
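As an illustration of the trend analysis named in the abstract, the following is a minimal sketch of the Theil–Sen median slope and the Mann–Kendall test for a single time series (for example, the 70 January values of one grid cell). It is not the authors' code: the function names are hypothetical, the series is synthetic, and the test uses the normal approximation without a tie correction, which is an assumption that suits continuous data such as temperature.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(values):
    """Median of all pairwise slopes (y_j - y_i) / (j - i) for i < j."""
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    return median(slopes)


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test.

    Assumption: normal approximation, no correction for tied values.
    """
    n = len(values)
    # S counts concordant minus discordant pairs.
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p


# Synthetic example (not real data): 70 yearly values warming by 0.02 per year.
series = [10.0 + 0.02 * k for k in range(70)]
slope = theil_sen_slope(series)
s_stat, z_stat, p_value = mann_kendall(series)
significant_increase = slope > 0 and p_value < 0.05
```

Applied per grid cell and per calendar month, the sign of the slope together with the p-value gives classes of the kind listed in the abstract (significant increase, non-significant increase, non-significant decrease).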
Qian He et al.
Status: closed
-
CC1: 'Comment on essd-2021-442', QiangYu Lee, 10 Dec 2021
Thank you for your work; you have provided valuable long-term monitoring data. I have a few questions:
1. Line 190: how was the machine learning dataset built? Specifically, what are the independent and dependent variables of the model? As I understand it, you extracted the independent variables (i.e. latitude, longitude and elevation) at the locations of the meteorological stations in the training set and, after training the model, applied it across the whole study area. Is a machine learning model trained for every month, or does one model cover the entire study period?
2. Could you open-source the core machine learning code, specifically the model building and training section, so as to deepen our understanding of the method in the article?
AC1: 'Reply on CC1', Ming Wang, 11 Dec 2021
Response 1: Thank you for the question. The independent variables are latitude, longitude and elevation; the dependent variable is the air temperature (maximum, mean or minimum temperature). The machine learning models were trained for each month. We have explained the model training in the paper (see Preprint Lines 191-194): “We extracted the independent variables (i.e., latitude, longitude, and elevation) relating to the meteorological stations and randomly divided the processed data into a set for model training (70%) and a set for model evaluation and validation (30%). When train the models, the 10-fold cross-validation was used. We constructed a model for each month separately which means we have 840 models for the 840 months from 1951 to 2020. All algorithms were implemented in MATLAB R2020b.”
Response 2: Thanks for your suggestion. We agree that open-source code can help readers understand the work. However, due to the requirements of the foundation, we may make the code available after the work has been accepted.
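For readers following this exchange, the setup described in Response 1 (latitude, longitude and elevation as predictors, temperature as the target, a random 70/30 split, one model per month) can be sketched as follows. This is not the authors' MATLAB implementation: it is a hand-written Gaussian process regression in Python with NumPy, the station records are synthetic, and the kernel choice, the fixed hyperparameters (`length_scale`, `variance`, `noise`) and all function names are assumptions. Hyperparameter optimisation and the 10-fold cross-validation are omitted for brevity.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between two sets of points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)


def gpr_fit_predict(x_train, y_train, x_test,
                    length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean of a zero-mean GP with fixed (assumed) hyperparameters."""
    # Standardise predictors and target using training statistics only.
    mu, sd = x_train.mean(axis=0), x_train.std(axis=0)
    xs, xt = (x_train - mu) / sd, (x_test - mu) / sd
    y_mu, y_sd = y_train.mean(), y_train.std()
    ys = (y_train - y_mu) / y_sd

    # Solve (K + noise * I) alpha = y via a Cholesky factorisation.
    k = rbf_kernel(xs, xs, length_scale, variance) + noise * np.eye(len(xs))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, ys))
    mean = rbf_kernel(xt, xs, length_scale, variance) @ alpha
    return mean * y_sd + y_mu


# Synthetic stand-in for one month of station records (not real data).
rng = np.random.default_rng(0)
n = 300
lat = rng.uniform(18, 53, n)
lon = rng.uniform(74, 134, n)
elev = rng.uniform(0, 5000, n)
temp = 30.0 - 0.5 * (lat - 18) - 0.0065 * elev + rng.normal(0, 0.5, n)

x = np.column_stack([lat, lon, elev])
order = rng.permutation(n)
n_train = n * 7 // 10               # 70 % training, 30 % evaluation
train, test = order[:n_train], order[n_train:]

pred = gpr_fit_predict(x[train], temp[train], x[test])
mae = np.mean(np.abs(pred - temp[test]))
rmse = np.sqrt(np.mean((pred - temp[test]) ** 2))
```

Under the per-month design quoted above, this fit would be repeated once for each of the 840 months, and the trained model would then be evaluated at the latitude, longitude and elevation of every 1 km grid cell.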
-
RC1: 'Comment on essd-2021-442', Anonymous Referee #1, 19 Jan 2022
The manuscript aims to produce a long-term dataset of monthly 2 m temperature over China at high spatial resolution on a 1 × 1 km grid. While the objective is appealing given the challenges posed by the complex topography and the irregular data availability in the target region, the applied methods have significant issues. The major issues are listed below:
[1] The introduction discusses advantages and disadvantages of different information sources for the targeted dataset. While strong arguments for point-wise observational data are presented, long-term reanalysis data products are not considered, even though they provide consistent and spatio-temporally coherent information on the atmospheric state. It is unclear why such data are not used to provide predictor variables.
[2] The method of data splitting leads to strong autocorrelation between the training and test datasets. Because of the spatial proximity of stations in the two datasets, a fundamental requirement is violated, namely the independence (or at least a minimization of dependence) between the training and test datasets. This is especially true for the stations located in the flat eastern parts of China with a dense observational network. Thus, the statistical models are prone to learning nearest-neighbour relations rather than real abstractions from the features; see, e.g., Kleinert et al., 2021 for a more detailed discussion of the requirement to split the test and training data temporally when stations are located close to each other.
[3] Only static features are used as predictors, which implies that a model must be trained for each month (!) of the period under consideration. Thus, dynamic information on the atmospheric state can be deduced exclusively from the optimization procedure on the predictand. It is strongly recommended to introduce dynamical data as predictor variables instead. Besides, the chosen predictors have periods with negligible correlation with the target quantity, and important features such as the ambient topography (is the meteorological station located in a valley?) are absent (see, e.g., Sha et al., 2020).
[4] The evaluation does not serve the objectives of the study. The stations in the test dataset are dominated by stations over flat terrain with a dense observational network. Thus, potential deficiencies in capturing the variations due to the underlying complex topography are hidden. Indeed, Figure 5 indicates that residuals are considerably larger over the mountainous region.
[5] Several issues are present in the follow-up study, such as (a) a focus on large-scale temperature patterns instead of fine-scale patterns in Section 4.2 to justify the high spatial resolution of the dataset, (b) the interpretation of patterns in the Xinjiang region that look like artefacts (bulls-eye pattern in winter months), and (c) the failure to mention the better performance of the reference method ANUSPLIN for July months in the 1970s, 1980s and 1990s.
[6] The comparison with the competing datasets ERA5 and FLDAS is misleading because of the much coarser spatial resolution of these two datasets. A fair comparison would use datasets with similar spatial resolution, such as the dataset described in Peng et al., 2019.
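To make the concern in issue [2] concrete, the sketch below shows one common way of reducing spatial dependence between training and test stations: holding out whole latitude/longitude blocks rather than individual stations. This is an illustration only. It is neither the temporal split discussed in the cited reference nor the procedure used in the manuscript, and the function name, block size and station tuples are hypothetical.

```python
import random
from collections import defaultdict


def spatial_block_split(stations, block_deg=2.0, test_fraction=0.3, seed=0):
    """Split stations so that each lat/lon block is entirely train or test.

    stations: list of (station_id, lat, lon) tuples.
    Returns (train_ids, test_ids, blocks).
    """
    # Group stations into square blocks of block_deg degrees.
    blocks = defaultdict(list)
    for station_id, lat, lon in stations:
        key = (int(lat // block_deg), int(lon // block_deg))
        blocks[key].append(station_id)

    # Shuffle the blocks reproducibly, then assign whole blocks.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    target = test_fraction * len(stations)
    train, test = [], []
    for key in keys:
        # Fill the test set block by block until it reaches the target size.
        (test if len(test) < target else train).extend(blocks[key])
    return train, test, blocks


# Synthetic station layout (not real data): a regular 10 x 10 grid.
stations = [(i, 20 + (i % 10) * 0.7, 100 + (i // 10) * 0.9)
            for i in range(100)]
train_ids, test_ids, blocks = spatial_block_split(stations)
```

Because no block contributes stations to both sets, a test station's nearest neighbours within its own block are never in the training set, which makes the evaluation closer to prediction in unsampled areas.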
Further minor issues are:
* Splitting into three distinct datasets is unnecessary. Rather merge it to one dataset with one DOI.
* Refer to statistical and dynamical downscaling techniques in the introduction.
* Provide references to the problems related to remote sensing data (see l.66).
* Describe the remapping of the SRTM DEM data onto the 1 × 1 km grid (should be an averaging method).
* The software tool used, MATLAB, should be mentioned only once rather than three times. More details on the respective ML techniques would be appreciated.
* l.149: Should be 'ensemble machine learning'
* l.219: Unnecessary reference to Equation 4, which directly follows the sentence.
* l.343f.: This sentence is barely comprehensible.
Mentioned literature references:
- Kleinert, Felix, Martin G. Schultz, and Lukas H. Leufen. "IntelliO3-ts v1.0: A neural network approach to predict near-surface ozone concentrations in Germany." Geoscientific Model Development Discussions 2020. FZJ-2020-05012 (2020): 1-69.
- Sha, Yingkai, et al. "Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part I: Daily maximum and minimum 2-m temperature." Journal of Applied Meteorology and Climatology 59.12 (2020): 2057-2073.
- Peng, Shouzhang, et al. "1 km monthly temperature and precipitation dataset for China from 1901 to 2017." Earth System Science Data 11.4 (2019): 1931-1946.
-
AC2: 'Reply on RC1', Ming Wang, 05 Feb 2022
Dear Reviewer,
Thank you very much for the constructive and insightful comments on our paper, which have been immensely helpful. We have responded to every question, indicating exactly how we addressed each concern.
The point-by-point responses to the comments are attached as a supplement in PDF format.
Thank you again for your review and the insightful comments.
Sincerely,
Qian He on behalf of all Co-Authors
-
RC2: 'Comment on essd-2021-442', Anonymous Referee #2, 21 Mar 2022
ESSD paper review:
This study describes a new 1 km dataset of monthly-mean, monthly-maximum and monthly-minimum surface temperatures over China, developed using machine learning methods. The method used for the final dataset was chosen as the best-performing method after a comparison of three modern techniques. A dataset of 613 weather stations over China was used to train and test the machine learning methods. This study is very clearly written and the figures are of high quality. I agree with all of reviewer 1’s comments, so will not repeat these points and assume they have been addressed within the manuscript, but I will add a few further comments below.
General comments:
- There is always a tradeoff between spatial and temporal resolution when designing new data products. Can you explain in a few sentences in the manuscript why you chose to create a product with such high spatial resolution but such low temporal resolution? You make comparisons at the end to the ERA5 dataset which does have much lower spatial resolution (~30 x less) but it has hourly temporal resolution (720 x more) which is very useful for a number of applications. Comments suggesting the applications where you think this dataset may be preferable to the others mentioned would also be useful.
- You mention ERA5 is only available from 1979, but it is now available back to 1950, so could be used to incorporate dynamical variables (as suggested by reviewer 1). I’m not suggesting you do this, but in the limitations this could be a point for future development. And the text should be updated to reflect the availability of ERA5.
- Is any quality control performed on the meteorological station data you use as inputs? A few stations with low quality data could skew the results in data sparse regions.
- Do you know if the final model output is sensitive to the choice of stations used in the test/training dataset? I imagine that this could heavily influence the results in the data sparse regions.
- Although you’ve included elevation, latitude and longitude, there are multiple climatic regions in China and a large number of external drivers of temperature variations. The strength of these may modulate surface temperature behavior (e.g. the strength/location of the monsoon circulation, the El Niño-Southern Oscillation, and other global teleconnections). Distance from the ocean could also play a role. Have you considered these in your explanations for months/stations with particularly large residuals, or stations with strange behaviors? Could it be that, if a month had anomalous large-scale weather conditions that your machine learning methods are not trained to capture, there are large residuals? These could make interesting case studies and could motivate future work incorporating some dynamical predictors.
- Figure 2: This is a nice depiction of the relationships. If you could briefly unpack the meteorological understanding behind this in the text it would be beneficial to readers. Have you checked that the relationships hold if different climatic regions of China are subset out?
- Lines 190-195: So you have 840 different models. Can you comment on how different the 70 models for each calendar month are (e.g. do all the January models look very similar)? This could be useful to understand whether there are dynamical meteorological explanations for any outliers.
- Line 243: Do you have a sense of why the errors are larger in the colder months? Are the impacts of local meteorological conditions larger in the cold season, which would make it more difficult for the methods to work? There may be meteorological literature on this.
- Figure 6: Can you comment on any features you’re resolving here that are not seen in the lower resolution gridded products you will compare to? There are some very high resolution features on the map, but clarification that they are physical would be useful.
- Does your trend analysis agree with the existing literature on global warming over China? If so include references to this.
- Figure 11: The Taylor diagrams show clear improvement from your new dataset. Also including some timeseries from locations not sampled from the observation network compared between the three datasets would be useful to understand how the four products sample the seasonal cycles of the variables.
Small corrections:
- The height of the air temperatures (surface, 1.5m, 2m) should be added to the manuscript when this is mentioned.
- The acronyms for datasets/methods should be defined in the abstract to make it easier to read.
- Line 38: after commenting on the limitations of the observing stations you could comment here on the limitations of reanalysis based products.
- Throughout the text when you say ‘high resolution’ this should be changed to ‘high spatial resolution’ e.g. line 55.
- Line 56: ‘traditional interpolation techniques’ might be clearer?
- Line 58-60: You comment on a few studies which talk about the superior performance of machine learning techniques but you do not say what the benchmark is that they’ve succeeded against. This should be included.
- Line 61: ‘estimation of short-term air temperature’ – I’m not sure what you mean by this?
- Line 86: The link here gives me an Error 404.
- Around the discussion for Figure 1 it would be interesting to know the spatial distance between observation sites. This might be a small indication of confidence in the final machine learning model output.
- Section 2.3: When commenting on the spatial resolution of the gridded products used for comparison it would be useful to also have this in km.
- Line 149: ‘machining learning’ should be ‘machine learning’
- Section 3.2.2 Are the choices of parameters for the SVM method standard in the literature? Can you please comment on your choices?
- Line 342: ‘shows a cyclic pattern’ might be clearer.
-
AC4: 'Reply on RC2', Ming Wang, 27 Mar 2022
Dear Reviewer,
Thank you very much for the constructive and insightful comments on our paper, which have been immensely helpful. We have responded to every question, indicating exactly how we addressed each concern.
The point-by-point responses to the comments are attached as a supplement in PDF format.
Thank you again for your review and the insightful comments.
Sincerely,
Qian He on behalf of all Co-Authors
-
AC5: 'Reply on RC2', Ming Wang, 27 Mar 2022
Dear Reviewer,
Thank you very much for the constructive and insightful comments on our paper, which have been immensely helpful. We have responded to every question, indicating exactly how we addressed each concern.
The point-by-point responses to the comments are attached as a supplement in PDF format.
Thank you again for your review and the insightful comments.
Sincerely,
Qian He on behalf of all Co-Authors
-
RC3: 'Comment on essd-2021-442', Athanasia Iona, 22 Mar 2022
General Comments:
The paper presents a method for spatial interpolation of air temperature data from meteorological stations in China based on machine learning tools. The authors analyze the technique used and present the limitations of the experiment. Three ML models and three interpolation methods were tested, and Gaussian process regression was chosen for its better performance. The results were compared with existing published datasets. A detailed trend analysis of the predicted dataset is also presented. ML techniques are very promising, as they address current challenges in computational research.
Specific comments:
- It is not clear to me whether all stations contribute equally to the analysis, for example the red and blue stations (Figure S2). Is any weighting technique applied in training the model(s)? If so, I think it could be mentioned.
- Besides the statistical trend analysis, are spatial distributions of the error of the predicted temperatures available, so as to illustrate the confidence level of the results, especially in areas without stations?
- In addition, as the study of climate dynamics is at the centre of this work, using remote sensing data jointly with the land meteorological stations could overcome the data scarcity, improve the results and reveal trends with more accuracy after 2000.
- While the height of the air temperature is mentioned for the ERA5 dataset (2 m), this is not the case for the GPRChinaTemp1km product or the other datasets mentioned in the analysis.
- Could the experiment be applied to other atmospheric parameters? If so, I think that a few sentences on the perspectives of the specific approach would be beneficial.
-
AC3: 'Reply on RC3', Ming Wang, 27 Mar 2022
Dear Reviewer,
Thank you a lot for the constructive comments. The comments have been immensely helpful. We appreciate your insightful comments on our paper. We have responded to every question, indicating exactly how we addressed each concern.
The point-to-point responses to the comments are attached to the supplement in PDF format.
Thank you again for your reviewing and the insightful comments.
Sincerely,
Qian He on behalf of all Co-Authors
Status: closed
-
CC1: 'Comment on essd-2021-442', QiangYu Lee, 10 Dec 2021
Thank you for your work, you have provided valuable long-term monitoring data, and I have a few questions to ask for answers here, as follows:
1. Line 190, ask the machine learning dataset is
How to build it, specifically, what are the independent and dependent variables of the input model. Here I understand that you extracted the independent variables (i.e. latitude and longitude and elevation) related to the location of the corresponding meteorological stations in the training set section, and then extended them to all the space after you trained the model. Is the machine learning model trained every month or is it a model that covers the entire study period?
2. Can we open-source the core machine learning code, specifically the model building and training section, so as to deepen our understanding of the article method?-
AC1: 'Reply on CC1', Ming Wang, 11 Dec 2021
Response1: Thank you for asking the question. The independent variables are latitude, longitude and elevation; the dependent variable is the air temperature (maximum temperature/mean temperature/minimum temperature). The machine learning models were trained for each month. We have explained the model training in the paper (See Preprint Lines 191-194): “We extracted the independent variables (i.e., latitude, longitude, and elevation) relating to the meteorological stations and randomly divided the processed data into a set for model training (70%) and a set for model evaluation and validation (30%). When train the models, the 10-fold cross-validation was used. We constructed a model for each month separately which means we have 840 models for the 840 months from 1951 to 2020. All algorithms were implemented in MATLAB R2020b.”
Response 2: Thanks for your suggestion. We agree that open source code can help to understand the work. However, due to the requirements of the foundation, we may make the codes available after the acceptance of the work.
-
AC1: 'Reply on CC1', Ming Wang, 11 Dec 2021
-
RC1: 'Comment on essd-2021-442', Anonymous Referee #1, 19 Jan 2022
The manuscript aims to produce a long term dataset of monthly 2m temperature over China at high spatial resolution on a 1x1 km grid. While the objective is appealing due to the challenges related to the complex topography and the irregular data availability in the target region, the applied methods show up with significant issues. The major issues are listed subsequently:
[1] The introduction discusses advantages and disadvantages of different information sources for the targeted dataset. While strong arguments for point-wise observational data are presented, long term reanalysis data products are not considered despite they provide consistent and spatio-temporally coherent information on the atmospheric state. It is unclear why such data is not considered to provide predictor variables.
[2] The method of data splitting leads to strong autocorrelation between the training and test dataset. Due to the spatial proximity of stations in both dataset, a fundamental requirement is hurt, that is the independency (or at least a minimization of dependency) between the training and test dataset. This is especially true for the stations located in the flat eastern parts of China with a dense observational network. Thus, the statistical model are prone to learn nearest neighbor-relations rather than learning real abstractions from the features, see, e.g. Kleinert et al., 2021 for a more detailed discussion on the requirement of splitting the test and training data temporally when stations are located close to each other.
[3] Only static features are used as predictors which implies that a model must trained for each month (!) of the period under consideration. Thus, dynamic information on the atmospheric state can exclusively deduced from the optimization procedure on the predictand. It is strongly recommended to introduce dynamical data as a predictor variable instead. Besides, the chosen predictors have periods with neglectable correlation with respect to the target quantity and important features such as the ambient topography (is the meteorological station located in a valley) is absent (see, e.g., Sha et al., 2020).
[4] The evaluation does not serve the objectives of the study. The stations in the test dataset are dominated by stations over flat terrain with a dense observational network. Thus, potential deficiencies in capturing the variations due to underlying complex topography are hidden. Indeed, Figure 5 indicates that residuals are considerably larger over the mountainous region.
[5] Several issues in the follow-up study are present such as (a) a focus on large-scale temperature patterns instead of fine-scale patterns in Section 4.2. to reason the high spatial resolution of the dataset, (b) the interpretation of patterns in the Xinjiang region which look like artefacts (bulls-eye pattern in winter months) and (c) the missing notification on the better performance of the reference method ANUSPLIN for July-months in the 70s, 80s and 90s.
[6] The comparison to the competing datasets ERA5 and FLADS is misleading due to the much coarser spatial resolution of these two datasets. A fair comparison would consult datasets with similar spatial resolution such as the dataset described in Peng et al., 2019.
Further minor issues are:
* Splitting into three distinct datasets is unnecessary. Rather merge it to one dataset with one DOI.
* Refer to statistical and dynamical downscaling techniques in the introduction.
* Provide references to the problems related to remote sensing data (see l.66).
* Describe the remapping of the STRM DEM data onto the 1x1 km grid (should be an averaging method).
* The used software tool MATLAB should be only mentioned once rather than being repeated three times. More details on the respective ML-technique would be appreciated.
* l.149: Should be 'ensemble machine learning'
* l.219: "Unnecessary reference to Equation 4 which directly follows the sentence.
* l.343f. This is sentence is barely comprehensible.Mentioned literature references:
- Kleinert, Felix, Martin G. Schultz, and Lukas H. Leufen. "IntelliO3-ts v1. 0: A neural network approach to predict near-surface ozone concentrations in Germany." Geoscientic model development discussions 2020. FZJ-2020-05012 (2020): 1-69
- Sha, Yingkai, et al. "Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part I: Daily maximum and minimum 2-m temperature." Journal of Applied Meteorology and Climatolog 59.12 (2020): 2057-2073.
- Peng, Shouzhang, et al. "1 km monthly temperature and precipitation dataset for China from 1901 to 2017." Earth System Data Science Data 11.4 (2019): 1931-1946
-
AC2: 'Reply on RC1', Ming Wang, 05 Feb 2022
Dear Reviewer,
Thank you a lot for the constructive comments. The comments have been immensely helpful. We appreciate your insightful comments on our paper. We have responded to every question, indicating exactly how we addressed each concern.
The point-to-point responses to the comments are attached to the supplement in PDF format.
Thank you again for your reviewing and the insightful comments.
Sincerely,
Qian He on behalf of all Co-Authors
-
RC2: 'Comment on essd-2021-442', Anonymous Referee #2, 21 Mar 2022
ESSD paper review:
This study describes a new 1km dataset of monthly-mean, monthly-maximum and monthly-minimum surface temperature’s over China, developed using machine learning methods. The method used for the final data set was chosen as the best performing method, after a comparison of three modern techniques. A dataset of 613 weather stations over China was used to train and test the machine learning methods. This study is very clearly written and the Figures are of high quality. I agree with all of reviewer 1’s comments, so will not repeat these points and assume they have been addressed within the manuscript, but I will add a few further comments below.
General comments:
- There is always a tradeoff between spatial and temporal resolution when designing new data products. Can you explain in a few sentences in the manuscript why you chose to create a product with such high spatial resolution but such low temporal resolution? You make comparisons at the end to the ERA5 dataset which does have much lower spatial resolution (~30 x less) but it has hourly temporal resolution (720 x more) which is very useful for a number of applications. Comments suggesting the applications where you think this dataset may be preferable to the others mentioned would also be useful.
- You mention ERA5 is only available from 1979, but it is now available back to 1950, so could be used to incorporate dynamical variables (as suggested by reviewer 1). I’m not suggesting you do this, but in the limitations this could be a point for future development. And the text should be updated to reflect the availability of ERA5.
- Is any quality control performed on the meteorological station data you use as inputs? A few stations with low quality data could skew the results in data sparse regions.
- Do you know if the final model output is sensitive to the choice of stations used in the test/training dataset? I imagine that this could heavily influence the results in the data sparse regions.
- Although you’ve included elevation, latitude and longitude there are multiple climatic regions in China, and a large amount of external drivers to variations in temperatures. The strength of these may modulate surface temperature behavior (e.g. the strength/location of the monsoon criculation, El Nino southern Oscillation, and other global teleconnections). Distance from the ocean could also play a role. Have you considered these in your explanations for months/stations with particuarly large residuals, or stations with strange behaviors? It could be that if a month had anomalous large scale weather conditions, which your machine learning methods are not trained to capture there are large residuals? These could make interesting case studies and could motivate future work incorporating some dynamical predictors.
- Figure 2: This is a nice depiction of the relationships. If you could briefly unpack the meteorological understanding behind this in the text it would be beneficial to readers. Have you checked that the relationships hold if different climatic regions of China are subset out?
- Line 190-195. So you have 840 different models. Can you comment on how different are all the 70 models for each month? (e.g. do all the January models look very similar?) This could be useful to understand if there are dynamical meteorological explanations for any outliers.
- Line 243: Do you have a sense of why the errors are larger in the colder months? Are the impacts of local meteorological conditions larger in the cold season, which would make it more difficult for the methods to work? There may be meteorological literature on this.
- Figure 6: Can you comment on any features you’re resolving here that are not seen in the lower resolution gridded products you will compare to? There are some very high resolution features on the map, but clarification that they are physical would be useful.
- Does your trend analysis agree with the existing literature on global warming over China? If so include references to this.
- Figure 11: The Taylor diagrams show clear improvement from your new dataset. Also including some timeseries from locations not sampled from the observation network compared between the three datasets would be useful to understand how the four products sample the seasonal cycles of the variables.
Small corrections:
- The height of the air temperatures (surface, 1.5m, 2m) should be added to the manuscript when this is mentioned.
- The acronyms for datasets/methods should be defined in the abstract to make it easier to read.
- Line 38: after commenting on the limitations of the observing stations you could comment here on the limitations of reanalysis based products.
- Throughout the text when you say ‘high resolution’ this should be changed to ‘high spatial resolution’ e.g. line 55.
- Line 56: ‘traditional interpolation techniques’ might be clearer?
- Line 58-60: You comment on a few studies which talk about the superior performance of machine learning techniques but you do not say what the benchmark is that they’ve succeeded against. This should be included.
- Line 61: ‘estimation of short-term air temperature’ – I’m not sure what you mean by this?
- Line 86: The link here gives me an Error 404.
- Around the discussion for Figure 1 it would be interesting to know the spatial distance between observation sites. This might be a small indication of confidence in the final machine learning model output.
- Section 2.3: When commenting on the spatial resolution of the gridded products used for comparison it would be useful to also have this in km.
- Line 149: ‘machining learning’ should be ‘machine learning’
- Section 3.2.2: Are the choices of parameters for the SVM method standard in the literature? Can you please comment on your choices?
- Line 342: ‘shows a cyclic pattern’ might be clearer.
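As an illustration of the station-spacing check raised in the comment on Figure 1 above, the following is a minimal sketch, not the authors' method. It computes each station's great-circle distance to its nearest neighbour with the haversine formula. The coordinates are hypothetical placeholders rather than stations from the manuscript, and the function names `haversine_km` and `nearest_neighbour_km` are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_neighbour_km(stations):
    """For each (lat, lon) station, return the distance to its closest other station."""
    result = []
    for i, (lat_i, lon_i) in enumerate(stations):
        result.append(min(
            haversine_km(lat_i, lon_i, lat_j, lon_j)
            for j, (lat_j, lon_j) in enumerate(stations)
            if j != i
        ))
    return result


# Hypothetical station coordinates (lat, lon in degrees), for illustration only.
stations = [(39.9, 116.4), (31.2, 121.5), (23.1, 113.3), (29.7, 91.1)]
distances = nearest_neighbour_km(stations)
for (lat, lon), d in zip(stations, distances):
    print(f"station at ({lat}, {lon}): nearest neighbour {d:.0f} km away")
```

Summarising these distances (for example, median and maximum per climatic region) would give a rough indication of where the interpolated field is least constrained by observations. The brute-force pairwise loop is adequate for a few thousand stations; a spatial index would be preferable for much larger networks.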
AC4: 'Reply on RC2', Ming Wang, 27 Mar 2022
Dear Reviewer,
Thank you very much for the constructive and insightful comments, which have been immensely helpful. We have responded to every question, indicating exactly how we addressed each concern.
The point-by-point responses to the comments are attached as a supplement in PDF format.
Thank you again for your review and the insightful comments.
Sincerely,
Qian He on behalf of all Co-Authors
AC5: 'Reply on RC2', Ming Wang, 27 Mar 2022
Dear Reviewer,
Thank you very much for the constructive and insightful comments, which have been immensely helpful. We have responded to every question, indicating exactly how we addressed each concern.
The point-by-point responses to the comments are attached as a supplement in PDF format.
Thank you again for your review and the insightful comments.
Sincerely,
Qian He on behalf of all Co-Authors
RC3: 'Comment on essd-2021-442', Athanasia Iona, 22 Mar 2022
General Comments:
The paper presents a method, based on machine learning tools, for spatial interpolation of air temperature data from meteorological stations in China. The authors analyze the technique used and present the limitations of the experiment. Three ML models and three traditional interpolation methods were tested, and Gaussian process regression was chosen for its better performance. The results were compared with existing published datasets. A detailed trend analysis of the predicted dataset is also presented. ML techniques are very promising as they address current challenges in computational research.
Specific comments:
- It is not clear to me whether all stations contribute equally to the analysis, for example the red and blue stations (Figure S2). Is any weighting technique applied to the training model(s)? If so, I think it should be mentioned.
- Besides the statistical trend analysis, are spatial distributions of the error of the predicted temperatures available, so as to illustrate the confidence level of the analysis results, especially in areas without stations?
- In addition, as the study of climatic dynamics is at the center of this work, the use of remote sensing data jointly with the land meteorological stations could overcome the data scarcity, improve the results and reveal trends with more accuracy after 2000.
- While the height of the air temperature is mentioned for the ERA5 dataset (2 m), this is not the case for the GPRChinaTemp1km product or the other datasets mentioned in the analysis.
- Could the experiment be tested on other atmospheric parameters? If so, I think that a few sentences on the perspectives of the specific approach would be beneficial.
AC3: 'Reply on RC3', Ming Wang, 27 Mar 2022
Dear Reviewer,
Thank you very much for the constructive and insightful comments, which have been immensely helpful. We have responded to every question, indicating exactly how we addressed each concern.
The point-by-point responses to the comments are attached as a supplement in PDF format.
Thank you again for your review and the insightful comments.
Sincerely,
Qian He on behalf of all Co-Authors
Qian He et al.
Data sets
GPRChinaTemp1km: 1 km monthly minimum air temperature for China from January 1951 to December 2020 He, Qian; Wang, Ming; Liu, Kai; Li, Kaiwen; Jiang, Ziyu https://doi.org/10.5281/zenodo.5112232
GPRChinaTemp1km: 1 km monthly mean air temperature for China from January 1951 to December 2020 He, Qian; Wang, Ming; Liu, Kai; Li, Kaiwen; Jiang, Ziyu https://doi.org/10.5281/zenodo.5111989
GPRChinaTemp1km: 1 km monthly maximum air temperature for China from January 1951 to December 2020 He, Qian; Wang, Ming; Liu, Kai; Li, Kaiwen; Jiang, Ziyu https://doi.org/10.5281/zenodo.5112122