GPRChinaTemp1km: a high-resolution monthly air temperature dataset for China (1951–2020) based on machine learning

He, Qian; Wang, Ming; Liu, Kai; Li, Kaiwen; Jiang, Ziyu

doi:https://doi.org/10.5194/essd-2021-267

23 Aug 2021

Status: this discussion paper is a preprint. It has been under review for the journal Earth System Science Data (ESSD). The manuscript was not accepted for further review after discussion.

Abstract. An accurate spatially continuous air temperature dataset is crucial for multiple applications in environmental and ecological sciences. Existing spatial interpolation methods have relatively low accuracy and the resolution of available long-term gridded products of air temperature for China is coarse. Point observations from meteorological stations can provide long-term air temperature data series but cannot represent spatially continuous information. Here, we devised a method for spatial interpolation of air temperature data from meteorological stations based on powerful machine learning tools. First, to determine the optimal method for interpolation of air temperature data, we employed three machine learning models: random forest, support vector machine, and Gaussian process regression. Comparison of the mean absolute error, root mean square error, coefficient of determination, and residuals revealed that Gaussian process regression had high accuracy and clearly outperformed the other two models regarding interpolation of monthly maximum, minimum, and mean air temperatures. The machine learning methods were compared with three traditional methods used frequently for spatial interpolation: inverse distance weighting, ordinary kriging, and ANUSPLIN. Results showed that the Gaussian process regression model had higher accuracy and greater robustness than the traditional methods regarding interpolation of monthly maximum, minimum, and mean air temperatures in each month. Comparison with the TerraClimate, FLDAS, and ERA5 datasets revealed that the accuracy of the temperature data generated using the Gaussian process regression model was higher. Finally, using the Gaussian process regression method, we produced a long-term (January 1951 to December 2020) gridded monthly air temperature dataset with 1 km resolution and high accuracy for China, which we named GPRChinaTemp1km. 
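To make the model comparison concrete, here is a minimal, dependency-free sketch of Gaussian process regression against an inverse distance weighting baseline, scored with MAE, RMSE, and R². It is not the authors' implementation. The following are all assumptions made for illustration: the squared-exponential kernel, its fixed (untuned) length scale and its signal and noise variances, the IDW power of 2, and the synthetic stations built from longitude, latitude, and elevation (the covariates discussed in the second comment of RC2). Helper names such as `gpr_predict` and `idw_predict` are hypothetical.

```python
import math
import random


def rbf(a, b, length_scale, signal_var):
    """Squared-exponential (RBF) covariance between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T; A must be symmetric positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_predict(X, y, Xq, length_scale, signal_var, noise_var):
    """GP posterior mean with fixed kernel settings; targets are centred first."""
    mu = sum(y) / len(y)
    K = [[rbf(a, b, length_scale, signal_var) + (noise_var if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = cho_solve(cholesky(K), [v - mu for v in y])
    return [mu + sum(rbf(q, x, length_scale, signal_var) * w for x, w in zip(X, alpha))
            for q in Xq]


def idw_predict(X, y, Xq, power=2.0):
    """Inverse distance weighting on horizontal position only (first two features)."""
    out = []
    for q in Xq:
        num = den = 0.0
        for x, v in zip(X, y):
            d = math.hypot(q[0] - x[0], q[1] - x[1])
            if d == 0.0:
                num, den = v, 1.0
                break
            num += v * d ** -power
            den += d ** -power
        out.append(num / den)
    return out


def metrics(obs, pred):
    """Mean absolute error, root mean square error and coefficient of determination."""
    n = len(obs)
    err = [p - o for o, p in zip(obs, pred)]
    mae = sum(abs(e) for e in err) / n
    rmse = math.sqrt(sum(e * e for e in err) / n)
    mean_obs = sum(obs) / n
    r2 = 1.0 - sum(e * e for e in err) / sum((o - mean_obs) ** 2 for o in obs)
    return mae, rmse, r2


random.seed(0)


def synthetic_station():
    """Hypothetical station: scaled lon, lat, elevation (km) and a temperature in °C."""
    lon, lat, elev = random.uniform(0, 3), random.uniform(0, 3), random.uniform(0, 2)
    temp = 25.0 - 1.5 * lat - 6.5 * elev + random.gauss(0.0, 0.3)  # 6.5 °C/km lapse rate
    return (lon, lat, elev), temp


train_stations = [synthetic_station() for _ in range(60)]
test_stations = [synthetic_station() for _ in range(20)]
X, y = [f for f, _ in train_stations], [t for _, t in train_stations]
Xt, yt = [f for f, _ in test_stations], [t for _, t in test_stations]

gpr_scores = metrics(yt, gpr_predict(X, y, Xt, length_scale=1.0, signal_var=10.0, noise_var=0.1))
idw_scores = metrics(yt, idw_predict(X, y, Xt))
print("GPR  MAE=%.2f  RMSE=%.2f  R2=%.2f" % gpr_scores)
print("IDW  MAE=%.2f  RMSE=%.2f  R2=%.2f" % idw_scores)
```

In this synthetic setup the GPR sketch should reach a lower RMSE than IDW, mainly because IDW ignores elevation. Real station data, tuned hyperparameters, and ordinary kriging or ANUSPLIN baselines could change the picture.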
The dataset consists of three variables: monthly mean, monthly maximum, and monthly minimum air temperature. The GPRChinaTemp1km data were used to analyse the spatiotemporal variations of air temperature using Theil–Sen median trend analysis combined with the Mann–Kendall test. Monthly mean and minimum air temperatures across China showed a significant increasing trend in every month, whereas monthly maximum air temperature showed a more spatially heterogeneous pattern, with areas of significant increase, non-significant increase, and non-significant decrease. The GPRChinaTemp1km dataset is publicly available at https://doi.org/10.5281/zenodo.5112122 (He et al., 2021a) for monthly maximum air temperature, at https://doi.org/10.5281/zenodo.5111989 (He et al., 2021b) for monthly mean air temperature, and at https://doi.org/10.5281/zenodo.5112232 (He et al., 2021c) for monthly minimum air temperature.
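The trend analysis can be sketched for one grid cell and one calendar month. The Theil–Sen slope is the median of all pairwise slopes. The Mann–Kendall statistic S counts increasing pairs minus decreasing pairs, and Z follows from the normal approximation. This is an illustrative sketch that relies on assumptions the abstract does not state: there is no tie or autocorrelation correction, the test is two-sided at the 5 % level, and a synthetic 70-year series stands in for GPRChinaTemp1km values.

```python
import math
import random
from itertools import combinations


def theil_sen_slope(values):
    """Median of all pairwise slopes (x_j - x_i) / (j - i) for an evenly spaced series."""
    slopes = sorted((values[j] - values[i]) / (j - i)
                    for i, j in combinations(range(len(values)), 2))
    mid = len(slopes) // 2
    if len(slopes) % 2:
        return slopes[mid]
    return 0.5 * (slopes[mid - 1] + slopes[mid])


def mann_kendall(values):
    """Mann-Kendall S, continuity-corrected Z and two-sided p-value.

    Uses the normal approximation without tie or autocorrelation corrections.
    """
    n = len(values)
    s = 0
    for i, j in combinations(range(n), 2):
        diff = values[j] - values[i]
        s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return s, z, p


# Hypothetical series: one value per year (1951-2020) for a single pixel and month.
random.seed(1)
series = [10.0 + 0.03 * t + random.gauss(0.0, 0.5) for t in range(70)]
slope = theil_sen_slope(series)
s, z, p = mann_kendall(series)
print(f"Sen slope = {slope:.3f} °C/yr, S = {s}, Z = {z:.2f}, p = {p:.2g}")
print("significant increase" if slope > 0 and p < 0.05 else "no significant increase")
```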

Received: 09 Aug 2021 – Discussion started: 23 Aug 2021

Status: closed

RC1: 'Comment on essd-2021-267', Anonymous Referee #1, 21 Sep 2021
The manuscript describes ML tools for the spatial interpolation of air temperature data from meteorological stations in China to 1 km resolution. This topic is essential. However, many significant issues must be addressed. Here are some comments that will hopefully help to improve the quality of the manuscript:

It is not clear what the spatial resolution of the meteorological station network is. Since the period covered, the temporal resolution, and the amount of missing data may differ between monitoring stations, it may be useful to provide a table or figure summarizing this information.

The methods used in this study are inappropriate, and the experiments lack sufficient detail. In ML, the dataset should be divided into training, validation, and test subsets. The validation subset is used to evaluate the model while tuning its hyperparameters. Hyperparameter tuning is crucial for obtaining the best-performing model, and it is missing from this manuscript.
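The split described in this comment can be sketched as follows. For illustration, the hyperparameter being tuned is assumed to be the IDW power rather than a GPR, RF, or SVM setting. The split is assumed to be 70/15/15 on synthetic stations, and the final model draws on the training and validation subsets before the single test evaluation. The helper names are hypothetical.

```python
import math
import random


def idw(train, qx, qy, power):
    """Inverse distance weighting from (x, y, value) stations to a query point."""
    num = den = 0.0
    for x, y, v in train:
        d = math.hypot(qx - x, qy - y)
        if d == 0.0:
            return v
        w = d ** -power
        num += w * v
        den += w
    return num / den


def rmse(subset, train, power):
    """Root mean square error of IDW predictions at the stations in `subset`."""
    errs = [idw(train, x, y, power) - v for x, y, v in subset]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


# Hypothetical stations: (x, y, temperature) from a smooth synthetic field plus noise.
random.seed(42)
stations = []
for _ in range(200):
    x, y = random.uniform(0, 10), random.uniform(0, 10)
    stations.append((x, y, 20.0 - 0.8 * y + math.sin(x) + random.gauss(0.0, 0.3)))

random.shuffle(stations)
train_set, valid_set, test_set = stations[:140], stations[140:170], stations[170:]

# 1) Choose the hyperparameter (here the IDW power) using the validation subset only.
candidates = [0.5, 1.0, 2.0, 3.0, 4.0]
best_power = min(candidates, key=lambda p: rmse(valid_set, train_set, p))

# 2) Interpolate from training + validation stations (the "refit" step for IDW)
#    and evaluate once on the untouched test subset.
test_rmse = rmse(test_set, train_set + valid_set, best_power)
print(f"best power = {best_power}, test RMSE = {test_rmse:.2f}")
```

The test subset never influences the choice of `best_power`, so its RMSE is an unbiased check of the tuned model.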

Details of the data processing are missing; e.g., it is unclear how missing data were handled and how the data were normalized.
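One common way to address both points is to drop station-months with missing values and then z-score the covariates using statistics computed from the training rows alone. This is shown only as a hypothetical sketch, since the manuscript's actual procedure is not stated. The record layout and values below are made up.

```python
import math


def drop_missing(records):
    """Keep only records whose covariates and target are all present (None marks a gap)."""
    return [r for r in records if all(v is not None for v in r)]


def fit_standardizer(rows):
    """Per-column mean and population standard deviation, from training rows only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling; reuse the training statistics for validation/test rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical station-month records: (lon, lat, elevation_m, temperature_C).
records = [
    (110.0, 35.0, 500.0, 8.0),
    (115.0, 30.0, 50.0, 12.0),
    (100.0, 32.0, 3500.0, None),  # missing observation for this month
    (120.0, 40.0, 100.0, 6.0),
    (105.0, 25.0, 1500.0, 15.0),
]
clean = drop_missing(records)
features = [r[:3] for r in clean]
targets = [r[3] for r in clean]
means, stds = fit_standardizer(features)
scaled = transform(features, means, stds)
for row, t in zip(scaled, targets):
    print([round(v, 2) for v in row], t)
```

Gap filling, for example from neighbouring stations, is an alternative to dropping records. Which choice is appropriate depends on how much data is missing.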

The models are compared on a single dataset and without a statistical test, so I would say there is a high chance that GPR outperforms the others by accident.
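A paired test on per-station errors is one way to address this concern. The sketch below uses an exact two-sided sign test, chosen here for simplicity; a Wilcoxon signed-rank test is a common and more powerful alternative. The error values are invented. The test also treats stations as independent, an assumption that spatial autocorrelation between neighbouring stations may violate.

```python
from math import comb


def sign_test(errors_a, errors_b):
    """Exact two-sided sign test on paired absolute errors; ties are discarded."""
    wins_a = sum(abs(a) < abs(b) for a, b in zip(errors_a, errors_b))
    wins_b = sum(abs(a) > abs(b) for a, b in zip(errors_a, errors_b))
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2.0 * tail)


# Hypothetical held-out errors (°C) of two models at the same ten stations.
errors_gpr = [0.4, -0.3, 0.5, 0.2, -0.6, 0.3, 0.1, -0.2, 0.4, 0.3]
errors_rf = [0.6, -0.5, 0.4, 0.5, -0.9, 0.4, 0.3, -0.5, 0.6, 0.2]
wins_gpr, wins_rf, p_value = sign_test(errors_gpr, errors_rf)
print(f"GPR closer at {wins_gpr} stations, RF at {wins_rf}, p = {p_value:.3f}")
```

With these made-up numbers GPR is closer at 8 of the 10 stations. Even so, the two-sided p-value is about 0.11, so the difference would not be significant at the 5 % level.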

Why do the RMSE, MAE, and R2 show a cyclic pattern? Is there a reason for that?

Minor comments and questions:

1) Line 7: please clarify why the "Subset Features" tool of the Geostatistical Analyst toolbox is used. Is it used to split features or datasets?

2) The explanation of SVM is not clear and needs to be improved.

3) The sentence in Line 189 is not understandable.
Citation: https://doi.org/10.5194/essd-2021-267-RC1
RC2: 'Comment on essd-2021-267', Anonymous Referee #2, 04 Oct 2021

This study by Qian He et al. produced a high-resolution air temperature dataset using three types of machine learning methods. The dataset is timely, fits the scope of the journal well, and could be valuable and of interest to readers and the community. The language and methods of the work are good overall, and I enjoyed reading it. I would really like to see this dataset published.

However, some points are not clear enough or need to be clarified further. I have a number of general comments and suggestions, listed below:

1) Generating high-precision, long time series of temperature data for China can effectively meet the needs of scientific research, but high-precision 1 km temperature datasets for China have already been released (Zhu X et al., 2019; Peng S et al., 2019). What is innovative or different about your data and methods?

2) The selection of the predictor variables: the authors chose three time-invariant variables (longitude, latitude, and elevation) to predict the dynamic changes of temperature. These three static factors do not really reflect the changes of temperature or the real spatial distribution characteristics of temperature. Have you considered factors such as the NDVI vegetation index, land use change, surface temperature, temporal and spatial correlations, month-to-month changes, etc.?

3) The accuracy of machine learning depends on the tuning and calibration of hyperparameters. This study uses 840 models. Do these 840 models use the same set of hyperparameters, or does each model have its own?

4) How did you verify the accuracy of the raster products? The authors used a limited set of 613 sites to generate 1 km raster data products. However, as far as I know, the stations used for modelling are too sparse over the Qinghai-Tibet Plateau and Northwest China.
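Leave-one-out cross-validation at the stations is one standard way to verify such raster products, but it only measures accuracy where stations exist. The hypothetical network below consists of a dense cluster plus one isolated station, with synthetic values and IDW as a stand-in interpolator. It illustrates how errors can be largest where coverage is sparse, which is the concern raised here for the Qinghai-Tibet Plateau and Northwest China.

```python
import math


def idw(points, qx, qy, power=2.0):
    """Inverse distance weighting from (x, y, value) stations to a query point."""
    num = den = 0.0
    for x, y, v in points:
        d = math.hypot(qx - x, qy - y)
        if d == 0.0:
            return v
        w = d ** -power
        num += w * v
        den += w
    return num / den


def loocv_errors(stations, power=2.0):
    """Predict each station from all the others; return ((x, y), prediction - observed)."""
    out = []
    for i, (x, y, v) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        out.append(((x, y), idw(others, x, y, power) - v))
    return out


# Hypothetical network: a dense 4 x 4 cluster plus one isolated station (synthetic values).
stations = [(x, y, 20.0 - 0.5 * x) for x in range(4) for y in range(4)]
stations.append((9, 9, 20.0 - 0.5 * 9))
errors = loocv_errors(stations)
rmse = math.sqrt(sum(e * e for _, e in errors) / len(errors))
worst = max(errors, key=lambda item: abs(item[1]))
print(f"LOOCV RMSE = {rmse:.2f}; largest error at {worst[0]}: {worst[1]:+.2f}")
```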

Citation: https://doi.org/10.5194/essd-2021-267-RC2

Supplement

https://doi.org/10.5194/essd-2021-267-supplement

Data sets

GPRChinaTemp1km: 1 km monthly maximum air temperature for China from January 1951 to December 2020. He, Qian; Wang, Ming; Liu, Kai; Li, Kaiwen; Jiang, Ziyu. https://doi.org/10.5281/zenodo.5112122

GPRChinaTemp1km: 1 km monthly minimum air temperature for China from January 1951 to December 2020. He, Qian; Wang, Ming; Liu, Kai; Li, Kaiwen; Jiang, Ziyu. https://doi.org/10.5281/zenodo.5112232

GPRChinaTemp1km: 1 km monthly mean air temperature for China from January 1951 to December 2020. He, Qian (Beijing Normal University); Wang, Ming; Liu, Kai; Li, Kaiwen; Jiang, Ziyu. https://doi.org/10.5281/zenodo.5111989

Short summary

We used three machine learning models and determined that Gaussian process regression (GPR) is best suited to interpolating air temperature data for China. The GPR-derived results were compared with those of traditional interpolation techniques and existing datasets, and the GPR-derived data were found to be more accurate. Finally, we used the GPR model to generate a gridded monthly air temperature dataset for China (1951–2020) with 1 km resolution and high accuracy.

