Deep-Learning-Based Harmonization and Super-Resolution of Near-Surface Air Temperature from CMIP6 Models (1850–2100)

Wei, Xikun; Wang, Guojie; Feng, Donghan; Duan, Zheng; Hagan, Daniel Fiifi Tawia; Tao, Liangliang; Miao, Lijuan; Su, Buda; Jiang, Tong

doi:10.5194/essd-2021-418

Preprints

10 Dec 2021

Status: this preprint has been withdrawn by the authors.

Abstract. Future global temperature change will have significant effects on society and ecosystems. Earth system models (ESMs) are the primary tools for exploring future climate change. However, ESMs still carry great uncertainty and often run at a coarse spatial resolution (the majority at about 2 degrees). Accurate temperature data at high spatial resolution are needed to improve our understanding of temperature variation and for many applications. We innovatively apply deep-learning (DL) methods from super-resolution (SR) in computer vision to merge data from 31 ESMs; the proposed method performs data merging, bias correction and spatial downscaling simultaneously. The SR algorithms are designed to enhance image quality and outperform much better than the traditional methods. The CRU TS (Climatic Research Unit gridded Time Series) dataset is used as reference data in model training. To find a suitable DL method for our work, we choose five SR methods with different structures. These models are compared using multiple evaluation metrics (mean square error (MSE), mean absolute error (MAE) and Pearson correlation coefficient (R)), and the best-performing model is selected and used to merge the monthly historical data during 1850–1900 and the monthly future-scenario data (SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5) during 2015–2100 at a spatial resolution of 0.5 degrees. Results show that the merged data perform considerably better than any individual ESM and than the ensemble mean (EM) of all ESMs, both spatially and temporally. The MAE improves greatly, and its spatial distribution grows larger with increasing latitude in the Northern Hemisphere, resembling a ‘tertiary class echelon’ condition. The merged product also performs well when the observational time series is smooth with few fluctuations. Additionally, this work shows that the DL model can be transferred to data merging, bias correction and spatial downscaling when enough training data are available. Data can be accessed at https://doi.org/10.5281/zenodo.5746632 (Wei et al., 2021).
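For readers unfamiliar with the three evaluation metrics named in the abstract (MSE, MAE and Pearson R), the following is a minimal sketch of their textbook definitions. The function names and the sample values are illustrative assumptions; the preprint's own evaluation code is not shown on this page.

```python
import math


def mse(pred, obs):
    """Mean square error between two equal-length sequences."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def mae(pred, obs):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def pearson_r(pred, obs):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical monthly temperatures (deg C) at one grid cell, not real data.
observed = [1.0, 3.0, 7.0, 12.0]
merged = [1.5, 2.5, 7.5, 11.5]

print(mse(merged, observed), mae(merged, observed), pearson_r(merged, observed))
```

Lower MSE and MAE and a higher R indicate closer agreement with the reference data, which is how the abstract ranks the five candidate models.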

Received: 20 Nov 2021 – Discussion started: 10 Dec 2021

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Interactive discussion

Status: closed

RC1:
'Comment on essd-2021-418', Anonymous Referee #1, 07 Feb 2022
The manuscript uses five deep learning models to train CMIP6 air temperature to CRU TS gridded air temperature. The statistical evaluation shows that CMIP6 air temperature is improved. However, I have some concerns about the methodology. Machine learning models have been widely applied in many fields and can easily beat traditional methods and raw data inputs in terms of statistical accuracy. I am not surprised that DL-based CMIP6 temperature is closer to CRU TS than raw CMIP6 temperature, because raw models are not designed to approximate CRU TS. But the DL model design in this study has some problems. This makes me worry that the results are right for the wrong reasons. Simpler methods such as random forest or even linear regression could probably achieve the same improvement. Besides, the writing of the manuscript is a big problem. The manuscript needs extensive revision to be publishable.

The language of the manuscript needs substantial improvement. Reading the manuscript is a challenge for me. The manuscript needs thorough language revision to meet the standard of a publishable paper. I tried to list some examples of grammar errors, format problems, or awkward expressions, but failed because there are too many problems.

Temperature data from all models are interpolated to 2° resolution using bilinear interpolation, which is incorrect from my perspective. First, since air temperature is closely related to elevation, temperature downscaling is often achieved using the temperature lapse rate method. You cannot simply use the bilinear method as for other variables such as precipitation. This method can cause notable bias in complex terrain. Second, the interpolation of model data will cause information loss. In other words, the interpolated air temperature is worse than the raw data because of my first point. Downscaling the temperature from 2° to 0.5° can show some improvement, but it is difficult to determine how much the information loss contributes to this improvement. Therefore, the authors should use a more appropriate temperature downscaling method.
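As an illustration of the lapse-rate idea raised in this comment, the sketch below shifts a coarse-cell temperature to the elevation of each finer cell. The constant lapse rate of 6.5 K per km, the function name and the elevations are all assumptions made for the example; neither the referee nor the manuscript specifies them.

```python
# Assumed constant environmental lapse rate (6.5 K per 1000 m); real studies
# often estimate a locally or seasonally varying value instead.
LAPSE_RATE_K_PER_M = 0.0065


def lapse_rate_adjust(t_coarse_c, z_coarse_m, z_fine_m, lapse=LAPSE_RATE_K_PER_M):
    """Shift a coarse-cell temperature to the elevation of a finer cell.

    Temperature decreases by `lapse` kelvin for every metre the fine cell
    lies above the coarse cell's mean elevation, and increases if it lies below.
    """
    return t_coarse_c - lapse * (z_fine_m - z_coarse_m)


# Hypothetical coarse cell: 10 deg C at a mean elevation of 1000 m, covering
# three finer cells at different elevations.
fine_elevations_m = [500.0, 1000.0, 2000.0]
fine_temps_c = [lapse_rate_adjust(10.0, 1000.0, z) for z in fine_elevations_m]

print(fine_temps_c)
```

A purely bilinear scheme would assign nearly the same temperature to all three fine cells, which is the terrain-related bias the comment describes.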

CRU TS uses limited stations to estimate air temperature. Most regions of the world only have sparse stations, and thus the quality of CRU TS is low. Different observation-based datasets can show large differences, particularly in regions with complex topography and few stations. Sometimes, model outputs could be more reliable than interpolation-based datasets such as CRU TS. Training the model in those regions using CRU TS as the true value does not sound reliable to me.

The study divides the world into five parts to train the model (Figure 2). However, the division is problematic. EUR-AF contains Europe, Africa, and part of Asia. The three sub-regions are very different in climate, area, and station density. Europe has dense stations, and thus CRU TS is of good quality there. In contrast, Africa has sparse stations, and thus CRU TS is of low quality there. If you train them as a whole, the quality in Europe could be degraded because most training samples are from Africa and part of Asia, which are not reliable. Moreover, each part covers a broad domain (especially EUR-AF) with quite different climate conditions and temperature schemes, and training a model over such a large domain cannot account for those complexities. That’s why I say you may get the right results for the wrong reasons.

The introduction is overlong and needs reorganization. Much of it covers background such as climate change and data, which is widely known and not closely related to this study. In contrast, deep learning techniques have recently been widely applied in Earth science and cover many variables, including air temperature. A literature review of those studies is important but insufficient.

The introduction to the five DL models in 2.2.1 is too abstract. The authors should focus more on the implementation of the DL models, such as model structures, parameters, and training and testing strategies. In short, the methods section should ensure that readers are able to reproduce the work. Some content in “3 Results and discussion” can be moved to the methods section.

It is probably better to adopt a traditional method as the benchmark in this study. The work shows results from five DL models, but readers can hardly know whether the improvement is good enough without a comparison to a widely used method.

Some minor comments:

The manuscript states that, compared to traditional methods, the new DL methods can do downscaling, bias correction, and merging as a whole, while traditional methods need to address them separately. There are two problems here. First, some traditional methods such as geographically weighted regression do the same thing. Some traditional machine learning methods such as RF and ANN can also do this job. Second, black-box models make data production much easier for researchers. However, this can sometimes block understanding of the world. Therefore, I recommend the authors revise some relevant descriptions in the manuscript.

There is no need to separate SPAEF from the other metrics. SPAEF and similar metrics are widely used in many studies. Please merge 2.2.2.1 and 2.2.2.2.
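For context, SPAEF (SPAtial EFficiency) is commonly defined in the hydrological literature as 1 − sqrt((α − 1)² + (β − 1)² + (γ − 1)²), where α is the Pearson correlation, β the ratio of coefficients of variation, and γ the histogram overlap of the z-scored fields. The sketch below follows that common definition; whether the manuscript uses exactly this form is an assumption, and the bin count and sample values are hypothetical.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _std(x):
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def _histogram(values, lo, hi, bins):
    """Count values in `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def spaef(sim, obs, bins=10):
    """SPAEF of a simulated field against an observed field (flattened lists).

    Assumes both fields have non-zero mean and non-zero spread; for
    temperature, kelvin avoids means close to zero.
    """
    mean_s, mean_o = _mean(sim), _mean(obs)
    std_s, std_o = _std(sim), _std(obs)
    # alpha: Pearson correlation between the two fields
    alpha = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / (
        len(obs) * std_s * std_o
    )
    # beta: ratio of the coefficients of variation
    beta = (std_s / mean_s) / (std_o / mean_o)
    # gamma: overlap of the histograms of the z-scored fields
    z_s = [(s - mean_s) / std_s for s in sim]
    z_o = [(o - mean_o) / std_o for o in obs]
    lo, hi = min(z_s + z_o), max(z_s + z_o)
    hist_s = _histogram(z_s, lo, hi, bins)
    hist_o = _histogram(z_o, lo, hi, bins)
    gamma = sum(min(a, b) for a, b in zip(hist_s, hist_o)) / sum(hist_o)
    return 1 - math.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


# Hypothetical temperature fields in kelvin, flattened to 1-D lists.
obs_field = [280.0, 285.0, 290.0, 295.0, 300.0]
sim_field = [v + 2.0 for v in obs_field]  # same pattern with a warm bias

print(spaef(obs_field, obs_field), spaef(sim_field, obs_field))
```

A perfect match scores 1; the biased field scores below 1 because the offset changes the coefficient of variation even though the spatial pattern is identical.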

How is the 0.5-degree merged data produced? I did not see a detailed description of how the DL models achieve this.

Please unify the usage of air temperature and surface temperature in this manuscript. They can be confusing to some readers. It is better to always use air temperature throughout the manuscript.

The authors claim the resolution of most ESMs is close to 2 degrees. The word “most” is inappropriate to some extent, considering that some products in Table 1 have higher resolution. It is better to say that “most ESMs have a low resolution” than “most ESMs have a resolution close to 2 degree”.
Citation: https://doi.org/10.5194/essd-2021-418-RC1
RC2:
'Comment on essd-2021-418', Anonymous Referee #2, 10 Feb 2022
General comments

This manuscript uses five deep-learning (DL) methods to downscale surface air temperature (SAT) simulated by 31 ESMs from a coarse spatial resolution (~2 degrees) to a higher spatial resolution of 0.5 degrees. However, the work is not innovative, and no important or robust findings were obtained. The authors mechanically downscaled surface temperatures using five different DL methods, which have been widely applied in this field. The results do not convince me, since this method depends heavily on the training data, including sample numbers, spatial and temporal scales, etc., as presented in Table 2. CRU TS, the only observational training data, also has large uncertainties, since it is derived from unevenly distributed stations; this may deliver wrong signals to the DL methods, especially in areas of complex terrain. Furthermore, SAT is highly dependent on local climate conditions, terrain factors, and large-scale atmospheric and local circulation. No such physically based ancillary data were used in this study, which limits further applications of the produced 0.5-degree data, especially in mountain regions. Therefore, I cannot recommend publishing this manuscript in ESSD at the current stage.

Specific comments

In the Abstract, the authors claim “The SR algorithms are designed to enhance image quality and outperform much better than the traditional methods.” The authors did not use any traditional methods, so this conclusion is not supported by evidence. Furthermore, what does “traditional methods” mean?

The authors concluded a ‘tertiary class echelon’ condition based on MAE. Can this conclusion be supported by RMSE or R, as both are used to assess the errors in this manuscript?

Some abbreviations are not defined. What is ECM in the Introduction?

The authors refer to “traditional SR methods”. What are traditional SR methods? What is the difference between traditional and non-traditional methods?

In Section 2.1.1, 80% of CRU TS (1901–1992) is assigned as training data and the remaining 20% (1993–2014) as validation data. This is not appropriate, because the short validation period may be strongly influenced by climate variability.
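The chronological split described in this comment can be sketched as follows. The function name is hypothetical, and rounding the training share up to whole years is an assumption chosen because it reproduces the 1901–1992 and 1993–2014 boundaries quoted above; the manuscript's actual splitting code is not shown on this page.

```python
import math


def chronological_split(years, train_fraction=0.8):
    """Split a list of years into an early training block and a late validation block.

    The training size is rounded up to a whole number of years (an assumption).
    """
    ordered = sorted(years)
    n_train = math.ceil(len(ordered) * train_fraction)
    return ordered[:n_train], ordered[n_train:]


# CRU TS period referred to in the comment: 1901 to 2014 inclusive.
all_years = list(range(1901, 2015))
train_years, val_years = chronological_split(all_years)

print(train_years[0], train_years[-1], len(train_years))
print(val_years[0], val_years[-1], len(val_years))
```

Because the validation block is a single contiguous 22-year window at the end of the record, its statistics reflect only that period's variability, which is the concern the referee raises.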

Interpolating all the ESM outputs to 2 degrees may introduce new errors. For this reason, the poor performance of the downscaled 0.5-degree data may originate from this step.

Regarding the application of the five DL methods: how many parameters do they have, and how were these parameters tuned? More information should be given on the methods.
Citation: https://doi.org/10.5194/essd-2021-418-RC2

Supplement

https://doi.org/10.5194/essd-2021-418-supplement

Data sets

Deep-Learning-Based Harmonization and Super-Resolution of Near-Surface Air Temperature from CMIP6 Models (1850–2100). Xikun Wei, Guojie Wang, Donghan Feng, Zheng Duan, Daniel Fiifi Tawia Hagan, Liangliang Tao, Lijuan Miao, Buda Su, and Tong Jiang. https://doi.org/10.5281/zenodo.5746632

Short summary

In this study, we use a deep learning (DL) method to generate temperature data for the global land area (except Antarctica) at a higher spatial resolution (0.5 degrees) based on 31 different CMIP6 Earth system models (ESMs). Our method can perform bias correction, spatial downscaling and data merging simultaneously. The merged data are of remarkably better quality than the individual ESMs in both the spatial and temporal dimensions.

