This work is distributed under the Creative Commons Attribution 4.0 License.
Deep-Learning-Based Harmonization and Super-Resolution of Near-Surface Air Temperature from CMIP6 Models (1850–2100)
Abstract. Future global temperature change would have significant effects on society and ecosystems. Earth system models (ESMs) are the primary tools for exploring future climate change. However, ESMs still have large uncertainties and often run at a coarse spatial resolution (about 2° for the majority of ESMs). Accurate temperature data at high spatial resolution are needed to improve our understanding of temperature variation and for many applications. We apply deep-learning (DL) super-resolution (SR) methods from computer vision in a novel way to merge data from 31 ESMs, and the proposed method performs data merging, bias correction and spatial downscaling simultaneously. SR algorithms are designed to enhance image quality and perform much better than traditional methods. The CRU TS (Climatic Research Unit gridded Time Series) dataset is used as the reference in the model training process. To find a suitable DL method for our work, we choose five SR models with different architectures. These models are compared using multiple evaluation metrics (mean squared error (MSE), mean absolute error (MAE) and Pearson correlation coefficient (R)), and the optimal model is selected and used to merge monthly historical data for 1850–1900 and monthly future scenario data (SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5) for 2015–2100 at a high spatial resolution of 0.5°. Results show that the merged data perform considerably better than any individual ESM and than the ensemble mean (EM) of all ESMs, in both spatial and temporal terms. The MAE is greatly reduced, and it grows progressively larger with latitude in the Northern Hemisphere, presenting a stepwise ‘tertiary class echelon’ pattern. The merged product also performs very well where the observed time series is smooth with few fluctuations.
Additionally, this work demonstrates that the DL model can be transferred to perform data merging, bias correction and spatial downscaling successfully when enough training data are available. Data can be accessed at https://doi.org/10.5281/zenodo.5746632 (Wei et al., 2021).
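The abstract scores the candidate models with MSE, MAE and Pearson R and reports MAE growing with latitude in the Northern Hemisphere. The NumPy sketch below shows one way such scores could be computed on gridded fields. It assumes land-only fields with NaN over ocean and user-chosen latitude bands; `masked_metrics` and `zonal_mae` are hypothetical names, not the authors' code, and no area (cos-latitude) weighting is applied.

```python
import numpy as np


def masked_metrics(pred, obs):
    """MSE, MAE and Pearson R over cells where both fields are finite.

    Assumption: missing/ocean cells are NaN; no area weighting is applied.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    valid = np.isfinite(pred) & np.isfinite(obs)
    p, o = pred[valid], obs[valid]
    err = p - o
    return {
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "R": float(np.corrcoef(p, o)[0, 1]),
    }


def zonal_mae(pred, obs, lats, band_edges):
    """MAE per latitude band [lo, hi) for fields shaped (..., nlat, nlon)."""
    abs_err = np.abs(np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float))
    lats = np.asarray(lats, dtype=float)
    out = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        rows = (lats >= lo) & (lats < hi)  # latitude rows inside this band
        out.append(float(np.nanmean(abs_err[..., rows, :])))
    return out


# Tiny synthetic example: 2 latitude rows x 2 longitude columns
pred = np.array([[1.0, 2.0], [3.0, np.nan]])
obs = np.array([[1.0, 1.0], [5.0, 4.0]])
print(masked_metrics(pred, obs))
print(zonal_mae(pred, obs, lats=[10.0, 60.0], band_edges=[0, 30, 90]))  # -> [0.5, 2.0]
```

Because cells are not area-weighted, high-latitude cells count as much as tropical ones; weighting by cos(latitude) would be a natural refinement when comparing latitude bands.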
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on essd-2021-418', Anonymous Referee #1, 07 Feb 2022
The manuscript uses five deep learning models to train CMIP6 air temperature against CRU TS gridded air temperature. The statistical evaluation shows that CMIP6 air temperature is improved. However, I have some concerns about the methodology. Machine learning models have been widely applied in many fields and can easily beat traditional methods and raw data inputs in terms of statistical accuracy. I am not surprised that the DL-based CMIP6 temperature is closer to CRU TS than the raw CMIP6 temperature, because the raw models are not designed to approximate CRU TS. But the DL model design in this study has some problems, which makes me worry that the results are right for the wrong reasons. Simpler methods such as random forest or even linear regression could probably achieve the same improvement. Besides, the writing of the manuscript is a big problem; the manuscript needs extensive revision to be publishable.
- The language of the manuscript needs substantial improvement. Reading the manuscript is a challenge for me, and it needs thorough language revision to meet the standard of a publishable paper. I tried to list some examples of grammar errors, format problems, and awkward expressions, but could not, because there are too many problems.
- Temperature data from all models are interpolated to a 2° resolution using bilinear interpolation, which I consider incorrect. First, since air temperature is strongly related to elevation, temperature downscaling is often achieved using the temperature lapse rate method. You cannot simply use the bilinear method as for other variables such as precipitation; this can cause notable bias in complex terrain. Second, interpolating the model data causes information loss. In other words, because of my first point, the interpolated air temperature is worse than the raw data. Downscaling the temperature from 2° to 0.5° can show some improvement, but it is difficult to account for how this information loss contributes to the measured improvement. Therefore, the authors should use a more appropriate temperature downscaling method.
- CRU TS uses a limited number of stations to estimate air temperature. Most regions of the world have only sparse stations, and thus the quality of CRU TS is low there. Different observation-based datasets can show large differences, particularly in regions with complex topography and few stations. Sometimes, model outputs could be more reliable than interpolation-based datasets such as CRU TS. Training the model in those regions using CRU TS as the truth does not sound reliable to me.
- The study divides the world into five parts to train the model (Figure 2). However, the division is problematic. EUR-AF contains Europe, Africa, and part of Asia, three sub-regions that differ greatly in climate, area, and station density. Europe has dense stations, so CRU TS has good quality there; in contrast, Africa has sparse stations, so CRU TS has low quality. If you train them as a whole, the quality in Europe could be degraded because most training samples come from Africa and part of Asia, which are not reliable. Moreover, each part covers a broad domain (especially EUR-AF) with quite different climate conditions and temperature regimes, and a model trained on such a large domain cannot account for those complexities. That is why I say you may get the right results for the wrong reasons.
- The introduction is overlong and needs reorganization. Much of it introduces background, such as climate change and data, that is widely known and not closely related to this study. In contrast, deep learning techniques have recently been widely applied in Earth science and cover many variables, including air temperature; a literature review of those studies is important but currently insufficient.
- The introduction to the five DL models in Section 2.2.1 is too abstract. The authors should focus more on the implementation of the DL models, such as model structures, parameters, and training and testing strategies. In short, the methods section should enable readers to reproduce the work. Some content in “3 Results and discussion” could be moved to the methods section.
- It would probably be better to adopt a traditional method as a benchmark in this study. The work shows results from five DL models, but readers can hardly judge whether the improvement is good enough without a comparison with a widely used method.
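Referee #1 asks for a traditional benchmark and suggests that even linear regression might reach the same improvement. One minimal benchmark of that kind is a per-grid-cell linear bias correction of the ESM ensemble mean, fitted on the training period and scored on the held-out period with the same metrics as the DL models. The sketch assumes the ensemble mean has already been regridded to the 0.5° CRU TS grid and uses synthetic data; `fit_cellwise_linear` and `apply_cellwise_linear` are hypothetical names, not part of the preprint.

```python
import numpy as np


def fit_cellwise_linear(x_train, y_train):
    """Fit y = a + b * x independently at every grid cell.

    x_train, y_train: arrays shaped (time, nlat, nlon); x is assumed to be the
    ESM ensemble mean already regridded to the observation grid.
    Returns intercept a and slope b, each shaped (nlat, nlon).
    """
    x_mean = x_train.mean(axis=0)
    y_mean = y_train.mean(axis=0)
    dx = x_train - x_mean
    dy = y_train - y_mean
    var_x = (dx ** 2).sum(axis=0)
    safe_var = np.where(var_x > 0, var_x, 1.0)
    # Constant predictor at a cell: fall back to slope 1 (pure mean-bias shift)
    b = np.where(var_x > 0, (dx * dy).sum(axis=0) / safe_var, 1.0)
    a = y_mean - b * x_mean
    return a, b


def apply_cellwise_linear(x, a, b):
    """Apply the fitted per-cell correction to fields shaped (time, nlat, nlon)."""
    return a + b * x


# Hypothetical usage with a chronological split (train on early months,
# evaluate on later ones), mirroring a 1901-1992 / 1993-2014 style split.
rng = np.random.default_rng(0)
esm = rng.normal(285.0, 5.0, size=(120, 4, 5))           # synthetic ESM mean (K)
cru = 1.5 + 0.98 * esm + rng.normal(0, 0.3, esm.shape)  # synthetic "observations"
n_train = 96
a, b = fit_cellwise_linear(esm[:n_train], cru[:n_train])
corrected = apply_cellwise_linear(esm[n_train:], a, b)
mae_corrected = np.mean(np.abs(corrected - cru[n_train:]))
mae_raw = np.mean(np.abs(esm[n_train:] - cru[n_train:]))
print(mae_corrected, mae_raw)
```

A random forest or geographically weighted regression baseline would follow the same pattern: fit on the training years, then score on the held-out years.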
Some minor comments:
- The manuscript states that, compared with traditional methods, the new DL methods can perform downscaling, bias correction, and merging as a whole, while traditional methods need to address them separately. There are two problems here. First, some traditional methods, such as geographically weighted regression, do the same thing, and some traditional machine learning methods, such as random forest (RF) and artificial neural networks (ANN), can also do this job. Second, black-box models make data production much easier for researchers, but they can sometimes hinder our understanding of the world. Therefore, I recommend that the authors revise the relevant descriptions in the manuscript.
- There is no need to separate SPAEF from the other metrics; SPAEF and similar metrics are widely used in many studies. Please merge Sections 2.2.2.1 and 2.2.2.2.
- How are the 0.5-degree merged data produced? I did not see a detailed description of how the DL models achieve this.
- Please unify the usage of “air temperature” and “surface temperature” in this manuscript, as the two terms can confuse some readers. It is better to use “air temperature” consistently throughout the manuscript.
- The authors claim that the resolution of most ESMs is close to 2 degrees. The word “most” is somewhat inappropriate, considering that some products in Table 1 have higher resolutions. It is better to say “most ESMs have a low resolution” than “most ESMs have a resolution close to 2 degrees”.
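Referee #1 mentions SPAEF alongside the other evaluation metrics. A minimal NumPy sketch of SPAEF, following its commonly used three-component definition (pattern correlation, ratio of coefficients of variation, and histogram intersection of z-scores), is given below. The bin count, the NaN masking and the function name `spaef` are assumptions for illustration, not taken from the manuscript.

```python
import numpy as np


def spaef(sim, obs, n_bins=100):
    """SPAtial EFficiency metric between two gridded fields.

    SPAEF = 1 - sqrt((alpha-1)^2 + (beta-1)^2 + (gamma-1)^2), where
      alpha: Pearson correlation of the two patterns,
      beta:  ratio of coefficients of variation (sim / obs),
      gamma: histogram intersection of the z-scored fields.
    Assumption: temperatures are in kelvin, so the means are far from zero
    and the coefficient of variation in beta is well defined.
    """
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    valid = np.isfinite(sim) & np.isfinite(obs)
    s, o = sim[valid], obs[valid]

    alpha = np.corrcoef(s, o)[0, 1]
    beta = (np.std(s) / np.mean(s)) / (np.std(o) / np.mean(o))

    # Histograms of z-scores on shared bin edges
    zs = (s - s.mean()) / s.std()
    zo = (o - o.mean()) / o.std()
    edges = np.histogram_bin_edges(np.concatenate([zs, zo]), bins=n_bins)
    hist_s, _ = np.histogram(zs, bins=edges)
    hist_o, _ = np.histogram(zo, bins=edges)
    gamma = np.minimum(hist_s, hist_o).sum() / hist_o.sum()

    return float(1.0 - np.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2))


obs = np.array([280.0, 285.0, 290.0, 295.0])
print(spaef(obs + 5.0, obs, n_bins=4))  # a uniform +5 K bias only changes beta (about 0.983)
```

SPAEF is a pattern metric: in the example a uniform 5 K bias changes only beta and barely lowers the score, so it complements rather than replaces MAE or MSE.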
Citation: https://doi.org/10.5194/essd-2021-418-RC1
RC2: 'Comment on essd-2021-418', Anonymous Referee #2, 10 Feb 2022
General comments
This manuscript uses five deep-learning (DL) methods to downscale surface air temperature (SAT) simulated by 31 ESMs from a coarse spatial resolution (~2 degrees) to a higher spatial resolution of 0.5 degrees. However, the work is not innovative, and no important or robust findings were obtained. The authors mechanically downscaled surface temperatures using five different DL methods that have already been widely applied in this field. The results do not convince me, since this approach depends heavily on the training data, including sample numbers and spatial and temporal scales, as presented in Table 2. The only observational training dataset, CRU TS, also has large uncertainties because it is derived from unevenly distributed stations, which may deliver wrong signals to the DL methods, especially in areas of complex terrain. Furthermore, SAT depends strongly on local climate conditions, terrain factors, and large-scale atmospheric and local circulation. No such physically based ancillary data were used in this study, which limits further applications of the produced 0.5-degree data, especially in mountain regions. Therefore, I cannot recommend publishing this manuscript in ESSD at the current stage.
Specific comments
- In the Abstract, the authors claim “The SR algorithms are designed to enhance image quality and outperform much better than the traditional methods.” The authors did not use any traditional methods, so this conclusion is not supported by evidence. Furthermore, what do “traditional methods” mean?
- The authors infer a ‘tertiary class echelon’ condition from MAE. Can this conclusion also be supported by RMSE or R, since both are used to assess the errors in this manuscript?
- Some abbreviations are not defined; for example, what is ECM in the Introduction?
- The authors refer to “traditional SR methods”. What are traditional SR methods, and how do they differ from non-traditional methods?
- In Section 2.1.1, 80% of the CRU TS record (1901–1992) was assigned as training data and the remaining 20% (1993–2014) as validation data. This is not appropriate, because the short validation period may be strongly influenced by climate variability.
- Interpolating all ESM outputs to 2 degrees may introduce new errors; the poor performance of the downscaled 0.5-degree data may therefore originate from this step.
- For the five DL methods applied, how many parameters are there, and how were they tuned? More information on the methods should be given.
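Both referees question the bilinear interpolation of all ESM outputs to a common 2° grid, and Referee #1 points to the lapse-rate method as the usual way to downscale air temperature. The sketch below illustrates that idea under stated assumptions: a constant lapse rate of 6.5 K/km, elevation grids available at both resolutions, and regular, strictly increasing latitude/longitude coordinates without longitude wrap-around. The names `bilinear`, `lapse_rate_downscale` and `LAPSE_RATE` are hypothetical, and this is not the pipeline used in the preprint.

```python
import numpy as np

LAPSE_RATE = 0.0065  # K per metre; assumed constant environmental lapse rate


def bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinear interpolation between regular, strictly increasing lat/lon grids.

    Targets outside the source range are clamped to the edge; longitude
    wrap-around is not handled.
    """
    src_lat, src_lon = np.asarray(src_lat, float), np.asarray(src_lon, float)
    # Fractional index of every target coordinate within the source grid
    fi = np.interp(dst_lat, src_lat, np.arange(src_lat.size))
    fj = np.interp(dst_lon, src_lon, np.arange(src_lon.size))
    i0 = np.clip(np.floor(fi).astype(int), 0, src_lat.size - 2)
    j0 = np.clip(np.floor(fj).astype(int), 0, src_lon.size - 2)
    wi = (fi - i0)[:, None]
    wj = (fj - j0)[None, :]
    return ((1 - wi) * (1 - wj) * field[np.ix_(i0, j0)]
            + (1 - wi) * wj * field[np.ix_(i0, j0 + 1)]
            + wi * (1 - wj) * field[np.ix_(i0 + 1, j0)]
            + wi * wj * field[np.ix_(i0 + 1, j0 + 1)])


def lapse_rate_downscale(t_coarse, z_coarse, z_fine, src_lat, src_lon,
                         dst_lat, dst_lon, gamma=LAPSE_RATE):
    """Downscale air temperature (K) using elevation grids z (m) at both resolutions."""
    # 1) Remove the elevation signal: reduce coarse temperature to sea level
    t_sea = t_coarse + gamma * z_coarse
    # 2) Interpolate the smoother sea-level field to the fine grid
    t_sea_fine = bilinear(t_sea, src_lat, src_lon, dst_lat, dst_lon)
    # 3) Restore the elevation signal using fine-resolution topography
    return t_sea_fine - gamma * z_fine


src = np.array([0.0, 2.0, 4.0])
z_coarse = np.array([[0.0, 500.0, 1000.0], [200.0, 1500.0, 300.0], [0.0, 0.0, 800.0]])
t_coarse = 300.0 - LAPSE_RATE * z_coarse     # flat 300 K sea-level field
dst = np.array([0.5, 1.5, 2.5, 3.5])
z_fine = np.full((4, 4), 2000.0)             # plateau resolved only at fine scale
print(lapse_rate_downscale(t_coarse, z_coarse, z_fine, src, src, dst, dst))  # ~287 K
```

A constant 6.5 K/km is only a common default; lapse rates vary by region and season, so a spatially or monthly varying rate, or elevation supplied as an extra input to the DL model (in line with Referee #2's point about terrain factors), would be a natural refinement.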
Citation: https://doi.org/10.5194/essd-2021-418-RC2
Data sets
Deep-Learning-Based Harmonization and Super-Resolution of Near-Surface Air Temperature from CMIP6 Models (1850-2100) Xikun Wei, Guojie Wang, Donghan Feng, Zheng Duan, Daniel Fiifi Tawia Hagan, Liangliang Tao, Lijuan Miao, Buda Su, Jiang Tong https://doi.org/10.5281/zenodo.5746632
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote
---|---|---|---|---|---|---
1,027 | 365 | 74 | 1,466 | 106 | 55 | 73
Xikun Wei
Guojie Wang
Donghan Feng
Zheng Duan
Daniel Fiifi Tawia Hagan
Liangliang Tao
Lijuan Miao
Buda Su
Tong Jiang