the Creative Commons Attribution 4.0 License.
An AI-Driven Reconstruction of Global Surface Temperature with Emphasis on Refining the Antarctic Record
Abstract. Accurate estimates of long-term surface temperature (ST) changes are fundamental not only for assessing observed warming, but also for improving the reliability of future climate projections. However, substantial missing information in global ST datasets remains a major source of uncertainty in estimating global or regional temperature changes. Recent advances in artificial intelligence (AI) have promoted the effective application of deep learning approaches, such as image inpainting and transfer learning, in reconstructing incomplete geophysical datasets. In this study, partial convolutional neural network (PConv) models were trained using the 20CR reanalysis data and CMIP6 climate model outputs as training samples, with the aim of achieving a proper reconstruction of the global surface temperature dataset. To address differences among existing sea surface temperature (SST) datasets, we reconstruct global monthly ST fields since 1850 by merging the China global Land Surface Air Temperature (C-LSAT2.1) dataset with the Extended Reconstructed Sea Surface Temperature (ERSSTv6) dataset and the Met Office Hadley Centre's sea surface temperature (HadSST4) dataset, respectively. Although both reconstructions reliably reproduce large-scale spatial patterns and long-term variations, the merge of C-LSAT2.1 with HadSST4 exhibits greater physical consistency and is therefore adopted as our preferred reconstruction. In particular, validation against station observations indicates that the reconstructions perform well over Antarctica after 1961, where observational coverage is extremely sparse. Based on this framework, we developed the China global Artificial Intelligence Reconstructed Surface Temperature20CR/CMIP6 (C-AIRSTR/M) datasets, providing spatially complete global monthly ST anomaly reconstructions since 1850 with a spatial resolution of 5° × 2.5°.
These datasets offer improved support for extending long-term climate records and for applications in polar climate assessment, as well as in climate monitoring, detection, and attribution studies. The C-AIRSTR/M datasets can be downloaded at https://doi.org/10.6084/m9.figshare.30663797.v1 (Ouyang et al., 2025). They are also available from http://www.gwpu.net/en/h-col-103.html (last access: 21 November 2025).
Competing interests: At least one of the (co-)authors is a member of the editorial board of Earth System Science Data.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: open (until 22 Apr 2026)
CC1: 'Comment on essd-2025-717', David Bromwich, 24 Jan 2026
AC1: 'Reply on CC1', Ouyang Chenxi, 02 Feb 2026
Dear Professor David Bromwich,
Thank you very much for your interest in our study, for carefully reviewing our manuscript, and for your important suggestions regarding the station records in Fig. S1 and the explanation of the anomalous temperature trends in ERA5 over Antarctica.
The early starting dates in Fig. S1 arise from the way the station time series were constructed for the purpose of defining the climatological reference period, rather than from the use of actual observations prior to station deployment. Specifically, part of the station data used in this study was taken from the GHCNm dataset, including both QCF and QFE data. The QFE records are statistically infilled using the Pairwise Homogenization Algorithm (PHA), and many stations therefore provide continuous values over the 1961–2010 period (Williams et al., 2012).
We adopted GHCNm data because surface air temperature observations in Antarctica are extremely sparse, particularly before the satellite era. To maximize spatial representativeness, we deliberately incorporated all available homogenized and reconstructed information, rather than relying only on the limited raw in situ observations.
Our intention in using these records was to maximize station coverage when calculating the climatological mean (1961–1990) and to provide the reconstruction model with more spatially complete initial data. Consequently, Fig. S1 displays the full station series as used in the reconstruction framework, including statistically reconstructed (QFE) values, which explains why many station series appear to start around 1960.
We would like to emphasize that these early segments do not represent direct in situ observations before the actual station installation dates, but rather statistically reconstructed values. They were included deliberately as part of the input data to improve spatial representativeness, given that the reconstruction performance depends strongly on the availability of such initial fields.
Fig. S1, as currently presented, may be misleading in this respect, and we thank you for pointing this out. In the revised version, we will clarify the data sources and explicitly distinguish between actual observations and statistically reconstructed values in both the figure caption and the text.
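The distinction between observed and statistically infilled values can be illustrated with a minimal sketch. It assumes annual values, a boolean `observed` flag per year, and a hypothetical station installed in 1980; the function name, the synthetic temperatures, and the flag layout are illustrative assumptions, not the GHCNm file format or the processing actually used in the manuscript.

```python
import numpy as np

def baseline_anomalies(years, temps, observed, base=(1961, 1990)):
    """Return anomalies relative to the base-period mean, plus the
    fraction of base-period values that are real observations."""
    years = np.asarray(years)
    temps = np.asarray(temps, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    in_base = (years >= base[0]) & (years <= base[1])
    climatology = np.nanmean(temps[in_base])      # 1961-1990 mean
    observed_fraction = observed[in_base].mean()  # share of true observations
    return temps - climatology, observed_fraction

# Hypothetical station installed in 1980: earlier values are infilled.
years = np.arange(1961, 2011)
temps = np.full(years.size, -50.0)   # synthetic infilled values (degC)
temps[years >= 1980] = -49.0         # synthetic observed values (degC)
observed = years >= 1980

anoms, frac = baseline_anomalies(years, temps, observed)
print(f"observed share of the 1961-1990 baseline: {frac:.2f}")
```

In this synthetic case only 11 of the 30 baseline years are observed, so a figure or caption built on such a flag could report how much of each station's climatology rests on infilled values.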
We thank you for drawing our attention to the recent study by Bromwich et al. (2024), which is highly relevant to the interpretation of the ERA5 Antarctic temperature trends discussed in our manuscript. This study provides important evidence that the pronounced warming in ERA5 over Antarctica prior to 1979 is likely exaggerated, largely due to a cold bias in the ECMWF forecast model that could not be adequately constrained by the very limited observational coverage and early satellite data assimilation over the Southern Ocean. After 1979, the anomalous warming trends in ERA5 along the coastal regions near 0°E, the Ronne Ice Shelf, and Marie Byrd Land further amplify its overall temperature trend.
We agree that these findings provide valuable context for understanding the limitations of ERA5 in the Antarctic region. In the revised manuscript, we will cite Bromwich et al. (2024) and expand the discussion to clarify that ERA5 temperature trends over Antarctica—particularly prior to the satellite era and in certain coastal sectors—should be interpreted with caution.
We sincerely thank you again for these constructive suggestions, which have greatly contributed to improving the quality of our study.
Sincerely,
Chenxi Ouyang Feb. 2, 2026
Reference:
Bromwich, D., Ensign, A., Wang, S., and Zou, X.: Major Artifacts in ERA5 2-m Air Temperature Trends Over Antarctica Prior to and During the Modern Satellite Era, Geophys. Res. Lett., 51(21), https://doi.org/10.1029/2024GL111907, 2024.
Williams, C. N., Menne, M. J., and Lawrimore, J. H.: Modifications to the Pairwise Homogeneity Adjustment (PHA) software to address coding errors and improve run-time efficiency, NOAA National Climatic Data Center Tech. Rep. GHCNM-12-02, 2012.
Citation: https://doi.org/10.5194/essd-2025-717-AC1
CC2: 'Comment on essd-2025-717', David Bromwich, 01 Feb 2026
Continuing my commentary of Jan. 24.
A reconstruction such as yours depends on the data fed into it.
Fig. S1 is the particular concern. A quick count indicates that at least half of the station observations are wrongly depicted as starting around 1960. To give some related examples:
Concordia staffed station started in 2005. Dome C AWS started in 1980. Dome C II AWS started in ~1995.
Early AWS on the Ross Ice Shelf (Elaine, Lettau, Gill, Schwerdtfeger) did not start until ~1985.
I trust that this situation is just a plotting mistake and these records as depicted were not used to reconstruct the near surface temperatures.
Citation: https://doi.org/10.5194/essd-2025-717-CC2
AC2: 'Reply on CC2', Ouyang Chenxi, 02 Feb 2026
Dear Professor David Bromwich,
We sincerely thank you for your additional comments and for highlighting this important issue concerning Fig. S1. We also greatly appreciate the opportunity to clarify this issue.
The issue regarding Fig. S1 has been explained in detail in our response to CC1.
Thank you again for your constructive comment, which helps us improve the clarity and transparency of our work.
Sincerely,
Chenxi Ouyang Feb. 2, 2026
Citation: https://doi.org/10.5194/essd-2025-717-AC2
RC1: 'Comment on essd-2025-717', Anonymous Referee #1, 22 Mar 2026
In this manuscript, the authors apply deep learning to reconstruct historical global surface temperature fields, with a specific focus on Antarctic climate dynamics. While this topic is relevant to research on climate change impacts and attribution, I have several concerns regarding the methodology, the robustness of the validation, and the demonstrated utility and uniqueness of the final products. Additionally, various statements throughout the text require further clarification. Please find my detailed comments below.
1. To my knowledge, the reconstruction framework based on a partial convolutional neural network (PConv) was first applied to historical climate field reconstruction by Kadow et al. (2020). The authors follow a nearly identical methodological approach. The authors must explicitly acknowledge Kadow et al. (2020) in the methodology section. This will help readers accurately contextualize the foundational literature and better understand the specific novel contributions of this work.
2. Following the previous point, the authors should at least use the latest dataset built by Kadow et al. (2020) as a benchmark to evaluate the reliability and added value of the newly proposed dataset. Also, regarding the uniqueness of the proposed dataset, it would be interesting and necessary to see how the latest Kadow et al. (2020) product performs in the Antarctic region compared to this study.
3. The evaluation and validation sections need to be strengthened. Currently, it is difficult to distinguish the unique strengths and irreplaceability of these new datasets from those of existing reference datasets solely on the basis of global/regional temperature trends. Providing a clearer demonstration of where and why this dataset outperforms existing ones would greatly improve the manuscript.
4. I recommend moderating some of the claims regarding the capabilities of AI methods. While powerful, some statements about AI's strengths and the overall contribution of the study come across as overconfident and could be nuanced to reflect limitations of methods and datasets.
5. The study provides two reconstruction products, but it is not entirely clear which one is considered superior or more reliable. A more solid comparison between the strengths and characteristics of the two products would be helpful. Furthermore, providing explicit guidance on which dataset users should select for specific use cases would add practical value to the paper.
Specific Comments:
L47-50: The authors mention that the "propagation of observational errors", "inconsistencies among reanalysis products", and "harsh environmental conditions" affect existing statistical reconstruction methods (PCA, EOT, DINEOF). While true, it is worth noting that AI-based methods often struggle with these exact same regional challenges. I suggest softening the critique of traditional methods here.
L58: Please clarify what is specifically meant by "extreme geographical conditions" in this context.
L98-99: Please remove the claim regarding interpretability. PConv models are generally not recognized for their physical interpretability.
L104-109: The introduction of the C-MST3.0 dataset is somewhat confusing, given that the reconstruction primarily relies on C-LSAT2.1. Consider revising this section to streamline the data origins.
L136: Please elaborate on what the "influence by features at the edges of missing data" specifically entails in this context.
Figure 1: This schematic appears to share strong similarities with the framework figure in Kadow et al. (2020). Please ensure appropriate citations or copyright permissions are included if it was adapted. Additionally, the numbers at the bottom of most boxes are somewhat confusing and could benefit from an explanation in the figure caption.
L149-150: Please specify exactly what is meant by "the AI reconstruction follows a similar approach." Similar to what?
L168: Provide more details on the regridding process. What exact resampling method was used?
L205-211: The authors claim that the AI reconstruction performance under the Merge-E mask is better than that of Merge-H, attributing the smoother reconstructed fields in Antarctica to the AI being "influenced to some extent by the colour, texture, and style features at the edges of missing regions." This conclusion is debatable. Because ERSST is a spatially complete, interpolated dataset, it lacks the vast oceanic gaps present in datasets like HadSST. Therefore, the "smoother" performance observed in Merge-E is likely not due to PConv's enhanced inpainting capabilities, but rather to the statistical interpolation already performed within the ERSST dataset itself. Please revisit and clarify this discussion.
L215-217: Highlighting the single-year result of the 1877 El Niño is a good visual check, but it is not sufficient to claim "robust performance and spatial consistency." Moreover, to my knowledge, the infilled HadCRUT5 dataset can also represent this 1877 event. Additional spatial validation is needed to support this claim.
L224-253: The detailed analysis comparing Merge-H and Merge-E feels somewhat redundant. Since ERSST is already reconstructed and spatially complete, applying AI reconstruction to it does not add much novel insight. The authors might significantly condense this section to simply justify the choice of Merge-H rather than present it as a core comparative result.
L253-254: It is difficult to visually distinguish among the various products. Consider analyzing and plotting the residuals (differences) instead, which would make the variations much clearer to the reader.
L255-256: Only one out of the four AI reconstructions shows a GWL > 1.5. Given this, the conclusion drawn here feels too confident. It would be appropriate to discuss the spread/uncertainty among the models.
L259-270: Inferring historical climate dynamics (i.e., ENSO) purely from the Global Mean Temperature (GMT) time series in Fig. 3 seems to be an over-interpretation. The claim that this "further demonstrating the robustness and reliability of the AI framework" feels overstated.
Tables 2 & 3: Please add the other reference datasets to these tables for comparison. Without them, it is difficult to objectively assess the performance and reliability of the AI reconstructions.
L285-286: It is difficult to visually confirm in Figure 4 that "the differences between different reconstruction results decrease markedly." Consider specifying timeframes and supporting them with a quantifiable metric.
Figure 5: Please improve the visual clarity of this figure. The lines overlap significantly, making it difficult to distinguish the differences between the datasets.
L361-362: There appears to be a contradiction here: the text states that CMIP6-AI performs better than 20CR-AI, but Figure S6 seems to show the reverse. Please verify and clarify.
L368-371: Please rephrase this statement for clarity. It is not entirely clear how the reference datasets are being quantitatively or qualitatively compared with the station data.
L389-390: Please specify the exact statistical method used to measure the significance here.
L423-424: The conclusion that the products show high spatiotemporal consistency is not fully supported because it has not been benchmarked against other references or methods. Basing this conclusion solely on GMT and regional trends is clearly insufficient.
Section 5 (Limitations and future perspectives): As the ESSD journal is data-focused, this section should focus specifically on data limitations, uncertainties, and the irreplaceability of the generated datasets. Broad discussions of the future of AI methodologies are somewhat outside the journal's primary scope and should be minimized.
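The residual analysis suggested in the comment on L253-254 could be sketched as follows. This is only an illustration under stated assumptions: the product names, the reference series, and the summary statistics (mean bias and root-mean-square difference) are hypothetical choices, not values or methods from the manuscript.

```python
import numpy as np

def residuals_vs_reference(products, reference):
    """Difference of each product from a common reference series,
    with mean bias and root-mean-square difference (RMSD)."""
    ref = np.asarray(reference, dtype=float)
    summary = {}
    for name, series in products.items():
        diff = np.asarray(series, dtype=float) - ref
        summary[name] = {
            "residual": diff,
            "bias": float(np.nanmean(diff)),
            "rmsd": float(np.sqrt(np.nanmean(diff ** 2))),
        }
    return summary

# Synthetic annual anomalies (degC); both products are hypothetical.
reference = np.array([0.0, 0.1, 0.2, 0.3])
products = {
    "product_a": reference + 0.05,                              # constant offset
    "product_b": reference + np.array([0.1, -0.1, 0.1, -0.1]),  # zero-mean noise
}
stats = residuals_vs_reference(products, reference)
for name, s in stats.items():
    print(name, round(s["bias"], 3), round(s["rmsd"], 3))
```

Plotting the `residual` arrays instead of the overlapping raw series separates a constant offset (product_a) from year-to-year scatter (product_b), which the raw curves tend to hide.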
Overall, while the core concept of this study is compelling and timely, the manuscript's current presentation, the robustness of its validation methods, and the understanding of the underlying datasets do not yet meet the necessary threshold for publication.
Citation: https://doi.org/10.5194/essd-2025-717-RC1
RC2: 'Comment on essd-2025-717', Anonymous Referee #2, 22 Mar 2026
This manuscript presents a deep learning approach based on partial convolution, trained using 20CR reanalysis data and CMIP6 historical simulations, to reconstruct global monthly gridded surface air temperature from 1850 to 2024, with a particular improvement over the Antarctic region. Overall, the manuscript is well-structured, the methodology is clearly described, and the design of the reconstruction framework as well as the selection of sea surface temperature datasets are appropriate. The reconstructed results are validated against multiple observational datasets and Antarctic station records, demonstrating good consistency. The resulting dataset is valuable for studies of global temperature variability, particularly for Antarctic climate analysis and global temperature change assessments, and is generally well aligned with the scope of ESSD as a data journal.
It is also noteworthy that Dr. David Bromwich, an expert in Antarctic temperature research, pointed out during the public discussion that the Antarctic temperature reconstruction presented in this study shows certain advantages compared to state-of-the-art reanalysis datasets (e.g., ERA5).
I have no objection to the publication of this manuscript after minor revisions. The following comments mainly concern aspects such as figure presentation and result interpretation, which should be further improved by the authors during revision:
- In Figs. 2, 4, and 5, the font sizes of the axis labels and tick marks are relatively small and difficult to read. It is recommended to increase the font size to improve readability. In addition, the legend in Fig. 6 appears somewhat blurred; improving the resolution of this figure would help enhance clarity.
- Figure 2 presents reconstruction results for four representative months, but the manuscript does not explain the criteria used to select these months. It would be helpful if the authors could briefly clarify whether these months represent different observational coverage periods, climate conditions, or other considerations.
- Figure 3 shows the global annual mean temperature time series. Since the reconstructed series is visually very similar to the CMST3.0-Imax dataset, the contrast between the curves could be improved. For example, the CMST3.0-Imax series could be plotted with a thicker solid line to make the comparison clearer.
- In Table 1, the warming trends for Merge-H 20CR-AI and Merge-H CMIP6-AI during 1850–2024 are very similar (both 0.064 ± 0.006 °C per decade), yet the corresponding global warming levels (GWL) differ substantially (1.45 and 1.52 °C). However, lines 255–256 state that “The GWL reconstructed by CMIP6-AI reaches 1.52 °C, which is slightly higher than the 1.5 °C estimated from 20CR-AI.” The authors may wish to check whether the GWL value reported in Table 1 is correct or if there is a discrepancy between the table and the text.
- The manuscript notes that observational records in Antarctica are sparse prior to 1961, making direct validation difficult. It would be helpful if the authors could briefly discuss the limitations or uncertainties associated with the reconstructed temperature fields during this early period, so that potential users of the dataset are aware of these limitations.
- In line 412, the manuscript refers to “optimal interpolation, EOF, or spatially weighted averaging”. The term “EOF” in this context may need clarification. If the authors intend to refer to the empirical orthogonal teleconnection method, it may be more appropriate to use “empirical orthogonal teleconnection (EOT)”.
- There are a few minor language issues throughout the manuscript. For example, in line 10, the comma in “substantial missing information in global ST datasets, remains a major source of uncertainty” should be removed. In line 63, “observation-based regional temperature fields often hard to capture” could be revised to “often fail to capture”. A careful language editing pass would further improve the readability of the manuscript.
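On the Table 1 comment above, a short sketch may help show why identical linear trends do not fix the GWL. A trend of 0.064 °C per decade over the 17.5 decades of 1850–2024 implies a linear change of about 1.12 °C, below both reported GWL values, so the GWL is presumably computed differently. The sketch assumes, purely for illustration, that GWL is a recent-period mean (2015–2024) minus an 1850–1900 mean; the manuscript's actual definition is not given in this discussion, and both series are synthetic.

```python
import numpy as np

def trend_per_decade(years, anomalies):
    """Ordinary least-squares trend in degC per decade."""
    x = np.asarray(years, dtype=float)
    x = x - x.mean()  # centre the years for numerical stability
    slope = np.polyfit(x, np.asarray(anomalies, dtype=float), 1)[0]
    return 10.0 * slope

def warming_level(years, anomalies, base=(1850, 1900), recent=(2015, 2024)):
    """Recent-period mean minus base-period mean (assumed GWL definition)."""
    years = np.asarray(years)
    a = np.asarray(anomalies, dtype=float)
    in_base = (years >= base[0]) & (years <= base[1])
    in_recent = (years >= recent[0]) & (years <= recent[1])
    return a[in_recent].mean() - a[in_base].mean()

years = np.arange(1850, 2025)
series_a = 0.0064 * (years - 1850)   # purely linear synthetic series

# Add a late step with its own linear fit removed, so the OLS trend
# of series_b is unchanged while its recent mean is shifted.
step = 0.1 * (years >= 2000)
xc = years - years.mean()
detrended_step = step - np.polyval(np.polyfit(xc, step, 1), xc)
series_b = series_a + detrended_step

trend_a = trend_per_decade(years, series_a)
trend_b = trend_per_decade(years, series_b)
gwl_a = warming_level(years, series_a)
gwl_b = warming_level(years, series_b)
print(f"trends: {trend_a:.3f} vs {trend_b:.3f} degC per decade")
print(f"GWL difference: {gwl_b - gwl_a:.3f} degC")
```

Under these assumptions the two series share the same trend but differ in warming level, so matching trends in Table 1 would not by themselves rule out different GWL values; the mismatch between the table and the text still needs checking by the authors.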
Citation: https://doi.org/10.5194/essd-2025-717-RC2
Data sets
China global Artificial Intelligence Reconstructed Surface Temperature20CR/CMIP6 (C-AIRSTR/M) Chenxi Ouyang https://doi.org/10.6084/m9.figshare.30663797.v1
Model code and software
climatereconstructionAI Naoto Inoue et al. https://github.com/FREVA-CLINT/climatereconstructionAI
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 332 | 240 | 34 | 606 | 104 | 20 | 44 |
I enjoyed reading your analysis and seeing the results of AI-based 2-m temperature reconstruction, especially for Antarctica, which are close to the results of Bromwich et al. (2025a). This must be because the actual observations and their spatial and temporal variations form the basis of the reconstructions.
There has to be a strong trend in time in the availability of observations away from the Antarctic Peninsula that must strongly influence your results. AWS started to be deployed in Antarctica in 1980. The annual time series plots in Fig. S1 have many errors. All need rechecking. Bonaparte Point, Briana, Cape Bird, Cape Phillips, Cape Ross, Concordia, D-47, Dome C, etc. do not start around 1960.
Specifically regarding your Antarctic reconstruction results, the paper by Bromwich et al. (2024, https://doi.org/10.1029/2024GL111907) is relevant to your discussion of ERA5 results shown in Figs. 6 and 7. That manuscript makes the case that ERA5 warming for Antarctica prior to 1979 is much too rapid because of a cold bias in the ECMWF model that assimilation of limited satellite data over the Southern Ocean cannot overcome. Further, even after 1979, anomalous warming occurs in ERA5 along the coast near 0°E and the Filchner Ice Shelf and in Marie Byrd Land (near Siple Coast). These anomalies can be clearly seen in Fig. 7(h) and significantly contribute to the strong warming in ERA5 1979-2024 shown in Fig. 6(c).
David Bromwich Jan. 23, 2026