A global base temperature dataset for building energy demand modeling

He, Xiujuan; Eom, Jiyong; Yu, Sha; Liu, Shu; Xu, Wenru; Zhou, Yuyu

doi:10.5194/essd-2025-709

Preprints

28 Dec 2025

Status: a revised version of this preprint is currently under review for the journal ESSD.

Abstract. Accurate building energy demand modeling is critical to decarbonizing regional energy systems. The cooling and heating degree-day models are widely used due to their simplicity and low data requirements; however, the lack of accurate base temperature data limits their performance. In particular, the scarcity of high temporal resolution building energy demand data constrains regional-scale base temperature estimation through conventional methods such as the energy signature method and the performance line method. To address this limitation, this study develops a global regional-scale base temperature dataset based on the BiLSTM neural network framework with an attention mechanism. The dataset includes both cooling base temperature (Tcool) and heating base temperature (Theat) for each region, defined at a spatial scale equivalent to a U.S. state or a Chinese province. The BiLSTM framework demonstrates strong performance, with RMSE values of 1.39°C for training and 1.33°C for testing, and Pearson correlation coefficients of 0.84 for Tcool and 0.70 for Theat. Predicted results show that global Tcool ranges from 19–25°C and Theat from 14–18°C, consistent with physical principles. External validations using 16 independent datasets demonstrate that the predicted base temperatures significantly improve the accuracy of building energy demand modeling, reducing RMSE by 10.01% for cooling and 10.02% for heating, compared to official or empirical base temperatures. This dataset supplements sparse observational base temperature data and enhances the accuracy of building energy demand modeling, contributing to low-carbon energy system planning, broader climate impact assessment and weather-related financial applications. The proposed global Tbase dataset can be acquired from https://doi.org/10.6084/m9.figshare.30646376.v2 (He et al., 2025).
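As a minimal sketch of how a pair of base temperatures enters a degree-day model, the snippet below accumulates cooling and heating degree-days from daily mean outdoor temperatures. The function name `degree_days`, the sample temperatures, and the chosen base values (22 °C and 16 °C, picked from within the ranges reported in the abstract) are illustrative assumptions, not the authors' code or values from the dataset.

```python
from typing import Iterable, Tuple


def degree_days(daily_mean_temps_c: Iterable[float],
                t_cool: float,
                t_heat: float) -> Tuple[float, float]:
    """Return (CDD, HDD) in degree-days for a series of daily mean temperatures.

    A day contributes to CDD only by the amount its mean exceeds t_cool,
    and to HDD only by the amount its mean falls below t_heat.
    """
    cdd = 0.0
    hdd = 0.0
    for t in daily_mean_temps_c:
        cdd += max(t - t_cool, 0.0)
        hdd += max(t_heat - t, 0.0)
    return cdd, hdd


# Hypothetical daily mean temperatures (°C) and assumed base temperatures.
temps = [10.0, 16.0, 22.0, 28.0]
cdd, hdd = degree_days(temps, t_cool=22.0, t_heat=16.0)
print(cdd, hdd)  # prints "6.0 6.0"
```

Days whose mean lies between the two base temperatures contribute to neither sum, which is why the choice of Tcool and Theat directly changes the modeled demand.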

Received: 18 Nov 2025 – Discussion started: 28 Dec 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1: 'Comment on essd-2025-709', Anonymous Referee #1, 05 Jun 2026

Overall, this paper presents an important dataset of global base temperatures. The authors' effort will undoubtedly speed up the benchmarking and setting of base temperatures, thereby informing the regulation of air-conditioning and central heating adoption. The paper is well written and logical overall. However, I suggest that the authors make some revisions.
The authors may wish to reflect residential buildings in the title of the paper.
The introduction could be rewritten to shorten the material presented before line 100, and the sentence in lines 103-105 and the subsequent descriptions in that paragraph read oddly. This content does not establish a solid research gap. I encourage the authors to revisit how existing methods are costly and impractical. The mechanisms used to benchmark the base temperature in existing studies should also be critically assessed in terms of their strengths and weaknesses.
Another comment on the introduction: the authors have charted well the factors that affect energy use and the methods used to characterize and benchmark the base temperature. However, they have not discussed terrain or topography as a factor. Terrain also shapes energy use, occupant behavior, and the methods used to calculate them. Moreover, in many countries, the determination of the base temperature is associated with culture and behavior. Many studies also link the base temperature to indoor thermal comfort, which varies among populations.
Line 210: when describing the methods, it would be better to give the unit or metric of each variable. This would make it possible to confirm that the right indicators and underlying mechanisms have been adopted. Furthermore, beyond ASHRAE, the IPCC has also advocated the use of the UK model, whose accuracy in Europe is more solid than that of ASHRAE. The authors may therefore wish to reconsider the adoption of Eq. 1 in developing the model.
Figure 2: the authors may wish to indicate areas that have no cooling demand and areas that have no heating demand.

Citation: https://doi.org/10.5194/essd-2025-709-RC1
- AC1: 'Reply on RC1', Xiujuan He, 08 Jul 2026
  
  We thank the reviewer for the positive assessment and the constructive suggestions. Following the comments, we have revised the title, streamlined and strengthened the introduction, added the missing influencing factors and variable units, clarified our methodological choice, and refined the figures. Detailed point-by-point responses are provided below.
  
  Citation: https://doi.org/10.5194/essd-2025-709-AC1
RC2: 'Comment on essd-2025-709', Anonymous Referee #2, 05 Jun 2026

This manuscript presents a global dataset of cooling and heating base temperatures for building energy demand modelling. The authors combined energy-demand-derived base temperature estimates with a BiLSTM framework and provided a spatially explicit global dataset that addresses a long-standing limitation in HDD/CDD-based analyses. The topic is timely and relevant, and I believe the resulting dataset could be useful for a broad range of applications, including climate impact assessment and integrated assessment modelling. The manuscript is well organized, and the validation effort is more extensive than what is often seen in similar studies.
However, my main suggestion concerns the discussion of robustness and generalizability. The core contribution of this study is the global extrapolation from a relatively limited number of regions with energy-demand-derived base temperatures to thousands of administrative units worldwide. While the reported validation results are encouraging, I felt the manuscript could do more to help readers understand the reliability of this extrapolation. In particular, the training labels themselves are derived through a segmented regression procedure and therefore contain their own assumptions and uncertainties. Since these labels ultimately drive the machine learning framework, I would encourage the authors to discuss or explore how sensitive the resulting dataset may be to choices made during the label-generation process. For example, it would be useful to know whether alternative breakpoint selection approaches, different fitting thresholds, or slightly different parameter settings (these are only examples; there is no need to test all of them) would materially affect the final results. I do not necessarily view this as a weakness of the study, but additional discussion or exploration would help readers better understand the confidence that can be placed in the dataset.
A related point is about the representativeness of the training sample. Although the manuscript acknowledges uneven geographical coverage, the model is ultimately applied globally, including regions that appear to be sparsely represented in the training data. I would therefore think that the authors might want to provide a little more discussion of how well the training dataset captures the climatic and socioeconomic diversity of the final application domain. Similarly, the uncertainty analysis currently focuses on model uncertainty, whereas uncertainty associated with limited spatial coverage may also be relevant. Even a qualitative discussion of this issue would be helpful.
I also wondered whether the manuscript could benefit from a clearer demonstration of the added value of the chosen machine-learning framework. The BiLSTM approach appears to perform well, but readers may naturally ask how much improvement it provides relative to simpler alternatives and common benchmarks. Some discussion of why this architecture was selected and what advantages it offers over more conventional approaches would improve the methodological justification.
Beyond these, a few minor comments: Some parts of the discussion occasionally move beyond what can be directly inferred from the dataset itself. For example, several regional patterns are interpreted through housing quality, adaptation behaviour, or policy conditions. These explanations are certainly plausible, but they are not directly evaluated in the present study. I would therefore suggest slightly more cautious wording in these sections or simply drop them. In addition, parts of the introduction and application discussion could potentially be streamlined. The manuscript already makes a strong case for the importance of regional base temperatures, and reducing some of the “too broad” background discussion would allow greater emphasis on methodology itself, which is likely to be of primary interest to ESSD readers.

Citation: https://doi.org/10.5194/essd-2025-709-RC2
- AC2: 'Reply on RC2', Xiujuan He, 08 Jul 2026
  
  We thank the reviewer for the positive assessment and constructive suggestions. Following the comments, we have added a sensitivity analysis of the label-generation process, expanded the discussion of training-sample representativeness and spatial-coverage uncertainty, added a benchmark comparison for the BiLSTM, adopted more cautious wording, and streamlined the introduction; detailed point-by-point responses are provided below.
  
  Citation: https://doi.org/10.5194/essd-2025-709-AC2

Data sets

A global base temperature dataset for building energy demand modeling Xiujuan He et al. https://doi.org/10.6084/m9.figshare.30646376.v2

Viewed

Total article views: 788 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
497	230	61	788	51	46

Views and downloads (calculated since 28 Dec 2025)

Month	HTML	PDF	XML	Total
Dec 2025	109	21	5	135
Jan 2026	142	77	10	229
Feb 2026	29	41	12	82
Mar 2026	57	31	12	100
Apr 2026	85	30	11	126
May 2026	42	14	3	59
Jun 2026	11	2	3	16
Jul 2026	22	14	5	41

Viewed (geographical distribution)

Total article views: 781 (including HTML, PDF, and XML) Thereof 781 with geography defined and 0 with unknown origin.

Latest update: 25 Jul 2026

Short summary

Buildings consume significant energy for heating and cooling. To reduce carbon emissions, we need accurate predictions of energy demand, which depend on knowing the outdoor temperature at which buildings start heating or cooling. We used artificial intelligence to create the first global dataset of these temperature thresholds for regions worldwide. Our dataset improves energy demand prediction accuracy by 10%, supporting better energy planning and climate policy decisions.

