A global base temperature dataset for building energy demand modeling
Abstract. Accurate building energy demand modeling is critical to decarbonizing regional energy systems. Cooling and heating degree-day models are widely used because of their simplicity and low data requirements; however, their performance is limited by the lack of accurate base temperature (Tbase) data. In particular, the scarcity of building energy demand data with high temporal resolution constrains regional-scale base temperature estimation by conventional methods such as the energy signature method and the performance line method. To address this limitation, this study develops a global, regional-scale base temperature dataset using a bidirectional long short-term memory (BiLSTM) neural network framework with an attention mechanism. The dataset provides both the cooling base temperature (Tcool) and the heating base temperature (Theat) for each region, where a region is defined at a spatial scale equivalent to a U.S. state or a Chinese province. The BiLSTM framework performs well, with root mean square error (RMSE) values of 1.39°C for training and 1.33°C for testing, and Pearson correlation coefficients of 0.84 for Tcool and 0.70 for Theat. Predicted global Tcool ranges from 19 to 25°C and Theat from 14 to 18°C, consistent with physical principles. External validation using 16 independent datasets shows that the predicted base temperatures significantly improve the accuracy of building energy demand modeling, reducing RMSE by 10.01% for cooling and 10.02% for heating relative to official or empirical base temperatures. The dataset supplements sparse observational base temperature data and improves the accuracy of building energy demand modeling, contributing to low-carbon energy system planning, broader climate impact assessment, and weather-related financial applications. The global Tbase dataset is available at https://doi.org/10.6084/m9.figshare.30646376.v2 (He et al., 2025).
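To illustrate how the two base temperatures enter a degree-day model, the following is a minimal sketch and not the implementation used in this study. It assumes the common daily-mean formulation, in which cooling degree days accumulate the excess of the daily mean temperature over Tcool and heating degree days accumulate its shortfall below Theat. The function name `degree_days`, its parameters, and the sample temperatures are hypothetical.

```python
from typing import Sequence, Tuple


def degree_days(
    daily_mean_temps_c: Sequence[float],
    t_cool: float,
    t_heat: float,
) -> Tuple[float, float]:
    """Return (CDD, HDD) in degree-days (°C·day) for a series of daily means.

    Assumed formulation (illustrative only):
      CDD = sum over days of max(T_mean - t_cool, 0)
      HDD = sum over days of max(t_heat - T_mean, 0)
    """
    cdd = sum(max(t - t_cool, 0.0) for t in daily_mean_temps_c)
    hdd = sum(max(t_heat - t, 0.0) for t in daily_mean_temps_c)
    return cdd, hdd


# Hypothetical daily mean temperatures (°C) and base temperatures chosen
# inside the ranges reported in the abstract (Tcool 19-25°C, Theat 14-18°C).
temps = [10.0, 16.0, 21.0, 27.0]
cdd, hdd = degree_days(temps, t_cool=22.0, t_heat=16.0)
print(f"CDD = {cdd:.1f}, HDD = {hdd:.1f}")
```

Days with a mean temperature between Theat and Tcool contribute to neither sum, which is why the choice of the two base temperatures directly shapes the modeled cooling and heating demand.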