the Creative Commons Attribution 4.0 License.
A 1 km daily high-accuracy meteorological dataset of air temperature, atmospheric pressure, relative humidity, and sunshine duration across China (1961–2021)
Keke Zhao
Denghua Yan
Tianling Qin
Chenhao Li
Dingzhi Peng
Yifan Song
The lack of high-accuracy, fine-resolution meteorological datasets in China has hindered progress in climate, hydrological, and ecological studies. In this study, we present a 1 km daily dataset spanning 1961–2021 across China, which includes six key variables – average, maximum, and minimum temperature, atmospheric pressure, relative humidity, and sunshine duration – to provide a reliable foundation for advancing related research and applications. The dataset was generated using a novel hierarchical reconstruction framework that leveraged daily observations from 2345 meteorological stations and incorporated topographic attributes. This approach effectively decodes the nonlinear relationships between the meteorological variables and their spatial covariates, ensuring the generation of gridded daily fields that are both high-resolution and spatially continuous. Validation against 146 independent stations confirmed the high accuracy of the dataset. For average, maximum, and minimum temperatures, the errors are minimal (median root mean square errors (RMSEs): 1.16, 1.19, 1.29 °C; median mean errors (MEs): −0.04, −0.10, −0.01 °C), and the consistency with in-situ data is very high (median correlation coefficients (CCs): 0.99, 0.99, 0.99). Atmospheric pressure also shows very small errors (median RMSE: 2.65 hPa; median ME: −0.06 hPa) and strong correlation (median CC: 0.97). Relative humidity exhibits relatively lower accuracy (median RMSE: 6.33 %; median ME: −0.52 %; median CC: 0.90), but it still exceeds standard benchmarks. Sunshine duration maintains high precision (median RMSE: 1.48 h; median ME: 0.05 h; median CC: 0.93), indicating the robustness and reliability of the dataset. Further comparison reveals that in high-altitude and topographically complex regions, the reconstructed product demonstrates higher actual accuracy than suggested by station-to-grid validation, as spatial mismatches between stations and grid cells lead to systematic underestimation. 
Free access to the dataset is available at https://doi.org/10.11888/Atmos.tpdc.301341 or https://cstr.cn/18406.11.Atmos.tpdc.301341 (last access: 25 November 2025) (Zhao et al., 2024).
With advances in computational power and remote sensing technologies, hydrological modeling has increasingly evolved toward fully distributed simulations (Lettenmaier et al., 2015; Singh, 2018), while climate change research continues to expand across broader spatial and temporal scales (IPCC, 2021). These developments have placed growing demands on the resolution and accuracy of basic meteorological inputs, particularly in ungauged and topographically complex basins such as the Tibetan Plateau (Fu et al., 2020; Zhou et al., 2024). High-resolution and high-quality meteorological datasets are essential for capturing fine-scale climate signals, representing land–atmosphere interactions, and supporting hydrological, ecological, and environmental assessments.
In recent decades, a wide range of meteorological and environmental variables – such as land and sea surface temperatures, precipitation (King et al., 2003), vegetation indices (Zeng et al., 2022), soil moisture (Brocca et al., 2017), air quality (Martin, 2008), and carbon emissions (Wunch et al., 2017) – have been derived from remote sensing observations and data assimilation systems. These satellite-based products offer broad spatial coverage and long-term continuity, enabling significant advances in water resources monitoring and drought-related climate assessment, particularly in data-scarce regions (Sheffield et al., 2018). However, despite their strengths, such products often struggle to represent near-surface meteorological conditions with sufficient precision. Their performance is typically constrained by atmospheric interference, cloud contamination, and limited spatial resolution – factors that become particularly problematic in regions with highly variable terrain. As a result, many satellite-derived datasets fail to meet the spatial and temporal requirements of land surface modeling, hydrological forecasting, and local-scale climate analysis. To mitigate these limitations, assimilation-based approaches have been increasingly adopted to integrate satellite data, reanalysis fields, and ground-based observations for near-surface meteorological forcing generation (Rodell et al., 2004; Laiolo et al., 2015; Liu et al., 2019; Khaki et al., 2020). While these efforts improve data consistency and spatial completeness, significant uncertainties persist – especially in areas like western China, where rugged topography and sparse station distribution pose persistent challenges (Gao and Liu, 2013; Yang et al., 2013; Wang et al., 2016; Tang et al., 2016; Qi et al., 2018). 
These limitations underscore the pressing need for regionally tailored, high-resolution meteorological datasets that are capable of capturing local climatic variability and supporting reliable simulation in hydrological modeling, drought risk forecasting, and water resources management.
Recent efforts to generate gridded meteorological forcing datasets in China have primarily followed three methodological approaches. The first approach is based on spatial interpolation of in-situ station data to generate gridded fields (Li, 2008). However, interpolation methods that do not explicitly account for topographic complexity and environmental gradients often yield limited accuracy, particularly in mountainous regions (Li and Heap, 2011; Yu et al., 2015; Yang and Xing, 2021). To improve spatial realism, elevation-dependent interpolation schemes have been applied to reconstruct precipitation and temperature in regions such as the Heihe River Basin, the Tibetan Plateau, and the headwaters of the Yangtze and Yellow Rivers (Wang et al., 2017; Sun and Su, 2020; Zhao et al., 2022a; Zhang et al., 2024). The second approach involves spatial downscaling and multi-source data fusion. This includes deriving high-resolution fields from coarse-resolution reanalysis or climate datasets, or combining satellite, reanalysis, and station data to reconstruct near-surface meteorological variables. For instance, Li et al. (2014) employed a two-step interpolation method to generate 1 km gridded datasets of air temperature, pressure, humidity, and wind speed across China. Peng et al. (2019) produced monthly gridded temperature and precipitation data for 1901–2017 using delta downscaling applied to CRU and WorldClim inputs. He et al. (2020) developed the China Meteorological Forcing Dataset (CMFD), which integrates observations from over 1000 stations with GLDAS and MERRA reanalysis products to provide daily meteorological variables at 0.1° resolution. Zhao et al. (2022b) further enhanced precipitation accuracy over the Yarlung Zangbo Basin by correcting and merging multiple satellite precipitation products with in-situ records. The third approach draws upon machine learning techniques to model complex relationships between meteorological variables and spatial covariates. 
Global satellite-derived precipitation products such as CMORPH (Joyce et al., 2004; Xie et al., 2017) and PERSIANN (Sorooshian et al., 2014; Sadeghi et al., 2019) exemplify early use of neural networks for rainfall estimation. In the Chinese context, recent studies – including those by Wu et al. (2020), Hong et al. (2021), and Jing et al. (2022) – have applied deep learning models to improve the spatial resolution and accuracy of multi-source precipitation datasets. For temperature, Pang et al. (2017) evaluated machine learning methods for downscaling daily mean temperature in the Pearl River Basin using global climate model outputs. Zhang et al. (2021) showed that a gradient boosting approach outperformed traditional reanalysis datasets such as JRA-55 and ERA-Interim over the Tibetan Plateau. He et al. (2022) applied Gaussian process regression to generate the GPRChinaTemp1km dataset, a 1 km resolution monthly temperature product for 1951–2020. However, the development of machine learning-based gridded products for other meteorological variables – such as atmospheric pressure, humidity, sunshine duration, and wind speed – remains limited and warrants further research (Li and Zha, 2018; Liu et al., 2022).
To address the limitations of existing meteorological datasets in spatial resolution, temporal continuity, and variable completeness, this study introduces a high-resolution dataset of daily near-surface meteorological variables – including average, maximum, and minimum air temperature, atmospheric pressure, relative humidity, and sunshine duration – across mainland China. Spanning six decades (1961–2021) with kilometer-level granularity, the dataset is designed to support fine-scale applications such as land surface modeling, drought assessment, and water resource management. It is particularly suited for both scientific investigations and operational decision-making in data-sparse and topographically complex regions, such as western China. To achieve this, a hierarchical and progressive reconstruction framework is implemented to generate gridded estimates of six variables at approximately 2 m a.g.l., based on in-situ observations and a 1 km digital elevation model (DEM). A multilayer perceptron (MLP) regression model is employed in this framework to capture nonlinear relationships between station observations and topographic predictors (e.g., latitude, longitude, and elevation), enabling fine-scale reconstruction across complex terrain.
2.1 Training and validation data from CMA
Daily records of station metadata and meteorological variables – including longitude, latitude, elevation, average temperature, maximum temperature, minimum temperature, atmospheric pressure, relative humidity, and sunshine duration – were obtained from 2440 meteorological stations operated by the China Meteorological Administration (CMA) for the period 1961–2021. According to the official documentation and metadata, these daily records are part of the CMA Surface Climate Daily Dataset, which follows a nationally standardized observation protocol with unified day boundaries and homogenized records subjected to multi-tier quality control procedures. To support independent model validation, a total of 95 stations were selected as evaluation sites based on three principles: (1) ensuring geographical representativeness in terms of longitude, latitude, and elevation; (2) in densely monitored areas such as eastern China, a greater number of evaluation stations were retained without significantly reducing the size of the training dataset; and (3) in sparsely monitored regions such as western China (including Tibet and Xinjiang), the number of evaluation stations was intentionally reduced to ensure adequate data availability for model training. The remaining 2345 stations were used exclusively for training purposes. The spatial distribution of both training and evaluation stations is illustrated in Fig. 1.
For the years 2020 and 2021, daily records are limited to air temperature, as measurements of atmospheric pressure, relative humidity, and sunshine duration are unavailable during this period. Due to variations in the temporal coverage of individual stations, the amount of available daily data for model training and evaluation also differs across sites. The temporal distribution of operational meteorological stations from 1961 to 2021 is presented in Fig. 2.
2.2 Validation data from supplementary ground-based observations
2.2.1 Ground observations provided by DWR
To address the limited spatial coverage of validation stations in the Tibet region, daily average temperature observations from 12 ground-based meteorological stations were obtained from the Department of Water Resources (DWR). These supplementary data enhance the robustness of model evaluation in western China. The locations of the DWR stations are shown in Fig. 1, and metadata for each station are provided in Table 1.
2.2.2 Literature-based datasets from the National Tibetan Plateau Data Center
To supplement observational data for the evaluation of gridded meteorological products, a variety of station-based datasets were obtained from the National Tibetan Plateau Data Center (TPDC, http://data.tpdc.ac.cn, last access: 25 November 2025), as represented by the blue flag symbols in Fig. 1. These include: (1) a publicly available dataset of hourly land–atmosphere interaction observations (Ma et al., 2024), covering the period 2005–2021, of which two stations were employed as independent validation sites; (2) data from 18 stations within the HiWATER hydrometeorological observation network in the upper reaches of the Heihe River Basin (Liu et al., 2018; Che et al., 2019); and (3) additional station-based records from 11 individual stations, including 2 stations from Zhang (2018a, b); 3 stations from Gao (2018); 2 stations from Luo (2019); and 1 station each from Ma (2018), Wang and Wu (2019), Luo and Zhu (2020), and Meng and Li (2023).
2.2.3 Validation data from GSOD
The Global Surface Summary of Day (GSOD) dataset, compiled by the National Centers for Environmental Information (NCEI), is based on international data exchanges conducted under the World Meteorological Organization (WMO) World Weather Watch Program. This dataset provides daily summaries of 18 surface meteorological variables from more than 9000 global stations, with records available from 1929 to the present. Observation data from eight meteorological stations in the Taiwan region were obtained from the NCEI online archive (https://www.ncei.noaa.gov/access/search/data-search/global-summary-of-the-day, last access: 25 November 2025) and processed for use in validation. Detailed metadata and data availability for these stations are summarized in Table 2.
2.3 Static geospatial input: SRTM DEM (1 km)
The Digital Elevation Model (DEM) provides high-resolution geographic information – including longitude, latitude, and elevation – that is required for the spatial reconstruction of meteorological variables. In this study, the DEM was used as an essential input for the reconstruction model to ensure spatial consistency and accuracy. Although the model supports flexible output resolutions, a spatial resolution of 1 km was selected to balance computational efficiency and data detail. The DEM used herein was derived by resampling the latest version of the Shuttle Radar Topography Mission (SRTM) data (version 4.1), as provided by the Consortium for Spatial Information of the CGIAR (Jarvis et al., 2008).
2.4 Climate regionalization map of China
The Climate Regionalization Map of China, compiled by the China Meteorological Administration in 1978 using climate data from 1951 to 1970, divides the country into nine climatic zones. The dataset is publicly available via the Resource and Environmental Science Data Platform (https://www.resdc.cn/, last access: 25 November 2025). For the purpose of comparative analysis of regional climatic patterns, four zones – Northern Subtropical, Middle Subtropical, Southern Subtropical, and Northern Tropical – were merged into a single Subtropical Zone. The revised classification scheme therefore consists of six zones: Plateau Climate Zone, Northern Temperate Zone, Middle Temperate Zone, Southern Temperate Zone, Subtropical Zone, and Middle Tropical Zone, as illustrated in Fig. 1.
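The zone merge described above can be written as a simple lookup. The sketch below is illustrative only: the zone names come from the text, the nine-zone list is inferred from the six revised zones plus the four merged ones, and the names `ORIGINAL_ZONES` and `revised_zone` are hypothetical.

```python
# Nine original zones, inferred from the text (five kept as-is, four merged).
ORIGINAL_ZONES = [
    "Plateau Climate Zone",
    "Northern Temperate Zone",
    "Middle Temperate Zone",
    "Southern Temperate Zone",
    "Northern Subtropical",
    "Middle Subtropical",
    "Southern Subtropical",
    "Northern Tropical",
    "Middle Tropical Zone",
]

# The four zones that the text merges into a single Subtropical Zone.
MERGED_INTO_SUBTROPICAL = {
    "Northern Subtropical",
    "Middle Subtropical",
    "Southern Subtropical",
    "Northern Tropical",
}

def revised_zone(original_name):
    """Map an original zone name to its zone in the revised six-zone scheme."""
    if original_name in MERGED_INTO_SUBTROPICAL:
        return "Subtropical Zone"
    return original_name

REVISED_ZONES = sorted({revised_zone(name) for name in ORIGINAL_ZONES})
```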
2.5 Existing gridded products for comparison
To assess the reliability and application potential of the reconstructed meteorological variables, representative and widely used gridded datasets were selected for comparison based on their scientific relevance and availability. Specifically, for average temperature, atmospheric pressure, and relative humidity, we employed the latest version of the China Meteorological Forcing Dataset (CMFD 2.0), whose earlier versions have been extensively used in land surface, hydrological, and ecological modeling over China (He et al., 2020).
The CMFD 2.0 (He et al., 2024) provides high-resolution (0.1°), 3-hourly gridded meteorological data for the period 1951–2020, covering the land area between 70–140° E and 15–55° N. It includes near-surface temperature, surface pressure, specific humidity, wind speed, radiation, and precipitation. Compared to previous versions, CMFD 2.0 incorporates ERA5 reanalysis and station observations through updated data sources and artificial intelligence techniques, particularly for radiation and precipitation variables. It also introduces metadata on station relocations and expands the spatial coverage beyond China's borders, thereby improving temporal consistency and cross-regional applicability.
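CMFD 2.0 reports specific humidity, whereas the reconstructed product reports relative humidity, so comparing the two requires a conversion. This section does not state how that conversion was done. The sketch below shows one common approach, assuming a Magnus-type saturation vapor pressure formula over water, pressure in hPa, and temperature in °C; the function names are hypothetical.

```python
import math

def saturation_vapor_pressure_hpa(t_celsius):
    """Magnus-type approximation of saturation vapor pressure over water (hPa)."""
    return 6.112 * math.exp(17.62 * t_celsius / (243.12 + t_celsius))

def relative_humidity_percent(q, p_hpa, t_celsius):
    """Convert specific humidity q (kg/kg) to relative humidity (%).

    Vapor pressure follows from q = 0.622 e / (p - 0.378 e), solved for e.
    The result is clipped to the physical range 0-100 %.
    """
    e = q * p_hpa / (0.622 + 0.378 * q)
    rh = 100.0 * e / saturation_vapor_pressure_hpa(t_celsius)
    return max(0.0, min(100.0, rh))

# Example call: 7 g/kg of water vapor at 1000 hPa and 20 degrees C.
example_rh = relative_humidity_percent(0.007, 1000.0, 20.0)
```

Clipping to 100 % is a design choice for this sketch; supersaturated values arising from inconsistent inputs could instead be flagged for inspection.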
As CMFD 2.0 does not include sunshine duration, we incorporated two additional datasets for its evaluation. This step is critical because sunshine duration reconstruction constitutes the final step in our hierarchical framework, necessitating a thorough accuracy assessment to evaluate potential uncertainty propagation. To this end, we selected two complementary benchmarks: one long-term station-based product and one recent high-resolution satellite product. (1) The sunshine duration (SSD) dataset (He, 2024, 2025) serves as the long-term, station-based benchmark. It provides a homogenized daily sunshine duration record across China from 1961 to 2022 at a 2.0° × 2.0° resolution. Developed from over 2200 meteorological stations and corrected for non-climatic influences (e.g., station relocations and instrumental changes), it offers a reliable baseline for evaluating the temporal stability and long-term climatological consistency of our reconstruction. (2) The Himawari AHI-based daily sunshine duration (SD) dataset (Zhang et al., 2025) provides a recent, high-resolution (5 km) satellite perspective for 2016–2023. It enables a direct assessment of our product's quality during the 2016–2019 overlap period and serves as a benchmark for evaluating fine-scale spatial accuracy.
3.1 MLP-based hierarchical progressive reconstruction framework
The reconstruction of near-surface meteorological fields in this study is based on multilayer perceptron (MLP) models – a class of deep feedforward neural networks capable of capturing complex nonlinear relationships through layered transformations (Bisong, 2019). Each MLP consists of an input layer, multiple hidden layers, and an output layer, and is trained using a two-phase process: feedforward propagation, in which input data are transmitted through the network to produce predictions, and backpropagation, during which model parameters are iteratively adjusted to minimize prediction errors. This learning mechanism enables MLPs to extract spatial and statistical patterns from high-dimensional data while maintaining strong generalization capability. Owing to these characteristics, MLPs have been successfully applied in diverse domains such as medical diagnostics (Karayilan and Kilic, 2017; Desai and Shah, 2021), finance (Duan, 2019; Weytjens et al., 2021), and hydrology (Singh et al., 2012; Choubin et al., 2016; Ren et al., 2020).
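The two training phases can be illustrated with a deliberately small network. The sketch below is not the configuration used in this study: it assumes a single tanh hidden layer, a linear output, plain full-batch gradient descent, and an MSE loss, and all function names and hyperparameter values are hypothetical.

```python
import math
import random

def init_mlp(n_in, n_hidden, seed=0):
    """Small random weights for one tanh hidden layer and a linear output."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(params, x):
    """Feedforward pass: return hidden activations and the scalar prediction."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["w1"], params["b1"])]
    y_hat = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return hidden, y_hat

def mse_loss(params, xs, ys):
    """Mean squared error of the network over a batch."""
    return sum((forward(params, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_step(params, xs, ys, lr=0.02):
    """One full-batch gradient-descent update using backpropagated MSE gradients."""
    n = len(xs)
    n_hidden, n_in = len(params["w2"]), len(xs[0])
    g_w1 = [[0.0] * n_in for _ in range(n_hidden)]
    g_b1 = [0.0] * n_hidden
    g_w2 = [0.0] * n_hidden
    g_b2 = 0.0
    for x, y in zip(xs, ys):
        hidden, y_hat = forward(params, x)
        d_out = 2.0 * (y_hat - y) / n  # derivative of the loss w.r.t. y_hat
        g_b2 += d_out
        for j, h in enumerate(hidden):
            g_w2[j] += d_out * h
            d_hidden = d_out * params["w2"][j] * (1.0 - h * h)  # tanh derivative
            g_b1[j] += d_hidden
            for i, xi in enumerate(x):
                g_w1[j][i] += d_hidden * xi
    # Apply the update only after all gradients have been accumulated.
    for j in range(n_hidden):
        params["w2"][j] -= lr * g_w2[j]
        params["b1"][j] -= lr * g_b1[j]
        for i in range(n_in):
            params["w1"][j][i] -= lr * g_w1[j][i]
    params["b2"] -= lr * g_b2
```

In practice, inputs such as longitude, latitude, and elevation would be scaled to comparable ranges before training; the sketch assumes inputs already lie roughly within −1 to 1.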
In this study, MLP models serve as the computational foundation of the hierarchical progressive reconstruction framework developed to generate high-resolution, spatially complete datasets of near-surface meteorological variables. This framework is designed to address both variable interdependence and geographic heterogeneity by reconstructing each target variable sequentially using a tailored set of spatial and meteorological predictors. As illustrated in Fig. 3, it consists of two functional modules: a training module and a reconstruction module. The training module learns nonlinear spatial mapping functions from in-situ station data, capturing daily spatial patterns across complex terrain. The reconstruction module then applies the trained parameters to gridded predictor layers to generate continuous spatial fields at the desired resolution. To ensure both the accuracy and feasibility of the reconstruction, input features are selected based on their relevance to the spatial distribution of each variable and the availability of high-resolution gridded data. Topographic predictors (latitude, longitude, and elevation) are used consistently throughout the framework, while previously reconstructed meteorological variables are incorporated as auxiliary inputs in subsequent steps.
The hierarchical reconstruction framework comprises four sequential steps, each targeting a specific meteorological variable – (a) air temperature, (b) atmospheric pressure, (c) relative humidity, and (d) sunshine duration. This ordering is guided by both physical dependencies and statistical considerations, allowing upstream variables to serve as essential inputs for reconstructing downstream variables. In the first step, air temperature is reconstructed using only geographic predictors – longitude, latitude, and elevation. Although solar radiation and land surface characteristics, which fundamentally shape temperature patterns, are not explicitly included (Peixoto and Oort, 1992; Hartmann, 2016), these geographic features serve as effective proxies for capturing dominant spatial gradients. In the second step, atmospheric pressure is modeled using a three-layer MLP, incorporating geographic variables and temperature. Atmospheric pressure is jointly determined by air density and gravitational acceleration, both of which vary with temperature and elevation due to their effects on the atmospheric hydrostatic balance (Mason et al., 2016). Including temperature as a predictor thus improves the model's ability to reproduce its spatial variability. The third step addresses relative humidity, modeled using a four-layer MLP with geographic predictors, temperature, and atmospheric pressure as inputs. Relative humidity depends on both actual and saturation vapor pressures (Wallace and Hobbs, 2006; Mason et al., 2016); the former is partially influenced by atmospheric pressure, while the latter is primarily governed by temperature and increases exponentially according to the Clausius–Clapeyron relationship. Incorporating both temperature and pressure enhances the model's ability to capture the complex spatial behavior of humidity. 
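The physical dependencies invoked for steps two and three can be summarized by three textbook relations. They are given here only as background; the symbols are introduced for illustration and are not taken from the source, and the MLP models learn these dependencies empirically rather than using the formulas directly.

```latex
\begin{align}
  % Hypsometric approximation: pressure decreases with elevation z,
  % at a rate set by the mean layer temperature \bar{T} (in K)
  p(z) &\approx p_0 \exp\!\left(-\frac{g\,z}{R_d\,\bar{T}}\right), \\
  % Relative humidity: actual over saturation vapor pressure
  \mathrm{RH} &= 100\,\% \times \frac{e}{e_s(T)}, \\
  % Clausius--Clapeyron relation for the saturation vapor pressure
  \frac{\mathrm{d}e_s}{\mathrm{d}T} &= \frac{L_v\,e_s}{R_v\,T^{2}}.
\end{align}
```

Here $p_0$ is a reference pressure, $g$ the gravitational acceleration, $R_d$ and $R_v$ the gas constants of dry air and water vapor, $e$ and $e_s$ the actual and saturation vapor pressures, and $L_v$ the latent heat of vaporization.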
Building on the preceding steps, the final reconstruction targets sunshine duration, which is influenced by the combined effects of the solar astronomical position, atmospheric radiative processes, and synoptic-scale weather systems. According to WMO (2023), sunshine duration is defined as the total time during which direct solar irradiance exceeds 120 W m⁻². Geographic predictors provide the spatial context, while temperature, pressure, and humidity reflect dynamic atmospheric states and cloud-related feedbacks. These variables are physically grounded and observationally accessible. A four-layer MLP model is therefore employed in the final step to reconstruct the spatial distribution of sunshine duration.
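The threshold definition of sunshine duration translates directly into a counting rule. The sketch below assumes direct irradiance sampled at a fixed interval; the function name, the sampling interval, and the example values are hypothetical and do not describe how the station records were produced.

```python
def sunshine_duration_hours(direct_irradiance_wm2, step_minutes=1.0, threshold_wm2=120.0):
    """Sum the time during which direct irradiance strictly exceeds the threshold."""
    sunny_steps = sum(1 for value in direct_irradiance_wm2 if value > threshold_wm2)
    return sunny_steps * step_minutes / 60.0

# Six hypothetical samples taken every 10 minutes (W m^-2).
example_hours = sunshine_duration_hours([0, 100, 120, 121, 500, 800], step_minutes=10.0)
```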
Overall, this progressive framework ensures that each reconstruction step is guided by physically meaningful and context-specific predictors. By integrating the hierarchical dependencies among meteorological variables, the approach yields spatially complete and physically consistent gridded datasets that are suitable for large-scale climate and environmental applications.
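The order of the four steps and their predictor sets can be expressed as a short chain in which each reconstructed field becomes an input to the next model. This is a structural sketch under stated assumptions: the trained models are represented by arbitrary callables, a single `temperature` field stands in for the temperature predictor (the text does not specify which of the average, maximum, and minimum temperatures feed the later steps), and all names are hypothetical.

```python
GEO = ("lon", "lat", "elev")

# (target variable, predictors) in the order described in the text.
STEPS = [
    ("temperature", GEO),
    ("pressure", GEO + ("temperature",)),
    ("relative_humidity", GEO + ("temperature", "pressure")),
    ("sunshine_duration", GEO + ("temperature", "pressure", "relative_humidity")),
]

def reconstruct_cell(cell, models):
    """Run the four-step chain for one grid cell.

    cell   : dict with the geographic predictors "lon", "lat", "elev"
    models : dict mapping each target name to a callable that takes a
             list of predictor values and returns one estimate
    """
    fields = dict(cell)
    for target, predictors in STEPS:
        inputs = [fields[name] for name in predictors]
        fields[target] = models[target](inputs)  # output feeds later steps
    return fields
```

A consequence of this design is that errors in an upstream field propagate to every downstream variable, which is why the text treats sunshine duration, the last link in the chain, as the most demanding test of the framework.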
3.2 Evaluation metrics
In this study, four evaluation metrics were employed: Mean Error (ME), Mean Squared Error (MSE), Root Mean Square Error (RMSE), and Correlation Coefficient (CC). These metrics were utilized in two distinct phases: the MLP model training phase and the meteorological products evaluation phase. The formulas for the four metrics are as follows:
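$$\mathrm{ME} = \frac{1}{n}\sum_{t=1}^{n}\left(\hat{Y}_t - Y_t\right)$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(\hat{Y}_t - Y_t\right)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{Y}_t - Y_t\right)^2}$$

$$\mathrm{CC} = \frac{\sum_{t=1}^{n}\left(Y_t - \bar{Y}\right)\left(\hat{Y}_t - \bar{\hat{Y}}\right)}{\sqrt{\sum_{t=1}^{n}\left(Y_t - \bar{Y}\right)^2}\,\sqrt{\sum_{t=1}^{n}\left(\hat{Y}_t - \bar{\hat{Y}}\right)^2}}$$

where $n$ denotes the total number of days in the time series; $t$ represents the $t$th day; $Y_t$ and $\bar{Y}$ denote the in-situ value of the target variable and the mean in-situ value of the target variable, respectively; and $\hat{Y}_t$ and $\bar{\hat{Y}}$ denote the model's estimated value and the mean estimated value, respectively. ME is taken as estimate minus observation, so that negative values indicate underestimation.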
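Under the usual definitions of these four metrics, they can be computed from paired series as in the sketch below. The sign convention for ME (estimate minus observation, so negative values mean underestimation) is inferred from how the results are interpreted later in the text; the function name is hypothetical.

```python
import math

def evaluation_metrics(observed, estimated):
    """Return ME, MSE, RMSE, and the Pearson correlation coefficient (CC)."""
    n = len(observed)
    if n == 0 or n != len(estimated):
        raise ValueError("series must be non-empty and of equal length")
    errors = [est - obs for obs, est in zip(observed, estimated)]
    me = sum(errors) / n
    mse = sum(err * err for err in errors) / n
    rmse = math.sqrt(mse)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    denom = math.sqrt(var_obs * var_est)
    cc = cov / denom if denom > 0 else float("nan")  # undefined for constant series
    return {"ME": me, "MSE": mse, "RMSE": rmse, "CC": cc}
```

A constant offset between the two series illustrates why CC is checked alongside the error metrics: the correlation stays at 1 while ME and RMSE both equal the offset.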
During the training phase, MSE was used as the loss function to measure and optimize the performance of the MLP model. Upon completion of the training, ME and CC were computed between the estimated outputs – derived from the model parameters at the optimal training state – and in-situ records of the target variable, with particular emphasis on CC to ensure comprehensive model performance evaluation. If the MSE was low but the CC was poor, the hyperparameters of the deep learning model were adjusted, and training continued until satisfactory results were achieved.
In the subsequent evaluation phase of the meteorological reconstruction products, RMSE, ME, and CC were calculated between in-situ records and the corresponding grid estimates. Together, these metrics quantify the discrepancies between the reconstructed products and the observed data and thereby characterize the accuracy and reliability of the products.
4.1 MLP training and test results
To evaluate the generalization capability of the reconstruction models and prevent overfitting, we randomly assigned 10 % of the daily in-situ observations from 1961 to 2021 to the test dataset using a fixed random seed, with the remaining 90 % used for training. Figure 4 presents the performance metrics of the daily MLP models across six meteorological variables: average temperature, maximum temperature, minimum temperature, atmospheric pressure, relative humidity, and sunshine duration. Three standard evaluation metrics are used: ME (Fig. 4a), MSE (Fig. 4b), and CC (Fig. 4c). The mean values of all metrics are highly consistent between training and test phases, indicating strong generalization and no evidence of overfitting. These results confirm the stability and precision of the deep learning-based hierarchical progressive reconstruction framework. Notable deviations across all metrics are limited to a very small number of days and are primarily attributed to substantial gaps in the in-situ observations.
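A reproducible 90/10 split of this kind can be sketched as follows. The seed value, the function name, and the assumption that the split is applied to a flat list of observation records are all hypothetical; the text states only that a fixed random seed was used.

```python
import random

def split_records(records, test_fraction=0.10, seed=42):
    """Randomly hold out a fixed fraction of records, reproducibly."""
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # fixed seed gives the same split every run
    n_test = round(len(records) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator keeps the split independent of any other random draws in the program.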
Figure 4. Line graphs of metrics (MSE, ME, CC) for optimal parameters in daily training and testing of MLP models from 1961 to 2021.
The ME values are close to zero for all variables in both phases. Specifically, the mean ME for maximum and minimum temperatures is exactly 0 °C, while the other four variables also show near-zero mean errors, with at least one phase yielding a mean ME of 0. The range of ME values is also narrow. During training, ME ranges from −0.49 to 0.46 °C for average temperature, −3.55 to 2.61 hPa for atmospheric pressure, −2.15 % to 1.96 % for relative humidity, and −0.54 to 0.50 h for sunshine duration. The test phase exhibits even narrower ME ranges: −0.32 to 0.36 °C (average temperature), −2.25 to 1.94 hPa (atmospheric pressure), −1.83 % to 1.49 % (relative humidity), and −0.42 to 0.41 h (sunshine duration). These results suggest minimal systematic bias in the model predictions across all variables. The MSE, which emphasizes the impact of large residuals by squaring the error magnitude, consistently exceeds the ME across all variables. As shown in Fig. 4b, the daily MSE values are low in both phases, with only a slight increase in the test phase. Temperature-related variables – including average, maximum, and minimum temperature – exhibit low and stable MSE values, with means below 1 °C² and only minor differences (typically 0.1–0.3 °C²) between training and test phases. This indicates that the model captures temperature dynamics with high accuracy and strong generalization. For atmospheric pressure, which inherently exhibits a larger numerical scale, the mean MSE values remain relatively low: 6.9 hPa² in the training phase and 8.5 hPa² in the test phase. Notably elevated MSE values are observed only on a few days in 1961, primarily due to substantial gaps in the observed atmospheric pressure records. Relative humidity and sunshine duration also show consistently low error levels, with training-phase MSEs of 14.1 %² and 1.2 h², and slightly higher values of 20.7 %² and 1.8 h² in the test phase.
Analysis of the CC values indicates strong agreement between model estimates and observed values across all variables. Notably, atmospheric pressure achieves near-perfect agreement, with a mean CC of 1.00 (to two decimal places) in both phases. Average, maximum, and minimum temperatures also show consistently high correlations, with mean CCs of 0.98, 0.98, and 0.99 in the training phase, and 0.97, 0.97, and 0.98 in the test phase. Although the CCs for relative humidity and sunshine duration are slightly lower, they remain strong: 0.94 and 0.91 in training, and 0.92 and 0.87 in testing, respectively.
Collectively, the results highlight the proposed framework's ability to accurately identify and reconstruct the spatial structures of diverse meteorological variables, demonstrating strong generalization across different element types and conditions.
4.2 Validation of gridded meteorological element products using in-situ data
An independent validation was conducted using long-term in-situ records from 146 stations, as described in Sect. 2.1 and 2.2. These stations were entirely excluded from the model training and testing phases, and their observations served as reference data for an objective evaluation of the reconstructed products' accuracy and spatial generalizability. The validation results confirm that the reconstructed meteorological products achieve high overall accuracy, with particularly strong performance in regions with dense training data. Notably, even in areas with sparse or absent observations – such as northwestern China and Taiwan – the model maintains stable and reliable performance, indicating strong spatial generalizability and a capacity to extrapolate beyond the training domain. This highlights the potential of the proposed framework for broad application in diverse climatic and geographic settings. Model performance was quantified by calculating RMSE, ME, and CC between the 1 km gridded estimates and the corresponding station observations. The evaluation metrics were visualized through box plots (Fig. 5) and spatial distribution maps (Fig. 6).
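The total of 146 independent stations is consistent with the station counts given in Sects. 2.1 and 2.2, assuming each listed station is counted once and none is shared between sources:

```latex
\begin{align}
  N_{\mathrm{val}} &= \underbrace{95}_{\text{CMA}}
                    + \underbrace{12}_{\text{DWR}}
                    + \underbrace{(2 + 18 + 11)}_{\text{TPDC}}
                    + \underbrace{8}_{\text{GSOD}} = 146, \\
  N_{\mathrm{train}} &= 2440 - 95 = 2345.
\end{align}
```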
Figure 5. Box plots of RMSE, ME, and CC for grid-modelled data of six meteorological element products and in-situ data.
As shown in the box plots of RMSE, ME, and CC (Fig. 5), the reconstructed products for average, maximum, and minimum air temperature exhibit minimal errors and excellent consistency with in-situ observations. Median RMSEs are 1.16, 1.19, and 1.29 °C, respectively; median MEs are close to zero (−0.04, −0.10, and −0.01 °C); and median CCs are exceptionally high (0.99, 0.99, and 0.99). Despite its inherently larger magnitude, atmospheric pressure also demonstrates high precision, with a median RMSE of 2.65 hPa, ME of −0.06 hPa, and CC of 0.97. In comparison, the relative humidity product shows moderately lower agreement with observations, reflected in a median RMSE of 6.34 %, ME of −0.52 %, and CC of 0.90. However, since it is primarily used as an input for the reconstruction of sunshine duration, its effect on overall model performance is limited. Indeed, the sunshine duration product demonstrates higher accuracy, with a median RMSE of 1.48 h, ME of 0.05 h, and CC of 0.93. Although relative humidity exhibits slightly weaker performance than other variables, its accuracy still exceeds typical benchmarks and remains suitable for practical applications.
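The medians quoted here summarize per-station statistics: a metric is first computed for each validation station over its own time series, and the median is then taken across stations. The sketch below illustrates that two-stage aggregation for RMSE; the data layout and function names are hypothetical.

```python
import math
from statistics import median

def station_rmse(observed, estimated):
    """RMSE between one station's record and the co-located grid estimates."""
    squared = [(est - obs) ** 2 for obs, est in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))

def median_rmse(stations):
    """Median of per-station RMSEs.

    stations : dict mapping a station id to an (observed, estimated) pair
    """
    return median(station_rmse(obs, est) for obs, est in stations.values())
```

Taking the median across stations makes the summary robust to a few poorly performing sites, such as the outliers on the Tibetan Plateau, in Xinjiang, and in Taiwan noted in the text.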
The spatial distributions of RMSE, ME, and CC for all six meteorological variables are further illustrated in Fig. 6. Consistent with expectations, the Subtropical and Southern Temperate Zones in southeastern China (STZ southeastern China) display the best performance across all variables, largely due to the high density of training stations in these regions. In contrast, performance is relatively weaker in the Middle Temperate, Southern Temperate, and Plateau Climate Zones of northwestern China (MSPZ northwest China), as well as in Taiwan, where no stations were included in training. Nevertheless, model performance in these regions remains robust. Notably, despite the absence of training data in Taiwan, the MLP model accurately reconstructs air temperature in that region, suggesting strong spatial generalizability.
Figure 6. Distribution maps of RMSE (a), ME (b) and CC (c) between grid-modelled data of six meteorological element products and in-situ data.
For temperature variables, both Figs. 5 and 6 indicate minimal spatial variation, with most RMSEs, MEs, and CCs in STZ southeastern China and MSPZ northwest China falling within the ranges of 0.49 to 2 °C, −2 to 2 °C, and 0.95 to 1.00, respectively. A few outliers, primarily located in the Tibetan Plateau, Xinjiang, and Taiwan, fall outside these ranges. Specifically, in Taiwan the RMSE ranges from 3.3 to 6 °C, the ME from −0.5 to −4 °C, and the CC from 0.7 to 0.9, indicating a general underestimation of air temperature in this region. For the atmospheric pressure product, RMSE, ME, and CC values in STZ southeastern China generally range from 0.8 to 15 hPa, −5 to 5 hPa, and 0.85 to 1.00, respectively. In MSPZ northwest China, most ME values range from −41 to 0 hPa, indicating a tendency toward underestimation. For the relative humidity product, RMSE values in western China consistently range from 8 % to 31 %, whereas in eastern China they are generally smaller, with the majority of stations falling within 3.6 % to 8 %. ME values further indicate that only six stations in western China, mostly located along the margins of the Tibetan Plateau and in Xinjiang, exhibit larger negative biases in the range of −29 % to −5 %, while the vast majority of stations in both western and eastern China fall within −5 % to 5 %. Similarly, CC values show a consistent spatial pattern nationwide, generally ranging from 0.80 to 1.00, with only three isolated stations in western China falling within the range of 0 to 0.7. For the sunshine duration product, RMSE, ME, and CC values exhibit minimal spatial variability across China. RMSE values generally range from 1.2 to 2.0 h, ME values from −0.4 to 0.5 h, and CC values from 0.80 to 1.00. Values beyond these ranges are observed only at a few isolated stations.
4.3 Evaluation and comparison against existing gridded products
4.3.1 Average temperature, atmospheric pressure, relative humidity
Although 95 CMA stations were initially reserved for validating the gridded meteorological products developed in this study, they were not used in the evaluation of CMFD 2.0 due to the lack of publicly available information on the station sources used in its construction. This raised concerns that some or all of these CMA stations might have already contributed to CMFD 2.0. To avoid potential data overlap and ensure an objective and independent evaluation, the CMA stations were excluded from this comparison. Instead, observational data from the 51 ground stations introduced in Sects. 2.2.1, 2.2.2, and 2.2.3 were used to assess the accuracy of the reconstructed meteorological variables against CMFD 2.0. These stations provided daily records for one to three of the following variables: average temperature (48 stations), atmospheric pressure (25 stations), and relative humidity (29 stations). As maximum/minimum temperature and sunshine duration were largely unavailable at these sites and not included in CMFD 2.0, the evaluation focused exclusively on these three variables.
As shown in Fig. 7, the gridded meteorological dataset developed in this study demonstrates generally comparable or slightly improved performance relative to CMFD 2.0 in terms of median RMSE, ME, and CC across the evaluated variables. The exception is the CC for atmospheric pressure, where CMFD 2.0 exhibits a higher median value (0.96) than the reconstructed dataset (0.87). Notably, although the correlation for atmospheric pressure is lower in the dataset developed in this study, it yields substantially smaller errors, with median RMSE and ME of 3.61 and −0.61 hPa for this dataset, and 17.14 and 9.41 hPa for CMFD 2.0, respectively. For average temperature and relative humidity, the two gridded products exhibit similar median CC values. However, the reconstructed dataset yields consistently lower median RMSE and ME, suggesting slightly improved accuracy. Specifically, the median RMSE and ME for temperature are 1.98 and −0.21 °C, compared to 2.08 and −0.46 °C for CMFD 2.0. For relative humidity, the corresponding values are 10.75 % and −1.05 % for the reconstructed dataset, and 11.12 % and −2.40 % for CMFD 2.0.
Figure 7. Boxplot comparison of RMSE, ME, and CC for average temperature, atmospheric pressure, and relative humidity between CMFD 2.0 and the reconstructed dataset developed in this study.
These findings are particularly evident in high-altitude regions represented by 51 validation sites predominantly located in the southern Tibetan Plateau and the Heihe River Basin, where the gridded fields of average temperature, atmospheric pressure, and relative humidity developed in this study demonstrate good agreement with station observations. Compared with CMFD 2.0, a widely used multi-source reanalysis product in China, the reconstructed dataset provides improved spatial resolution and slightly enhanced accuracy at these alpine sites. These results suggest the potential of the dataset to support regional-scale hydrometeorological studies in cold and topographically complex environments.
4.3.2 Sunshine duration
To comprehensively evaluate the accuracy of the reconstructed product, two representative benchmark datasets were employed: the homogenized station-based SSD product (2°) to assess long-term temporal consistency, and the high-resolution satellite-based Himawari SD product (5 km) to examine spatial performance. In addition, daily sunshine duration observations from 95 CMA stations were used as independent references, since the supplementary stations presented in Sect. 2.2 did not provide sunshine duration records.
As shown in Fig. 8, when compared with the SSD dataset over 1961–2019, the reconstructed product demonstrated highly consistent accuracy. The median RMSE values were identical for both products (1.48 h), and the median CC values were likewise identical (0.93). The ME differed only slightly (0.05 h for the reconstructed dataset and 0.02 h for SSD), indicating comparable bias levels. Boxplot analysis further indicated that the reconstructed product exhibited slightly narrower interquartile ranges, whereas the SSD dataset showed fewer outliers in RMSE and CC. It should be noted that although some of the 95 CMA validation stations may have been included in the SSD development, our reconstruction model excluded these stations from training, ensuring a higher degree of validation independence.
Figure 8. Boxplot comparison of RMSE, ME, and CC for sunshine duration between SSD (2.0°) and the reconstructed dataset developed in this study (1 km) from 1961 to 2019.
For spatial performance, the reconstructed dataset was compared with the Himawari SD dataset over the overlapping period of 2016–2019 (Fig. 9). The evaluation was based on 91 stations, since three of the 95 validation stations had invalid sunshine duration values during this period and one station was located within the SD control region. Both products showed comparable RMSE levels (1.53 h for the reconstructed dataset compared with 1.48 h for Himawari). The satellite dataset achieved a slightly higher CC (0.94 compared with 0.92), reflecting stronger agreement in daily variations, while the reconstructed dataset exhibited a smaller ME (0.08 h compared with 0.21 h), indicating reduced bias.
Figure 9. Boxplot comparison of RMSE, ME, and CC for sunshine duration between the Himawari AHI–based SD dataset (5 km) and the reconstructed dataset developed in this study (1 km) from 2016 to 2019.
These complementary results indicate that the reconstruction framework can achieve accuracy comparable to both a long-term homogenized station-based dataset and a high-resolution satellite-derived dataset.
4.4 Influence of elevation mismatch on validation accuracy
In certain areas of the MSPZ northwest China region – particularly in Tibet and Xinjiang – the validation metrics presented in Sect. 4.2 indicate relatively lower performance. To examine whether this discrepancy is related to spatial inconsistencies between meteorological station elevations and those of the corresponding grid cells, we analyzed elevation differences using the 1 km DEM. Specifically, elevation mismatch was calculated as the difference between the recorded elevation of the 146 validation stations and the DEM-derived elevation of their corresponding grid cells, as shown in Fig. 10. A total of 36 stations were identified where the elevation difference exceeded 50 m, marked with red numbered symbols in Fig. 10a. These stations are primarily located in high-relief regions, and while not all lie within the Plateau Climate Zone, that zone exhibits the largest elevation mismatches. Figure 10b ranks these stations by descending elevation difference, with the maximum discrepancy of 591 m observed at Station 1 (DWR: Pangduo), followed by 323 m at Station 2 (CMA: Tianshan Daxigou) in Xinjiang. To assess the influence of elevation mismatch on validation accuracy, we used the actual longitude, latitude, and elevation of the 36 stations as inputs to the reconstruction module of the MLP-based framework. For each station, the long-term time series of six meteorological variables – average temperature, maximum temperature, minimum temperature, atmospheric pressure, relative humidity, and sunshine duration – were estimated. RMSE, ME, and CC values were then calculated by comparing these station-based estimates with the corresponding in-situ observations, and further compared with the original grid-based validation results.
Figure 10. Elevation differences between station elevations and corresponding DEM grid values: (a) spatial distribution, where red numbers denote station IDs with differences greater than 50 m; (b) point-line plot showing absolute elevation differences as a function of station ID.
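The screening step described above (station elevation minus DEM grid-cell elevation, flagged when the absolute difference exceeds 50 m and ranked in descending order) can be sketched as follows. This is an illustrative Python sketch only: the function name, the tuple layout, and the example elevations are hypothetical and do not reproduce the authors' code or data.

```python
def flag_elevation_mismatch(stations, threshold_m=50.0):
    """Return (station_id, difference) pairs whose absolute elevation
    difference exceeds threshold_m, sorted by descending absolute difference.

    stations: iterable of (station_id, station_elev_m, dem_elev_m) tuples.
    """
    flagged = []
    for station_id, station_elev_m, dem_elev_m in stations:
        diff = station_elev_m - dem_elev_m  # positive: station above grid cell
        if abs(diff) > threshold_m:
            flagged.append((station_id, diff))
    flagged.sort(key=lambda item: abs(item[1]), reverse=True)
    return flagged


# Hypothetical elevations, for illustration only (not values from the dataset).
example = [
    ("S1", 4000.0, 4591.0),
    ("S2", 3500.0, 3480.0),
    ("S3", 1200.0, 1095.0),
]
ranked = flag_elevation_mismatch(example)
```

In this toy input, S2 differs from its grid cell by only 20 m and is therefore not flagged, while S1 and S3 are retained and ordered by the size of their mismatch.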
Figure 11 summarizes the key findings. First, for average temperature, maximum temperature, minimum temperature, and atmospheric pressure, the RMSE and ME of the station-based estimates are substantially better than those of the gridded estimates when both are compared with in-situ observations. Notably, the magnitude of improvement increases with larger absolute elevation differences. Relative humidity and sunshine duration also exhibit improvements, but to a considerably smaller extent. The CCs increase modestly across variables, although less markedly than the error metrics improve. These results indicate that the MLP-based reconstruction framework is more accurate than the station-to-grid validation in Sect. 4.2 suggests, particularly in high-altitude and topographically complex regions.
Figure 11. Comparison of dotted line plots for RMSE, ME, and CC between in-situ data and station-based estimates, as well as between in-situ data and gridded data.
These findings also highlight potential limitations in using in-situ station data to validate gridded meteorological products – especially for products with coarse spatial resolution or in regions with substantial terrain variability. As grid size increases, spatial mismatches between stations and grid cell averages (in terms of latitude, longitude, and elevation) become more pronounced. Even at 1 km resolution, notable elevation mismatches were observed in high-altitude areas. For variables highly sensitive to elevation and geographic location – such as air temperature and atmospheric pressure – relying on a single station to represent an entire grid cell can introduce significant uncertainty in complex terrain.
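To give a rough sense of scale for this elevation sensitivity, the sketch below converts an elevation mismatch into approximate temperature and pressure offsets. It is a back-of-the-envelope illustration, not part of the reconstruction framework: the lapse rate of 6.5 °C per km and the pressure scale height of 8000 m are assumed nominal textbook values, and the function names are hypothetical.

```python
import math

LAPSE_RATE_C_PER_KM = 6.5  # assumption: nominal environmental lapse rate
SCALE_HEIGHT_M = 8000.0    # assumption: nominal pressure scale height


def temperature_offset_c(dz_m, lapse_rate=LAPSE_RATE_C_PER_KM):
    """Approximate temperature change (°C) when moving up by dz_m metres."""
    return -lapse_rate * dz_m / 1000.0


def pressure_offset_hpa(p_hpa, dz_m, scale_height_m=SCALE_HEIGHT_M):
    """Approximate pressure change (hPa) when moving up by dz_m metres
    from a level with pressure p_hpa, using an exponential profile."""
    return p_hpa * (math.exp(-dz_m / scale_height_m) - 1.0)


# Largest mismatch reported in Sect. 4.4 (591 m), combined with the
# approximate Plateau Climate Zone mean pressure from Sect. 4.5 (608 hPa).
dt = temperature_offset_c(591.0)
dp = pressure_offset_hpa(608.0, 591.0)
```

Under these assumptions, a 591 m mismatch corresponds to a temperature offset of close to 4 °C and a pressure offset on the order of 40 hPa, which is large relative to the median RMSEs reported in Sect. 4.2.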
4.5 Spatial distribution of meteorological elements in China at 1 km resolution
To evaluate the spatial performance and climatic representativeness of the reconstructed dataset, we analyzed the long-term mean values of six meteorological variables at a spatial resolution of 1 km across mainland China from 1961 to 2019. The spatial distributions show strong consistency with known climatic gradients and topographic variations, reflecting the combined effects of latitude, elevation, and oceanic influence on regional meteorological conditions, as illustrated in Fig. 12. Temperature exhibits clear spatial variation governed by both latitude and elevation. The Northern Temperate Zone and the Plateau Climate Zone record the lowest values, with annual mean, maximum, and minimum temperatures of −3.8, 4.3, and −11.0 °C in the Northern Temperate Zone, and −1.7, 6.2, and −8.3 °C in the Plateau Climate Zone. In contrast, the Subtropical Zone records 16.1, 21.3, and 12.5 °C, while the Tropical Zone reaches 24.2, 28.9, and 21.1 °C, respectively. Atmospheric pressure strongly reflects elevation differences. While most zones maintain annual mean values above 900 hPa, the Plateau Climate Zone shows a significantly lower pressure of approximately 608 hPa. Relative humidity decreases from southeast to northwest, shaped by maritime influence and topographic relief. The Tropical and Subtropical coastal zones record the highest annual mean values of 83 % and 78 %, respectively. The Northern Temperate Zone reaches 70 %, while interior zones, including the Middle Temperate and Plateau Climate Zones, record lower values of approximately 55 %. Sunshine duration shows an inverse pattern relative to humidity and cloudiness. The longest annual average sunshine durations are observed in the Qinghai–Tibet Plateau and the Middle Temperate Zone in Xinjiang and Inner Mongolia, with 8.0 and 7.8 h per day, respectively. In contrast, the Subtropical coastal zone receives only about 4.6 h due to persistent cloud cover and high moisture levels.
Figure 12. Annual spatial distribution of six meteorological elements in China from 1961 to 2019 based on daily reconstructed products.
The reconstructed spatial patterns show strong agreement with China's climatic zonation and physiographic structure, demonstrating that the dataset reliably captures the spatial distribution of key climate-controlling factors, including elevation, latitude, and terrain complexity. This consistency highlights the physical soundness and regional adaptability of the reconstruction framework, which is informed by topographic features rather than relying solely on spatial proximity. The dataset thereby offers robust support for regional-scale analyses in hydrology, meteorology, and ecology, especially in contexts where high spatial resolution and internal data consistency are required.
The 1 km daily dataset of near-surface meteorological variables over mainland China includes air temperature (average, maximum, and minimum) for the period 1961–2021, and atmospheric pressure, relative humidity, and sunshine duration for the period 1961–2019. The dataset is expected to undergo ongoing maintenance and temporal extension contingent on the availability of new observational data. The GeoTIFF-formatted output files at 1 km spatial resolution are freely accessible at https://doi.org/10.11888/Atmos.tpdc.301341 (Zhao et al., 2024).
This study presents a nationwide, high-resolution dataset of six daily near-surface meteorological variables – average, maximum, and minimum temperature, atmospheric pressure, relative humidity, and sunshine duration – reconstructed at 1 km spatial resolution over mainland China for the period 1961–2019 (1961–2021 for air temperature). Instead of relying on traditional spatial interpolation, the reconstruction framework models nonlinear relationships between meteorological variables and topographic predictors – such as elevation, latitude, and longitude – enabling physically informed estimation across a wide range of climatic and geographic conditions.
Validation using 146 independent meteorological stations demonstrates that the dataset achieves consistently high accuracy across all variables. For average, maximum, and minimum temperature, the median RMSEs are 1.16, 1.19, and 1.29 °C, respectively; the corresponding median MEs are approximately −0.04, −0.10, and −0.01 °C, with correlation coefficients equal to 0.99. Atmospheric pressure shows similarly strong performance, with a median RMSE of 2.65 hPa, a median ME of −0.06 hPa, and a correlation coefficient of 0.97. Relative humidity and sunshine duration also perform reliably, with median RMSEs of 6.33 % and 1.48 h, MEs of −0.52 % and 0.05 h, and correlation coefficients of 0.90 and 0.93, respectively. Further comparison reveals that station-to-grid validation underestimates the true accuracy of gridded products, particularly in topographically complex regions where elevation mismatches distort point-to-grid comparisons. In such areas, model estimates based on exact station coordinates consistently yield better validation metrics than those derived from station-to-grid comparisons, especially for elevation-sensitive variables.
The comparative evaluation against existing gridded products further confirms the quality and robustness of the reconstructed dataset, while complementing existing benchmark products with enhanced spatial resolution (1 km), particularly suited for heterogeneous environments. For average temperature, atmospheric pressure, and relative humidity, the reconstructed product exhibits consistently lower RMSE and ME than CMFD 2.0 at independent validation stations, with particularly substantial error reduction observed for atmospheric pressure. In the comparison of sunshine duration, the reconstructed dataset achieves temporal accuracy nearly identical to the homogenized, long-term station-based SSD product and spatial accuracy comparable to the recent, high-resolution satellite-based Himawari SD dataset, while further reducing systematic bias, thereby providing a more balanced and reliable benchmark across both temporal and spatial scales.
In addition to its high overall accuracy, the dataset demonstrates stable spatial performance across China's major climatic zones. Temperature and pressure variables maintain low RMSEs and strong correlations in both humid southeastern and arid northwestern regions, with most temperature RMSEs, MEs, and CCs falling within the ranges of 0.49 to 2 °C, −2 to 2 °C, and 0.95 to 1.00, respectively. Relative humidity and sunshine duration show limited spatial variability, with only a few isolated stations displaying notable deviations. Even in Taiwan, which was excluded from model training, the reconstructed temperature fields align reasonably well with in-situ observations, indicating the dataset's spatial robustness beyond the training domain.
The dataset provides spatially continuous, temporally complete, and variable-accurate daily meteorological records, supporting a wide range of regional-scale applications in hydrology, meteorology, and ecology.
KZ: methodology, conceptualization, formal analysis, visualization, and writing – original draft. DY and TQ: supervision, methodology, and writing – review and editing. CL and DP: software, formal analysis, and investigation. YS: data curation, visualization.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
We would like to express our sincere gratitude to Alibaba Cloud for providing high-performance computing support. We also extend our thanks to the National Meteorological Information Center of the China Meteorological Administration for supplying the observed climate data.
This research has been supported by the National Natural Science Foundation of China (grant no. 52130907), the Special Project on Basic Scientific Research Funds of the China Institute of Water Resources and Hydropower Research (grant no. JZ110145B0032025), the Postdoctoral Fellowship Program of the China Postdoctoral Science Foundation (grant no. GZC20233116), and the Five Major Excellent Talent Programs of IWHR (grant no. WR0199A012021).
This paper was edited by Zihao Bian and reviewed by three anonymous referees.
Bisong, E.: The Multilayer Perceptron (MLP), in: Building Machine Learning and Deep Learning Models on Google Cloud Platform, Apress, Berkeley, CA, 401–405, https://doi.org/10.1007/978-1-4842-4470-8_31, 2019.
Brocca, L., Crow, W. T., Ciabatta, L., Massari, C., De Rosnay, P., Enenkel, M., Hahn, S., Amarnath, G., Camici, S., Tarpanelli, A., and Wagner, W.: A Review of the Applications of ASCAT Soil Moisture Products, IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing, 10, 2285–2306, https://doi.org/10.1109/JSTARS.2017.2651140, 2017.
Che, T., Li, X., Liu, S., Li, H., Xu, Z., Tan, J., Zhang, Y., Ren, Z., Xiao, L., Deng, J., Jin, R., Ma, M., Wang, J., and Yang, X.: Integrated hydrometeorological, snow and frozen-ground observations in the alpine region of the Heihe River Basin, China, Earth Syst. Sci. Data, 11, 1483–1499, https://doi.org/10.5194/essd-11-1483-2019, 2019.
Choubin, B., Khalighi-Sigaroodi, S., Malekian, A., and Kişi, Ö.: Multiple linear regression, multi-layer perceptron network and adaptive neuro-fuzzy inference system for forecasting precipitation based on large-scale climate signals, Hydrological Sciences Journal, 61, 1001–1009, https://doi.org/10.1080/02626667.2014.966721, 2016.
Desai, M. and Shah, M.: An anatomization on breast cancer detection and diagnosis employing multi-layer perceptron neural network (MLP) and Convolutional neural network (CNN), Clinical eHealth, 4, 1–11, https://doi.org/10.1016/j.ceh.2020.11.002, 2021.
Duan, J.: Financial system modeling using deep neural networks (DNNs) for effective risk assessment and prediction, Journal of the Franklin Institute, 356, 4716–4731, https://doi.org/10.1016/j.jfranklin.2019.01.046, 2019.
Fu, Y., Ma, Y., Zhong, L., Yang, Y., Guo, X., Wang, C., Xu, X., Yang, K., Xu, X., Liu, L., Fan, G., Li, Y., and Wang, D.: Land-surface processes and summer-cloud-precipitation characteristics in the Tibetan Plateau and their effects on downstream weather: a review and perspective, National Science Review, 7, 500–515, https://doi.org/10.1093/nsr/nwz226, 2020.
Gao, H.: Meteorological observation data of the Xiying River on the east section of the Qilian Mountains (2006–2010), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/AtmosphericPhysics.tpe.5.db, 2018.
Gao, Y. C. and Liu, M. F.: Evaluation of high-resolution satellite precipitation products using rain gauge observations over the Tibetan Plateau, Hydrol. Earth Syst. Sci., 17, 837–849, https://doi.org/10.5194/hess-17-837-2013, 2013.
Hartmann, D. L.: Global physical climatology, 2nd edn., Elsevier, Amsterdam; Boston, 485 pp., ISBN 978-0-12-328531-7, 2016.
He, J., Yang, K., Tang, W., Lu, H., Qin, J., Chen, Y., and Li, X.: The first high-resolution meteorological forcing dataset for land process studies over China, Sci Data, 7, 25, https://doi.org/10.1038/s41597-020-0369-y, 2020.
He, J., Yang, K., Li, X., Tang, W., Shao, C., Jiang, Y., and Ding, B.: China meteorological forcing dataset v2.0 (1951–2020), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/Atmos.tpdc.302088, 2024.
He, Q., Wang, M., Liu, K., Li, K., and Jiang, Z.: GPRChinaTemp1km: a high-resolution monthly air temperature data set for China (1951–2020) based on machine learning, Earth Syst. Sci. Data, 14, 3273–3292, https://doi.org/10.5194/essd-14-3273-2022, 2022.
He, Y.: Homogenized daily sunshine duration at 2° × 2° over China from 1961 to 2022, National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/Atmos.tpdc.301478, 2024.
He, Y., Wang, K., Yang, K., Zhou, C., Shao, C., and Yin, C.: Homogenized daily sunshine duration over China from 1961 to 2022, Earth Syst. Sci. Data, 17, 1595–1611, https://doi.org/10.5194/essd-17-1595-2025, 2025.
Hong, Z., Han, Z., Li, X., Long, D., Tang, G., and Wang, J.: Generation of an improved precipitation data set from multisource information over the Tibetan Plateau, Journal of Hydrometeorology, https://doi.org/10.1175/JHM-D-20-0252.1, 2021.
IPCC: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, edited by: Masson-Delmotte, V., Zhai, P., Pirani, A., Connors, S. L., Péan, C., Berger, S., Caud, N., Chen, Y., Goldfarb, L., Gomis, M. I., Huang, M., Leitzell, K., Lonnoy, E., Matthews, J. B. R., Maycock, T. K., Waterfield, T., Yelekçi, O., Yu, R., and Zhou, B., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 2391 pp., https://doi.org/10.1017/9781009157896, 2021.
Jarvis, A., Reuter, H., Nelson, A., and Guevara, E.: Hole-filled seamless SRTM data v4, International Centre for Tropical Agriculture (CIAT), http://srtm.csi.cgiar.org (last access: 25 November 2025), 2008.
Jing, Y., Lin, L., Li, X., Li, T., and Shen, H.: An attention mechanism based convolutional network for satellite precipitation downscaling over China, Journal of Hydrology, 613, 128388, https://doi.org/10.1016/j.jhydrol.2022.128388, 2022.
Joyce, R. J., Janowiak, J. E., Arkin, P. A., and Xie, P.: CMORPH: A Method that Produces Global Precipitation Estimates from Passive Microwave and Infrared Data at High Spatial and Temporal Resolution, J. Hydrometeorol., 5, 487–503, https://doi.org/10.1175/1525-7541(2004)005<0487:CAMTPG>2.0.CO;2, 2004.
Karayilan, T. and Kilic, O.: Prediction of heart disease using neural network, in: 2017 International Conference on Computer Science and Engineering (UBMK), Antalya, 719–723, https://doi.org/10.1109/UBMK.2017.8093512, 2017.
Khaki, M., Hendricks Franssen, H.-J., and Han, S. C.: Multi-mission satellite remote sensing data for improving land hydrological models via data assimilation, Sci. Rep., 10, 18791, https://doi.org/10.1038/s41598-020-75710-5, 2020.
King, M. D., Menzel, W. P., Kaufman, Y. J., Tanre, D., Bo-Cai Gao, Platnick, S., Ackerman, S. A., Remer, L. A., Pincus, R., and Hubanks, P. A.: Cloud and aerosol properties, precipitable water, and profiles of temperature and water vapor from MODIS, IEEE Trans. Geosci. Remote Sensing, 41, 442–458, https://doi.org/10.1109/TGRS.2002.808226, 2003.
Laiolo, P., Gabellani, S., Campo, L., Cenci, L., Silvestro, F., Delogu, F., Boni, G., Rudari, R., Puca, S., and Pisani, A. R.: Assimilation of remote sensing observations into a continuous distributed hydrological model: Impacts on the hydrologic cycle, in: 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 1308–1311, https://doi.org/10.1109/IGARSS.2015.7326015, 2015.
Lettenmaier, D. P., Alsdorf, D., Dozier, J., Huffman, G. J., Pan, M., and Wood, E. F.: Inroads of remote sensing into hydrologic science during the WRR era, Water Resources Research, 51, 7309–7342, https://doi.org/10.1002/2015WR017616, 2015.
Li, J.: A review of spatial interpolation methods for environmental scientists, Geoscience Australia, Canberra, https://d28rz98at9flks.cloudfront.net/68229/Rec2008_023.pdf (last access: 25 November 2025), 2008.
Li, J. and Heap, A. D.: A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors, Ecological Informatics, 6, 228–241, https://doi.org/10.1016/j.ecoinf.2010.12.003, 2011.
Li, L. and Zha, Y.: Mapping relative humidity, average and extreme temperature in hot summer over China, Science of The Total Environment, 615, 875–881, https://doi.org/10.1016/j.scitotenv.2017.10.022, 2018.
Li, T., Zheng, X., Dai, Y., Yang, C., Chen, Z., Zhang, S., Wu, G., Wang, Z., Huang, C., Shen, Y., and Liao, R.: Mapping near-surface air temperature, pressure, relative humidity and wind speed over Mainland China with high spatiotemporal resolution, Adv. Atmos. Sci., 31, 1127–1135, https://doi.org/10.1007/s00376-014-3190-8, 2014.
Liu, J., Shi, C., Sun, S., Liang, J., and Yang, Z.-L.: Improving Land Surface Hydrological Simulations in China Using CLDAS Meteorological Forcing Data, J. Meteorol. Res., 33, 1194–1206, https://doi.org/10.1007/s13351-019-9067-0, 2019.
Liu, N., Yan, Z., Tong, X., Jiang, J., Li, H., Xia, J., Lou, X., Ren, R., and Fang, Y.: Meshless Surface Wind Speed Field Reconstruction Based on Machine Learning, Adv. Atmos. Sci., 39, 1721–1733, https://doi.org/10.1007/s00376-022-1343-8, 2022.
Liu, S. M., Li, X., Xu, Z. W., Che, T., Xiao, Q., Ma, M. G., Liu, Q. H., Jin, R., Guo, J. W., Wang, L. X., Wang, W. Z., Qi, Y., Li, H. Y., Xu, T. R., Ran, Y. H., Hu, X. L., Shi, S. J., Zhu, Z. L., Tan, J. L., Zhang, Y., and Ren, Z. G.: The Heihe Integrated Observatory Network: A Basin-Scale Land Surface Processes Observatory in China, Vadose Zone J., 17, 180072, https://doi.org/10.2136/vzj2018.04.0072, 2018.
Luo, L.: Shergyla Mountain meteorological data (2005–2017), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/AtmosphericPhysics.tpe.249395.db, 2019.
Luo, L. and Zhu, L.: Meteorological observation data of the comprehensive observation and research station of alpine environment in Southeast Tibet (2017–2018), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/Meteoro.tpdc.270313, 2020.
Ma, Y.: Meteorological observation data from Qomolangma Station for atmospheric and environmental research (2005–2016), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/AtmosEnviron.tpe.0000014.file, 2018.
Ma, Y., Xie, Z., Chen, Y., Liu, S., Che, T., Xu, Z., Shang, L., He, X., Meng, X., Ma, W., Xu, B., Zhao, H., Wang, J., Wu, G., and Li, X.: Dataset of spatially extensive long-term quality-assured land–atmosphere interactions over the Tibetan Plateau, Earth Syst. Sci. Data, 16, 3017–3043, https://doi.org/10.5194/essd-16-3017-2024, 2024.
Martin, R. V.: Satellite remote sensing of surface air quality, Atmospheric Environment, 42, 7823–7843, https://doi.org/10.1016/j.atmosenv.2008.07.018, 2008.
Mason, J. A., Muller, P. O., Burt, J. E., and De Blij, H. J.: Physical geography: the global environment, Oxford University Press, ISBN 9780190246860, 2016.
Meng, X. and Li, Z.: Zoige Plateau Wetland Ecosystem Research Station meteorological dataset (2019–2022), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/Atmos.tpdc.300548, 2023.
Pang, B., Yue, J., Zhao, G., and Xu, Z.: Statistical Downscaling of Temperature with the Random Forest Model, Advances in Meteorology, 2017, 1–11, https://doi.org/10.1155/2017/7265178, 2017.
Peixoto, J. P. and Oort, A. H.: Physics of climate, New York, NY (United States); American Institute of Physics, United States, ISBN 9780883187128, 1992.
Peng, S., Ding, Y., Liu, W., and Li, Z.: 1 km monthly temperature and precipitation dataset for China from 1901 to 2017, Earth Syst. Sci. Data, 11, 1931–1946, https://doi.org/10.5194/essd-11-1931-2019, 2019.
Qi, W., Liu, J., and Chen, D.: Evaluations and Improvements of GLDAS2.0 and GLDAS2.1 Forcing Data's Applicability for Basin Scale Hydrological Simulations in the Tibetan Plateau, J. Geophys. Res.-Atmos., 123, https://doi.org/10.1029/2018JD029116, 2018.
Ren, T., Liu, X., Niu, J., Lei, X., and Zhang, Z.: Real-time water level prediction of cascaded channels based on multilayer perception and recurrent neural network, Journal of Hydrology, 585, 124783, https://doi.org/10.1016/j.jhydrol.2020.124783, 2020.
Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., Arsenault, K., Cosgrove, B., Radakovich, J., Bosilovich, M., Entin, J. K., Walker, J. P., Lohmann, D., and Toll, D.: The Global Land Data Assimilation System, B. Am. Meteorol. Soc., 85, 381–394, https://doi.org/10.1175/BAMS-85-3-381, 2004.
Sadeghi, M., Asanjan, A. A., Faridzad, M., Nguyen, P., Hsu, K., Sorooshian, S., and Braithwaite, D.: PERSIANN-CNN: Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks–Convolutional Neural Networks, Journal of Hydrometeorology, 20, 2273–2289, https://doi.org/10.1175/JHM-D-19-0110.1, 2019.
Sheffield, J., Wood, E. F., Pan, M., Beck, H., Coccia, G., Serrat-Capdevila, A., and Verbist, K.: Satellite Remote Sensing for Water Resources Management: Potential for Supporting Sustainable Development in Data-Poor Regions, Water Resources Research, 54, 9724–9758, https://doi.org/10.1029/2017WR022437, 2018.
Singh, A., Imtiyaz, M., Isaac, R. K., and Denis, D. M.: Comparison of soil and water assessment tool (SWAT) and multilayer perceptron (MLP) artificial neural network for predicting sediment yield in the Nagwa agricultural watershed in Jharkhand, India, Agricultural Water Management, 104, 113–120, https://doi.org/10.1016/j.agwat.2011.12.005, 2012.
Singh, V. P.: Hydrologic modeling: progress and future directions, Geosci. Lett., 5, 15, https://doi.org/10.1186/s40562-018-0113-z, 2018.
Sorooshian, S., Hsu, K., Braithwaite, D., Ashouri, H., and NOAA CDR Program: NOAA Climate Data Record (CDR) of Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN-CDR), Version 1 Revision 1, https://doi.org/10.7289/V51V5BWQ, 2014.
Sun, H. and Su, F.: Precipitation correction and reconstruction for streamflow simulation based on 262 rain gauges in the upper Brahmaputra of southern Tibetan Plateau, Journal of Hydrology, 590, 125484, https://doi.org/10.1016/j.jhydrol.2020.125484, 2020.
Tang, G., Ma, Y., Long, D., Zhong, L., and Hong, Y.: Evaluation of GPM Day-1 IMERG and TMPA Version-7 legacy products over Mainland China at multiple spatiotemporal scales, Journal of Hydrology, 533, 152–167, https://doi.org/10.1016/j.jhydrol.2015.12.008, 2016.
Wallace, J. M. and Hobbs, P. V.: Atmospheric science: an introductory survey, Elsevier, ISBN 9780127329512, 2006.
Wang, J. and Wu, G.: Meteorological observation data of Namuco multi-circle comprehensive observation and research station (2017–2018), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/AtmosphericPhysics.tpe.5.db, 2019.
Wang, W., Cui, W., Wang, X., and Chen, X.: Evaluation of GLDAS-1 and GLDAS-2 Forcing Data and Noah Model Simulations over China at the Monthly Scale, Journal of Hydrometeorology, 17, 2815–2833, https://doi.org/10.1175/JHM-D-15-0191.1, 2016.
Wang, Y., Yang, H., Yang, D., Qin, Y., Gao, B., and Cong, Z.: Spatial Interpolation of Daily Precipitation in a High Mountainous Watershed Based on Gauge Observations and a Regional Climate Model Simulation, Journal of Hydrometeorology, 18, 845–862, https://doi.org/10.1175/JHM-D-16-0089.1, 2017.
Weytjens, H., Lohmann, E., and Kleinsteuber, M.: Cash flow prediction: MLP and LSTM compared to ARIMA and Prophet, Electron. Commer. Res., 21, 371–391, https://doi.org/10.1007/s10660-019-09362-7, 2021.
World Meteorological Organization (WMO): Guide to Meteorological Instruments and Methods of Observation, World Meteorological Organization, Geneva, https://www.weather.gov/media/epz/mesonet/CWOP-WMO8.pdf (last access: 25 November 2025), 2023.
Wu, H., Yang, Q., Liu, J., and Wang, G.: A spatiotemporal deep fusion model for merging satellite and gauge precipitation in China, Journal of Hydrology, 584, 124664, https://doi.org/10.1016/j.jhydrol.2020.124664, 2020.
Wunch, D., Wennberg, P. O., Osterman, G., Fisher, B., Naylor, B., Roehl, C. M., O'Dell, C., Mandrake, L., Viatte, C., Kiel, M., Griffith, D. W. T., Deutscher, N. M., Velazco, V. A., Notholt, J., Warneke, T., Petri, C., De Maziere, M., Sha, M. K., Sussmann, R., Rettinger, M., Pollard, D., Robinson, J., Morino, I., Uchino, O., Hase, F., Blumenstock, T., Feist, D. G., Arnold, S. G., Strong, K., Mendonca, J., Kivi, R., Heikkinen, P., Iraci, L., Podolske, J., Hillyard, P. W., Kawakami, S., Dubey, M. K., Parker, H. A., Sepulveda, E., García, O. E., Te, Y., Jeseck, P., Gunson, M. R., Crisp, D., and Eldering, A.: Comparisons of the Orbiting Carbon Observatory-2 (OCO-2) XCO2 measurements with TCCON, Atmos. Meas. Tech., 10, 2209–2238, https://doi.org/10.5194/amt-10-2209-2017, 2017.
Xie, P., Joyce, R., Wu, S., Yoo, S.-H., Yarosh, Y., Sun, F., and Lin, R.: Reprocessed, Bias-Corrected CMORPH Global High-Resolution Precipitation Estimates from 1998, Journal of Hydrometeorology, 18, 1617–1641, https://doi.org/10.1175/JHM-D-16-0168.1, 2017.
Yang, J., Gong, P., Fu, R., Zhang, M., Chen, J., Liang, S., Xu, B., Shi, J., and Dickinson, R.: The role of satellite remote sensing in climate change studies, Nat. Clim. Change, 3, 875–883, https://doi.org/10.1038/nclimate1908, 2013.
Yang, R. and Xing, B.: A Comparison of the Performance of Different Interpolation Methods in Replicating Rainfall Magnitudes under Different Climatic Conditions in Chongqing Province (China), Atmosphere, 12, 1318, https://doi.org/10.3390/atmos12101318, 2021.
Yu, W., Nan, Z., Wang, Z., Chen, H., Wu, T., and Zhao, L.: An Effective Interpolation Method for MODIS Land Surface Temperature on the Qinghai–Tibet Plateau, IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing, 8, 4539–4550, https://doi.org/10.1109/JSTARS.2015.2464094, 2015.
Zeng, Y., Hao, D., Huete, A., Dechant, B., Berry, J., Chen, J. M., Joiner, J., Frankenberg, C., Bond-Lamberty, B., Ryu, Y., Xiao, J., Asrar, G. R., and Chen, M.: Optical vegetation indices for monitoring terrestrial ecosystems globally, Nat. Rev. Earth Environ., 3, 477–493, https://doi.org/10.1038/s43017-022-00298-5, 2022.
Zhang, H., Immerzeel, W. W., Zhang, F., De Kok, R. J., Gorrie, S. J., and Ye, M.: Creating 1-km long-term (1980–2014) daily average air temperatures over the Tibetan Plateau by integrating eight types of reanalysis and land data assimilation products downscaled with MODIS-estimated temperature lapse rates based on machine learning, Int. J. Appl. Earth Obs., 97, 102295, https://doi.org/10.1016/j.jag.2021.102295, 2021.
Zhang, X., Yang, Y., Gao, H., Xu, S., Feng, J., and Qin, T.: Land Cover Changes and Driving Factors in the Source Regions of the Yangtze and Yellow Rivers over the Past 40 Years, Land, 13, 259, https://doi.org/10.3390/land13020259, 2024.
Zhao, K., Peng, D., Gu, Y., Luo, X., Pang, B., and Zhu, Z.: Temperature lapse rate estimation and snowmelt runoff simulation in a high-altitude basin, Sci. Rep., 12, 13638, https://doi.org/10.1038/s41598-022-18047-5, 2022a.
Zhao, K., Peng, D., Gu, Y., Pang, B., and Zhu, Z.: Daily precipitation dataset at 0.1° for the Yarlung Zangbo River basin from 2001 to 2015, Sci. Data, 9, 349, https://doi.org/10.1038/s41597-022-01471-7, 2022b.
Zhao, K., Yan, D., Qin, T., Li, C., Peng, D., and Song, Y.: China's 1km Daily Reconstructed Product of Six Meteorological Elements (1961–2021), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/Atmos.tpdc.301341, https://cstr.cn/18406.11.Atmos.tpdc.301341, 2024.
Zhang, Y.: Meteorological observation data of Kunsha Glacier (2015–2017), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/Meteoro.tpdc.270086, 2018a.
Zhang, Y.: Meteorological observation dataset of Shiquan River Source (2012–2015), National Tibetan Plateau/Third Pole Environment Data Center [data set], https://doi.org/10.11888/Meteoro.tpdc.270548, 2018b.
Zhang, Z., Fang, S., and Han, J.: A daily sunshine duration (SD) dataset in China from Himawari AHI imagery (2016–2023), Earth Syst. Sci. Data, 17, 1427–1439, https://doi.org/10.5194/essd-17-1427-2025, 2025.
Zhou, P., Tang, J., Ma, M., Ji, D., and Shi, J.: High resolution Tibetan Plateau regional reanalysis 1961–present, Sci. Data, 11, 444, https://doi.org/10.1038/s41597-024-03282-4, 2024.