HRLT: a high-resolution (1 d, 1 km) and long-term (1961–2019) gridded dataset for surface temperature and precipitation across China

Accurate long-term temperature and precipitation estimates at high spatial and temporal resolutions are vital for a wide variety of climatological studies. We have produced a new, publicly available, daily, gridded maximum temperature, minimum temperature, and precipitation dataset for China with a high spatial resolution of 1 km that covers a long-term period (1961 to 2019). It has been named the HRLT, and the dataset is publicly available at https://doi.org/10.1594/PANGAEA.941329 (Qin and Zhang, 2022). In this study, the daily gridded data were interpolated using comprehensive statistical analyses, which included machine learning methods, the generalized additive model, and thin plate splines. The interpolation was based on the 0.5° × 0.5° gridded dataset from the China Meteorological Administration, together with covariates for elevation, aspect, slope, topographic wetness index, latitude, and longitude. The accuracy of the HRLT daily dataset was assessed using observation data from meteorological stations across China. The maximum and minimum temperature estimates were more accurate than the precipitation estimates. For maximum temperature, the mean absolute error (MAE), root mean square error (RMSE), Pearson's correlation coefficient (Cor), coefficient of determination after adjustment (R²), and Nash-Sutcliffe modeling efficiency (NSE) were 1.07 °C, 1.62 °C, 0.99, 0.98, and 0.98, respectively. For minimum temperature, the MAE, RMSE, Cor, R², and NSE were 1.08 °C, 1.53 °C, 0.99, 0.99, and 0.99, respectively. For precipitation, the MAE, RMSE, Cor, R², and NSE were 1.30 mm, 4.78 mm, 0.84, 0.71, and 0.70, respectively. The accuracy of the HRLT was compared with that of three other existing datasets; it was either higher, especially for precipitation, or comparable but with a higher spatial resolution or a longer time period. In summary, the HRLT dataset has a high spatial resolution, covers a long period of time, and has reliable accuracy.


Introduction
Climate change has led to an increase in the frequency and severity of extreme temperature and precipitation events (Myhre et al., 2019), and these events have affected vegetation growth, especially crop growth (Rao et al., 2015; Lu et al., 2018; Lobell et al., 2011; Lesk et al., 2016). Thus, long-term and accurate daily maximum temperature, minimum temperature, and precipitation data are important when attempting to reveal the mechanism underlying the effects of extreme climate on plants, for predicting disasters (such as drought, frost, and floods), and for agricultural and forestry management. Although the meteorological observation network makes better use of the data from meteorological stations (Merino et al., 2014), there is a tradeoff between large spatial scale and the high density of stations in the meteorological observation network. Moreover, the installation and maintenance of meteorological stations are challenging in harsh areas (Hartl et al., 2020). Daily and gridded meteorological datasets are also essential inputs for many models related to terrestrial, hydrological, and ecological systems (Iizumi et al., 2017; Wang et al., 2018; Zhang et al., 2018; Lee et al., 2019). High-resolution, long-term, and accurate gridded datasets can help improve the performance of these models.
Published by Copernicus Publications. R. Qin et al.: HRLT: A high-resolution and long-term gridded dataset for surface temperature and precipitation
Researchers have previously used interpolation methods, such as inverse distance weighting, kriging, and regression analysis, to produce gridded meteorological data (Brinckmann et al., 2016; Herrera et al., 2019; Schamm et al., 2014). However, the accuracy of these interpolation results is limited by the density of the meteorological stations. In recent years, artificial intelligence, including machine learning methods such as random forest (Chen et al., 2021; Sekulić et al., 2021), artificial neural networks (Sadeghi et al., 2021), and support vector machines (He et al., 2021), has been gradually and widely applied to meteorological data estimation. Therefore, comprehensive statistical analyses that combine machine learning with traditional interpolation, such as thin-plate-smoothing splines, are feasible and reliable methods for estimating meteorological data.
At present, only a few research institutes in China are developing meteorological datasets for temperature and precipitation with high spatial and temporal resolutions. Among them, Beijing Normal University has produced meteorological datasets for 1958-2010 with a resolution of 1 km, but the latest data are not available (Li et al., 2014). The China Meteorological Administration is also developing the CMA Land Data Assimilation System product (Shi et al., 2011), and Tsinghua University has published a driving dataset from 1979 to 2018 with a resolution of 0.1° over China (He et al., 2020).
We present a new high-resolution daily gridded maximum temperature, minimum temperature, and precipitation dataset for China (HRLT) with a spatial resolution of 1 × 1 km for the period 1961 to 2019. We created the HRLT dataset using comprehensive statistical analyses, which included machine learning, the generalized additive model, and thin plate splines. It uses the 0.5° × 0.5° gridded dataset from the China Meteorological Administration (CMA) as input data together with other covariates, including elevation, aspect, slope, topographic wetness index (TWI), latitude, and longitude. The dataset was created in three steps: (1) preparation of input data and covariates; (2) the creation of the gridded dataset using comprehensive statistical analyses; and (3) an evaluation of the accuracy of the gridded dataset and an accuracy comparison with three other existing products that use meteorological station data.

The CMA dataset and meteorological station data
The CMA dataset, which includes the daily surface temperature 0.5° × 0.5° gridded dataset and the daily precipitation 0.5° × 0.5° gridded dataset for China (V2.0) (https://data.cma.cn/, last access: 15 September 2022), was obtained from the China Meteorological Data Service Centre and was used as the basic input data. The researchers also reported a daily precipitation 0.5° × 0.5° gridded dataset for 1961-2010 from the CMA dataset (Zhao and Zhu, 2015). The daily dataset of surface climatological data for China (V3.0) (https://data.cma.cn/, last access: 15 September 2022), which includes 699 meteorological stations, was also obtained from the China Meteorological Data Service Centre and was used to evaluate the new dataset (Fig. 1).

Topographic data
The basic topographic data, including elevation, flow direction, and flow accumulation with 30 arcsec (approximately 1 km) resolution, were obtained from the HydroSHEDS database. More detailed information can be found at http://www.worldwildlife.org/hydrosheds (last access: 15 September 2022; general information) and http://hydrosheds.cr.usgs.gov (last access: 15 September 2022; data download and technical information). The "Aspect" and "Slope" options of the Spatial Analyst Tools in ArcGIS 10.6 were used to calculate the aspect and slope. The specific catchment area (SCA) was calculated based on the flow direction and flow accumulation.
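The TWI is formulated as follows:

$$\mathrm{TWI} = \ln\left(\frac{\mathrm{SCA}}{\tan(\mathrm{slope})}\right),$$

where TWI and SCA are the topographic wetness index and specific catchment area, respectively, and slope is the local slope angle.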
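As a minimal illustration of the index, the sketch below evaluates the commonly used form TWI = ln(SCA / tan(slope)) for a single grid cell. It is written in Python for illustration only and is not the authors' ArcGIS workflow. The function name, the assumption that slope is supplied in degrees, and the small minimum slope used to avoid division by zero on flat cells are all assumptions introduced here.

```python
import math


def topographic_wetness_index(sca, slope_deg, min_slope_deg=0.1):
    """Return ln(SCA / tan(slope)) for one grid cell.

    sca           -- specific catchment area (must be positive)
    slope_deg     -- local slope in degrees (assumed unit)
    min_slope_deg -- hypothetical floor that keeps flat cells finite
    """
    slope_rad = math.radians(max(slope_deg, min_slope_deg))
    return math.log(sca / math.tan(slope_rad))


# A gentle slope with the same catchment area yields a wetter (larger) index.
twi_gentle = topographic_wetness_index(1000.0, 5.0)
twi_steep = topographic_wetness_index(1000.0, 30.0)
```

The floor on slope is only one of several possible conventions for flat terrain; a different choice changes the index on flat cells but not elsewhere.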

Other datasets
We used observed data from meteorological stations (Fig. 1) to evaluate our dataset and the three existing daily datasets, and then the accuracies of the three existing daily datasets were compared to that of our dataset. Data from the CMFD were also used for evaluation and comparison.

The input data and covariates
In this study, the input data (dependent variable) were taken from the daily 0.5° × 0.5° CMA dataset, which included the daily maximum temperature, minimum temperature, and precipitation. Other covariates (independent variables) included elevation, aspect, slope, TWI (with a spatial resolution of 1 km), latitude, and longitude.

The interpolation scheme
As shown in Fig. 2, different combinations of six algorithms, namely boosted regression trees (BRT), random forests (RF), neural network (NN), multivariate adaptive regression splines (MAR), support vector machines (SVM), and the generalized additive model (GAM), were used to predict the input data. Firstly, through k-fold cross-validation (k = 10), the input data were randomly divided into 10 sub-training datasets and sub-testing datasets. Each algorithm ran in a loop through all the sub-training sets and calculated the residuals from the sub-testing sets. The residuals obtained in each loop were retained. The residual of each algorithm was assigned a weight of 0-1, the weighted residuals of all the algorithms were summed, and the ensemble of models with the lowest residual sum was chosen. After determining the best ensemble of models, the surface results were interpolated using the best ensemble of models, input data, and covariates. Thin-plate-smoothing splines (TPS) were then used to correct the residual error from the ensemble of models: the residuals of the ensemble were calculated from the input data, and these values were interpolated using TPS. Surface results from the ensemble were added to the residuals from the thin-plate-smoothing splines to get the surface results for the final model. The R² of the surface result for the ensemble was compared to that of the final model, and the surface result with the higher R² was retained.
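One possible reading of the weighting step is sketched below in Python. It is not the authors' R code or the "machisplin" implementation. The sketch assumes that each algorithm has already produced held-out (cross-validated) predictions, that candidate weights are taken from a coarse grid and constrained to sum to 1, and that the ensemble with the smallest sum of squared held-out residuals is kept. The function names, the grid step, and the toy numbers are hypothetical.

```python
from itertools import product


def candidate_weights(n_models, steps=4):
    """Yield weight tuples on a grid of 1/steps whose entries sum to 1."""
    for combo in product(range(steps + 1), repeat=n_models):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def ensemble_sse(weights, cv_predictions, observed):
    """Sum of squared residuals of the weighted blend of held-out predictions."""
    sse = 0.0
    for i, obs in enumerate(observed):
        blended = sum(w * preds[i] for w, preds in zip(weights, cv_predictions))
        sse += (obs - blended) ** 2
    return sse


def best_ensemble(cv_predictions, observed, steps=4):
    """Return the weight tuple with the lowest held-out squared error."""
    return min(
        candidate_weights(len(cv_predictions), steps),
        key=lambda w: ensemble_sse(w, cv_predictions, observed),
    )


# Toy held-out predictions from three hypothetical models.
observed = [1.0, 2.0, 3.0, 4.0]
cv_predictions = [
    [1.2, 2.2, 3.2, 4.2],  # model A: biased high by 0.2
    [0.8, 1.8, 2.8, 3.8],  # model B: biased low by 0.2
    [3.0, 3.0, 3.0, 3.0],  # model C: constant prediction
]
best_weights = best_ensemble(cv_predictions, observed)
```

In this toy case the opposite biases of models A and B cancel, so the search gives them equal weight and drops model C, which mirrors the idea that an ensemble can outperform each of its members.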

The interpolation methods
We now introduce the individual algorithms (methods) and the implementations for model training (R packages and functions). After model training, the function "predict" in the R package "raster" was used to implement spatial interpolation for the BRT, RF, NN, MAR, SVM, and GAM models, and the function "interpolate" in the R package "raster" was used to perform spatial interpolation with TPS. More details on R packages and functions can be found on the web (https://www.rdocumentation.org/, last access: 15 September 2022).

Figure 2. The process of spatial interpolation. r1 to r6 are the residual errors from the algorithms, respectively. w1 to w6 are the weights of the algorithms, respectively. BRT, RF, NN, MAR, SVR, GAM, and TPS refer to boosted regression trees, random forests, neural network, multivariate adaptive regression splines, support vector machines, generalized additive model, and thin-plate-smoothing splines, respectively. R² is the coefficient of determination between the estimated and observed values. TWI is the topographic wetness index.

The BRT model
A powerful tool for exploratory regression analysis, BRT is a combination of two techniques: decision trees and the boosting method (Elith et al., 2008). BRT can automatically detect the best fit and is robust to missing values and outliers; therefore, BRT is now widely used in remote sensing, species distribution, and meteorological interpolation (Pouteau et al., 2011; Appelhans et al., 2015; Froeschke and Froeschke, 2011). There are two important parameters in BRT: (1) the tree complexity (TC), which controls the number of splits in each tree, and (2) the learning rate (LR), which determines the contribution of each tree to the growing model (the smaller the value of LR, the larger the number of trees built). Together, these two parameters determine the number of trees required for the best prediction, and they are chosen to find the combination that leads to the least prediction error. The function "gbm.step" in the R package "dismo" was used for BRT implementation. The tree complexity was set at 5 and the learning rate was set at 0.001. In addition, the "bag.fraction", which specifies the proportion of data to be selected at each step, was set at 0.5, and other parameters were set at their default values in "gbm.step".

The RF model
Like BRT, the main technology of RF also includes decision trees; however, the way in which the data used to build the trees are selected is different (the boosting method for BRT, the bagging method for RF). For regression analysis, the bagging method, which takes a random subset of all the data for each new tree that is built, makes the final output based on the average of multiple trees (Breiman, 2001). As it is one of the most accurate algorithms, RF has been used widely for predicting spatiotemporal variables, such as temperature and precipitation (He et al., 2016;Mital et al., 2020;Webb et al., 2016). The function "randomForest" in the R package "randomForest" was used for RF implementation. The importance was set to TRUE and other parameters were set to their default values in "randomForest".

The NN model
A powerful set of tools for solving problems in pattern recognition, data processing, and nonlinear control (Bishop, 1994), an NN consists of a large number of nodes and connections and includes an input layer, a hidden layer, and an output layer (Lek and Guégan, 1999). Information from each node in the input layer is fed to the hidden layer. Connections between input layer nodes and hidden layer nodes can all be given specific weights according to their importance. The connection between the hidden layer and the output layer is also weighted, so the output is the result of the weighted sum of the hidden nodes. Information is transferred between the hidden layer and the output layer through the transfer function. Since the 1980s, NNs have been used in a number of fields, such as for the prediction of meteorological variables (Snell et al., 2000;Lek and Guégan, 1999;Tang et al., 2020). The function "nnet" in the R package "nnet" was used for NN implementation. The number of units in the hidden layer (size) was set to 10, the transfer function was linear for the output layer (linout was set to TRUE), the maximum number of iterations (maxit) was set to 10 000, and other parameters were set to their default values in "nnet".
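The forward pass of such a network can be sketched as follows. This is an illustrative Python example rather than the "nnet" code used in the study. It assumes a logistic activation in the hidden layer and a linear output unit (corresponding to linout = TRUE); the function names, the three covariates, and the weights are hypothetical, and only two hidden units are used instead of the ten reported above to keep the example short.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation, assumed here for the hidden units."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Single-hidden-layer network with a linear output unit."""
    hidden = [
        logistic(bias + sum(w * xi for w, xi in zip(weights, x)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Linear output: weighted sum of the hidden activations plus a bias.
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


# Hypothetical input with three covariates. All-zero hidden weights make
# every hidden activation equal to 0.5, which keeps the arithmetic easy
# to follow: 1.0 + 2.0 * 0.5 + 4.0 * 0.5.
x = [0.2, -1.0, 0.5]
prediction = forward(
    x,
    hidden_weights=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[2.0, 4.0],
    output_bias=1.0,
)
```

Training consists of adjusting these weights and biases to reduce the prediction error, which is the part handled by the fitting routine and not shown here.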

The MAR model
MAR is an extension of the linear model that can build multiple linear regression models within the range of predictive variable values by partitioning data (Friedman, 1991; Friedman and Roosen, 1995). MAR consists of two steps. Firstly, it creates a set of so-called basis functions. In this process, the range of predictive variable values is divided into several groups, and a separate linear regression is modeled for each group. Secondly, MAR estimates a least-squares model with its basis functions as the independent variables. Overfitting is avoided by iteratively removing the basis functions that contribute least to the model fit. MAR works well with a large number of predictor variables, automatically detects interactions between variables, and is robust to outliers; therefore, studies have been done on downscaling or predicting meteorological data using MAR (Panda et al., 2022; D. H. W. Li et al., 2019; Zawadzka et al., 2020). The function "earth" in the R package "earth" was used for MAR implementation. A linear model was used to estimate the standard deviation as a function of the predicted response (varmod.method = "lm"). nfold was set to 10, ncross was set to 30, and other parameters were set to their default values in "earth".
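The basis functions in this family of models are hinge functions of the form max(0, x - c) and max(0, c - x), where c is a knot. The Python sketch below shows how a fitted model of this kind would be evaluated for one predictor. It illustrates the idea only and is not the "earth" implementation; the function names, the knot at 1000 m, and the coefficients are hypothetical values chosen for the example.

```python
def hinge(x, knot, direction):
    """Hinge basis: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum(coef * hinge) for a single predictor value.

    terms -- list of (coefficient, knot, direction) tuples
    """
    return intercept + sum(
        coef * hinge(x, knot, direction) for coef, knot, direction in terms
    )


# Hypothetical piecewise-linear relation between temperature (deg C) and
# elevation (m), with a single knot at 1000 m and a different slope on
# each side of the knot.
terms = [(-0.006, 1000.0, +1), (0.002, 1000.0, -1)]
below_knot = mars_predict(500.0, 15.0, terms)   # 15 + 0.002 * 500
above_knot = mars_predict(2000.0, 15.0, terms)  # 15 - 0.006 * 1000
```

Each pair of hinges lets the fitted line change slope at the knot, which is how the method builds separate linear pieces over different ranges of a predictor.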

The SVM model
SVM is another supervised machine learning algorithm and mainly deals with classification and regression (Vapnik, 1999, 1991; Brereton and Lloyd, 2010). SVM is well supported by mathematical theory and can use kernel tricks to efficiently process nonlinear data. As it has developed, SVM has also been widely used in the regression and prediction of meteorological variables (Belaid and Mellit, 2016; Chen et al., 2010; Tripathi et al., 2006). In this study, the function "ksvm" in the R package "kernlab" was used for SVM implementation, and all parameters were set to their default values in "ksvm".

The GAM model
The GAM is an extension of the generalized linear model (GLM). Like the GLM, the GAM consists of three important components: the probability distribution of the dependent variable, the linear predictor, and the link function; however, in the GAM, the coefficient of the independent variable in the linear regression is replaced by a sum of smooth functions (Hastie and Tibshirani, 1990;Liu, 2008). Because the GAM can deal with nonlinear and nonmonotone relationships between dependent and independent variables, it has been used to predict and interpolate meteorological data (Hjort et al., 2016;Burnett and Anderson, 2019;Aalto et al., 2013). The function "gam" in the R package "mgcv" was used for GAM implementation, and all parameters were set to their default values in "gam".

The TPS method
A traditional interpolation method, TPS has been widely used to spatially interpolate surface climate data (Gong et al., 2022; Hancock and Hutchinson, 2006; Risk and James, 2022). In this study, it was used to correct the residual error from the ensemble of models. The function "Tps" in the R package "fields" was used for TPS implementation. The matrix of independent variables consisted of the latitude and longitude, and the vector of dependent variables consisted of the residual errors from the above algorithms. Other parameters were set to their default values in the "Tps" function.

The interpolation implementation
A complete operation was performed per day per variable, so there were 64 647 operations (21 549 d × 3 variables) from 1 January 1961 to 31 December 2019 for maximum temperature, minimum temperature, and precipitation. A complete operation for a day per variable required a central processing unit core, 18 GB of operating memory, and 2 h of time.
In order to shorten the running time, we carried out parallel computing on a supercomputer platform. The spatial interpolation was executed in R version 4.0.2 (R Core Team, 2018), and the R package "machisplin" (Brown, 2019) was used as a reference for the implementation.
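The operation count quoted above can be checked with a short calculation. The Python sketch below recomputes the number of days and operations from the stated period and, assuming the quoted 2 h on one core applies to every operation, derives the total serial cost in core hours. That total is a derived figure introduced here, not one reported in the study, and the variable names are hypothetical.

```python
from datetime import date

# Inclusive number of days from 1 January 1961 to 31 December 2019.
n_days = (date(2019, 12, 31) - date(1961, 1, 1)).days + 1

# One operation per day per variable (maximum temperature, minimum
# temperature, and precipitation).
n_variables = 3
n_operations = n_days * n_variables

# Assumption: every operation takes the quoted 2 h on a single core.
hours_per_operation = 2
core_hours = n_operations * hours_per_operation
```

Under that assumption the serial cost is on the order of 10^5 core hours, which is why the work was distributed across many cores.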

Evaluation metrics
The mean absolute error (MAE), root mean square error (RMSE), Pearson's correlation coefficient (Cor), coefficient of determination after adjustment (R²), and Nash-Sutcliffe modeling efficiency (NSE) were used to evaluate the interpolation results. Pearson's correlation coefficient was used to evaluate the correlation between the simulated and observed values, and the other metrics are defined separately as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|S_i - O_i\right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^2},$$

$$R^2 = 1 - \frac{(n-1)\sum_{i=1}^{n}\left(S_i - O_i\right)^2}{(n-k-1)\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2},$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(S_i - O_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2},$$

where S_i and O_i are the model-predicted and the observed values, respectively; Ō is the mean of the observed values; n is the number of observations; and k is the number of independent variables. High Cor, R², and NSE values indicate close agreement between the predicted and observed values.

Figure 4. Spatial distributions of R² and MAE for daily maximum temperature, minimum temperature, and precipitation between 1961 and 2019. The value before the ± is the R² or MAE mean value and the value after the ± is the R² or MAE standard deviation for all meteorological stations.

Figure 5. The relationship between latitude and MAE of daily precipitation. The inset shows the relationship between rainfall frequency above light rainfall and MAE of daily precipitation. MAE is the mean absolute error. Cor is Pearson's correlation coefficient. Rain frequency is the rainfall frequency above light rainfall, which is defined as the daily rainfall from 0 to 4 mm (Alpert et al., 2002).
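The evaluation metrics named in this section can be sketched in a few lines of Python. The example uses the standard definitions of MAE, RMSE, Pearson's correlation, and NSE. For the adjusted R² it assumes one common form, 1 - (n - 1) SSE / ((n - k - 1) SST), which may differ in detail from the form used by the authors. The function names and the toy values are hypothetical.

```python
import math


def mae(sim, obs):
    """Mean absolute error."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def rmse(sim, obs):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def pearson(sim, obs):
    """Pearson's correlation coefficient."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def nse(sim, obs):
    """Nash-Sutcliffe modeling efficiency."""
    mean_o = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


def adjusted_r2(sim, obs, k):
    """Adjusted R2 in the assumed form 1 - (n-1) SSE / ((n-k-1) SST)."""
    n = len(obs)
    return 1.0 - (1.0 - nse(sim, obs)) * (n - 1) / (n - k - 1)


# Toy example: three exact predictions and one underestimate of 1 unit.
simulated = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 2.0, 3.0, 5.0]
scores = {
    "MAE": mae(simulated, observed),
    "RMSE": rmse(simulated, observed),
    "Cor": pearson(simulated, observed),
    "NSE": nse(simulated, observed),
    "R2_adj": adjusted_r2(simulated, observed, k=1),
}
```

Note that RMSE weights the single large miss more heavily than MAE does, which is consistent with the gap between MAE and RMSE reported for daily precipitation.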

Validation of temperature and precipitation
The spatial interpolation results, including daily maximum temperature, minimum temperature, and precipitation, were validated using meteorological station data. The results of the validation showed that the daily maximum and minimum temperatures were highly accurate (Fig. 3 and Table 1). The fitting slopes between the simulated and observed values were both close to 1 and the coefficients of determination after adjustment were 0.98 and 0.99, respectively, for daily maximum and minimum temperature (Fig. 3a and b). As shown in Table 1, the MAE was 1.07 and 1.08 °C and the RMSE was 1.62 and 1.53 °C for daily maximum and minimum temperatures, respectively. In addition, the Cor and NSE values were close to 1 for both the daily maximum and the daily minimum temperatures. Daily precipitation was less accurate than temperature, with an R² of 0.71 (Fig. 3c), which was mainly caused by underestimating the high daily precipitation. However, most of the points were concentrated in the low daily precipitation section. Furthermore, the MAE and RMSE for daily precipitation were 1.30 and 4.78 mm, respectively; the Cor between the simulated and observed daily precipitation was 0.84, and the NSE was 0.70 (Table 1).
The interpolation accuracy shows spatial differences (Fig. 4). The R² values of the daily maximum and minimum temperatures in Southwest China were less than 0.94 and lower than those for other regions (Fig. 4a and c). The mean absolute errors for the daily maximum and minimum temperatures at most meteorological stations were less than 1 °C. However, there were some meteorological stations with mean absolute errors of more than 2 °C, and these were evenly distributed across China (Fig. 4b and d). The R² value for daily precipitation at most meteorological stations was greater than 0.7 and the MAE decreased from south to north across China (Fig. 4e and f). For precipitation, the R² map (Fig. 4e) shows a west-east gradient in the scores, which is different from the north-south gradient present in the MAE map (Fig. 4f). There are fewer meteorological observation stations in the western region than in the eastern region, which may lead to the subtle east-west gradient in R² for daily precipitation. The obvious north-south gradient for MAE of daily precipitation could be caused by the rainfall frequency (Figs. 4f, 5); the MAE of monthly precipitation in China from another study showed a similar pattern. Rainfall frequency above light rainfall, which is defined as daily rainfall ranging from 0 to 4 mm (Alpert et al., 2002), is strongly correlated with the MAE of daily precipitation (inset in Fig. 5), so that the MAE of daily precipitation in the southern region with a higher rainfall frequency is larger than that in the northern region with a lower rainfall frequency.
The meteorological stations were divided into the middle and lower reaches of the Yangtze River (MLYR), North China (NC), Northeast China (NEC), Northwest China (NWC), South China (SC), and Southwest China (SWC) (Fig. 1) according to their diverse geographic and climatic conditions and administrative areas. The trend in the cumulative distribution function curve of the difference between the simulated and observed values was always similar for daily maximum temperature, minimum temperature, and precipitation in the six regions, as well as for the whole of China. The daily maximum temperatures were all underestimated in the MLYR, NEC, NWC, SC, and SWC (Fig. 6a). The daily minimum temperatures were all underestimated in the MLYR, NC, NWC, SC, and SWC (Fig. 6b). For both daily maximum and minimum temperatures, the lowest average difference between the simulated and observed values occurred in NC and NEC, while the greatest difference occurred in SWC (Fig. 6a and b). Except in the NWC region, the average difference between simulated and observed values for daily precipitation was less than 0 mm in the regions (Fig. 6c). The largest average difference between simulated and observed values for daily precipitation occurred in the SC region, with a value of 0.49 mm (Fig. 6c). Across the whole of China, the average difference between simulated and observed values for daily maximum temperature, minimum temperature, and precipitation was 0.36 °C, 0.30 °C, and 0.12 mm, respectively.

Temporal and spatial distributions of temperature and precipitation
The results showed that detailed spatial changes in temperature and precipitation over time could be obtained (Fig. 7). For example, the increases in the annual average values of both maximum temperature and minimum temperature were obvious over the Tibetan Plateau from 1965 to 2010 (Fig. 7a-h, the d1 and h1 subregions). In addition, compared with other years, the annual average daily minimum temperature clearly increased in some areas of NWC (Fig. 7e-h, the h2 and h3 subregions) and MLYR (Fig. 7e-h, the h4 subregion) in 2010. The most significant annual precipitation changes occurred in NEC (Fig. 7i-l, the l1 subregion) between 1965 and 2010. The distributions of annual average daily maximum and minimum temperatures and annual precipitation across the six regions of China in 1965, 1980, 1995, and 2010 were analyzed (Fig. 8). Compared with other years, the areas with smaller values for annual average daily maximum temperature (less than 0 °C) and annual average daily minimum temperature (less than −10 °C) in SWC and NWC decreased in 2010 (Fig. 8a1, a2, b1, b2). These areas are mainly distributed on the Qinghai-Tibet Plateau, which has seen a large increase in temperature over the past few decades. The density distribution peaks for the annual average daily maximum and minimum temperatures in NEC moved to the right from 1965 to 1995 but moved to the left in 2010 (Fig. 8a3 and b3). The mean annual average daily minimum temperature in 2010 was higher in the MLYR, NC, and SC than in the other 3 years (Fig. 8b4-b6). There was an increase in mean annual precipitation in the northern part of China over the period 1965-2010 (Fig. 8c2-c4). It increased from 335 to 415 mm across NWC (Fig. 8c2), from 487 to 593 mm across NEC (Fig. 8c3), and from 531 to 654 mm across NC (Fig. 8c4).
In the MLYR, there were more areas with an annual precipitation of less than 1000 mm, and areas with an annual precipitation of more than 2000 mm increased in 1995 and 2010 compared with 1965 and 1980 (Fig. 8c5). Similarly, compared with other years, there were more areas with an annual precipitation of less than 1000 mm and more than 2000 mm in SC in 2010 (Fig. 8c6).

Accuracy comparison with other products
The daily temperatures and precipitation generated by the CMFD, CLDAS, and ISIMIP3a were evaluated against observations from all the meteorological stations, and their performances were compared with that of our dataset (Figs. 9-11; Tables 2-4). The fitting slopes between the simulated and observed daily temperature values were always close to 1 for all datasets. The R² for the CMFD daily average temperature was slightly smaller than that for daily minimum temperature in our dataset (Fig. 9b and c), but was equal to that of our dataset for daily maximum temperature (Fig. 9a and c). The Cor and NSE for the CMFD daily average temperature were also similar to those for our estimated daily maximum and minimum temperatures (Table 2). By contrast, the MAE and RMSE for the CMFD daily average temperature were 1.12 and 1.64 °C, respectively, which were greater than those for our estimated daily maximum and minimum temperatures (Table 2). The MAEs of daily maximum and minimum temperature for our dataset were 1.07 and 1.08 °C, respectively, and the RMSEs of daily maximum and minimum temperature for our dataset were 1.63 and 1.54 °C, respectively, between 1979 and 2018 (Table 2). The R², Cor, NSE, MAE, and RMSE for the CLDAS daily maximum temperature were 0.91, 0.95, 0.90, 2.54 °C, and 3.63 °C, respectively. Accuracy was clearly improved for our daily maximum temperature, and the corresponding metrics were 0.98, 0.99, 0.98, 1.10 °C, and 1.73 °C (Fig. 10a and b; Table 3). The MAE and RMSE for the CLDAS daily minimum temperature were clearly higher than our estimates for daily minimum temperature, and the R², Cor, and NSE for daily minimum temperature in our dataset were higher than those for the CLDAS daily minimum temperature (Fig. 10c and d; Table 3), thus indicating that the accuracy of our daily minimum temperature estimates was superior to that of the CLDAS daily minimum temperature product. Compared with those of the ISIMIP3a, the R², Cor, and NSE of the daily maximum and minimum temperatures in our dataset were always higher and the MAE and RMSE of those temperatures were always smaller (Table 4). The R² value for our estimated daily precipitation was clearly improved compared to the other three datasets (Table 4). Thus, the daily precipitation accuracy of our dataset was generally higher than those of CMFD, CLDAS, and ISIMIP3a.

Figure 7. … annual precipitation in 1965, 1980, 1990, and 2010. The regions within the ellipses are where the change is most visible.

Figure 8. … 1965, 1980, 1990, and 2010. The values shown in the plots are mean values.

Table notes: MAE, RMSE, Cor, and NSE are the mean absolute error, root mean square error, Pearson's correlation coefficient, and Nash-Sutcliffe modeling efficiency, respectively. N is the number of observations. Period shows the first and last years covered by the data.

Data availability
The HRLT dataset includes daily maximum temperature, minimum temperature, and precipitation at 1 km spatial resolution across China from January 1961 to December 2019. The datasets are publicly available in NetCDF format at https://doi.org/10.1594/PANGAEA.941329 (Qin and Zhang, 2022).

Conclusions
The result of this study is a long-term (1961-2019), high-resolution (1 km) daily gridded maximum temperature, minimum temperature, and precipitation dataset across China (HRLT). The HRLT dataset shows a high correlation overall with the observations from meteorological stations for daily maximum and minimum temperatures (R² was 0.98 and 0.99, respectively; Cor was 0.99 for both; NSE was 0.98 and 0.99, respectively), and the errors were small (MAE was 1.07 and 1.08 °C, respectively; RMSE was 1.62 and 1.53 °C, respectively). Although the HRLT dataset showed that the daily precipitation accuracy was lower than the daily temperature accuracy (R², Cor, NSE, MAE, and RMSE were 0.71, 0.84, 0.70, 1.30 mm, and 4.78 mm, respectively), the daily precipitation data in the HRLT dataset were more accurate and had a finer spatial resolution compared to three other existing datasets (CMFD, CLDAS, and ISIMIP3a). Furthermore, the accuracies for daily maximum and minimum temperatures and precipitation were lower in the southwestern part of China, probably because of the complex topography in that area compared to other areas. Calculation and interpolation by subregion may solve this problem in future studies. The use of satellite data as an input covariate in future studies will further improve the accuracy of the HRLT dataset, especially for precipitation. The HRLT dataset will help identify future extreme climatic events and can also be used to improve process-based models for prediction, adaptation, and mitigation strategies.
Author contributions. RQ and FZ calculated the dataset, analyzed the results, and wrote the manuscript; all other authors reviewed and revised the manuscript.

Competing interests.
The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.