An accurate spatially continuous air temperature data set is crucial for
multiple applications in the environmental and ecological sciences. Existing
spatial interpolation methods have relatively low accuracy, and the
resolution of available long-term gridded products of air temperature for
China is coarse. Point observations from meteorological stations can provide
long-term air temperature data series but cannot represent spatially
continuous information. Here, we devised a method for spatial interpolation
of air temperature data from meteorological stations based on powerful
machine learning tools. First, to determine the optimal method for
interpolation of air temperature data, we employed three machine learning
models: random forest, support vector machine, and Gaussian process
regression. A comparison of the mean absolute error, root mean square error,
coefficient of determination, and residuals revealed that a Gaussian process
regression had high accuracy and clearly outperformed the other two models
regarding the interpolation of monthly maximum, minimum, and mean air
temperatures. The machine learning methods were compared with three
traditional methods used frequently for spatial interpolation: inverse
distance weighting, ordinary kriging, and ANUSPLIN (Australian
National University Spline). Results showed that the Gaussian process
regression model had higher accuracy and greater robustness than the
traditional methods regarding interpolation of monthly maximum, minimum, and
mean air temperatures in each month. A comparison with the TerraClimate
(Monthly Climate and Climatic Water Balance for Global Terrestrial
Surfaces), FLDAS (Famine Early Warning Systems Network (FEWS NET) Land Data
Assimilation System), and ERA5 (ECMWF, European Centre for Medium-Range Weather Forecasts, Climate Reanalysis) data sets revealed
that the accuracy of the temperature data generated using the Gaussian
process regression model was higher. Finally, using the Gaussian process
regression method, we produced a long-term (January 1951 to December 2020)
gridded monthly air temperature data set, with 1 km resolution and high
accuracy for China, which we named GPRChinaTemp1km. The data set consists of
three variables: monthly mean air temperature, monthly maximum air
temperature, and monthly minimum air temperature. The obtained
GPRChinaTemp1km data were used to analyse the spatiotemporal variations of
air temperature using Theil–Sen median trend analysis in combination with
the Mann–Kendall test. It was found that the monthly mean and minimum air
temperatures across China were characterised by a significant trend of
increase in each month, whereas monthly maximum air temperatures showed a
more spatially heterogeneous pattern, with significant increase,
non-significant increase, and non-significant decrease. The GPRChinaTemp1km
data set is publicly available at
Air temperature is a fundamental variable in various research fields that include the impact of global warming and climate change, ecology, hydrology, agriculture, and human health (Sippel et al., 2020; Abatzoglou et al., 2018; Pathak et al., 2018; Chen et al., 2018). The monthly temperature data are crucial for multiple studies and applications, such as agriculture (Meshram et al., 2020), meteorological disasters (Tigkas et al., 2019), and ecology (Leihy et al., 2018). Long-term records of air temperature data with high spatial resolution are necessary for such research. Generally, air temperature data are measured by meteorological station networks or simulated using numerical climate models (dos Santos, 2020; Fu and Weng, 2018). Meteorological stations can provide long-term point-based information on observed air temperature; however, they cannot reflect spatially continuous information regarding regional air temperature. The downscaling technique is often used to obtain high-resolution data sets using coarse-resolution products, while there are multiple low spatial resolution data sets, such as the Climatic Research Unit (CRU) (Harris et al., 2014), the Global Precipitation Climatology Centre (GPCC) (Schneider et al., 2014; Becker et al., 2013), and Willmott & Matsuura (W&M) (Matsuura and Willmott, 2012), are generated using data from the observational stations. Interpolation is a reliable way to obtain spatially continuous data (Peng et al., 2019). The reanalysis-based data usually have low spatial resolution, which limits their ability to reflect the effects of complex topographies, land surface characteristics, and other processes on climate systems (Peng et al., 2019; Xu et al., 2017). Besides this, some reanalysis products have uncertainty per se (Tang et al., 2020b; Yin et al., 2021).
Various interpolation techniques that include inverse distance weighting (IDW) and ordinary kriging (OK) (Dawood, 2017; Li et al., 2011a, 2012; Hadi and Tombul, 2018; Stahl et al., 2006; Benavides et al., 2007; Duhan et al., 2013) are often employed to derive gridded temperature data sets for data-sparse areas. However, the accuracy of the derived results depends on the density of the meteorological stations used for the interpolation (Wang et al., 2017; Peng et al., 2019; Gao et al., 2018; Peng et al., 2014). Using conventional methods for data interpolation in areas with uneven coverage of meteorological stations could diminish the accuracy of the derived data (dos Santos, 2020; X. Li et al., 2018). The network of meteorological stations in China is characterised by irregular spatial coverage. For example, the observation network has low density in mountain areas (Gao et al., 2018; dos Santos, 2020; Guo et al., 2020), especially on the Tibetan Plateau (Xu et al., 2018; Zhang et al., 2016). Additionally, the number of meteorological stations operational in China in the 1950s was low. Therefore, the use of conventional interpolation methods cannot guarantee the accuracy of the derived spatial data sets of air temperature across China. Although various air temperature products are available, e.g. the TerraClimate (Abatzoglou et al., 2018), FLDAS (McNally et al., 2017), and ERA5 (Copernicus Climate Change Service, 2017) data sets, their spatial resolution is usually coarse (2.5 arcmin, 0.1 arcdeg, and 0.25 arcdeg, respectively), which restricts their ability to reflect the topographical characteristics and spatial heterogeneity of air temperature across China (Peng et al., 2019; Zhang et al., 2016). Thus, the demand remains for long-term spatially continuous data sets of air temperature with a high spatial resolution.
In comparison with traditional interpolation techniques, machine learning methods are better able to model nonlinear and highly interactive relationships (Xu et al., 2018). Using mud content samples from the southwest margin of Australia, Li et al. (2011a) proved the superior performance of machine learning methods in application to spatial interpolation of environmental variables. The subsequent application of machine learning methods further confirmed their effectiveness as tools for interpolation of environmental variables, in which secondary information considered, such as slope, latitude, and longitude, can improve the performance of machine learning (Li et al., 2011b, Appelhans et al., 2015; Zhu et al., 2018; Alizamir et al., 2020; Kisi et al., 2017). Many previous studies have demonstrated the potential of machine learning techniques in application to the estimation of air temperature in small regions, although most of these studies interpolated air temperature using satellite-derived predictors, such as the Land Surface Temperature and Normalised Difference Vegetation Index based on Moderate Resolution Imaging Spectroradiometer (MODIS) products (Appelhans et al., 2015; dos Santos, 2020; Meyer et al., 2016; Xu et al., 2018; Zhang et al., 2016; Yoo et al., 2018). However, MODIS data are only available from 2000, which means that air temperature in earlier years cannot be interpolated using such products. Moreover, optical remote sensing images are easily affected by clouds, limiting the ability of associated models to produce long-term spatially continuous data sets for air temperature across large regions such as China (Dong and Xiao, 2016; Mao et al., 2019; Xiao et al., 2018, pp. 2013–2016). Therefore, it is necessary to develop a universal model to interpolate long-term air temperature data sets for China. However, how best to design a simple and accurate model for temperature interpolation using machine learning remains unclear.
To interpolate air temperature across China, we employed three machine learning approaches: random forest (RF), support vector machine (SVM), and Gaussian process regression (GPR). Both RF and SVM have been proven effective in previous studies on remote-sensing-based air temperature estimation studies (Yoo et al., 2018; Zhang et al., 2016; Ho et al., 2014; Zeng et al., 2021). GPR is a powerful state-of-the-art probabilistic non-parametric regression method (Calandra et al., 2016; Schulz et al., 2018), which has produced satisfactory results regarding the prediction of daily river temperature (Zhu et al., 2018; Grbić et al., 2013) but has rarely been used for air temperature estimation. In this study, we utilised the RF, SVM, and GPR machine learning methods to develop a model for interpolation of long-term air temperature data for China.
The ultimate objective of the study is to produce a long-term high-resolution spatially continuous monthly air temperature product for China, based on meteorological station data and the best-performing model constructed using machine learning techniques. The specific variables contained in the generated product include monthly mean air temperature (Tmean), monthly maximum air temperature (Tmax), and monthly minimum air temperature (Tmin) from January 1951 to December 2020 across China.
Observational data of monthly Tmax, Tmin, and Tmean recorded from January 1951 to December 2020 at meteorological stations distributed across China
were downloaded from the China Meteorological Data Service Centre
(
Elevation and spatial distribution of meteorological stations across China (70 % were used for training and 30 % were used for testing).
The topographic data used in this study comprised a digital elevation model
(DEM) obtained from the NASA Shuttle Radar Topographic Mission (SRTM)
(
We used three existing temperature products for comparison: (1) the Monthly
Climate and Climatic Water Balance for Global Terrestrial Surfaces,
University of Idaho (TerraClimate) data set (resolution: 2.5 arcmin,
about 4.6 km)
(
The spatial distribution of air temperature is closely related to latitude, longitude, and elevation (Shao et al., 2012). The use of such auxiliary data can help alleviate, to a certain extent, the limitation of spatial interpolation associated with the sparse and irregular distribution of meteorological stations, and increase estimation accuracy (Chen et al., 2015; Alvarez et al., 2014; Li and Heap, 2011; Newlands et al., 2011). Figure 2 displays the correlation coefficients between air temperature (i.e. Tmax, Tmin, and Tmean) and the above three geographical variables. Note that the correlation coefficient value for each month represents the average of all years (1951–2020), which was obtained based on all the observed data from meteorological stations. The box plots of the correlation coefficient for each month are provided in Fig. S2. Overall, Tmax, Tmin, and Tmean have positive (negative) correlations with respect to longitude (latitude and elevation). Longitude and elevation have opposite correlations, but a similar trend with Tmax, Tmin, and Tmean, i.e. a reasonably high correlation during summer (June–August) and low correlation during winter (December–February). Latitude is correlated negatively with Tmax, Tmin, and Tmean, i.e. strong (weak) correlation in winter (summer). It is evident that strong regularity exists in the relationships between air temperature and longitude, latitude, and elevation. In the subregions of mainland China, the relationships between temperature and the variables still hold. Thus, we chose the three variables as predictor variables to obtain the gridded temperature raster from the point observations. Owing to the incompleteness of remote sensing data attributable to imaging time constraints and cloud contamination, we did not consider satellite-derived independent variables. We considered only longitude, latitude, and elevation as predictor variables to give the derived model the advantages of ease of use, generalisability, and universality.
Correlation coefficients between
RF, proposed by Breiman (2001), has been used widely for the regression of
geographical variables. RF is an ensemble machine learning method that
consists of multiple decision trees. RF can produce high rates of accuracy,
and the performance of RF in predicting new data is determined by the
aggregation of the results of all the trees (Hengl et al., 2018). The
randomisation of RF lies in two aspects: the random selection of training
samples for a tree through bagging (a form of bootstrapping) and the random
selection of predictor variables as the splitting attributes at each node of
the tree (Merghadi et al., 2020; Yoo et al., 2018). The randomness of RF
makes it resistant to the problem of overfitting. RF, which has been
demonstrated as being promising and flexible in dealing with heterogeneity in the
geographical environment, has been applied to the prediction of spatial and
temporal variables (Hengl et al., 2018; Zeng et al., 2021; Yoo et al.,
2018). For further detailed information regarding RF, the reader is referred
to Breiman (2001). We used the ensemble algorithm for regression in MATLAB
R2020b for the RF implementation and used the default parameters. The
minimum observations per leaf were set at 8 and the number of ensemble
learning cycles was set at 30. The reader is referred to the MATLAB help
centre for further details
(
SVMs, developed by Vapnik (2013), utilise the inductive principle of
structural risk minimisation to obtain the overall optimal response. SVMs
transform input data from lower-dimensional into a high-dimension space
based on a series of kernel functions (Fan et al., 2018). The input space
and the output space are non-linearly related in real applications, and the
limitation is solved by mapping the input space on to higher dimension. In
regression applications, an optimal hyperplane is constructed that is as
close to as many samples as possible. The SVM does not only consider the
error approximation to the data, but also the model generalisation. SVMs have
been used widely in various fields such as meteorology, hydrology, and
agriculture for regression and prediction applications (Ghorbani et al.,
2017; Shrestha and Shukla, 2015; Fan et al., 2018). Detailed information
regarding SVMs can be found in Vapnik (2013). The Gaussian kernel was adopted
as the kernel function of SVMs and the kernel scale parameter was set to 1.7.
The value of epsilon is an estimate of a 10th of the standard deviation
using the interquartile range of the response variable (by default). The box
constraint value for the Gaussian kernel function was obtained by dividing
the interquartile range of the response variable by 1.349. The box
constraint and the epsilon hyperparameters vary from month to month
according to the training data of each month. The predictors were
standardised in the SVM model. The reader is referred to the MATLAB help
documentation for further technical details
(
GPR is a non-parametric Bayesian technique for solving nonlinear regression
problems (Grbić et al., 2013). GPR was originally proposed to provide a
“principle, practical, and probabilistic approach to learning in kernel
machines” (Rasmussen, 1997, 2004). GPR is based on Bayesian theory and
statistical learning theory, which is applicable to regression problems
(Zhang et al., 2019). GPR has strength in its seamless combination of
several machine learning tasks, such as model training, hyperparameter
estimation, and uncertainty estimation, which can compute the prediction
intervals using the trained model (Sun et al., 2014; Zhu et al., 2018). GPR
has been utilised in diverse applications that include model approximation,
experiment design, and multivariate regression (Zhu et al., 2018; Karbasi,
2018). However, previous applications of GPR to the prediction of air temperature
have been limited. For detailed information regarding the GPR model, the
reader is referred to Rasmussen (1997, 2004). The explicit basis in the GPR
model is “constant” and the kernel function of the GPR algorithm is the
exponential kernel. The predictor variables were standardised in the GPR
model. The reader is referred to the MATLAB help documentation for further
details regarding GPR (
We extracted the independent variables (i.e. latitude, longitude, and elevation) relating to the meteorological stations and randomly divided the processed data into a set for model training (70 %) and a set for model evaluation and validation (30 %). For each month, we used the temperature data of a month (training set) to train the model and then used this model to generate the grid data of the same month. When training the models, the 10-fold cross-validation was used. We constructed a model for each month separately, which means we have 840 models for the 840 months from 1951 to 2020.
We used three metrics to evaluate model performance: mean absolute error
(MAE), root mean square error (RMSE), and the coefficient of determination
(
The Theil–Sen slope estimator used in combination with Mann–Kendall (MK) detection, which is an effective approach for trend analysis that reflects the variation in trends of each pixel in a time series, has been used widely in various fields such as hydrology and meteorology (Cai and Yu, 2009; Gocic and Trajkovic, 2013; Jiang et al., 2015). In this study, we used the Theil–Sen estimator coupled with the MK test to detect the trend of the temperature time series.
The Theil–Sen estimator, which is a robust non-parametric approach for
estimating the slope of a trend, has been used widely in relation to
hydrometeorological time series data (Jiang et al., 2015; Gocic and
Trajkovic, 2013; Shifteh Some'e et al., 2012; Sayemuzzaman and Jha, 2014).
The Theil–Sen slope estimator, which represents the magnitude of a trend,
can be expressed as (Theil, 1950; Sen, 1968):
The MK test quantifies the significance of a trend. It is a non-parametric
statistical test, meaning that it does not require samples to follow
specific distributions and is not influenced by outliers. The MK test has
frequently been applied to measure the significance of trends in
hydrological and meteorological time series data (Jiang et al., 2015;
Shifteh Some'e et al., 2012; Da Silva et al., 2015; Gocic and Trajkovic,
2013). The
We used the testing data set to evaluate the performance of each model.
Figure 3a–c presents the MAE, RMSE, and
The residuals were obtained as the observed values minus the predicted values. Figure 4 shows box plots of the residuals for Tmean for the test meteorological stations each month during 1951–2020. Overall, the mean residuals of the three models are generally close to 0 and the residuals are smaller during the warm months (June–September) than during the cool/cold months (October–April), particularly for RF and SVM. In comparison with SVM and RF, GPR has the most stable accuracy over the 12 months, i.e. the difference in the residuals among the months is relatively small. GPR also has a quantile range that is narrower than that of the other models. For Tmax and Tmin, the bias of GPR over the 12 months is smaller than that of both RF and SVM (Figs. S6 and S7). Additionally, the accuracy of the estimated Tmax is higher than that of Tmin, consistent with the findings of Tang et al. (2020a). The results show that the GPR model could be a better choice than either RF or SVM to estimate Tmean, Tmax, and Tmin for China. The frequency distributions of the residuals of the three machine learning models for Tmean, Tmax, and Tmin are provided in the Supplement (Figs. S8–S10), in which it can be found that GPR generally has the greatest concentration of residuals close to zero.
Residuals of the monthly Tmean predicted by the machine learning models with respect to in situ Tmean for the test meteorological stations. Note that the average of the residuals of Tmean from 1951–2020 for each test meteorological station is shown for each month. See Figs. S6 and S7 for the residual graph of Tmax and Tmin.
The spatial distribution of the average values of the residuals of the GPR
results for Tmean throughout the 70 years (1951–2020) at each of the test
meteorological stations is displayed in Fig. 5.
Most areas have relatively low absolute residuals, although certain stations
in some western areas have relatively high residuals. In January and
December, the number of stations with high absolute residuals (
Spatial distribution of residuals between the observed Tmean and the predicted Tmean by GPR for the test meteorological stations for each month. Note that the exhibited residuals are the average residual of 70 years (1951–2020) for each month.
According to the model evaluation, we concluded that GPR is the best model
for estimating air temperature across China. Therefore, we employed the GPR
model to generate the long-term spatial data set of Tmean, Tmax, and Tmin
from January 1951 to December 2020, which we named GPRChinaTemp1km.
Figure 6 illustrates the spatial pattern of Tmean
estimated by GPR in 2020. The differences between northwestern and
southeastern regions are remarkable. Generally, Tmean decreases from the
southeast toward the northwest. In winter, the temperature range between
northern and southern China is large, whereas the temperature range in
summer is relatively small. The lowest Tmean (
Spatial distribution of monthly Tmean predicted by GPR across China for each month in 2020. Note that only the maps for 2020 are presented as an example (all the data are available in the China GPRChinaTemp1km database).
A Theil–Sen median trend analysis was integrated with the MK test and the results were classified into four categories: significant increase, non-significant increase, significant decrease, and non-significant decrease. Figure 7 shows that the trend of the variation of Tmean (1951–2020) in China is dominated by significant increase in each month. There is only a small region in northwestern China that has significant decrease in Tmean in January and December. We found that there is always a small region showing a different trend in comparison with surrounding areas in the Xinjiang Uygur Autonomous Region in northwestern China, which is characterised by a decreasing trend in most months and non-significant increase in the hot months (June–September). This phenomenon could be related to the complex conditions of the region. For example, Bayinbuluke is an intermountain basin surrounded by the Tianshan Mountains, with an alpine wetland ecosystem in the arid temperate zone. During summer (June–August), Tmean shows distinct non-significant decrease in central areas of China. In December, the spatial differentiation is the most remarkable and the increasing trend in most of eastern China is non-significant, which differs from that of other months. There is also a region representing a trend of non-significant decrease on the Yungui Plateau in southwestern China. Overall, the trend of Tmean in China during 1951–2020 shows significant increase in each month, while only a few areas have a trend of decrease. The distribution of the mean temperature trend in China in our study agrees with the existing literature (Dong et al., 2015; Sun et al., 2018; You et al., 2021; Cui et al., 2017). Tmax is characterised by significant increase, non-significant increase, and a non-significant decreasing trend (Fig. S21). Tmin exhibits a spatial pattern similar to that of Tmean, showing a significant increasing trend in most areas in each month (Fig. S22).
Monthly trends of Tmean changes in China during 1951–2020 obtained by the Theil–Sen median slope analysis. The significance of the trends is quantified by the Mann–Kendall statistical test at the 95 % confidence level. The separate Theil–Sen trend analysis and MK test results for Tmean, Tmax, and Tmin are provided in the Supplement (Figs. S15–S20).
Two traditional methods used widely for spatial interpolation are IDW and OK
(Li and Heap, 2014, 2011). In this study, we used ANUSPLIN in addition to
IDW and OK for comparison with the machine learning models. ANUSPLIN, which
is professional interpolation software that uses the thin-plate smoothing
spline algorithm (Hutchinson, 1995, 2004; Xu and Hutchinson, 2013), has been
used to create many climatic data sets, such as the monthly Climatic Research
Unit data set (New et al., 2000) and the WorldClim data set (Fick and Hijmans,
2017; Hijmans et al., 2005). We compared the interpolation results derived
using the machine learning models with the results obtained using the
traditional methods to further assess the interpolation power of the machine
learning methods regarding air temperature across China. The accuracy
metrics (Fig. 8) show that the performances of GPR, SVM, and ANUSPLIN are of
a similar level, while RF, IDW, and OK perform less well. Both IDW and OK
have relatively high interpolation errors with higher MAEs and RMSEs than
GPR and SVM (Fig. 8). Overall, IDW and OK do not
perform well in July and January of all the studied years.
Figure 9 shows scatter plots of the observed monthly
Tmean and Tmean estimated by the six models for January and July in 2020. It
can be seen that OK and IDW both have clear differences between January and
July (Fig. 9g, h, i, and j), in which the points
are relatively widely dispersed in July. GPR, SVM, and ANUSPLIN are slightly
affected by the seasonal variation with lower errors (i.e. lower MAEs and
RMSEs) in July (Fig. 9). As shown in
Fig. 3, the RMSE, MAE, and
Comparison of the performances of the six models for Tmax and Tmin reveals that GPR performs better in terms of Tmax and has the lowest errors (MAEs and RMSEs) in almost all the studied months (Fig. S23). OK and IDW have similar performances, consistent with the findings of previous related studies (Plouffe et al., 2015; Li et al., 2011a). It is noticeable that IDW and OK perform relatively poorly. Both IDW and OK depend on the spatial autocorrelation of air temperature and cannot capture the geomorphic characteristics of the interpolation area because neither method includes elevation information (Ozelkan et al., 2015; Wang et al., 2017; Li et al., 2011a). Unlike IDW and OK, ANUSPLIN considers longitude, latitude, and elevation (Hijmans et al., 2005). The frequency distributions of the residuals for Tmean, Tmax, and Tmin of the six models for the same months as in Fig. 9 are presented in the Supplement (Figs. S24–S26). The distributions follow a normal distribution and the residuals of GPR, SVM, and ANUSPLIN are concentrated mainly around zero. Scatter plots of Tmean, Tmax, and Tmin for the same periods as shown in Fig. 9 are provided in the Supplement (Figs. S27–S49), in which the robustness of GPR is clearly demonstrated for Tmean, Tmax, and Tmin in comparison with other methods. Studies have shown that Gaussian processes are one of the most intuitive techniques for modelling spatial surfaces (Yu et al., 2017; Berger et al., 2001). Besides this, we conducted an experiment by cutting out a region using a 300 km square, and we used all stations inside this square for testing and stations outside it for training models to estimate the robustness of the GPR models. The results indicated high accuracy and good robustness of the GPR models (Figs. S50–S52).
Accuracy of Tmean derived from the machine learning methods and traditional methods for January and July during 1951–2020 with an interval of 10 years.
Scatter plots of Tmean estimated by the machine learning models and traditional models against observed monthly mean temperature in January and July 2020.
Figure 10 presents maps of the residuals (observed values minus estimated values) for Tmean in January and July 2020 estimated using the six methods. Bias is apparent in RF (Fig. 10c and d), IDW (Fig. 10g and h) and OK (Fig. 10i and j). A comparison of the maps of the residuals reveals that Tmean estimated by GPR generally agrees well with the in situ data, with large bias at only a few stations that are distributed mainly in western and northern China, which might be related to the scarcity of meteorological stations and the complex regional topography (Ji et al., 2015). It is also evident that the absolute residuals in July are generally lower than those in January (Fig. 10). For China, the spatial homogeneity of temperature in summer is stronger than that in winter, which might be one reason for the lower bias observed in July. We note that RF has poor performance in comparison with the other machine learning methods. Although we do not have sufficient evidence to deduce the causes for the lower accuracy of RF, the small number of meteorological stations might be a major reason. Additionally, RF regression has a limitation regarding the conditions beyond the range of the training data set because only the values included in the training data are used for splitting the trees (Mutanga et al., 2012; Jeong et al., 2016). Among the three machine learning algorithms, GPR and SVM both perform relatively well, although the performance of GPR is better. Note that we used the medium Gaussian SVM and exponential GPR in MATLAB R2020b. GPR and SVM are both non-parametric kernel-based models that rely on the Gaussian principle. The Gaussian function has the desired characteristics of being an inverse-distance algorithm and a smoothing filter (Thornton et al., 1997), which might explain the better performances of GPR and SVM. The comparison with Peng's data (Peng et al., 2019) in the Tibetan Plateau region shows the strength of the GPR data in regions with complicated topography (Fig. S55).
Comparison of the spatial distribution of the residuals between machine learning methods and traditional methods for Tmean in January and July 2020 (comparisons for Tmax and Tmin are provided in Figs. S53 and S54).
In summary, ANUSPLIN is an interpolation method that is better than IDW and OK in modelling air temperature over complex terrain (Plouffe et al., 2015; Newlands et al., 2011). However, the robustness of ANUSPLIN is no better than that of GPR. Moreover, ANUSPLIN is based on the principle of thin-plate splines, the skill of which can be limited in regions with high elevations and sparse observations, i.e. areas such as the Tibetan Plateau (Jobst et al., 2017). Furthermore, in our study, running ANUSPLIN was more time-consuming in comparison with running the GPR model, making it difficult to generate long-term monthly data sets for all 12 months over 70 years. The spatial maps of temperature generated by the six models (Figs. S56–S58) reveal that GPR obtained reasonable results for Tmean, Tmax, and Tmin. In the case of Tmax and Tmin, ANUSPLIN does not appear to have a rational range for Tmin (Fig. S58k and l). Therefore, in production of long-term high-resolution data sets over large land areas such as China, it is more feasible, efficient, and accurate to use the GPR model. Furthermore, the GPR generated data can capture the high temperature of the anomalous event (Fig. S59), e.g. the 2006 summer drought of the eastern Sichuan Basin (Y. Li et al., 2011). The spatial anomaly pattern can also be captured using our generated gridded data, as shown in Figs. S60–S70. In our study, the GPR method is employed to generate the temperature data. In future work, we may dig into the potential of GPR in other meteorological variables.
Taylor diagrams displaying a statistical comparison with observations between our products generated using the GPR model and the other products under the same spatial resolution. Given the overlapping time of the data sets, January 1979 to December 2019 was used to compare Tmax and Tmin, and January 1982 to December 2019 was used to compare Tmean. Comparisons for each month are presented in the Supplement (Figs. S73–S75).
We used the ERA5, FLDAS, and TerraClimate temperature data sets for comparison with our data set generated using the GPR model. The spatial resolution of the three data sets is about 28, 11, and 4.6 km, respectively. Our data sets are 1 km. We resampled all the data to the resolution of the ERA5 data set to keep the resolution consistent and we then made comparisons. Note that the generated data in our study represent the temperature at 2 m height, since the station records the temperature at 2 m above ground (Liu et al., 2011; Zhang et al., 2010). Taylor diagrams were constructed to compare the accuracy between our data and that of the other products for Tmax, Tmin, and Tmean (Fig. 11). For Tmax, it can be seen that the GPR-simulated air temperature best matches the observations, with a closer standard deviation to the observed variability, lower centred RMSE, and higher correlation than both ERA5 and TerraClimate. For Tmin, the standard deviation and RMSE values of ERA5 are clearly greater than those of both TerraClimate and GPR. GPR has almost the same standard deviation as the observations, with the lowest RMSE and highest correlation, whereas TerraClimate has slightly less spatial variability (lower standard deviation), with a higher RMSE value and lower correlation. In the case of Tmean, GPR and FLDAS have almost the same variability (with a standard deviation close to the observed variability), while GPR has the highest correlation and lowest RMSE. Generally, the GPR-derived data set is better in terms of Tmax, Tmin, and Tmean than the data sets obtained using other products. The better outcome using the GPR model is characterised by the closest distance in terms of the variability compared with the observations, the lowest RMSE, and the highest correlation for all three temperature variables. The Taylor diagrams also show that the GPR model performs better in terms of the reliability of the gridded temperature data sets and has greater potential regarding spatial interpolation of air temperature. Besides this, we also compared our data sets with Peng's data (Peng et al., 2019), which shows the mean temperature from GPR data sets has relatively higher accuracy than that from Peng's data on the whole, especially in warm months (Fig. S71). Additionally, the high-resolution GPR data can provide more spatial details than the coarse-resolution products like ERA and FLDAS (Fig. S72).
China covers a vast territory, with a complex topography and diverse climate,
meaning that auxiliary data such as elevation are particularly important
regarding temperature interpolation (Appelhans et al., 2015; Vicente-Serrano
et al., 2003). Air temperature is strongly impacted by topography and the
DEM represents a fundamental variable for interpolating air temperature in
our methodology. The terrain semantics can be learned from the elevation
data (Sha et al., 2020). The quality of auxiliary environmental predictors
is vital and an appropriate DEM is crucial for accurate interpolation (Li
and Heap, 2011; Diodato, 2005). The DEM data adopted in this study were from
the SRTM Version 4, which represents a substantial improvement on previous
versions. Although the updated version is promoted as the highest quality
SRTM data set available (
The Euclidean distance of observation stations is quite small in most regions, while it is relatively large in the west of the Tibetan Plateau and a small region in Inner Mongolia (Fig. S76). The larger Euclidean distance means the stations in that region are sparse, which can have an impact on the interpolation accuracy (Hijmans et al., 2005; Li et al., 2011a). The spatial distribution of the width of the predicted intervals with a significance level of 5 % (using the upper limit minus the lower limit of the confidence interval) for the 12 months in 2010 using the trained models shows that most of the regions have quite small uncertainty, while the Tibetan Plateau areas have relatively larger uncertainty (Fig. S77).
Some studies use the remote sensing data to generate the air temperature data set, such as land surface temperature, normalised differential vegetation index (NDVI), and land use (Hooker et al., 2018; Li and Zha, 2019; Li et al., 2018). Although these variables are correlated with the air temperature, these remote sensing data are usually not available before 2000, since our goal is to generate long-term data series from 1951 to 2020. Furthermore, the MODIS data are not available for each month from January 2000 to December 2020. As shown in Fig. S78, the percentage of the available MODIS images is low in northeast China and southern regions. Thus, the remote sensing data are not appropriate for generating long-term temperature data in our study. Furthermore, there is inherent data uncertainty in the remote sensing data itself, such as the land use data.
In our study, we split the stations into testing and training stations in
ArcGIS, which considers the spatial distribution of the weather
stations. We also conducted a case study using the Tmean from 1990, 2000, and
2010 to figure out if the model output is sensitive to the choice of
stations used in the test/training data set. We conducted the experiment by
randomly splitting the data into training and testing sets (
It should be noted that in July 1951, there were only 38 samples available
for testing and 96 samples available for training. The scarcity of
meteorological stations in the early years of the 1950s represents one of
the major limitations regarding the use of machine learning methods.
Generally, this study found that the GPR estimates Tmean better than Tmax and
Tmin. The average MAEs and RMSEs of the GPR model for Tmean are both
0.79
The GPRChinaTemp1km data set includes monthly maximum air temperature,
minimum air temperature, and mean air temperature at 1 km spatial resolution
over China from January 1951 to December 2020. The data sets are publicly
available in GeoTIFF format on Zenodo at
A long-term, high-resolution, current, and spatially continuous data set of air temperature over China is fundamental for understanding climatic dynamics and conducting related scientific research. We used meteorological station data available from January 1951 to December 2020 throughout China as the dependent variable and longitude, latitude, and elevation were considered as independent variables for interpolation. We used three machine learning models (i.e. RF, SVM, and GPR) to investigate the potential of machine learning techniques regarding the interpolation of air temperature over China. Results showed that GPR performed best, followed by SVM and RF. The machine learning models were also compared with conventional interpolation methods (i.e. IDW, OK, and ANUSPLIN) and the results showed that GPR was generally superior for interpolating Tmax, Tmin, and Tmean for each month over China. Comparison of the GPR-derived results with existing products (i.e. TerraClimate, FLDAS, and ERA5) revealed that GPR outperformed the three products with regard to Tmax, Tmin, and Tmean. We constructed a new 1 km resolution monthly maximum, minimum, and mean air temperature data set (named GPRChinaTemp1km) for China from 1951 to 2020 using the advanced GPR machine learning method. Most regions of China display significant increases for Tmean and Tmin in each month, while the trends of significant increase, non-significant increase, and non-significant decrease are prominent for Tmax. A more profound analysis can be conducted based on our temperature data sets, which could help further understanding of global warming and climate change.
The supplement related to this article is available online at:
QH and MW designed the research and developed the methodology. QH wrote the original manuscript. QH, MW, KLiu, KLi and ZJ reviewed and revised the manuscript.
The contact author has declared that neither they nor their co-authors have any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We thank the China Meteorological Data Service Centre for providing the required temperature data of the meteorological stations. We also gratefully acknowledge other institutions and researchers that have contributed to the study.
This research has been supported by the National Key Research and Development Program of China (grant no. 2017YFC1502902).
This paper was edited by Martin Schultz and reviewed by Athanasia Iona and two anonymous referees.