Dataset of daily near-surface air temperature in China from 1979 to 2018

Near-surface air temperature (Ta) is an important physical parameter that reflects climate change. Many methods are used to obtain the daily maximum (Tmax), minimum (Tmin), and average (Tavg) temperature, but they are affected by multiple factors. To obtain daily Ta data (Tmax, Tmin, and Tavg) with high spatio-temporal resolution in China, we fully analyzed the advantages and [...] 1.61 °C, and the R² ranges from 0.95 to 0.99. For Tavg, the RMSE ranges from 0.35 to 1.00 °C, the MAE varies from 0.27 to 0.68 °C, and the R² ranges from 0.99 to 1.00. Furthermore, various evaluation indicators were used to analyze the temporal and spatial variation trends of Ta, and the Tavg increase was more than 0.03 °C/a, which is consistent with the general global warming trend. In summary, this dataset has high spatial resolution and high accuracy, which compensates for the temperature values (Tmax, Tmin, and Tavg) previously missing at high spatial resolution and provides key parameters for the study of climate change, especially high-temperature drought and low-temperature chilling damage. The dataset is publicly available at


Introduction
Near-surface air temperature (Ta) is an important variable that reflects global climate change and [...] 2020). The land surface modeling forcing dataset was developed by Princeton University (1948.1-2006.12), with a temporal resolution of 3 h and a spatial resolution of 1.0° (Deng et al., 2010). To improve the accuracy of regional data, some researchers have developed meteorological forcing datasets for China. The representative dataset is the China Meteorological Forcing Dataset (CMFD) released by the Institute of Tibetan Plateau Research, Chinese Academy of Sciences (1979.1-2018.12), with a temporal resolution of 3 h and a spatial resolution of 0.1° (He et al., 2010; Yang et al., 2010; Yang and He, 2019). However, the dataset does not provide daily maximum and minimum temperatures. The grid dataset of daily surface temperature in China (V2.0) was released by the China Meteorological Administration (CMA; 1961.1-2021.9), with a spatial resolution of 0.5°. This dataset comprises the daily maximum, minimum, and average temperatures, but its spatial resolution is low and its accuracy in local areas needs improvement.
Although reanalysis datasets can provide global near-surface air temperature data, the number of Tmax, Tmin, and Tavg datasets with high spatial resolution and high precision is insufficient. In this study, we aimed to obtain a long-term Ta (Tmax, Tmin, and Tavg) dataset with high spatial resolution in China. We first analyzed the advantages and disadvantages of various data (e.g., reanalysis, remote sensing, and in situ data). Next, we constructed daily Ta models for clear and non-clear sky conditions. This approach addresses the limitation that previous studies have estimated Ta mostly under clear sky conditions rather than under all-sky conditions. We further improved data accuracy by building correction equations for different regions. Finally, a dataset of daily Ta (Tmax, Tmin, and Tavg) in China from 1979 to 2018 was obtained with a spatial resolution of 0.1°, and we cross-validated this dataset with existing datasets.

Study area
China has a vast territory with strongly undulating terrain and a wide range of climates. To explore the temporal and spatial characteristics of Ta, we divided China into six subregions (Figure 1) according to climatic conditions, such as temperature and rainfall, and topographical conditions, such as elevation. (I) The Northeastern region mainly includes northeast China, located to the east of the Greater Khingan Range. This region is located in the temperate monsoon climate zone, the annual precipitation is 400-1000 mm, and the annual accumulated temperature is between 2500 and 4000 °C (Mao et al., 2000). (Ⅱ) The North China region is located north of the Qinling-Huaihe River line and south of the Inner Mongolia Plateau. This region is mostly located in the temperate monsoon climate zone, and the annual accumulated temperature is between 3000 and 4500 °C, with hot, rainy summers and cold, dry winters.
(Ⅲ) The Central Southern region is located south of the Qinling-Huaihe River line and north of the tropical monsoon climate zone. This region is located in the subtropical monsoon climate zone, the annual accumulated temperature is between 4500 and 8000 °C, and the precipitation is mostly between 800 and 1600 mm. (Ⅳ) The Southern region is south of the Tropic of Cancer. This region is located in the tropical monsoon climate zone, the annual accumulated temperature is greater than 8000 °C, the annual minimum temperature is not less than 0 °C, and there is no frost year-round. Annual precipitation mostly ranges from 1500 to 2000 mm. (V) The Northwest region mainly covers the inland areas of China north of 40° N, located northwest of the Greater Khingan Range-Yin Shan-Ho-lan Mountains-Qilian Mountains line. This region is far from the coast, water vapor transport is limited, annual precipitation is between 300 and 500 mm, and the annual accumulated temperature is between 2000 and 3500 °C. The daily and annual temperature ranges are large, and the region includes temperate desert, temperate grassland, and sub-frigid coniferous forest climates. (Ⅵ) The Qinghai-Tibet Plateau region includes the Qinghai-Tibet Plateau, Mount Everest, and other areas. This region is located in the plateau and mountainous climate zone, the annual accumulated temperature is lower than 2000 °C, the daily temperature range is large, and the annual temperature range is small. This region has strong solar radiation, sufficient sunshine, and little precipitation.
Figure 1. Scope map of the total study area and the six subregions. Black dots indicate the locations of meteorological stations; blue frame lines indicate the extent of each subregion, labeled Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ, and Ⅵ.

Reanalysis data
The reanalysis dataset contains drivers of surface elements over a large area, which can provide highly complementary information and avoid data gaps and low-quality pixels caused by abnormal weather conditions. This study primarily used the CMFD and ERA5 data as reanalysis data sources. The CMFD is a set of meteorological forcing datasets developed by the Institute of Tibetan Plateau Research, Chinese Academy of Sciences (Yang et al., 2010; Yang and He, 2019). It mainly uses the Global Land Data Assimilation System (GLDAS) as a background dataset, applying empirical knowledge algorithms and combining GLDAS with measured data to obtain temperature data with a spatial resolution of 0.1°. The CMFD contains seven variables: 2 m air temperature, surface pressure, specific humidity, 10 m wind speed, downward shortwave radiation, downward longwave radiation, and precipitation rate. The CMFD covers January 1979 to December 2018 and provides four temporal resolutions (3 h, daily, monthly, and yearly). The CMFD is comprehensive and has the longest time series and the highest spatial resolution in China. Studies that used the CMFD temperature data as input parameters to construct surface air temperature models show that the correlation coefficient between the CMFD temperature and the measured data is greater than 0.99, and that the grid data can reflect the temporal and spatial changes in regional air temperature (Zhang et al., 2019; Wang et al., 2017). Using the CMFD as an input to a surface temperature model can also significantly reduce model deviation and improve model accuracy (Chen et al., 2011). Therefore, we used the 3 h temperature of the CMFD to build the Ta model and verified the new product against the daily temperature from the CMFD.
The CMFD is available from the China National Qinghai-Tibet Plateau Science Data Center (http://data.tpdc.ac.cn/zhhans/data/8028b944-daaa-4511-8769-965612652c49/, last access: 1 November 2020).
ERA5 is the fifth-generation global atmospheric reanalysis product launched by the ECMWF, replacing the ERA-Interim reanalysis data, which was discontinued on August 31, 2019. ERA5 data are generated with the Cy41r2 version of the Integrated Forecasting System, which has benefited from developments in data assimilation, model simulation, and model physics, and they assimilate many ground monitoring, aircraft weather observation, and radio detection data. ERA5 data are significantly better than ERA-Interim data; for example, ERA5 has a higher spatio-temporal resolution, more vertical model levels, and more parameter products. ERA5 is updated in a timely manner with quality checks on the data, which makes it convenient for providing stable, near-real-time, and long-term climate information.
ERA5 provides many meteorological elements, including 2 m air temperature, 2 m relative humidity, sea level pressure, sea surface temperature, and precipitation. Since the release of the ERA5 reanalysis data, many researchers have tested its applicability and accuracy. The results show that the accuracy of ERA5 is better than that of the ERA-Interim data, and the higher spatio-temporal resolution is conducive to the precise description of regional atmospheres. These improvements are convenient for studying changes in small-scale atmospheric environments (Meng et al., 2018; Mo et al., 2021; Hillebrand et al., 2021). These data can be obtained from https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset&text=ERA5 (last access: 1 December 2020).

In situ data
The in situ data from 1979 to 2018 used in this study were employed to build a Ta model and evaluate existing datasets and new products. The measured data of meteorological stations were from the China National Meteorological Information Center (http://www.nmic.cn/site/index.html, last access: 1 November 2020), including hourly air temperature, hourly land surface temperature, maximum daily temperature (Tmax), minimum daily temperature (Tmin), daily average temperature (Tavg), and weather condition records. Because weather condition records are inconsistent at many stations, some data are missing, and most areas have no meteorological stations, these data are used as auxiliary data.
The ground observations we obtained from the China Meteorological Administration underwent uniform data processing and homogeneity testing. To further ensure the quality of the data, we checked the in situ data. First, we set a fixed threshold to eliminate the overflow value.
Second, we tested the time series of station data and eliminated abnormal and missing data due to instrument damage or bad weather (Zhao et al., 2020). Finally, we checked the spatio-temporal consistency of the in situ data, deleted the meteorological stations that were relocated during the study period, and retained the temperature data of meteorological stations with a long monitoring time and stable temperature values.

Supplementary data
China's daily near-surface temperature grid dataset was released by the CMA, with a spatial resolution of 0.5°. This grid dataset contains the daily maximum, minimum, and average temperatures in China (http://www.nmic.cn/site/index.html, last access: 11 April 2021). The CMA dataset was obtained by combining the daily temperature data monitored by meteorological stations with digital elevation model (DEM) data resampled to provide three-dimensional geospatial information, using a thin-plate spline interpolation algorithm. We used the CMA data for cross-validation.
The Moderate Resolution Imaging Spectroradiometer (MODIS) is an important sensor in the Earth Observation System program and is mounted on the Terra and Aqua satellites. Terra is a morning orbiting satellite that crosses the equator at approximately 10:30 local time from north to south. Aqua is an afternoon orbiting satellite that crosses the equator at approximately 13:30 local time from south to north. The Terra satellite has been in service since 1999, and the Aqua satellite since 2002. Since 2002, surface temperature data can be obtained four times per day from MODIS data through inversion calculation. In this study, we used the MOD11A1 and MYD11A1 products, which provide daily surface temperature data on a global scale with a spatial resolution of 1 km. MODIS LST has a quality control (QC) field that indicates data quality and is encoded in binary form. MODIS data can be downloaded from the LAADS DAAC website (https://ladsweb.modaps.eosdis.nasa.gov/search/order, last access: 1 December 2020).
In addition to the aforementioned data, DEM data were used. The DEM used in this study comes from the Shuttle Radar Topography Mission (SRTM), a radar topographic mapping project jointly implemented by NASA and the National Imagery and Mapping Agency and flown on the Space Shuttle Endeavour. Temperature data were adjusted via topographical correction with the 90 m resolution SRTM DEM to eliminate the influence of topographical fluctuations on air temperature. SRTM DEM data can be obtained from the Geospatial Data Cloud (http://www.gscloud.cn/search, last access: 10 February 2021).

Methodology
The Tmax, Tmin, and Tavg data are provided by meteorological stations. Values at non-station locations or grid cells are estimated by interpolation or indirect methods such as remote sensing. Because of the limited number of meteorological stations and their uneven distribution, it is difficult to guarantee the accuracy of Tmax, Tmin, and Tavg obtained through interpolation in some areas. Under rainfall and cloud cover weather conditions, estimating the air temperature from remotely sensed surface temperature data is impossible. Even in clear sky conditions, the formula for estimating near-surface air temperature is not universally applicable, which hinders the development of a high-precision Ta dataset to a certain extent. Therefore, to obtain a Ta dataset with a high spatio-temporal resolution and long time series, it is necessary to build a reliable and robust Ta model to estimate Tmax and Tmin, and further improve the accuracy of Tavg. The resulting product could then be widely used for research on climate change and extreme weather events.
Daily temperature changes are affected by many factors and are extremely sensitive to fluctuations under different weather conditions. This study used multiple methods to calculate Ta. First, the daily weather conditions were divided into clear sky and non-clear sky conditions. Second, based on the physical process of daily temperature changes and combined with existing reanalysis data, in situ data, and remote sensing data, we estimated Tmax and Tmin under different weather conditions. To further improve the accuracy of the dataset, we constructed a modified model for each region. Details are provided in the following sections. The overall process of this study is illustrated in Figure 2. The construction of the dataset was mainly divided into three steps: (1) determination of daily weather conditions, (2) establishment of Ta models under different weather conditions, and (3) data correction.

Scheme for dividing weather conditions
Temperature changes follow different patterns under different weather conditions. To improve the estimation accuracy of the maximum and minimum temperature, we performed separate calculations for different daily weather conditions. The quality of observation data is affected by weather, and some remote sensing products such as MODIS LST products have quality control fields. Therefore, the quality control field of MODIS can be used to distinguish between clear sky and non-clear sky conditions. However, MODIS observations four times per day are available only since 2002, which does not cover the full time range of this study. Therefore, we divided the time series of this study into two periods, 1979-2001 and 2002-2018, and used different methods for the two periods to distinguish the daily weather condition. For the study period from 2002 to 2018, we classified each pixel mainly based on the MODIS quality control field. When the MODIS quality control flags of all four Ts observations corresponding to a pixel indicated clear sky, the pixel was judged to be in the clear sky condition; otherwise, it was judged to be in the non-clear sky condition.
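The all-four-observations rule for 2002-2018 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and treating a value of 0 in the two lowest QC bits as "clear" is an assumption about how the QC field is decoded, since the text does not state which flag values were accepted.

```python
import numpy as np

def is_clear(qc, clear_values=(0,)):
    """Flag observations whose two lowest QC bits match an accepted value.

    Which bit patterns count as clear sky is an assumption of this sketch.
    """
    mandatory = np.asarray(qc, dtype=np.uint8) & 0b11
    return np.isin(mandatory, clear_values)

def clear_sky_mask(qc_stack):
    """qc_stack: array of shape (4, rows, cols), one QC layer per daily overpass.

    A pixel is clear only if all four overpasses are flagged clear.
    """
    qc_stack = np.asarray(qc_stack, dtype=np.uint8)
    if qc_stack.shape[0] != 4:
        raise ValueError("expected four overpasses per day")
    return is_clear(qc_stack).all(axis=0)
```

A pixel with even one flagged overpass falls into the non-clear class, which matches the conservative rule described in the text.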
For the study period from 1979 to 2001, we used the in situ, CMFD, and ERA5 data to determine the daily weather condition. First, we filtered each pixel and divided the pixels into two types: those whose corresponding meteorological stations have weather condition records and those without such records. For pixels with weather condition records, we used several statistical discrimination methods to analyze the impact of non-clear sky weather phenomena on temperature fluctuations, which facilitates the subsequent classification of pixels without weather condition records. Statistical analysis shows a significant difference in daily temperature fluctuations between clear sky and non-clear sky conditions, and non-clear sky weather conditions may cause abnormal temperature fluctuations. Therefore, we converted the judgment of the weather state into a judgment of whether the time and frequency of occurrence of Tmax and Tmin are abnormal (the occurrence times of Tmax and Tmin are hereinafter referred to as Hmax and Hmin, respectively). Specifically, when Hmax and Hmin occur abnormally or the temperature change is wavy, the day is classified as a non-clear sky condition (Zhao and Duan, 2014; Ren et al., 2011). In other cases, the day is regarded as a clear sky condition, and the position of each pixel is marked. To apply this rule, we had to further fill the daily time series of each pixel. In this study, we used two strategies to complete the temperature series for distinguishing weather conditions. The specific implementation steps for determining weather conditions are shown in [...]. In the first strategy, when the pixel location had a corresponding meteorological station or when the Euclidean distance between adjacent stations was less than 0.3°, we filled in the gaps to improve the integrity and continuity of the time series.
The time series filling process was as follows. (1) When the temperature data at an observation site were missing but not consecutively missing, we filled each missing value with the average temperature of the two adjacent time points before and after it at the same site. (2) When the observation data of a station were continuously missing, we filled them with the observation data from stations within 0.3° for the same time range. This method was mainly based on the principle that the closer the stations, the stronger the spatial consistency and correlation of temperature changes. (3) When the station data were continuously missing and could not be filled from adjacent stations, other relevant data within the same time and space were used for repair. In this study, we estimated the weather state from the Ts monitored by the same station. This method theoretically originates from the approximate consistency between the daily variation ranges of Ts and Ta and is suitable for situations where there are many missing values and incomplete time series at meteorological stations and adjacent meteorological stations. Many studies have analyzed the correlation between the daily trends of Ta and Ts and found strong consistency. The Ts retrieved by remote sensing satellites is also widely used to estimate Ta, which supports the reliability of determining the pixel weather state through the Ts time series (Yoo et al., 2018; Johnson and Fitzpatrick, 1977; Caesar et al., 2006; Mostovoy et al., 2006). (4) When there was no meteorological station at the pixel location and the distance from the nearest meteorological station was less than 0.3°, we used the inverse distance weighting method to spatially interpolate from adjacent pixels. Notably, before interpolation, the impact of elevation differences must be considered.
To improve the interpolation accuracy, we first corrected the observation station data to a uniform sea level and then performed further calculations according to the elevation of the interpolation point to obtain the corresponding temperature.
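Steps (1) and (2) of the filling process can be illustrated with a short sketch. The function names are hypothetical, missing values are assumed to be encoded as NaN, and the neighbouring series is assumed to be already aligned in time with the target station; none of these details are specified in the text.

```python
import numpy as np

def fill_isolated_gaps(series):
    """Step (1): fill a single missing value with the mean of its two neighbours.

    Runs of two or more consecutive NaNs are left untouched.
    """
    filled = np.asarray(series, dtype=float).copy()
    missing = np.isnan(filled)
    for k in range(1, len(filled) - 1):
        if missing[k] and not missing[k - 1] and not missing[k + 1]:
            filled[k] = 0.5 * (filled[k - 1] + filled[k + 1])
    return filled

def fill_from_neighbour(series, neighbour_series):
    """Step (2): fill remaining gaps from a station within 0.3 degrees."""
    series = np.asarray(series, dtype=float)
    neighbour_series = np.asarray(neighbour_series, dtype=float)
    return np.where(np.isnan(series), neighbour_series, series)
```

Applying the two functions in sequence leaves NaN only where both the target and the neighbouring station lack data, which is the case handled by step (3).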
The second strategy targeted areas where stations were sparsely distributed and the Euclidean distance between two adjacent stations was greater than 0.3°. To compensate for the insufficient coverage and uneven distribution of stations in these areas, we used hourly data from ERA5 to determine the approximate time of occurrence of Tmax and Tmin. Because the spatial resolution of ERA5 is coarser than that of this dataset, ERA5 alone could not meet our demand for higher spatial resolution. Consequently, we developed an effective downscaling process based on the spatial correlation between the ERA5 data and CMFD temperature data.
ERA5 data (with a spatial resolution of 0.3°) were spatially downscaled with the aid of the CMFD data (with a spatial resolution of 0.1°). The downscaling process is illustrated in Figure 4. First, quality control of the ERA5 data and CMFD was performed to eliminate temperature outliers. Second, the ERA5 data and CMFD were matched according to time series and central latitude and longitude to construct pixel pairs. Subsequently, we weighted the high-resolution data to the low-resolution ERA5 data pixel by pixel. Finally, the weight was used to downscale the ERA5 data to the same spatial resolution as the CMFD. The ERA5 downscaling was computed using Eqs. 1 and 2.
In Eqs. 1 and 2, $T_E$, $T_C$, and $T_M$ represent the ERA5 data, CMFD, and MODIS data, respectively; $T_E(x_o, y_o)$ is the temperature after downscaling; $T_E(x_m, y_n)$ is the temperature before downscaling; $i, j$ are pixel coordinates; and $m, n$ are the pixel coordinates before downscaling.
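Because Eqs. 1 and 2 are not reproduced in the text, the following is only a sketch of one common weight-based scheme that fits the verbal description: each fine pixel receives the coarse ERA5 value scaled by the ratio of the CMFD fine-pixel temperature to the CMFD block mean. The ratio form, the use of kelvin, the exact 3:1 grid nesting, and the function name are all assumptions.

```python
import numpy as np

def downscale_with_weights(coarse, fine_ref, factor=3):
    """Downscale a coarse grid using weights derived from a fine reference grid.

    coarse:   (R, C) coarse temperatures in kelvin (ERA5-like).
    fine_ref: (R*factor, C*factor) fine temperatures in kelvin (CMFD-like).
    Returns a fine grid whose block means equal the coarse values.
    """
    coarse = np.asarray(coarse, dtype=float)
    fine_ref = np.asarray(fine_ref, dtype=float)
    rows, cols = coarse.shape
    if fine_ref.shape != (rows * factor, cols * factor):
        raise ValueError("fine grid must nest exactly inside the coarse grid")
    # View the fine grid as (coarse row, sub-row, coarse col, sub-col).
    blocks = fine_ref.reshape(rows, factor, cols, factor)
    block_mean = blocks.mean(axis=(1, 3))
    # Weight of each fine pixel relative to its coarse block.
    weights = blocks / block_mean[:, None, :, None]
    fine = weights * coarse[:, None, :, None]
    return fine.reshape(rows * factor, cols * factor)
```

With this choice the spatial pattern comes from the reference grid while the block average is inherited from the coarse grid; a difference-based (additive) weight would be an equally plausible reading.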

Tmax and Tmin estimation under clear sky conditions
Apart from the severe temperature fluctuations caused by abnormal weather phenomena, the daily temperature changes under clear sky conditions have a certain regularity, periodicity, and asymmetry (Leuning et al., 1995; Johnson and Fitzpatrick, 1977). Based on the similarity between the diurnal variation trends of surface temperature and air temperature, Ta can be estimated with a daily air temperature variation model. Verification against in situ data shows that this method is feasible (Du et al., 2020; Zhu et al., 2013; Perkins et al., 2007; Cesaraccio et al., 2001; Serrano-Notivoli et al., 2019). However, using the surface temperature retrieved by remote sensing methods to estimate the changing trend of air temperature is complicated, additional parameters need to be input, and the relationship between Ts and Ta is not fixed. Therefore, it is difficult to unify the types and quantities of parameters and ensure accuracy. Thus, we established a piecewise local sine function of temperature under clear sky conditions for each pixel, which can simulate the change in Ta and calculate Tmax and Tmin (Mao et al., 2016; Jiang et al., 2010). [...] and Hmin values were substituted into the derivation formula to obtain Tmax and Tmin as preliminary results for subsequent correction and analysis. We constructed a temperature model pixel by pixel to account for the temporal and spatial heterogeneity of each region. In the model, Hmax is the occurrence time of the daily maximum temperature, Hmin is the occurrence time of the daily minimum temperature, Ho is the input time, and At and Bt are unknown parameters.
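The piecewise sine formulas themselves are not reproduced in the text, so the sketch below substitutes a simpler technique: a single 24 h harmonic fitted by least squares to 3-hourly temperatures. It is symmetric, unlike the piecewise model the authors describe, and serves only to show how Tmax, Tmin, Hmax, and Hmin can be read off a fitted sinusoid. The function name and the returned keys are hypothetical.

```python
import numpy as np

def fit_daily_sine(hours, temps):
    """Fit T(h) = a*sin(w*h) + b*cos(w*h) + c with a 24 h period.

    hours: observation times in hours (e.g. 0, 3, ..., 21).
    temps: temperatures at those times.
    """
    hours = np.asarray(hours, dtype=float)
    temps = np.asarray(temps, dtype=float)
    omega = 2.0 * np.pi / 24.0
    design = np.column_stack(
        [np.sin(omega * hours), np.cos(omega * hours), np.ones_like(hours)]
    )
    (a, b, c), *_ = np.linalg.lstsq(design, temps, rcond=None)
    amplitude = np.hypot(a, b)
    # a*sin(x) + b*cos(x) = amplitude*cos(x - phi) with phi = atan2(a, b),
    # so the maximum occurs at x = phi.
    h_max = (np.arctan2(a, b) / omega) % 24.0
    h_min = (h_max + 12.0) % 24.0
    return {
        "tmax": float(c + amplitude),
        "tmin": float(c - amplitude),
        "hmax": float(h_max),
        "hmin": float(h_min),
    }
```

A piecewise version would fit separate half-waves for the warming and cooling parts of the day, which is what allows Hmin and Hmax to be less than 12 h apart.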

Tmax and Tmin estimation under non-clear sky conditions
The daily temperature fluctuations in non-clear sky conditions are relatively large, and there may [...] location information of each pixel, the most reliable and representative data source is the in situ data. Therefore, if there are in situ data for the pixel location, the temperature data at the same time are taken directly from the station to replace the pixel values of Tmax and Tmin. For pixels without a corresponding meteorological station, similar to the spatial downscaling used for such pixels in the weather condition judgment, we spatially downscaled the ERA5 data with the assistance of the CMFD. By adding high spatial resolution MODIS data, the downscaling method was further extended to improve the accuracy of each pixel. Our main aim was to fully use the advantages of the various data sources, especially the high-resolution MODIS data. Using the QC field of the MODIS data to retain only high-quality observations, we used MODIS data with high spatio-temporal resolution to improve local accuracy. The observation time of each valid pixel was matched with the nearest ERA5 time to obtain the data weight for spatial downscaling. The downscaling process and the validity determination of MODIS data are shown in Figure 4, and the downscaling formulas are shown in Eqs. 1 and 2.

Tavg estimation
The daily average temperature is usually calculated as the arithmetic mean of the temperature values observed during the day. If each pixel has hourly temperature data, the calculated daily average temperature is the most representative. Because observational conditions are limited, hourly temperature data are difficult to obtain; thus, the temperature values at four observation times (e.g., 02:00, 08:00, 14:00, and 20:00) are often averaged to obtain the daily average temperature, or the daily maximum and minimum temperatures are directly averaged. To improve the accuracy of the average temperature as much as possible, we took the arithmetic mean of the 3 h temperature data provided by the CMFD and the maximum and minimum values we calculated to obtain the daily average temperature. Finally, to improve the accuracy, we performed multiple linear regression correction on the Tavg output value according to the in situ data (the linear correction method was the same as that described in Sect. 4.2) and obtained the daily Tavg dataset.
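The averaging step can be sketched as follows. The text does not state how the values are weighted, so this sketch assumes an unweighted mean over ten values per day: the eight 3-hourly CMFD temperatures plus the estimated Tmax and Tmin. The function name is hypothetical.

```python
import numpy as np

def daily_tavg(t3h, tmax, tmin):
    """Unweighted mean of eight 3-hourly temperatures plus Tmax and Tmin.

    t3h:  array of shape (8,) or (8, rows, cols).
    tmax, tmin: scalars or arrays of shape (rows, cols).
    Equal weighting of all ten values is an assumption of this sketch.
    """
    t3h = np.asarray(t3h, dtype=float)
    if t3h.shape[0] != 8:
        raise ValueError("expected eight 3-hourly values per day")
    extremes = [np.asarray(tmax, dtype=float)[None], np.asarray(tmin, dtype=float)[None]]
    stacked = np.concatenate([t3h] + extremes, axis=0)
    return stacked.mean(axis=0)
```

Including Tmax and Tmin pulls the mean toward the true daily extremes, which a 3-hourly sampling can miss.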

Ta data calibration scheme
Surface temperature is sensitive to changes in altitude and easily affected by the surrounding environment. For pixels without a meteorological station, we used interpolation to fill in the pixel values based on the principle of regional consistency. To improve the accuracy of the temperature at these pixels, we fully considered the influence of altitude on temperature. First, the in situ Ta was reduced to sea level according to the vertical temperature lapse rate. Next, the non-station pixels were interpolated from the station data, and finally, the interpolated pixel values were restored to the corresponding elevation. This method can reduce the influence of altitude on temperature to a certain extent and improve the accuracy of the dataset. In this study, we used a uniform vertical temperature lapse rate (γ); that is, for every 100 m increase in altitude, the atmospheric temperature decreases by 0.65 °C, and vice versa. The height correction formula is provided by Eq. 5 (He and Wang, 2020; Schicker et al., 2015; Wang et al., 2013).
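The reduce, interpolate, and restore sequence can be sketched as below. Eq. 5 is not reproduced in the text; the sketch assumes the form TSL = Ta + γ·H, which follows from the stated lapse rate of 0.65 °C per 100 m. The inverse distance weighting power of 2 and all function names are illustrative choices.

```python
import numpy as np

GAMMA = 0.0065  # lapse rate in degrees C per metre (0.65 degrees C per 100 m)

def to_sea_level(temp, elev_m, gamma=GAMMA):
    """Reduce a temperature observed at elev_m metres to sea level."""
    return np.asarray(temp, dtype=float) + gamma * np.asarray(elev_m, dtype=float)

def from_sea_level(temp_sl, elev_m, gamma=GAMMA):
    """Restore a sea-level temperature to elev_m metres."""
    return np.asarray(temp_sl, dtype=float) - gamma * np.asarray(elev_m, dtype=float)

def idw_at_elevation(st_xy, st_temp, st_elev, target_xy, target_elev, power=2.0):
    """Interpolate station temperatures to a target pixel with elevation correction."""
    st_xy = np.asarray(st_xy, dtype=float)
    dist = np.linalg.norm(st_xy - np.asarray(target_xy, dtype=float), axis=1)
    t_sl = to_sea_level(st_temp, st_elev)
    if np.any(dist == 0):
        # The target coincides with a station: use that station directly.
        sl_value = t_sl[dist == 0][0]
    else:
        weights = 1.0 / dist ** power
        sl_value = np.sum(weights * t_sl) / np.sum(weights)
    return float(from_sea_level(sl_value, target_elev))
```

Interpolating at sea level keeps elevation differences between stations from being smeared into the horizontal temperature field.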
In Eq. 5, TSL is the sea-level temperature, Ta is the temperature at the meteorological station, HSL is the station height above sea level, and the value of γ is approximately 0.0065 °C/m. We used the jackknife method: 699 in situ stations across China were divided into 140 verification points and 559 calibration points at a ratio of 20% to 80% to establish a multiple linear regression equation (Benali et al., 2012). The preliminary accuracy results (Sect. 5.1) show that although the overall accuracy was high, the model output still contained abnormal temperature values caused by violent fluctuations in daily temperature. Further correction is required to reduce the deviation and improve the accuracy of the dataset. The data correction process is illustrated in Figure 5. For abnormal temperature values, we replaced the Ta at the pixel location with the observed Ta from the meteorological station and performed adjacent-pixel temperature correction for pixels without a meteorological station. The multiple linear regression method was used to process the original temperature, and a stepwise regression relationship between the measured value at the station and the fitted value of the corresponding pixel was established. Next, we calculated the predicted temperature from the regression equation and obtained the temperature residual as the difference between the observed and predicted values, which yields the final corrected temperature (Cristobal et al., 2006).
The modified expression is shown in Eq. 6.
$$V(x, y) = \hat{m}(x, y) + \hat{\varepsilon}(x, y) \tag{6}$$
where x and y are the row and column numbers of a pixel, respectively; $V(x, y)$ is the corrected value from the regression equation; $\hat{m}(x, y)$ is the regression prediction of air temperature; and $\hat{\varepsilon}(x, y)$ is the residual value.
Figure 5. Flowchart for calibration of Ta model data.
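The calibration idea, a regression prediction plus a residual term, can be sketched as follows. The text describes multiple and stepwise regression without listing the predictors, so this sketch uses a single predictor (the model temperature) for brevity; how residuals are propagated to non-station pixels is also not specified, so the residual is passed in by the caller. The random split and all function names are assumptions.

```python
import numpy as np

def split_stations(n_stations, calib_fraction=0.8, seed=0):
    """Randomly split station indices into calibration and verification sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_stations)
    n_calib = int(round(calib_fraction * n_stations))
    return order[:n_calib], order[n_calib:]

def fit_linear(model_vals, observed):
    """Least-squares fit of observed = slope * model + intercept."""
    model_vals = np.asarray(model_vals, dtype=float)
    observed = np.asarray(observed, dtype=float)
    design = np.column_stack([model_vals, np.ones(len(model_vals))])
    coef, *_ = np.linalg.lstsq(design, observed, rcond=None)
    return coef  # array [slope, intercept]

def corrected_value(model_val, coef, residual):
    """Regression prediction plus residual, in the spirit of Eq. 6."""
    return coef[0] * model_val + coef[1] + residual
```

With 699 stations and an 80% calibration fraction, the split yields 559 calibration and 140 verification stations, matching the counts reported in the text.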

Evaluation metrics
We mainly selected areas with a single surface type and flat terrain under clear skies as the comparative study area to verify the original dataset and reconstructed dataset. A scatter plot represents the overall distribution and aggregation of the data and conveys the accuracy of the data intuitively; thus, we used scatter plots to display the accuracy range of this product. In addition, before establishing the model, we set aside part of the reanalysis data, excluded it from the calculation, and used it for cross-validation. We used three indicators as metrics to measure the accuracy of variables: R², MAE, and RMSE.
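The three metrics can be computed as in the sketch below. The text does not say whether R² denotes the squared Pearson correlation or the coefficient of determination; the squared Pearson correlation is assumed here, as is common for scatter-plot comparisons. The function name is hypothetical.

```python
import numpy as np

def accuracy_metrics(observed, estimated):
    """Return R² (squared Pearson correlation), MAE, and RMSE."""
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    err = est - obs
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    r = np.corrcoef(obs, est)[0, 1]
    return {"r2": float(r ** 2), "mae": mae, "rmse": rmse}
```

Note that a constant bias leaves this R² at 1 while MAE and RMSE grow, which is why the three metrics are reported together.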
We compared Tmax and Tmin with the ERA5 data and CMA data. Notably, the ERA5 reanalysis dataset is an hourly temperature grid dataset; thus, we obtained the highest and lowest temperature values of ERA5 by constructing a local sine function similar to that in the prior section and further calculated the average daily temperature. The accuracy of the Tavg products in this study was verified with the ERA5 data, CMA data, and CMFD daily temperature data. Because the spatial resolution of the CMA data is 0.5°, we resampled all datasets to 0.5° to facilitate comparison.

Analysis of the Ta series trend
We not only compared the output Ta data with the in situ data, but also assessed the climate change trends of Tmax, Tmin, and Tavg in various regions of China, and further tested the effectiveness and regional applicability of the dataset through various climate variables. The World Meteorological Organization defined a series of extreme climate indexes, including 27 core indexes. We used four of them (TXx, TNn, TX90p, and TN10p) to analyze the trend of extreme temperature changes in Tmax and Tmin (Karl et al., 1999; Peterson et al., 2001). Specifically, the TXx (TNn) anomaly refers to the difference between the sum of monthly Tmax (Tmin) and the multi-year average of monthly Tmax (Tmin) in each year. The multi-year period of this study is 40 years. In addition, linear regression was performed on the TXx (TNn) anomaly to analyze the interannual variation trend.
For TX90p (TN10p), the daily Tmax (Tmin) values of each month during the study period are arranged in ascending order, and the value at the 90th (10th) percentile of the series is used as the threshold for identifying warm days (cold nights) (Zhang et al., 2005).
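The percentile thresholds can be illustrated with a short sketch. It follows the simplified description above (one threshold from the pooled values of a given month) rather than any windowed base-period procedure; the function name is hypothetical, and NumPy's default linear interpolation between ranks is an assumption about how the percentile is taken.

```python
import numpy as np

def exceedance_fraction(values, q, above=True):
    """Percentile threshold and the fraction of days beyond it.

    values: daily Tmax (or Tmin) for one calendar month over the study period.
    q:      percentile, e.g. 90 for warm days or 10 for cold nights.
    above:  count values above the threshold (True) or below it (False).
    """
    values = np.asarray(values, dtype=float)
    threshold = float(np.percentile(values, q))
    hits = values > threshold if above else values < threshold
    return threshold, float(hits.mean())
```

For warm days one would call `exceedance_fraction(tmax_month, 90)`, and for cold nights `exceedance_fraction(tmin_month, 10, above=False)`; computing the fraction per year instead of over the whole series gives the annual index values used for trend analysis.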
To study the spatio-temporal variation trend of Tavg, we used linear regression analysis (K), correlation coefficient analysis (R), and the T-test (Du et al., 2020;Yan et al., 2020;Cao et al., 2021). The interannual change rate and correlation of Tavg were calculated by K and R, and the formula is provided by Eqs. 7 and 8, respectively. We performed a two-tailed significance test on 460 the T-test to measure the significance of the temperature and time series changes (Eq. 9).
where n represents the total number of years of the time series length, i represents the year, and Ti represents Tavg in the i-th year. K > 0 indicates that the temperature increases within the time series, and K < 0 indicates that the temperature decreases within the time series.
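As an illustration of these three quantities, the sketch below computes K as the ordinary least-squares slope of Tavg against the year index, R as the Pearson correlation between the two, and the test statistic as t = R·sqrt((n − 2)/(1 − R²)). These are the standard textbook forms and are assumed here; they may differ in notation from Eqs. 7 to 9. The function name `trend_statistics` and the sample series are hypothetical.

```python
import math


def trend_statistics(tavg_by_year):
    """Return (k, r, t_stat) for an annual Tavg series.

    k      : least-squares slope against the year index i = 1..n
    r      : Pearson correlation between Tavg and the year index
    t_stat : r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom
    Assumption: standard textbook definitions of all three quantities.
    """
    n = len(tavg_by_year)
    if n < 3:
        raise ValueError("need at least 3 years")

    years = range(1, n + 1)
    mean_i = sum(years) / n
    mean_t = sum(tavg_by_year) / n
    sxy = sum((i - mean_i) * (t - mean_t) for i, t in zip(years, tavg_by_year))
    sxx = sum((i - mean_i) ** 2 for i in years)
    syy = sum((t - mean_t) ** 2 for t in tavg_by_year)
    if syy == 0:
        raise ValueError("correlation is undefined for a constant series")

    k = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    # A perfect linear relation gives an unbounded test statistic.
    t_stat = math.copysign(math.inf, r) if abs(r) >= 1 else r * math.sqrt((n - 2) / (1 - r * r))
    return k, r, t_stat


if __name__ == "__main__":
    # Hypothetical annual mean temperatures (degrees Celsius) for five years.
    k, r, t_stat = trend_statistics([10.0, 10.1, 10.3, 10.2, 10.5])
    print(k, r, t_stat)
```

For a two-tailed test, |t| is compared with the critical value of Student's t distribution with n − 2 degrees of freedom; for the 40-year series used here (38 degrees of freedom) that value is roughly 2.02 at the 95% level.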

Evaluation of the original product
For each of the six subregions defined in Figure 1, we compared this product (Tmax, Tmin, and Tavg) with the in situ data. Figure 6 shows the accuracy scatter plot between the original Tmax data and the in situ data. The R² fluctuated from 0.91 to 0.99, the MAE ranged from 1.69 to 2.71 °C, and the RMSE ranged from 2.15 to 3.20 °C. Figure 7 shows the accuracy scatter plot of Tmin. The R² fluctuated from 0.93 to 0.97, the MAE ranged from 1.34 to 2.17 °C, and the RMSE fluctuated from 1.68 to 2.79 °C. Figure 8 shows the accuracy scatter plot of Tavg. The R² fluctuated between 0.97 and 0.99, the MAE ranged from 0.58 to 0.96 °C, and the RMSE fluctuated from 0.86 to 1.60 °C. As shown in Figures 6, 7, and 8, the R² values between Tmax, Tmin, and Tavg and the temperature measured at the meteorological stations were all greater than 0.90. In general, our method performed well in estimating the daily temperature values. However, due to the impact of complex changes in weather, the distribution of temperature values on certain days is scattered, especially in study areas V and VI. Further corrections are necessary to reduce errors and improve the accuracy of the dataset.

Evaluation of the new product
The temperature was further corrected using the linear correction method. The data verification results of Ta after correction are shown in Figures 9, 10, and 11. The results show that the corrected data had a higher consistency with the in situ data. The fitted and observed temperatures were linearly distributed and gradually approached the regression line, and the outliers were significantly reduced. Figure 9 shows the corrected scatter plot of Tmax for each study area. The R² fluctuated from 0.96 to 0.99, the MAE ranged from 0.63 to 1.40 °C, and the RMSE fluctuated from 0.86 to 1.78 °C. Figure 10 shows the corrected scatter plot of Tmin for each study area. The R² fluctuated between 0.95 and 0.99, the MAE ranged from 0.58 to 1.61 °C, and the RMSE fluctuated from 0.78 to 2.09 °C. Figure 11 depicts the corrected scatter plot of Tavg in each study area, where the R² fluctuated between 0.99 and 1.00, the MAE ranged from 0.27 to 0.68 °C, and the RMSE fluctuated from 0.35 to 1.00 °C. The results show that the distribution of numerical points in each area after the correction was denser and mostly concentrated near the 1:1 line, and the degree of clustering with the measured data was higher than before calibration. Our detailed analysis of the daily temperature in the six study areas demonstrated that the accuracy measurement values differed significantly between the east and west. For example, the error of study area IV is small, and the errors of study areas VI and V are large, which may be due to the regional topography and the distribution of meteorological stations. Study area IV is in the tropical monsoon climate zone, affected by latitude and topography, and the temperature is relatively high throughout the year. Moreover, the area is in eastern China and has densely distributed meteorological stations and relatively flat terrain.
Linear correction can significantly improve the agreement between the estimated value and the observed value. Study areas VI and V have the highest RMSE. They are in the Qinghai-Tibet Plateau in southwest China and Xinjiang in the northwest. Such areas have similar characteristics, such as high altitude, large spatial heterogeneity, and few meteorological stations. This result shows that the temperature has strong spatial heterogeneity. In general, the corrected dataset has higher accuracy than the original dataset, satisfies the spatial heterogeneity of different regions, and better estimates the temperature under different weather conditions.
To further verify the robustness and accuracy of this product, Table 1 shows the cross-validation results of this product and other datasets, the mean average precision (MAP) of each region, and that this product has a high regional consistency with other datasets. Study area IV in the tropical monsoon climate zone has the highest accuracy, and study area VI located in the Qinghai-Tibet Plateau region of China has the lowest data accuracy. This result may be because the reanalysis dataset is also affected by the number and distribution of meteorological stations and by the spatial heterogeneity. The accuracy and robustness of the product were thus confirmed from another perspective. The accuracy comparison of each area shows that this product has higher accuracy and spatial representation than other datasets: R² is closer to 1, and MAE and RMSE remain low. Through the accuracy evaluation and data comparison between this product and the existing datasets, we found that our product provides a better temperature estimate for each area and that the overall accuracy of the dataset is higher.
We analyzed temperature changes in various regions of China through extreme climate indices and change trend values to further test the validity and regional applicability of the dataset.
As shown in Figures 12 and 13, the TXx anomalies and TNn anomalies are consistent in the regional change trend. Although the annual anomalies fluctuated during the study period, they gradually changed from negative to positive. This phenomenon confirmed that the temperature fluctuated and increased, and the Tmax and Tmin gradually increased, which is consistent with the global warming trend. The average temperature rise of TXx anomalies in each study area was 0.42 °C/a, and the average temperature rise of TNn anomalies was 0.47 °C/a. The histograms in Figures 12 and 13 show that the numbers of warm days and cold nights fluctuate with an increasing and a decreasing trend, respectively. In addition, the change trends of warm days and cold nights show similarities. For example, in 1980, under the continual influence of strong cold air in the north, low-temperature weather occurred continuously in most areas of China, and many areas experienced low-temperature disasters, which led to a decrease in the number of warm days and an increase in the number of cold nights. In 2015, 2016, and 2017, the temperature continued to rise, with high temperatures that occurred once in decades. This finding is closely related to the severe El Niño events that occurred in 2015 and 2016, the impact of the subtropical high in 2017, and the overall global warming trend. From 1979 to 2018, there was also an increase in the number of warm days and a decrease in the number of cold nights. Meteorological events can indirectly verify the accuracy of this product, indicating that the corrected data can be used to analyze long-term temporal and spatial changes in temperature.

Application of the product for trend analysis
To further analyze the change rate and regional differences of Tavg during the study period, we analyzed the temperature change rate (K), correlation coefficient (R), and significance test of the correlation coefficient (T-test(R)). As shown in Figure 14 (b) and (b'), approximately 48.77% of the area shows a strong correlation and 84.06% of the area shows a correlation, which indicates a high correlation between temperature changes and time. Figure 14 (c) and (c') show that after performing a significance test on the R between temperature and time, 83.17% of the area passed the 95% significance test, and 75.23% of the area passed the 99% significance test, which shows that the correlation between temperature and time is significant.

Code availability
The technical code for the reconstruction model and verification of the Ta dataset can be downloaded at https://doi.org/10.5281/zenodo.5513811 (Fang et al., 2021b). We are still refining the code and plan to upload an updated version as a supplement.

Conclusions
Ta is an indispensable variable for global climate change research. Therefore, obtaining air temperature data products with high precision and high temporal resolution is important. Many researchers have endeavored to produce datasets by using different data sources for the global or local region. However, as research becomes more refined, further improvements in the accuracy and spatio-temporal resolution are necessary. Based on a full analysis of the advantages and disadvantages of various datasets and data sources, this study integrated various data sources, such as in situ data, remote sensing data, and reanalysis data, and proposed reconstruction models of Ta for clear-sky and non-clear-sky weather conditions, respectively. A multiple linear regression model was used to further improve the accuracy of the data, and we obtained a new gridded, high-resolution daily temperature dataset for China from 1979 to 2018.
For Tmax, validation using in situ data shows that the RMSE ranges from 0.86 to 1.78 °C, the MAE varies from 0.63 to 1.40 °C, and the R² ranges from 0.96 to 0.99. For Tmin, the RMSE ranges from 0.78 to 2.09 °C, the MAE varies from 0.58 to 1.61 °C, and the R² ranges from 0.95 to 0.99. For Tavg, the RMSE ranges from 0.35 to 1.00 °C, the MAE varies from 0.27 to 0.68 °C, and the R² ranges from 0.99 to 1.00. Furthermore, we verified the Ta dataset against the existing reanalysis datasets and found that the proposed dataset is credible and accurate. Moreover, based on the particularity of geographic climate change in different regions, we used four extreme climate indicators (TXx and TNn anomalies, TX90p, and TN10p) and three climate change indices (K, R, and T-test) to analyze the trend changes of Tmax, Tmin, and Tavg. In summary, the temperature in most regions of China has been gradually increasing. The numbers of cold nights and warm days gradually decreased and increased, respectively, and the Tmax and Tmin gradually increased, which is consistent with the general trend of global warming. However, due to various factors, the weather may occasionally change drastically, for example when hail occurs. Historical data cannot provide weather information at a greater specificity than was possible at that time; especially in areas without meteorological stations, refining past data is difficult.
Further research should therefore consider more meteorological satellite data, especially geostationary meteorological satellite data, to improve the accuracy of surface temperature datasets used to monitor climate change.