A high-resolution air temperature data set for the Chinese Tianshan Mountains in 1979-2016

The Chinese Tianshan Mountains has a complex ecological environment system. It not only has a large number of desert oases, but also gave birth to a large number of glaciers. The arid climate and the shortage of water resources are the important factors to restrict the socio-economic development in this area. This study presents a unique high-resolution (1 km, 6-hourly) air temperature data set for the Chinese Tianshan Mountains (41.1814-45.9945 °N, 77.3484-96.9989 °E) from 2

series is the prerequisite for accurate climate change assessment especially on a regional scale (Gao et al., 2012;Pepin and Seidel, 2005;Minder et al., 2010;Maurer et al., 2002;Mooney et al., 2011). However, as the most common sources for air temperature time series, observational networks suffer from the low-station density in complex terrains, in particular at high mountains (Gao et al., 2014a). The installation and maintenance of the stations in these regions are the main challenges (Kunkel, 1989;Rolland, 2003). In order to obtain spatially continuous temperature data, interpolation technologies like 5 natural neighbourhood, inverse distance weighting and a series of Kriging methods are usually applied. However, these methods rely on the density of surface stations. The reliability of interpolated results decreases with increasing distance from the selected stations, especially for the inverse distance weighting method. Thus, interpolation approaches may induce large errors for a large region with low density of stations (e.g. Vogt et al., 1997).
In contrast to low availability of observations, reanalysis products provide long-term and spatially consistent data sets, which 10 have been increasingly applied for climate change assessment in the past two decades (e.g. Gao et al., 2014a;Mooney et al., 2011). Reanalysis is designed to estimate the state of real atmosphere and land surface characteristics by assimilating a lot of observations (Decker et al., 2012;Simmons et al., 2010). However, reanalysis contains uncertainties, such as observation changes and model misrepresentation (Simmons et al., 2010). With the development of reanalysis assimilation system, the spatial resolution has been enhanced down to 0.125°, for example ERA-Interim.. However, the local processes such as 15 temperature inversion in deep valleys, as well as snowpack accumulation and melting, are still not explicitly considered.
Furthermore, due to the heterogeneity over the land surface, many hydrological and climatic impact models go for applications on high resolutions, which tend to run on the scale of 0.1-1 km (Bernhardt and Schulz, 2010;Gao et al., 2012;Maraun et al., 2010). To this end, downscaling and correcting reanalysis data is necessary (Gao et al., 2012(Gao et al., , 2014b(Gao et al., , 2016. Previous studies have shown that the elevation difference between the reanalysis grid point and the corresponding 20 meteorological station leads to a large systematic bias (Gao et al., 2012(Gao et al., , 2014a(Gao et al., , 2016. Thus, the elevation correction scheme based on a lapse rate, which explains the empirical relationship between air temperature and altitude, can reduce this bias significantly. A constant lapse rate from the range of -6.0 °C km -1 to -6.5 °C km -1 (e.g. Dodson and Marks, 1997;Lundquist and Cayan, 2007;Maurer et al., 2002;Marshall et al., 2007) is commonly used. The monthly temperature gradients within the atmosphere are also widely applied in different regions (Kunkel, 1989;Liston and Elder, 2006). However, previous 25 studies have shown that a fixed lapse rate may be problematic since the values of the lapse rate can vary significantly within short time periods of less than a month (Minder et al., 2010;Lundquist and Cayan, 2007;Rolland, 2003). To tackle this issue, using the lapse rate calculated from the meteorological stations has the best performance in many regions (e.g. Gao et al., 2012Gao et al., , 2017. However, the observed lapse rate completely relies on the density of sites and it is not applicable for the regions without observations like high mountains. 30 Gao et al. (2012) introduced one another strategy that obtains the temporal variability of lapse rates by using ERA-Interim internal temperature profiles. It is completely derived from ERA-Interim on the internal pressure levels. It is therefore Earth Syst. Sci. Data Discuss., https://doi.org /10.5194/essd-2018-73  independent of local station observations. This method has been successfully applied for the cases in the European Alps and the Tibetan Plateau (Gao et al., 2012(Gao et al., , 2017. Furthermore, this approach has the potential to be used to downscale ERA-Interim temperature data for any other high mountainous areas. Following this approach, for the first time, the 0.25°×0.25°, 6-hourly ERA-Interim 2 m temperature data from 1979 to 2016 is downscaled to 1 km, 6-hourly for the Chinese Tianshan Mountains (CTM). The temperature data set presented here is extraordinarily unique, because it covers such a large, complex 5 terrain area with a long-term continuous both in space and in time. The validation with observations from meteorological stations shows that this data set is generally reliable and suitable for climate change impact assessment as well as for hydrological and environmental modelling.
The specific information about the Chinese Tianshan Mountains is described in Sect. 2. The used data including ERA-Interim and observations, downscaling methods as well as evaluation criteria are presented in Sect. 3. Section 4 provides the 10 validation results, and discussion and conclusions are presented in Sect. 5. Data accessibility is presented in Sect. 6.

Study area
The Tianshan Mountains is one of the seven major as well as the largest independent latitude mountain systems over the world. It spans four countries including China, Kazakhstan, Kyrgyzstan, and Uzbekistan in the east-west direction. It stretches around 2500 km length with an average width of 250-350 km (Hu, 2004, Chen et al., 2014Wang et al., 2011). The 15 East Tianshan Mountains is called the Chinese Tianshan Mountains (CTM) which extends over 1700 km in Xinjiang Province, China ( Figure 1). It occupies approximately one third of the entire area of Xinjiang Province (~570,000 km 2 ). The CTM consists of three parallel mountains (Hu, 2004;Wang et al., 2011). The north branch of the Mountains mainly includes the Ala Mount, Keguqin, Borokoonu, and Bogda Mountain. The central branch includes the Arakar, Nalati, Erwingen, and Hora Mountain. The south branch mainly includes Kochsal, Khark, Lerke, and Karake Mountain (Hu, 2004). The elevation 20 decreases from west to east with an average elevation of around 4000 m. The CTM also is a boundary for the northern and southern hillsides, which is represented by the Junggar Basin and the Tarim Basin, respectively. Mt. Tomur is the highest peak of the Tianshan Mountains with the elevation of 7443.8 m (Hu, 2004).
The Tianshan Mountains has its special arid climate regime due to the longest distance to the sea compared with other major mountain systems over the world. Many rivers such as the Sir River, the Chu River and the Yili River are originated in the 25 Tianshan Mountains (Chen et al., 2014;Chen et al., 2017;Hu, 2004). The CTM not only has a large number of desert oases in the basin, but also gave birth to a large number of glaciers. In statistics, there are 9035 glaciers in the CTM with an area of 9225 km 2 and a volume of 1011 km 3 (Shi, 2008;Shi et al., 2010). However, most of the glaciers in CTM are in a state of rapid retreat due to the global warming (e.g. Ding et al., 2006;Li et al., 2003;Li et al., 2010). As a climate transition zone, the CTM is known as the "wet island" in arid regions (Chen et al., 2015;Deng and Chen, 2017). The glaciers and snow cover 30 in high mountains are the most sensitive indicators of climate change. Global warming, especially the significantly increased temperature impacts on the retreat of glaciers and the ablation of snow, which further influences the regional water resources, and ecological environment (Chen et al., 2014;Chen et al., 2017;Wang et al., 2011;Wei et al., 2008;Zhang et al., 2012).
Because of the special geographical location and complex terrain, the temperature changes in northern and southern hillsides of the CTM are significant in regional and seasonal. Due to limited long-term observations in the CTM, the link between climate change and glacier variation is still unclear. In the past decades, most of the studies have focused on the Glacier No.1 5 at the headwaters of Urumqi River (Li et al., 2003). With the development of Geographic Information System (GIS), Remote Sensing (RS) and climate reconstruction technologies, great progress has been made on the glacier fluctuation investigations (e.g. Chen et al., 2012;Li et al., 2010). However, a reliable long-term series of temperature data is great needed to explore the glacier variations under the warming and wetting climate (Chen et al., 2015).

ERA-Interim data
The European Centre for Medium Range Weather Forecast (ECMWF) reanalysis product ERA-Interim is downscaled in this study. ERA-Interim provides data from 1979 onwards, and continues in real time (Berrisford et al., 2009;Dee et al., 2011).
The Cycle 31r2 of ECMWF's Integrated Forecast System (IFS) was used for the ERA-Interim product. Compared with ERA-40, ERA-Interim improved significant in the representation of the hydrological cycle, the quality of the stratospheric 15 circulation, as well as the handling of biases by using a 4-dimensional variation analysis (Dee and Uppala, 2009;Simmons et al., 2006;Uppala et al., 2008;Dee et al., 2011). ERA-Interim model in this configuration comprises 60 vertical levels, with the top level at 0.1 hPa. It used the T255 spectral harmonic representation for the basic dynamical fields and a reduced Gaussian grid (N128) with an approximately uniform spacing of 79 km (Dee et al., 2011;Uppala et al., 2008). ERA-Interim assimilates four analyses per day at 00, 06, 12 and 18 UTC. Two 10-day forecasts with a 3-hour resolution are initialized on 20 the basis of the 00 and 12 UTC analyses (Dee et al., 2011;Uppala et al., 2008). Due to usage limitation, 6-hourly assimilation data at 00, 06, 12 and 18 UTC from 1979-2016 which are projected on a grid of 0.25°×0.25° were used. This grid is interpolated from the original reduced Gaussian grid. The used output variables are 2 m temperature, surface pressure, as well as temperature and geopotential height at 925 hPa, 850 hPa, 700 hPa, 600 hPa and 500 hPa levels. The geopotential height is related to the variation of gravity with latitude and elevation, and is calculated by the normalization of the 25 geopotential over the gravity (Gao et al., 2012).

Observations
Daily mean air temperatures from 24 meteorological stations located on the CTM derived from the China Meteorological Data Sharing Service System of the National Meteorological Information Center (CMA-CMDC, http://data.cma.cn/) are used in the analysis. An overview of the observations is given in Figure 1  lead to the different initial time (Table 1). The elevation was adjusted for the station No. 24 but the coordinate is not changed.
The quality of observed data is strictly controlled by the National Meteorological Information Center of China. The observed daily mean temperature is averaged from four records (6-hourly) from previous 20:00 to 20:00 in Beijing time. Therefore, 6hourly ERA-Interim temperature is adjusted according to the time differences to match observations. Please note that, only 7 out of 24 meteorological stations are available for international users via CMA-CMDC with a registration. These 7 sites are a 5 small part of data set, which covers 194 sites since January 1951 over China for global exchange (http://data.cma.cn/en/?r=data/detail&dataCode=SURF_CLI_CHN_MUL_DAY_CES_V3.0). These 7 sites are Nos. 2, 5,6,8,15,18 and 23 in this study.

Downscaling method
The original ERA-Interim temperature can be downscaled via Equation (1). T ERA_2m is the original 6-hourly ERA-Interim 2 m 10 temperature at 0.25°×0.25° grid. Г describes the lapse rate with a decrease of air temperature with elevation. h ∆ is the altitude difference between 1 km grid (re-sampled from 90 m of SRTM DEM to 1 km) and ERA-Interim grid.
Here, Г represents the ERA-Interim internal lapse rates calculated from temperatures and geopotential heights at different pressure levels such as between 500 hPa and 700 hPa. In general, pressure levels covered the largest altitude range according 15 to each 1 km grid are used. More details about the downscaling method can be found in Gao et al. (2012Gao et al. ( , 2017.

Evaluation criteria
In order to evaluate the downscaled results, two statistical accuracy measures are applied. The root mean square error (RMSE) is used for an assessment of the bias between downscaled data set and observations (Equation (2)). The Nash-Sutcliffe efficiency coefficient (NSE) evaluates the performance of the downscaled data set using Equation (3), which ranges 20 from 1 (perfect fit) to minus infinity (worst fit) (Nash and Sutcliffe, 1970).
Where, T o is observed temperature at time t, T d is downscaled temperature at time t, and n is number of records in the same time series. 25 A measure of skill based on probability density function (PDF) proposed by Perkins et al. (2007) was applied in this analysis.
This skill score calculates the cumulative minimum intersections (or overlaps) between the two distribution binned values in the given PDFs. The skill score ranges from 0 to 1. The value 1 means the PDF shape of downscaled temperature match the observed PDF perfectly. The value 0 means there is no common area between the PDFs of observations and downscaled temperatures. In other words, these two PDFs are independent completely. The PDF-based skill score is calculated via Equations (4) and (5): Where n is the number of bins for PDF calculation. P d is the frequency of values in a given bin from the downscaled 5 temperatures. P o is the frequency of values in a given bin from the observations (Perkins et al., 2007, Gao et al., 2016. In this downscaling, 1 °C is used as the intervals of bins for all skill score calculations. The PDF-based evaluation method allows merging data from multiple stations across different time periods (Perkins et al., 2007, Gao et al., 2016. This evaluation method has been proven that it can show more credible climate variations especially for extreme values than the conventional mean-based assessment method (Gao et al., 2016;Mao et al., 2010). 10 Furthermore, 1% quantile and 5% quantile temperatures are represented for extreme low temperatures, while 95% quantile and 99% quantile temperatures are selected for extreme high temperatures. Quantile function is more reliable than the absolute minimum and maximum values, especially for the data sets in different time series. It is worth noting that downscaled temperature is accordingly averaged over 9 grid points surrounding each meteorological station. The downscaled temperature is in 1979-2016 and the observation is in 1979-2013. These different durations of two data sets does 15 not affect the comparison using PDF-based skill score and quantile function.

Evaluation of original ERA-Interim temperature data
The correlation coefficient ranges from 0.949 to 0.995 with an average of 0.986 for all stations (the detailed results not shown here). This high correlation coefficient indicates that the original ERA-Interim temperature captures the temporal 20 variation of observations very well.   Figure 2 shows the comparison of observation and original ERA-Interim temperature data at station Nos. 10, 24, 9 and 15. ERA-Interim underestimates observations for station Nos. 9 5 and 15 due to its higher grid elevation, especially for higher temperatures (Figure 2c and 2d). The ERA-Interim estimates quite well at station Nos. 10 and 24 for both lower and higher temperatures (Figure 2a and 2b).
The PDF-based skill score ranges from 0.67 to 0.94 with an average of 0.86 over all stations. Four stations (Nos. 8,9,13 and 15) have the skill scores lower than 0.8. The highest skill score is found at station No. 24 while the smallest one is found at station No. 9. Figure 3 shows greater detail especially in terms of the temperature bins of the PDFs, which could help to 10 easily identify how well ERA-Interim estimates for lower and higher temperatures. For station No. 10, the original ERA-Interim captures the shape of the observed PDF very well, particularly in the range of 0 °-15 °C. In the range of round -10 °-0 °C and higher than 20 °C, the observed probability is higher than original ERA-Interim. However, the probability of original ERA-Interim is higher than observations for lower temperatures (< -10 °C). For station No. 24, the original ERA-Interim fits the shape of observed PDF very well almost for entire temperature range. It only has lower probability for 15 temperatures lower than -10 °C. For higher temperature (> 20 °C), the probability of original ERA-Interim is slight higher than observation. For station No. 9, the shape of PDF from original ERA-Interim is higher (higher probability) for the temperatures lower than 12 °C and is lower (lower probability) for the temperatures higher than 12 °C compared with observed PDF. For station No. 15, the original ERA-Interim does not match the shape of observed PDF generally. The probability of original ERA-Interim is much higher than observation for the temperatures cooler than about 25 °C while is 20 much lower than observation for higher temperatures (> 25 °C). The above analysis indicates that although the original ERA-Interim captures the temporal variation of observation very well, the large RMSE and poor performance for extreme temperatures suggested that a downscaling process of ERA-Interim is necessary for the application at individual sites.

Validation of downscaled ERA-Interim temperature data
The average correlation coefficient changes from 0.986 to 0.987 for all stations (not shown here) with respect to original and 25 downscaled ERA-Interim temperature. have RMSEs lower than 3.0 °C after downscaling process. Figure 4 shows the comparison of observation and downscaled ERA-Interim temperature data at station Nos.10, 24, 9 and 15. The scatter plot shows slight improvement (for higher temperature) at station Nos. 10 and 24 compared with original ERA-Interim (Figure 2 and 4). However, the scatter plot are 5 more concentrated along the 1:1 line for station Nos. 9 and 15, especially No. 15 (Figure 4c and 4d), which suggests that the downscaling method works significant at these two stations.
The average PDF-based skill score increases from 0.86 to 0.91 for all stations ( Table 2) procedure reduces significant PDF discrepancy for station No. 15 which has a lower elevation (35 m). The validation suggests that the downscaling method is reliable generally and it could reduce RMSE and PDF discrepancy significantly for most of the stations, although it does not perform well for some individual sites.

Climatology of the Chinese Tianshan Mountainous based on the high-resolution data set
The low density of meteorological stations in the CTM may lead to an uncertainty for the representation of the entire area 25 temperature climatology. Interpolation is also risk to represent the climatology because it is based on the meteorological stations. Here, the entire area climatology is investigated using the downscaled high-resolution data set from 1979-2016.
Quantile temperature is more reliable than absolute minimum and maximum temperatures if outlier exists. Thus, 1% quantile and 99% quantile of long-term series is selected to represent the extreme cold and hot temperatures, respectively. Figure 5 shows a new comprehensive climatology of extreme low temperature over the CTM based on downscaled ERA-Interim 30 temperature. The 1% quantile temperature ranges from -49 to -12 °C, which is consistent with the topography. The lower temperatures could be found at the Borokonou, Bogda and Khark Mountain. The extreme cold temperatures (< -40 °C) are in  high mountain areas such as Tomur Peak and Bogda Peak could be higher than 0 °C. The minimum temperature of 99% quantile is around -3 °C. Figure 7 shows the mean temperature in the CTM. The temperature ranges from -25 to 16 °C. The mean temperature distribution is consistent with extreme cold and hot temperatures. It suggests that the topography is a significant impact factor for temperature. Table 3 illustrates the climatology comparison for the entire CTM using downscaled ERA-Interim temperature and 24 10 stations. 1% and 5% quantile are selected to represent the colder temperature while 95% and 99% quantile are for higher temperatures. Please note that these two data sets have different spatial resolution and time series, which means the different sample numbers. However, it does not affect too much on the general climatology comparison. The new date set underestimates than observations except 1% quantile, which suggests that it may be warmer than observation in the cold season. Only a 0.5 °C difference is found between the two data sets for 5% quantile. 1.3 °C, 1.9 °C and 1.8 °C is 15 underestimated by the new data set for 95% quantile, 95% quantile and mean temperature, respectively. The average elevation of all 24 sites is 1862 m, which is 429 m higher than the average 1km grids height (1433 m). The altitude difference could explain partly for the bias in Table 3. It can be concluded that the new data set could generally capture the climatology of the entire CTM, especially for warmer temperatures.

Discussion and conclusion 20
Although the average temporal correlation (R = 0.986) between ERA-Interim and observations over the CTM is encouraging, an average RMSE of 3.75 °C suggests that a downscaling process of ERA-Interim is needed. There are many factors may lead to the errors such as assimilated observation errors, model background errors and operator errors in ECMWF system. However, previous studies have shown that the elevation difference between the ERA-Interim grid and the individual site is the key factor for the errors in the high mountains such as the European Alps and the Tibetan Plateau (Gao et al., 2012, Gao 25 et al., 2014aGao et al., 2017). Therefore, it is possible to reduce such errors as well as downscale the grid value to finer local scale via elevation-based methods. Gao et al. (2012Gao et al. ( , 2017 claimed that the method based on ERA-Interim internal vertical lapse rates outperformed several conventional methods such as using fixed monthly lapse rates and observed lapse rates from meteorological stations in the European Alps and the Tibetan Plateau. The performances were similar to ERA-Interim internal vertical lapse rates derived 30 from different pressure levels in European Alps (Gao et al. 2012). Among these methods, using the lapse rate calculated from the pressure levels, covering the highest and lowest meteorological stations, was proven more accurate in the Tibetan Plateau (Gao et al., 2017). The most prominent advantage is that this method is fully independent from the observed data.
Therefore, it provides a possibility to extrapolate ERA-Interim temperature data for any other high mountain areas where no measurements exist.
Base on this hypothesis, the 0.25°×0.25°, 6-hourly ERA-Interim 2 m temperature data is downscaled to 1 km grid from 1979 5 to 2016 in the CTM, where is serious lack of long-term temperature observations. To our knowledge, the presented data is the first set (Version 1.0) with the high spatial-temporal resolution in such a long term series for this region. To evaluate the quality of the new data set, observations from 24 meteorological stations were used for comparison. The average NSE enhances by 5% from 0.90 to 0.94 and an average 0.9 °C (24%) of RMSE is reduced from 3.75 to 2.85 °C. The average PDF-based skill score increases from 0.86 to 0.91 after downscaling over all test sites. Except a few sites, the downscaling 10 method has a good performance for most of sites, especially for the site located in the valley (No. 16 Baicheng). The PDF shape of new high-resolution grid data fits the observed PDF much better in comparison with the original ERA-Interim temperature. The validation indicates that the downscaling method is reliable and the RMSE and PDF discrepancy is reduced significantly for most of the stations.
The new data set captures the climatology of the entire CTM very well. It seizes the distribution characteristics like high 15 temperature in the river valleys and basins and low temperature in the peaks. The bias is only 1.8 °C between the downscaled and the observations. Except the extreme cold temperature, the new data set has less 2 °C bias compared with observations.
In general, the downscaled data set is appropriate for the climate impact assessment and has the potential to be used for hydrological and environmental modelling.
For sure, some issues about the quality of the data set as well as the validation should be addressed here. The main hypothesis is 20 that the elevation plays the crucial role for temperature changes. It means that the temperature changes follow the lapse rate law in the vertical direction. However, in the horizontal direction, the micro-topographical features like aspect and slope of the mountain possibly affect the temperatures in a short time. Temperature can be significant different in the shady slope and sunny slope. The grid height of the data set is derived from SRTM DEM, which is re-sampled from 90 m to 1 km. As shown in Figure 1, the highest altitude from the re-sampled SRTM is about 7100 m, which is lower around 300 m than the highest peak (Mount 25 Tomur, 7443.8 m) in the CTM. Therefore, the DEM bias may lead to a systematic error compared with observations. The number of meteorological stations for validation here is limited to 24. These stations are mainly located in the valleys and basins. Thus, it is difficult to evaluate the credibility of data set in the high mountains where more glaciers covered. Other observational resources like remote sensing data are helpful for further validation. Due to the limitation of computational resource and the accessibility of data source (only 6-hourly ERA-Interim temperatures is open access for public), the resolution of this data 30 set is limited to 6-hourly and 1 km grid spacing. However, the current data set (~187 GB) is huge to process and store. The computational resource and the disk usage of the data set will increase exponentially when the spatial-temporal resolution becomes Earth Syst. Sci. Data Discuss., https://doi.org /10.5194/essd-2018-73 Open       Table 2.