An integrated dataset of daily lake surface water temperature over Tibetan Plateau

Lake surface water temperature (LSWT) is a critical physical property of the aquatic ecosystem and an evident indicator of climate change. By combining the strengths of satellite-based observation and modelling, we have produced an integrated daily lake surface water temperature dataset for 160 lakes across the Tibetan Plateau, where in-situ observation is limited. The MODIS-based lake-wide mean LSWT in the integrated dataset includes the daytime, night-time and daily mean LSWT for the period 2000-2017. The dataset is comparable with other satellite-based LSWT products (e.g., LSWT from AVHRR and ARC-Lake), with a Nash-Sutcliffe efficiency coefficient (NSE) > 0.7 for most lakes, and is unique for its tempo-spatial span (1978-2017) and resolution (daily). The MODIS-based daily mean LSWT is used to calibrate and validate a simplified physically based model (i.e., the modified air2water model), upon which a complete and consistent daily LSWT dataset is reconstructed for and extended to the period 1978-2017. The reconstructed LSWT dataset is validated by comparing against both the satellite-based (including LSWT from MODIS, AVHRR and ARC-Lake) and in-situ observations. The validation shows that the reconstructed LSWT


Introduction
Lake surface water temperature (LSWT) is a critical physical property of the aquatic ecosystem and an evident indicator of climate change (Austin and Colman, 2008; Livingstone, 2003; Williamson et al., 2009). A rapid rise in water temperature has been observed in many lakes around the world, which not only reflects the changes in lake heat budget associated with global warming (Bates et al., 2008; Dokulil, 2014) but also has resulted in a succession of changes in physical, chemical, and biological processes within the lake system (Hondzo and Stefan, 1993; Ke and Song, 2014; Naumenko et al., 2006; Ngai et al., 2013; Rahel and Olden, 2008; Schindler, 2001; Woolway and Merchant, 2017; Liu et al., 2021). One of the noticeable consequences of changes in lake water temperature is the considerable change in lake ice phenology (e.g., the freeze-up and break-up dates of lake ice) found in mid- to high-latitude or high-altitude regions around the world during the past decades (Livingstone, 1997; Takacs et al., 2018; Tian et al., 2015; Prowse et al., 2011).
Lake water temperature records based on in-situ measurements are not widely available due to their cost and the geographical restrictions of the Tibetan Plateau (TP). In-situ lake water temperature of 2 lakes (Bangong Co and Dagze Co) can be obtained from the National Tibetan Plateau Third Pole Environment Data Center. However, in the TP, where there are more than 1,100 alpine lakes with area larger than 1 km² and elevation above 4000 m, most of the lakes have no in-situ water temperature records because of the harsh conditions for ground observation (Zhang et al., 2014). Satellite-based observation has proven powerful in providing continuous worldwide records of lake surface temperatures and has developed rapidly in recent decades with increasingly higher temporal and spatial resolutions (Liu et al., 2019; Prats et al., 2018; Schneider and Hook, 2010). At the global scale, currently accessible satellite-based lake surface temperature datasets include those from ARC-Lake (ATSR Reprocessing for Climate: LSWT & Ice Cover) (Layden et al., 2015) and the GLTC (Global Lake Temperature Collaboration) (Sharma et al., 2015). The global datasets are highly valuable in stimulating research on inland water bodies (e.g., Moukomla and Blanken, 2017; Piccolroaz et al., 2020; Torbick et al., 2016; Zhang et al., 2014). These datasets, however, are of limited use in the Tibetan Plateau as they cover only a handful of lakes within the TP region and are temporally incomplete (Liu et al., 2019). Two other surface water temperature datasets of lakes in the Tibetan Plateau have recently been produced by Wan et al. (2017) and Liu et al. (2019) using different sensors.
One of the datasets is based on MODIS land surface temperature products and provides 8-day mean surface temperature of 374 lakes for the period 2001-2015 (Wan et al., 2017), while the other is based on AVHRR and presents daytime lake surface water temperature of 97 lakes with area above 80 km² for the period 1981-2015 (Liu et al., 2019). Both datasets, however, have quite a few missing data caused by the revisit period of the satellites and inconsistency in the time series due to calibration among the successive satellites.
Lake surface water temperature can alternatively be derived or reconstructed based on well-calibrated modelling (Layden et al., 2016; Prats and Danis, 2019). Process-based numerical models have been used widely to investigate the thermodynamics of lakes at local scales (Kirillin et al., 2017; Launiainen and Cheng, 1998; Peeters et al., 2002; Stepanenko et al., 2016), which usually require detailed over-lake meteorological data (e.g., wind speed, humidity, cloud cover) as inputs and in-situ measurements to calibrate and validate the models (Bruce et al., 2018). For regions with scarce in-situ meteorological observations, simple statistical models (Mccombie, 1959; Webb, 1974; Sharma et al., 2008) or simplified physically based models (Piccolroaz et al., 2013) have been developed, which may require only air temperature as input.
The air2water model proposed by Piccolroaz et al. (2013) is a hybrid model with a strong physical basis that simplifies the thermodynamic equations to minimize the input requirement while preserving the robustness of deterministic models. The air2water model has been shown to provide similar performance, in terms of simulating LSWT, to process-based models, and to be an effective tool in reconstructing historical LSWT (Czernecki and Ptak, 2018; Piccolroaz et al., 2020; Schmid and Köster, 2016) and in investigating LSWT responses to climate change for lakes with different morphological characteristics around the world (Piccolroaz et al., 2015; Prats and Danis, 2019; Toffolon et al., 2014). The simplified model is especially competitive and practicable for regions where in-situ observations are too limited to drive and calibrate a more complex process-based model.
We herein combine the strengths of both the remotely sensed and model-based approaches to produce an integrated dataset of daily surface water temperature of 160 large lakes across the Tibetan Plateau for the period 1978-2017. For the combination, the remotely sensed LSWT is used to calibrate and validate the air2water model, which is then applied to reconstruct a complete and consistent daily LSWT time series for each studied lake and to extend the LSWT dataset to a wider time span. The integrated dataset demonstrates potential for investigating variability and changes in LSWT across the Tibetan Plateau during the past decades. It could be valuable for assessing the impacts of climate warming on the dynamics of water and heat budget, water quality and aquatic biota in lakes across the Tibetan Plateau.

Study area
The Tibetan Plateau is located between 26°00′-39°47′N and 73°19′-104°47′E, with a mean elevation over 4500 m and an area of about 2.5×10⁶ km². It is known as the "Asian water tower", contributing to most major rivers in Asia (Wu and Lei, 2014). The total area of lakes in the plateau is around 45,000 km², most of which are located at altitudes between 4000 m and 5000 m (Yao et al., 2015). Our dataset includes 160 lakes in the Tibetan Plateau, covering most major lakes with area above 40 km² (Fig.1).
All lakes were selected from Records of Lakes in China (Wang and Dou, 1998). The general properties of the studied lakes are listed in the lake_info.csv file of the dataset, which includes the names, locations and areas of the lakes.

Methods
Fig. 2 shows the overall framework of our study in integrating the lake surface water temperature dataset for the largest lakes across the Tibetan Plateau. The MOD11A1 product is the source data for the satellite-based LSWT, while the air temperature data from meteorological stations is the source data used to drive the air2water model. The satellite-based observation is used to calibrate and validate the air2water model. The details of the procedures and methods are described in the following sections.

Satellite-based observation of lake surface water temperature
The first step in producing the integrated LSWT dataset is to derive the satellite-based observation from the MOD11 (version 6) provided by NASA's Earth Observing System Data and Information System (EOSDIS, https://earthdata.nasa.gov). The MOD11 (Wan, 2013) is the land surface temperature and emissivity product retrieved at 1 km pixels by the generalized split-window algorithm and at 6 km grids by the day/night algorithm, mainly based on bands 31 (10.78-11.28 μm) and 32 (11.77-12. (1:30 P.M.). The MOD11 includes products with daily, 8-day, and monthly temporal resolutions. Validation of the MODIS LST product for water surface temperatures against in-situ measurements has been conducted and the absolute differences have been reported to be within the range of 0.8-1.9 ℃ (Crosman and Horel, 2009, 2008; Hulley et al., 2011; Reinart and Reinhold). The bias of MODIS-based LSWT versus in-situ observations was reported to be around -1.74 ℃ and -1.4 ℃ over the TP (Zhang et al., 2014) when compared to the limited in-situ observations. Different from the 8-day dataset by Wan et al. (2017), we use MOD11A1 (Terra Land Surface Temperature/Emissivity Daily L3 Global 1km) instead of MOD11A2 to produce the daily lake surface temperature dataset.
The daily product is also used because it is more suitable for calibrating and validating the air2water model at the same temporal resolution. The MOD11A1 is a tiled daily Level 3 product at 1 km spatial resolution corresponding to the earth locations on the sinusoidal projection.
In our dataset, the satellite observation of LSWT for a specific lake is the lake-wide mean temperature of all 1-km pixels within the lake, which is different from the temperature of the centroid pixels presented by Zhang et al. (2014) and is more consistent with our model settings. In calculating the lake-wide mean surface water temperature, pixels within a lake are identified by the boundaries of the lakes (shapefiles included in the dataset), which are mainly from National Tibetan
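The lake-wide averaging described above can be sketched as follows. This is a minimal illustration, not the processing chain used for the dataset: the function name, the boolean `lake_mask` array (assumed to be rasterized from the lake boundary beforehand) and the optional `min_valid_fraction` threshold are hypothetical. The scale factor (0.02 K per digital number) and fill value (0) are those of the MOD11A1 LST layers.

```python
import numpy as np

MOD11A1_SCALE = 0.02  # kelvin per digital number in the MOD11A1 LST layers
MOD11A1_FILL = 0      # digital number used for pixels without a retrieval


def lake_wide_mean_celsius(lst_dn, lake_mask, min_valid_fraction=0.0):
    """Mean LSWT (deg C) over all valid 1-km pixels inside a lake.

    lst_dn             -- 2-D array of raw MOD11A1 digital numbers
    lake_mask          -- 2-D boolean array, True for pixels inside the lake
                          (hypothetical input, rasterized from the shapefile)
    min_valid_fraction -- hypothetical quality threshold on pixel coverage
    Returns None when no lake pixel has a valid retrieval.
    """
    lst_dn = np.asarray(lst_dn)
    lake_mask = np.asarray(lake_mask, dtype=bool)
    valid = lake_mask & (lst_dn != MOD11A1_FILL)
    n_lake = int(lake_mask.sum())
    n_valid = int(valid.sum())
    if n_lake == 0 or n_valid == 0:
        return None
    if n_valid / n_lake < min_valid_fraction:
        return None
    kelvin = lst_dn[valid].mean() * MOD11A1_SCALE
    return float(kelvin - 273.15)
```

Averaging over every valid pixel rather than reading a single centroid pixel makes the value less sensitive to a missing or cloud-contaminated retrieval at one location.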

Reconstructing lake surface water temperature
The satellite-based lake surface water temperature is accessible only since 2000 and has missing data due to the limitations of MODIS. To investigate the long-term changes in surface water temperature for lakes in the Tibetan Plateau and their responses to climate change, in our integrated dataset, we used a slightly modified air2water model to reconstruct the daily lake surface water temperature for the period 1978-2017 based on daily air temperature data from the Dataset of daily climate data from Chinese surface stations (V3.0) provided by the China Meteorological Data Service Centre of the National Meteorological Information Centre (http://data.cma.cn/data/cdcdetail/dataCode/SURF_CLI_CHN_MUL_DAY_V3.0.html).
The air2water (Piccolroaz et al., 2013) is a semi-physical model designed to simulate lake surface temperature principally based on the lake surface heat balance and can be expressed as:
$$\rho c_p V \frac{dT_w}{dt} = A H_{net},$$
where $T_w$ is surface water temperature, $H_{net}$ is the net heat flux per unit surface, $A$ is the surface area of the lake, $V$ is the volume of water involved in the heat exchange with the atmosphere, and $\rho$ and $c_p$ are the density and specific heat capacity of water.
The model considers all contributions to the heat balance ($H_{net}$), which are represented as a function of air temperature or are parameterized (Piccolroaz et al., 2013; Piccolroaz, 2016). The original air2water model has simplified the lake heat balance by introducing eight calibratable parameters $a_{1-8}$ with air temperature as the only required input, which can be expressed as:
$$\frac{dT_w}{dt} = \frac{1}{\delta}\left\{a_1 + a_2 T_a - a_3 T_w + a_5 \cos\left[2\pi\left(\frac{t}{t_y} - a_6\right)\right]\right\},$$
where $T_a$ is air temperature and $t_y$ is the number of days in a year. $\delta = D/D_r$ is the normalized well-mixed depth, where $D$ is the depth of the well-mixed surface layer (the epilimnion thickness) and $D_r$ is the maximum thickness. $T_h$ is the deep water temperature, with a default value assumed to be 4 ℃. The $a_{1-4}$ in Eq.
The original air2water is limited to simulating the surface temperature of open water, which causes difficulties for applications in lakes with long ice cover duration. For this reason, we assume that when the lake is completely covered by ice, the heat exchange between air and water is blocked and the surface energy balance becomes:
$$\frac{dT_i}{dt} = a_9 + a_{10} T_a - a_{11} T_w + a_{12} \cos\left[2\pi\left(\frac{t}{t_y} - a_{13}\right)\right],$$
where $T_i$ is the ice surface temperature and $a_{9-13}$ have similar physical significance to $a_{1,2,3,5,6}$. To represent the state shift of a lake between open water and ice cover, two additional parameters $a_{14}$ and $a_{15}$ are introduced to determine the states. The lake surface temperature $T_L$ is then expressed in terms of $K_{ice} = \sqrt{(a_{15} - T_L)/(a_{15} - a_{14})}$, the proportion of ice on the surface of the lake. As in the original model, the model equations are solved numerically using the Crank-Nicolson scheme at a daily time step.
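To make the structure of the open-water equation concrete, the following is a minimal sketch that steps the surface temperature forward one day at a time. It is not the implementation used for the dataset: it uses an explicit Euler step instead of the Crank-Nicolson scheme, it omits the ice-covered state, and the form of `delta` (exponential decay above T_h controlled by a4, equal to 1 below T_h) is an assumption taken from the reduced versions of the original air2water formulation. All function names and parameter values are hypothetical.

```python
import math


def delta(tw, th=4.0, a4=20.0):
    """Normalized well-mixed depth (assumed form, see lead-in)."""
    if tw >= th:
        return math.exp(-(tw - th) / a4)
    return 1.0  # assumption: fully mixed layer below the deep water temperature


def simulate_open_water(ta_series, tw0, a1, a2, a3, a4, a5, a6,
                        ty=365.0, th=4.0):
    """Integrate dTw/dt = (1/delta) * {a1 + a2*Ta - a3*Tw + a5*cos[...]}.

    ta_series -- daily air temperature (deg C), one value per day
    tw0       -- initial surface water temperature (deg C)
    Uses an explicit Euler step of one day (not Crank-Nicolson).
    """
    tw = tw0
    simulated = []
    for day, ta in enumerate(ta_series):
        seasonal = a5 * math.cos(2.0 * math.pi * (day / ty - a6))
        forcing = a1 + a2 * ta - a3 * tw + seasonal
        tw = tw + forcing / delta(tw, th, a4)  # time step of 1 day
        simulated.append(tw)
    return simulated
```

With constant air temperature and no seasonal term (a5 = 0), the sketch relaxes toward the equilibrium (a1 + a2*Ta)/a3, which is a quick sanity check on the signs of the terms.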
The daily air temperature used to drive the modified air2water model is from 31 meteorological stations in the Tibetan Plateau (Fig.1). The air temperature above each lake is interpolated from the nearest station and adjusted by a lapse rate of 0.65 ℃ 100 m⁻¹ (Hu et al., 2014). The model is calibrated against the remotely sensed LSWT described in Section 3.1 using the PSO (Particle Swarm Optimization) approach (Kennedy and Eberhart, 1995; Piccolroaz, 2016). The objective function of model calibration is the widely adopted Nash-Sutcliffe efficiency coefficient (NSE) (Nash and Sutcliffe, 1970). To ensure the rationality of the calibrated model parameters, a physically consistent a priori range is assigned to each parameter (Piccolroaz, 2016) and $D_r$ is bounded by the average depth of the lake obtained from Records of Lakes in China (Wang and Dou, 1998) and HydroLAKES (Messager et al., 2016). The calibration period is 2000-2012, while 2013-2017 is used as the validation period. It should be noted that the satellite-derived water temperature corresponds to the instantaneous water temperature at the top of the surface (∼10-20 µm deep), known as skin temperature. The skin temperature can differ from the surface water temperature because the thermal structure of the first metres of the water column is not uniform under all conditions. Nevertheless, satellite-derived water temperature data (skin temperatures) are relevant and sufficient to complement the data used for calibrating and validating hydrodynamic and water quality models of lakes (Andréassian et al., 2012; Prats and Danis, 2017; Prats et al., 2018). For comparison and further validation, the dataset produced by Liu et al. (2019) based on AVHRR was also used.
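Two of the steps above lend themselves to a short illustration: the elevation adjustment of station air temperature and the NSE objective. The sketch below assumes that the lapse rate is applied to the elevation difference between the lake and the station; the function names and the sample values are hypothetical, and the PSO search itself is not shown.

```python
LAPSE_RATE = 0.65 / 100.0  # deg C per metre (0.65 deg C per 100 m)


def adjust_air_temperature(t_station, z_station, z_lake):
    """Shift station air temperature (deg C) to the lake elevation (m)."""
    return t_station - LAPSE_RATE * (z_lake - z_station)


def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - residual / spread
```

An NSE of 1 indicates a perfect match, while 0 means the simulation is no better than the mean of the observations, which is why it is convenient as a single score to maximize during calibration.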

MODIS-based LSWT of lakes across the Tibetan Plateau
Three subsets of lake-wide mean surface water temperature are derived and included in our integrated dataset: daily daytime LSWT, daily night-time LSWT and daily mean LSWT for the period 2000-2017. The lake surface water temperature derived from the MOD11A1 is compared against two satellite-based datasets released by other researchers, which are based on AVHRR (Liu et al., 2019) and ARC-Lake (Layden et al., 2015) respectively. lakes in the TP, which however is incomplete and with different length of data gaps among the lakes. It should be noted that the AVHRR-based LSWT from the TPlake_Temp dataset is the daytime LSWT instead of the daily mean; hence, it is compared against our MODIS-based daytime LSWT instead of the daily mean LSWT. It is found that the LSWT from the two sensors (i.e., AVHRR and MODIS) are largely comparable with each other for most of the studied lakes, where the NSE and R² between the LSWT from the two datasets are higher than 0.76 for more than 8096% of the lakes, and the bias ranges from 0.6-3.0 to 3.385.6℃. It is worth mentioning that the MODIS-based LSWT would be better for investigating the long-term changes in lake surface temperature in the Tibetan Plateau, as the AVHRR-based LSWT from the TPlake_Temp dataset has more missing data and was reported to have abrupt shifts in LSWT due to inconsistent calibration among the successive satellites (Liu et al., 2019).

We further compare the MODIS-based LSWT against that from ARC-Lake. The ARC-Lake dataset (version 3) includes ATSR-2/AATSR-based lake surface temperatures for 1,628 globally distributed target water bodies for the period 1995-2012. Here, the daytime and night-time LSWTs in ARC-Lake are averaged to represent the daily mean LSWT. Since the ARC-Lake LSWT product is only for open water and tags temperatures below 0 ℃ as the "frozen" season with a value of 0 ℃, only the period when both the MODIS-based and ARC-Lake LSWT are above 0 ℃ is considered in the comparison. LSWTs of 11 studied lakes (locations shown in Fig. S1 in the Supplementary) are available from ARC-Lake for the comparison. Excluding the outliers for which the difference between the two satellite observations is larger than 5 ℃ (around 8 % of the observations in the studied periods), as shown in Fig. 4, the two satellite-based observations of LSWT are mostly comparable, with NSE > 0.7 and bias ranging from -1.23 ℃ to 0.94 ℃.
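The pairing rules used in this comparison (open-water days only, outliers beyond 5 ℃ set aside) can be sketched as below. This is an illustration under stated assumptions rather than the authors' script: the function name and return layout are hypothetical, missing days are assumed to be encoded as NaN, and bias is taken here as the mean of MODIS minus the other product.

```python
import numpy as np


def compare_open_water(modis, other, outlier_threshold=5.0):
    """Bias and outlier share for two daily LSWT series (deg C).

    Only days on which both series are finite and above 0 deg C are paired.
    Pairs differing by more than outlier_threshold are counted as outliers
    and excluded from the bias.
    Returns (bias, outlier_fraction, n_used).
    """
    modis = np.asarray(modis, dtype=float)
    other = np.asarray(other, dtype=float)
    with np.errstate(invalid="ignore"):  # NaN comparisons evaluate to False
        open_water = (np.isfinite(modis) & np.isfinite(other)
                      & (modis > 0.0) & (other > 0.0))
        outlier = open_water & (np.abs(modis - other) > outlier_threshold)
    used = open_water & ~outlier
    n_open = int(open_water.sum())
    n_used = int(used.sum())
    if n_used == 0:
        return float("nan"), float("nan"), 0
    bias = float((modis[used] - other[used]).mean())
    return bias, int(outlier.sum()) / n_open, n_used
```

Reporting the outlier share alongside the bias keeps the exclusion transparent, since the agreement statistics only describe the retained pairs.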

Reconstructed LSWT by air2water
The LSWT reconstructed by the slightly modified air2water model is compared with and validated against the MODIS-based LSWT.
As shown in Fig.5a and 5b, the spatial pattern of the long-term mean annual lake surface water temperature from the reconstructed LSWT is close to that based on MOD11A1, with overall NSE=0.977 and R²=0.987. For the period 2000-2017, the NSE and R² of most lakes (>90%) are both above 0.8, with bias ranging at ±0.55℃. The bias for the validation period (2013-2017) is slightly higher than that for the calibration period (see Fig. S2 in the Supplementary), ranging from -1.7 to 0.9 ℃. Although NSE tends to be relatively higher for lakes at lower altitude and higher latitude, model performance is found to be only weakly related to the altitude and latitude of the lakes (see Fig. S3 in the Supplementary). The results indicate that there is no substantial systematic bias in the reconstructed LSWT. The modified air2water model is hence reliable in reconstructing LSWT for lakes in different altitude or latitude zones. However, it should be noted that uncertainties exist in the reconstructed LSWT due to the uncertainties not only in the calibrated model but also in the air temperature input to the model. Summaries of the model performance evaluation for each month (Fig. S4 in the Supplementary) show that the simulated LSWT has smaller bias in July and November. As mentioned in Section 3.2, the daily air temperature above the lakes is interpolated from the nearest meteorological stations with elevation adjustment, so the performance of the interpolation is affected by the density and locations of the stations. Currently, the available meteorological stations are concentrated in the eastern part of the TP (Fig.1), which may result in higher inherent uncertainties of the reconstructed LSWT for lakes in the western TP.
As shown in Fig.5c and 5d, though there is no significant difference in model performance with respect to the locations of the lakes, the effects of the interpolation approaches on the simulation of LSWT are worth exploring further in future research.
In-situ observations of LSWT, however, are not widely available for lakes in the Tibetan Plateau to conduct an overall validation of the modelling results. Nevertheless, we have compared the modelling results against the currently publicly available in-situ surface water temperature data (Guo et al., 2016; Li et al., 2015; Liu et al., 2021; Wang and Hou, 2018) for lakes in the Tibetan Plateau (Table S1 in the Supplementary), which include sequential observations of 4 lakes (i.e., Ngoring Lake, Serling Co, Dogze Co, Bangong Co) and sporadic observations of 41 lakes (compared with the simulated lake surface water temperature on the same day as the observation). The information on the in-situ data is given in the Supplementary. As shown in Fig.6, the simulated lake surface temperature is in good agreement temporally with the sequential observations (R²=0.97, 0.92, 0.90, 0.97 for Ngoring Lake, Serling Co, Dogze Co, Bangong Co respectively) and spatially with the sporadic observations (R²=0.94).

Long-term trends of LSWT in the Tibetan Plateau
On the basis of the reconstructed daily LSWT dataset, the long-term changes of surface water temperatures of lakes across the Tibetan Plateau are detected using the Mann-Kendall trend analysis approach (Kendall, 1955; Mann, 1945). The daily and annual variation of a sample lake is shown in Fig. S5 (in the Supplementary). Fig.7 shows that, for the period 1978-2017, the annual LSWT of most lakes (except for Lake Beidao and Lake Changhong) increases significantly at rates ranging from 0.01 to 0.47 ℃ 10a⁻¹, which is consistent with but comparatively smaller than the increasing rate in air temperature (0.03-0.09 ℃ 10a⁻¹), indicating the contribution of the heat storage capacity of the lakes. Lakes in the southern TP are generally found to have higher warming rates than those in the northern TP. The increase in LSWT is more evident in the winter season (December to February) than in the summer season (June to August). In winter, except for Lake Kuhai, the LSWTs of all lakes increase significantly at rates ranging from 0.06 to 0.96 ℃ 10a⁻¹. In summer, while most of the lakes (125 out of 160) show a significant increase in LSWT, the LSWTs of 13 lakes located in the northern part of the TP decrease insignificantly. The decreasing trend in summer LSWT of these lakes could be because the warming climate has been leading to more glacier meltwater flowing into the lakes in that season.
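For readers unfamiliar with the Mann-Kendall approach, the test can be sketched in a few lines. This is a minimal version under stated assumptions: no correction for tied values or serial correlation is applied, significance uses the normal approximation, and the function name is hypothetical. It does not estimate the warming rate itself, only whether a monotonic trend is present.

```python
import math


def mann_kendall(values):
    """Return (S, Z, p) for the Mann-Kendall trend test.

    S -- sum of signs of all pairwise later-minus-earlier differences
    Z -- standardized statistic with continuity correction
    p -- two-sided p-value from the normal approximation
    Assumes no ties; the variance is not tie-corrected.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p
```

Applied to a series of annual mean LSWT for one lake, a positive Z with a small p-value would indicate a significant warming trend and a negative Z a cooling trend.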
It should be noted that the warming and cooling trends could inherit uncertainties from the modelling. The cooling effect of melted glacier water, however, needs further validation when more in-situ observations become available.
Table 1 gives the details of the data included in our integrated dataset. The properties of the lakes and a time series plot for each lake are also included in the dataset for readers to have a quick view. The dataset is archived and openly accessible via the Zenodo portal: https://doi.org/10.5281/zenodo.5878436 https://doi.org/10.5281/zenodo.5111400 (Guo et al., 2021). The GEE scripts are available at: https://code.earthengine.google.com/563820c56b30595de901c21aef5f0c71.
Model-based daily lake surface water temperature. 1978-2017

Conclusions
An integrated daily lake surface water temperature dataset has been produced for 160 lakes across the Tibetan Plateau by combining the strengths of satellite-based observation and model-based approaches. The satellite-based lake-wide mean LSWT is derived from MOD11A1 via Google Earth Engine and includes the daytime, night-time and daily mean LSWT for the period 2000-2017. The dataset is found to be comparable with other satellite-based LSWT products (e.g., LSWT from AVHRR and ARC-Lake) but unique in its tempo-spatial span and resolution. The satellite-based LSWT enables the calibration and validation of a simplified heat balance model (i.e., air2water) in regions with scarce ground-based observation (like the Tibetan Plateau). The modified air2water model is found to be successful in reconstructing the daily LSWT of lakes across the Tibetan Plateau by extending the LSWT time series to the period 1978-2017 and filling the time gaps in the satellite-based LSWT, though with uncertainties from model inputs and model parameterization. The complete and consistent reconstructed LSWT is therefore reliable and valuable for investigating the long-term variation and changes of LSWT in the Tibetan Plateau. According to the reconstructed LSWT dataset, the annual LSWT increased significantly in the period 1978-2017, with rates ranging from 0.01 to 0.47 ℃ 10a⁻¹. The warming trend of the lakes is more evident in winter than in summer. The integrated dataset, together with the methods introduced herein, can help the research community explore water and heat balance changes in the Tibetan Plateau and the consequent ecological effects in future research.
Author contributions. YW, HZ, BZ and LG conceived the research. LG, YW and HZ developed the approaches and datasets.
LG, MW and LF collected basic data of lakes. JL, SW checked the results. LG, HZ and YW wrote the original draft. BZ and LZ revised the draft.
Competing interests. The authors declare that they have no conflict of interest.