Lake area and volume variation in the endorheic basin of 1 the Tibetan Plateau from 1989 to 2019 2

The Tibetan Plateau, known as "the third pole of the Earth", is a region susceptible to 13 climate change. With little human disturbance, lake storage changes serve as a unique indicator of 14 climate change, but comprehensive lake storage data are rare in the region, especially for the lakes with 15 an area less than 10 km which are the most sensitive to environmental changes. In this paper, we 16 completed a census of annual lake volume change for 976 lakes larger than 1 km in the endorheic 17 basin of the Tibetan Plateau (EBTP) during 1989-2019 using Landsat imagery and digital terrain 18 models. Validation and comparison with several existing studies indicate that our data are more 19 reliable. Lake volume in the EBTP exhibited a net increase of 193.45 km during the time period with 20 an increasing rate of 6.45 km year. In general, the larger the lake area, the greater the lake volume 21 change, though there are some exceptions. Lakes with an area less than 10 km have more severe 22 volume change whether decreasing or increasing. This research complements existing lake studies by 23 providing a comprehensive and long-term lake volume change data for the region. The dataset is 24 available on Zenodo (https://doi.org/10.5281/zenodo.5543615, Wang et al., 2021). 25 26

observation data, nor have they covered more than 75% of the lakes in the TP. Without a long-term 62 comprehensive census on lake volume change, it is difficult to study the impacts of climate change on 63 the hydrological system in the region.  We used the JRC-GSW data to identify the lakes in the study area. All the pixels with a positive water 115 cover frequency on the water occurrence band of the JRC-GSW data were retained, representing the 116 maximum water extent between 1984 and 2019. From those water pixels, spatially connected pixels were 117 identified as individual waterbodies and those with an area larger than 1 km 2 were kept. Some of those 118 waterbodies include both lakes and the rivers connected with them, especially for large lakes (Fig. 3).

119
The border between lakes and rivers is hard to define but we assume that the primary waterbody of a lake 120 is relatively flat and should have a slope close to zero. We used SRTM DEM to calculate the slope for 121 each waterbody pixel. Pixels with a slope greater than 0° are considered rivers and removed from the 122 waterbody. In this step, several patches of waterbody pixels may occur. We visually inspected those 123 patches and only kept the patch that represents the approximate extent of the lake associated with the 124 waterbody. This approach worked effectively for water bodies larger than 50 km 2 and the approximate 125 lake extents of 490 lakes were identified this way. In the process, we found there is a river linking two 126 lakes from high resolution remote sensing images (see Sect. 5.1 and Fig. 15). For these two lakes, the 127 linking river was kept and these two lakes were treat as one lake in our research. This situation happened 128 only once and these two lakes were usually treated as separate lakes in former reseach (Li et

139
In addition to lake approximate extents, a point is created for each lake (hereafter lake seed) to identify 140 and distinguish the target lake from other waterbodies within its analysis extent. The centroid point of 141 each lake's approximate extent was calculated as the initial lake seed location but these points were 142 manually checked and edited if necessary to make sure they are inside their lake approximate extents.

144
Although the JRC-GSW data provide global monthly surface water map, it is not designed for mapping 145 alpine lakes specifically. As such, we developed our own method for mapping lakes in the EBTP from

147
Based on the lake approximate extents obtained in Sect. 3.1, a 5 km buffer was generated around each 148 extent and all the analyses hereafter in this section are confined to this analysis extent. Since there are 149 more than 30,000 Landsat images in our study area within the study period, Google Earth Engine (GEE)

162
With the annual composite images, lake water pixels are classified using normalized difference water 163 index (NDWI) (Gao, 1996) Fig. 2). A 120-m double-sided buffer was then generated around the shorelines and Otsu method 179 was applied to obtain an optimal threshold that separates water from background pixels within the buffer.

180
This locally derived and image specific threshold was then used to extract lake pixels. As water level changes, some lakes may have several separate waterbodies in some years due to reduced 183 water volume. To handle this situation, we merged all the annual water pixels within a lake's analysis 184 extent and, from which, we then identified the spatially connected water pixels which contains the lake's 185 seed as the lake's maximum water extent during the study period. The maximum lake water extent is 186 then used to identify annual lake water pixels and calculate annual lake area (see red box in Fig. 2). In 187 this way, even if a lake has separate waterbodies in some of the years, all the waterbodies are counted as 188 parts of the same lake.

189
The Landsat imagery has several series, including

203
Because of this, we used DTM data to estimate lake surface elevation.   Table 2:

229
(1) If the number of data pairs is zero or one, we generated a new list of elevation-area pairs from the 230 selected DTM with eight pairs whose area starts with MaxA*1.5. The LR method was then used to derive (2) If the number of data pairs is two, we directly used LR to derive the elevation-area relationship 233 (labelled LRC);

234
(3) If the number of data pairs is equal to or greater than five and lake area range from the selected DTM   246

240
where E denotes lake surface elevation, and A is the lake area at the elevation. f(E) is the fitted elevation-249 area function using the LR or SOPR methods, and a, b, c, d, and e are the coefficients of the SOPR and 250 LR models.

251
Since the MCI function is not integrable analytically, we cut the lake volume between two dates into    where n is the sample size. and are ith data value in our results and existing datasets, respectively.

273
The range of sMAPE is [0, 2] and the smaller the sMAPE, the smaller the relative error. sMAPE is a 279

331
The large difference in volume might be caused by the gaps in elevation. But a definite conclusion cannot 332 be drawn as Li's data doesn't provide lake area information. 338 339 Figure 9 shows the differences in lake area and surface elevation among the datasets for lake Taro-Co.

340
Our results and the two existing datasets generally have a similar increase trend in surface elevation in

431
Trend analysis was performed for the EBTP, its sub-regions (IB, QB) and different time periods. The 432 slope and coefficient of determination (R 2 ) are shown in Table 5. It suggests that there was a significant 433 increasing trend both in the TP and IB in the recent 30 years. While the trend slope is positive in the QB

434
(0.0700), it is much smaller than that of EBTP (7.28) and IB (7.45). R 2 in the QB is 0.242 and it's 435 significant at 0.01 confidence level indicating a weak increasing trend. Trends in the IB and EBTB are 436 similar in the 7-year periods but this is not the case in the QB. This is mainly due to that most of the lakes 437 are located in the IB. Trend slopes in Table 5 correspond well to Fig. 14     lakes with area larger than 100 km 2 (Fig. 15D) but their RLV increasing rate is less than 0.003 km 3 /y.

456
Background remote sensing image is from http://t0.tianditu.gov.cn/img_c/wmts. Table 6 shows the statistics of annual RLV trend by lake area. In general, the larger the lake area, the

476
Four lakes, with area ranging from 0.97 km 2 to 149.3 km 2 , were selected to explain the typical situations 477 when different methods were used. Figure 16 shows

499
The number of lakes and the minimum, maximum, and average lake area for each method are listed in 500

508
Although lakes with larger area usually have larger RLV trend slope, we found that the range of the 509 change rates for the lakes in 1 -10 km 2 is larger than that for the lakes in 10-50 km 2 in Sect. 4.2. Here 510 we further examined the relationship between lake area and the coefficient of variation (CV) of RLV 511 (Fig. 17). While there is lack of correlation between them, the percentages of lakes with |CV| >10 in the 512 four area ranges are 3.7%, 2.3%, 1.8%, and 2.9% respectively, with lakes in 1 -10 km 2 having the highest 513 ratio. The lakes with extreme RLV are mostly located in the peripheral of the IB and QB (Fig. 17).

517
The minimum, maximum and mean CV of all lakes are -106.65, 82.77 and 0.89, respectively. And 94.36%

518
(921 out of 976) of the lakes have a CV between -1~1, which indicates that the remaining few lakes have 519 significant volume changes in the past 30 years. The annual lake area and RLV of the three lakes with 520 the highest absolute CV are shown in Fig. 18. All three lakes have significant volume fluctuation in the 521 past 30 years. For lake (1) (Fig. 18C(1)), its volume decreased significantly from 1994 to 1996 and 522 increased rapidly from 2003 to 2009. For lake (2) (Fig. 19C(2)), its volume fluctuated cyclically in the  found that lakes of different sizes respond differently to climate change, there is a lack of attention to 537 lakes less than 10 km 2 . This is mainly due to insufficient data of small lakes in the region. Our results

538
indicate that small lakes are important as lakes with higher CV are usually less than 10 km 2 , which,

549
The comparison with three other major existing data products indicates that our dataset is reliable and 550 might be more accurate. To the best of our knowledge, our dataset provides the longest and most 551 comprehensive lake water volume change data in the region, especially for small lakes (1-10 km 2 ). The 552 dataset is valuable in studying the impacts of climate change and water balance in the region.

553
Our research indicates that small lakes with an area in 1 -10 km 2 are most sensitive and have the highest