Water clarity annual dynamics (1984-2018) dataset across China derived from Landsat images in Google Earth Engine

. Water clarity provides a sensitive tool to examine spatial pattern and historical trend in lakes trophic status. Yet, this metric has insufficiently been explored despite the availability of remotely-sensed data, especially for long-term monitoring. Therefore, we utilized Landsat top of atmosphere reflectance products within Google Earth Engine in the period 15 of 1984-2018 to retrieve the average SDD for each lake in each year. We used three Three Secchi disk depth (SDD) datasets were used for model calibration and validation from different field campaigns mainly conducted during 2004-2018. The red/blue band ratio algorithm was applied to map SDD for lakes (> 0.01 km²1 ha) based on the first SDD dataset, where R 2 = 0.79, RMSE = 100.3 cm, rRMSE = 61.9%, MAE = 57.7 cm. The other two datasets were used to validate the temporal transferability of SDD estimation model, which were indicated the model had a stable performance.the SDD estimation 20 model, which were indicated the model had a stable performance of temporal transferability. The annual mean SDD of retrieved across China using Landsat top of air reflectance products in GEE from 1984 to 2018. The spatiotemporal dynamics of SDD were analysed at the five lake regions and individual lake scales, and the average, changing trend, lake number and area, and spatial distribution of lake SDDs across China were presented. In 2018, we found that the lakes with SDDs < 2 m accounted for the largest proportion (80.93%) of the total lakes, but the total area of lakes with SDD between 0- 25 0.5 m and > 4 m were the largest, accounting for 48.28% of the total lakes. 1984-2018, in the lake (TQR) the clearest water with an average value of 3.32±0.38 m, in the Northeastern lake region (NLR) the lowest SDD (mean: 0.60±0.09 m). 10,814 SDD significant increasing and decreasing trends, respectively. At the five lake regions, for Inner (MXR), total in every other lake region significant increasing trends. In the Eastern lake region (ELR), NLR and Yungui Plateau lake region (YGR), 50% of the lakes displayed an increase or decrease in SDD mainly distributed in an area of 0.01-1 km , whereas that in the TQR and MXR were primarily concentrated in large lakes (> 10 km 2 ). Spatially, lakes located this study could aid local, regional and national decision-making on policies and management for protecting/improving inland water quality in China. The research approach 440 implemented also could potentially be used to map water clarity in lakes at the global scale, an effort that can provide useful information for evaluating decadal trends in surface water quality resulting from adoption of pollution control policies.

methods is widely used to retrieve SDD at multiple scales due to its simplicity and operability (Duan et al., 2009;Feng et al., 2019a;Kloiber et al., 2002;McCullough et al., 2012;Olmanson et al., 2011a;Pi et al., 2020;Shen et al., 2020;. In the past, we faced the challenge of how to handle and analyse big data at national or global scales, like remote sensing datasets from different satellites. Since 2010, Google has launched the big geo data platform based on cloud computing, 70 named Google Earth Engine (GEE), which is time-saving for users to do some scientific researches online (such as vegetation, agriculture, hydrology, land cover and other applications) without downloading these satellite images (Amani et al., 2020). The GEE platform mainly comprises datasets of remote sensing, geophysics and meteorology. The remote sensing datasets contain Landsat (1972-present), Moderate Resolution Imaging Spectrometer (MODIS;2000-present) and Sentinel (2014-present) (https://code.earthengine.google.com/). Remote sensing images are selectively used to estimate SDD for 75 specific regions according to their spatial and temporal resolutions, among which the Landsat images can not only be used to examine the long-term (3-4 decades) spatiotemporal variation of SDD but also monitor lakes ranging from the small to the large with its higher spatial resolution (30 m). Yet, the wealth of ecological information contained in the archived Landsat images has not been fully explored. Therefore, the GEE platform is an optimal choice to quickly map SDD long time-series dynamics based on Landsat observation across China. 80 In recent years, a few studies have examined the spatiotemporal dynamics of SDD in lakes across China, but they mainly focused on the large lakes and reservoirs (area >10 km 2 ) Wang et al. 2020a;Zhang et al. 2021). Smaller lakes (area < 10 km 2 ) are widely distributed across the country, but our understanding of their ecological status remains limited. For example, Liu et al. (2020a) used an empirical model and the MODIS red and green bands (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018) within GEE to study SDD variation in 412 large lakes (area > 20 km 2 ) across China. Wang et al. (2020a) applied water color 85 parameters (Forel-Ule Index and hue angle) to MODIS data (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017) and obtained SDD data for 153 lakes (> 25 km 2 ) across China. Zhang et al. (2021) built a simple power function model based on Landsat red band (2016)(2017)(2018) to investigate the spatial distribution of SDD in 641 lakes (≥10 km 2 ) across China. In addition, other investigations of the spatio-temporal variations of SDD have been made using MODIS data for lakes in the Yangtze Plains (50 lakes, >10 km 2 ; Feng et al., 2019a) and in the Tibetan Plateau (64 lakes, >50 km 2 ; Pi et al., 2020). In these studies, the empirical models 90 exhibited better ability than other models to estimate SDD at large-scales.
In this study, we tuned a recently-developed SDD empirical model which has been demonstrated as effective to map the spatial-temporal dynamics of SDD in surface waters based on atmospherically-corrected Landsat reflectance products in GEE Zhang et al., 2021). The overall purpose of this study was to map the spatiotemporal variation of SDD in lakes (> 0.01 km² 1 ha) across China from 1984 to 2018. Specifically, the objectives were to: (1) built a lake SDD 95 estimation model across China based on extensive in-situ measurements; (2) derive SDD of lakes across China using Landsat data embedded in GEE; (3) analyse the inter-annual variability of SDD at the lake regions scale and the individual lake scale. Such a research would provide valuable information regarding water quality conditions and inform future water resources planning and management.

Study area 100
China is a vast and physiographically-diverse country endowed with a large number of lakes. Based on broad regional variations of landforms and climate characteristics, the lakes in China have been grouped into five regions (Ma et al., 2011) ( Fig. 1a). The Inner Mongolia-Xinjiang lake region (MXR) and Tibetan-Qinghai Plateau lake region (TQR) are located in arid or semiarid climates, while the Northeastern lake region (NLR), Yungui Plateau lake region (YGR) and Eastern lake region (ELR) are situated in the Asian monsoon climate zone. The MXR and TQR regions have lower annual precipitation, 105 lower temperature and higher evaporation level than other three lake regions. Regionally, lakes distribution sourced from  is as follows (in decreasing order): 49% in ELR, 22% in NLR, 18% in YGR, 8% in MXR and 4% in TQR ( Fig. 1b). However, on the basis of lake surface area, regional distribution is slightly different and is in the order: TQR (41%) > ELR (30%) > MXR (14%) > NLR (10%) > ELR (6%) (Fig. 1b). The lakes in the plateau region with higher elevation are less affected by human activities, and generally exhibit better ecological condition than lakes in the other 110 regions (Zhang et al., 2019). In contrast, the lakes in the plain regions are frequently influenced by anthropogenic activities, such as urbanization, population growth, agricultural fertilizer and wastewater discharge (Feng et al., 2019a;Tong et al., 2020).

Waterbody mask 120
In this study, we use band ratios (Green/NIR or Green/SWIR), Modified Normalized Difference Water Index (MNDWI), Tasseled Cap Transformation (TC), and a density slicing with multi-threshold approach to build a decision tree for delineating retrieving water body boundaries. Landsat images were classified into two categories: water and non-water areas (Feyisa et al., 2014;Wang et al., 2020b). The extracted water bodies were then converted into polygons with contiguous pixels and stored in a shape file using ArcGIS10.3 (ESRI Inc. Redlands, CA, USA). We divided water bodies into lakes, 125 reservoirs, and rivers according to their shoreline features, and also through referencing to the Global Reservoirs and Dams database (Lehner et al., 2011), Chinese Reservoirs and Dams database, and high-resolution images from Google Earth.
Water bodies with less than four pixels were removed to avoid shoreline interference. Further, lakes and reservoirs with surface area < 0.01 km² 1 ha were removed in the GIS database . Due to the fluctuation of lake surface area, we updated the lake shoreline shape file using the images which have been selected to map SDD in each year. The shape file 130 of lakes and reservoirs (herein lakes) was used as a water mask to extract the SDD map derived from the Landsat imageries ( Fig. 1a).
Based on the previous study by , the lake boundaries (lakes and reservoirs) with an area > 0.01 km 2 across China were derived from Landsat 8 OLI images mainly acquired in 2016, and detailed description of extracting boundaries could be seen in the research. However, some lakes in China are changing greatlysubstantially over time, which were dealt 135 with separately to obtain their boundaries in each year during the period of 1984-2018. We mainly referred to the research conducted by Zhang et al. (2019) who examined multi-decadal lake area (≥1km 2 in size) changes in China during 1960s-2015 to obtain the information of which lake area has changed and what year the lake started to vary (Fig. S1). The datasets of lake boundaries (1960s-2020) have been published on the National Tibetan Plateau Data Centre. As for the reservoirs, we mainly viewed the Landsat (5/7/8) and Google Earth images to confirm the changing region. With respect to the small lakes 140 with an area < 1km 2 , we assumed that their boundaries didn't change during the study period.
We delineated boundaries of these changing lakes using Landsat images during 1984-2018. The cloudless TOA image of each path and row was downloaded through GEE platform, processed to derive the Modified Normalized Difference Water Index (MNDWI) as follows: is the Rayleigh scattering reflectance in the green band, and short-wave infrared (SWIR) band, respectively. First, we used MNDWI, combined with Tasseled Cap Transformation (TC) and a density slicing with multithreshold approach, to build a decision tree for delineating water body boundaries using the ENVI software package (Rokni et al., 2014;Xu et al., 2006). Then, Landsat images acquired during 1984-2018 were classified into water and non-water areas (Feyisa et al., 2014;Wang et al., 2020b). The extracted water bodies were subsequently converted into polygons with 150 contiguous pixels and stored in shape file format using the ArcGIS10.4 (ESRI Inc. Redlands, CA, USA). We divided water bodies into lakes, reservoirs, and rivers according to their shoreline features, and also through referencing to the Global Reservoirs and Dams database (Lehner et al., 2011), Chinese Reservoirs and Dams database, and high-resolution images from Google Earth to tell rivers and reservoirs from water bodies. The shape file of lakes and reservoirs (herein lakes) was used as a water mask to extract the SDD map derived from the Landsat imageries (Fig. 1a). 155 The problem of land contamination to water is still a challenge for retrieving water quality parameters precisely (Jensen. 2006;Hou et al., 2018;Liu et al., 2020a;Wang et al., 2020a). Jensen. (2006 pointed out that the various surface objects have different reflectance to NIR band. For instance, land and vegetation can largely reflect the NIR band that was strongly absorbed by water, especially for the shallow lakes or reservoirs. In our study, in order to avoid the influence of adjacent land on water bodies, one pixel buffer inward of water boundary was removed for lakes with an area ≤ 1 km² , and two pixels 160 for lakes with an area > 1 km² in order to avoid the influence of adjacent land on water bodies. This method has been demonstrated to be effective in other studies related to SDD estimate Wang et al., 2020a).

SDD in-situ data collection across China
We used three SDD datasets for model calibration and validation (Fig.2a). To assemble the first dataset (IGA-04-19), we conducted 37 field campaigns from April 2004 to September 2018, surveyed 361 water bodies and collected 2,293 samples 165 from lakes and reservoirs across China (Table S1), most of which were collected in late summer and early autumn. The second dataset was assembled from field campaigns (2007)(2008)(2009) conducted by researchers from the Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences. The third dataset (229 samples) was collected by different research groups during 1980s-1990s, and included records for which data collection date was not available. The spatial distribution of these three groups samples is shown in Fig. 2a. 170 At each station, Secchi disk depth (SDD, in cm) was determined to represent water clarity, and was taken as the depth from water surface where a black-white Secchi disk can no longer be seen under water. For the first two datasets, SDD data derived from field surveys (2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018) were matched with the top of atmosphere air reflectance (TOA) data collected by Landsat satellites overpassing a lake/reservoir within 7 days of field site visit, and the average reflectance of pixels within a 3×3 window matching a sampling point was extracted for bands 1-5 (Kloiber et al., 2002). After matching the in-situ SDD 175 with images, altogether, 1,301 and 340 pairs of data were obtained based on the first and second SDD datasets, respectively.
For the third dataset, the cloud-free TOA images whose dates were closest to time recorded on the lake survey reports were selected to match the measured SDD, which were between May and October during the period of field survey. Finally, 229 match-ups were found by expanding the time window between the third dataset of SDD and images.

Acquisition and processing of Landsat imagery data 180
To track the dynamics of lakes SDD in the past 35 years, all available Landsat TM/ETM+/OLI images of TOA across China were used in this study (∼82,000 images, >60 terabytes of data) via GEE platform. The number of images used for SDD 带格式的: 缩进: 首行缩进: 1 字符 estimate in a specific year spanned a large range, from 371 in 1984 to 4,784 in 2018 (Fig. 2b), with more images available when two satellites operated simultaneously in space to acquire Landsat imagery. In this study, based on the GEE platform, the TOA images were mainly collected during the ice-free season (May to October) from 1984 to 2018 in the TQR, MXR 185 and , NLR, and ELR, except for the ELR and YGR (from January to December) due to lack of good-quality images. The pixel_qa band, as a pixel quality control band generated from the CFMASK algorithm, was selected to mask out the land and snow/ice, and to remove cloud contamination (cloud cover >60%) in the GEE platform, thus minimizing the potential impact of cloud on SDD estimation accuracy. Landsat imagery atmospheric correction is a key step for water quality inversion (Wang et al., 2009), particularly for monitoring of temporal variation at large scale. The TOA products were produced using 190 the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) software within GEE (Schmidt et al., 2013) More than 98.35 % of the pixels within China had a total of qualified observations > 35 in the past 35 years, and the majority of images had more than three scenes of good observations for each year.

Model for SDD estimation and mapping in GEE
Model development was a key step in this study. For the first match-ups dataset, i.e., 1,301pairs of in-situ SDD and TOA, we 195 divided the valid data into four groups, with three groups used to calibrate the model (N= 976) and one group (N = 325) used for model validation. Based on a previous investigation, the red and blue (or green) band ratio was found to improve the performance of reflectance-based water quality models both in terms of their spatial and temporal transferability (Kloiber et al., 2002;Olmanson et al., 2008;. Thus, by trying the band combination, the red/blue band ratio algorithm using the first matched dataset was employed in this study to map SDD of water bodies, and was mathematically expressed 200 as: (1) Then, combining the aforementioned image-processing methods, Eq. (1) was applied to the TOA images from 1984 to 2018 to estimate the SDD in the lakes with an area > 0.01 km² 1 ha over China via the GEE platform. The annual mean SDDs at 205 the pixel scale were obtained by averaging all available estimated results, and then the lake-based annual mean SDDs had been further worked out. During the calculations, we only took into consideration lakes with SDD results of more than 10 years. At last, 10,814 lakes (size > 0.01 km² 1 ha) were used to examined for the interannual dynamics of SDD (Fig.1c).

Statistical analysis
The SDD estimation model performance was assessed using determination coefficient (R 2 ), RMSE, relative RMSE (rRMSE), and mean absolute error (MAE). 215 , 220 (5)(4) where N refers to the number of water samples, i refers to the current water sample number, , refers to the in situ SDD measurements, , ̅̅̅̅̅̅̅̅̅̅̅̅ refers to the average of observed Y, and , refers to the estimated SDD from the Landsat data.
Once the annual mean SDD maps were generated, the average of SDD for each pixel within a lake were was calculated for 225 the observation period . For each lake region and individual lake, the spatiotemporal dynamics in SDD were analysed, including the variations of the average, changing trend, number of lakes, and lake surface area. The interannual changing trend was assessed at the 5% significance level, and the slope from linear regression analysis between SDD values and years. These analyses were conducted with the IBM SPSS software. Based on the analysis of interannual change trend in SDD, the lakes in China were divided into three types -lakes with SDD showing significantly increasing (Type I: p < 0.05 230 and slope > 0), decreasing (Type II: p < 0.05 and slope < 0) and non-significant (Type III: p > 0.05) trends from 1984-2018.

Validation of SDD estimation model
The estimation model of lake SDD across China was built using 3/4 of the first matched dataset (976 samples), for which the R 2 , RMSE, rRMSE, and MAE were 0.79, 100.3 cm, 61.9%, 57.7cm, respectively (Fig.3a). Then, we used 325 samples (1/4 of the first matched dataset) to validate the model, and the validation results indicated stable performance by showing 235 comparative errors (R 2 =0.80, RMSE = 92.7 cm, RMSE% = 57.6%, MAE= 54.9 cm; Fig.3b). Further, the second and the third datasets were both used to validate model performance with a major focus on testing the temporal transferability of the model (Fig.3c, d). The second dataset (340 samples), collected as part of the Chinese lakes survey conducted by Nanjing Institute of Geography and Limnology, also indicated a good model performance (R 2 =0.78, RMSE = 74.7 cm, RMSE% = 59.1%, MAE= 42.6 cm; Fig.3c). The third dataset (229 samples) was assembled by the first lake surveys conducted in the 240 1980s, and was used to validate the model performance for SDD derived from historical remotely sensed data. Our results also demonstrated a stable performance for lake SDD before 1990s (R 2 =0.81, RMSE = 61.8 cm, RMSE% = 50.6%, MAE= 40.3 cm; Fig.3d). Comparison of validation results for these different periods and datasets demonstrated the stable performance of the SDD model (Fig. 3). Therefore, the estimation of SDD using images acquired by Landsat series of sensors provides a reliable method to examine historical trend in SDD through time series analysis. 245  4b). Although the number of lakes with SDD < 2 m were was more numerous (80.9% of lakes), the total area of lakes with SDD between 0-0.5 m and > 4 m was the largest, accounting for 24% and 24.3% of the total area in each category, respectively (Fig. 4c).
Regarding the annual mean SDD in the five lake regions, the top three regions were TQR (3.37 m), YGR (2.35 m), MXR 260 (1.92 m), followed by ELR (1.50 m) and NLR (0.69 m) (Fig. 4d). Except for the YGR region, lakes with SDD <2 m were most common accounting for 96% (NLR), 82.8% (ELR), 80.5% (MXR) and 77.6% (TQR) of all lakes in the other regions, respectively (Fig. 4e). In the YGR, the lakes with SDD in 1-3 m range had a wide distribution, and the total proportion of lakes with SDD < 3 m was 85.4% in this region (Fig. 4e). Spatially, the lakes were widely scattered over the ELR, except for the northern and western sections of that region (i.e., northern and southern of Hebei province, northeast of Henan province, 265 northwest of Shandong province and western of Hubei and Hunan provinces). The lakes in the NLR are were located in the northwest and southwest of the region. In the YGR, the lakes are were clustered in the southern and northeast of the region   ., 0-0.5 m, 0.5-1 m, 1-2 m, 2-3 m, 3-4

m, and >4 m). (c) the proportion of lake area in six SDD levels. (d) the annual mean SDD in five lake regions. (e) the proportion of lake number with at different SDD levels in the five lake regions.
6 Interannual dynamics of lake SDD during 1984-2018

Average and temporal trend in lakes SDD
Similar to the spatial pattern of SDD estimates obtained in 2018, the multi-year average SDD values in each lake region 280 also revealed similar trends, i.e., the lakes located in the plateau region were more transparent than lakes from other physiographic regions (Fig. 5a). During 1984-2018, the lakes in the NLR exhibited the lowest SDD (mean: 0.60±0.09 m), Regarding the interannual change trend, with the exception of the TQR, results for the other four lake regions indicated a significant (p<0.05) increasing trend in SDD during the study period (Fig. 5b). At the scale of individual lakes, 55.4% (5,993 out of 10,814) and 3.5% (377 out of 10,814) of lakes experienced statistically significant (p<0.05) increasing and decreasing trends, respectively, and the remaining lakes (41.1%, 4,444 out of 10,814) displayed no significant change (Fig. 5c). Among 290 the five lake regions, except for the MXR, more than half of all lakes exhibited significant increasing trends (Fig. 5c).
Ranked by the total number of lakes exhibiting significant increase in SDD, the lake regions can be ordered as follows: TQR (61.7%, 618 out of 1,002), ELR (57.1%, 3,396 out of 5,943), YGR (54.6%, 829 out of 1,517) and NLR (51.3%, 784 out of 1,528). As for the lakes with decreasing SDD values, the NLR had the highest number of such lakes (8.4%, 128 out of 1,528) followed by the MXR (7%, 58 out of 824) (Fig. 5c). 295 Among the three types of lakeslakes with SDD showing significant increasing (Type I), decreasing (Type II) and nonsignificant (Type III) trends from 1984 to 2018, the lake SDDs in the Type I were mainly concentrated in 0.5-3 m, in the Type II were dominated by 0-2 m, and in the Type III widely distributed in 0-3 m, the proportions of which were 81.11% (4,861 out of 5,993), 80.11% (302 out of 377) and 85.13% (3,783 out of 4,444) of the total lake number in each type of lakes, respectively ( Fig. 5d-f). At the five lake regions scale, whatever the type of lake belongs to, the distribution of lake SDDs in 300 the NLR, TQR and MXR looked similar, whereas that in the ELR and YGR differed from these three lake regions. The former was mainly between 0-2 m, the latter ranged from 0.5-3 m (ELR) and 1-4 m (YGR), respectively ( Fig. 5d-f).  (0-0.5 m, 0.5-1 m, 1-2 m, 2-3 m, 3-4

Lake SDDs versus different lake sizes in China
The annual mean SDD and lake area were both separated into six levels, and the proportions of lakes with different areas in each SDD category were demonstrated in the Fig. 6. In terms of the number of different lake areas in five lake regions, the lakes with annual mean SDD values in the ELR, NLR and YGR were dominated by an area of 0.01-1 km 2 , followed by an area of 1-10 km 2 . In the MXR, the lakes were mainly concentrated on an area of 1-10 km 2 , followed by an area of 0.01-1 315 km 2 (Fig. 6a-f). In the TQR, when the SDDs < 2 m, the lakes covering an area of 1-10 km 2 were in the majority (Fig. 6a-c); when the SDDs > 2 m, the lakes with an area of >10 km 2 occupied a dominant position, especially for lakes with an area of 10-50 km 2 and 100-500 km 2 (Fig. 6d-f).
Among the three types of lakes in each SDD category, there exists the similarity in the distribution of lakes with different sizes between the Type I and Type III, while that of in the Type II differentiated from these two types of lakes (Fig. 6). In the 320 ELR, NLR and YGR, almost more than 50% of the lakes were in 0.01-1 km 2 range among the lakes of Type I and Type III.
The lakes of Type II, located in the three lake regions, with SDD values of 0.5-1 m in the ELR, and of 0-0.5 m and 2-3 m in the NLR were dominated by an area of 1-10 km 2 , whereas the remaining lakes were mainly concentrated in an area of 0.01-1 km 2 (Fig. 6a-f). In the MXR, the number of lakes covering an area of 1-10 km 2 in the three types of lakes were was much more larger than that of in other sizes among the lakes with SDDs in 0-3 m range (Fig. 6a-d). When the lake SDDs were > 3 325 m in this lake region, most of three types of lakes were dominated by the lakes covering an area of 0.01-1 km 2 , apart from the lakes of Type III with SDD values > 4 m in the Type III that the proportion of lakes with an area of 1-10 km 2 was slightly higher than that with an area of 0.01-1 km 2 , (Fig. 6e-f).
The distribution of the three types of lakes with different lake sizes in the TQR differed from that in the other four lake regions. As for the lakes of Type I and Type III in the TQR, when the SDDs were between 0-2 m, the proportions of lakes 330 covering an area of 1-10 km 2 were the largest, ranging from 49.64% to 81.12% (Fig. 6a-c); when the SDDs were between 2-3 m, the lakes with an area of 10-50 km 2 in the Type I and of 100-500 km 2 in the Type III had the largest proportions of numbers, accounting for 40.43% and 35.00%, respectively (Fig. 6d); when the SDDs were >3 m, the lakes covering an area of 100-500 km 2 were dominant in the two types of lakes, followed by the area of 10-50 km 2 (Fig. 6e-f). With respect to the lakes of Type II in the TQR, the lakes with SDDs in the 0-0.5 m category were distributed in the area of 10-50 km 2 , followed 335 by the area of 50-100 km 2 (Fig. 6a); when SDDs were in the 0.5-1 m category, the number of lakes with an area between 1-10 km 2 and 10-50 km 2 were was the mostlargest, the percentages of which both were 40.00% (Fig. 6b); when SDDs were in the 1-2 m category, there were two kinds of lakes whose areas were in the range of 0.01-1 km 2 and 50-100 km 2 , and their numbers were the same (Fig. 6c); when SDDs were in the 3-4 m category, only the lakes with an area of 1-10 km 2 existed (Fig. 6e). 340

Spatial distribution of lakes with different SDD values
The spatial distributions of lakes and their total lake numbers and areas of the three types of lakes in five lake regions were shown in the Fig.7. In the SDD of 0-0.5 m category (Fig.7a), the NLR had the largest lake numbers and areas in the three types of lakes, accounting for 34.51% and 33.20% in the Type I, 63.19% and 48.17% in the Type II, and 44.46% and 34.38% in the Type III of the total lake numbers and areas in the lake region, respectively. Spatially, the lakes in the Type I and Type 350 III were mainly distributed in the central of the ELR, the western of the NLR, the mid-west of the TQR and the mid-east of the MXR, while those that in the Type II were concentrated on the western of the NLR and eastern of the MXR.
In the SDD of 0.5-3 m categories (Fig.7b-d), the lakes of Type I and Type III were the most in the ELR, but the largest total lake areas of five lake regions were different from these two types of lakes. Specifically, in the lakes of Type I, the total lake areas in the TQR were the largest, the percentages of which were 36.38% (SDD: 0.5-1m), 44.14% (SDD: 1-2 m) and 61.03% 355 (SDD: 2-3 m), respectively (Fig.7b-d). In the lakes of Type III, the ELR and the TQR had the largest proportions of lake areas when the lake SDDs were between 0.5-2 m and 2-3 m, respectively, accounting for 76.80% (SDD: 0.5-1m), 46.90% (SDD: 1-2 m) and 46.65% (SDD: 2-3 m) of the total lake area in each lake region, respectively In the lakes of Type III, the ELR had the largest proportion of lake area when SDD was 0.5-2 m, and TQR the largest when SDD was 2-3 m. The percentages of lake area when SDD was 0.5-2 m in the ELR were 76.80% (SDD: 0.5-1m) and 46.90% (SDD: 1-2 m), while 360 that in the TQR was 46.65% (SDD: 2-3 m) (Fig.7b-d). In the lakes of Type II, the region that had the largest proportions of lake numbers and areas was were inconsistent in each SDD category (0.5-3) m. When the SDDs were in the range of 0.5-1 m, the NLR had the largest lake numbers, while the MXR had the highest percentage of lake area (Fig.7b); when the SDDs ranged from 1-2 m, the total lake numbers and areas in the ELR were the largest (Fig.7c); when the SDDs were in 2-3 m range, the lake numbers in the NLR were was the largestmost and the total lake area in the ELR were was the 365 maximumnlargest ( Fig.7d). Spatially, the distributions of lakes in the Type I and Type III with SDD between 0.5-2 m were concentrated on the most places of the ELR, the northwest and southeast of the NLR, the southern of the YGR, the mid-west of the TQR, and the mid-east and part of the northern of the MXR (Fig.7b-c). When these two types of lakes SDD were in 2-3 m range, they were distributed in the central and southeast coastal of the ELR, the central and southwest of the YGR, and the western of the TQR (Fig.7d). In the Type II of lakes with SDD were falling in the range between 0.5-3 m, their 370 distributions were scattered over part of the central and southeast coastal of the ELR, and southwest of the YGR (Fig.7b-d).
In the SDD of 3-4 m category (Fig.7e), the regions that had the most lakes in the three types of lakes were the YGR (Type I: 53.56%), the ELR (Type II: 48.00%), and the ELR (Type III: 53.19%), respectively. The regions that had the largest lake area were the TQR (Type I: 63.51%), the YGR (Type II: 90.06%), and the TQR (Type III: 75.22%), respectively. Spatially, the lakes of the Type I and Type III were concentrated at the junction of the ELR, YGR and MXR, the southeast coastal of 375 the ELR, the southern of the YGR, and the western of the TQR. The lakes of Type III were mainly distributed in the part of the southeast coastal of the ELR and the southern of the YGR.
In the SDD of >4 m category (Fig.7f), the TQR had the largest lake number and area in the lakes of Type I, accounting for 39.19% of the total number and 87.34% of the total lake area, respectively. In the lakes of Type II, there were a few lakes existed in the MXR and YGR. In the lakes of Type III, the YGR had the most lakes and the TQR had the largest total lake 380 area, accounted for 40.28% of the total number and 87.00% of the total lake area, respectively. Spatially, the distributions of these lakes were similar to the lakes with SDD in 3-4 m range.

Comparison with past studies and uncertainties
Several past studies have examined the spatiotemporal variation of SDD in lakes across China (or parts of China), but 390 these investigations were mainly based on MODIS images to estimate SDD in large lakes (>10 km 2 ) and primarily focused on the period after 2000 (Feng et al., 2019a;Liu et al., 2020a;Pi et al., 2020;Wang et al., 2020a). Therefore, it becomes a challenge to compare these past results with the results of the present study due to difference in the period of interest, resolution of the satellite images and lake size (> 0.01 km² 1 ha in our study). Zhang et al. (2021) adopted an empirical model to retrieve SDD of lakes (>10 km 2 ) across China based on Landsat surface reflectance products (2016 -2018) within GEE. 395 Because of the similarity of methods and images used in Zhang et al. (2021) and the present study, it provides a unique opportunity to compare in-situ measured SDD with lake SDD estimation models across China proposed obtained by Zhang et al. (2021) and in our studythese two researches. To that end, we used available in-situ SDD data (2019 -2020) collected at monitoring stations in Lake Taihu and Lake Dianchi to assess the accuracy of the two models. As shown in Fig. 8 and demonstrated by statistical parameters (higher R 2 , lower RMSE, rRMSE and MAE), the estimation model proposed by our 400 study exhibited better performance to retrieve SDD in both Lake Taihu (Fig.8c) and Lake Dianchi (Fig. 8d).
Though some studies have demonstrated the usage of Landsat series data (5 TM/ 7 ETM+/ 8 OLI) with proposed model can provide accurate long-term coverage of SDD in lakes in China (Zhang et al., 2021;Deutsch et al., 2018;Bonansea et al., 2015;Mccullough et al., 2013), a few systemic errors effected on SDD results are inevitable. On the one hand, the SDD estimation model proposed in this study existed some errors, where the validation model showed R 2 =0.80, 405 RMSE = 92.7 cm, RMSE% = 57.6%, MAE= 54.9 cm. On the other hand, different atmospheric correction methods cause diverse effects on the Landsat images (Bonansea et al., 2015;Lee et al., 2016). It's noted that the TOA products were produced using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) software provided within GEE (Schmidt et al., 2013). Even so, these systemic errors do not have much impacts on the overall trends towards SDD of lakes across China (Bonansea et al., 2015;Deutsch et al., 2018;Zhang et al., 2021). 410

Data availability
The dataset of water clarity of lakes developed in this study consists of one .shp file document containing the annual mean values of water clarity in each lake (size > 0.01 km² 1 ha) during 1990-2018, with a time temporal resolution of 5-year. The dataset can now be accessed through the website of the National Tibetan Plateau Data Center (http://data.tpdc.ac.cn): DOI: 10.11888/Hydro.tpdc.271571. 415

Conclusions
As a comprehensive indicator of water eutrophication, encompassing nutrient enrichment, algal abundance and suspended sediment, water clarity can serve as a valuable index for tracking the ecological health of aquatic ecosystems and guiding the actions of water resources managers. Although field measurement of water clarity can easily be made with a Secchi disk apparatus, this approach is not suitable for long-term time series measurements of lake water clarity at regional and national 425 scales. This information is highly valuable, and can be extracted from archived satellite data. In-situ water clarity data collected in lakes across China during 2004-2018 was used to calibrate and validate SDD models that incorporate top of atmosphere air reflectance product and Google Earth Engine to map the spatiotemporal dynamics of SDD over a 35-year time span . The SDD model was validated using different datasets, and results confirmed the stable performance and temporal transferability of the SDD estimation model. Derived SDD estimates were analysed at the lake region and at 430 the individual lake scales. During the study period , annual mean SDD values in the TQR, YGR, MXR, ELR and NLR regions were 3.32±0.38 m, 2.35±0.21 m, 1.63±0.38 m, 1.23±0.17 m and 0.60±0.09 m, respectively. Among the 10,814 lakes with >10 years of SDD results, 55.4% and 3.5% experienced statistically significant (p<0.05) increasing and decreasing trends of water clarity, respectively. The remaining lakes (41.1%) displayed no significant trends. With the exception of the MXR, more than half of lakes in all the other regions exhibited a significant trend of increasing water clarity. 435 In the ELR, NLR and YGR regions, most of the lakes displaying either an increase or decrease in SDD tended to be of 0.01-1 km 2 in size whereas in the TQR and MXR lakes exhibiting clear trends in SDD were mostly large lakes (>10 km 2 ).
Spatially, the lakes in the plateau regions (TQR, YGR) generally exhibited higher SDD than those situated in the flat plain region. The time series of water clarity information presented in this study could aid local, regional and national decisionmaking on policies and management for protecting/improving inland water quality in China. The research approach 440 implemented also could potentially be used to map water clarity in lakes at the global scale, an effort that can provide useful information for evaluating decadal trends in surface water quality resulting from adoption of pollution control policies.