Improved maps of surface water bodies, large dams, reservoirs, and lakes in China

Data and knowledge of surface water bodies (SWB), including large lakes and reservoirs (surface water areas > 1 km²), are critical for the management and sustainability of water resources. However, the existing global or national dam datasets have large georeferenced coordinate offsets for many reservoirs, and some datasets do not report reservoirs and lakes separately. In this study, we generated China's surface water bodies, Large Dams, Reservoirs, and Lakes (China-LDRL) dataset by analyzing all available Landsat


Introduction
Surface water bodies (SWB), including large lakes and reservoirs (surface water areas > 1 km²), play an important role in the control and management of water resources (… Lu, 2014, 2013; Feng et al., 2013, 2019). A reservoir is usually defined as an artificial lake formed either by constructing a dam across a river (on-stream reservoir) (Thornton et al., 1996; Hayes et al., 2017) or by partially or completely enclosing water with waterproof banks of concrete or clay (off-stream reservoir) (Xiang et al., 2019; Thornton et al., 1996) (Fig. S1 in the Supplement). Nearly 50 % of the world's large dams were built primarily for agricultural irrigation, storing, regulating, and diverting water (Mulligan et al., 2020). Dams are also used for hydropower generation, human and industrial water use, and flood peak attenuation (Lehner et al., 2011; Lehner and Döll, 2004; Wang et al., 2022a). Large lakes have attracted great interest not only as water resources but also as indicators of local climate change and anthropogenic activities (Ma et al., 2011; Birkett and Mason, 1995). They provide vital ecosystem services for human beings, such as alteration of river flow, supply of irrigation water, fisheries, and abundant valuable mineral deposits, and they have disproportionate effects on the global carbon cycle (Ran et al., 2021; Armstrong, 2010; Ma et al., 2011). Improved datasets of the numbers, sizes, and spatial distributions of SWB, large dams, reservoirs, and lakes would provide crucial inputs for studies of water resources, environmental health, aquatic ecosystems, and agricultural sustainability (Lehner and Döll, 2004; Zhu et al., 2020).
China has the largest population in the world, a rapidly growing economy, expanding irrigation, limited water resources, aging water infrastructure, and inadequate water governance (Liu and Yang, 2012; Wang et al., 2020a; Tao et al., 2020). China has almost 20 % of the world's population but only 7 % of the world's fresh water; as a result, its fresh water resources per capita are much smaller than those of most other countries (Feng et al., 2019; Dalin et al., 2014). Since the 1980s, China has taken diverse measures to ensure long-term water security (Zhou et al., 2020). For example, reservoir construction has increased remarkably across the country (Wang et al., 2022a; Zhu et al., 2020), and the total number of dams in China reached ∼ 89 700 by 2008 (Yang and Lu, 2014). The Three Gorges Reservoir, impounded by the world's largest hydroelectric dam (the Three Gorges Dam), is fully operational for flood control, power generation, navigation, and water use (Wu et al., 2004; Zhang et al., 2012; Wang et al., 2013, 2020a). China also has a large number of lakes of tremendous cultural and economic importance (Ma et al., 2011; Zhang et al., 2019). A previous study reported 2693 large lakes (area > 1 km²) in China during 2005-2006, covering 0.9 % of China's land area (Ma et al., 2011). However, owing to intensive human activities and climate change over the last three decades, several natural lakes have been converted into reservoirs, dramatically accelerating the shrinkage of lake areas (Yang and Lu, 2014; Ma et al., 2011). Improved datasets on the number, size, and spatial distribution of large reservoirs and lakes in China are therefore needed to assess the impacts of human activities and climate change on SWB, water management, and water security in China (Yang and Lu, 2014).
Several published global dam and reservoir datasets include information for China (Table 1). The World Register of Dams (WRD), organized and released by the International Commission on Large Dams (ICOLD, 2011), is the largest and most widely used dataset (Mulligan et al., 2020; Paredes-Beltran et al., 2021; Wang et al., 2022a). It reports 23 841 dam entries for China; however, a large proportion of these entries lack latitude and longitude information, which limits their applications (Wang et al., 2022a). The GlObal GeOreferenced Database of Dams (GOODD) V1 reports 9234 georeferenced dams in China (Mulligan et al., 2020), but it does not report the attributes (e.g., area, storage capacity) of the corresponding reservoirs. The global information system on water resources and agricultural water management (AQUASTAT) of the Food and Agriculture Organization of the United Nations (FAO) lists 14 000 dams worldwide, including 722 dams in China of which only some are georeferenced, and it has not been updated since 2015. The Global Reservoir and Dam database (GRanD), developed by the Global Water System Project (GWSP), compiled available reservoir and dam information globally (Lehner et al., 2011) and was updated in 2019; however, it lists only 922 geolocated dam entries for China. Recently, Wang et al. (2022a) released the Georeferenced global Dam And Reservoir (GeoDAR) dataset, which contains 5283 georeferenced dams in China, and the reservoirs have more than 40 attributes acquired from the WRD dataset. The newest, fully peer-reviewed version of GeoDAR was released in April 2022 and has high accuracy for dams in China (Fig. S2). Several dam and reservoir maps have also been published at the national scale (Table 1), but these maps neither include georeferenced dams nor report reservoir attributes (e.g., reservoir area).
WRD: the World Register of Dams (https://www.icold-cigb.org, last access: 18 February 2022); GOODD: GlObal geOreferenced Database of Dams (Mulligan et al., 2020); FAO AQUASTAT: the Food and Agriculture Organization of the United Nations (FAO) global information system on water resources and agricultural water management (http://www.fao.org/aquastat/en/databases/dams/, last access: 18 February 2022); GRanD: the Global Reservoir and Dam database (Lehner et al., 2011); GeoDAR: Georeferenced global dam and reservoir dataset (Wang et al., 2022a); CLRM: China's Lakes and Reservoirs Map (Yang and Lu, 2014); BFNCW: Bulletin of First National Census for Water from the Ministry of Water Resources of the People's Republic of China.

In addition to the dam and reservoir datasets, several studies have reported the spatial distribution and multi-year dynamics of inland SWB (Tao et al., 2020; Ma et al., 2011; Wang et al., 2020a; Feng et al., 2019) and lakes (Gao, 2015; Gao et al., 2012; Ma et al., 2011; Zhang et al., 2019) in China; however, they did not explicitly separate large reservoirs from lakes, which makes it impossible to assess the impact of human activities on these two types of water resources. Thus, to date, the spatial distributions of SWB, large dams, reservoirs, and lakes in China have not been fully investigated and documented.
The objective of this study was to produce detailed and accurate maps of open SWB, large dams, reservoirs, and lakes (surface water area > 1 km²) in China for 2019, the latest year available when this study started in late 2020; SWB with area ≤ 1 km² were excluded. First, we used time-series Landsat imagery from 2019, the Google Earth Engine (GEE) cloud computing platform, and a simple and robust surface water mapping algorithm (Zou et al., 2018; Zhou et al., 2019b; Wang et al., 2020a) to generate raster maps of SWB in China at 30 m spatial resolution. Second, we converted the SWB raster map to a vector map and identified the large SWB with area > 1 km². Third, we overlaid the SWB vector maps on the 2019 historical satellite imagery of China in Google Earth Pro to identify dams, and we released the resulting dataset of China's surface water bodies, large dams, reservoirs, and lakes (China-LDRL). Fourth, we analyzed the spatial distributions of SWB, large dams, reservoirs, and lakes in China. Finally, we discussed the reliability, uncertainties, limitations, outlook, and implications of the China-LDRL dataset for the study of water security.

Study area
The study area covered all provincial-level administrative divisions in China (Fig. 1a): 23 provinces, two special administrative regions (Hong Kong and Macao SARs), four municipalities (Beijing, Tianjin, Shanghai, and Chongqing), and five autonomous regions (Inner Mongolia, Guangxi, Tibet, Ningxia, and Xinjiang). Because the Hong Kong and Macao SARs have relatively small areas and are adjacent to Guangdong Province, we merged them with Guangdong into one region (Guangdong) for the statistical analyses in this study.
China has great topographic diversity: the eastern plains and southern coasts consist of lowlands and foothills, southern China is dominated by hilly and mountainous terrain, the west and north of the country are dominated by basins, plateaus, and massifs, and southwestern China contains part of the highest tableland on Earth, the Tibetan Plateau (Fig. 1a). Owing to large differences in latitude, longitude, and altitude, China's climate is extremely diverse, ranging from tropical in the far south to subarctic in the far north and alpine at the higher elevations of the Tibetan Plateau. This climatic diversity contributes to much larger surface water areas in southwestern and southeastern China than in other regions, especially North China (Wang et al., 2020a).

Landsat data
In this study, we used the Landsat surface reflectance (SR) images available on the GEE platform: a total of 19 338 images covered China in 2019, including 9028 Landsat-7 Enhanced Thematic Mapper Plus (ETM+) images and 10 310 Landsat-8 Operational Land Imager (OLI) images (∼ 21.73 TB). Detailed information on the Landsat SR products is available on the GEE platform (https://developers.google.com/earth-engine/datasets/catalog/landsat, last access: 18 February 2022). All these images had undergone the necessary pre-processing in GEE, including radiometric calibration, atmospheric correction, and the removal of stripes in Landsat-7 imagery. We used the quality assurance (QA) band generated by the CFMask algorithm (Zhu et al., 2015) to identify bad-quality observations, including clouds and cloud shadows (Murray et al., 2019; Pekel et al., 2016). We also used the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM), the solar azimuth and zenith angles of each image, and the ee.Terrain.hillShadow algorithm in GEE to identify pixels with terrain shadows (Zou et al., 2018; Wang et al., 2020a) (Fig. 1b), which were excluded from the data analysis. Of the ∼ 132.43 million pixels in China, approximately 98.36 % had more than 5 good-quality observations and 91.24 % had more than 10 good-quality observations in 2019. About 93.14 % of the 78.9 million pixels in North China had more than 20 good-quality observations, owing to the overlap of Landsat images at high latitudes and less cloud cover (Zhou et al., 2019a; Wang et al., 2020b). Note that the number of Landsat-7 ETM+ images in GEE may change in the future, as the United States Geological Survey (USGS) continues to work with International Ground Stations (IGS) around the world to assemble and rescue images held by individual stations.
For Landsat-8 OLI images, USGS does not rely on IGS for image downlink, as the satellite can record all acquired images on board and then downlink them to the Landsat archive (Wulder et al., 2016).
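The good-quality-observation statistics reported above can be illustrated with a short NumPy sketch: decode cloud and cloud-shadow flags from a QA band, drop terrain-shadowed observations, count the remaining observations per pixel, and compute the share of pixels above a threshold. The bit positions (bit 3 = cloud shadow, bit 5 = cloud) follow the Landsat Collection 1 SR `pixel_qa` layout and are an assumption here, since the text does not name the product version; Collection 2 products use different bits. The terrain-shadow stack stands in for the output of `ee.Terrain.hillShadow` and is a hypothetical input.

```python
import numpy as np

# Assumed bit positions (Landsat Collection 1 SR pixel_qa); verify against the product used.
CLOUD_SHADOW_BIT = 3
CLOUD_BIT = 5


def good_quality_mask(pixel_qa: np.ndarray, terrain_shadow: np.ndarray) -> np.ndarray:
    """True where an observation is free of cloud, cloud shadow, and terrain shadow."""
    qa = pixel_qa.astype(np.uint16)
    cloud = (qa >> CLOUD_BIT) & 1
    shadow = (qa >> CLOUD_SHADOW_BIT) & 1
    return (cloud == 0) & (shadow == 0) & ~terrain_shadow


def count_good_observations(qa_stack: np.ndarray, shadow_stack: np.ndarray) -> np.ndarray:
    """Per-pixel number of good-quality observations in a (time, rows, cols) stack."""
    return good_quality_mask(qa_stack, shadow_stack).sum(axis=0)


def share_of_pixels_above(n_good: np.ndarray, threshold: int) -> float:
    """Fraction of pixels with more than `threshold` good-quality observations."""
    return float((n_good > threshold).mean())
```

For example, with three observations of two pixels, where the first pixel is cloud-shadowed once (QA value 8) and the second is cloudy twice (QA value 32), `count_good_observations` returns `[[2, 1]]`.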
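We used three spectral indices, the Normalized Difference Vegetation Index (NDVI), the Enhanced Vegetation Index (EVI), and the Modified Normalized Difference Water Index (mNDWI), to identify SWB in this study. These indices are defined as:

$$\mathrm{NDVI} = \frac{\rho_{\mathrm{nir}} - \rho_{\mathrm{red}}}{\rho_{\mathrm{nir}} + \rho_{\mathrm{red}}} \tag{1}$$

$$\mathrm{EVI} = 2.5 \times \frac{\rho_{\mathrm{nir}} - \rho_{\mathrm{red}}}{\rho_{\mathrm{nir}} + 6\rho_{\mathrm{red}} - 7.5\rho_{\mathrm{blue}} + 1} \tag{2}$$

$$\mathrm{mNDWI} = \frac{\rho_{\mathrm{green}} - \rho_{\mathrm{swir}}}{\rho_{\mathrm{green}} + \rho_{\mathrm{swir}}} \tag{3}$$

where $\rho_{\mathrm{blue}}$, $\rho_{\mathrm{green}}$, $\rho_{\mathrm{red}}$, $\rho_{\mathrm{nir}}$, and $\rho_{\mathrm{swir}}$ are the blue, green, red, near-infrared, and shortwave infrared bands of Landsat images, respectively.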
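As a quick check of Eqs. (1)-(3), the three indices can be written as NumPy-compatible functions. This is a minimal sketch that assumes the band values are surface reflectance already scaled to 0-1 (the GEE SR products store scaled integers that need a scale factor first); the function names are illustrative.

```python
import numpy as np


def ndvi(red, nir):
    """Normalized Difference Vegetation Index, Eq. (1)."""
    return (nir - red) / (nir + red)


def evi(blue, red, nir):
    """Enhanced Vegetation Index with the standard coefficients, Eq. (2)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def mndwi(green, swir):
    """Modified Normalized Difference Water Index, Eq. (3)."""
    return (green - swir) / (green + swir)


# The same functions work element-wise on NumPy arrays of reflectance.
blue, green, red, nir, swir = (np.array([v]) for v in (0.05, 0.06, 0.04, 0.02, 0.01))
water_like = mndwi(green, swir) > np.maximum(ndvi(red, nir), evi(blue, red, nir))
```

For a clear-water pixel with these illustrative values (blue 0.05, green 0.06, red 0.04, NIR 0.02, SWIR 0.01), mNDWI is positive (≈ 0.71), while NDVI (≈ −0.33) and EVI (≈ −0.06) are negative.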

Dam and reservoir datasets
The GlObal GeOreferenced Database of Dams (GOODD), released in 2020, lists ∼ 38 000 georeferenced dams and derived data on their associated catchments, compiled from 1° × 1° tiles in the Google Earth geobrowser during 2007-2011 and the Shuttle Radar Topography Mission (SRTM) Water Body Dataset (SWBD) (Mulligan et al., 2020). GOODD provides the raw digitized coordinates of dam walls but no detailed attributes of individual dams and reservoirs (Fig. 2a, d); it captures both large and medium-sized dams. The Global Reservoir and Dam (GRanD) database (Lehner et al., 2011) was updated to v1.3 in February 2019 (Fig. 2b, e). The spatial information on these dams was contributed by 11 participating institutions. Each dam is associated with a polygon depicting the reservoir surface, taken from SWBD in v1.1 and, in v1.3, from the surface water maps produced by the Joint Research Centre (JRC) of the European Commission from 30 m Landsat imagery for 1984-2015 (Pekel et al., 2016). All reservoirs with a storage capacity of more than 0.1 km³ were included, and some smaller reservoirs were added where data were available.
The Georeferenced global Dam And Reservoir dataset (GeoDAR) was produced from multi-source dam and reservoir inventories (the ICOLD WRD and GRanD v1.3 datasets) and the Google Maps geocoding Application Programming Interface (API) (Wang et al., 2022a) (Fig. 2d, e). The GeoDAR product includes two successive versions. GeoDAR v1.0 is essentially a georeferenced subset of ICOLD WRD and contains more than 20 000 dam entries, each indexed by an encrypted identifier (ID) associated with a WRD record, which allows potential retrieval of all 40+ proprietary attributes from ICOLD. GeoDAR v1.1 consists of (1) the dam entries of v1.0, further harmonized with GRanD for improved inclusion of the largest dams, and (2) reservoir boundaries for most of the dam entries. GeoDAR was updated in April 2022 and is available at https://doi.org/10.5281/zenodo.6163413 (Wang et al., 2022b).

Methods
The workflow for producing the China-LDRL dataset comprised two major parts: (1) generating yearlong SWB maps of China by analyzing time-series Landsat imagery from 2019 on the GEE platform; and (2) identifying dams and classifying yearlong SWB into lakes, reservoirs, and rivers by analyzing the 2019 historical satellite imagery of China in Google Earth Pro. Figure 3 shows a flowchart of the methodology.

Algorithm to generate annual maps of yearlong surface water bodies
In this study, we combined a surface water index (mNDWI) and two greenness-based vegetation indices (EVI and NDVI) to identify SWB with the rule ((mNDWI > EVI or mNDWI > NDVI) and EVI < 0.1) (Eq. 4). This mNDWI/VIs algorithm reduces the effects of vegetation on SWB identification and has been used to map SWB at regional and national scales with high accuracy (Zou et al., 2018; Zhou et al., 2019b; Wang et al., 2020a). Furthermore, the mNDWI/VIs algorithm has been compared with other surface water mapping algorithms (e.g., NDWI, mNDWI, TCW, and AWEI), and the results showed that, combined with Landsat images, it identified SWB with high producer's accuracy (98.1 %) and user's accuracy (91.0 %) (Zhou et al., 2017). The surface water body frequency (F_SWB) of a pixel in a year was calculated as the ratio of the number of observations identified as SWB to the number of good-quality observations in that year (Eq. 5); it ranges from 0 to 1.0 (or 100 %). We generated the F_SWB map of all pixels in China for 2019 on the GEE platform (Fig. 5a).
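$$\mathrm{SWB} = \begin{cases} 1, & (\mathrm{mNDWI} > \mathrm{EVI} \ \text{or} \ \mathrm{mNDWI} > \mathrm{NDVI}) \ \text{and} \ \mathrm{EVI} < 0.1 \\ 0, & \text{other values} \end{cases} \tag{4}$$

$$F_{\mathrm{SWB}} = \frac{N_{\mathrm{SWB}}}{N_{\mathrm{good}}} \tag{5}$$

where SWB is surface water body, $F_{\mathrm{SWB}}$ is surface water body frequency, $N_{\mathrm{SWB}}$ is the number of observations identified as SWB (see Eq. 4) in 2019, and $N_{\mathrm{good}}$ is the number of good-quality observations in 2019.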
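Eqs. (4) and (5) can be sketched with NumPy over a per-pixel time stack of index values. The (time, rows, cols) array layout, the `good` mask input, and returning NaN for pixels without any good-quality observation are assumptions of this sketch, not details given in the text.

```python
import numpy as np


def is_water(mndwi, ndvi, evi):
    """Eq. (4): water if (mNDWI > EVI or mNDWI > NDVI) and EVI < 0.1."""
    return ((mndwi > evi) | (mndwi > ndvi)) & (evi < 0.1)


def water_frequency(mndwi, ndvi, evi, good):
    """Eq. (5): F_SWB = N_SWB / N_good per pixel over a (time, rows, cols) stack.

    Bad-quality observations (good == False) count toward neither N_SWB nor N_good.
    Pixels with no good-quality observation are returned as NaN.
    """
    water = is_water(mndwi, ndvi, evi) & good
    n_swb = water.sum(axis=0)
    n_good = good.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_good > 0, n_swb / n_good, np.nan)
```

Masking with `good` before counting keeps cloudy or shadowed observations out of both the numerator and the denominator, which matches the definition of N_good above.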

Consistent with our previous studies (Zou et al., 2018; Wang et al., 2020a), a water pixel was defined as a yearlong surface water body (F_SWB ≥ 0.75), a seasonal surface water body (0.05 ≤ F_SWB < 0.75), or an ephemeral surface water body (F_SWB < 0.05). We generated the seasonal and yearlong SWB maps of China for 2019 (Fig. 4b, c).
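The three frequency classes can be applied to an F_SWB array as below. This is a minimal sketch; the integer class codes are a hypothetical encoding, and treating F_SWB = 0 (never water) and NaN (no good observation) as non-water is an assumption, since the text defines the classes only for water pixels.

```python
import numpy as np

# Hypothetical class codes used only in this sketch.
NON_WATER, EPHEMERAL, SEASONAL, YEARLONG = 0, 1, 2, 3


def classify_frequency(f_swb):
    """Map F_SWB to classes: yearlong (F >= 0.75), seasonal (0.05 <= F < 0.75),
    ephemeral (0 < F < 0.05); F == 0 or NaN is treated as non-water (assumption)."""
    f = np.nan_to_num(np.asarray(f_swb, dtype=float), nan=0.0)
    out = np.full(f.shape, NON_WATER, dtype=np.uint8)
    out[(f > 0) & (f < 0.05)] = EPHEMERAL
    out[(f >= 0.05) & (f < 0.75)] = SEASONAL
    out[f >= 0.75] = YEARLONG
    return out
```

The yearlong mask used in the rest of the workflow is then simply `classify_frequency(f) == YEARLONG`.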

The procedure to identify dams, reservoirs, and lakes in Google Earth Pro
We first generated the yearlong SWB vector map of China for 2019 from the yearlong SWB raster map, then reprojected it to the Krasovsky_1940_Albers equal-area conic projection and calculated the area (km²) of each yearlong SWB polygon as an attribute (Python code is available at https://drive.google.com/drive/folders/1B19VKbCIoDPmu-IcmiZcOIUF8wi1YnE?usp=sharing, last access: 17 August 2022). For reporting large reservoirs and lakes, only polygons with area > 1 km² were kept (Fig. S3). To distinguish on-stream and off-stream reservoirs from lakes, we uploaded the large SWB vector layers into Google Earth Pro and checked, by visual interpretation of the 2019 historical satellite imagery of China, whether a dam existed around each polygon. If no dam existed, we classified the polygon as a river or lake; if a dam existed, we classified the polygon as an on-stream reservoir (constructed on a river or stream, regardless of impoundment) or an off-stream reservoir (formed by partial or complete embankment around an off-stream lake) (Fig. S1 in the Supplement). The corresponding dam was simultaneously classified as an on-stream or off-stream dam. Finally, the SWB polygons were classified into lakes, reservoirs, and rivers, and the dams and reservoirs were classified as on-stream or off-stream (Fig. 3). This work was carried out by the lead author (Xinxin Wang) over two months, and users can reproduce the dam dataset by loading the SWB polygons over the 2019 historical satellite imagery in Google Earth Pro and following the procedure described here. Note that the satellite images in Google Earth Pro may change over time, but such changes are likely to have very limited impact on dam identification, as dams usually remain in place for many years after construction.
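The area filter in this step can be sketched in pure Python. The sketch assumes the polygons have already been reprojected to an equal-area projection with metre units (the reprojection to Krasovsky_1940_Albers itself, e.g. with GDAL or pyproj, is not shown), represents each polygon as an exterior ring plus hole rings, and computes areas with the shoelace formula; it is an illustration, not the authors' released Python code.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]


def ring_area_m2(ring: Ring) -> float:
    """Planar area of a ring via the shoelace formula.

    Coordinates must already be in an equal-area projection with metre units;
    the ring may be open or closed (a repeated first vertex adds zero).
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area_km2(exterior: Ring, holes: Sequence[Ring] = ()) -> float:
    """Exterior area minus hole (island) areas, converted from m² to km²."""
    return (ring_area_m2(exterior) - sum(ring_area_m2(h) for h in holes)) / 1e6


def select_large(polygons: Dict[str, Tuple[Ring, List[Ring]]],
                 min_km2: float = 1.0) -> Dict[str, float]:
    """Keep polygons whose area is strictly greater than `min_km2` (> 1 km² by default)."""
    kept = {}
    for pid, (exterior, holes) in polygons.items():
        area = polygon_area_km2(exterior, holes)
        if area > min_km2:
            kept[pid] = area
    return kept
```

Subtracting holes matters for lakes and reservoirs with islands, whose water area is smaller than the area enclosed by the outer shoreline.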

Cross-comparison with other lake and reservoir datasets
To better understand the improvements and potential applications of the China-LDRL dataset, we compared it with three other available dam and reservoir datasets: GOODD, GRanD v1.3, and GeoDAR (Fig. 2). We first compared dam numbers and large-reservoir areas at the provincial and national scales. We then checked the location of each dam from these datasets against Google Earth imagery; because all these datasets provide georeferenced coordinates for at least some of their dams, the dam locations can be read directly from their latitude and longitude. We did not compare reservoir areas with GOODD, as it provides no reservoir attributes other than catchment area (Fig. 2d).

Numbers and areas of yearlong surface water bodies with different sizes in China
The numbers and areas of yearlong SWB polygons of different sizes in China differed considerably for 2019 (Fig. 6).
In terms of numbers, out of a total of 3.52 × 10⁶ yearlong SWB polygons in China in 2019, approximately 3.51 × 10⁶ (99.57 %) had an area of ≤ 1 km² and ∼ 2.16 × 10⁶ (61.19 %) had an area of ≤ 0.0036 km² (i.e., covering no more than 2 × 2 Landsat pixels). Only 15 × 10³ (0.43 %) yearlong SWB polygons had an area of > 1 km², and 359 polygons had an area of > 100 km². In terms of area, out of a total of 214.92 × 10³ km² of yearlong SWB in China in 2019, large SWB polygons (> 1 km²) accounted for 83.54 % and very large SWB polygons (> 100 km²) for 52.48 %. At the provincial scale, the numbers and areas of yearlong SWB polygons of different sizes showed patterns similar to those at the national scale (Figs. S4, S5). Almost all yearlong SWB polygons in individual provinces had an area of ≤ 1 km² (Fig. S4); however, polygons with an area of > 1 km² accounted for a large proportion of the surface water area in most provinces (Fig. S5). Yearlong SWB polygons with an area of > 100 km² were mostly very large lakes and rivers, located mainly in Tibet, Xinjiang, Qinghai, Jiangxi, and Heilongjiang (Fig. S5) (Feng et al., 2019). Some provinces also had very large reservoirs, such as Miyun Reservoir in Beijing, whose polygon exceeded 100 km².
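The size classes above follow directly from the 30 m Landsat grid: one pixel covers 30 m × 30 m = 900 m² = 0.0009 km², so 2 × 2 pixels cover 0.0036 km², and 15 × 10³ of 3.52 × 10⁶ polygons is ≈ 0.43 %. A minimal NumPy sketch of how the count and area shares per size class could be computed from a list of polygon areas (the function name and the toy areas in the check are illustrative):

```python
import numpy as np

PIXEL_AREA_KM2 = 30 * 30 / 1e6        # one 30 m Landsat pixel = 0.0009 km²
TWO_BY_TWO_KM2 = 4 * PIXEL_AREA_KM2   # 2 x 2 pixels = 0.0036 km²


def size_class_shares(areas_km2, edges=(TWO_BY_TWO_KM2, 1.0, 100.0)):
    """For each edge, return (% of polygons larger than edge, % of total area in them)."""
    a = np.asarray(areas_km2, dtype=float)
    total_area = a.sum()
    shares = {}
    for edge in edges:
        larger = a > edge
        shares[edge] = (100.0 * np.count_nonzero(larger) / a.size,
                        100.0 * a[larger].sum() / total_area)
    return shares
```

The two numbers per class correspond to the "numbers" and "areas" perspectives above: a handful of very large polygons can hold most of the area while being a tiny fraction of the count.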

Numbers, areas, and distribution of large dams, reservoirs, and lakes in China
We identified 2418 large dams in China, including 624 off-stream dams and 1794 on-stream dams, most of which were located in South, East, and Northeast China and in Xinjiang in Northwest China (Fig. 7a). At the provincial scale, Xinjiang had the largest number of off-stream dams (67), followed by Shandong (62), Heilongjiang (46), and Anhui (45). Hubei, Yunnan, and Guangdong also had relatively many off-stream dams (≥ 40 each). Chongqing, Qinghai, and Tibet had no off-stream dams (Fig. 7b). Most on-stream dams were located in provinces with large rivers. Guangdong had the largest number of on-stream dams (172), followed by Hubei (146), Heilongjiang (132), Shandong (112), Jilin (103), and Sichuan (103) (Fig. 7c), whereas Shanghai had none. Considering the functions of the two dam types and the spatial patterns of climate (e.g., precipitation, temperature) and socio-economic factors (e.g., population, GDP, irrigation area) in South and North China, the provinces in Northeast and East China had larger percentages of off-stream dams, whereas the provinces in Northeast and South China had larger percentages of on-stream dams (Fig. 7d).
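Provincial tallies such as these amount to grouping dam records by province and dam type. The sketch below assumes a hypothetical list of `(province, dam_type)` records and uses placeholder province names rather than real data; at the national scale, the same share computation gives 624 / 2418 ≈ 25.8 % off-stream dams.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

DAM_TYPES = ("on-stream", "off-stream")


def tally_dams(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Count on-/off-stream dams per province and the off-stream share in percent."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for province, dam_type in records:
        if dam_type not in DAM_TYPES:
            raise ValueError(f"unknown dam type: {dam_type!r}")
        counts[province][dam_type] += 1
    summary = {}
    for province, c in counts.items():
        on, off = c["on-stream"], c["off-stream"]   # Counter returns 0 for missing keys
        summary[province] = {
            "on-stream": on,
            "off-stream": off,
            "off_share_pct": 100.0 * off / (on + off),
        }
    return summary
```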
China had 3051 large lakes (area > 1 km²) in 2019, mostly distributed in West China, the lower Yangtze River basin, and Northeast China (Figs. 8a, S6), with a total area of ∼ 73.38 × 10³ km². Tibet in West China had the largest number of lakes (966), followed by Qinghai (479), Xinjiang (350), Inner Mongolia (234), and Heilongjiang (174) (Fig. 8b). Lake areas showed spatial patterns similar to those of lake numbers (Fig. 8c), with the western provinces having much larger lake areas than other provinces, especially Tibet (31.73 × 10³ km²) and Qinghai (15.78 × 10³ km²). Because reservoirs and dams usually occur together, the spatial patterns of reservoir numbers and areas matched those of dam numbers well (Figs. 7b, 8e-f). In total, China had 2194 large reservoirs in 2019, with a total area of ∼ 16.35 × 10³ km². Xinjiang in Northwest China had the largest reservoir area (1923.11 km²), followed by Heilongjiang (1468.48 km²), Jiangsu (1309.95 km²), and Hubei (1190.75 km²). In contrast, Tibet (18.34 km²), Shanghai (36.61 km²), and Ningxia (45.40 km²) had much smaller reservoir areas than the other provinces. In general, most dams and reservoirs in China were located in South, East, and Northeast China, whereas most lakes were located in West China, the lower Yangtze River basin, and Northeast China.

Improvements of the dataset of large dams, reservoirs, and lakes in China
To evaluate the reliability of the China-LDRL dataset, we first compared the numbers of large dams and the areas of large reservoirs between our dataset and the published datasets (GOODD, GRanD, and GeoDAR), and then checked the geographic coordinates of dams against the 2019 historical satellite imagery in Google Earth Pro. GOODD has the largest number of dams in China (9234) among these published global datasets (Fig. 2a). However, it includes large, medium, and small dams and does not report the corresponding reservoir attributes (e.g., reservoir area), which limits its applications in water-related research (Paredes-Beltran et al., 2021). GRanD has the smallest number of large dams with reservoir area > 1 km² in China (814) (Fig. 9b, e); because its dam information was contributed by multiple institutions worldwide (Lehner et al., 2011), it clearly underestimates the number of dams. GeoDAR has more large dams (1162) than GRanD because it was generated by combining the GRanD and ICOLD WRD datasets (Wang et al., 2022a). In comparison, the China-LDRL dataset identified 2418 large dams and 2194 large reservoirs (Fig. 9d, e), a substantial improvement of the large dam and reservoir inventory for China.
The differences in the numbers of large dams between China-LDRL and the GRanD and GeoDAR datasets can be explained by several factors. First, our study used all available Landsat images from 2019 and a more accurate SWB mapping algorithm to generate SWB maps of China, whereas GRanD and GeoDAR used the SWBD map (produced in 2000) (Slater et al., 2006) and the 1984-2015 surface water maps produced by the JRC (Pekel et al., 2016). We were therefore able to integrate more Landsat images, obtain more SWB polygons, and identify more large dams and reservoirs than the other datasets. Second, the datasets used different strategies to identify dams. The dam information in GRanD was contributed by 11 participating institutions, and GeoDAR combined two published dam datasets (WRD and GRanD), rechecked dam information using the Google Maps geocoding API, and then reported georeferenced dam locations. In contrast, our study first generated SWB raster and vector maps with the mNDWI/VIs algorithm, selected the large yearlong SWB polygons (area > 1 km²), and then visually checked these polygons one by one, recording each dam with accurate geographic coordinates (Fig. S3). In addition to dam numbers, we also compared reservoir areas between the datasets (Fig. S7). China-LDRL reports ∼ 16.35 × 10³ km² of large reservoir area, smaller than GRanD (20.98 × 10³ km²) and GeoDAR (21.84 × 10³ km²).
GRanD v1.3 linked the "maximum surface water extent" from the JRC dataset to the corresponding dams as reservoir regions, whereas we used the "yearlong surface water body" to delineate reservoirs in the China-LDRL dataset, which may explain why our reservoir areas are smaller (Fig. S8).
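The effect of the reservoir definition on area can be illustrated with a per-pixel F_SWB array: a maximum-extent definition counts every pixel that was ever water (F_SWB > 0), whereas the yearlong definition counts only F_SWB ≥ 0.75, so the yearlong area can never exceed the maximum-extent area. This is a simplified single-year sketch under that assumption; the JRC maximum extent used by GRanD v1.3 spans 1984-2015, not one year.

```python
import numpy as np

PIXEL_AREA_KM2 = 0.0009  # one 30 m x 30 m Landsat pixel


def extent_areas_km2(f_swb, yearlong_threshold=0.75):
    """Return (maximum-extent area, yearlong area) in km² for one reservoir's F_SWB pixels."""
    f = np.nan_to_num(np.asarray(f_swb, dtype=float), nan=0.0)
    max_extent = np.count_nonzero(f > 0) * PIXEL_AREA_KM2
    yearlong = np.count_nonzero(f >= yearlong_threshold) * PIXEL_AREA_KM2
    return max_extent, yearlong
```

Reservoirs with large seasonal drawdown zones show the biggest gap between the two numbers.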
We also checked the accuracy of the geographic coordinates of dams in these datasets. We loaded the three dam datasets above and the China-LDRL dataset into Google Earth Pro and visually checked the location of each dam against the 2019 historical satellite imagery (Fig. 10). The dam locations in GOODD had substantial geographic offsets, some larger than 500 m (Fig. S9). When we overlaid the GOODD dam layer on our yearlong SWB map (Sect. 2.3.1), only 12.52 ± 3.87 % of the GOODD dams intersected the SWB layer at the national scale (Fig. S10a). With tolerances of 100 and 500 m, the intersection rate increased to only 47.58 ± 9.70 % and 76.46 ± 7.11 %, respectively (Figs. S10b, S11). We applied the same tolerances to GRanD and GeoDAR. About 65.57 ± 6.79 % of the GRanD dams intersected our yearlong SWB map, increasing to 87.52 ± 6.45 % and 95.94 ± 4.49 % with 100 and 500 m tolerances. Although the latest version of GeoDAR is much more accurate than the previous version, its coordinates also had some offsets (Fig. 10f, g): 58.49 ± 6.07 % of its dams intersected the yearlong SWB layer, increasing to 82.33 ± 3.98 % and 90.22 ± 3.18 % with 100 and 500 m tolerances. These georeferencing offsets stem from the different methods and purposes of the datasets. For example, the original digitized dam points in GOODD v1.0 were purposely snapped to the 30 arcsec HydroSHEDS river network, which offsets them from the actual dam locations; on the other hand, this makes GOODD v1.0 directly compatible with HydroSHEDS and therefore more convenient for modeling.
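The tolerance-based intersection test can be sketched as a brute-force distance check between dam points and water pixels, with each pixel treated as a 30 m square so that a zero tolerance means the point falls inside a water pixel. The raster layout (row 0 at the north edge, an upper-left origin) and projected metre coordinates are assumptions of this sketch; it returns a single rate and does not reproduce how the ± ranges above were derived.

```python
import numpy as np


def intersection_rate(dam_xy, water_mask, origin_xy, pixel_size=30.0, tolerance=0.0):
    """Percentage of dam points within `tolerance` metres of any water pixel.

    dam_xy: (n, 2) projected coordinates in metres.
    water_mask: 2-D boolean array, row 0 at the top (north).
    origin_xy: (x, y) of the upper-left corner of the raster.
    """
    rows, cols = np.nonzero(water_mask)
    x0, y0 = origin_xy
    half = pixel_size / 2.0
    cx = x0 + (cols + 0.5) * pixel_size   # pixel-centre x
    cy = y0 - (rows + 0.5) * pixel_size   # pixel-centre y (y decreases downwards)
    pts = np.asarray(dam_xy, dtype=float).reshape(-1, 2)
    if cx.size == 0 or pts.shape[0] == 0:
        return 0.0
    # Distance from each point to each water pixel treated as a square (0 if inside).
    dx = np.maximum(np.abs(pts[:, 0, None] - cx[None, :]) - half, 0.0)
    dy = np.maximum(np.abs(pts[:, 1, None] - cy[None, :]) - half, 0.0)
    nearest = np.hypot(dx, dy).min(axis=1)
    return 100.0 * np.count_nonzero(nearest <= tolerance) / pts.shape[0]
```

The all-pairs distance matrix is fine for a province-sized subset; a national run would more likely use a spatial index or a distance transform of the water raster instead.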
In GeoDAR v1.1, dam points in China were georeferenced using the Google Maps geocoding API, and many dam labels fell on the reservoir surface rather than on the dams. In addition, Google Maps in China has a substantial misalignment (roughly 500 m to 1 km) between the satellite images and the map labels because of China's GPS shift problem, which produces geographic offsets even when the geocoding procedure is correct. Overall, these comparisons indicate the improved accuracy of the China-LDRL dataset, which can provide important and reliable information for water resource management and water security in China.

Uncertainties, limitations, outlooks, and implications
In this study, we produced a detailed and more accurate dataset of China's open surface water bodies, large dams, reservoirs, and lakes (China-LDRL) for 2019 and analyzed their spatial distribution patterns. The study benefited from time-series Landsat imagery, the GEE cloud computing platform, and a simple and robust SWB mapping algorithm. First, time-series Landsat images at high spatial resolution (30 m) provide large numbers of good-quality observations for identifying SWB. Second, the GEE cloud computing platform allowed us to acquire and analyze tens of thousands of Landsat images within hours. Third, the mNDWI/VIs algorithm reduced the uncertainties induced by bad-quality observations and provided accurate SWB maps. Finally, we visually checked the large SWB polygons (area > 1 km²) one by one against the 2019 historical satellite imagery of China in Google Earth Pro and recorded the georeferenced coordinates of individual dams.
We acknowledge that the quality of the input satellite images remains a concern for the identification of dams, reservoirs, and lakes. The spatial distribution of good-quality Landsat observations shows that more than 98.36 % of the 30 m pixels in China had more than five good-quality observations and more than 91.24 % had more than 10 in 2019 (Fig. 1b). However, regions with complex, mountainous topography, such as South and Southwest China, had far fewer good-quality observations than other regions, which might lead to underestimation of surface water areas and of dam and reservoir numbers and areas. In addition, it is impossible to remove all bad-quality observations (e.g., clouds, terrain shadows) because of the limited quality of the QA band and the DEM data in GEE, so the remaining bad-quality observations inevitably introduce some uncertainty into the resulting maps. As more images from Landsat and other high spatial resolution sensors (e.g., Sentinel-1, Sentinel-2) are added to the GEE platform (Wulder et al., 2016), SWB mapping accuracy could be further improved, providing more detailed geospatial data on dams, reservoirs, and lakes in China. The visual interpretation used to identify dams and reservoirs in this study could also introduce uncertainties into their classification owing to the limited knowledge and experience of the interpreter, such as misclassifying some reservoirs regulated by dams or gates as lakes (e.g., Hongze Lake in Jiangsu Province) and confusing on-stream with off-stream dams and reservoirs.
Our China-LDRL dataset identifies and reports only large SWB; however, the importance of monitoring small water bodies (area ≤ 1 km²) and small dams is increasingly recognized, because they are critical for accurately assessing their agricultural potential and their cumulative influence on watershed hydrology (Ogilvie et al., 2018). In the future, these small SWB polygons could be included in the dataset to enhance the spatial detail of the distribution of dams, reservoirs, and lakes in China.
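The 1 km² cut-off between large and small SWB translates into a pixel count: at 30 m resolution one pixel covers 900 m², so 1 km² corresponds to about 1111.1 pixels, and a water body needs at least 1112 pixels to exceed the threshold. The sketch below groups a binary water mask into connected bodies and splits them at that threshold. It assumes 8-connectivity between water pixels and ignores map-projection area distortion; neither choice is stated in the text, and the function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

PIXEL_AREA_KM2 = 30 * 30 / 1e6  # one 30 m Landsat pixel = 0.0009 km²


def body_sizes(mask: List[List[int]]) -> List[int]:
    """Pixel count of each 8-connected water body in a 0/1 mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes: List[int] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited water pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            sizes.append(size)
    return sizes


def split_by_area(sizes: List[int], limit_km2: float = 1.0) -> Tuple[List[float], List[float]]:
    # Large: area > limit; small: area <= limit, matching the text's definitions.
    areas = [n * PIXEL_AREA_KM2 for n in sizes]
    large = [a for a in areas if a > limit_km2]
    small = [a for a in areas if a <= limit_km2]
    return large, small
```

For example, bodies of 1112 and 1111 pixels fall on opposite sides of the threshold (about 1.0008 km² and 0.9999 km²). At national scale the same grouping would normally be done with a raster library or on the GEE platform rather than with a pure-Python flood fill.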
Conversions between rivers, lakes, and reservoirs have critical effects on ecosystem services. For example, the construction of the Three Gorges Dam contributed to decreases in surface water area and biodiversity downstream (Fang et al., 2006; Feng et al., 2013; Wang et al., 2020a) and reduced sediment loads in the Yangtze River, lowering the deposition rates of coastal wetlands in the Yangtze Delta (Feng et al., 2016; Wang et al., 2021a). Furthermore, the conversion of natural lakes and rivers to man-made reservoirs has disproportionate effects on the local, regional, and global carbon cycle (Howard Coker et al., 2009). For example, dam construction has reduced the areal extent of CO₂ gas exchange in natural rivers (Ran et al., 2021). In the future, more detailed information (e.g., the construction year of each dam) needs to be included in the China-LDRL dataset, making it possible to analyze how conversions from natural lakes and rivers to reservoirs affect biodiversity and the carbon cycle.

Code availability
The code used to calculate surface water bodies is available upon request.

Conclusions
Several global and national dam, reservoir, and lake datasets based on satellite images have been published (Table 1). However, these datasets often have large georeferenced coordinate offsets, which limits studies that aim to address major issues in hydrology, ecology, and water resource management in China. In this study, we generated the dataset of China's open surface water bodies, large dams, reservoirs, and lakes (China-LDRL) for 2019 and analyzed their spatial distributions at the provincial and national scales. Satellite image quality remains a major source of uncertainty in the accuracy of the surface water body maps. As more images from Landsat and other high spatial resolution sensors (e.g., Sentinel-1, Sentinel-2) are added to the GEE platform, the accuracy of SWB maps can be further improved, providing more detailed geospatial data on dams, reservoirs, and lakes in China. The reliable and accurate China-LDRL dataset of dams, reservoirs, and lakes will contribute to a better understanding of water resources management and water security in China.
Author contributions. XX, XW, and BL designed the study. XW carried out image data processing, and led interpretation of the results and writing of the article. YQ and JD contributed to image data processing. XX, BL, YQ, JD, and JW contributed to the interpretation and discussion of the results.