A comprehensive geospatial database of nearly 100 000 reservoirs in China

With rapid population growth and socioeconomic development over the last century, a great number of dams/reservoirs have been constructed globally to meet various needs. China has strong economic and societal demands for constructing dams and reservoirs. The official statistics reported more than 98 000 dams/reservoirs in China, including nearly 40 % of the world’s largest dams. Despite the availability of several global-scale dam/reservoir databases (e.g., the Global Reservoir and Dam database (GRanD), the GlObal geOreferenced Database of Dams (GOODD),


Introduction
Reservoirs and their dams play a crucial role in green energy generation and water resources management. Since the mid-20th century, the ever-growing human demands for water use and hydropower have driven an unprecedented boom in reservoir construction worldwide (Chao et al., 2008; Wada et al., 2017). Dam construction and reservoir impoundment can lead to many potential environmental and socioeconomic impacts (Jiang et al., 2018; Zarfl et al., 2019). These consequences mainly include the threat to biodiversity and ecosystems (Winemiller et al., 2016), a change in the hydrological regime (Vörösmarty et al., 2003), the degradation of water quality (Zarfl et al., 2019; Barbarossa et al., 2020), the modification of the geochemical cycle (Maavara et al., 2020), an alteration of the river morphology (Bednarek, 2001; Nilsson and Berggren, 2000; Winemiller et al., 2016; Grill et al., 2019; Latrubesse et al., 2017; Bond and Cottingham, 2008; Nilsson et al., 2005; Wang et al., 2013), a disturbance in climate regimes (Pekel et al., 2016; Degu et al., 2011; W. Wang et al., 2017; Van Manh et al., 2015), migration of human settlement (Tilt et al., 2009), and changes in the land use patterns (Stoate et al., 2009; Carpenter et al., 2011).
Despite these controversial effects, artificial reservoirs have been constructed widely across many basins of the world, serving a variety of purposes such as hydropower generation, water supply, irrigation, navigation, flood control, and recreation (Belletti et al., 2020; Biemans et al., 2011; Döll et al., 2009; Grill et al., 2019; Boulange et al., 2021). In addition, reservoirs assist water managers in converting natural flow conditions into flow conditions that meet human demands, which is especially important in locations where water resources are restricted due to the hydrologic seasonality or the growing influences of climate change and variability (Richter et al., 2006). The solution to balance the benefits and consequences of reservoirs should not be a simple decision of whether or not to construct them. The significant benefits and the additional effects highlight the importance and necessity of a holistic picture of the reservoir distributions and continuous monitoring of them to understand the impacts better. Information and data regarding reservoirs are crucial for scientists, practitioners, and policymakers for various purposes, for instance, estimation of water budgets and impacts on hydrologic and nutrient fluxes on regional or global scales (Chao et al., 2008; Bakken et al., 2013, 2016; Popescu et al., 2020; Postel, 2000), water availability projections or flood/drought risk mitigation (Di Baldassarre et al., 2017; Ehsani et al., 2017; Elmer et al., 2012; Veldkamp et al., 2017; Metin et al., 2018), assessment of hydropower station construction (Bertoni et al., 2019; Gernaat et al., 2017; Xu et al., 2013; Moran et al., 2018; Winemiller et al., 2016), and investigation of biotic disturbance (Latrubesse et al., 2017; Maavara et al., 2020; Dorber et al., 2020; Sabo et al., 2017). Considering reservoirs in physical models can significantly improve the modeling performance (Gutenson et al., 2020).
The modeling requires a minimum set of the reservoir characteristics, including their spatial location, abundance, area, and storage capacity. Besides, the reservoirs are considered to be a key source of greenhouse gases (GHGs), partly offsetting the carbon sink of continents (St. Louis et al., 2000;Aufdenkampe et al., 2011;Barros et al., 2011;Raymond et al., 2013;Deemer et al., 2016). There is thus an increasing concern about the true GHG fluxes from reservoirs. Answering these questions requires a comprehensive database depicting reservoir distributions and properties, especially for hydropower boom regions in Asia, South America, and Africa.
China has a strong economic and societal demand for hydroelectric development, flood control, and agricultural irrigation. In 2007, China's Medium and Long-Term Plan for Renewable Energy Development projected constructing 300 GW of gross installed hydropower capacity by 2020, more than double the capacity in 2007. The installed hydropower capacity target has since been reset to 420 × 10⁶ kW by 2020, representing a 70 % increase over the 2012 level. In China, more than 60 % of total water consumption is taken by the agricultural water sector, of which 90 % is used for irrigation (Jiang et al., 2018). Therefore, reservoir construction in China has experienced drastic growth. The number of Chinese reservoirs increased slowly after the 1980s and reached a count of 98 000 around 2015 (MWR, 2016). According to the register of the International Commission on Large Dams (ICOLD and CIGB, 2011), China possesses nearly 40 % of the global large dams (storage capacity greater than 0.1 km³). However, little is known about the spatial locations and related georeferenced information of these constructed reservoirs at the national level for China.
Multiple efforts have been made to produce global reservoir inventories that include China. The most recognized and comprehensive database is the World Register of Dams (WRD), hosted and maintained by ICOLD, which reports 23 841 dams for China. However, as this database is not georeferenced, its utility is severely limited. The Global Reservoir and Dam database (GRanD; Lehner et al., 2011) was an early initiative to provide global geospatial details about reservoirs and their attributes. Its latest version, v1.3, contains 7320 dams/reservoirs, with a cumulative capacity of 6881 km³, but includes only 921 Chinese reservoirs. Recently, the GlObal geOreferenced Database of Dams (GOODD; Mulligan et al., 2020) and the Georeferenced global Dams And Reservoirs dataset (GeoDAR; Wang et al., 2022) were published, containing more than 38 000 and 20 214 reservoirs on a global scale, respectively. GOODD was manually digitized from high-resolution Google Earth imagery, whereas GeoDAR was georeferenced from ICOLD WRD, with a full harmonization with GRanD. For the Chinese territory, the GOODD and GeoDAR databases contain 9238 and 4859 reservoirs, respectively, which are still significantly below the counts reported by WRD and the Ministry of Water Resources (MWR). Given this lack of information, a comprehensive and spatially explicit database of reservoirs in China is required.
This study aims to share, as comprehensively as possible, fundamental open-access information on reservoirs in China. We have compiled the database based on a variety of data sources, including the national 1 : 250 000 public basic geographic database, the Almanac of China's Water Power, three global reservoir inventories (GeoDAR v1.1, GRanD v1.3, and GOODD v1.0), and other published documents and online maps (e.g., OpenStreetMap (OSM) and Tianditu Map). A comparison with GeoDAR, GRanD, and GOODD was conducted to assess the database. Our inventory contains significantly more reservoirs than the currently available databases. This database can provide researchers with basic information on reservoir locations, spatially explicit inundation areas, water storage, and related details in China, with the goal of advancing research on water resources, ecological and environmental consequences, and global change impacts and socioeconomic sector assessments on a national and worldwide scale.
Before constructing the reservoir database in China, the data of the existing dams and reservoirs were preliminarily compiled as the basis for determining the location of reservoirs. Existing dam/reservoir databases containing geographical information are one of the key spatial data sources for reservoirs, including GRanD, GOODD, GeoDAR, and the Future Hydropower Reservoirs and Dams Database (FHReD). GRanD is a data product of the Global Water System Project and was first released in 2011 (Lehner et al., 2011). GOODD (Mulligan et al., 2020) is a comprehensive global dam database produced by manual inspection and digitization based on multi-source remote sensing satellite observations and Google Earth images. The FHReD database collects spatial locations of reservoirs that are currently being built or those that are planned for the future (Zarfl et al., 2015).
GeoDAR is a global dam and reservoir geographic database based on the multi-source data fusion and online geocoding of the ICOLD reservoir records. The FHReD database provides information on 3700 planned and under-construction reservoirs worldwide, of which 251 reservoirs are located in China, and 97 of these had been dammed by 2020.
In this study, these abovementioned databases were used to provide location information on part of China's reservoirs, particularly those of a large size. We integrated the spatial information of existing reservoirs in China and eliminated duplicate information. In this way, the China Reservoir Dataset (CRD) retains the spatial information of each unique Chinese reservoir in these three global databases.

National basic geographic databases
The national 1 : 250 000 public basic geographic database covers the whole land area of China and its major islands. Overall, the map elements represent the landscape situation around 2015. The database contains nine element layers, such as waterbody (point, line, and surface layer), and its spatial location accuracy and attribute content have undergone confidentiality processing. Reservoir information is contained in the waterbody layer provided by the basic topographic map and the layers of natural place names (notes), most of which have name attributes and spatial positioning information. Although the national surveying authorities provide the basic terrain data, the spatial coordinates are biased due to the confidential processing of the map. Therefore, we carried out rigorous data correction and quality control by referring to the high-resolution Google Earth imagery. Finally, the database provided the spatial information references of 27 047 reservoirs for the CRD database.
The Tiandi Map is an online map system developed by the State Bureau of Surveying and Mapping of China (https://map.tianditu.gov.cn/, last access: 16 January 2022; only in Chinese), which provides geographic information services in two forms, i.e., portal and service interface. It integrates public geographic information resources from national-, provincial-, and prefecture- or county-level mapping and geographic information departments, relevant government departments, enterprises and institutions, social groups, and the public. In addition, users can use the service interface to call the authoritative, standard, and unified online geographic information comprehensive service of the Tiandi Map. In this study, the Tiandi Map was mainly used in two ways. First, it was used as a base map for visual interpretation and for supplementing potentially missing reservoirs. In this process, we initially identified about 60 000 potential reservoirs. Second, the map was used to provide the reservoir name attribute. Based on the reservoir locations confirmed by manual inspection on the Tiandi Map, the name of each reservoir was queried by calling its reverse geocoding API.

Open-source map data
Open-source maps such as OSM were another key source of reservoir locations. OSM is a platform for users, organizations, or countries worldwide to organize and maintain multi-source geographic information data. Map vector data are available for download under an open database license. Because OSM data are open and shared, the collected multi-source geographic information data can be used as a supplement to other time-limited databases and can promptly reflect changes in land surface information. OSM contains data such as water systems, road traffic, natural boundaries, land use, and construction. Water system data provide part of the reservoir polygon data with names, which are mainly compiled manually by OSM users. Finally, the spatial locations of 89 reservoirs were obtained from the OSM.

Data sources for reservoir inundation area mapping
The water inundation area is an important indicator of the reservoir and a variable for modeling reservoir storage capacity. Since the reservoir area is dynamically changing, we considered the maximum water area of the reservoir over the last several decades in this study. Moreover, the maximum water area of the reservoir can indirectly reflect its water storage capacity. Therefore, we merged two water occurrence datasets, the Global Surface Water v1.0 (GSW) and Global Land Analysis and Discovery (GLAD), to obtain long-term historical maximum water areas of each of the compiled reservoirs.
GSW is a global surface water dataset developed by Pekel et al. (2016) using the Google Earth Engine (GEE) remote sensing big data computing platform. Based on all available Landsat 5, 7, and 8 data acquired from 1984 to the present, Pekel et al. (2016) used an expert classification system to divide each available pixel into water bodies and non-water bodies and integrated the results into products at monthly, annual, and decadal timescales. The maximum water boundary, water inundation frequency, water change intensity, water transition, water recurrence, seasonal water, monthly water range, monthly water recurrence, and annual water range are provided. GLAD is the global water body map from 1999 to 2019 obtained by Pickens et al. (2020) using the GEE remote sensing big data computing platform based on Landsat 5, 7, and 8 images. The surface water range changes during this period were highlighted, and the water was classified into several categories based on water probability, including permanent water area, seasonal water area, lost water area, new water area, temporary land area, temporary water area, and high change area.
Considering that both the GSW and GLAD datasets are at 30 m resolution, we also applied FROM-GLC10 (Finer Resolution Observation and Monitoring of Global Land Cover at 10 m resolution) based on Sentinel-2 data in 2017 (Gong et al., 2019) to handle the incomplete mapping of extremely narrow boundaries for a few reservoirs located in deep valleys. This database takes the existing land cover data as training samples. It combines the data of the Shuttle Radar Topography Mission (SRTM) on the GEE big data computing platform to classify the data by random forest method to obtain the maps of alpine and swamp areas with an overall accuracy loss rate of less than 1 %. The training samples were classified based on Landsat 8 original images and eight important indices commonly used in remote sensing monitoring, such as normalized difference vegetation index, modified water index, and normalized difference building index.

Data sources for reservoir storage capacity estimation
The reservoir storage capacity records were retrieved from various yearbooks and documents, including the Almanac of China's Water Power and other government documents. The Almanac of China's Water Power is a professional industry yearbook for hydropower in China, providing detailed information on China's mega-reservoirs, including the reservoir location, the dam purpose, the basin area, the storage capacity, and water level data of various types and the dam construction and impoundment time. Other government documents used in the study mainly include the "List of Persons responsible for the safety of Large reservoirs in China in 2020", which is issued by the Ministry of Water Resources, the "List of persons responsible for the safety of large and medium-sized reservoirs", which is issued by different provinces and prefectures in China, and the "List of Reservoirs in Hunan Province", which is issued by the Water Resources Department of Hunan Province. These documents provide information on the type and location of the dam/reservoir and the storage capacity of reservoirs of different sizes. Finally, from the Almanac of China's Water Power and some other government documents, we collected authoritative information on the locations and storage capacities of 5143 reservoirs.

Reservoir location extraction
To build this database, we started with a preliminary compilation of the location information of Chinese dams and reservoirs from three types of data sources (see Fig. 1a). The first type of source is published georeferenced databases for dams and reservoirs, including GRanD, GOODD, FHReD, and GeoDAR. We combined the location information of China's reservoirs from the four published dam/reservoir products. After removing duplicates by manual inspection, we obtained the names and locations of about 7400 unique reservoirs. The second type of source comprises national basic geographic databases (including the national 1 : 250 000 public basic geographic database and Tiandi Map), the Almanac of China's Water Power, and other government documents. We checked the national 1 : 250 000 public basic geographic database, and its drainage layer data and natural place name layer contained the most reservoir information. Here, the Tiandi Map was used as a base map for visual interpretation to supplement reservoirs missing from the national public basic geographic database. Moreover, we made a list of reservoirs from the Almanac of China's Water Power and documents from local governments, which only provided the county-level address for each reservoir. We then employed the Tiandi Map geocoding API to query the latitudes and longitudes of these reservoirs. Based on the second type of data source, we obtained the location information of about 90 000 reservoirs. The third type of data source is the open map database, OSM. From the OSM, we obtained the location information of 89 reservoirs. After harmonizing the three types of sources, we arrived at the locations of a total of 97 435 unique reservoirs in China.
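The duplicates in this study were removed by manual inspection. As a rough illustration of how candidate duplicates across sources could be pre-screened before such an inspection, the following minimal sketch flags pairs of records from different sources that lie within a distance threshold. The function names, the 1 km threshold, the record layout, and the sample coordinates are all hypothetical and are not taken from the CRD workflow.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def flag_duplicate_candidates(records, max_km=1.0):
    """Return index pairs (i, j), i < j, of records from different sources within max_km.

    The threshold is an illustrative assumption; flagged pairs would still
    need a manual check, as in the study.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if a["source"] == b["source"]:
                continue
            if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km:
                pairs.append((i, j))
    return pairs


# Made-up records for illustration only.
records = [
    {"source": "GRanD", "name": "A", "lat": 30.000, "lon": 111.000},
    {"source": "GOODD", "name": "A (alt)", "lat": 30.003, "lon": 111.002},
    {"source": "OSM", "name": "B", "lat": 25.000, "lon": 105.000},
]
candidates = flag_duplicate_candidates(records)
print(candidates)
```

The pairwise loop is quadratic in the number of records, which is acceptable for a few thousand points; for tens of thousands of points a spatial index would be the more usual choice.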

Reservoir water inundation extent mapping
After determining the spatial location of all reservoirs, we extracted the historical maximum water inundation extent (from the mid-1980s to 2020) of the corresponding reservoirs based on GSW, GLAD, and FROM-GLC10 data (Fig. 1b). GSW data can provide the maximum water area of reservoirs with a long time series from 1984 to 2020. GLAD only maps images over the last 20 years, but it combines Landsat with Sentinel-1 and Sentinel-2 to provide a higher temporal resolution to describe ephemeral surface water better. Through comparative inspection, we found that GLAD could describe the water area details more completely for some reservoirs, especially narrow river channel reservoirs. Therefore, we merged GSW and GLAD datasets to obtain the maximum water area of all reservoirs. In addition, FROM-GLC10 is based on the Sentinel 10 m resolution imagery data, which can identify relatively small reservoirs (reservoir area smaller than 0.01 km²). Therefore, we also used it to supplement a few narrow river channel reservoirs, especially those in mountainous regions of Zhejiang, Fujian, Sichuan, Jiangxi, and Guangxi provinces. The water masks extracted automatically by intersecting with our compiled reservoir point locations were visually inspected and, if necessary, manually edited (e.g., to separate the reservoir from the river segment) to form quality-controlled reservoir boundaries. Some reservoirs were still not captured by the sources described in Sect. 3.1. Therefore, all the remaining water bodies were manually checked by overlaying them on the Google Maps high-resolution images to minimize the number of missed reservoirs. Finally, a total of 97 435 reservoir polygons were extracted. For reservoirs without corresponding names, the reverse geocoding API of Tiandi Map was used to query the names of corresponding reservoirs.
Here, the reverse geocoding API refers to entering the reservoir's coordinates and then returning the relevant name information of the corresponding reservoir. Eventually, 66 253 reservoirs were identified and supplemented with the name attribute.
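The merging of the GSW and GLAD maximum water extents described in this section can be pictured as a per-pixel union of two binary water masks on a common grid. The sketch below is only an illustration under simplifying assumptions: both masks are already co-registered, every pixel is treated as a 30 m × 30 m square (real geographic grids vary in pixel area with latitude), and the function names and toy masks are hypothetical rather than part of the CRD processing chain.

```python
def union_mask(mask_a, mask_b):
    """Per-pixel union of two binary water masks given as equally shaped nested lists."""
    if len(mask_a) != len(mask_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    ):
        raise ValueError("masks must share the same grid")
    return [
        [1 if (a or b) else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mask_a, mask_b)
    ]


def water_area_km2(mask, pixel_size_m=30.0):
    """Water area in km2, assuming square pixels of constant size (a simplification)."""
    water_pixels = sum(sum(row) for row in mask)
    return water_pixels * pixel_size_m ** 2 / 1e6


# Toy 3 x 4 masks: 1 = water at any time in the record, 0 = never water.
gsw_max = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
glad_max = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]

merged = union_mask(gsw_max, glad_max)
area_km2 = water_area_km2(merged)
print(merged, area_km2)
```

In this toy case the second mask adds two pixels along a narrow arm, mirroring the situation in which one dataset captures narrow channel details that the other misses.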

Reservoir storage capacity and residence time estimation
Reservoir storage capacity is one of the most basic types of information about reservoirs. As shown in Fig. 1c, the source of reservoir storage capacity in the CRD database is mainly divided into two types, i.e., the recorded values obtained from the yearbook and government documents, as mentioned in Sect. 2.2, and statistical estimations by an empirical model. According to the yearbook and other documents (Sect. 2.2), we collected the storage capacity records for 5143 reservoirs of various sizes, among which there are 162 Type-I super-large reservoirs (storage capacity greater than 1 km³), 580 Type-II large reservoirs (0.10-1 km³), and 4407 small and medium-sized reservoirs (smaller than 0.10 km³). As super-large reservoirs (mostly canyon-type reservoirs) tend to have different hypsometric (area-storage relationship) characteristics from small and medium-sized reservoirs (mostly in plain and hilly areas), we excluded the 742 large reservoirs from model calibration. In addition, we removed 84 reservoirs that do not conform to the small and medium-sized reservoir class (storage capacity smaller than 0.10 km³). The statistical relationship between the inundation area and storage of a total of 4323 reservoirs was established to estimate the capacity of the remaining unrecorded small and medium-sized reservoirs. The empirical model was used to fit the storage capacity and area of the existing recorded reservoirs (Fig. 2). The fitting equation is as follows, and the R² is 0.84.
$$\log_{10} V = 1.096 \times \log_{10} S + 0.349 \qquad (1)$$
where V represents the reservoir storage capacity in the unit of cubic meters (m³), and S represents the maximum reservoir area in the unit of square meters (m²). We calculated the SMAPE (symmetric mean absolute percentage error) of the estimated storage capacity, and it showed 32.62 %-32.64 % bias at the 95 % confidence interval based on the fitted model.
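As a worked illustration of the area-storage relationship above, the sketch below applies the fitted coefficients (1.096 and 0.349, with V in m³ and S in m²) to a single area and also shows one common way of computing SMAPE. The function names are hypothetical, and the SMAPE variant shown (mean of absolute differences divided by the mean of the absolute observed and predicted values) is an assumption, since the text does not state which formulation was used.

```python
import math

SLOPE = 1.096      # coefficient of log10(S) in the fitted relationship
INTERCEPT = 0.349  # constant term in the fitted relationship


def estimate_storage_m3(area_m2):
    """Estimate storage capacity (m3) from maximum water area (m2) via the log-log fit."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return 10 ** (SLOPE * math.log10(area_m2) + INTERCEPT)


def smape_percent(observed, predicted):
    """Symmetric mean absolute percentage error in percent (one common definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    terms = [
        abs(p - o) / ((abs(o) + abs(p)) / 2)
        for o, p in zip(observed, predicted)
    ]
    return 100 * sum(terms) / len(terms)


# A reservoir with a maximum water area of 1 km2 (1e6 m2):
# log10 V = 1.096 * 6 + 0.349 = 6.925, i.e. roughly 8.4 million m3.
storage_m3 = estimate_storage_m3(1e6)
print(f"{storage_m3:.3e} m3 = {storage_m3 / 1e9:.4f} km3")
```

Because the fit was calibrated on small and medium-sized reservoirs only, applying it to large canyon-type reservoirs would fall outside the range the relationship was built for.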
Finally, the recorded values from the yearbook and other documents are regarded as the storage capacity of 5143 reservoirs, totaling about 803.29 km³. The storage capacity of the other 92 292 reservoirs was estimated using their maximum inundation areas, as in Eq. (1), with a total of 176.33 km³, ranging from 121.67 to 257.30 km³. Therefore, the total storage capacity of Chinese reservoirs is 979.62 km³ (924.96-1060.59 km³).
HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) provides hydrographic baseline information in a consistent and comprehensive format to support regional and global watershed analyses and hydrological modeling. It is currently considered the leading global product in terms of quality and resolution (Lehner and Grill, 2013). HydroBASINS and HydroRIVERS are extracted from HydroSHEDS at a 15 arcsec resolution. HydroRIVERS represents a vectorized line network of all global rivers with a catchment area of at least 10 km², an average river flow of at least 0.10 m³ s⁻¹, or both. HydroRIVERS covers all rivers in the Pfafstetter Level 12 sub-basins of HydroBASINS and contains the attribute information of each river, including an estimate of long-term average discharge. Here, we focused on the reservoirs (17 185) located on HydroRIVERS rivers and extracted reservoir discharges based on HydroRIVERS. These reservoirs cover 96 % of the CRD reservoirs larger than 1 km². The remaining smaller reservoirs are not on the HydroRIVERS rivers, and their discharge is difficult to obtain. Therefore, they are generally not included in hydrological simulations. Notably, while the CRD database provides information about reservoir discharge and residence time, these data can be updated for specific hydrological modeling.
The equation of average residence time is as follows:
$$\mathrm{RES\_T} = \frac{\mathrm{STOR}}{\mathrm{DIS\_AV\_CMS}} \qquad (2)$$
where STOR represents the reservoir storage capacity, DIS_AV_CMS represents the reservoir discharge in the unit of cubic meters per second (m³ s⁻¹), and RES_T represents the reservoir residence time in the unit of years, with the units converted accordingly. The R² of the estimated reservoir residence times and the corresponding results of HydroLAKES reservoirs is 0.82.
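Since the residence time is the ratio between storage capacity and discharge, expressing it in years requires a unit conversion when storage is given in km³ and discharge in m³ s⁻¹. The sketch below shows one way to carry this out; the function name, the 365.25-day year, and the handling of the "−999" no-data flag are illustrative assumptions rather than the authors' stated procedure.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length; the source does not specify one
NODATA = -999  # flag used in the database for missing or inapplicable values


def residence_time_years(stor_km3, dis_av_cms):
    """Residence time in years from storage (km3) and mean discharge (m3/s).

    Returns NODATA when either input is flagged as missing or the discharge
    is not positive, so that no division by zero occurs.
    """
    if stor_km3 == NODATA or dis_av_cms == NODATA or dis_av_cms <= 0:
        return NODATA
    storage_m3 = stor_km3 * 1e9            # km3 -> m3
    annual_flow_m3 = dis_av_cms * SECONDS_PER_YEAR  # m3/s -> m3/year
    return storage_m3 / annual_flow_m3


# A 1 km3 reservoir on a river with a mean discharge of 100 m3/s
# turns over roughly three times per year.
res_t = residence_time_years(1.0, 100.0)
print(round(res_t, 4))
```

Reservoirs off the river network carry no discharge estimate, which is why the guard for the no-data flag matters when the calculation is applied to the full attribute table.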

Description of the CRD database
This database catalogs the location information of 97 435 reservoirs in China, with an aggregated area of 50 085.21 km² and an estimated total storage capacity of 979.62 km³ (924.96-1060.59 km³). The 5143 reservoirs in the CRD database were directly derived from the yearbook and other documents data, accounting for 59 % and 82 % of the total reservoir area and storage capacity of the CRD database, respectively. This reservoir information was mainly obtained through manual compilation. The attributes of the recorded reservoirs include the longitude and latitude of the reservoir, name, province, prefecture, and county where the reservoir is located, water area, water level of normal storage capacity, storage capacity, reservoir class, main use, and regulation type (Table 1). The attributes of all the CRD reservoirs (in all cases) include location information (longitude, latitude, province, prefecture, and county), inundation area, estimated storage capacity, river order, discharge, and residence time of reservoirs, as shown in Table 2.
The Pareto distribution can describe the global distribution abundance of artificial reservoirs and their inundation areas (sizes; Lehner et al., 2011; Downing et al., 2006). In Fig. 3, we applied such a statistical fitting distribution to the CRD database and inferred the count of smaller reservoirs and their total inundation area. Assuming that our data for reservoirs larger than 0.01 km² are complete, trend lines can be fitted and extrapolated from the Pareto distribution to estimate smaller reservoirs not included in the CRD database. As a result, there is an overall good fit of the Pareto model for the CRD reservoirs in the range of 0.01-10 km² (Fig. 3a). In addition, the Pareto distributions in each basin are similar to those on the national scale (Fig. 3b-k).
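To make the Pareto extrapolation idea concrete, the following minimal sketch fits a straight line to the logarithm of the number of reservoirs at least as large as a given area against the logarithm of that area, and then extrapolates the line below the completeness threshold. The fitting procedure actually used for Fig. 3 is not described in the text, so the ordinary least-squares approach, the function names, and the synthetic areas here are all assumptions for illustration.

```python
import math


def fit_pareto_loglog(areas_km2, a_min=0.01):
    """Fit log10 N(>= a) = intercept + slope * log10(a) by least squares for a >= a_min.

    N(>= a) is approximated by the rank of each area in descending order.
    """
    sizes = sorted((a for a in areas_km2 if a >= a_min), reverse=True)
    if len(sizes) < 2:
        raise ValueError("need at least two areas above the threshold")
    xs = [math.log10(a) for a in sizes]
    ys = [math.log10(rank) for rank in range(1, len(sizes) + 1)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def expected_count(area_km2, slope, intercept):
    """Extrapolated number of reservoirs with area >= area_km2 under the fitted line."""
    return 10 ** (intercept + slope * math.log10(area_km2))


# Synthetic areas following an exact power law: the k-th largest has area 10 / k km2,
# so N(>= a) = 10 / a and the true slope is -1.
synthetic_areas = [10.0 / k for k in range(1, 1001)]
slope, intercept = fit_pareto_loglog(synthetic_areas)
print(slope, intercept, expected_count(0.001, slope, intercept))
```

A least-squares fit on log-transformed ranks is simple but statistically crude; it is shown here only because it matches the trend-line reading of the text, and the extrapolated counts depend entirely on the completeness assumption above the threshold.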

Accuracy evaluation of the CRD database
To evaluate the commission and omission accuracy of the CRD database, we randomly selected sub-basin areas in each first-level river basin across China and manually checked 3634 reservoirs (Fig. 4). The validation sub-basins were selected following the "create random sampling points" method. Most of them are third-level river basins. However, for the Yangtze River and the Yellow River basins with more reservoirs, four sub-basins were selected to evenly distribute the sampled reservoirs. For each sampled reservoir, we manually confirmed its relevant information with the record in the Tiandi Map. We overlaid the 3634 selected samples on the Tiandi Map to validate the geo-matching accuracy of the CRD. Then, we manually checked whether the spatial coordinates of each sample were consistent with those recorded in the Tiandi Map. In addition, we conducted a second-round quality control to check if any reservoirs were missing.

Table 1. Attributes of the recorded reservoirs in the CRD database.
Province: Province in which the reservoir is located.
Prefecture: Prefecture in which the reservoir is located.
County: County in which the reservoir is located.
Area: Maximum water area of the reservoir (unit in km²).
Normal elevation: Water level of the normal storage capacity (unit in m).
STOR_Recor: Total storage capacity of values from the yearbook and literature records (unit in km³).
ResvClass: Reservoir class (1 is large Type-I, 2 is large Type-II, 3 is medium, 4 is small Type-I, 5 is small Type-II, and 6 is pumped storage type).
Comprehensive utilization: Main uses of the reservoir (mainly including power generation, water supply, shipping, flood control, and irrigation).
Type of regulation: Regulation types of reservoirs (mainly including day, week, season, and year).
Note that "normal storage capacity" means that the reservoir reaches the storage capacity that can actually be used to regulate runoff.

Table 2. Attributes of all the CRD reservoirs.
Province: Province in which the reservoir is located.
Prefecture: Prefecture in which the reservoir is located.
County: County in which the reservoir is located.
Area: Maximum water area of the reservoir (unit in km²).
STOR: Total storage capacity (unit in km³).
RIV_ORD: Indicator of the river order using river flow to distinguish logarithmic size classes. RIV_ORD refers to the RIV_ORD of the HydroRIVERS.
DIS_AV_CMS: Average long-term discharge estimate for the reservoir (unit in m³ s⁻¹).
RES_T: Residence time of each reservoir (the ratio between reservoir storage capacity and discharge; unit in years).
Note that missing or inapplicable values are flagged by "−999".

As shown in Table 3, the overall evaluation accuracy for the CRD database is 95.13 %, ranging from 92.79 % to 97.17 % in different basins. The main cause of errors in most basins is the misclassification of false reservoirs (commission error), such as ponds and paddy fields. These ponds and paddy fields are generally smaller than 0.10 km². The accuracy was lowest in the Southwest River basin due to the commission error.

Spatial distribution of reservoirs in China
The total area of reservoirs in China is 50 085.21 km², and the total storage capacity is estimated to be 979.62 km³. The spatially divergent pattern is generally characterized by the topographic division of the Hengduan Mountains in the east-west direction and the Qinling mountains and the Huaihe River in the north-south direction. The overall distribution of the reservoirs is bounded by the Heihe-Tengchong Line that is widely recognized as a separating line for the contrasting pattern of population, industrial development, and landscape characteristics, decreasing from southeast to northwest. Latitudinally, reservoirs in China are dominantly distributed in the belt between 20 and 30° N, both in terms of count and area, whereas longitudinally, reservoirs in China are concentrated between 100 and 120° E.
Chinese reservoirs are widely distributed and have obvious agglomeration characteristics. Reservoirs are distributed not only from the hot and humid southern areas to the arid desert areas but also from the eastern coastal areas to the Qinghai-Tibet Plateau. As shown in Fig. 5, reservoirs are mainly distributed in China's major commodity grain production bases that have a relatively great demand for agricultural irrigation, such as the Poyang Lake and Dongting Lake plain, Huaihe River basin, Songnen Plain, and Sanjiang Plain. Moreover, many large reservoirs are clustered in areas with large elevation drops and abundant water resources. For example, reservoirs in Sichuan province are clustered along the main stems of Fujiang River, Jialing River, and Yangtze River. In addition, as a major water supply, many reservoirs are concentrated in urban areas such as the Shandong Peninsula urban agglomerations. In the Shandong Peninsula, reservoirs are mainly concentrated in the Yimeng Mountains and the Bohai Rim area.

Note to Table 3: SER - Southeastern River, HR - Haihe River, HuR - Huaihe River, YR - Yellow River, LR - Liaohe River, SHR - Songhua River, NWR - Northwest River, SWR - Southwest River, YZR - Yangtze River, and PR - Pearl River. "Commission error" represents geocoding errors where the CRD information is inconsistent with the validation reference. "Omission error" indicates the number of missing reservoirs in the samples.

Distribution characteristics of reservoir storage capacity in China
In terms of spatial distribution, reservoirs with substantial storage capacity are mostly found in the Yangtze River and Pearl River basins. Many major reservoirs have been built in the Southwest River basin in recent years, primarily in the upper reaches of the Lancang, Yuan, and Nujiang rivers. The Huaihe River and Haihe River basins, on the other hand, have many reservoirs, but their storage capacities are limited because the flat terrain offers little elevation change. Although the Yellow River does not stand out in terms of reservoir count or storage capacity, it has the highest degree of reservoir regulation of any basin: its total reservoir capacity has reached 3 times its annual runoff. The distribution of reservoir storage capacity in China is shown in Fig. 6. There are 135 reservoirs with a storage capacity above 1 km³ (see Fig. 6b), accounting for 60.81 % of the total storage capacity. Among them, 15 reservoirs have a storage capacity of more than 10 km³, accounting for 29.39 % of the total reservoir capacity. The top 10 reservoirs (Three Gorges, Danjiangkou, Longtan, Longyangxia, Nuozhadu, Xin'anjiang, Xiaowan, Shuifeng, Xinfengjiang, and Xiluodu) are mainly distributed in the Yangtze River, Pearl River, and Southwest River basins, which are rich in water resources. These 10 reservoirs alone account for 23.51 % of the total storage capacity of the CRD.
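To make the reported shares more tangible, the short sketch below converts them into approximate volumes. It assumes that all three percentages are taken relative to the CRD total of 979.62 km³; the variable names are illustrative, and the resulting volumes are derived here rather than stated in the text.

```python
# Total CRD storage capacity reported in the text (km^3).
total_storage_km3 = 979.62

# Shares of total storage reported in the text (percent).
# Assumption: all three shares use the same CRD total as denominator.
reported_shares_pct = {
    "135 reservoirs above 1 km3": 60.81,
    "15 reservoirs above 10 km3": 29.39,
    "top 10 reservoirs": 23.51,
}

# Implied volume held by each group (km^3), rounded to two decimals.
implied_volumes_km3 = {
    group: round(total_storage_km3 * share / 100, 2)
    for group, share in reported_shares_pct.items()
}

for group, volume in implied_volumes_km3.items():
    print(f"{group}: about {volume} km^3")
```

Under the same assumption, the roughly 120 reservoirs between 1 and 10 km³ would hold a little under one-third of the total, which illustrates how strongly storage is concentrated in a small number of very large reservoirs.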
Furthermore, we analyzed the distribution characteristics of the reservoir number, area, and storage capacity in each primary and secondary watershed of the water resources division. The large bubbles in Fig. 7, which represent basins with a large count, area, and storage capacity, belong to the Yangtze River. Almost all the second-level river basins with a relatively large storage capacity are distributed in the middle and upper reaches of the Yangtze River, including the Dongting Lake basin, the Poyang Lake basin, the Jinsha River basin, and the Han River basin.

Comparisons with other reservoir databases
To examine what the CRD adds over Chinese territory, we compared the CRD reservoirs with widely recognized and publicly available reservoir/dam databases: GOODD, GeoDAR v1.1, and GRanD v1.3. Figure 8 contrasts these four databases in count, area, and storage capacity. Since GOODD does not provide reservoir attribute information (except for locations and catchment areas), it is only compared with the CRD with respect to reservoir count. The number of reservoirs in the CRD (97 435) exceeds those of the Chinese subsets of the global databases (from 9238 in GOODD to 921 in GRanD) by 1-2 orders of magnitude. The CRD increases the total reservoir area by about 169 % and 194 % compared with GeoDAR and GRanD, respectively. Likewise, the total storage capacity of the CRD exceeds that of GeoDAR and GRanD by 249.23 and 293.51 km³ in China, respectively. Notably, although GeoDAR far exceeds GRanD in dam count, their total storage capacities are comparable, with GeoDAR increasing the storage capacity by only approximately 6 % (44 km³). This is because GRanD already includes the largest reservoirs in China.
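The figures in this comparison can be cross-checked with a few lines of arithmetic. The sketch below assumes that the CRD total used in the comparison is the 979.62 km³ reported elsewhere in the paper; the implied GeoDAR and GRanD totals are derived values, not numbers quoted from those databases.

```python
import math

# Counts and storage figures quoted in the text.
crd_count, goodd_count, grand_count = 97_435, 9_238, 921
crd_storage_km3 = 979.62          # assumption: same total as used in the comparison
excess_over_geodar_km3 = 249.23   # CRD minus GeoDAR (China subset)
excess_over_grand_km3 = 293.51    # CRD minus GRanD (China subset)

# Implied storage totals of the Chinese subsets (derived, not quoted).
geodar_storage_km3 = crd_storage_km3 - excess_over_geodar_km3
grand_storage_km3 = crd_storage_km3 - excess_over_grand_km3

# Gain of GeoDAR over GRanD, in km^3 and in percent of GRanD.
geodar_gain_km3 = geodar_storage_km3 - grand_storage_km3
geodar_gain_pct = 100 * geodar_gain_km3 / grand_storage_km3

# "Orders of magnitude" expressed as log10 of the count ratios.
orders_vs_goodd = math.log10(crd_count / goodd_count)
orders_vs_grand = math.log10(crd_count / grand_count)

print(f"GeoDAR gain over GRanD: {geodar_gain_km3:.2f} km^3 ({geodar_gain_pct:.1f} %)")
print(f"Count ratio in orders of magnitude: {orders_vs_goodd:.2f} to {orders_vs_grand:.2f}")
```

Under this assumption, the derived gain of GeoDAR over GRanD matches the roughly 44 km³ and 6 % quoted above, and the count ratios span about one to two orders of magnitude.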
We also compared the CRD with the three global databases at different levels of reservoir area. As shown in Fig. 9a, the advantage of the CRD is most evident in the addition of reservoirs with an area of less than 1 km², particularly those smaller than 0.10 km². Consequently, the total reservoir areas of the CRD in the classes smaller than 0.10 km² and 0.10-1 km² are also higher than those of the other databases. For larger reservoirs (1-9, 10-100, and larger than 100 km²), the counts of the CRD, GeoDAR, and GRanD differ little, but the CRD area is slightly higher, mainly because the reservoir polygons applied in this study represent the maximum water extents. In addition, we found that the storage capacity of the CRD reservoirs increased at every area level, with an average increase of 54.28 km³. Figure 10a shows the distribution of large reservoirs (storage capacity larger than 3 × 10⁶ m³) in the upper reaches of the Yangtze River in GRanD v1.3, GeoDAR v1.2, and the CRD. Because the GOODD dataset lacks basic attributes (reservoir storage capacity and dam height), it was not included in this comparison. GeoDAR v1.2 incorporates GRanD v1.3, so the pattern of large reservoirs in the upper Yangtze River is generally comparable between the two databases. Compared with GRanD v1.3 and GeoDAR v1.2, the CRD adds 16 large reservoirs in the upper reaches of the Yangtze River, with a total storage capacity of 52.60 km³, of which new reservoirs constructed in the past 5 years account for 77.00 % (40.50 km³). Large reservoirs dominate the total storage capacity in the basin. Therefore, the addition of large reservoirs dammed in recent years is one of the major sources of difference in storage capacity between the CRD and the other databases. As shown in Fig. 10b-c, GRanD v1.3, GeoDAR v1.2, GOODD, and the CRD all capture reservoirs on rivers with catchments of more than 10 km². However, many smaller reservoirs were not compiled in GRanD v1.3, GeoDAR v1.2, and GOODD.
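A comparison by area class of this kind can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function name is hypothetical, the class edges (0.1, 1, 10, and 100 km²) are assumed to form contiguous bins, and the input areas are made-up values.

```python
from bisect import bisect_right

# Assumed contiguous class edges in km^2 (illustrative, not taken from the CRD files).
AREA_EDGES_KM2 = [0.1, 1.0, 10.0, 100.0]
AREA_LABELS = ["<0.1", "0.1-1", "1-10", "10-100", ">100"]


def summarize_by_area_class(areas_km2):
    """Return the reservoir count and total area for each area class."""
    summary = {label: {"count": 0, "area_km2": 0.0} for label in AREA_LABELS}
    for area in areas_km2:
        # bisect_right gives the index of the class whose range contains the area;
        # a value equal to an edge falls into the upper class.
        label = AREA_LABELS[bisect_right(AREA_EDGES_KM2, area)]
        summary[label]["count"] += 1
        summary[label]["area_km2"] += area
    return summary


# Made-up reservoir areas for demonstration only.
demo = summarize_by_area_class([0.03, 0.08, 0.5, 4.0, 250.0])
for label in AREA_LABELS:
    print(label, demo[label])
```

Running the same summary on each database's Chinese subset would give per-class counts and areas that can be compared side by side, as in Fig. 9.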

Analysis on the accumulation hotspots of the CRD reservoir distribution
The construction of hydropower stations alleviates the energy shortage in China, reduces the consumption of nonrenewable coal energy, and contributes greatly to the sustainable development of China's economy and society. To further understand how reservoirs cluster in China, we quantified the degree of reservoir accumulation in terms of count, area, and storage capacity. Figure 11 shows the resulting accumulation degree of the CRD reservoirs in each of these three dimensions. Hotspots of high reservoir density can be observed in the middle and lower reaches of the Yangtze River, mainly in the Poyang Lake and Dongting Lake basins. These two lake basins have rugged terrain, which provides favorable topography for constructing reservoirs. In addition, the basins are densely populated and form an important commodity grain base, so reservoirs are critical to meeting the demand for agricultural irrigation water. The large labor force also facilitated reservoir construction. The construction of small and medium-sized reservoirs in China peaked under the new and old "three pillars" policies between the founding of the People's Republic of China in 1949 and the reform and opening-up in 1978. Figure 11b shows that the hotspots of reservoir area are mainly distributed in the Yangtze River, northeastern China, and the Huaihe River, where the terrain is relatively flat. Combined with the boom in building small reservoirs throughout the country during the Great Leap Forward period, the practice of "one piece of land for one piece of sky" even appeared in the Huaibei Plain, resulting in many reservoirs and a large total area in the Huaihe River. Compared with the storage accumulation hotspots shown in Fig. 8c, we found that large reservoirs are mostly localized in the upper reaches of the Yangtze River and the Pearl River. This is mainly because Chinese reservoir construction entered the era of large hydropower projects in the 21st century. With the construction of Xiaolangdi Reservoir, Three Gorges Reservoir, and other large hydropower stations as examples, China has built a series of large reservoirs in the southwest of China, where there are large elevation drops and abundant stream power, such as the Jinsha River (the upper reaches of the Yangtze River), the upper reaches of the Pearl River, and the upper reaches of the Lancang River.
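This section does not state how the degree of accumulation was computed. As a rough illustration only, and not the authors' method, one simple proxy is to bin reservoirs onto a regular longitude-latitude grid and sum the count, area, and storage in each cell. The function names, the 1° cell size, and the sample records below are all hypothetical.

```python
import math
from collections import defaultdict


def grid_accumulation(reservoirs, cell_deg=1.0):
    """Sum count, area, and storage per grid cell.

    reservoirs: iterable of (lon, lat, area_km2, storage_km3) tuples.
    Returns a dict keyed by (lon_index, lat_index) of the cell.
    """
    cells = defaultdict(lambda: {"count": 0, "area_km2": 0.0, "storage_km3": 0.0})
    for lon, lat, area_km2, storage_km3 in reservoirs:
        key = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        cells[key]["count"] += 1
        cells[key]["area_km2"] += area_km2
        cells[key]["storage_km3"] += storage_km3
    return dict(cells)


def top_cells(cells, metric, n=3):
    """Return the n cell keys with the largest value of the given metric."""
    return sorted(cells, key=lambda key: cells[key][metric], reverse=True)[:n]


# Made-up records: three small reservoirs in one cell, one large reservoir in another.
sample = [
    (115.2, 28.4, 0.05, 0.0002),
    (115.7, 28.9, 0.08, 0.0003),
    (115.5, 28.1, 0.20, 0.0010),
    (100.3, 26.6, 30.0, 5.0),
]
cells = grid_accumulation(sample)
print("Densest cell by count:", top_cells(cells, "count", n=1))
print("Densest cell by storage:", top_cells(cells, "storage_km3", n=1))
```

Even in this toy example, the hotspot by count and the hotspot by storage fall in different cells, which mirrors the contrast described above between the many small reservoirs of the middle and lower Yangtze and the few large ones in the southwest.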

Data availability
The China Reservoir Database (CRD) is publicly available for download from a Zenodo repository https://doi.org/10.5281/zenodo.6984619 (Song et al., 2022). The database is supplied in both shapefile format and comma-separated values (csv) format.

Figure 10. Comparisons between GRanD v1.3, GeoDAR v1.2, GOODD, and the CRD in selected regions of China. Distribution of the large reservoirs (storage capacity larger than 3 × 10⁶ m³) in the upper reaches of Yangtze River (a). Distribution of reservoirs in GRanD v1.3, GeoDAR v1.2, GOODD, and the CRD in a 10-level sub-basin of Poyang Lake (b-d). Bright green triangles, orange squares, dark green diamonds, and red dots represent GRanD v1.3, GeoDAR v1.2, GOODD, and the CRD, respectively. Background image source: Esri imagery base map.

Conclusions
In this study, the locations of a total of 97 435 reservoirs in China have been identified and collected in the China Reservoir Dataset (CRD) by compiling multiple existing dam/reservoir products, national basic geographic datasets, multi-source open map data, and multi-level government yearbooks and databases. Then, by merging three remote sensing waterbody products, the maximum water inundation area was extracted for each of the identified reservoirs. Based on a collection of 5143 reservoirs with official storage capacity records, an empirical model fitting the reservoir area-storage relationship was established to estimate the storage capacities of the other, unrecorded reservoirs in the CRD. The compiled reservoirs in the CRD have a total maximum inundation area of 50 085.21 km² and a total storage capacity of about 979.62 km³ (924.96-1060.59 km³).
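The form of the empirical area-storage model is not given in this section. As a hedged illustration, the sketch below assumes a power law, storage = a × area^b, fitted by ordinary least squares in log-log space; the function names and the synthetic records are hypothetical and do not reproduce the CRD model or its coefficients.

```python
import math


def fit_power_law(areas_km2, storages_km3):
    """Fit storage = a * area**b by least squares on log-transformed values."""
    xs = [math.log(area) for area in areas_km2]
    ys = [math.log(storage) for storage in storages_km3]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b


def estimate_storage(area_km2, a, b):
    """Estimate storage (km^3) for a reservoir without a capacity record."""
    return a * area_km2 ** b


# Synthetic "recorded" reservoirs generated from a known power law (a=0.02, b=1.1).
sample_areas = [0.05, 0.5, 2.0, 10.0, 50.0]
sample_storages = [0.02 * area ** 1.1 for area in sample_areas]

a_hat, b_hat = fit_power_law(sample_areas, sample_storages)
print(f"Fitted a={a_hat:.4f}, b={b_hat:.4f}")
print(f"Estimated storage for 1 km^2: {estimate_storage(1.0, a_hat, b_hat):.4f} km^3")
```

In practice, a fit of this kind would be calibrated on the reservoirs with official records and then applied to the mapped areas of the remaining reservoirs, with the spread of the residuals giving an uncertainty range like the one reported for the total storage.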
Based on the CRD, the spatial distribution characteristics of reservoir count, area, and storage capacity were comprehensively analyzed and compared. In addition, we discussed the major updates of the CRD over Chinese territory compared with other commonly used global dam/reservoir databases and the potential causes of several hotspots of reservoir concentration in the context of China's socioeconomic development and major policy implementations. The results show that reservoirs are widely distributed across China, yet there are strong spatial heterogeneities with several concentration hotspots. The Yangtze River basin dominates in terms of reservoir count, area, and storage capacity. Specifically, the reservoirs are mainly concentrated in the basins of Dongting Lake, Poyang Lake, and the Han River, the middle and lower reaches of the Huaihe River and the Yangtze River, the Shandong Peninsula, the Sichuan Basin, and the Yunnan-Guizhou Plateau. The CRD has greatly improved reservoir mapping in terms of count, area, and storage capacity compared with existing dam/reservoir products over the territorial area of China. A prominent advantage of the CRD is its more complete mapping of reservoirs smaller than 1 km². The CRD can be used for a wide range of reservoir impact assessments and is expected to benefit water resources management, river system investigation, hydrological modeling, and other aspects of scientific research and sector practices.

Review statement. This paper was edited by Hanqin Tian and reviewed by Dai Yamazaki and one anonymous referee.