Dataset of Georeferenced Dams in South America (DDSA)

Dams and their reservoirs have major impacts on society and the environment. In general, their relevance lies in facilitating the management of water resources for anthropogenic purposes. However, dams can also have many adverse impacts related to safety, ecology or biodiversity. These factors, together with the additional effects that climate change could have on these infrastructures and their surrounding environment, highlight the importance of dams and the need for their continuous monitoring and study. Several studies examine dams at both regional and global scales; however, those that include South America focus mainly on the most renowned basins (primarily the Amazon basin), most likely because of the lack of records for the remaining basins of the region. For this reason, a consistent database of georeferenced dams located in South America is presented: the Dataset of Georeferenced Dams in South America (DDSA). It contains 1,010 dam entries with a combined reservoir volume of 1,017 km³, and it is presented as a list describing a total of 24 attributes that include the dams’ names, characteristics, purposes and georeferenced locations. Hydrological information on the dams’ catchments is also included: catchment area, mean precipitation, mean near-surface temperature, mean potential evapotranspiration, mean runoff, catchment population, catchment equipped area for irrigation, aridity index, residence time and degree of regulation. Information was obtained from public records, government records, existing international databases and extensive internet research. Each record was validated individually and geolocated using public-access online map browsers, and then hydrological and additional information was derived from a hydrological model computed using the HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) dataset.
With this database, we expect to contribute to the development of new research in this region. The database is publicly available at https://doi.org/10.5281/zenodo.4315647 (Paredes-Beltran et al., 2020).

from currently published databases, and second, records available about dams, reservoirs and water resources, from governments and other official sources. In the first case, we used two well-known open-access databases of dams and reservoirs: the GRanD database (http://globaldamwatch.org/grand/, last access: 23 May 2020) and the AQUASTAT database (http://www.fao.org/aquastat/es/databases/dams/, last access: 23 May 2020). It should be noted that although the World Register of Dams published by ICOLD is often considered the most comprehensive database on dams and reservoirs, it is neither open access nor georeferenced, which ultimately led us to discard it from this study. In the second case, we noted that many governments keep up-to-date and comprehensive records of their water resources, including dams and reservoirs. However, there were cases in which official information was not available. Table 1 details the public sources from which most of the information was obtained for each of the countries.

Geolocation of entries
To geolocate each dam record, we used public-access online map browsers such as Google Earth (https://earth.google.com/web/, last access: 23 May 2020), Bing Maps (https://www.bing.com/maps, last access: 23 May 2020) and OpenStreetMap (https://www.openstreetmap.org/#map, last access: 23 May 2020). Although these browsers do not offer the analytical capabilities of Geographic Information Systems (GIS) files and programs, they are effective for visually searching for geographic locations and landmarks, and they provide data that are often up to date.
The coordinates in this database are given in decimal degrees using the WGS84 coordinate reference system.

HydroSHEDS
To perform the analysis of the dam catchments, the HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) dataset was used. This product gives users access to consistent hydrographic information on a regional scale at a resolution of 15 arc seconds and was derived primarily from the Shuttle Radar Topography Mission (SRTM). The dataset was obtained from the public site (https://www.hydrosheds.org/downloads, last access: 23 May 2020) in raster format, and for this project we used three layers: void-free elevation, drainage direction and flow accumulation.

Climatic Research Unit (CRU TS 4.03) time-series dataset
Mean monthly temperature, precipitation and potential evapotranspiration data were derived from the Climatic Research Unit (CRU TS 4.03) time-series dataset, which is hosted by the UK's National Centre for Atmospheric Science (NCAS) and produced by the University of East Anglia's Climatic Research Unit (CRU). The data are provided on a 0.5 degree by 0.5 degree grid and completely cover the South American continent from 1901 to 2018. This product is derived from periodic interpolation of data from a network of meteorological stations. For this database we used the current 4.03 version, which is provided through the Centre for Environmental Data Analysis (CEDA) website (https://crudata.uea.ac.uk/cru/data/hrg/#current, last access: 23 May 2020) in NetCDF format.
https://doi.org/10.5194/essd-2020-188

University of New Hampshire Global Runoff Data Centre (GRDC) composite runoff field
Runoff data were obtained from the University of New Hampshire Global Runoff Data Centre (GRDC) composite runoff field, which is a product developed by the Water Systems Analysis Group (WSAG) at CSRC at the University of New Hampshire (UNH). This product combines observed river discharge information with climate-driven water balance models in order to develop consistent composite runoff fields. The method applies selected gauging station data archives to a simulated topological network and compares them with outputs from a water balance model (WBM) simulation performed by the authors. The data are provided at a resolution of 0.5 degrees by 0.5 degrees in ASCII format and were obtained from the product's public site (http://www.compositerunoff.sr.unh.edu/, last access: 23 May 2020).

Population data from the Global Rural-Urban Mapping Project (GRUMP)
The estimated population data for each dam catchment were derived from the Global Rural-Urban Mapping Project (GRUMP) provided by the Socioeconomic Data and Applications Center (SEDAC), which offers different georeferenced population datasets at continental, regional and national scales. The data were obtained from the product's public site (https://sedac.ciesin.columbia.edu/data/collection/grump-v1, last access: 23 May 2020) in ASCII format at a 30 arc-second resolution.

Equipped Area for Irrigation from the Global Map of Irrigation Areas dataset
The equipped area for irrigation for each dam catchment was extracted from the Global Map of Irrigation Areas dataset provided by the Food and Agriculture Organization of the United Nations, which was developed by combining sub-national irrigation statistics with geospatial information. This dataset has a resolution of 0.5 degrees and is distributed in ASCII-grid format. The files were obtained from the product's public site (http://www.fao.org/aquastat/en/geospatial-information/global-maps-irrigated-areas/, last access: 23 May 2020).

Dams and reservoirs characteristics
After an extensive review of available information about dams and reservoirs in South America, we determined that georeferenced information about dams in this continent is limited. This is one of the main reasons why we aimed to develop a new database that includes all the consistent information currently available. This database is the result of the compilation and treatment of information from existing public documents from national governments, existing open databases and other web sources. We proceeded in three stages: first, we collected all the available published information on dams and reservoirs; second, we compared and validated these data with the existing information available from local and national governments; and finally, we determined the geolocation of each point. This information has been processed and we carried out an
First, we searched for the most relevant databases of dams and reservoirs available and found three consistent results: the World Register of Dams from ICOLD, the GRanD database and the AQUASTAT database of dams. After the initial inspection, we discarded the ICOLD database because, even though it is widely considered the largest database on dams, with over 57,985 entries worldwide and 1,922 dam entries in South America, it is not georeferenced, nor is it an open-access database, which limits later validation of our results. Then, we inspected the AQUASTAT database, which has not been updated since 2015 and collects detailed information on more than 14,000 dams; nonetheless, in the case of South America the list consists of 1,964 dams, of which only 344 entries are georeferenced. Finally, we examined the GRanD database, which presents 7,320 entries; however, only 343 of those entries correspond to South America.
Once initial information had been collected from open-access databases to assemble our preliminary list, we examined public records available from local and national governments in each country and compiled them in order to compare these data with our preliminary list. Data collected from governments and other public sources are available in different formats and in most cases required different types of approximation and treatment to obtain results. Each dam record was compared individually and, in the case of correspondence, it was accepted. For countries where we did not find available public reports, we compared and verified our preliminary records with information available on the internet, with emphasis on dams with a reservoir capacity greater than one cubic hectometre, although some records with a smaller reservoir volume were included because they could be verified reliably. Finally, a supplementary internet search was performed in order to eliminate gaps, mismatches or errors.
Finally, the geolocation of the dams was assessed individually for each record. First, we verified and corrected the data of the preliminary list, and then we carried out a second geolocation assessment for our final database using public-access online map browsers such as Google Maps, Bing Maps and OpenStreetMap. In most cases, extensive examination was necessary for each dam, since there were cases in which the name of the dam was not a sufficient reference to locate it; thus, it was necessary to use additional references such as nearby cities or villages, reservoir names, river names, or secondary or alternative names of the dams.

Hydrological information of the reservoirs' catchments
Hydrological information for each dam is also provided in this database: estimated catchment area, mean monthly precipitation, mean monthly near-surface temperature and mean monthly potential evapotranspiration from the Climatic Research Unit computed by applying the "D8" algorithm. Second, the ridge cells between catchments were identified to delineate them.
Finally, the catchment areas were calculated by counting the upstream cells that contribute to each dam.
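The catchment delineation described above can be illustrated with a minimal sketch, which is not the processing chain used for this database. It assumes a toy void-free elevation grid, a constant hypothetical cell area (CELL_AREA_KM2) and illustrative function names. On a geographic grid such as HydroSHEDS the cell area varies with latitude, and the product already supplies a drainage direction layer, so the first step is shown only to make the "D8" idea concrete.

```python
# D8 neighbour offsets (row, col): each cell drains to one of its 8 neighbours.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_flow_direction(dem):
    """Return, for every cell, the (row, col) of its steepest downslope
    neighbour, or None when no neighbour is lower (pit or outlet)."""
    rows, cols = len(dem), len(dem[0])
    downstream = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distance = (dr * dr + dc * dc) ** 0.5  # 1 or sqrt(2)
                    slope = (dem[r][c] - dem[nr][nc]) / distance
                    if slope > best_slope:
                        best_slope = slope
                        downstream[r][c] = (nr, nc)
    return downstream


def catchment_cells(downstream, outlet):
    """Collect every cell whose flow path reaches `outlet` (outlet included)."""
    # Invert the drainage graph: which cells drain directly into each cell?
    upstream = {}
    for r, row in enumerate(downstream):
        for c, target in enumerate(row):
            if target is not None:
                upstream.setdefault(target, []).append((r, c))
    # Walk upstream from the dam location.
    cells, stack = set(), [outlet]
    while stack:
        cell = stack.pop()
        if cell not in cells:
            cells.add(cell)
            stack.extend(upstream.get(cell, []))
    return cells


# Toy 4x4 void-free elevation grid (metres); the dam sits at the lowest corner.
dem = [
    [9, 8, 7, 9],
    [8, 6, 5, 8],
    [7, 5, 3, 7],
    [9, 8, 2, 1],
]
CELL_AREA_KM2 = 0.2  # hypothetical, constant area of one grid cell

flow = d8_flow_direction(dem)
cells = catchment_cells(flow, outlet=(3, 3))
catchment_area_km2 = len(cells) * CELL_AREA_KM2
```

In this toy grid every cell drains to the dam cell, so the catchment area is simply the number of cells times the assumed cell area; choosing an outlet further upstream returns only the cells above it.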
Surface climate variables are commonly used inputs in fields such as agriculture, ecology and biodiversity. For this reason, mean monthly values of precipitation, near-surface temperature and potential evapotranspiration from 1901 to 2018 are included for each dam catchment in this database. These data were derived from the Climatic Research Unit (CRU TS 4.03) time-series dataset (Harris et al., 2014), which is a commonly used high-resolution gridded dataset and has been compared favourably with other climatic datasets (Beck et al., 2017; Jacob et al., 2007). First, datasets for each variable were downloaded in NetCDF format for monthly periods from 1901 to 2018. Then, these files were converted, resampled and aligned into raster format in order to match the dam catchment model. Finally, a statistical analysis was carried out for each variable in order to derive mean monthly values of precipitation, near-surface temperature and potential evapotranspiration for each dam catchment.
Monthly runoff data are a basic requirement in the assessment of water resource systems. For this reason, mean monthly runoff data for each dam were also included using the University of New Hampshire and Global Runoff Data Centre (UNH/GRDC) Composite Runoff field v1.0 (Fekete et al., 2002), which is often regarded as the best available runoff dataset for large-scale models (Gonzàlez-Zeas et al., 2012; Lv et al., 2018). It combines river discharge measurements and water balance models and provides gridded high-resolution annual and monthly mean runoff series. The runoff dataset for South America was downloaded from the data product site in ASCII-grid format; then, the file was converted, resampled and aligned in order to match the dam catchment model. Finally, a statistical analysis was applied in order to derive mean monthly runoff data for each dam catchment.
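As a rough illustration of the resample, align and summarise steps, the sketch below splits each coarse climate cell into equal fine cells and averages the values that fall inside a catchment mask. The grid values, the mask and the function names are hypothetical, the two grids are assumed to be already aligned, and every fine cell is given equal weight; none of this is confirmed as the exact procedure used for the database.

```python
def resample_nearest(coarse, factor):
    """Split every coarse cell into factor x factor fine cells with the same value."""
    fine = []
    for row in coarse:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


def zonal_mean(grid, mask):
    """Average the grid values over the cells flagged True in the mask."""
    values = [grid[r][c]
              for r, mask_row in enumerate(mask)
              for c, inside in enumerate(mask_row) if inside]
    return sum(values) / len(values)


# 0.5 degrees = 1800 arc seconds, so one 0.5 degree cell spans
# 1800 / 15 = 120 cells of a 15 arc-second grid per side.
# A factor of 2 keeps the toy example readable.
FACTOR = 2

# Hypothetical mean monthly precipitation (mm) on a 2x2 coarse grid.
coarse_precip = [[100.0, 200.0],
                 [300.0, 400.0]]

# Hypothetical catchment mask on the 4x4 fine grid (True = inside catchment).
catchment = [[False, True,  True,  False],
             [False, True,  True,  False],
             [False, False, True,  False],
             [False, False, False, False]]

fine_precip = resample_nearest(coarse_precip, FACTOR)
mean_precip = zonal_mean(fine_precip, catchment)
```

Nearest-neighbour splitting suits intensive quantities such as temperature or precipitation depth, because the value of each fine cell equals the value of the coarse cell that contains it.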

Additional information
Additional information for each dam is also provided: the population within each dam catchment, estimated from the Global Rural-Urban Mapping Project (GRUMP), and the irrigation area within each dam catchment, based on the Global Map of Irrigation Areas dataset.
Demographic data are usually a necessary input for studies that include urban or rural information in water resources assessments. The population of each dam catchment is included in the database and was derived from the Global Rural-Urban Mapping Project (GRUMP), which is based on polygons defined by the extent of night-time light imagery and approximated urban extents from ground-based settlement points. Population data were downloaded from the data product site in ASCII-grid format. The file was converted, resampled and aligned in order to match the dam catchment model, and a statistical analysis was applied in order to account for the corresponding population of each dam catchment.
Finally, hydrological data for irrigation are also included. The equipped area for irrigation for each dam catchment was obtained from the Global Map of Irrigation Areas dataset (Siebert et al., 2005), which is a global-scale dataset of irrigated areas based on cartographic information and FAO statistics and is often used to provide valuable information about irrigation in hydrological studies (Wisser et al., 2008). Equipped area for irrigation data were downloaded from the data product site in ASCII-grid format; then, the file was converted, resampled and aligned in order to match the dam catchment model. A statistical analysis was performed to account for the equipped area for irrigation of each dam catchment.
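Because population is a count rather than an average, one plausible way to resample it is to share each coarse cell's count equally among its fine cells before summing over the catchment, so that the total is conserved. The sketch below shows that idea with hypothetical values and function names; it is an assumption about how such an accounting could be done, not a description of the processing applied to GRUMP for this database.

```python
def split_counts(coarse, factor):
    """Resample a grid of counts to a finer grid while conserving the total:
    each fine cell receives an equal share of its parent coarse cell."""
    share = 1.0 / (factor * factor)
    fine = []
    for row in coarse:
        fine_row = [value * share for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


def zonal_sum(grid, mask):
    """Add up the grid values over the cells flagged True in the mask."""
    return sum(grid[r][c]
               for r, mask_row in enumerate(mask)
               for c, inside in enumerate(mask_row) if inside)


# 30 arc seconds / 15 arc seconds = 2 fine cells per side of a coarse cell.
FACTOR = 2

# Hypothetical population counts on a 2x2 grid at 30 arc seconds.
coarse_population = [[400.0, 800.0],
                     [0.0, 1200.0]]

# Hypothetical catchment mask on the 4x4 fine grid (True = inside catchment).
catchment = [[False, True,  True,  False],
             [False, True,  True,  False],
             [False, False, True,  False],
             [False, False, False, False]]

fine_population = split_counts(coarse_population, FACTOR)
catchment_population = zonal_sum(fine_population, catchment)
```

Copying the coarse count into every fine cell instead of splitting it would multiply the total by the square of the factor, which is why counts need a different treatment from the climate means.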

Dams and Reservoirs
Once the review, refinement and processing of the data concluded, a total of 1,010 dam entries were accepted for our database. This represents noticeable progress in the identification and geolocation of dams in the region and thus creates the opportunity for new research that allows a more precise understanding of the water resources systems in the region. After a comparison with other databases, 376 entries were similar to those in the AQUASTAT and GRanD databases; however, they were included in our database since all 1,010 entries were inspected and verified following the same procedure described in previous sections.
Additionally, this database increases the number of dam entries not only for the region as a whole but also per country, which means that with this database we also expect to contribute to new research in study areas that have not been considered to date because of the absence of reliable information. Table 2 details the entries in our database for each country considered in this study, including a comparison with the AQUASTAT and GRanD databases. Table 3 describes the 24 variables processed and accepted for this database. The estimated total reservoir volume of this database is 1,017 cubic kilometres, and the largest reservoir belongs to the "Guri" dam in Venezuela, with an estimated volume of 135 cubic kilometres.

Hydrological Information
The model derived from the HydroSHEDS dataset allowed us to determine the catchment areas of this database, which were necessary to carry out the subsequent hydrological calculations. The accumulated area of the dams' catchments is approximately 14,855,192 square kilometres, with an average catchment of 18,385 square kilometres; the largest catchment belongs to the "Jirau" dam in Brazil, with an estimated area of 962,732 square kilometres. Table 4 describes the variables processed for the hydrological information and included in this database. Our results highlight the great influence and importance of the Amazon rainforest in the continent, since most of the highest records are observed in this region. In the case of temperature, the highest annual record is located in the catchment of the "Malhada Vermelha" dam in Brazil, with a temperature of 27.89 degrees Celsius. In the case of precipitation, the highest annual record is in the catchment of the "Petit Saut" dam in French Guiana, with a total of 3,035.74 millimetres per year. The highest potential evapotranspiration record is documented for the catchment of the "Pilões" dam in Brazil, with 1,713.32 millimetres per year. Finally, in the case of runoff, the highest annual value is located in the catchment of the "Billings" dam in Brazil, with 2,961.70 millimetres per year. On the other hand, the lowest records are observed mostly in the southern part of the continent, in the Andes mountains, with the lowest temperature being recorded in July in the catchment of the "El Yeso" dam in Chile with -3.36 degrees Celsius, the lowest annual
Table 4 describes the variables processed for the additional information provided in this database. For both population and equipped area for irrigation, the highest values belong to the catchment of the "Yacyreta" dam, with more than 55 million people and more than 930,000 square kilometres of equipped area for irrigation. Figure 3 depicts the values of population and equipped area for irrigation for all catchments in the database.

Data limitations
The information provided in this database cannot be considered error-free, since it has been prepared using the information available at the time of its preparation. It should also be noted that although our database was created independently, through an individual investigation based primarily on reports and documents available from each of the countries in the region, the database may include attributes of dams that are also reported by other existing dam databases such as ICOLD, AQUASTAT and GRanD.

Summary
The database of georeferenced dams in South America (DDSA) has been developed to contribute to the improvement of water resources management in the region. The provision of reliable, high-resolution data on dams and reservoirs will contribute to the assessment of freshwater ecosystems and communities for both present and future scenarios in this region. To date, such assessments have been restricted to a limited number of catchments because of the absence of available information. The database thus supports more informed decision-making processes in order to safeguard the future sustainability of the communities in this region.
The 1,010 dam entries present a total of 24 attributes. Each record has been included in the list after an individual review, and its position has been determined considering public digital terrain models. In addition, the database also provides mean monthly hydrological information. With this increased spatial coverage and attribute information, this database could be used as a baseline for further studies that address relevant issues regarding hydrology, ecology and people in the region. Also, with the inclusion of data for all the countries in the continent, we expect to contribute to an in-depth understanding of the hydrological and environmental dynamics of the entire continent and to encourage the generation of knowledge in areas that have not been considered in past studies.
Finally, the data presented in this database are largely based on open-access information available to date; further contributions and monitoring are therefore needed to provide new data inputs and updates that keep this database relevant to the public.