A Harmonized Dataset for Dams and Reservoirs in West Africa
Abstract. Most existing datasets that could support dam and reservoir management and assessments of their impacts in West Africa are limited by inaccurate georeferencing, inconsistent accessibility, heterogeneous data records, and a lack of validation against field observations. In this study, we review and assess existing datasets containing information on dams and reservoirs in West Africa and subsequently integrate them into a harmonized and consolidated regional dataset. We benchmarked the quality of the newly compiled dataset at watershed scale through an extended field study and statistical analyses. The resulting dataset (https://doi.org/10.60507/FK2/YLDK1Y) includes 1,429 georeferenced dams and 1,258 reservoirs (with a minimum surface of 0.57 × 10⁻³ km²), exceeding the count of dams and reservoirs in West Africa reported by any available dataset. It contains 38 attributes and an estimated total reservoir surface area of 14,038 km² and a cumulative storage capacity of 283,032 million cubic meters (MCM), thereby enhancing data accessibility in West Africa. The regional compiled dataset contains fewer missing entries and exhibits lower bias compared to the original datasets, advancing the existing efforts by explicitly integrating both large- and small-scale reservoirs. The ground-based watershed-scale assessment revealed strong spatial and temporal coherence for large-scale reservoirs, but a systematic underrepresentation of small-scale infrastructure in both the sources and thus also in the compiled dataset, highlighting the importance of field validation. The field benchmarking advocates for collaborative research and data sharing initiatives among scientists and institutions across West Africa to improve the accuracy and completeness of dam and reservoir data, especially for small-scale infrastructure.
Overall, the manuscript is well written and pleasant to read. The work is extremely interesting in a region like West Africa, where there is relatively little documentation, as it involves compiling an inventory of existing databases on the locations and characteristics of dams and reservoirs, and attempting to validate this information in the field to verify that the databases are consistent. The results could be very useful for the hydrology community, particularly for feeding into regional hydrological models that take water storage and irrigation processes into account. The database is already available online with a corresponding DOI.
One thing I'm not quite clear on is what criteria determine whether or not a database is included in the work. Do you simply use all existing data, or is there a preliminary filtering process? Similarly, I think the criteria for what is considered a large or small dam should be better defined. For example, what is the minimum reservoir size included in this new database? This is important because there is indeed a myriad of very small reservoirs, and I think it’s a real challenge to try to inventory them all. On the other hand, it’s important for database users to know what types and sizes of dams it contains.
Another point I think is worth elaborating on a bit more is the validation. It’s clear that the database was validated using data from a single watershed. But I think there is a need to provide the reader with a bit more information to assess the reliability of this database, particularly in different hydroclimatic contexts. I fully understand that this type of field validation is a major undertaking that requires significant effort at the local level and, of course, cannot be carried out on a regional scale. However, the manuscript needs to provide context to help verify whether this validation is representative at the regional level.
Specific comments:
Line 105: It would be helpful to explain why this basin was chosen for validation. Is it the only one with data, or is it representative of the area? In particular, we need to discuss how validation at a single site could be representative of the entire region, given that there are four very distinct climate zones.
Line 141. I don't quite understand this part. For example, the coordinates of the reservoirs were estimated using a statistical approach. That seems rather inappropriate to me. A reservoir can be observed, for example in satellite images, and its location is precise—not the result of an average estimated from a statistical distribution.
Line 286. I don't think this collection substantially exceeds the number in previous studies; as written, the count actually ranges from 1,415 to 1,429, so the difference is not large. It would be interesting to state here which dataset reports 1,415 dams, and the percentage in common with the current database. Also, this is not the same number as the one reported on line 488 (1,141 records). Please revise.
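To make the requested comparison concrete, the percentage of records in common could be computed with a simple coordinate-matching rule. The following is only a sketch under my own assumptions: the coordinates are made up, and the 1 km tolerance and the function names (`haversine_km`, `percent_in_common`) are illustrative, not the authors' matching procedure.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def percent_in_common(source, compiled, tol_km=1.0):
    """Percentage of `source` records with a `compiled` record within tol_km.

    The tolerance is an assumed value; a real comparison would justify it
    from the georeferencing accuracy of the datasets involved.
    """
    if not source:
        return 0.0
    matched = sum(
        1
        for lat_s, lon_s in source
        if any(haversine_km(lat_s, lon_s, lat_c, lon_c) <= tol_km
               for lat_c, lon_c in compiled)
    )
    return 100.0 * matched / len(source)

# Hypothetical (lat, lon) pairs, not taken from any real dataset.
source_dams = [(12.000, -1.500), (11.200, -4.300), (9.050, 0.950), (14.700, -3.100)]
compiled_dams = [(12.001, -1.501), (11.200, -4.300), (13.500, 2.100)]

overlap = percent_in_common(source_dams, compiled_dams)
print(f"{overlap:.1f}% of source records have a match in the compiled dataset")
```

Reporting such a figure for each source dataset would show how much of the compiled inventory is genuinely new rather than shared between sources.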
Figure 6: I see that the size of the catchment area is missing in many cases. However, it seems to me that this information is very easy to estimate: once you have the precise locations of the dams, you can delineate the catchment area using a digital elevation model (DEM).
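To illustrate what I mean, here is a minimal sketch of catchment delineation from a DEM using steepest-descent (D8) flow directions. The toy elevation grid, the 30 m cell size, and the function names are my assumptions for illustration only. A real workflow would also need sink filling and snapping of dam locations to the DEM-derived stream network, which dedicated GIS tools handle.

```python
import math

# Offsets to the eight neighbouring cells (D8 scheme).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_downstream(dem):
    """Map each cell to its steepest-descent neighbour (None for pits/outlets)."""
    rows, cols = len(dem), len(dem[0])
    down = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Drop divided by distance (diagonals are sqrt(2) cells away).
                    slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (rr, cc), slope
            down[(r, c)] = best
    return down

def catchment_cells(dem, outlet):
    """Return the set of cells whose flow path reaches `outlet` (outlet included)."""
    upstream = {}
    for cell, target in d8_downstream(dem).items():
        if target is not None:
            upstream.setdefault(target, []).append(cell)
    seen, stack = {outlet}, [outlet]
    while stack:  # walk the flow network upstream from the outlet
        for up in upstream.get(stack.pop(), []):
            if up not in seen:
                seen.add(up)
                stack.append(up)
    return seen

# Toy DEM (elevations in m): a ridge in the middle column separates two valleys.
dem = [
    [9.0, 8.0, 9.5, 8.5, 9.0],
    [7.0, 5.0, 9.5, 4.0, 7.0],
    [6.0, 2.0, 9.5, 1.0, 6.0],
]
CELL_SIZE_M = 30.0  # assumed grid resolution

west = catchment_cells(dem, (2, 1))  # hypothetical dam at row 2, column 1
east = catchment_cells(dem, (2, 3))  # hypothetical dam at row 2, column 3
west_area_km2 = len(west) * CELL_SIZE_M ** 2 / 1e6
print(len(west), "cells drain to the western dam:", west_area_km2, "km2")
```

Applied to the georeferenced dam locations with a regional DEM, this kind of procedure would fill the missing catchment-area attribute, provided the dam coordinates are accurate enough to fall on the correct stream cell.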
Line 310. It should also be noted that these are climatologically very different regions, with a gradient of aridity from south to north.
Figure 12 can be misleading. At first glance, I thought it was a comparison of the same dams—specifically, what was in the databases versus what was observed—but upon reading the text, one realizes that these are two different databases: on the one hand, what is observed in the field, and on the other hand, what is in the databases. Because otherwise, it would be very surprising that information such as the year of construction or the height of the dam—which are actually fairly easy to obtain—would differ so much. So I think the figure’s caption should be rewritten slightly to explain that these are in fact two different sets of data.
Line 415: I think this paragraph should be expanded slightly to explain the link between the spatial distribution of dams and the climate. One might imagine that dams are unnecessary in humid regions but necessary in semi-arid regions. This is not explicitly discussed in the manuscript, yet given the number of dams analyzed, I believe this type of analysis would be very interesting.
Line 496: I think we should clarify here what is meant by a “small reservoir.” Are we trying to detect pools a few meters in size? What is the average size? In fact, I feel that this definition is somewhat lacking in the manuscript; perhaps it should be clarified: what is the minimum dam size considered in the database?