the Creative Commons Attribution 4.0 License.
GeoDAR: georeferenced global dams and reservoirs dataset for bridging attributes and geolocations
Blake A. Walter
Fangfang Yao
Chunqiao Song
Meng Ding
Abu Sayeed Maroof
Jingying Zhu
Chenyu Fan
Jordan M. McAlister
Safat Sikder
Yongwei Sheng
George H. Allen
Jean-François Crétaux
Yoshihide Wada
Download
- Final revised paper (published on 21 Apr 2022)
- Supplement to the final revised paper
- Preprint (discussion started on 24 Mar 2021)
Interactive discussion
Status: closed
-
RC1: 'Comment on essd-2021-58', Anonymous Referee #1, 01 Apr 2021
Wang et al. describe the creation of a new global dam and reservoir dataset. Overall, the paper is well-written and the methods and findings are on the whole very clear. This dataset will be a valuable contribution to the community and will likely be used by scientists across multiple fields.
The manuscript is incredibly detailed – perhaps slightly too much so – but users of the dataset may be grateful for it to be so well-documented. On the one hand, it is commendable that the authors have put so much effort into this dataset and into ensuring the manuscript thoroughly describes the entire process; on the other hand, the manuscript does get so repetitive in places, particularly in the results, that it is challenging for the reader to digest. It is up to the authors’ judgement as to whether or not to shorten the paper, but I would suggest in general thinking about streamlining the paper in places and shortening a few of the most repetitive sections.
Another more general comment is that this paper uses a lot of jargon (particularly geo-matching and geocoding) which are not clearly defined early in the text. Since most readers will not be familiar with geo-matching or geocoding, I suggest clearly defining these terms at some point early in the paper to explain in better detail what exactly they refer to. The methods are eventually pretty clear, and after reading the paper I now understand what these terms refer to, but a lack of understanding of these terms is perhaps likely to confuse the reader at the beginning of the paper.
As for the dataset itself, it appears complete and is easy to visualize and play around with in GIS. The dam/reservoir attributes included for each polygon as well as the readme are very useful and clear. Clearly a lot of thought and careful work has gone into the dataset and it is very well done.
I have several additional, mostly minor, comments listed below.
Specific Comments:
Line 55: It would be helpful here to define what is meant by “attributes” – i.e. reservoir use, storage capacity, dam completion year, etc
Line 107: Change to something like: “Our preference was the former when possible to optimize the georeferencing accuracy”
Line 109: Change “with” to “by”
Line 117-118: The sentence “We acknowledge that although we tried …” is a bit vague. I suggest adding a bit more detail about the challenges associated with duplicate record removal, making the sentence something like: “We acknowledge that owing to the challenge of XXXX, our duplicate removal is not perfect and may have misidentified or missed some duplicate dams”
Line 135: It is unclear to me what you mean by “build associated between new dams supplemented by GRanD and the WRD records”
Figure 2, particularly Figure 2a, is a little hard to interpret. I understand what you are going for here but I find it hard from looking at these Venn diagrams to determine which of these circles represent what is actually included in the GeoDAR datasets. I suggest revising, adding additional description or considering whether this figure is necessary.
Lines 220-231: Can you provide more details about how the QC/QA was performed? How did this analysis lead to determining that 3% of the matched results were matching errors?
Line 247: The phrase “the forward geocoding input the text address of each dam” does not make sense (and is not grammatically correct)
Line 247: Reading this paragraph, I was a bit confused by what is meant by “text address of each dam” and how this can be used to query the longitude and latitude. While this is explained in greater detail and is much clearer in the following paragraph, it made this first paragraph hard to follow. I suggest perhaps reorganizing this section (i.e. putting some specifics from the following paragraph into this paragraph) and/or providing a short primer, either here or earlier in the text, about “forward” vs. “reverse” geocoding so the reader, who is likely unfamiliar with geocoding, can better understand this rather abstract description.
Line 297: Can you clarify, both here and in the above mention of QA/QC (see above comment) whether all reservoirs were manually QCed, or just a subset?
Line 303: This offset for dams in China is interesting and as you probably know likely has to do with China’s GPS shift problem. Perhaps another sentence could be added to explain this in greater detail (i.e. why Google Maps does not work in China like it does in the rest of the world)
Lines 435-440: Was any QA/QC performed for the reservoir/dam matching? What is the likelihood that some dams were incorrectly matched with their reservoir polygons?
Lines 504-505: The sentence starting with “As a result, the average reservoir size decreased…” is a little confusing as I am unsure whether the decrease in mean size is referring to the mean size of reservoirs identified from each data type (i.e. GRanD, HydroLakes) or the mean size of the entire dataset decreasing as the datasets were added in hierarchical order from largest to smallest. I suggest keeping this information in the text but rephrasing to make clearer.
Line 407: Change to “the retrieved polygons do not always represent the maximum water extents of the reservoirs…”
I don’t think Figure 12 is necessary since it is pretty hard to see differences between the two datasets at the global scale. Figure 13 and Figure 14 are much better illustrations of the differences between the datasets.
I’m not sure the section title “Improved spatial details over GRanD” is the right phrasing. I understand what you actually mean but to me this phrasing implies that you have improved the detail of the spatial attributes of GRanD reservoirs, not actually increased the number and decreased the size of resolved reservoirs. I’m not sure exactly what it should be changed to, but I would consider rephrasing. Similarly, throughout the text, I would suggest rephrasing or coming up with another term to replace “enhancing the spatial detail” since this could mean different things to the reader.
I’m not sure what the takeaway from Figure 16 is intended to be, other than perhaps showing that in general the datasets don’t necessarily agree and include different dams/reservoirs? The color scheme and size of the dots also make it very hard to distinguish against the colored background. I suggest either removing this figure entirely or at the very least changing the color scheme.
The section 3.4.2 (pages 33-41) is very long and detailed, particularly when it can perhaps be summarized in one sentence as something like “GeoDAR contains substantially more small dams/reservoirs than GRanD, and therefore while the total capacity of the reservoirs in GeoDAR and GRanD is similar, the main advantage of GeoDAR is that it is more spatially extensive by including many more small dams and reservoirs.” It is up to your judgement, of course, but I suggest considering shortening this section – much of the info here is perhaps interesting, but it is very repetitive and therefore hard to read in places.
The conclusion is really well-written and does a very good job of highlighting the advantages and novelty of this dataset. It is a bit repetitive but I think it is appropriate here because it is more of a summary than a conclusion (and that works for this sort of paper).
Citation: https://doi.org/10.5194/essd-2021-58-RC1 -
AC1: 'Reply on RC1', Jida Wang, 03 Apr 2021
We are very grateful for this reviewer's positive comments on the value of our dataset and very constructive suggestions to improve the readability and clarity of our manuscript. Thank you. We will start to incorporate these comments in the coming weeks, and will later submit a detailed revision response (together with the responses to the second reviewer).
Citation: https://doi.org/10.5194/essd-2021-58-AC1 -
AC3: 'Reply on RC1', Jida Wang, 13 Oct 2021
We sincerely thank Reviewer 1 for his/her encouraging and constructive comments. These comments helped us clarify the merits and limitations of our dataset, and improve the structure and readability of our paper. Before our point-by-point responses, we provide a list that summarizes the major changes:
- We reorganized part of the manuscript to better streamline the methods and results. The revised “Methods” section starts with a definition and method overview, followed by the subsections elaborating each of the primary methods. The previous lengthy “Results and discussions” section has been broken into several stand-alone sections, including “Production components and usage”, “Validation”, and “Comparisons with existing global datasets”.
- Both reviewers indicated that our methods and results are overwhelmingly detailed. To improve the readability, we have relocated some of the minor technical details to the Supplementary Materials and have reduced the redundancy as much as possible. However, we kept a certain amount of detail that we deemed important. Since this is a data description paper, our rationale is to convey the principles clearly enough that readers understand how our dataset differs from the existing ones and can potentially replicate and improve our dataset.
Our revision also includes several data improvements not requested by the reviewers:
- We have redone the geo-matching for the US by applying the newest version of the US National Inventory of Dams (version 2018, consisting of ~90K records).
- We improved our scripts to better handle the consistency between the names of states/provinces in ICOLD and those in regional inventories and Google Maps.
- We have repeated part of the QC to detect and correct more geocoding errors (such as misplacements in China and omissions in India).
- When we were harmonizing GeoDAR v1.0 with GRanD, we identified about 70 records in GRanD with possible georeferencing errors. These records were excluded from the revised harmonization. We released these problematic GRanD records, as well as our suggested corrections, in the Supplementary Materials for user convenience.
- Meanwhile, we took a deeper stab at building the linkage between WRD and GRanD. These above-mentioned improvements ended up expanding the total number of dams and reservoirs in our revised product by about 1300.
- We also expanded the validation sample from the previous ~980 dam points to now more than 1400 dam points. The accuracy turned out to be overall consistent.
Author's response to reviewer comments
Anonymous Referee #1
Wang et al. describe the creation of a new global dam and reservoir dataset. Overall, the paper is well-written and the methods and findings are on the whole very clear. This dataset will be a valuable contribution to the community and will likely be used by scientists across multiple fields.
Response: We appreciate the reviewer’s encouraging comments and the recognition of the potential value of our dataset.
The manuscript is incredibly detailed – perhaps slightly too much so – but users of the dataset may be grateful for it to be so well-documented. On the one hand, it is commendable that the authors have put so much effort into this dataset and into ensuring the manuscript thoroughly describes the entire process; on the other hand, the manuscript does get so repetitive in places, particularly in the results, that it is challenging for the reader to digest. It is up to the authors’ judgement as to whether or not to shorten the paper, but I would suggest in general thinking about streamlining the paper in places and shortening a few of the most repetitive sections.
Response: We thank the reviewer for this constructive comment. Since this is a data description paper, we documented the methods and results in substantial detail, hoping that any user will not only understand the dataset but also be able to replicate or improve the production. On the other hand, we agree with the reviewer that some reorganization of the contents is needed to reduce repetitiveness and improve clarity. In brief, this is what we have done:
- We have simplified the Methods section by (1) starting with “Definition and overview” followed by several subsections that elaborate the principles of primary procedures, and (2) relocating some of the minor technical details of each subsection to the Supplementary Materials. This way, users will have a clear sense of how we streamlined the methods without being overwhelmed by the technical details.
- We have broken the previous lengthy “Results and discussions” into several stand-alone sections, including “Production components and usage”, “Validation”, and then “Comparisons with existing global datasets”. This way, the main deliverables appear more organized, and meanwhile we felt less pressured to have to cut the details we deem necessary.
- We reduced the redundancy as much as possible in the section “Comparisons with existing global datasets”. However, we still kept a substantial amount of detail that we considered important. Since this is purely a data description paper, our rationale is that providing a well-rounded, comprehensive comparison between our product and other existing datasets will greatly benefit users when they are deciding which one to use.
- We relocated some of the discussions about the applications of our dataset to the conclusion section (now entitled “Summary and applications”).
Another more general comment is that this paper uses a lot of jargon (particularly geo-matching and geocoding) which are not clearly defined early in the text. Since most readers will not be familiar with geo-matching or geocoding, I suggest clearly defining these terms at some point early in the paper to explain in better detail what exactly they refer to. The methods are eventually pretty clear, and after reading the paper I now understand what these terms refer to, but a lack of understanding of these terms is perhaps likely to confuse the reader at the beginning of the paper.
Response: We thank the reviewer for this suggestion. We have considered two options in the revision. One option is to add a section that exclusively defines all jargon in our paper. However, such a section may look too mechanical; in particular, the definitions may appear disconnected from the context. The other option is to introduce each term as early as possible as the method develops; nevertheless, this may also obscure some of the terms, as the reader may need to search the entire method section for one definition.
As a compromise of both options, we have decided to do the following:
- We merged the previous “Georeferencing rationale” and “Method overview” sections into one section “Definitions and overview” (Section 2.1). This way, readers are clearly informed this is the section where important definitions (such as geo-matching and geocoding) are introduced.
- In this merged section, we provided a concise method overview. This overview streamlined the key procedures along with important definitions (when possible), but excluded technical details (which were elaborated in the follow-up Method sections). When a term could not be fully explained in the overview section, it was explained later but as early as possible. For the latter situation, we tried to provide sufficient context so that the introduction of any term will not appear too abrupt or confusing.
- For the convenience of the reader, we tried to be as diligent as possible in cross-referencing each term to its occurrences in different sections.
We invite the reviewer to look at our revised method section.
As for the dataset itself, it appears complete and is easy to visualize and play around with in GIS. The dam/reservoir attributes included for each polygon as well as the readme are very useful and clear. Clearly a lot of thought and careful work has gone into the dataset and it is very well done.
Response: We sincerely appreciate this encouraging comment.
I have several additional, mostly minor, comments listed below.
Specific Comments:
Line 55: It would be helpful here to define what is meant by “attributes” – i.e. reservoir use, storage capacity, dam completion year, etc
Response: As suggested, we have added some explanation to define “attributes”:
“… ICOLD WRD provides more than 40 attributes (e.g., reservoir storage capacity, dam height, and reservoir purpose)”.
Line 107: Change to something like: “Our preference was the former when possible to optimize the georeferencing accuracy”
Response: Thank you. This sentence has been changed as suggested.
Line 109: Change “with” to “by”
Response: Thank you and this has been changed as suggested.
Line 117-118: The sentence “We acknowledge that although we tried …” is a bit vague. I suggest adding a bit more detail about the challenges associated with duplicate record removal, making the sentence something like: “We acknowledge that owing to the challenge of XXXX, our duplicate removal is not perfect and may have misidentified or missed some duplicate dams”
Response: Thank you. As suggested, we have revised this sentence to:
“We acknowledge that owing to the challenges of lacking explicit spatial information and occasional attribute errors in WRD, our duplicate removal is not perfect and may have misidentified or missed some duplicate dams.”
Line 135: It is unclear to me what you mean by “build associated between new dams supplemented by GRanD and the WRD records”
Response: We are sorry about this confusing statement. For improved clarity, we have revised it to:
“The harmonization aimed at merging both datasets, removing duplicates between them, and when possible, associating each new dam supplemented by GRanD with the corresponding WRD record.”
Figure 2, particularly Figure 2a, is a little hard to interpret. I understand what you are going for here but I find it hard from looking at these Venn diagrams to determine which of these circles represent what is actually included in the GeoDAR datasets. I suggest revising, adding additional description or considering whether this figure is necessary.
Response: Thank you for this suggestion. For improved clarity, we have provided the following explanation in the figure caption:
“Boxes indicate final subsets in each GeoDAR version, and the arrows point to the georeferencing sources or methods. Topology of the shapes illustrates logical relations among the data/methods, but sizes of the shape were not drawn to scale of the data volume.”
The reasons that we would like to keep these Venn diagrams are as follows: the dams from some of these data sources or methods (circles) overlap with each other, so we believe the Venn diagrams are perhaps the most visually effective way for readers to understand their relationships and how they contribute to each of the final components (boxes) in our dataset. Yes, we agree that these Venn diagrams can be a little tricky to interpret (although we have tried to optimize the design and clarity), but we think they offer more benefits than confusion, and readers can also refer to Table 1 for more clarification. In general, we hope the reviewer finds this figure, now with our expanded caption, useful and clearer.
Lines 220-231: Can you provide more details about how the QC/QA was performed? How did this analysis lead to determining that 3% of the matched results were matching errors?
Response: Thank you for this question. In brief, we treated quality assurance (QA) and quality control (QC) as two separate processes. QA was first performed as an automated filter, followed by QC, in which we manually verified whether the result was indeed accurate. Taking the geo-matching process as an example, QA ranked all geo-matching results into several QA levels according to the quality of attribute agreement (see agreement scenarios in Supplementary Table S1). If a WRD record was matched to more than one register record, QA selected the match with the best rank. Any match that did not meet the minimum rank was filtered out of the result. This way, each georeferenced WRD record was only matched to the best-ranking register record. The same principle applied to the QA process for geocoding, except that the source for geocoding was Google Maps rather than regional registers (Supplementary Table S3; also see Section 2.3). For more technical details about QA, users can refer to our Python scripts for geo-matching and geocoding at https://github.com/jida-wang/georeferencing-ICOLD-dams-and-reservoirs.
QC aimed at manually reassuring the quality of the georeferencing results after the automated QA. To do so, we went through each georeferenced WRD record to examine whether its attributes (such as dam/reservoir name, administrative locations, river name, and for geo-matching, construction year and storage capacity if possible) generally agreed with those of the georeferencing source (i.e., regional record for geo-matching and Google Maps for geocoding). If any georeferenced WRD record showed a major discrepancy with the source, this record was considered to be erroneously georeferenced and thus removed from the final result. Our manual QC removed ~4% error from the QA’ed geo-matching result and ~42% error from the geocoding result.
To address the reviewer’s suggestion, we have restructured the last two paragraphs of Section 2.2 to reflect the explanation above and to improve the clarity of how QA and QC were performed separately.
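The automated QA filtering described in this response can be illustrated with a minimal Python sketch. It is not taken from the authors' scripts: the `Match` class, the `select_best_matches` function, the numeric rank convention (lower is better), and the cutoff `MIN_ACCEPTABLE_RANK` are all assumptions made for illustration. The actual agreement scenarios are those in Supplementary Table S1.

```python
from dataclasses import dataclass

# Hypothetical convention: a lower rank means better attribute agreement.
# The cutoff below is an assumed value, not the authors' threshold.
MIN_ACCEPTABLE_RANK = 3


@dataclass
class Match:
    wrd_id: str       # identifier of the WRD record (hypothetical field)
    register_id: str  # identifier of the candidate register record
    qa_rank: int      # QA level derived from attribute agreement


def select_best_matches(candidates):
    """Keep, for each WRD record, the single best-ranking acceptable match."""
    best = {}
    for match in candidates:
        if match.qa_rank > MIN_ACCEPTABLE_RANK:
            continue  # does not meet the minimum rank: filtered out
        current = best.get(match.wrd_id)
        if current is None or match.qa_rank < current.qa_rank:
            best[match.wrd_id] = match
    return best


# Made-up candidates: wrd-1 has two matches, wrd-2 only a poor one.
candidates = [
    Match("wrd-1", "reg-A", 2),
    Match("wrd-1", "reg-B", 1),
    Match("wrd-2", "reg-C", 4),
]
selected = select_best_matches(candidates)
```

In this sketch, a record with several candidate matches keeps only the best-ranking one, and a record whose only candidate falls below the cutoff is left ungeoreferenced, to be handled by another method or by manual QC.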
Line 247: The phrase “the forward geocoding input the text address of each dam” does not make sense (and is not grammatically correct)
Response: We are sorry about this confusion. In our original sentence, “input” was a verb in the past tense. For improved clarity, we have modified this sentence to be:
“The forward geocoding (see Section 2.1 for definition) used the text address of each dam as the input…”
Line 247: Reading this paragraph, I was a bit confused by what is meant by “text address of each dam” and how this can be used to query the longitude and latitude. While this is explained in greater detail and is much clearer in the following paragraph, it made this first paragraph hard to follow. I suggest perhaps reorganizing this section (i.e. putting some specifics from the following paragraph into this paragraph) and/or providing a short primer, either here or earlier in the text, about “forward” vs. “reverse” geocoding so the reader, who is likely unfamiliar with geocoding, can better understand this rather abstract description.
Response: We really appreciate this constructive suggestion. Following the suggestion, we first clarified the definitions of “forward” and “reverse” geocoding in Section 2.2: “Opposite to regular (or “forward”) geocoding which converts a nominal location to numeric spatial coordinates, this reverse geocoding converted the spatial coordinates of each dam documented in the register, to a parsed address that contains administrative divisions at consecutive levels.”
Then, we reorganized Section 2.3 (originally Section 2.4) by first introducing how the text address was formatted before describing how geocoding was conducted. Following the framework of geocoding, we then described in more detail how the automated QA filtering was performed. We also simplified the original paragraph on text address formatting and relocated its technical details to the Supplementary Text and Supplementary Table S1.
We invite the reviewer to see our revised Section 2.3 and the Supplementary Materials.
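The distinction between forward and reverse geocoding in this response can be shown with a toy Python sketch. It does not call Google Maps or any real geocoding service. The in-memory `GAZETTEER`, its made-up entries, and both function names are assumptions for illustration only.

```python
# Toy gazetteer with made-up entries: text address -> (latitude, longitude).
GAZETTEER = {
    "Example Dam, Province A, Country X": (10.0, 20.0),
    "Sample Dam, Province B, Country X": (12.5, 21.5),
}


def forward_geocode(address):
    """Forward geocoding: nominal location -> coordinates (None if unknown)."""
    return GAZETTEER.get(address)


def reverse_geocode(lat, lon):
    """Reverse geocoding: coordinates -> parsed address of the nearest entry.

    Uses a crude squared-degree distance, adequate only for this toy example.
    """
    def squared_distance(item):
        _, (entry_lat, entry_lon) = item
        return (entry_lat - lat) ** 2 + (entry_lon - lon) ** 2

    nearest_address, _ = min(GAZETTEER.items(), key=squared_distance)
    name, province, country = [part.strip() for part in nearest_address.split(",")]
    return {"name": name, "province": province, "country": country}


coordinates = forward_geocode("Example Dam, Province A, Country X")
parsed = reverse_geocode(12.4, 21.4)
```

Forward geocoding takes the text address as input and returns coordinates, whereas reverse geocoding takes coordinates and returns an address parsed into administrative levels.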
Line 297: Can you clarify, both here and in the above mention of QA/QC (see above comment) whether all reservoirs were manually QCed, or just a subset?
Response: Yes, we have now clarified in both sections (2.2 for geo-matching and 2.3 for geocoding) that we screened through the entirety of the georeferenced dams as thoroughly as possible during QC.
Line 303: This offset for dams in China is interesting and as you probably know likely has to do with China’s GPS shift problem. Perhaps another sentence could be added to explain this in greater detail (i.e. why Google Maps does not work in China like it does in the rest of the world)
Response: Thank you for this suggestion. Yes, the offset for dams in China was caused by China’s GPS shift problem, as the reviewer commented. As suggested, we have added a sentence to explain this:
“Due to China’s GPS shift problem, geocoded points in mainland China tended to show a systematic offset of roughly 500 m from their actual dam or reservoir features.”
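An offset of this kind can be measured as the great-circle distance between a geocoded point and the corresponding dam feature. The Python sketch below uses the standard haversine formula. The coordinates are hypothetical and were chosen only so that the separation is on the order of the roughly 500 m offset mentioned above. It does not model the coordinate shift itself.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical coordinates: a geocoded point and the dam it should represent.
geocoded_point = (30.0000, 110.0000)
actual_dam = (30.0045, 110.0000)
offset_m = haversine_m(*geocoded_point, *actual_dam)
```

A consistently similar distance and direction across many points in a region would suggest a systematic shift rather than random geocoding error.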
Lines 435-440: Was any QA/QC performed for the reservoir/dam matching? What is the likelihood that some dams were incorrectly matched with their reservoir polygons?
Response: Thank you for this question. Yes, we browsed through the dam-reservoir pairs and found most matches to be accurate. Problems tended to arise when several reservoirs were close to each other (such as in the case of cascade dams). Since our algorithm has no mechanism to distinguish upstream and downstream drainage positions, this situation may lead to a dam being assigned to a larger downstream (or upstream) reservoir. We manually corrected these matching errors when seen. For improved accuracy, we are now going through another round of QC on the retrieved reservoirs and will provide an updated version once the manuscript is officially published. For clarity, we also added a sentence at the end of Section 2.5 (Retrieving reservoir boundaries):
“A manual QC was performed on the combined result to confirm that each retrieved reservoir polygon was matched to the correct dam point, and if not, we tried to adjust the association as thoroughly as possible.”
Lines 504-505: The sentence starting with “As a result, the average reservoir size decreased…” is a little confusing as I am unsure whether the decrease in mean size is referring to the mean size of reservoirs identified from each data type (i.e. GRanD, HydroLakes) or the mean size of the entire dataset decreasing as the datasets were added in hierarchical order from largest to smallest. I suggest keeping this information in the text but rephrasing to make clearer.
Response: Thank you for raising this confusion. The decrease in mean size refers to the mean size of reservoirs identified from each data type. We have clarified this sentence to:
“As a result, the mean reservoir polygon size decreased from 66 km² for those identified from GRanD, to 2 km² from HydroLAKES and then less than 1 km² from the UCLA Circa-2015 Lake Inventory.”
Line 407: Change to “the retrieved polygons do not always represent the maximum water extents of the reservoirs…”
Response: Thank you and this has been changed as suggested.
I don’t think Figure 12 is necessary since it is pretty hard to see differences between the two datasets at the global scale. Figure 13 and Figure 14 are much better illustrations of the differences between the datasets.
Response: Thank you for this comment. Our rationale of including Figure 12 is that it offers some unique aspects that Figure 13 and Figure 14 do not have.
Different from Figure 14 which aggregated the improvement by country, Figure 12 shows the global distribution of GeoDAR storage capacity dam by dam, thus providing the most spatially-explicit view. We believe we need such a figure to present our global dataset. By comparing such a global distribution with that of GRanD side by side, we aim to convey: (1) GeoDAR improved GRanD in the spatial density of dams, (2) most of the added dams in GeoDAR have relatively small storage capacities, and (3) how the improvement varies across space in the most spatially-explicit way possible.
However, we agree with the reviewer that it is difficult to effectively convey the messages above on a global view. So, to supplement Figure 12, we decided to include Figure 13, which zooms in on a few hotspot regions. If the page limit is not a factor, we prefer to keep both figures.
I’m not sure the section title “Improved spatial details over GRanD” is the right phrasing. I understand what you actually mean but to me this phrasing implies that you have improved the detail of the spatial attributes of GRanD reservoirs, not actually increased the number and decreased the size of resolved reservoirs. I’m not sure exactly what it should be changed to, but I would consider rephrasing. Similarly, throughout the text, I would suggest rephrasing or coming up with another term to replace “enhancing the spatial detail” since this could mean different things to the reader.
Response: We appreciate this comment. We agree this expression may have a different connotation. To avoid confusion, we have rephrased “improved spatial details” to be “improved spatial density” in most of the cases that we found confusing. We believe “improved spatial density” is less ambiguous because it indicates a greater quantity of dams or reservoirs per unit area, which is exactly what we meant to say.
I’m not sure what the takeaway from Figure 16 is intended to be, other than perhaps showing that in general the datasets don’t necessarily agree and include different dams/reservoirs? The color scheme and size of the dots also make it very hard to distinguish against the colored background. I suggest either removing this figure entirely or at the very least changing the color scheme.
Response: Yes, as the reviewer said, the main takeaway of Figure 16 is to illustrate that GeoDAR is not a complete replication of GOODD or GRanD, and with different dams and reservoirs introduced, GeoDAR is able to well complement what GOODD and GRanD have covered in different regions of the world. This figure visualizes such a value of our dataset, as well as a possible benefit of combining all these datasets to achieve a better global coverage.
For this reason, we prefer to keep this figure. But as the reviewer kindly suggested, we have adjusted the color scheme by (1) lowering the brightness of the background satellite images, (2) brightening the colors of the dam points, and (3) increasing the size of the dam points.
The section 3.4.2 (pages 33-41) is very long and detailed, particularly when it can perhaps be summarized in one sentence as something like “GeoDAR contains substantially more small dams/reservoirs than GRanD, and therefore while the total capacity of the reservoirs in GeoDAR and GRanD is similar, the main advantage of GeoDAR is that it is more spatially extensive by including many more small dams and reservoirs.” It is up to your judgement, of course, but I suggest considering shortening this section – much of the info here is perhaps interesting, but it is very repetitive and therefore hard to read in places.
Response: We sincerely appreciate this constructive comment and found the summary of this section (now Section 5.2) from the reviewer very accurate and to-the-point. Although we fully agree that the main idea of this section can be summarized to a couple of sentences, we do genuinely believe that a detailed and multifaceted comparison between GeoDAR and GRanD, like what we presented, will offer the users more benefits than obstacles. In case a user finds this amount of detail disorienting, we made sure to start every paragraph with a clear takeaway. This way, a user can easily understand what we are comparing and then decide on whether to read or skip this paragraph depending on the user’s interest. In general, we streamlined this section by the following takeaways:
- GeoDAR introduced a larger quantity of smaller dams despite a limited increase of total storage capacity.
- The improved spatial density of smaller dams is almost ubiquitous across the continents.
- Although GeoDAR’s improvements are widespread, the improvement levels are not spatially uniform.
- Certain regions with a limited increase in dam count show a greater increase in storage, implying that GeoDAR also improved the inventory of large dams.
- Other benefits of GeoDAR include a more complete representation of regulated watersheds and the identification of many smaller reservoir boundaries.
- Finally, GeoDAR improved the quantity of dams for all primary purposes, which will potentially benefit the understanding of reservoir operation rules.
While these takeaways were maintained, we tried to shorten some of the text to reduce unnecessary redundancy. We invite the reviewer to look at the revised Section 5.2.
The conclusion is really well-written and does a very good job of highlighting the advantages and novelty of this dataset. It is a bit repetitive but I think it is appropriate here because it is more of a summary than a conclusion (and that works for this sort of paper).
Response: Thank you very much for this positive comment.
Citation: https://doi.org/10.5194/essd-2021-58-AC3
-
AC1: 'Reply on RC1', Jida Wang, 03 Apr 2021
-
RC2: 'Comment on essd-2021-58', Anonymous Referee #2, 22 Apr 2021
Wang et al., describe a new global georeferenced database of dams based on geo matching attributes from the proprietary ICOLD database with publicly available sources to match attributes to spatial locations of dams.
This database complements existing geo-referenced databases as it a) expands on the number of dams for which more attributes such as reservoir storage are available and b) allows for connecting spatial locations of dams with attributes from the ICOLD database. As such I believe this database is a valuable addition to the growing number of global dam datasets (e.g. see www.globaldamwatch.org).
Whilst this database certainly has merits I think it promises more than it delivers. The authors frame the paper as a significant improvement over other dam databases in that it includes more attributes as there are already other global datasets (e.g. GOODD and GROD) that include more dams but lack such attributes. However, the majority of the paper is focused on identifying the spatial location of dams and improvements in quantity and spatial locations of dams. E.g. line 812 “GeoDAR’s major improvement lies on the quantity or spatial details of the dams”. This framing is understandable given the challenge of linking to a guarded proprietary database. However, this limits the use of the database as only people who have purchased access from ICOLD may be able to connect the attributes with the spatial location of the dams. Whilst this point is made clear in the conclusions it could be made more clear in abstract and introduction and the overall framing of the paper. It might be worth focusing on potential applications of the dataset.
Overall, it is also not entirely clear to me why there are two versions of the dataset released simultaneously. It seems to me that V1.1 supersedes v1.0 in that it includes more dams and associated reservoirs and the harmonising with GranD is just part of the method. The authors in line 929 also refer to V1.1 as “our end product”.
As noted by the authors, (line 967) connecting the dam locations with a hydrographic network would enable research into hydrological implications and ecological connectivity. This would greatly enhance the utility of the dataset.
As also noted by an earlier reviewer, the paper is very detailed and quite repetitive. I think it could be significantly shortened to make it more readable. In particular, the methods section is very detailed. Whilst this may be useful for some readers, the majority of readers will not require such extensive detail and could perhaps be referred to supplementary material if more detail is required. The methods section already includes a lot of the numbers later presented in results and discussion while some validation methods get introduced in the results section so some re-organisation would be required.
Specific comments:
I’m surprised that only about 60% of dams from GRanD were found in GeoDAR considering GRanD dams tend to be the largest and usually well documented dams and as such I would expect their attributes to be easily found.
I was wondering if there could be a potential bias in WRD data since this is a volunteered database? Are there any countries not included because they don’t contribute to ICOLD?
Line 54. “inaccessible” in what sense? I believe many WRD coordinates can be made available at cost. Suggest change to e.g. not freely or publicly available. In particular as the point about public availability is made in line 58 for regional registers.
Line 93. We may decrypt? Perhaps link to more detail provided in sections 4 and 5.
Line 98. How is it possible that about 1/3 of the WRD dams (v1.1) capture a similar total storage capacity as the full WRD inventory of ~60,000 dams. Is this because the remaining ~40k dams in WRD are non-reservoir dams? Please explain this in this section. Also would be good to provide the total storage in WRD here (which is only provided in line 138)
Line 118: “We acknowledge…” I suggest rephrasing this sentence to something like: “Whilst we have made every endeavour to remove duplicates, we acknowledge that some duplicates may remain in the dataset”
Line 138. 7388 km3 in original WRD. Is this the figure for all WRD dams or for the cleaned version of 56,783 dams?
Line 139: I don’t think the Venn diagrams are very clear. Not sure if they are even needed as the text explains the process. A simple flowchart might be easier to understand.
Line 239: “rest parts of the world” is a strange phrase
Line 322: ICOLD storage capacity erroneous. See earlier comment (line 98) on ICOLD reported storage. This can also explain the discrepancy. Note that Mulligan et al (2020) also note erroneous reporting of catchments in ICOLD.
Line 525-562, section 3.2 in results and discussion seems to introduce more methods on validation. This should be moved to methods.
Line 797: I find the term (global) capacity improvement a bit confusing. I guess what is meant is a higher reporting of total dam storage capacity by country or globally which is hardly surprising given that more dams and reservoirs are included.
Technical corrections:
Numbers in some cases use thousand separator (e.g. 7,163 line 238) but not in others. Please be consistent throughout
Line 163. “NID records were accessed”
Line 223 “This led to a conservative success rates”
Line 258: “this process was repeated”
Line 570 “We believed”
Line 700: Mulligan et al (2020)
Line 744: GeoDAR
References: line 39 Doll should be Döll, line 42 Vorosmarty should be: Vörösmarty and there may be others.
Citation: https://doi.org/10.5194/essd-2021-58-RC2 -
AC2: 'Reply on RC2', Jida Wang, 22 Apr 2021
We very much appreciate these detailed and constructive comments from Reviewer 2, which will certainly help us improve the clarity and readability of our data paper. We will try to absorb the reviewer's comments in the revision stage. Thank you again.
Citation: https://doi.org/10.5194/essd-2021-58-AC2 -
AC4: 'Reply on RC2', Jida Wang, 13 Oct 2021
We sincerely appreciate Reviewer 2 for his/her constructive comments. These comments helped us clarify the merits and limitations of our dataset, and improve the structure and readability of our paper.
Before our point-by-point responses, we provide a list that summarizes the major changes:
- We reorganized part of the manuscript to better streamline the methods and results. The revised “Methods” section starts with a definition and method overview, followed by the subsections elaborating each of the primary methods. The previous lengthy “Results and discussions” section has been broken into several stand-alone sections, including “Production components and usage”, “Validation”, and “Comparisons with existing global datasets”.
- Both reviewers indicated that our methods and results are overwhelmingly detailed. To improve the readability, we have relocated some of the minor technical details to Supplementary Materials and have reduced the redundancy as much as possible. However, we kept a certain amount of detail that we deemed important. Since this is a data description paper, our rationale is to ensure that we have conveyed the principles so that readers understand how our dataset differs from the existing ones and may potentially replicate and improve our dataset.
Our revision also includes several data improvements not requested by the reviewers:
- We have redone the geo-matching for the US by applying the newest version of the US National Inventory of Dams (version 2018, consisting of ~90K records).
- We improved our scripts to better handle the consistency between the names of states/provinces in ICOLD and those in regional inventories and Google Maps.
- We have repeated part of the QC to detect and correct more geocoding errors (such as misplacements in China and omissions in India).
- When we were harmonizing GeoDAR v1.0 with GRanD, we identified about 70 records in GRanD with possible georeferencing errors. These records were excluded from the revised harmonization. We released these problematic GRanD records, as well as our suggested corrections, in the Supplementary Materials for user convenience.
- Meanwhile, we took a deeper stab at building the linkage between WRD and GRanD. These above-mentioned improvements ended up expanding the total number of dams and reservoirs in our revised product by about 1300.
- We also expanded the validation sample from the previous ~980 dam points to now more than 1400 dam points. The accuracy turned out to be overall consistent.
Author's response to reviewer comments
Anonymous Referee #2
Wang et al., describe a new global georeferenced database of dams based on geo matching attributes from the proprietary ICOLD database with publicly available sources to match attributes to spatial locations of dams.
This database complements existing geo-referenced databases as it a) expands on the number of dams for which more attributes such as reservoir storage are available and b) allows for connecting spatial locations of dams with attributes from the ICOLD database. As such I believe this database is a valuable addition to the growing number of global dam datasets (e.g. see www.globaldamwatch.org).
Response: We are grateful for the reviewer’s recognition of our data values.
Whilst this database certainly has merits I think it promises more than it delivers. The authors frame the paper as a significant improvement over other dam databases in that it includes more attributes as there are already other global datasets (e.g. GOODD and GROD) that include more dams but lack such attributes. However, the majority of the paper is focused on identifying the spatial location of dams and improvements in quantity and spatial locations of dams. E.g. line 812 “GeoDAR’s major improvement lies on the quantity or spatial details of the dams”. This framing is understandable given the challenge of linking to a guarded proprietary database. However, this limits the use of the database as only people who have purchased access from ICOLD may be able to connect the attributes with the spatial location of the dams. Whilst this point is made clear in the conclusions it could be made more clear in abstract and introduction and the overall framing of the paper. It might be worth focusing on potential applications of the dataset.
Response: We very much appreciate this comment. The reviewer is correct that our dataset improved the quantity of georeferenced dams but that access to their attributes is conditional on a purchase from ICOLD. The latter is restricted by the proprietary nature of ICOLD, for which we bear no responsibility. Our own contributions are 1) freely accessible dam and reservoir features that improve the spatial density of existing global datasets, and 2) a way to enable the use of ICOLD attributes for more spatially explicit applications. In other words, the geometric features are what GeoDAR can directly offer, and the access to attribute information is an extended capability of GeoDAR.
To incorporate the reviewer’s suggestion, we have further clarified these merits and limitations in the abstract and the introduction.
In the abstract, we have clarified: “GeoDAR does not release the proprietary WRD attributes, but upon individual user requests we may provide assistance in associating GeoDAR spatial features with the WRD attribute information that users have acquired from ICOLD. Despite this limit, GeoDAR with a dam quantity triple that of GRanD, significantly enhances the spatial details of smaller but more widespread dams and reservoirs, and complements other existing global dam inventories.”
In the Introduction, we have restructured the last paragraph into two paragraphs. The ending paragraph clarified the limit of GeoDAR and suggested potential usage.
“For proprietary reasons, neither GeoDAR version releases any WRD attributes. Instead, we provide an option for users if they need to acquire the attributes: upon individual request we may assist the user who has purchased WRD (https://www.icold-cigb.org/GB/world_register/world_register_of_dams.asp) to associate the GeoDAR ID with the ICOLD “international code”, through which WRD attributes can be linked to each GeoDAR feature (see Sections 3.3 and 7 for more details). Even without the proprietary WRD attributes, GeoDAR offers one of the most extensive and spatially-resolved global inventory of dams and reservoirs, which may benefit a variety of applications in hydrology, hydropower planning, and ecology.”
More discussions about potential applications of GeoDAR are also given in the concluding section (now “Summary and applications”).
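The linkage described above (GeoDAR ID to ICOLD “international code” to WRD attributes) amounts to a two-step lookup. The following is a minimal sketch of that idea only; the field names (`geodar_id`, `international_code`, `capacity_thousand_m3`), the crosswalk structure, and the toy records are all hypothetical and do not reflect the actual GeoDAR or WRD file layouts.

```python
def attach_wrd_attributes(geodar_features, crosswalk, wrd_records):
    """Attach WRD attributes to GeoDAR features via the international code.

    All names here are hypothetical placeholders:
      geodar_features: list of dicts, each with a "geodar_id" key.
      crosswalk: dict mapping geodar_id -> ICOLD international code.
      wrd_records: dict mapping international code -> dict of WRD attributes.
    """
    linked = []
    for feature in geodar_features:
        code = crosswalk.get(feature["geodar_id"])
        # Features without a crosswalk entry keep their geometry but gain no attributes.
        attributes = wrd_records.get(code, {}) if code is not None else {}
        linked.append({**feature, "international_code": code, **attributes})
    return linked


# Toy data for illustration only.
features = [
    {"geodar_id": 1, "lon": 10.0, "lat": 45.0},
    {"geodar_id": 2, "lon": 11.0, "lat": 46.0},
]
crosswalk = {1: "X-001"}
wrd = {"X-001": {"capacity_thousand_m3": 1200.0}}

linked = attach_wrd_attributes(features, crosswalk, wrd)
```

In this sketch, a feature with no crosswalk entry is retained with an empty code, mirroring the point that the spatial features remain usable without the proprietary attributes.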
Overall, it is also not entirely clear to me why there are two versions of the dataset released simultaneously. It seems to me that V1.1 supersedes v1.0 in that it includes more dams and associated reservoirs and the harmonising with GranD is just part of the method. The authors in line 929 also refer to V1.1 as “our end product”.
Response: Thank you for this question, and we do take responsibility for this confusion. Following our response to the previous comment, we have provided the reason for including both versions at the end of the second-to-last paragraph of the Introduction section:
“While GeoDAR v1.1 can be considered as a version that supersedes v1.0, the latter was georeferenced independently from GRanD, and we opted to release both versions so that users have the flexibility to choose whichever works better for their cases and potentially improve the harmonization.”
As noted by the authors, (line 967) connecting the dam locations with a hydrographic network would enable research into hydrological implications and ecological connectivity. This would greatly enhance the utility of the dataset.
Response: Thank you for echoing the value of this potential application. Although snapping the dam locations to river networks will extend the applications of our data (as we discussed), this was not yet the primary goal of this data paper. We here focus on improving the spatial inventory of global dams and reservoirs, which is more fundamental but still essential to the improvements of water “infrastructure” data. As both reviewers agreed, what we have produced already represents an important contribution, and following this advancement, we will consider a future revision that rectifies this product to global river networks.
As also noted by an earlier reviewer, the paper is very detailed and quite repetitive. I think it could be significantly shortened to make it more readable. In particular, the methods section is very detailed. Whilst this may be useful for some readers, the majority of readers will not require such extensive detail and could perhaps be referred to supplementary material if more detail is required. The methods section already includes a lot of the numbers later presented in results and discussion while some validation methods get introduced in the results section so some re-organisation would be required.
Response: We are grateful for this constructive comment. Since this comment echoes that of reviewer 1, we here reiterate some of our reasoning and the corresponding changes below.
We documented the methods and results in substantial detail, hoping that any user will not only understand the dataset but also be able to replicate or improve the production. However, we agree that some of the text appears repetitive, and that some reorganization and simplification are needed for improved readability. We summarize our revisions below:
- We started the Method section with “Definition and overview”, which was then followed by the subsections that elaborate the principles of the primary procedures.
- In the subsections for geo-matching and geocoding (Sections 2.2 and 2.3), we relocated some of the minor technical details to Supplementary Materials. This way, users will have a clear sense of how we streamlined the methods without being overwhelmed by the technical details.
- We have also removed some of the reported numbers in Methods so that they won’t appear too repetitive with those in Results. However, we kept some of the intermediate numbers when we believe they are necessary for the clarity of method description.
- We have re-organized the paper so that the Methods section (Section 2) is exclusively dedicated to data production, and the methods for validation are treated separately from data production and are included in the Validation section (now Section 4).
- Following the point above, we have broken the previously lengthy “Results and discussions” into several stand-alone sections, including “Production components and usage”, “Validation”, and then “Comparisons with existing global datasets”.
- We reduced the redundancy as much as possible in the section “Comparisons with existing global datasets”. However, we still kept a substantial amount of detail that we considered important. Since this is purely a data description paper, our rationale is that providing a well-rounded, comprehensive comparison between our product and other existing datasets will greatly benefit users when they are deciding which one to use.
- We relocated some of the discussions about the applications of our dataset to the conclusion section (now entitled “Summary and applications”).
Specific comments:
I’m surprised that only about 60% of dams from GRanD were found in GeoDAR considering GRanD dams tend to be the largest and usually well documented dams and as such I would expect their attributes to be easily found.
Response: Thank you for this important question. First of all, please allow us to clarify (in case of any misunderstanding) that GeoDAR v1.0 does not georeference all records in ICOLD WRD. So, the percentage of GRanD found in GeoDAR v1.0 (which is a georeferenced subset of WRD) will be lower than the percentage of GRanD found in the entirety of WRD.
In our initial manuscript submission, the percentage of GRanD found in GeoDAR v1.0 is 64% (i.e., 4691 out of 7320) as the reviewer pointed out, and the percentage of GRanD found in the entire WRD is 85% (i.e., 6209 out of 7320). This means that the harmonization with GRanD helped us match another 1518 WRD records that were not georeferenced in GeoDAR v1.0.
In the revision, we improved the georeferencing scripts and did a deeper search in between WRD and GRanD (please refer to our revision summary at the beginning of the response letter). As a result, the percentage of GRanD in the updated GeoDAR v1.0 increased to 69%, and the percentage of GRanD found in the entire ICOLD WRD reaches 89%. The fraction of GRanD not found in WRD includes only 810 dams (or 11%), which were also appended to GeoDAR v1.1.
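The percentages quoted in this response can be checked with simple arithmetic. The sketch below uses only the counts stated above; it assumes the GRanD total of 7320 also applies to the revised figure of 810 unmatched dams, which the response does not state explicitly.

```python
# Counts quoted in the response for the initial submission.
grand_total = 7320      # GRanD dams considered
in_geodar_v10 = 4691    # GRanD dams found in GeoDAR v1.0
in_wrd = 6209           # GRanD dams found in the entire WRD


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


pct_geodar = share(in_geodar_v10, grand_total)    # share of GRanD in GeoDAR v1.0
pct_wrd = share(in_wrd, grand_total)              # share of GRanD in the whole WRD
extra_from_harmonization = in_wrd - in_geodar_v10 # WRD records gained by harmonization

# Revised manuscript: 810 GRanD dams not found in WRD
# (assumption: the denominator is still 7320).
pct_unmatched_revised = share(810, grand_total)
```

These values reproduce the 64%, 85%, 1518, and 11% reported in the response.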
The revised result reflects a harmonizing effort that was as thorough as our capability allows at this moment. On a relevant aspect, this is also one of the reasons why we released both GeoDAR v1.0 and v1.1, in case users want to use v1.0 to perform their own harmonization with GRanD with a possible improvement.
In Section 5.3, we acknowledged that some of the remaining 810 dams in GRanD might be documented in WRD, although matching them was probably tricky due to the challenges of attribute inconsistency between the two datasets and the lack of spatial explicitness in WRD.
Given the chance, we would like to mention that GeoDAR v1.1 is not a terminal version of our data. Should additional GRanD dams be found in WRD, we may consider adding these associations in an updated GeoDAR version.
I was wondering if there could be a potential bias in WRD data since this is a volunteered database? Are there any countries not included because they don’t contribute to ICOLD?
Response: Thank you for raising this point; this could well be the case. For instance, among the 59k original WRD records, more than 23k (about 40%) come from China alone, whereas Russia is documented with only 70 or so dams. This stark contrast indicates the existence of biases among the contributing nations.
To absorb this comment from the reviewer, we added in Line 753 (Section 5.2):
“However, this pattern also reflects the disparities due to several factors, such as a possible bias in WRD (as it is a volunteered dataset and not all member nations contributed equally), the accessibility of regional registers for geo-matching, and geocoding challenges for different countries.”
Line 54. “inaccessible” in what sense? I believe many WRD coordinates can be made available at cost. Suggest change to e.g. not freely or publicly available. In particular as the point about public availability is made in line 58 for regional registers.
Response: As suggested, we have changed it to “not publicly available”.
Line 93. We may decrypt? Perhaps link to more detail provided in sections 4 and 5.
Response: We have rephrased this statement to improve the clarity:
“… upon individual request we may assist the user who has purchased WRD (https://www.icold-cigb.org/GB/world_register/world_register_of_dams.asp) to associate the GeoDAR ID with the ICOLD “international code”, through which WRD attributes can be linked to each GeoDAR feature (see Sections 3.3 and 7 for more details).”
Line 98. How is it possible that about 1/3 of the WRD dams (v1.1) capture a similar total storage capacity as the full WRD inventory of ~60,000 dams. Is this because the remaining ~40k dams in WRD are non-reservoir dams? Please explain this in this section. Also would be good to provide the total storage in WRD here (which is only provided in line 138)
Response: Thank you for this question. We see three major reasons.
First, the original storage capacity values in ICOLD WRD have occasional voids or unit errors (values sometimes underestimated by a factor of 1000 or, for some of the US dams, given in acre-feet instead of thousand cubic meters), whereas in GeoDAR v1.1 we overwrote the original WRD capacity values with those of GRanD (when available). This correction could lead to an increased capacity in GeoDAR v1.1. Because of this, in Section 5.2 we replaced the original capacity values of the WRD records that overlap GRanD with the capacity values of GRanD, and then used the updated WRD capacities for comparison with GeoDAR v1.1 (this way, the comparison is more like-for-like). Please refer to Section 5.2 for details.
Second, GeoDAR v1.1 absorbed the 810 GRanD dams we were unable to find in WRD. If these 810 dams are indeed not included by WRD, they may also result in a total storage capacity of GeoDAR that is similar to that of the full WRD.
Third, our harmonization between GeoDAR v1.0 and GRanD in the last round included some duplication errors, leading to an amplified storage in GeoDAR. Such duplication errors have been eliminated as thoroughly as possible in the revision.
In our revised GeoDAR v1.1, the total storage capacity is now 7297 km3, which is below both the total capacity based on original WRD values (7334 km3) and the total capacity based on GRanD-adjusted WRD values (7642 km3). This appears more reasonable. The difference between GeoDAR and WRD capacities (up to ~350 km3) could be explained by the remaining ~60% of the WRD dams that were not georeferenced. These dams are mostly small, so it is not surprising that their cumulative capacity appears marginal.
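The reasoning in this response involves a unit conversion, an override rule, and a comparison of totals, which can be sketched as follows. The function names are hypothetical, the conversion factor is the standard approximate value of 1233.48 cubic meters per acre-foot, and the snippet is an illustration rather than the authors' actual processing script.

```python
ACRE_FOOT_M3 = 1233.48  # approximate cubic meters per acre-foot


def acre_feet_to_thousand_m3(value_acre_feet):
    """Convert a capacity in acre-feet to thousand cubic meters."""
    return value_acre_feet * ACRE_FOOT_M3 / 1000.0


def harmonized_capacity(wrd_capacity, grand_capacity):
    """Prefer the GRanD capacity when available; otherwise keep the WRD value."""
    return grand_capacity if grand_capacity is not None else wrd_capacity


# Totals quoted in the response, in km3.
geodar_v11_total = 7297
wrd_original_total = 7334
wrd_grand_adjusted_total = 7642

gap_vs_original = wrd_original_total - geodar_v11_total        # km3
gap_vs_adjusted = wrd_grand_adjusted_total - geodar_v11_total  # km3
```

The larger of the two gaps corresponds to the "up to ~350 km3" difference mentioned above.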
Line 118: “We acknowledge…” I suggest rephrasing this sentence to something like: “Whilst we have made every endeavour to remove duplicates, we acknowledge that some duplicates may remain in the dataset”
Response: Thank you. This suggestion concurs with that of Reviewer 1. To combine both suggestions, we have rephrased this sentence to:
“We acknowledge that owing to the challenges of lacking explicit spatial information and occasional attribute errors in WRD, our duplicate removal is not perfect and may have misidentified or missed some duplicate dams.”
Line 138. 7388 km3 in original WRD. Is this the figure for all WRD dams or for the cleaned version of 56,783 dams?
Response: This value is the total storage capacity based on the original attribute values of the WRD dams after duplicate removal. In the revision, we performed another round of duplicate removal and concluded a total of 56,850 unique dams in WRD with a total capacity of 7334 km3.
For improved clarity, we added a sentence in Section 2.1:
“Unless otherwise described, the ICOLD WRD mentioned in the following text refers to the version after duplicate removal.”
Line 139: I don’t think the Venn diagrams are very clear. Not sure if they are even needed as the text explains the process. A simple flowchart might be easier to understand.
Response: We appreciate this comment, which echoes Reviewer 1 as well. As we responded previously, the reasons that we included the Venn diagrams (rather than a flow chart) are: the dams from some of these data sources or methods (circles) overlap with each other, so we believe using the Venn diagrams is perhaps the most visually effective way for readers to understand their topological relationships and how they contribute to each of the final components (boxes) in our dataset.
We agree that these Venn diagrams can be a little tricky to interpret although we tried to keep it simple and clear. But for the reasons above, we think the diagrams offer more benefits than confusion and the readers can also refer to Table 1 for more clarification.
For improved clarity, we have provided the following explanation in the figure caption:
“Boxes indicate final subsets in each GeoDAR version, and the arrows point to the georeferencing sources or methods. Topology of the shapes illustrates logical relations among the data/methods, but sizes of the shape were not drawn to scale of the data volume.”
We hope the reviewer finds our Venn diagrams, particularly with the revised caption, more acceptable.
Line 239: “rest parts of the world” is a strange phrase
Response: We have changed this to “the other regions of the world”.
Line 322: ICOLD storage capacity erroneous. See earlier comment (line 98) on ICOLD reported storage. This can also explain the discrepancy. Note that Mulligan et al (2020) also note erroneous reporting of catchments in ICOLD.
Response: Thank you. We have also cited Mulligan et al. (2020) in this sentence to acknowledge that occasional capacity errors in ICOLD were similarly noticed in other literature and to further back up our statement.
Line 525-562, section 3.2 in results and discussion seems to introduce more methods on validation. This should be moved to methods.
Response: Thank you for this comment. We agree that the original section 3.2 included both validation methods and validation results. This reflected our perception that data validation was not necessarily part of data generation and that we intended to only include the methods related to data generation in the Methods section. This arrangement also reflected our hesitation to group everything after methods into a single “Results” section.
After deliberations, we decided to restructure the previous lengthy “Results” section into a few stand-alone sections. They are “Product components and structure”, “Validation”, and “Comparisons with existing global datasets”. These sections are relatively independent, and arguably not always about “results” (they are also about discussions and applications). So we believe it is more reasonable to reorganize them as separate sections, rather than lumping them all into a single “Results” section.
This way, the “Methods” section now only includes the methodology directly related to data generation, including pre-processing, georeferencing, QA/QC, harmonization, and reservoir retrieval. The product validation, including the methods of sample collection and validation, is now assigned to the “Validation” section. We believe this adjusted structure is clearer to general readers.
Line 797: I find the term (global) capacity improvement a bit confusing. I guess what is meant is a higher reporting of total dam storage capacity by country or globally which is hardly surprising given that more dams and reservoirs are included.
Response: Thank you for pointing out this confusing statement. We agree and have rephrased this expression to “global storage capacity increase”, to be comparable to our expression in the previous sentence (“global dam count increase”).
Technical corrections:
Numbers in some cases use thousand separator (e.g. 7,163 line 238) but not in others. Please be consistent throughout
Response: Thank you for this meticulous review. For consistency, we have now avoided using the thousand separator for any number under 10,000. But we will make necessary changes upon the request of the publisher.
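The formatting rule stated here (no thousand separator below 10,000) can be expressed as a small helper. This is an illustrative sketch with a hypothetical function name; it assumes non-negative integer counts and assumes that numbers of 10,000 and above keep the comma separator.

```python
def format_count(n):
    """Format an integer count: no separator below 10,000, comma separator otherwise."""
    return str(n) if n < 10000 else f"{n:,}"


small = format_count(7163)    # a count below the threshold
large = format_count(56850)   # a count at or above the threshold
```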
Line 163. “NID records were accessed”
Response: Thank you. This sentence was deleted as it is no longer necessary. In the revised data we have used a more updated version of USNID (version 2018). Please see the revision summary at the beginning of the response letter.
Line 223 “This led to a conservative success rates”
Response: Thank you. We have corrected “rates” to “rate”.
Line 258: “this process was repeated”
Response: This has been corrected.
Line 570 “We believed”
Response: We have deleted “We believed” in this sentence.
Line 700: Mulligan et al (2020)
Response: Sorry about this typo, and we have corrected “2000” to “2020”.
Line 744: GeoDAR
Response: Thank you. We have corrected this typo.
References: line 39 Doll should be Döll, line 42 Vorosmarty should be: Vörösmarty and there may be others.
Response: We are sorry about leaving out the umlaut. We have corrected it throughout the revised manuscript.
Citation: https://doi.org/10.5194/essd-2021-58-AC4
-
AC2: 'Reply on RC2', Jida Wang, 22 Apr 2021