the Creative Commons Attribution 4.0 License.
Bridging the Data Gap: An Enhanced Global Inventory for Statistical Characterization and Hazard Assessment of Landslide Dams
Abstract. Landslide dams and their subsequent outburst floods represent cascading geohazards with profound socio-economic and morphological impacts. However, the widespread absence of dynamic breaching parameters in existing global inventories severely constrains quantitative hydrodynamic modeling and downstream risk assessment. To bridge this critical data void, this study presents a comprehensive global landslide dam dataset encompassing 902 rigorously vetted events spanning before 2020. Moving beyond traditional static cataloging, the assembled dataset integrates 11 fundamental morphological and triggering parameters with 6 highly transient breaching metrics. Notably, it significantly improves the data availability of historically scarce variables, including peak discharge, released water volume, and three-dimensional breach geometries. Spatially, the database achieves global coverage, with the highest data densities clustered within the Alpine-Himalayan and Circum-Pacific active belts. To objectively account for observational limitations and chronological biases across different technological eras, a point-by-point Data Quality Flag (DQF) system is incorporated into the dataset, transparently classifying the spatial, geometric, and hydrodynamic uncertainties for every cataloged event. This multi-dimensional and structurally transparent inventory provides a robust empirical foundation for future machine-learning-based hazard susceptibility mapping and physically-based dam-breach simulations. The dataset is publicly available at Zenodo https://doi.org/10.5281/zenodo.19198720 (Jiang et al. 2026).
Status: open (until 20 May 2026)
RC1: 'Comment on essd-2026-107', Anonymous Referee #1, 22 Apr 2026
AC1: 'Reply on RC1', Xiangang Jiang, 13 May 2026
Dear Referee, thank you very much for your keen interest in our global landslide dam database and for providing such a thorough, data-centric assessment. In this revised manuscript, we have comprehensively addressed your concerns by explicitly contextualizing our dataset against foundational historical databases and clarifying our rigorous screening criteria for data selection. We have also moderated expectations regarding dynamic parameters to instead emphasize our standardized, multi-dimensional framework for the 902 vetted records spanning 1800–2020. Furthermore, we enhanced the practical utility of the Data Quality Flag (DQF) system by explicitly defining its uncertainty levels, and replaced our previous intuitive categorization with geomorphologically established classifications. Below, please find our detailed, point-by-point responses to your valuable comments.
1. Quantitative contribution of this DB
Fan et al. (2021, ESR) compiled a global database of landslide dams and extended the record to earlier historical periods. Fan et al. (2012, ESR) provided a comprehensive inventory of 828 landslide dams triggered by the 2008 earthquake in China. Furthermore, Liu et al. (2019, ESR; not cited in the manuscript) and Peng et al. (2012, Landslides) extended the temporal coverage of landslide dam and outburst flood databases in China to before 1400 AD, including parameters similar to those presented in this manuscript. Carlo et al. (2016, Engineering Geology; also not cited in the ms) also summarized approximately 300 landslide dam cases in Italy along with related parameters. These studies account for approximately two thirds of the total sample size of this manuscript, including most of the parameter information considered here. Parameters not included in those previous datasets, such as breach geometry, are provided only for a very limited number of scenarios.
Similarly, Cheng et al. (2025, ESSD, this one is rightly cited in the ms) published a global database of debris flow dams including 555 cases. Why are most of their data not included in the present database? What criteria have been used to discriminate?
It therefore seems that the overall contribution in terms of landslide dam data compilation is limited and not clearly explained in the framework of previous works.
Reply: We sincerely thank the reviewer for raising this critical point and for pointing out the excellent regional and historical synthesis works by Fan et al. (2021), Peng et al. (2012), Cheng et al. (2025), Liu et al. (2019) and Carlo et al. (2016). We completely agree that establishing our dataset's quantitative contribution within the context of these extensive previous efforts is essential.
Regarding the overlap with existing databases and the missing citations, we would like to clarify our data retrieval and citation methodology, and how our work builds upon these foundational studies:
- Citation Strategy and Primary Source Traceability: During the compilation of our database, we extensively referenced the mainstream databases mentioned by the reviewer, i.e., those of Fan et al. (2021), Peng et al. (2012), Cheng et al. (2025), Liu et al. (2019), and Carlo et al. (2016), to identify and cross-check landslide dam cases. To ensure data traceability for future users, our initial citation strategy prioritized the primary literature in which the raw measurements were first reported over the secondary synthesis papers that compiled them. When drawing on a previous dataset, we therefore consulted the original sources it cited, verified each of them, and cited those original papers rather than the dataset paper itself. This can be confirmed from the reference section of the dataset. As a result, the manuscript did not cite all of the relevant dataset papers; as the reviewer suggests, it should. In the revised manuscript, we now explicitly cite these works in the Introduction and Discussion sections to properly acknowledge their contributions to regional and historical cataloging. It should be noted that Carlo et al. (2016) (Stefanelli C T, Segoni S, Casagli N, et al. Geomorphic indexing of landslide dams evolution[J]. Engineering Geology, 2016, 208: 1-10.) state that their analysis was based on the dataset from the authors' 2015 article (Tacconi Stefanelli C, Catani F, Casagli N. Geomorphological investigations on landslide dams[J]. Geoenvironmental Disasters, 2015, 2(1): 21). Therefore, in Section 6.2 of our article, we place particular emphasis on citing the paper of Carlo et al. (2015).
- The True Quantitative Contribution, Standardization and DQF Integration: While it is true that a significant portion of our raw cases overlaps with previous regional (e.g., Carlo et al. 2016 for Italy; Fan et al. 2012 for the Wenchuan earthquake) or historical (Peng et al. 2012) inventories, the contribution of our dataset lies in providing a rich set of parameters for landslide dams, along with global synthesis and rigorous standardization. The cases in our database were screened based on the completeness of core parameters (as outlined in Figure 1 in the manuscript). Records were excluded if fewer than three physical parameters were available for a given case. These parameters could be any three variables selected from the following categories: geometric parameters, including dam volume, dam height, dam width, and dam length; material-composition parameters; breach parameters, including breach width, breach depth, peak breach discharge, and breach duration; and hydrological parameters, primarily the storage capacity of the dammed lake. Few previous datasets ensure that each case contains at least three available physical parameters drawn from these categories. Moreover, in most previous datasets, individual cases largely lack breach parameters such as breach width, breach depth, peak breach discharge, and breach duration. Our dataset specifically provides these data, which is another highlight of our work. Furthermore, previous landslide-dam records were scattered across different sources, formats, measurement units, and regional reporting conventions.
We therefore did not merely aggregate existing cases; instead, we subjected them to a unified and strict spatial cross-referencing procedure and incorporated a Data Quality Flag (DQF) system. This approach transforms fragmented regional and historical records into a coherent, globally interoperable framework, while providing an explicit indication of data reliability and uncertainty for each documented event. Finally, although the completeness of some dynamic parameters remains relatively low because of objective observational limitations, incorporating these parameters into a standardized database structure is itself an important step forward. It provides a consistent framework for future updates and enables newly reported landslide-dam events to be integrated in a comparable and quality-controlled manner. We had already described the contributions of our dataset in the original manuscript, but to make it clearer to readers, we restated this content in Section 6.2, as follows:
“In stark contrast, this study establishes a comprehensive multi-dimensional framework encompassing 11 types of fundamental dam parameters and, crucially, 6 types of detailed breach parameters (including peak discharge, released water volume, breach duration, breach depth, and top/bottom widths). The contribution of our database lies in providing a rich set of parameters for landslide dams, along with global synthesis and rigorous standardization. The cases in our database were screened based on the completeness of core parameters (as outlined in Figure 1). Records were excluded if fewer than three physical parameters were available for a given case. These parameters could be any three variables selected from the following categories: geometric parameters, including dam volume, dam height, dam width, and dam length; material-composition parameters; breach parameters, including breach width, breach depth, peak breach discharge, and breach duration; and hydrological parameters, primarily the storage capacity of the dammed lake. Conversely, few previous datasets ensure that each case contains at least three available physical parameters selected from the combined set of geometric, material-composition, breach, and hydrological variables. Moreover, in most previous datasets, individual cases largely lack breach parameters of landslide dams, such as breach width, depth, peak breach discharge, and breach duration. Our dataset specifically provides these data, which is another highlight of our work. Also, our dataset establishes a standardized framework for landslide-dam records and provides a consistent basis for future updates, enabling newly reported events to be integrated in a comparable and quality-controlled manner.”
- Discrimination Criteria and Integration of Cheng et al. (2025): Regarding the excellent recent work on debris flow dams by Cheng et al. (2025), their publication timeline closely overlapped with the finalization of our dataset. During our cross-validation, cases from their database were subjected to our strict standardized screening protocol (as detailed in Figure 2 of our manuscript). Our criteria required a minimum of three physical parameters, irrespective of parameter category. The considered parameters included geometric variables, material-composition information, breach characteristics, and hydrological variables, such as dammed-lake storage capacity. Because debris flow dams often present highly transient and morphologically complex features, some historical cases in Cheng et al. (2025) lacked the minimum number of quantitative parameters required by our framework and were consequently filtered out to maintain the overall statistical rigor of our dataset.
However, we fully acknowledge that there are still valid cases within Cheng et al. (2025) and other recent literature that meet our criteria. As stated in our "Data Availability" section, this database is designed as a "living" repository. We are firmly committed to continually updating the database, and the systematic incorporation of newly published, high-quality cases from recent literature will be a primary focus of our upcoming database expansion (Version 2.0, DOI: https://doi.org/10.5281/zenodo.20149544).
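The screening rule described in this reply (keep a case only if at least three physical parameters are available, irrespective of category) can be sketched as follows. This is a minimal illustration and not the authors' actual procedure: the field names are hypothetical, and missing values are assumed to be stored as `None` or an empty string.

```python
from typing import Any, Mapping

# Hypothetical field names; the published spreadsheet may label columns differently.
PHYSICAL_FIELDS = (
    "dam_volume", "dam_height", "dam_width", "dam_length",                # geometric
    "material_type",                                                      # material composition
    "breach_width", "breach_depth", "peak_discharge", "breach_duration",  # breach
    "lake_capacity",                                                      # hydrological
)
MIN_PARAMETERS = 3  # threshold stated in the reply


def count_available(record: Mapping[str, Any]) -> int:
    """Count physical parameters that are present (assumes None or '' means missing)."""
    return sum(1 for field in PHYSICAL_FIELDS if record.get(field) not in (None, ""))


def passes_screening(record: Mapping[str, Any]) -> bool:
    """Return True when the record has at least MIN_PARAMETERS physical parameters."""
    return count_available(record) >= MIN_PARAMETERS


kept = {"dam_height": 50.0, "dam_width": 200.0, "peak_discharge": 1200.0}
dropped = {"dam_height": 50.0, "dam_width": ""}
```

Here `kept` would be retained and `dropped` excluded, because the latter has only one usable parameter.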
2. Qualitative contribution
The authors emphasize that the main contribution of this dataset lies in filling the gap in dynamic breach parameters within global landslide dam inventories, thereby distinguishing it from previous studies such as those by Shi et al (2022) and Wu et al (2022). However, upon examination of the dataset, it appears that the completeness of the more novel dynamic parameters, such as breach top width, bottom width, and breach duration, remains below 5%. This is understandable since these parameters typically rely on costly monitoring equipment or real time field observations, but maybe the expectation created on the reader about the “global” scale of the data set could be moderated. Instead, the authors could clearly explain the quantitative contribution of their dataset compared to similar studies, to help assessing the significance of this compilation.
Another interesting parameter added in this study is the particle size distribution of the dam, abbreviated as PSD. However, among the total of 902 cases, only 8 include this information. Interesting as these measures are, their availability seems insufficient for statistical or comparative hazard analyses across regions or dam types.
Finally, the manuscript mentions a “Data Quality Flag” (DQF) meant to constrain uncertainties. This DQF is highlighted in the abstract as “a point-by-point system incorporated into the dataset, transparently classifying the spatial, geometric, and hydrodynamic uncertainties for every cataloged event”. However, the last columns in the database spreadsheet simply shows three levels of uncertainty (“high”, “medium”, “low”) that are not described or discussed in the ms and they apply to three groups of parameters, not to each of the parameters separately. These DQF therefore remain of little use for the end user of the database.
I believe the limitations above should be either clearly stated or satisfactorily solved, to avoid false expectations.
Reply: We sincerely thank the reviewer for the detailed examination of our dataset. Our dataset encompasses 902 cases compiled on a global scale. The overall completeness of the parameters is distributed as follows: peak discharge is recorded for 249 cases (27.6%); dam height for 870 cases (96.5%); dam width for 763 cases (84.6%); and dam volume for 729 cases (80.8%). Regarding breach geometries, the breach depth is documented in 90 cases (10.0%).
The reviewer correctly pointed out that the completeness for parameters such as breach top width, bottom width, and breach duration remains below 5%. In fact, across the entire dataset of 21 parameters, only four parameters fall below the 5% threshold: breach top width (3.1%), breach bottom width (3.3%), breach duration (4.4%), and inflow rate (2.3%). We completely agree with the reviewer's insightful assessment that "these parameters typically rely on costly monitoring equipment or real-time field observations." Consequently, acquiring measured values for these highly transient variables is inherently challenging worldwide, resulting in relatively lower completeness rates. However, it is crucial to emphasize that the remaining 17 parameters have completeness rates above 5%, with the fundamental geometric parameters (dam height, width, and volume) exceeding 80%. Given this robust overall data coverage and the comprehensive geographical distribution of the 902 cases, we believe the designation of a "global" scale dataset is both scientifically justified and appropriate.
Compared with previous databases, the contributions of our database are as follows: this study establishes a comprehensive multi-dimensional framework encompassing 11 types of fundamental dam parameters and, crucially, 6 types of detailed breach parameters (including peak discharge, released water volume, breach duration, breach depth, and top/bottom widths). The contribution of our dataset lies in providing a rich set of parameters for landslide dams, along with global synthesis and rigorous standardization. The cases in our database were screened based on the completeness of core parameters (as outlined in Figure 1). Records were excluded if fewer than three physical parameters were available for a given case. These parameters could be any three variables selected from the following categories: geometric parameters, including dam volume, dam height, dam width, and dam length; material-composition parameters; breach parameters, including breach width, breach depth, peak breach discharge, and breach duration; and hydrological parameters, primarily the storage capacity of the dammed lake. Furthermore, few previous datasets ensure that each case contains at least three available physical parameters selected from the combined set of geometric, material-composition, breach, and hydrological variables. Moreover, in most previous datasets, individual cases largely lack breach parameters of landslide dams, such as breach width, depth, peak breach discharge, and breach duration. Our dataset specifically provides these data, which is another highlight of our work.
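The completeness percentages quoted in this reply follow directly from the reported case counts and the total of 902 records. A short sketch of that arithmetic, using only the counts stated above (the dictionary keys are illustrative labels, not column names from the dataset):

```python
TOTAL_CASES = 902  # total number of records stated in the reply

# Case counts as reported in the reply.
reported_counts = {
    "dam height": 870,
    "dam width": 763,
    "dam volume": 729,
    "peak discharge": 249,
    "breach depth": 90,
}


def completeness(count: int, total: int = TOTAL_CASES) -> float:
    """Completeness as a percentage of all cases, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


rates = {name: completeness(count) for name, count in reported_counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

Running this reproduces the figures in the reply: 96.5%, 84.6%, 80.8%, 27.6%, and 10.0%.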
Also, our dataset establishes a standardized framework for landslide-dam records and provides a consistent basis for future updates, enabling newly reported events to be integrated in a comparable and quality-controlled manner. In the manuscript, Figure 14 quantitatively visualizes this advancement: while the subplots for the reference databases (Figure 14b–e) exhibit massive data voids regarding dynamic and breaching characteristics, subplot (a) demonstrates unprecedentedly dense and continuous records for these highly transient variables. The systematic integration of both static morphologies and dynamic failure metrics effectively bridges a critical gap in the existing literature. By providing these essential boundary conditions, the presented dataset facilitates a crucial transition from static hazard cataloging to dynamic hydrodynamic routing and quantitative downstream risk modeling.
Although only eight cases in the dataset have Particle Size Distribution (PSD) data, based on the records in the original literature, we have qualitatively categorized the landslide dam materials into four types: Soil-dominated, Boulders dominant, Boulders mixed with soil, and Soil contains boulders. This classification is consistent with the classification of landslide dam materials in the Chinese industry code for emergency response and risk assessment of barrier lakes, i.e., the Code for risk classification and emergency measures of barrier lake (SL/T450-2021). The collection of PSD data for landslide dams is challenging due to factors such as field monitoring techniques and environmental conditions. Nevertheless, the four qualitative material types provided by our dataset can greatly facilitate the risk assessment of landslide dams. For example, in the Code for risk classification and emergency measures of barrier lake (SL/T450-2021), the qualitative classification of dam materials can be used to estimate the median particle size of the dam material and to evaluate the risk level of the landslide dam. This type of material classification information is absent from previous landslide dam datasets and represents one of the advantages of our dataset over previous ones.
We reclassified the data into four categories: basic information, time parameters, geometric parameters, and hydrological parameters. These four categories of parameters cover all the data. To objectively manage the observational limitations and temporal survivorship biases inherent in historical records, this study implemented a category-based Data Quality Flag (DQF) system. This multi-dimensional framework systematically categorizes spatial location accuracy, geometric precision, and hydrodynamic reliability based on the technological era and measurement constraints of each event. By explicitly defining these uncertainty levels, the dataset provides a transparent and structured metric for data reliability, enabling researchers to apply physically meaningful boundary conditions. We have completed the Data Quality Flag (DQF) system and assigned DQF evaluation results to each of the four data categories. We have also removed the misleading "point-by-point" terminology from the abstract. The DQF now functions as a filtering tool, enabling end-users to leverage our large sample size to selectively isolate high-fidelity data subsets (low/medium uncertainty) tailored to their specific modeling precision requirements.
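As an illustration of the filtering use case described above, the sketch below selects records whose category-level flag is "low" or "medium" uncertainty. The three level names come from the referee's description of the spreadsheet; the field name `dqf_geometric` and the record layout are assumptions made for this example only.

```python
from typing import Any, Iterable, Mapping

ACCEPTED_LEVELS = frozenset({"low", "medium"})  # uncertainty levels to keep


def filter_by_dqf(
    records: Iterable[Mapping[str, Any]],
    flag_field: str,
    accepted: frozenset = ACCEPTED_LEVELS,
) -> list:
    """Keep records whose flag in `flag_field` is an accepted uncertainty level.

    Flags are compared case-insensitively; records without a flag are dropped.
    """
    selected = []
    for record in records:
        level = str(record.get(flag_field, "")).strip().lower()
        if level in accepted:
            selected.append(record)
    return selected


# Hypothetical records; "dqf_geometric" is an assumed column name.
sample = [
    {"id": 1, "dqf_geometric": "Low"},
    {"id": 2, "dqf_geometric": "high"},
    {"id": 3, "dqf_geometric": "medium"},
    {"id": 4},
]
subset = filter_by_dqf(sample, "dqf_geometric")
```

With these sample records, `subset` contains the cases with ids 1 and 3.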
Reference
Yang Q G, Cai Y J, Liu Z M. Code for risk classification and emergency measures of barrier lake (SL/T450-2021)[S]. 2021. (in Chinese)
3. L134: In Figure 4, define "existence time".
Fig 5b and L153-160: Define “uncertain parameters”. Where can this number be seen in the spreadsheet? The uncertainty is in that file binned in 3 sets of parameters (columns AD to AF).
Reply: We deeply appreciate your meticulous editorial guidance. "Existence time" refers to the longevity of a landslide dam; we have added this definition to Figure 4.
We sincerely apologize for the ambiguity in our terminology, which understandably caused confusion, and we thank the reviewer for pointing it out. We would like to clarify that the "uncertain parameters" shown in Figure 5b are completely distinct from the DQF "uncertainty ratings" (located in columns AD to AF).
The "uncertain parameters" in Fig. 5b should instead read "range-qualitative parameters". This term refers to data entries that are recorded as numerical ranges (e.g., "15–20 m") or qualitative descriptions, rather than as exact, deterministic numbers. We have revised Fig. 5b accordingly.
To address your question regarding "Where can this number be seen in the spreadsheet?": these "uncertain parameters" do not have a dedicated column like the DQF ratings. Instead, they are the individual cell entries, scattered within the regular parameter columns, that contain a range or descriptive text instead of a single numerical value.
The primary purpose of Figure 5b is to show database users that, although we included these non-exact records, their overall proportion across the database is very small. The figure is intended to alleviate potential concerns regarding the quantitative precision of our database by showing that the vast majority of our collected data consists of exact values. The revised Figure 5 is in the supplement.
4. L165: In this study, dam types are classified into four categories, namely sliding, collapses, flows, and unknown. This intuitive classification lacks some clear criteria or at least some discussion of the biases it could introduce, also to allow the reproducibility of the results.
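Because the range-qualitative entries have no dedicated column, a user who wants to reproduce the proportion shown in Figure 5b has to classify individual cells. The following is a rough sketch of one way to do that; the regular expressions, the accepted range separators, and the treatment of trailing units are all assumptions for illustration, not the authors' method.

```python
import math
import re
from typing import Any, Iterable

_NUMBER = r"\d+(?:\.\d+)?"
_UNIT = r"(?:\s*[A-Za-z][\w/^]*)?"  # optional trailing unit such as "m" or "m3/s"
_EXACT = re.compile(rf"^{_NUMBER}{_UNIT}$")
_RANGE = re.compile(rf"^{_NUMBER}\s*[-–~]\s*{_NUMBER}{_UNIT}$")


def classify_entry(cell: Any) -> str:
    """Label a cell as 'missing', 'exact', 'range', or 'qualitative'."""
    if cell is None:
        return "missing"
    if isinstance(cell, (int, float)):
        return "missing" if math.isnan(cell) else "exact"
    text = str(cell).strip()
    if not text:
        return "missing"
    if _EXACT.match(text):
        return "exact"
    if _RANGE.match(text):
        return "range"
    return "qualitative"  # any other non-empty text


def range_qualitative_share(cells: Iterable[Any]) -> float:
    """Fraction of non-missing cells that are ranges or qualitative descriptions."""
    labels = [classify_entry(cell) for cell in cells]
    present = [label for label in labels if label != "missing"]
    if not present:
        return 0.0
    flagged = sum(1 for label in present if label in ("range", "qualitative"))
    return flagged / len(present)
```

For example, `classify_entry("15–20 m")` is labelled as a range, while a plain number or a number followed by a unit is labelled as exact.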
Reply: We sincerely thank the reviewer for pointing this out. We completely agree that our initial intuitive classification lacked clear theoretical criteria, which could introduce subjective bias and hinder the reproducibility of the results.
To address this critical issue, we have abandoned the previous intuitive categorization. Instead, we have re-classified the primary dam-forming mechanisms based on the foundational and widely accepted mass-movement framework established by Costa and Schuster (1988). Specifically, the historical cases in our database have been rigorously re-evaluated and grouped into the following standardized categories:
- RSSS: Rock and debris avalanches; Rock and soil slumps and slides.
- UL: Undifferentiated landslides.
- MDEF: Mud, debris, and earth flows.
- FALL: Falls.
We have explicitly defined the inclusion criteria for each of these categories in the revised manuscript and updated Figure 6a accordingly. By adopting this established, process-based taxonomy, we have reduced subjective interpretation and provided a transparent basis that supports the reproducibility of our dataset in future comparative studies. The revised Figure 6 is in the supplement.
5. L334: In Figure 13, panels a to c, especially panel c, the number of landslide dam cases shows a clear and rapid increase since around 1980. It is not very clear whether this trend reflects a real increase in events, or if it is mainly due to improvements in observation methods, such as the development of remote sensing and satellite imagery, as well as better reporting and documentation. I suggest the authors could provide further discussion on the possible reasons behind this trend.
Reply: We greatly appreciate the reviewer raising this insightful point. We completely agree that the sharp increase in recorded cases since the 1980s is heavily influenced by observational bias, and that leaving this unexplained could mislead readers into attributing it solely to a surge in physical geohazard frequency.
Following your excellent suggestion, we have added a dedicated discussion to Section 6.2, “Benchmarking Against Existing Inventories”, of the revised manuscript to explicitly address the underlying reasons behind this trend. We now discuss how the advent of civilian remote sensing, satellite imagery, and digital reporting has fundamentally enhanced global detection rates, while also briefly acknowledging compounding physical factors such as climate-driven extreme weather and human infrastructure expansion.
We have added the following text to the revised manuscript:
It is important to note that the rapid increase in recorded cases since around 1980, consistently observed across panels (a) to (c) in Figure 13, should be interpreted with caution. This steep exponential trend primarily reflects significant improvements in observational methodologies rather than an exclusive sudden surge in physical geological events. The widespread application of civilian remote sensing and satellite imagery (e.g., the Landsat program) starting in the 1980s, coupled with the rapid advancement of digital databases and internet reporting, fundamentally enhanced the global detection and documentation rates of landslide dams, especially in remote mountainous regions. While climate-driven extreme weather and increased human infrastructure expansion into mountainous areas have indeed contributed to a real increase in geohazard frequencies, the pronounced spike in recent decades is heavily amplified by this observational bias.
6. L335: Panel d of figure 13, the Costa database was originally published in 1991, yet the data shown in the figure appear to extend to 2020. A similar issue can also be observed in panel a. Is this underestimating the most recent columns? or are their values extrapolated to the complete period they represent?
Reply: We thank the reviewer for this insightful observation. Regarding the temporal range in Figure 13, we would like to clarify the following:
In the original version, we used a unified 40-year interval (1980–2020) for the final bin across all panels. This bin accurately captured the records from the reference databases within their respective active periods, but its label did not show where each database actually ends. For example, Costa and Schuster’s dataset strictly ends in 1991, but the terminal year, 1991, was statistically grouped into the 1980–2020 bin, which led to the misunderstanding that the dataset extended to 2020.
To ensure absolute clarity and prevent any perceived overestimation, we have refined the temporal binning strategy in the revised Figure 13.
Instead of a fixed 40-year window for the final column, we have now implemented a customized truncation for each comparison. The final bin in each panel is now explicitly defined by the actual data cutoff year of each reference database (e.g., exactly 1980–1991 for panel d). Our own dataset has been synchronized to these exact same truncated periods to ensure a strictly "apples-to-apples" comparison. The revised Figure 13 is in the supplement.
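The truncated binning described above can be sketched as follows: fixed-width bins are built from a start year, and the last bin is cut at the cutoff year of the reference database, with events after the cutoff dropped so that both datasets are counted over the same periods. The start year of 1900 and the sample event years are assumptions for illustration; only the 40-year width and the 1991 cutoff come from the reply.

```python
from bisect import bisect_right
from typing import Iterable, List


def bin_edges(start: int, width: int, cutoff: int) -> List[int]:
    """Fixed-width bin edges from `start`, with the final edge truncated at `cutoff`."""
    edges = list(range(start, cutoff, width))
    edges.append(cutoff)
    return edges


def count_per_bin(years: Iterable[int], edges: List[int]) -> List[int]:
    """Count events per bin; bins are [lower, upper) except the last, which includes the cutoff."""
    counts = [0] * (len(edges) - 1)
    for year in years:
        if year < edges[0] or year > edges[-1]:
            continue  # outside the compared period, e.g. after the database cutoff
        index = min(bisect_right(edges, year) - 1, len(counts) - 1)
        counts[index] += 1
    return counts


# Hypothetical start year and event years; cutoff 1991 as stated for panel d.
edges = bin_edges(start=1900, width=40, cutoff=1991)
counts = count_per_bin([1905, 1940, 1985, 1991, 2008], edges)
```

With these inputs the edges are 1900, 1940, 1980, and 1991, and the 2008 event is excluded because it falls after the cutoff.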
7. References not cited in the ms:
Tacconi Stefanelli et al. Geoenvironmental Disasters (2015) 2:21 DOI 10.1186/s40677-015-0030-9
Peng et al., Landslides (2012) 9:13–31 DOI 10.1007/s10346-011-0271-y
Liu et al. 2019. Earth-Science Reviews 197 (2019) 102895. https://doi.org/10.1016/j.earscirev.2019.102895
Reply: We sincerely apologize for this oversight and thank the reviewer for carefully checking our reference list.
We have now added the missing citations for Tacconi Stefanelli et al. (2015), Peng et al. (2012), and Liu et al. (2019) into the appropriate sections of the revised manuscript. Accordingly, their full details have been properly incorporated into the Reference list.
Data sets
Bridging the Data Gap: An Enhanced Global Inventory for Statistical Characterization and Breach Prediction of Landslide Dams Jiangang Jiang, Tao Wen, and Guoqiang Xiao https://doi.org/10.5281/zenodo.19198720
Viewed
- HTML: 302
- PDF: 68
- XML: 23
- Total: 393
- BibTeX: 20
- EndNote: 23
In spite of the effort involved in putting together this database for landslide-dam breach episodes, it remains unclear the amount of novel data and the clarity of the relationship between this dataset and similar previously published datasets. Below I raise some concerns about the completeness and antecedents of the database under scrutiny that I think must be taken into account:
General points:
Fan et al. (2021, ESR) compiled a global database of landslide dams and extended the record to earlier historical periods. Fan et al. (2012, ESR) provided a comprehensive inventory of 828 landslide dams triggered by the 2008 earthquake in China. Furthermore, Liu et al. (2019, ESR; not cited in the manuscript) and Peng et al. (2012, Landslides) extended the temporal coverage of landslide dam and outburst flood databases in China to before 1400 AD, including parameters similar to those presented in this manuscript. Carlo et al. (2016, Engineering Geology; also not cited in the ms) also summarized approximately 300 landslide dam cases in Italy along with related parameters. These studies account for approximately two thirds of the total sample size of this manuscript, including most of the parameter information considered here. Parameters not included in those previous datasets, such as breach geometry, are provided only for a very limited number of scenarios.
Similarly, Cheng et al. (2025, ESSD, this one is rightly cited in the ms) published a global database of debris flow dams including 555 cases. Why are most of their data not included in the present database? What criteria have been used to discriminate?
It therefore seems that the overall contribution in terms of landslide dam data compilation is limited and not clearly explained in the framework of previous works.
The authors emphasize that the main contribution of this dataset lies in filling the gap in dynamic breach parameters within global landslide dam inventories, thereby distinguishing it from previous studies such as those by Shi et al (2022) and Wu et al (2022). However, upon examination of the dataset, it appears that the completeness of the more novel dynamic parameters, such as breach top width, bottom width, and breach duration, remains below 5%. This is understandable since these parameters typically rely on costly monitoring equipment or real time field observations, but maybe the expectation created on the reader about the “global” scale of the data set could be moderated. Instead, the authors could clearly explain the quantitative contribution of their dataset compared to similar studies, to help assessing the significance of this compilation.
Another interesting parameter added in this study is the particle size distribution of the dam, abbreviated as PSD. However, among the total of 902 cases, only 8 include this information. Interesting as these measures are, their availability seems insufficient for statistical or comparative hazard analyses across regions or dam types.
Finally, the manuscript mentions a “Data Quality Flag” (DQF) meant to constrain uncertainties. This DQF is highlighted in the abstract as “a point-by-point system incorporated into the dataset, transparently classifying the spatial, geometric, and hydrodynamic uncertainties for every cataloged event”. However, the last columns in the database spreadsheet simply shows three levels of uncertainty (“high”, “medium”, “low”) that are not described or discussed in the ms and they apply to three groups of parameters, not to each of the parameters separately. These DQF therefore remain of little use for the end user of the database.
I believe the limitations above should be either clearly stated or satisfactorily solved, to avoid false expectations.
Minor points:
L134: In Figure 4, define "existence time".
Fig 5b and L153-160: Define “uncertain parameters”. Where can this number be seen in the spreadsheet? The uncertainty is in that file binned in 3 sets of parameters (columns AD to AF).
L165: In this study, dam types are classified into four categories, namely sliding, collapses, flows, and unknown. This intuitive classification lacks some clear criteria or at least some discussion of the biases it could introduce, also to allow the reproducibility of the results.
L334: In Figure 13, panels a to c, especially panel c, the number of landslide dam cases shows a clear and rapid increase since around 1980. It is not very clear whether this trend reflects a real increase in events, or if it is mainly due to improvements in observation methods, such as the development of remote sensing and satellite imagery, as well as better reporting and documentation. I suggest the authors could provide further discussion on the possible reasons behind this trend.
L335: Panel d of figure 13, the Costa database was originally published in 1991, yet the data shown in the figure appear to extend to 2020. A similar issue can also be observed in panel a. Is this underestimating the most recent columns? or are their values extrapolated to the complete period they represent?
References not cited in the ms: