the Creative Commons Attribution 4.0 License.
Bridging the Data Gap: An Enhanced Global Inventory for Statistical Characterization and Hazard Assessment of Landslide Dams
Abstract. Landslide dams and their subsequent outburst floods represent cascading geohazards with profound socio-economic and morphological impacts. However, the widespread absence of dynamic breaching parameters in existing global inventories severely constrains quantitative hydrodynamic modeling and downstream risk assessment. To bridge this critical data void, this study presents a comprehensive global landslide dam dataset encompassing 902 rigorously vetted events that occurred before 2020. Moving beyond traditional static cataloging, the assembled dataset integrates 11 fundamental morphological and triggering parameters with 6 highly transient breaching metrics. Notably, it significantly improves the data availability of historically scarce variables, including peak discharge, released water volume, and three-dimensional breach geometries. Spatially, the database achieves global coverage, with the highest data densities clustered within the Alpine-Himalayan and Circum-Pacific active belts. To objectively account for observational limitations and chronological biases across different technological eras, a point-by-point Data Quality Flag (DQF) system is incorporated into the dataset, transparently classifying the spatial, geometric, and hydrodynamic uncertainties for every cataloged event. This multi-dimensional and structurally transparent inventory provides a robust empirical foundation for future machine-learning-based hazard susceptibility mapping and physically-based dam-breach simulations. The dataset is publicly available on Zenodo at https://doi.org/10.5281/zenodo.19198720 (Jiang et al. 2026).
Status: final response (author comments only)
RC1: 'Comment on essd-2026-107', Anonymous Referee #1, 22 Apr 2026
AC1: 'Reply on RC1', Xiangang Jiang, 13 May 2026
Dear Referee, thank you very much for your keen interest in our global landslide dam database and for providing such a thorough, data-centric assessment. In this revised manuscript, we have comprehensively addressed your concerns by explicitly contextualizing our dataset against foundational historical databases and clarifying our rigorous screening criteria for data selection. We have also moderated expectations regarding dynamic parameters to instead emphasize our standardized, multi-dimensional framework for the 902 vetted records spanning 1800–2020. Furthermore, we enhanced the practical utility of the Data Quality Flag (DQF) system by explicitly defining its uncertainty levels, and replaced our previous intuitive categorization with geomorphologically established classifications. Below, please find our detailed, point-by-point responses to your valuable comments.
1. Quantitative contribution of this DB
Fan et al. (2021, ESR) compiled a global database of landslide dams and extended the record to earlier historical periods. Fan et al. (2012, ESR) provided a comprehensive inventory of 828 landslide dams triggered by the 2008 earthquake in China. Furthermore, Liu et al. (2019, ESR; not cited in the manuscript) and Peng et al. (2012, Landslides) extended the temporal coverage of landslide dam and outburst flood databases in China to before 1400 AD, including parameters similar to those presented in this manuscript. Carlo et al. (2016, Engineering Geology; also not cited in the ms) also summarized approximately 300 landslide dam cases in Italy along with related parameters. These studies account for approximately two thirds of the total sample size of this manuscript, including most of the parameter information considered here. Parameters not included in those previous datasets, such as breach geometry, are provided only for a very limited number of scenarios.
Similarly, Cheng et al. (2025, ESSD; this one is rightly cited in the ms) published a global database of debris flow dams including 555 cases. Why are most of their data not included in the present database? What criteria were used to discriminate?
It therefore seems that the overall contribution in terms of landslide dam data compilation is limited and not clearly explained in the framework of previous works.
Reply: We sincerely thank the reviewer for raising this critical point and for pointing out the excellent regional and historical synthesis works by Fan et al. (2021), Peng et al. (2012), Cheng et al. (2025), Liu et al. (2019) and Carlo et al. (2016). We completely agree that establishing our dataset's quantitative contribution within the context of these previous massive efforts is essential.
Regarding the overlap with existing databases and the missing citations, we would like to clarify our data retrieval and citation methodology, and how our work builds upon these foundational studies:
- Citation Strategy and Primary Source Traceability: During the compilation of our database, we extensively referenced the mainstream databases mentioned by the reviewer, i.e., those of Fan et al. (2021), Peng et al. (2012), Cheng et al. (2025), Liu et al. (2019), and Carlo et al. (2016), to identify and cross-check landslide dam cases. To ensure data traceability for future users, our initial citation strategy prioritized the primary literature in which the raw measurements were first reported, rather than the secondary synthesis papers that compiled them. Therefore, when drawing on previous datasets, we consulted the original literature that those datasets had cited, verified each of those sources, and cited the original papers rather than the dataset papers themselves. This can be confirmed from the reference section of the dataset. As a result, the manuscript did not cite all of the relevant dataset papers; as suggested by the reviewer, we have now explicitly cited these works in the Introduction and Discussion sections of the revised manuscript to properly acknowledge their contributions to regional and historical cataloging. It should be noted that Carlo et al. (2016) (Stefanelli C T, Segoni S, Casagli N, et al. Geomorphic indexing of landslide dams evolution[J]. Engineering Geology, 2016, 208: 1-10.) state that their analysis was based on the dataset from the authors' 2015 article (Tacconi Stefanelli C, Catani F, Casagli N. Geomorphological investigations on landslide dams[J]. Geoenvironmental Disasters, 2015, 2(1): 21). Therefore, in Section 6.2 of our article, we have placed particular emphasis on citing the 2015 paper (Tacconi Stefanelli et al., 2015).
- The True Quantitative Contribution, Standardization and DQF Integration: While it is true that a significant portion of our raw cases overlaps with previous regional (e.g., Carlo et al. 2016 for Italy; Fan et al. 2012 for the Wenchuan earthquake) or historical (Peng et al. 2012) inventories, the contribution of our dataset lies in providing a rich set of parameters for landslide dams, along with global synthesis and rigorous standardization. The cases in our database were screened based on the completeness of core parameters (as outlined in Figure 1 in the manuscript). Records were excluded if fewer than three physical parameters were available for a given case. These parameters could be any three variables selected from the following categories: geometric parameters, including dam volume, dam height, dam width, and dam length; material-composition parameters; breach parameters, including breach width, breach depth, peak breach discharge, and breach duration; and hydrological parameters, primarily the storage capacity of the dammed lake. By contrast, few previous datasets ensure that each case contains at least three available physical parameters from this combined set of geometric, material-composition, breach, and hydrological variables. Moreover, in most previous datasets, individual cases largely lack breach parameters such as breach width, breach depth, peak breach discharge, and breach duration. Our dataset specifically provides these data, which is another highlight of our work. Furthermore, previous landslide-dam records were scattered across different sources, formats, measurement units, and regional reporting conventions.
We therefore did not merely aggregate existing cases; instead, we subjected them to a unified and strict spatial cross-referencing procedure and incorporated a Data Quality Flag (DQF) system. This approach transforms fragmented regional and historical records into a coherent, globally interoperable framework, while providing an explicit indication of data reliability and uncertainty for each documented event. Finally, although the completeness of some dynamic parameters remains relatively low because of objective observational limitations, incorporating these parameters into a standardized database structure is itself an important step forward. It provides a consistent framework for future updates and enables newly reported landslide-dam events to be integrated in a comparable and quality-controlled manner. We had already described the contributions of our dataset in the original manuscript, but to make it clearer to readers, we restated this content in Section 6.2, as follows:
“In stark contrast, this study establishes a comprehensive multi-dimensional framework encompassing 11 types of fundamental dam parameters and, crucially, 6 types of detailed breach parameters (including peak discharge, released water volume, breach duration, breach depth, and top/bottom widths). The contribution of our database lies in providing a rich set of parameters for landslide dams, along with global synthesis and rigorous standardization. The cases in our database were screened based on the completeness of core parameters (as outlined in Figure 1). Records were excluded if fewer than three physical parameters were available for a given case. These parameters could be any three variables selected from the following categories: geometric parameters, including dam volume, dam height, dam width, and dam length; material-composition parameters; breach parameters, including breach width, breach depth, peak breach discharge, and breach duration; and hydrological parameters, primarily the storage capacity of the dammed lake. Conversely, few previous datasets ensure that each case contains at least three available physical parameters selected from the combined set of geometric, material-composition, breach, and hydrological variables. Moreover, in most previous datasets, individual cases largely lack breach parameters of landslide dams, such as breach width, depth, peak breach discharge, and breach duration. Our dataset specifically provides these data, which is another highlight of our work. Also, our dataset establishes a standardized framework for landslide-dam records and provides a consistent basis for future updates, enabling newly reported events to be integrated in a comparable and quality-controlled manner.”
- Discrimination Criteria and Integration of Cheng et al. (2025): Regarding the excellent recent work on debris flow dams by Cheng et al. (2025), their publication timeline closely overlapped with the finalization of our dataset. During our cross-validation, cases from their database were subjected to our strict standardized screening protocol (as detailed in Figure 2 of our manuscript). Our criteria required at least three physical parameters, irrespective of parameter category. The considered parameters included geometric variables, material-composition information, breach characteristics, and hydrological variables, such as dammed-lake storage capacity. Because debris flow dams often present highly transient and morphologically complex features, some historical cases in Cheng et al. (2025) lacked the minimum number of quantitative parameters required by our framework and were consequently filtered out to maintain the overall statistical rigor of our dataset.
However, we fully acknowledge that there are still valid cases within Cheng et al. (2025) and other recent literature that meet our criteria. As stated in our "Data Availability" section, this database is designed as a "living" repository. We are firmly committed to continually updating the database, and the systematic incorporation of newly published, high-quality cases from recent literature will be a primary focus of our upcoming database expansion (Version 2.0; DOI: https://doi.org/10.5281/zenodo.20149544).
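The screening rule described in this reply (a record is kept only if at least three physical parameters are available, irrespective of category) can be sketched as follows. This is a minimal illustration, not the authors' actual workflow: the field names and the two example records are hypothetical, and the real spreadsheet columns may be named and encoded differently.

```python
# Hypothetical field names; the real spreadsheet columns may differ.
PHYSICAL_FIELDS = [
    "dam_volume", "dam_height", "dam_width", "dam_length",   # geometric
    "material_type",                                          # material composition
    "breach_width", "breach_depth",
    "peak_discharge", "breach_duration",                      # breach
    "lake_capacity",                                          # hydrological
]


def count_available(record):
    """Count physical parameters that are present (neither None nor empty)."""
    return sum(1 for f in PHYSICAL_FIELDS if record.get(f) not in (None, ""))


def passes_screening(record, minimum=3):
    """Apply the 'at least three physical parameters' rule from the reply."""
    return count_available(record) >= minimum


# Made-up example records, for illustration only.
records = [
    {"name": "case A", "dam_height": 50.0, "dam_volume": 2.0e6, "lake_capacity": 1.1e7},
    {"name": "case B", "dam_height": 12.0, "material_type": "Soil-dominated"},
]

kept = [r["name"] for r in records if passes_screening(r)]
print(kept)
```

In this sketch "case B" is dropped because only two physical parameters are recorded, which mirrors how cases from other inventories could fall below the threshold.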
2. Qualitative contribution
The authors emphasize that the main contribution of this dataset lies in filling the gap in dynamic breach parameters within global landslide dam inventories, thereby distinguishing it from previous studies such as those by Shi et al. (2022) and Wu et al. (2022). However, upon examination of the dataset, it appears that the completeness of the more novel dynamic parameters, such as breach top width, bottom width, and breach duration, remains below 5%. This is understandable since these parameters typically rely on costly monitoring equipment or real-time field observations, but the expectation created in the reader about the “global” scale of the dataset could perhaps be moderated. Instead, the authors could clearly explain the quantitative contribution of their dataset compared to similar studies, to help assess the significance of this compilation.
Another interesting parameter added in this study is the particle size distribution of the dam, abbreviated as PSD. However, among the total of 902 cases, only 8 include this information. Interesting as these measures are, their availability seems insufficient for statistical or comparative hazard analyses across regions or dam types.
Finally, the manuscript mentions a “Data Quality Flag” (DQF) meant to constrain uncertainties. This DQF is highlighted in the abstract as “a point-by-point system incorporated into the dataset, transparently classifying the spatial, geometric, and hydrodynamic uncertainties for every cataloged event”. However, the last columns in the database spreadsheet simply show three levels of uncertainty (“high”, “medium”, “low”) that are not described or discussed in the ms, and they apply to three groups of parameters, not to each of the parameters separately. These DQFs therefore remain of little use for the end user of the database.
I believe the limitations above should be either clearly stated or satisfactorily solved, to avoid false expectations.
Reply: We sincerely thank the reviewer for the detailed examination of our dataset. Our dataset encompasses 902 cases compiled on a global scale. The overall completeness of the parameters is distributed as follows: peak discharge is recorded for 249 cases (27.6%); dam height for 870 cases (96.5%); dam width for 763 cases (84.6%); and dam volume for 729 cases (80.8%). Regarding breach geometries, the breach depth is documented in 90 cases (10.0%).
The reviewer correctly pointed out that the completeness for parameters such as breach top width, bottom width, and breach duration remains below 5%. In fact, across the entire dataset of 21 parameters, only four parameters fall below the 5% threshold: breach top width (3.1%), breach bottom width (3.3%), breach duration (4.4%), and inflow rate (2.3%). We completely agree with the reviewer's insightful assessment that "these parameters typically rely on costly monitoring equipment or real-time field observations." Consequently, acquiring measured values for these highly transient variables is inherently challenging worldwide, resulting in relatively lower completeness rates. However, it is crucial to emphasize that the remaining 17 parameters have completeness rates above 5%, with the fundamental geometric parameters (dam height, width, and volume) exceeding 80%. Given this overall data coverage and the geographical distribution of the 902 cases, we believe the designation of a "global" scale dataset is both scientifically justified and appropriate.
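The completeness rates quoted in this reply follow directly from the case counts and the total of 902 records. The short sketch below reproduces them; only the counts stated in the reply are used, and the dictionary keys are informal labels, not actual column names.

```python
TOTAL_CASES = 902

# Counts as stated in the reply; keys are informal labels.
counts = {
    "peak discharge": 249,
    "dam height": 870,
    "dam width": 763,
    "dam volume": 729,
    "breach depth": 90,
}


def completeness(n_available, total=TOTAL_CASES):
    """Share of cases with a recorded value, in percent, rounded to 0.1."""
    return round(100.0 * n_available / total, 1)


rates = {name: completeness(n) for name, n in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```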
Compared with previous databases, the contributions of our database are as follows: this study establishes a comprehensive multi-dimensional framework encompassing 11 types of fundamental dam parameters and, crucially, 6 types of detailed breach parameters (including peak discharge, released water volume, breach duration, breach depth, and top/bottom widths). The contribution of our dataset lies in providing a rich set of parameters for landslide dams, along with global synthesis and rigorous standardization. The cases in our database were screened based on the completeness of core parameters (as outlined in Figure 1). Records were excluded if fewer than three physical parameters were available for a given case. These parameters could be any three variables selected from the following categories: geometric parameters, including dam volume, dam height, dam width, and dam length; material-composition parameters; breach parameters, including breach width, breach depth, peak breach discharge, and breach duration; and hydrological parameters, primarily the storage capacity of the dammed lake. Furthermore, few previous datasets ensure that each case contains at least three available physical parameters selected from the combined set of geometric, material-composition, breach, and hydrological variables. Moreover, in most previous datasets, individual cases largely lack breach parameters of landslide dams, such as breach width, depth, peak breach discharge, and breach duration. Our dataset specifically provides these data, which is another highlight of our work.
Also, our dataset establishes a standardized framework for landslide-dam records and provides a consistent basis for future updates, enabling newly reported events to be integrated in a comparable and quality-controlled manner. In the manuscript, Figure 14 quantitatively visualizes this advancement: while the subplots for the reference databases (Figure 14b–e) exhibit massive data voids regarding dynamic and breaching characteristics, subplot (a) demonstrates unprecedentedly dense and continuous records for these highly transient variables. The systematic integration of both static morphologies and dynamic failure metrics effectively bridges a critical gap in the existing literature. By providing these essential boundary conditions, the presented dataset facilitates a crucial transition from static hazard cataloging to dynamic hydrodynamic routing and quantitative downstream risk modeling.
Although only eight cases in the dataset have Particle Size Distribution (PSD) data, based on the records in the original literature, we have qualitatively categorized the landslide dam materials into four types: Soil-dominated, Boulders dominant, Boulders mixed with soil, and Soil contains boulders. This classification is consistent with the classification of landslide dam materials in the Chinese industry code for emergency response and risk assessment of barrier lakes, i.e., the Code for risk classification and emergency measures of barrier lake (SL/T450-2021). The collection of PSD data for landslide dams is challenging due to factors such as field monitoring techniques and environmental conditions. Nevertheless, the four qualitative material types provided by our dataset can greatly facilitate the risk assessment of landslide dams. For example, in the Code for risk classification and emergency measures of barrier lake (SL/T450-2021), the qualitative classification of dam materials can be used to estimate the median particle size of the dam material and to evaluate the risk level of the landslide dam. This type of material classification information is absent from previous landslide dam datasets and represents one of the advantages of our dataset over previous ones.
We reclassified the data into four categories: basic information, time parameters, geometric parameters, and hydrological parameters. These four categories cover all the data. To objectively manage the observational limitations and temporal survivorship biases inherent in historical records, this study implemented a category-based Data Quality Flag (DQF) system. This framework rates spatial location accuracy, geometric precision, and hydrodynamic reliability based on the technological era and measurement constraints of each event. By explicitly assigning these uncertainty levels, the dataset provides a transparent and structured metric for data reliability, enabling researchers to apply physically meaningful boundary conditions. We have completed the DQF system and assigned DQF ratings to each of the four data categories. We have also removed the misleading "point-by-point" terminology from the abstract. The DQF now functions as a filtering tool, enabling end-users to isolate high-fidelity data subsets (low/medium uncertainty) tailored to their specific modeling precision requirements.
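As an illustration of how an end-user might apply the DQF as a filter, the sketch below keeps only records whose uncertainty rating in one category does not exceed a chosen level. The three rating labels ("low", "medium", "high") are those mentioned in the review; the column names and example records are hypothetical and would need to be matched to the actual spreadsheet.

```python
# Ordering of the three uncertainty ratings mentioned in the review.
ORDER = {"low": 0, "medium": 1, "high": 2}


def filter_by_dqf(records, column, max_uncertainty="medium"):
    """Keep records whose rating in `column` is at or below `max_uncertainty`.

    Records with a missing or unrecognized rating are excluded.
    """
    limit = ORDER[max_uncertainty]
    kept = []
    for record in records:
        rating = str(record.get(column, "")).strip().lower()
        if ORDER.get(rating, 99) <= limit:
            kept.append(record)
    return kept


# Made-up records with hypothetical column names.
records = [
    {"id": 1, "dqf_geometric": "low", "dqf_hydrodynamic": "high"},
    {"id": 2, "dqf_geometric": "medium", "dqf_hydrodynamic": "medium"},
    {"id": 3, "dqf_geometric": "high", "dqf_hydrodynamic": "low"},
]

subset = filter_by_dqf(records, "dqf_geometric")
print([r["id"] for r in subset])
```

Because the flags are assigned per parameter category, a filter of this kind operates on whole categories and cannot distinguish between individual parameters within a category.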
Reference
Yang Q G, Cai Y J, Liu Z M. Code for risk classification and emergency measures of barrier lake (SL/T450-2021)[S]. 2021. (in Chinese)
3. L134: In Figure 4, define "existence time".
Fig 5b and L153-160: Define “uncertain parameters”. Where can this number be seen in the spreadsheet? The uncertainty is in that file binned in 3 sets of parameters (columns AD to AF).
Reply: We deeply appreciate your meticulous editorial guidance. "Existence time" refers to the longevity of landslide dams, which we have supplemented in Figure 4.
We sincerely apologize for the ambiguity in our terminology, which understandably caused confusion. We thank the reviewer for pointing this out. We would like to clarify that the "uncertain parameters" shown in Figure 5b are completely distinct from the DQF "uncertainty ratings" (located in columns AD to AF).
The "uncertain parameters" in Fig. 5b should read "range-qualitative parameters". The term refers to data entries that are recorded as numerical ranges (e.g., "15–20 m") or qualitative descriptions, rather than exact, deterministic numbers. We have revised Fig. 5b accordingly.
To address your question regarding "Where can this number be seen in the spreadsheet?": These "uncertain parameters" do not have a dedicated column like the DQF ratings. Instead, they are the individual cell entries, scattered within the regular parameter columns, that contain a range or descriptive text instead of a single numerical value.
The primary purpose of plotting Figure 5b is to show database users that, although we included these non-exact records, their overall proportion across the database is very small. This figure was designed to alleviate potential concerns regarding the quantitative precision of our database by showing that the vast majority of our collected data consists of exact values. The revised Figure 5 is in the supplement.
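Since the range-qualitative entries have no dedicated column, a user who wants to locate them has to inspect the cell contents. A minimal sketch of such a check is given below. It assumes that exact values are stored as plain numbers without units and that ranges look like the "15–20 m" example above; both are assumptions about the spreadsheet, and the sample cells are made up.

```python
import re

# A plain number, optionally in scientific notation (assumed unit-free).
EXACT = re.compile(r"^\s*[-+]?\d+(\.\d+)?([eE][-+]?\d+)?\s*$")
# Two numbers joined by a hyphen, en dash or tilde, with an optional unit.
RANGE = re.compile(r"^\s*\d+(\.\d+)?\s*[-–~]\s*\d+(\.\d+)?(\s*[A-Za-z].*)?$")


def classify_entry(cell):
    """Label a cell as 'missing', 'exact', 'range' or 'qualitative'."""
    if cell is None or str(cell).strip() == "":
        return "missing"
    text = str(cell)
    if EXACT.match(text):
        return "exact"
    if RANGE.match(text):
        return "range"
    return "qualitative"


# Made-up cells from one hypothetical parameter column.
cells = [35.5, "15–20 m", "about 30", "", 120]
labels = [classify_entry(c) for c in cells]

recorded = [label for label in labels if label != "missing"]
share_non_exact = sum(label != "exact" for label in recorded) / len(recorded)
print(labels, share_non_exact)
```

Applied column by column, a count of this kind would give the proportion of range-qualitative entries that Figure 5b is described as showing.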
4. L165: In this study, dam types are classified into four categories, namely sliding, collapses, flows, and unknown. This intuitive classification lacks clear criteria, or at least some discussion of the biases it could introduce, which is also needed to allow the reproducibility of the results.
Reply: We sincerely thank the reviewer for pointing this out. We completely agree that our initial intuitive classification lacked clear theoretical criteria, which could introduce subjective bias and hinder the reproducibility of the results.
To address this critical issue, we have abandoned the previous intuitive categorization. Instead, we have re-classified the primary dam-forming mechanisms based on the foundational and widely accepted mass-movement framework established by Costa and Schuster (1988). Specifically, the historical cases in our database have been rigorously re-evaluated and grouped into the following standardized categories:
- RSSS: Rock and debris avalanches; Rock and soil slumps and slides.
- UL: Undifferentiated landslides.
- MDEF: Mud, debris, and earth flows.
- FALL: Falls.
We have explicitly defined the inclusion criteria for each of these categories in the revised manuscript and updated Figure 6a accordingly. By adopting this established, process-based taxonomy, we have reduced subjective interpretation and provided a transparent basis that supports the reproducibility of our dataset in future comparative studies. The revised Figure 6 is in the supplement.
5. L334: In Figure 13, panels a to c, especially panel c, the number of landslide dam cases shows a clear and rapid increase since around 1980. It is not very clear whether this trend reflects a real increase in events, or if it is mainly due to improvements in observation methods, such as the development of remote sensing and satellite imagery, as well as better reporting and documentation. I suggest the authors could provide further discussion on the possible reasons behind this trend.
Reply: We highly appreciate the reviewer for raising this insightful point. We completely agree that the sharp exponential increase in recorded cases since the 1980s is heavily influenced by observational bias, and leaving this unexplained could indeed mislead readers into assuming an exclusive surge in physical geohazard frequencies.
Following your excellent suggestion, we have added a dedicated discussion to the “6.2 Benchmarking Against Existing Inventories” section of the revised manuscript to explicitly address the underlying reasons behind this trend. We now comprehensively discuss how the advent of civilian remote sensing, satellite imagery, and digital reporting has fundamentally enhanced global detection rates, while also briefly acknowledging the compounding physical factors such as climate-driven extreme weather and human infrastructure expansion.
We have added the following text to the revised manuscript:
It is important to note that the rapid increase in recorded cases since around 1980, consistently observed across panels (a) to (c) in Figure 13, should be interpreted with caution. This steep exponential trend primarily reflects significant improvements in observational methodologies rather than an exclusive sudden surge in physical geological events. The widespread application of civilian remote sensing and satellite imagery (e.g., the Landsat program) starting in the 1980s, coupled with the rapid advancement of digital databases and internet reporting, fundamentally enhanced the global detection and documentation rates of landslide dams, especially in remote mountainous regions. While climate-driven extreme weather and increased human infrastructure expansion into mountainous areas have indeed contributed to a real increase in geohazard frequencies, the pronounced spike in recent decades is heavily amplified by this observational bias.
6. L335: In panel d of Figure 13, the Costa database was originally published in 1991, yet the data shown in the figure appear to extend to 2020. A similar issue can also be observed in panel a. Is this underestimating the most recent columns? Or are their values extrapolated to the complete period they represent?
Reply: We thank the reviewer for this insightful observation. Regarding the temporal range in Figure 13, we would like to clarify the following:
In the original version, we used a unified 40-year interval (1980–2020) for the final bin across all panels. This bin accurately captured the records from the reference databases within their respective active periods, but its label did not reflect where each database actually ends. For example, Costa and Schuster’s dataset ends in 1991, yet its final records were grouped into the 1980–2020 bin, which gave the impression that the dataset extended to 2020.
To ensure absolute clarity and prevent any perceived overestimation, we have refined the temporal binning strategy in the revised Figure 13.
Instead of a fixed 40-year window for the final column, we have now implemented a customized truncation for each comparison. The final bin in each panel is now explicitly defined by the actual data cutoff year of each reference database (e.g., exactly 1980–1991 for panel d). Our own dataset has been synchronized to these exact same truncated periods to ensure a strictly "apples-to-apples" comparison. The revised Figure 13 is in the supplement.
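The truncated binning described above can be sketched as follows: the final bin edge is replaced by the cutoff year of the reference database, and both datasets are then counted over the same edges. This is an illustration under stated assumptions, not the authors' plotting code; the bin edges before 1980 and all event years are invented for the example, and only the 1980–1991 final bin for panel d comes from the reply.

```python
from bisect import bisect_right


def truncate_final_bin(edges, cutoff_year):
    """Replace the last bin edge with the reference database's cutoff year."""
    if not edges[-2] < cutoff_year <= edges[-1]:
        raise ValueError("cutoff year must fall inside the final bin")
    return edges[:-1] + [cutoff_year]


def count_per_bin(years, edges):
    """Count events per bin [edges[i], edges[i+1]); the last bin is closed.

    Events outside the overall range (e.g. after the cutoff) are ignored,
    which keeps both datasets on the same truncated period.
    """
    counts = [0] * (len(edges) - 1)
    for year in years:
        if edges[0] <= year <= edges[-1]:
            index = min(bisect_right(edges, year) - 1, len(counts) - 1)
            counts[index] += 1
    return counts


# Illustrative edges; only the 1980-1991 final bin is taken from the reply.
edges = truncate_final_bin([1900, 1940, 1980, 2020], 1991)

# Made-up event years for a reference database and for the new dataset.
reference_years = [1911, 1950, 1985, 1991]
own_years = [1933, 1983, 1990, 2008, 2018]

print(edges)
print(count_per_bin(reference_years, edges))
print(count_per_bin(own_years, edges))
```

Counting the new dataset over the same truncated edges drops its post-1991 events from panel d, so the last column compares like with like.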
7. References not cited in the ms:
Tacconi Stefanelli et al. Geoenvironmental Disasters (2015) 2:21 DOI 10.1186/s40677-015-0030-9
Peng et al. , Landslides (2012) 9:13–31 DOI 10.1007/s10346-011-0271-y
Liu et al. 2019. Earth-Science Reviews 197 (2019) 102895. https://doi.org/10.1016/j.earscirev.2019.102895
Reply: We sincerely apologize for this oversight and thank the reviewer for carefully checking our reference list.
We have now added the missing citations for Tacconi Stefanelli et al. (2015), Peng et al. (2012), and Liu et al. (2019) into the appropriate sections of the revised manuscript. Accordingly, their full details have been properly incorporated into the Reference list.
AC2: 'Reply on RC1', Xiangang Jiang, 17 Jun 2026
Dear Referee, thank you very much for your keen interest in our global landslide dam database and for providing such a thorough, data-centric assessment. In this revised manuscript, we have comprehensively addressed your concerns by explicitly contextualizing our dataset against foundational historical databases and clarifying our rigorous screening criteria for data selection. We have also moderated expectations regarding dynamic parameters to instead emphasize our standardized, multi-dimensional framework for the 902 vetted records spanning 1800–2020. Furthermore, we enhanced the practical utility of the Data Quality Flag (DQF) system by explicitly defining its uncertainty levels, and replaced our previous intuitive categorization with geomorphologically established classifications. Below, please find our detailed, point-by-point responses to your valuable comments.
1. Quantitative contribution of this DB
Fan et al. (2021, ESR) compiled a global database of landslide dams and extended the record to earlier historical periods. Fan et al. (2012, ESR) provided a comprehensive inventory of 828 landslide dams triggered by the 2008 earthquake in China. Furthermore, Liu et al. (2019, ESR; not cited in the manuscript) and Peng et al. (2012, Landslides) extended the temporal coverage of landslide dam and outburst flood databases in China to before 1400 AD, including parameters similar to those presented in this manuscript. Carlo et al. (2016, Engineering Geology; also not cited in the ms) also summarized approximately 300 landslide dam cases in Italy along with related parameters. These studies account for approximately two thirds of the total sample size of this manuscript, including most of the parameter information considered here. Parameters not included in those previous datasets, such as breach geometry, are provided only for a very limited number of scenarios.
Similarly, Cheng et al. (2025, ESSD; this one is rightly cited in the ms) published a global database of debris flow dams including 555 cases. Why are most of their data not included in the present database? What criteria were used to discriminate?
It therefore seems that the overall contribution in terms of landslide dam data compilation is limited and not clearly explained in the framework of previous works.
Reply: We sincerely thank the reviewer for raising this critical point and for pointing out the excellent regional and historical synthesis works by Fan et al. (2021), Peng et al. (2012), Cheng et al. (2025), Liu et al. (2019) and Carlo et al. (2016). We completely agree that establishing our dataset's quantitative contribution within the context of these previous massive efforts is essential.
Regarding the overlap with existing databases and the missing citations, we would like to clarify our data retrieval and citation methodology, and how our work builds upon these foundational studies:
1. Citation Strategy and Primary Source Traceability: During the compilation of our database, we extensively referenced the mainstream databases mentioned by the reviewer, i.e., those of Fan et al. (2021), Peng et al. (2012), Cheng et al. (2025), Liu et al. (2019), and Carlo et al. (2016), to identify and cross-check landslide dam cases. To ensure data traceability for future users, our initial citation strategy prioritized the primary literature in which the raw measurements were first reported over the secondary synthesis papers that compiled them. When drawing on a previous dataset, we therefore consulted the original sources it cited, verified each of them, and cited those original papers rather than the dataset paper itself. This can be confirmed in the reference section of the dataset. As a result, the original manuscript did not cite all of the relevant dataset papers; we agree with the reviewer that it should cite them. In the revised manuscript, we now explicitly cite these works in the Introduction and Discussion sections to acknowledge their contributions to regional and historical cataloging. Note that Carlo et al. (2016) (Stefanelli C T, Segoni S, Casagli N, et al. Geomorphic indexing of landslide dams evolution[J]. Engineering Geology, 2016, 208: 1-10.) state that their analysis was based on the dataset from the authors' 2015 article (Tacconi Stefanelli C, Catani F, Casagli N. Geomorphological investigations on landslide dams[J]. Geoenvironmental Disasters, 2015, 2(1): 21). Therefore, in Section 6.2 of our article, we place particular emphasis on citing Carlo et al. (2015).
2. The True Quantitative Contribution, Standardization and DQF Integration: While a significant portion of our raw cases overlaps with previous regional (e.g., Carlo et al. 2016 for Italy; Fan et al. 2012 for the Wenchuan earthquake) or historical (Peng et al. 2012) inventories, the contribution of our dataset lies in providing a rich set of parameters for landslide dams, along with global synthesis and rigorous standardization. The cases in our database were screened based on the completeness of core parameters: records were excluded if fewer than three physical parameters were available for a given case. These could be any three variables drawn from the following categories: geometric parameters (dam volume, dam height, dam width, and dam length); material-composition parameters; breach parameters (breach width, breach depth, peak breach discharge, and breach duration); and hydrological parameters, primarily the storage capacity of the dammed lake. Few previous datasets ensure that each case meets this three-parameter minimum. Moreover, in most previous datasets, individual cases largely lack breach parameters such as breach width, breach depth, peak breach discharge, and breach duration. Our dataset specifically provides these data, which is another highlight of our work. Furthermore, previous landslide-dam records were scattered across different sources, formats, measurement units, and regional reporting conventions.
We therefore did not merely aggregate existing cases; instead, we subjected them to a unified and strict spatial cross-referencing procedure and incorporated a Data Quality Flag (DQF) system. This approach transforms fragmented regional and historical records into a coherent, globally interoperable framework, while providing an explicit indication of data reliability and uncertainty for each documented event. Finally, although the completeness of some dynamic parameters remains relatively low because of objective observational limitations, incorporating these parameters into a standardized database structure is itself an important step forward. It provides a consistent framework for future updates and enables newly reported landslide-dam events to be integrated in a comparable and quality-controlled manner. We had already described the contributions of our dataset in the original manuscript, but to make it clearer to readers, we restated this content in Section 6.2, as follows:
“Compared with the reference inventories, this study provides a multi-dimensional framework that includes 11 types of fundamental dam parameters and 6 types of breach-related parameters, including peak discharge, released water volume, breach duration, breach depth, and breach top/bottom widths. The contribution of the database lies in expanding the archived parameter set for landslide dams while applying consistent parameter definitions, harmonized units, and data-quality flags. The cases in our database were screened based on the completeness of parameters. Records were excluded if fewer than three physical parameters were available for a given case. These parameters could be any three variables selected from the following categories: geometric parameters, including dam volume, dam height, dam width, and dam length; material-composition parameters; breach parameters, including breach width, breach depth, peak breach discharge, and breach duration; and hydrological parameters, primarily the storage capacity of the dammed lake. Conversely, few previous datasets ensure that each case contains at least three available physical parameters selected from the combined set of geometric, material-composition, breach, and hydrological variables. Moreover, many previous datasets contain limited information on breach-related variables, such as breach width, breach depth, peak discharge, and breach duration. The present inventory incorporates these breach-related variables in a standardized format, thereby improving cross-case comparison of documented breach processes. Also, our dataset establishes a standardized framework for landslide-dam records and provides a consistent basis for future updates, enabling newly reported events to be integrated in a comparable and quality-controlled manner.”
3. Discrimination Criteria and Integration of Cheng et al. (2025): Regarding the excellent recent work on debris flow dams by Cheng et al. (2025), their publication timeline closely overlapped with the finalization of our dataset. During our cross-validation, cases from their database were subjected to our strict standardized screening protocol (as detailed in Figure 2 of our manuscript). Our criteria required a minimum threshold of at least three physical parameters, irrespective of parameter category. The considered parameters included geometric variables, material-composition information, breach characteristics, and hydrological variables, such as dammed-lake storage capacity. Because debris flow dams often present highly transient and morphologically complex features, some historical cases in Cheng et al. (2025) lacked the specific minimum quantitative bounds required by our framework and were consequently filtered out to maintain the overall statistical rigor of our dataset.
However, we fully acknowledge that there are still valid cases within Cheng et al. (2025) and other recent literature that meet our criteria. As stated in our "Data Availability" section, this database is designed as a "living" repository. We are firmly committed to continually updating the database, and the systematic incorporation of newly published, high-quality cases from recent literature will be a primary focus of our upcoming database expansion (Version 4.0, DOI: https://doi.org/10.5281/zenodo.20728356).
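To make the screening rule in this reply concrete, the following is a minimal sketch of the "at least three physical parameters" check, assuming each record is a dictionary. The field names are hypothetical and will differ from the actual column headers in the spreadsheet; only the threshold of three and the parameter categories come from the text above.

```python
import math
from typing import Mapping, Optional

# Hypothetical field names (assumption); the real column headers may differ.
PHYSICAL_FIELDS = (
    "dam_volume", "dam_height", "dam_width", "dam_length",  # geometric
    "material_composition",                                 # material composition
    "breach_width", "breach_depth",
    "peak_discharge", "breach_duration",                    # breach
    "storage_capacity",                                     # hydrological
)
MIN_PARAMETERS = 3  # threshold stated in the reply


def count_available(record: Mapping[str, Optional[object]]) -> int:
    """Count physical parameters that are present and non-empty."""
    count = 0
    for field in PHYSICAL_FIELDS:
        value = record.get(field)
        if value is None:
            continue
        if isinstance(value, str) and not value.strip():
            continue  # blank cell
        if isinstance(value, float) and math.isnan(value):
            continue  # missing numeric value
        count += 1
    return count


def passes_screening(record: Mapping[str, Optional[object]]) -> bool:
    """Return True if the record meets the three-parameter minimum."""
    return count_available(record) >= MIN_PARAMETERS
```

The check is deliberately category-agnostic, mirroring the statement that the three parameters may come from any of the listed categories. Fields outside `PHYSICAL_FIELDS`, such as location or date, do not count toward the minimum.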
2. Qualitative contribution
The authors emphasize that the main contribution of this dataset lies in filling the gap in dynamic breach parameters within global landslide dam inventories, thereby distinguishing it from previous studies such as those by Shi et al (2022) and Wu et al (2022). However, upon examination of the dataset, it appears that the completeness of the more novel dynamic parameters, such as breach top width, bottom width, and breach duration, remains below 5%. This is understandable since these parameters typically rely on costly monitoring equipment or real time field observations, but maybe the expectation created on the reader about the “global” scale of the data set could be moderated. Instead, the authors could clearly explain the quantitative contribution of their dataset compared to similar studies, to help assessing the significance of this compilation.
Another interesting parameter added in this study is the particle size distribution of the dam, abbreviated as PSD. However, among the total of 902 cases, only 8 include this information. Interesting as these measures are, their availability seems insufficient for statistical or comparative hazard analyses across regions or dam types.
Finally, the manuscript mentions a “Data Quality Flag” (DQF) meant to constrain uncertainties. This DQF is highlighted in the abstract as “a point-by-point system incorporated into the dataset, transparently classifying the spatial, geometric, and hydrodynamic uncertainties for every cataloged event”. However, the last columns in the database spreadsheet simply show three levels of uncertainty (“high”, “medium”, “low”) that are not described or discussed in the ms, and they apply to three groups of parameters, not to each of the parameters separately. These DQF therefore remain of little use for the end user of the database.
I believe the limitations above should be either clearly stated or satisfactorily solved, to avoid false expectations.
Reply: We sincerely thank the reviewer for the detailed examination of our dataset. Our dataset encompasses 902 cases compiled on a global scale. The overall completeness of the parameters is distributed as follows: peak discharge is recorded for 231 cases (25.6 %); dam height for 877 cases (97 %); dam width for 778 cases (86.3 %); and dam volume for 738 cases (81.8 %). Regarding breach geometries, the breach depth is documented in 89 cases (9.9 %).
The reviewer correctly pointed out that the completeness for parameters such as breach top width, bottom width, and breach duration remains below 5 %. In fact, across the entire dataset of 24 parameters, only five parameters fall below the 5 % threshold: breach top width (3.2 %), breach bottom width (3.4 %), breach duration (4.7 %), bed slope (4.4 %) and inflow rate (2.5 %). We completely agree with the reviewer's insightful assessment that "these parameters typically rely on costly monitoring equipment or real-time field observations." Consequently, acquiring measured values for these highly transient variables is inherently challenging worldwide, resulting in relatively lower completeness rates. However, it is important to emphasize that the remaining 17 parameters have completeness rates above 5 %, with the fundamental geometric parameters all exceeding 80 %. Given this data coverage and the geographical distribution of the 902 cases, we believe the designation of a "global" scale dataset is both scientifically justified and appropriate.
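As a quick check of the figures quoted in this reply, the short sketch below recomputes completeness as the share of non-empty records among the 902 cases and lists any parameter falling under the 5 % threshold. It uses only the counts stated above; the variable and function names are illustrative.

```python
TOTAL_CASES = 902
THRESHOLD_PERCENT = 5.0

# Counts of non-empty records as quoted in the reply.
reported_counts = {
    "peak discharge": 231,
    "dam height": 877,
    "dam width": 778,
    "dam volume": 738,
    "breach depth": 89,
}


def completeness(count: int, total: int = TOTAL_CASES) -> float:
    """Return the non-empty share of a parameter as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


percentages = {
    name: round(completeness(count), 1)
    for name, count in reported_counts.items()
}
sparse = [name for name, pct in percentages.items() if pct < THRESHOLD_PERCENT]

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f} %")
print("below threshold:", sparse)
```

None of the five parameters with stated counts falls below the threshold. The five sparse parameters named in the reply are given only as percentages, so they are not recomputed here.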
Compared with previous databases, the contributions of our database are as follows. This study establishes a multi-dimensional framework encompassing 11 types of fundamental dam parameters and, crucially, 6 types of detailed breach parameters (including peak discharge, released water volume, breach duration, breach depth, and top/bottom widths). The contribution of our dataset lies in providing a rich set of parameters for landslide dams, along with global synthesis and rigorous standardization. The cases in our database were screened based on the completeness of core parameters: records were excluded if fewer than three physical parameters were available for a given case. These parameters could be any three variables selected from the following categories: geometric parameters, including dam volume, dam height, dam width, and dam length; material-composition parameters; breach parameters, including breach width, breach depth, peak breach discharge, and breach duration; and hydrological parameters, primarily the storage capacity of the dammed lake. Furthermore, few previous datasets ensure that each case contains at least three available physical parameters selected from the combined set of geometric, material-composition, breach, and hydrological variables. Moreover, in most previous datasets, individual cases largely lack breach parameters of landslide dams, such as breach width, breach depth, peak breach discharge, and breach duration. Our dataset specifically provides these data, which is another highlight of our work.
Also, our dataset establishes a standardized framework for landslide-dam records and provides a consistent basis for future updates, enabling newly reported events to be integrated in a comparable and quality-controlled manner. In the manuscript, Figure 14 quantitatively visualizes this advancement: while the subplots for the reference databases (Figure 14b–e) exhibit massive data voids regarding dynamic and breaching characteristics, subplot (a) demonstrates unprecedentedly dense and continuous records for these highly transient variables. The systematic integration of both static morphologies and dynamic failure metrics effectively bridges a critical gap in the existing literature. By providing these essential boundary conditions, the presented dataset facilitates a crucial transition from static hazard cataloging to dynamic hydrodynamic routing and quantitative downstream risk modeling.
Particle-size distribution information for landslide-dam materials is difficult to obtain because it usually requires field sampling, laboratory testing, or detailed post-event investigation, which are unavailable for many historical or remote cases. In the revised inventory, we therefore rechecked the original literature and supplemented the available particle-size distribution records, increasing the number of cases with such information from 8 to 46. Particle size is recorded as a length, and all values in the dataset are given in millimeters. Particle size distribution information is thus available for 46 cases, corresponding to approximately 5.1 % of the 902 compiled records. We have added the related content in Section 3.1.
Furthermore, we have qualitatively categorized the landslide dam materials into four types: Soil-dominated, Boulders dominant, Boulders mixed with soil, and Soil contains boulders. This classification is consistent with the classification of landslide dam materials in the Chinese industry code for emergency response and risk assessment of barrier lakes, i.e., the Code for risk classification and emergency measures of barrier lake (SL/T450-2021). The collection of PSD data for landslide dams is challenging due to factors such as field monitoring techniques and environmental conditions. Nevertheless, the four qualitative material types provided by our dataset can greatly facilitate the risk assessment of landslide dams. For example, in the Code for risk classification and emergency measures of barrier lake (SL/T450-2021), the qualitative classification of dam materials can be used to estimate the median particle size of the dam material and to evaluate the risk level of the landslide dam. This type of material classification information is absent from previous landslide dam datasets and represents one of the advantages of our dataset over previous ones.
We reclassified the data into four categories: basic information, time parameters, geometric parameters, and hydrological parameters. These four categories cover all the data. To objectively manage the observational limitations and temporal survivorship biases inherent in historical records, this study implemented a category-based Data Quality Flag (DQF) system. This framework systematically categorizes spatial location accuracy, geometric precision, and hydrodynamic reliability based on the technological era and measurement constraints of each event. By explicitly establishing these confidence levels, the dataset provides a transparent and structured metric for data reliability, enabling researchers to apply physically meaningful boundary conditions. We have completed the DQF system and assigned DQF ratings to each of the four data categories. We have also removed the misleading "point-by-point" terminology from the abstract. The DQF now functions as a filtering tool, enabling end-users to isolate data subsets with low or medium uncertainty that match their specific modeling precision requirements.
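The filtering use described above can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary and that a DQF column holds one of the uncertainty levels "high", "medium", or "low" mentioned in the referee's comment. The column name `dqf_geometric` and the sample records are hypothetical and are not taken from the dataset.

```python
from typing import Iterable, List, Mapping

# By default keep records rated low or medium uncertainty.
ACCEPTED_BY_DEFAULT = frozenset({"low", "medium"})


def filter_by_dqf(
    records: Iterable[Mapping[str, object]],
    dqf_column: str,
    accepted: frozenset = ACCEPTED_BY_DEFAULT,
) -> List[Mapping[str, object]]:
    """Keep records whose uncertainty flag in dqf_column is accepted."""
    selected = []
    for record in records:
        # Normalize case and whitespace; a missing flag never passes.
        flag = str(record.get(dqf_column, "")).strip().lower()
        if flag in accepted:
            selected.append(record)
    return selected


# Made-up records for illustration only.
sample = [
    {"id": 1, "dam_height": 60.0, "dqf_geometric": "Low"},
    {"id": 2, "dam_height": 35.0, "dqf_geometric": "high"},
    {"id": 3, "dam_height": 82.0, "dqf_geometric": "medium"},
    {"id": 4, "dam_height": 20.0},
]
usable = filter_by_dqf(sample, "dqf_geometric")
print([r["id"] for r in usable])
```

A stricter analysis could pass `accepted=frozenset({"low"})` to retain only the lowest-uncertainty subset.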
Reference
Yang Q G, Cai Y J, Liu Z M. Code for risk classification and emergency measures of barrier lake (SL/T450-2021)[S]. 2021.(in Chinese)
3. L134: In Figure 4, define "existence time".
Fig 5b and L153-160: Define “uncertain parameters”. Where can this number be seen in the spreadsheet? The uncertainty is in that file binned in 3 sets of parameters (columns AD to AF).
Reply: We deeply appreciate your meticulous editorial guidance. "Existence time" refers to the longevity of a landslide dam; we have added this definition to Figure 4.
We sincerely apologize for the ambiguity in our terminology, which understandably caused confusion. We thank the reviewer for pointing this out. We would like to clarify that the "uncertain parameters" shown in Figure 5b are completely distinct from the DQF "uncertainty ratings" (located in columns AD to AF).
The "uncertain parameters" in Fig. 5b should read "range-qualitative parameters". The term refers to data entries that are recorded as numerical ranges (e.g., "15–20 m") or qualitative descriptions rather than as exact, deterministic numbers. We have revised Fig. 5b accordingly.
To address your question regarding "Where can this number be seen in the spreadsheet?": these "uncertain parameters" do not have a dedicated column like the DQF ratings. Instead, they are the individual cell entries within the regular parameter columns that contain a range or descriptive text instead of a single numerical value.
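Because these entries have no dedicated column, a user who wants to reproduce the count would need to scan the parameter columns cell by cell. The sketch below shows one way this could be done; it is not the authors' procedure. It assumes that exact values are stored either as numbers or as a number with an optional unit, that ranges use a hyphen, dash, tilde, or "to" as separator, and that it is applied only to numeric parameter columns (date columns would be misread as ranges).

```python
import math
import re

# Range such as "15–20 m", "15-20", "15 to 20" (separators are assumptions).
_RANGE = re.compile(r"^\s*\d+(?:\.\d+)?\s*(?:-|–|—|~|to)\s*\d+(?:\.\d+)?")
# Single number with an optional trailing unit such as "15 m" or "2.5 m3".
_EXACT = re.compile(r"^[-+]?\d+(?:\.\d+)?(?:\s*[A-Za-z°%][\w/^·]*)?$")


def classify_entry(cell) -> str:
    """Classify a cell as 'empty', 'exact', 'range', or 'qualitative'."""
    if cell is None:
        return "empty"
    if isinstance(cell, (int, float)):
        if isinstance(cell, float) and math.isnan(cell):
            return "empty"
        return "exact"
    text = str(cell).strip()
    if not text:
        return "empty"
    if _RANGE.match(text):
        return "range"
    try:
        float(text)  # handles plain and scientific notation
        return "exact"
    except ValueError:
        pass
    if _EXACT.match(text):
        return "exact"
    return "qualitative"


def count_range_qualitative(cells) -> int:
    """Count entries recorded as a range or as descriptive text."""
    return sum(classify_entry(c) in ("range", "qualitative") for c in cells)
```

Applied to one parameter column, `count_range_qualitative` would give the number of range-qualitative entries for that parameter under the stated assumptions.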
4. L165: In this study, dam types are classified into four categories, namely sliding, collapses, flows, and unknown. This intuitive classification lacks some clear criteria or at least some discussion of the biases it could introduce, also to allow the reproducibility of the results.
Reply: We sincerely thank the reviewer for pointing this out. We completely agree that our initial intuitive classification lacked clear theoretical criteria, which could introduce subjective bias and hinder the reproducibility of the results.
To address this critical issue, we have abandoned the previous intuitive categorization. Instead, we have re-classified the primary dam-forming mechanisms based on the foundational and widely accepted mass-movement framework established by Costa and Schuster (1988). Specifically, the historical cases in our database have been rigorously re-evaluated and grouped into the following standardized categories:
1. RSSS: Rock and debris avalanches; Rock and soil slumps and slides.
2. UL: Undifferentiated landslides.
3. MDEF: Mud, debris, and earth flows.
4. FALL: Falls.
We have explicitly defined the inclusion criteria for each of these categories in the revised manuscript and updated Figure 6a accordingly. By adopting this established, process-based taxonomy, we have reduced subjective interpretation and provided a transparent basis that supports the reproducibility of our dataset in future comparative studies. The revised Figure 6 is in the supplement.
5.L334: In Figure 13, panels a to c, especially panel c, the number of landslide dam cases shows a clear and rapid increase since around 1980. It is not very clear whether this trend reflects a real increase in events, or if it is mainly due to improvements in observation methods, such as the development of remote sensing and satellite imagery, as well as better reporting and documentation. I suggest the authors could provide further discussion on the possible reasons behind this trend.
Reply: We highly appreciate the reviewer for raising this insightful point. We completely agree that the sharp exponential increase in recorded cases since the 1980s is heavily influenced by observational bias, and leaving this unexplained could indeed mislead readers into assuming an exclusive surge in physical geohazard frequencies.
Following your excellent suggestion, we have added a dedicated discussion to the “6.2 Benchmarking Against Existing Inventories” section of the revised manuscript to explicitly address the underlying reasons behind this trend. We acknowledge that the advent of civilian remote sensing, satellite imagery, and digital reporting has fundamentally enhanced global detection rates, while also briefly noting compounding physical factors such as climate-driven extreme weather and human infrastructure expansion.
We have added the following text to the revised manuscript:
“It is important to note the rapid increase in recorded cases since around 1980, consistently observed across panels (a) to (c) in Figure 13. The increase in documented records in recent decades should be interpreted in the context of improved source availability, remote-sensing capacity, digital reporting, and post-event documentation, rather than necessarily an exclusive sudden surge in physical geological events. The widespread application of civilian remote sensing and satellite imagery (e.g., the Landsat program), coupled with the rapid advancement of digital databases and internet reporting, fundamentally enhanced the global detection and documentation rates of landslide dams, especially in remote mountainous regions. While climate-driven extreme weather events and expanding human activities in mountainous areas have objectively raised the actual incidence of landslide dams, the explosive growth observed in recent decades is largely governed by the amplification of observational biases and does not fully map the true evolutionary trend of landslide dam frequency. Thus, the temporal comparison in Figure 13 is used here to evaluate record coverage and data structure among inventories, while emphasizing the parameter completeness and usability of the present database.”
6. L335: In panel d of Figure 13, the Costa database was originally published in 1991, yet the data shown in the figure appear to extend to 2020. A similar issue can also be observed in panel a. Is this underestimating the most recent columns, or are their values extrapolated to the complete period they represent?
Reply: We thank the reviewer for this insightful observation. Regarding the temporal range in Figure 13, we would like to clarify the following:
In the original version, we used a unified 40-year interval (1980–2020) for the final bin across all panels. This bin accurately captured the records from the reference databases within their respective active periods, but its label overstated their temporal extent. For example, Costa and Schuster’s dataset strictly ends in 1991, yet its final years were grouped into the 1980–2020 bin, which led to the misunderstanding that the dataset extended to 2020.
To ensure absolute clarity and prevent any perceived overestimation, we have refined the temporal binning strategy in the revised Figure 13.
Instead of a fixed 40-year window for the final column, we have now implemented a customized truncation for each comparison. The final bin in each panel is now explicitly defined by the actual data cutoff year of each reference database (e.g., exactly 1980–1991 for panel d). Our own dataset has been synchronized to these exact same truncated periods to ensure a strictly "apples-to-apples" comparison. The revised Figure 13 is in the supplement.
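The truncation described in this reply can be illustrated with a short sketch. It assumes bins are defined by ascending year edges, that every bin is half-open except the last, which includes its upper edge, and that both inventories are counted against the same truncated edges. The edge values before 1980 and the sample years are hypothetical; only the 1980–2020 final bin and the 1991 cutoff come from the text above.

```python
from bisect import bisect_right
from typing import List, Sequence


def truncated_edges(edges: Sequence[int], cutoff_year: int) -> List[int]:
    """Return bin edges whose final bin ends at cutoff_year."""
    if len(edges) < 2 or list(edges) != sorted(set(edges)):
        raise ValueError("edges must be at least two strictly ascending years")
    kept = [edge for edge in edges if edge < cutoff_year]
    if not kept:
        raise ValueError("cutoff_year must exceed the first edge")
    return kept + [cutoff_year]


def count_per_bin(years: Sequence[int], edges: Sequence[int]) -> List[int]:
    """Count events per bin; events outside the edge range are dropped."""
    counts = [0] * (len(edges) - 1)
    for year in years:
        if year < edges[0] or year > edges[-1]:
            continue
        # The last bin is closed on the right so the cutoff year is counted.
        index = min(bisect_right(edges, year) - 1, len(counts) - 1)
        counts[index] += 1
    return counts


# Hypothetical edges and event years, for illustration only.
original_edges = [1900, 1940, 1980, 2020]
panel_d_edges = truncated_edges(original_edges, 1991)
reference_years = [1905, 1950, 1985, 1991]
our_years = [1905, 1950, 1985, 1991, 2005]

print(panel_d_edges)
print(count_per_bin(reference_years, panel_d_edges))
print(count_per_bin(our_years, panel_d_edges))
```

Counting both inventories against `panel_d_edges` drops any event after the cutoff from the longer record, which is what keeps the final columns comparable.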
7. References not cited in the ms:
Tacconi Stefanelli et al. Geoenvironmental Disasters (2015) 2:21 DOI 10.1186/s40677-015-0030-9
Peng et al., Landslides (2012) 9:13–31 DOI 10.1007/s10346-011-0271-y
Liu et al. 2019. Earth-Science Reviews 197 (2019) 102895. https://doi.org/10.1016/j.earscirev.2019.102895
Reply: We sincerely apologize for this oversight and thank the reviewer for carefully checking our reference list.
We have now added the missing citations for Tacconi Stefanelli et al. (2015), Peng et al. (2012), and Liu et al. (2019) into the appropriate sections of the revised manuscript. Accordingly, their full details have been properly incorporated into the Reference list.
-
AC1: 'Reply on RC1', Xiangang Jiang, 13 May 2026
-
RC2: 'Comment on essd-2026-107', Anonymous Referee #2, 27 May 2026
This manuscript presents a new global landslide dam inventory containing 902 documented cases worldwide and aims to address the long-standing lack of dynamic breach-related parameters in existing datasets. The database integrates geomorphological, hydrological, temporal, and breach-related variables and introduces a Data Quality Flag (DQF) framework to characterize uncertainties associated with historical records and observational limitations. The topic is highly relevant to the communities of geomorphology, hydrology, geohazards, and risk assessment, and the effort devoted to compiling and standardizing such a dataset is appreciated. The dataset itself is potentially valuable for future statistical analyses, susceptibility assessment, breach modeling, and machine-learning applications.
Although the database construction effort is commendable, the current manuscript still exhibits several important shortcomings regarding scientific rigor, methodological transparency, uncertainty quantification, and demonstration of novelty relative to existing inventories. In its current form, the manuscript reads more as a descriptive summary of compiled parameters than a rigorously validated Earth System Science Data contribution. Substantial revisions are therefore required before the manuscript can be considered for publication. Some suggestions are provided as follows:
(1) The manuscript repeatedly emphasizes that the main novelty lies in the inclusion of breach-related parameters; however, the actual completeness of these key variables remains extremely low. For example, breach top width and breach bottom width are available for only ~3 % of the cases, released water volume and breach duration for ~5 %, and breach depth for ~10 %. Therefore, the claim that the database “bridges the data gap” may currently be overstated. The manuscript should more carefully distinguish between “including rare parameters” and “providing statistically representative global coverage” for those parameters.
(2) The methodological description of data acquisition and integration remains too general for a data paper in ESSD. The manuscript states that approximately 2,000 records were compiled and filtered into 902 cases, but the exact workflow for literature retrieval, duplicate identification, parameter extraction, conflict resolution, and source prioritization is insufficiently reproducible. For example, it is unclear how contradictory values from different references were handled, whether priority was given to field observations over secondary compilations, and how derived versus directly measured parameters were distinguished.
(3) The uncertainty framework (DQF system) is conceptually useful, but it remains largely qualitative and subjective. The assignment of uncertainty levels based mainly on historical period (e.g., pre-1950, post-2000) oversimplifies the actual variability in data quality. Some historical events may have excellent post-event reconstructions, while some modern cases may still rely on indirect estimation. The manuscript should clarify whether uncertainty flags were assigned manually case-by-case or automatically using predefined rules, and whether inter-review consistency checks were performed.
(4) The temporal increase in landslide dam occurrence is interpreted partly as evidence of climate-change-driven hazard escalation. However, the observed increase is very likely dominated by reporting bias, improved remote sensing, population growth, and enhanced scientific documentation. The manuscript currently risks overinterpreting the temporal trend as a physical increase in hazard occurrence. This interpretation should be significantly toned down unless supported by a rigorous bias-corrected statistical analysis.
(5) Some important variables listed in Table 1 are ambiguously defined. For example, “dam length,” “dam width,” and “breach depth” may have different definitions across studies. The manuscript should provide strict parameter definitions and explain whether geometric variables refer to crest length, valley-blocking width, longitudinal extent, vertical incision depth, or other interpretations.
(6) The manuscript states that “particle size distribution” is included as a parameter in Table 1; however, this variable is barely discussed elsewhere in the manuscript, and its completeness, classification standard, and representation format remain unclear.
Some minor suggestions:
(7) The manuscript is generally readable, but the writing is excessively verbose in many sections and repeatedly uses highly promotional language. Expressions such as “high-fidelity,” “robust,” “highly reliable,” “unprecedentedly dense,” and “solid empirical foundation” appear too frequently and should be moderated to maintain scientific objectivity.
(8) Figure 1 only presents parameter completeness percentages but does not show the actual number of valid records. Including both percentages and sample counts would improve interpretability.
(9) In Figure 5, the term “parameter certainty” is somewhat misleading because the figure actually represents the number of uncertain parameters rather than certainty itself.
(10) Several references appear duplicated or inconsistent. For example, Fan et al. (2021a) and Fan et al. (2021b) seem to refer to the same publication.
(11) Some terminology requires standardization. For example, “released water volume,” “outburst volume,” and “released flood volume” may refer to similar concepts but should use consistent wording.
(12) Table 1 would be clearer if parameter definitions or abbreviations were standardized across datasets rather than relying solely on the original naming conventions.
Citation: https://doi.org/10.5194/essd-2026-107-RC2 -
AC3: 'Reply on RC2', Xiangang Jiang, 17 Jun 2026
Dear Referee, thank you very much for your careful assessment of our manuscript and for providing such detailed, data-focused, and constructive comments. In this revised manuscript, we have addressed your concerns by clarifying the scope and contribution of the global landslide-dam inventory, refining the parameter definitions and terminology, improving the description of data screening and uncertainty assessment, and moderating interpretations of temporal trends and sparsely reported variables. The dataset has been updated; please see the V4 version (https://doi.org/10.5281/zenodo.20728356) for the revised data. We believe that these revisions have improved the transparency, reproducibility, and practical usability of the dataset. Below, please find our detailed point-by-point responses to your valuable comments.
(1) The manuscript repeatedly emphasizes that the main novelty lies in the inclusion of breach-related parameters; however, the actual completeness of these key variables remains extremely low. For example, breach top width and breach bottom width are available for only ~3 % of the cases, released water volume and breach duration for ~5 %, and breach depth for ~10 %. Therefore, the claim that the database “bridges the data gap” may currently be overstated. The manuscript should more carefully distinguish between “including rare parameters” and “providing statistically representative global coverage” for those parameters.
Reply: We thank the reviewer for this important and constructive comment. We agree that the completeness of several breach-related variables is low and that these parameters should not be presented as statistically representative global samples. In the revised manuscript, we have therefore moderated the relevant statements and more clearly distinguished between the inclusion and preservation of rare breach-related parameters and the provision of statistically representative global coverage for those parameters.
Specifically, we revised Sect. 2.2 to explicitly acknowledge that transient breach-related variables, such as breach top width, breach bottom width, released water volume, breach duration, and breach depth, remain sparsely documented because they are difficult to observe during short-lived dam-failure processes and are rarely available in historical records. The revised Sect. 2.2 reads as follows:
“2.2 Parameter completeness and variable composition
To transparently illustrate the data abundance of the constructed global landslide-dam inventory, Figure 2 quantifies the data completeness, expressed as the non-empty percentage, of the parameters across the 902 vetted cases. The statistical distribution reveals a clear hierarchy in data availability, ranging from statistically well-represented parameters to rare but systematically archived breach parameters. The statistically well-represented component is dominated by fundamental dam-geometry parameters. Specifically, dam height shows the highest completeness, reaching approximately 97.2 %, followed by dam type (97 %), dam length (87 %), date of formation (86 %), dam width (84 %), and dam volume (82 %). These variables describe the basic morphology of the dam body and provide sufficient records for descriptive statistics and categorical summaries within the inventory. In addition, a group of supporting contextual parameters shows moderate to high completeness and helps characterize the environmental setting and general attributes of the documented landslide dams. Parameters such as storage capacity (72 %), trigger factors (69 %), catchment area (61 %), and material composition (61 %) provide essential information on impounded-lake conditions, dam classification, triggering mechanisms, and dam-material characteristics. Their relatively high completeness supports categorical grouping and comparative summaries within the inventory.
Figure 2. Completeness of parameters in the global landslide-dam inventory. Bar lengths indicate the percentage completeness of each parameter among the 902 compiled cases, and labels at the end of the bars indicate the corresponding number of valid records. The letter n represents the number of landslide dam cases.
Beyond these statistically well-represented and supporting contextual parameters, the inventory also incorporates rare but systematically archived breach and transient hydrological variables that have historically been sparsely documented in global landslide-dam datasets. Peak discharge is recorded for 231 cases, providing an important archived sample of outflow magnitude within the present inventory. Existence time (longevity of landslide dams), lake surface area, and water depth are available for 231, 182, and 176 cases, respectively, and provide useful information on dam longevity and impounded-lake conditions. The database also records several detailed breach-process variables and particle size parameters of landslide dam materials, including date of failure (n = 89), breach depth (n = 89), mean river runoff (n = 66), released water volume (n = 47), particle size distribution (n = 46), breach duration (n = 42), bed slope (n = 40), breach bottom width (n = 31), breach top width (n = 29), and inflow rate during dam failure (n = 21). Although the data for these variables are limited, they play an important role in describing the breach geometry, flow forces, discharge conditions, the dam failure process, and the grain-size conditions. By incorporating these rare breach variables, the inventory enables landslide-dam breach analyses to consider a broader set of process-relevant information rather than relying only on basic dam geometry or general event descriptions. Although these parameters are not yet available for all cases, their systematic compilation and standardization improve the usability of information scattered across case reports, technical documents, and post-event investigations. These records provide a structured basis for comparing well-documented breach cases, supporting selected empirical analyses, and guiding future updates of the inventory.”
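The completeness measure described in the revised Sect. 2.2 (non-empty percentage together with the number of valid records) is straightforward to recompute from the inventory table. The following is a minimal Python sketch under stated assumptions: the field names (`dam_height_m`, `peak_discharge_m3s`, `breach_top_width_m`) and the toy records are hypothetical and do not reflect the actual column names or values of the published dataset.

```python
from typing import Dict, List, Optional, Tuple

# A record maps a (hypothetical) field name to a value, or None when missing.
Record = Dict[str, Optional[float]]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, Tuple[int, float]]:
    """Return {field: (valid_count, percent_non_empty)} for each field."""
    total = len(records)
    summary: Dict[str, Tuple[int, float]] = {}
    for field in fields:
        # A value counts as valid when it is neither None nor an empty string.
        valid = sum(1 for r in records if r.get(field) not in (None, ""))
        percent = 100.0 * valid / total if total else 0.0
        summary[field] = (valid, round(percent, 1))
    return summary


# Toy records for illustration only (not taken from the inventory).
records: List[Record] = [
    {"dam_height_m": 82.0, "peak_discharge_m3s": 6500.0, "breach_top_width_m": None},
    {"dam_height_m": 30.0, "peak_discharge_m3s": None, "breach_top_width_m": None},
    {"dam_height_m": 12.5, "peak_discharge_m3s": None, "breach_top_width_m": 45.0},
    {"dam_height_m": None, "peak_discharge_m3s": None, "breach_top_width_m": None},
]

summary = completeness(
    records, ["dam_height_m", "peak_discharge_m3s", "breach_top_width_m"]
)
for name, (count, percent) in summary.items():
    print(f"{name}: n = {count}, {percent} %")
```

Reporting the count next to the percentage matters most for the sparse variables: with the figures quoted above, 29 valid breach top width records among 902 cases correspond to about 3.2 %, in line with the roughly 3 % noted by the referee.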
We also clarified that these sparse parameters may support targeted case-based comparisons, uncertainty-aware exploratory analyses, and selected dam-breach modeling studies, but should not be used as globally representative samples for generalized statistical inference without appropriate caution. Accordingly, we have reduced or removed overgeneralized expressions such as “bridges the data gap” where they could imply complete or representative coverage, and replaced them with more precise wording emphasizing the structured archival and standardized integration of historically scarce breach-related information.
As the reviewer noted, 'the claim that the database “bridges the data gap” may currently be overstated.' We have revised the paper title to 'An Enhanced Global Database of Landslide Dams: Integrating Geomorphic, Hydrological, Temporal, and Breach-Related Parameters.'
(2) The methodological description of data acquisition and integration remains too general for a data paper in ESSD. The manuscript states that approximately 2,000 records were compiled and filtered into 902 cases, but the exact workflow for literature retrieval, duplicate identification, parameter extraction, conflict resolution, and source prioritization is insufficiently reproducible. For example, it is unclear how contradictory values from different references were handled, whether priority was given to field observations over secondary compilations, and how derived versus directly measured parameters were distinguished.
Reply: We thank the reviewer for this constructive comment. We agree that the original methodological description was too general and insufficiently reproducible for a data paper. In the revised manuscript, we have substantially expanded Sect. 2.1 to describe the full workflow of data acquisition, source tracing, duplicate identification, parameter extraction, conflict resolution, source prioritization, and parameter-completeness screening. The revised text now explains that the initial pool of approximately 2,000 records was compiled from existing global and regional inventories, peer-reviewed academic papers, engineering reports, historical news, official institutional reports, and remote-sensing interpretations. Literature retrieval was conducted through keyword-based searches, source-dataset tracing, and reference tracking. For cases obtained from existing inventories, the original references cited by those inventories were checked whenever available to trace event descriptions and parameter values back to primary or more detailed sources.
We also clarified that duplicate identification was performed at the event level through cross-comparison of formation time, available failure time, reported blockage location, dam geometry, lake characteristics, peak discharge, and other available attributes, rather than by coordinates, river names, or trigger events alone. Records were merged only when they could be traced to the same landslide-dam event, as indicated by the same or highly consistent formation time, the same blockage location, and identical or mutually consistent key attributes.
For parameter extraction and conflict resolution, we revised the manuscript to explain that numerical and categorical information was extracted from tables, text descriptions, figures, supplementary materials, existing datasets, and the original references cited by those datasets whenever available. When multiple sources reported different values for the same parameter, the differences were examined in relation to the parameter definition and reporting context. Different values were not treated automatically as errors, because they may reflect different reporting conventions, such as maximum, minimum, mean, median, or range values. Priority was therefore given to sources that provided detailed parameter descriptions, clear measurement or reconstruction contexts, richer associated attributes, or explicit explanations of how the value was obtained.
In response to the reviewer’s concern regarding directly measured and derived data, we further clarified that directly measured values were generally prioritized over derived or secondary values when both types of information were available. Field-survey measurements, instrumental observations, and values traceable to original post-event investigations were treated as preferred records. Derived values, including empirical estimates, model outputs, hydraulic back-analyses, or values recalculated from other reported parameters, were used only when directly measured records were unavailable. Derived values were usually identifiable from descriptions of the calculation procedure, empirical formula, reconstruction method, or modeling approach. For values reported in secondary compilations, the cited original references were checked whenever possible to determine whether the value was directly measured or derived.
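The preference for directly measured over derived values can be expressed as a simple ranking. The sketch below is illustrative only: the `ReportedValue` structure, the method labels (`"measured"`, `"derived"`, `"unknown"`), the source labels, and the numeric ranks are assumptions introduced here, and the sketch does not capture the additional judgment described above (parameter definitions, reporting context, richness of associated attributes), which was applied case by case.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical ranking: a lower rank is preferred. The labels are assumptions.
METHOD_RANK = {"measured": 0, "derived": 1, "unknown": 2}


@dataclass
class ReportedValue:
    value: float
    method: str  # "measured", "derived", or "unknown" (hypothetical labels)
    source: str  # free-text source label


def select_value(candidates: List[ReportedValue]) -> Optional[ReportedValue]:
    """Pick the candidate with the most preferred method.

    Derived values are returned only when no measured value is present.
    Ties keep the first candidate in the input order.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda c: METHOD_RANK.get(c.method, len(METHOD_RANK)))


# Toy example: two sources report different peak discharges for one event.
candidates = [
    ReportedValue(7000.0, "derived", "empirical estimate in a secondary compilation"),
    ReportedValue(6500.0, "measured", "post-event field investigation"),
]
chosen = select_value(candidates)
print(chosen)
```

Keeping the method label alongside the selected value, rather than discarding it, preserves the distinction between measured and derived entries for later uncertainty flagging.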
Finally, we clarified that the final inventory was obtained by applying a minimum parameter-completeness threshold. Records were retained only when at least three important physical parameters were available for a given case; records with fewer than three were excluded. These parameters could be any three variables selected from the following categories: geometric parameters, including dam volume, dam height, dam width, and dam length; material-composition parameters; breach parameters, including breach width, breach depth, peak breach discharge, and breach duration; and hydrological parameters, primarily the storage capacity of the dammed lake. Through this revised workflow, the data acquisition and integration procedure is now more transparent and reproducible. The revised text is as follows:
“2.1 Standardized screening and quality control
To improve the consistency, traceability, and transparency of the dataset, we implemented a multi-stage standardization and quality-control protocol (Figure 1). An initial pool of approximately 2,000 preliminary records was compiled by integrating existing global and regional inventories, peer-reviewed academic papers, historical news, engineering and official institutional reports, and remote-sensing interpretations. Literature retrieval was conducted using a combination of keyword-based searches, source-dataset tracing, and reference tracking. Peer-reviewed studies were searched using combinations of terms such as “landslide dam”, “barrier lake”, “quake lake”, “dammed lake”, “outburst flood”, “dam breach”, “breach discharge”, and “breach geometry”. For cases obtained from existing inventories, the original references cited by those inventories were further checked whenever available, so that event descriptions and parameter values could be traced back to primary or more detailed sources where possible. Additional contextual information was supplemented from engineering reports, historical news articles, official institutional reports, and remote-sensing interpretations.
The initial filtering phase focused on event-level deduplication and spatial verification. Because historical landslide dams are often reported under different local names, translated names, or basin-scale descriptions, duplicate identification was not based solely on the coordinates reported in the literature, river names, or triggering events. Potential duplicate records were therefore evaluated at the event level by comparing formation time, available failure time, reported blockage location, dam geometry, lake characteristics, peak discharge, and other available attributes. Records were merged only when they could be traced to the same landslide-dam event, as indicated by the same or consistent formation time, the same blockage location, and identical or mutually consistent key attributes. In contrast, records from the same earthquake, basin, river reach, or landslide cluster were retained as independent events when the source literature indicated that multiple landslide dams had formed, or when their formation time, location, or physical attributes differed. Descriptions in the original literature regarding the number of landslide dams formed within the same basin or triggering event were also used as an important basis for distinguishing independent cases from duplicate records. For records with conflicting coordinates for the same historical event, the most plausible blockage location was identified using high-resolution satellite imagery, topographic context, and source descriptions where available.
Figure 1. Methodological flowchart for the compilation of the global landslide-dam inventory, illustrating the main workflow from multi-source data collection to parameter extraction, case screening and georeferencing.
Parameter extraction was conducted for every event. Numerical and categorical information was extracted from tables, text descriptions, figures, supplementary materials, existing datasets, and the original references cited by those datasets whenever available. All extracted numerical values were converted to the standardized units adopted in this inventory. When a source reported a numerical range rather than a single value, the range information was retained and recorded as a range-based entry. Qualitative descriptions were incorporated only when their physical meaning could be assigned to one of the standardized database fields.
When multiple sources reported different values for the same parameter, the differences were first examined in relation to the parameter definition and reporting context. In some cases, different values did not necessarily represent errors but reflected different reporting conventions, such as maximum, minimum, mean, median, or range values. Priority was therefore given to sources that provided detailed parameter descriptions, clear measurement or reconstruction contexts, richer associated attributes, or explicit explanations of how the value was obtained. In addition, directly measured values were generally prioritized over derived or secondary values when both types of information were available. Field-survey measurements, instrumental observations, and values traceable to original post-event investigations were treated as preferred records. Derived values, including empirical estimates, model outputs, hydraulic back-analyses, or values recalculated from other reported parameters, were used only when directly measured records were unavailable. Derived values were usually identifiable from descriptions of the calculation procedure, empirical formula, reconstruction method, or modeling approach. For values reported in secondary compilations, the cited original references were checked whenever possible to determine whether the value was directly measured or derived.
A minimum parameter-completeness threshold was then applied to improve the usability of the database for geomorphological and hydrological analyses. Records were excluded if fewer than three important parameters were available for a given case. These parameters could be selected from geometric variables, including dam volume, dam height, dam width, and dam length; material-composition information; breach-related variables, including breach width, breach depth, peak discharge, and breach duration; and hydrological variables, primarily the storage capacity of the dammed lake. Through this workflow, the initial raw data pool was refined into a standardized inventory of 902 landslide-dam cases.
After the parameter-completeness screening, georeferencing was conducted for all retained cases. Because coordinates reported in different sources may differ, and some historical records do not provide explicit coordinates, the geographic location of each case was checked and recorded as a final step. Where necessary, Google Earth, high-resolution satellite imagery, topographic context, and source descriptions were used to verify the blockage location or to infer the most plausible location from available geographic information. The final longitude and latitude were then recorded in the standardized inventory.”
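The minimum parameter-completeness threshold described in the revised Sect. 2.1 amounts to counting non-empty screening parameters per record and keeping records with at least three. A minimal Python sketch follows; the field names in `SCREENING_FIELDS` are hypothetical stand-ins for the categories listed in the text (geometry, material composition, breach variables, storage capacity), and the toy records are invented for illustration.

```python
from typing import Any, Dict, List

# Hypothetical field names standing in for the screening categories in Sect. 2.1.
SCREENING_FIELDS = [
    "dam_volume", "dam_height", "dam_width", "dam_length",   # geometry
    "material_composition",                                   # material
    "breach_width", "breach_depth", "peak_discharge", "breach_duration",  # breach
    "storage_capacity",                                       # hydrology
]
MIN_PARAMETERS = 3  # threshold stated in the revised text


def count_available(record: Dict[str, Any]) -> int:
    """Count screening fields that are present and non-empty in a record."""
    return sum(1 for f in SCREENING_FIELDS if record.get(f) not in (None, ""))


def screen(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep records with at least MIN_PARAMETERS available screening fields."""
    return [r for r in records if count_available(r) >= MIN_PARAMETERS]


# Toy records for illustration only.
raw = [
    {"name": "A", "dam_height": 82.0, "dam_volume": 2.0e7, "peak_discharge": 6500.0},
    {"name": "B", "dam_height": 30.0, "material_composition": ""},
    {"name": "C", "dam_width": 600.0, "dam_length": 300.0, "storage_capacity": 1.0e6,
     "breach_depth": None},
]
retained = screen(raw)
print([r["name"] for r in retained])
```

Fields outside `SCREENING_FIELDS`, such as the name, do not count toward the threshold, which mirrors the restriction of the rule to physical parameters.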
(3) The uncertainty framework (DQF system) is conceptually useful, but it remains largely qualitative and subjective. The assignment of uncertainty levels based mainly on historical period (e.g., pre-1950, post-2000) oversimplifies the actual variability in data quality. Some historical events may have excellent post-event reconstructions, while some modern cases may still rely on indirect estimation. The manuscript should clarify whether uncertainty flags were assigned manually case-by-case or automatically using predefined rules, and whether inter-review consistency checks were performed.
Reply: We thank the reviewer for this important comment. We agree that uncertainty in heterogeneous landslide-dam records cannot be adequately represented by a simple historical-period classification. In the revised manuscript, we have substantially revised Sect. 6.1 to clarify the logic, classification rules, uncertainty-review procedure, and implementation of the Data Quality Flag (DQF) framework.
Specifically, we clarified that the DQF levels were assigned manually at the parameter level using predefined category-specific rules, rather than automatically according to historical period alone. For each parameter value, the source description, measurement or reconstruction method, reporting precision, spatial or temporal specificity, and traceability to original references were jointly considered. Historical or technological period was retained as an important supporting criterion, particularly for geometrical parameters, but it was not used as the sole determinant of uncertainty. This clarification addresses the reviewer’s concern that some historical events may have reliable post-event reconstructions, whereas some recent cases may still rely on indirect estimates or incomplete reporting.
We also revised the manuscript to retain and clarify the explicit Low, Medium, and High uncertainty criteria for the four parameter groups, including basic information, time parameters, geometrical parameters, and hydrological parameters. For basic information, the uncertainty levels are mainly related to coordinate precision, with thresholds of < 1 km, 1–10 km, and > 10 km. For time parameters, the levels distinguish records with dates accurate to a specific day, records limited to a year, season, or month, and records based on geological dating or ancient texts. For geometrical parameters, the original period-based categories, namely Post-2000, 1950–1999, and Pre-1950, were retained as supporting criteria, but we clarified that these categories were considered together with measurement method, source type, spatial resolution, and level of methodological detail. For hydrological parameters, the uncertainty levels were evaluated according to whether values were supported by direct in-situ or instrumental measurements, empirical equations, hydraulic back-analysis, flood-mark leveling, historical damage descriptions, or paleo-flood sediment evidence.
Regarding uncertainty review and inter-review consistency, we added clarification that the DQF criteria were first discussed among the authors to ensure that the same classification rules were applied throughout the database. After the initial parameter-level assignment, ambiguous or borderline uncertainty flags were reviewed by multiple authors through re-examination of the original sources, parameter definitions, measurement or reconstruction context, and supporting evidence. When different interpretations occurred, the final uncertainty level was determined after discussion and source re-examination. If the available evidence was insufficient to justify a lower-uncertainty classification, a higher uncertainty level was assigned.
These revisions make the DQF framework more transparent and reproducible while acknowledging that it remains a qualitative but structured parameter-level uncertainty indicator for heterogeneous historical and modern landslide-dam records. The revised text is as follows:
“6.1 Data Uncertainty and Observational Limitations
Although rigorous standardized screening protocols were implemented during the database construction, quantifying the inherent uncertainties within historical landslide-dam records remains critical for the objective application of this dataset. Given the extensive temporal span of the inventory (pre-2020), many historical and pre-instrumental records intrinsically lack exact numerical error margins. To objectively manage the observational limitations and temporal survivorship biases inherent in historical records, this study implemented a category-based Data Quality Flag (DQF) system. We systematically classified the relevant database parameters into four distinct groups, arranged sequentially from left to right in the dataset: basic information, time parameters, geometrical parameters, and hydrological parameters. Crucially, all database fields, excluding event identifiers, are subjected to this uncertainty rating. Event identifiers—specifically Dam name, Country (or Location), and Reference—are strictly treated as factual metadata ([Non-DQF]). Because these fields merely serve as textual labels for case identification rather than physically meaningful variables utilized in subsequent geohazard modeling, they are completely excluded from the classification and assessment.
The DQF levels were assigned manually at the parameter level using predefined category-specific rules, rather than automatically according to historical period alone. For each parameter value, the source description, measurement or reconstruction method, reporting precision, spatial or temporal specificity, and traceability to original references were jointly considered. Historical or technological period was retained as an important supporting criterion, particularly for geometrical parameters, but it was not used as the sole determinant of uncertainty. This is because some historical events may have reliable post-event reconstructions, whereas some recent records may still rely on indirect estimates or incomplete reporting. To improve inter-review consistency, the DQF criteria were discussed among the authors and applied using the same classification rules throughout the database. Ambiguous or borderline cases were revisited by multiple authors through source re-examination, with particular attention to the original sources, parameter definitions, measurement or reconstruction context, and supporting evidence. When different interpretations occurred, the final flag was assigned after discussion and source re-examination. If the available evidence was insufficient to justify a lower-uncertainty classification, a higher uncertainty level was assigned.
Regarding “Uncertainty basic information,” which evaluates the precision of geographical coordinates based on data sources and geocoding resolution, “Low (High Precision)” indicates exact coordinates validated by modern GPS or high-resolution remote sensing (error < 1 km); “Medium (Moderate Precision)” reflects approximate coordinates inferred from landmarks or historical maps (error 1 to 10 km); and “High (Low Precision)” represents vague spatial positioning reliant on qualitative historical texts or large-scale paleo-geomorphological inferences without specific geocoding (error > 10 km).
For “Uncertainty time parameters,” a “Low” rating denotes highly precise spatial and temporal records featuring exact latitude and longitude coordinates along with formation dates accurate to a specific day (YYYY-MM-DD), typically derived from modern monitoring or high-resolution remote sensing; a “Medium” rating applies to records with clear spatial locations but exhibiting temporal ambiguity, such as timing limited to a specific year, season, or month; and a “High” rating characterizes records with vague locations or geological/prehistoric timing based on dating techniques, such as “ka BP”, or ancient texts.
For “Uncertainty geometrical parameters,” the technological era was used as a supporting proxy for measurement accuracy together with the measurement method, source type, spatial resolution, and level of methodological detail. A “Low” rating (Post-2000) generally indicates dimensions extracted from high-resolution DEMs, UAV photogrammetry, LiDAR data, or clearly documented modern field measurements; a “Medium” rating (1950–1999) indicates parameters derived from early aerial photography, standard cartography, topographic maps, or post-event reconstructions with moderate methodological detail; and a “High” rating (Pre-1950) denotes dimensions relying heavily on historical archives, qualitative descriptions, approximate estimates, or paleo-geomorphological reconstructions. Where the documented measurement method or reconstruction evidence clearly supported a different uncertainty level, the flag was adjusted manually rather than assigned mechanically by period alone.
Finally, for “Uncertainty hydrological parameters,” which evaluates transient variables, a “Low” rating is reserved for exceptional cases with direct measurements from in-situ gauges or clearly documented instrumental observations; a “Medium” rating applies to values rigorously back-calculated using empirical equations, hydraulic back-analysis, or post-event flood-mark leveling; and a “High” rating is assigned to qualitative estimates broadly derived from historical damage descriptions or paleo-flood sediment deposits with inherently wide error margins.
By providing multi-dimensional DQF metadata, the inventory allows users to assess the relative reliability of different parameter groups and to apply case-specific filters according to their research objectives. The DQF should be interpreted as a categorical indicator of source quality, parameter availability, and documentation detail, rather than as a formal quantitative error estimate. The overall distribution of DQF levels indicates clear differences in data quality among parameter groups. Time parameters show the lowest apparent uncertainty, with 95.34 % of the records assigned a “Low” flag and only 0.11 % assigned a “High” flag. In contrast, hydrological parameters are characterized by substantially higher uncertainty, with 74.72 % of the records assigned a “High” flag and no records assigned a “Low” flag. This reflects the difficulty of directly observing transient breach-related flow variables, such as peak discharge, released water volume, and breach duration, especially for historical events. Basic information is mainly assigned “Medium” uncertainty, whereas geometrical parameters show a larger proportion of “High” uncertainty, reflecting differences in source detail, measurement methods, geocoding precision, post-event surveys, and the availability of topographic or remote-sensing data. Therefore, the DQF metadata provide a structured means of identifying records and parameter groups with different reliability levels, supporting more appropriate use of the inventory in comparative analyses, statistical summaries, and model calibration or validation.”
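Two aspects of the DQF scheme lend themselves to a short illustration: the coordinate-precision thresholds for basic information (< 1 km, 1 to 10 km, > 10 km) and the use of flags as case-specific filters. The Python sketch below is an assumption-laden example, not the authors' procedure: the flags in the dataset were assigned manually, the treatment of the exact 1 km and 10 km boundaries as "Medium" is a choice made here, and the field name `dqf_hydrological` is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def coordinate_flag(error_km: float) -> str:
    """Map a positional error (km) to the Low/Medium/High levels of Sect. 6.1.

    Boundary values of exactly 1 km and 10 km are treated as "Medium"
    (an assumption; the text gives the classes as < 1, 1-10, and > 10 km).
    """
    if error_km < 0:
        raise ValueError("positional error must be non-negative")
    if error_km < 1.0:
        return "Low"
    if error_km <= 10.0:
        return "Medium"
    return "High"


def filter_by_flag(
    records: List[Dict[str, Any]], flag_field: str, allowed: Iterable[str]
) -> List[Dict[str, Any]]:
    """Keep records whose flag in `flag_field` is one of the allowed levels."""
    allowed_set = set(allowed)
    return [r for r in records if r.get(flag_field) in allowed_set]


# Toy records with a hypothetical flag column.
cases = [
    {"name": "A", "dqf_hydrological": "Medium"},
    {"name": "B", "dqf_hydrological": "High"},
    {"name": "C", "dqf_hydrological": None},
]
usable = filter_by_flag(cases, "dqf_hydrological", ["Low", "Medium"])
print(coordinate_flag(0.3), coordinate_flag(5.0), coordinate_flag(25.0))
print([r["name"] for r in usable])
```

A filter of this kind would let a user restrict model calibration to hydrological values that were at least back-calculated, while keeping "High" uncertainty records available for descriptive summaries.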
(4) The temporal increase in landslide dam occurrence is interpreted partly as evidence of climate-change-driven hazard escalation. However, the observed increase is very likely dominated by reporting bias, improved remote sensing, population growth, and enhanced scientific documentation. The manuscript currently risks overinterpreting the temporal trend as a physical increase in hazard occurrence. This interpretation should be significantly toned down unless supported by a rigorous bias-corrected statistical analysis.
Reply: We thank the reviewer for this important comment. We agree that the temporal increase in documented landslide-dam records should not be interpreted as a climate-change-driven increase in physical occurrence without rigorous bias correction. In the revised manuscript, we have toned down the relevant interpretation and removed statements that could imply direct attribution of the temporal pattern to climate-driven hazard escalation.
The Introduction has been revised to acknowledge possible climate-related influences on slope instability while also emphasizing that the growing number of documented records in recent decades reflects improved remote-sensing capacity, expanded reporting networks, and increased scientific attention. In Sect. 4.1, the temporal distribution is now described primarily as the chronology of documented database records, with explicit attention to temporal heterogeneity in observational capability, historical record preservation, and data availability, as follows:
“This increase in recorded events is likely influenced by multiple factors, including improved scientific reporting, greater availability of historical and engineering documents, population expansion into mountain regions, enhanced disaster documentation, and the development of aerial photography, satellite remote sensing, DEM products, and digital data archiving. Climatic and environmental changes, such as increases in extreme rainfall, glacier retreat, permafrost degradation, and related slope instability in some mountain regions, may also contribute to changes in landslide-dam occurrence. However, the present inventory does not apply a bias-corrected statistical analysis capable of separating these physical signals from reporting and observation biases. Therefore, the temporal distribution presented here should be interpreted as a record of documented landslide-dam cases and data availability through time, rather than as a direct measure of the global temporal trend in landslide-dam occurrence.”
We also revised the discussion of Figure 13 so that it focuses on temporal coverage and cross-dataset comparison rather than on inferring a physical occurrence trend. These revisions clarify that the manuscript does not use the temporal distribution to attribute changes in landslide-dam occurrence to climate forcing or other physical drivers.
(5) Some important variables listed in Table 1 are ambiguously defined. For example, “dam length,” “dam width,” and “breach depth” may have different definitions across studies. The manuscript should provide strict parameter definitions and explain whether geometric variables refer to crest length, valley-blocking width, longitudinal extent, vertical incision depth, or other interpretations.
Reply: We thank the reviewer for this helpful comment. We agree that several geometric variables in landslide-dam studies can be ambiguously defined across different sources, especially terms such as dam length, dam width, and breach depth. To address this issue, we have added Table 1 to provide stricter definitions, physical interpretations, and measurement descriptions for all archived parameters, as follows:
“
Table 1. Database parameter definitions.
| Parameter | Definition |
|---|---|
| Location | The specific country where the landslide dam is located. |
| Bed slope | The average longitudinal gradient of the original riverbed at the dam site. |
| Dam type | The classification of the landslide dam based on the primary geological hazard that directly triggered its formation. |
| Reference | The original literature from which the case data were extracted. |
| Casualties | The direct or indirect human fatalities and injuries resulting from the formation or breaching of the landslide dam. |
| Dam name | The specific appellation used in historical literature or official reports (e.g., Tangjiashan landslide dam). If a single hazard event forms multiple landslide dams within the same basin and the literature only provides geographical descriptions without specific names, they are named and numbered sequentially based on their geographical location and river basin (e.g., [River Name] Landslide Dam 1, 2, 3). |
| Trigger factors | The external dynamic forces (e.g., rainfall, earthquake, snowmelt) that induced the slope failure or mass movement resulting in river blockage. |
| Material composition | The primary geological materials constituting the dam body. |
| Particle size distribution | Reported grain-size range or representative particle-size values of the dam materials. |
| Longitude and latitude | The precise geographical coordinates of the landslide dam. |
| Breach duration | The elapsed time of the physical breaching process, measured from the onset of significant overflow or breach initiation to the point when the water level stabilizes. |
| Existence time | The total longevity of the dam, measured in days from the complete blockage of the river to the occurrence of breaching or artificial mitigation. |
| Date of failure | The specific date (year, month, day) when the landslide dam breached. |
| Date of formation | The specific date (year, month, day) when the slope failure occurred and blocked the river. |
| Dam height | The maximum vertical relief of the dam body, measured from the lowest point of the original riverbed at the blockage section to the lowest overtopping point or effective crest elevation before failure or artificial breaching. |
| Dam width | The cross-valley extent of the dam body measured approximately perpendicular to the original river channel, from one valley side to the other across the blockage. This field corresponds to the valley-blocking or transverse dimension of the dam rather than the longitudinal extent along the river. |
| Dam length | The longitudinal extent of the dam body measured approximately parallel to the original river channel, representing the upstream–downstream thickness of the blockage along the valley floor. This field is distinct from the cross-valley dam width. |
| Breach depth | The maximum vertical incision depth of the breach channel, measured from the pre-failure dam crest or overtopping level to the final breach invert after the breaching event. |
| Breach top width | The width of the eroded breach channel measured at or near the pre-failure dam crest elevation after breaching. |
| Breach bottom width | The width of the eroded breach channel measured at the lowest breach bottom after breaching. |
| Dam volume | The total volume of the landslide or avalanche mass constituting the blockage dam. |
| Water depth | The average water level (or average water depth) of the barrier lake prior to failure or under stable conditions. |
| Catchment area | The total upstream drainage area contributing surface runoff to the control cross-section of the landslide dam. |
| Lake surface area | The maximum water surface area when the barrier lake reaches its highest water level prior to failure. |
| Inflow rate | The discharge of the upstream river flowing into the lake area during the formation of the landslide lake or prior to failure. |
| Mean runoff of river | The multi-year average flow rate of the blocked river under normal conditions or based on historical records. |
| Peak discharge | The maximum instantaneous flow rate of the water released through the breach channel during the dam failure process. |
| Storage capacity | The total volume of water impounded within the barrier lake prior to failure. |
| Released water volume | The total volume of impounded lake water discharged downstream through the breach channel during the dam-failure process. |
”
In the revised Table 1, the potentially ambiguous geometric variables are now explicitly defined according to their physical orientation relative to the original river channel, dam body, and breach channel. Specifically, dam width is defined as the cross-valley extent of the dam body measured approximately perpendicular to the original river channel, from one valley side to the other across the blockage. This definition corresponds to the valley-blocking or transverse dimension of the dam rather than the longitudinal extent along the river. Dam length is defined as the longitudinal extent of the dam body measured approximately parallel to the original river channel, representing the upstream–downstream thickness of the blockage along the valley floor. Breach depth is defined as the maximum vertical incision depth of the breach channel, measured from the pre-failure dam crest or overtopping level to the final breach invert after the breaching event.
We also clarified in the text that when different terminologies were adopted in the original references, the contextual descriptions, figures, tables, and methodological explanations were examined to determine the actual physical meaning of each parameter before assigning it to the corresponding standardized database field. These revisions clarify whether each geometric variable refers to a valley-blocking width, longitudinal extent, crest-related geometry, or vertical incision depth, and improve the reproducibility of parameter interpretation across heterogeneous source documents.
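As an illustration of how one of the Table 1 definitions translates into a computation, the "existence time" field (days from complete blockage to breaching or artificial mitigation) can be sketched as a simple date difference. This is a minimal sketch only: the function name `existence_time_days` and the example dates are hypothetical and are not taken from the dataset or from the authors' processing workflow.

```python
from datetime import date


def existence_time_days(formed: date, failed: date) -> int:
    """Days between river blockage and breaching (or artificial mitigation)."""
    # A failure date earlier than the formation date signals a transcription
    # error in the source record, so it is rejected rather than silently used.
    if failed < formed:
        raise ValueError("failure date precedes formation date")
    return (failed - formed).days


# Hypothetical dates, chosen only to illustrate the calculation.
example_days = existence_time_days(date(2000, 1, 1), date(2000, 1, 31))
```

A same-day formation and failure yields 0 days under this sketch; sub-daily longevity would need a finer time resolution than the date fields defined in Table 1 provide.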
(6) The manuscript states that “particle size distribution” is included as a parameter in Table 1; however, this variable is barely discussed elsewhere in the manuscript, and its completeness, classification standard, and representation format remain unclear.
Reply: We thank the reviewer for pointing out this ambiguity. We agree that the previous manuscript did not sufficiently explain the terminology, completeness, representation format, and intended use of the particle-size-related information. In the revised manuscript, we have standardized the terminology for particle size distribution in the relevant parts of the manuscript, especially in Figure 4 and Table 1.
We also revised the manuscript to clarify that particle size distribution is retained as a supplementary material-related field in the Basic Information module, rather than as a parameter used for case screening or parameter-completeness filtering. This is because particle size distribution information is relatively complex and is reported inconsistently across the source documents. Depending on the original source, particle size distribution may include grain-size ranges or magnitudes, and representative particle-size values such as d5, d10, and d16. To improve the completeness of this field, we rechecked the source documents and supplemented the available particle size distribution records. As a result, the number of cases containing particle size distribution information increased from 8 in the previous version to 46 in the revised inventory, corresponding to approximately 5.1 % of the 902 compiled cases. This information is now reported as supplementary grain-size information where available.
The related contents have been added to Section 3.1 as follows:
“Particle size distribution is retained as a supplementary material-related field in the Basic Information module. In this inventory, particle size distribution records provide available grain-size information of dam materials, including reported grain-size ranges or magnitudes and representative particle-size values, such as d5, d10, and d16. The particle size distribution parameter is a length parameter, and its values in the dataset are all in millimeters. Particle size distribution information is available for 46 cases, corresponding to approximately 5.1 % of the 902 compiled records.”
(7) The manuscript is generally readable, but the writing is excessively verbose in many sections and repeatedly uses highly promotional language. Expressions such as “high-fidelity,” “robust,” “highly reliable,” “unprecedentedly dense,” and “solid empirical foundation” appear too frequently and should be moderated to maintain scientific objectivity.
Reply: We thank the reviewer for this helpful comment. We agree that the previous manuscript contained several overly promotional and verbose expressions. In the revised manuscript, we carefully reviewed the text and moderated the language throughout the Abstract, Introduction, Results, Discussion, and Conclusions.
Terms such as “high-fidelity,” “robust,” “highly reliable,” “unprecedentedly dense,” and “solid empirical foundation” were removed or replaced with more neutral wording, including “standardized,” “documented,” “structured,” “screened,” “sparse but archived,” and “parameter availability,” depending on the context. We also shortened several long sentences and revised statements that could imply overconfidence in the completeness, representativeness, or applicability of the dataset. These changes improve readability and help maintain a more objective scientific tone.
(8) Figure 1 only presents parameter completeness percentages but does not show the actual number of valid records. Including both percentages and sample counts would improve interpretability.
Reply: We thank the reviewer for this helpful suggestion. We agree that showing only percentage completeness may make it difficult for readers to evaluate the actual sample size available for each parameter. In the revised manuscript, we have updated Figure 1 to include both the completeness percentage and the corresponding number of valid records for each parameter.
Specifically, the bar lengths in Figure 1 now indicate the percentage completeness of each parameter among the 902 compiled cases, while the labels at the end of the bars indicate the actual number of valid records. We also revised the Figure 1 caption accordingly to make this representation explicit. This modification improves the interpretability of parameter completeness and allows readers to directly assess both relative completeness and absolute sample size.
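The pairing of percentage completeness with valid-record counts described above can be sketched as follows. This is an illustrative sketch, not the authors' plotting code: the function name `completeness`, the field names, and the toy records are all assumptions introduced here, and a record is assumed to be "valid" when the field is neither missing nor an empty string.

```python
def completeness(records: list[dict], fields: list[str]) -> dict[str, tuple[int, float]]:
    """Return {field: (valid_count, percent_of_all_records)}."""
    total = len(records)
    summary = {}
    for field in fields:
        # Assumption: a value is valid when it is present and not empty.
        valid = sum(1 for r in records if r.get(field) not in (None, ""))
        percent = 100.0 * valid / total if total else 0.0
        summary[field] = (valid, round(percent, 1))
    return summary


# Toy records with hypothetical field names; not data from the inventory.
toy_records = [
    {"dam_height_m": 40.0, "peak_discharge_m3s": 1000.0},
    {"dam_height_m": 30.0, "peak_discharge_m3s": None},
    {"dam_height_m": None, "peak_discharge_m3s": None},
    {"dam_height_m": 12.5, "peak_discharge_m3s": ""},
]
summary = completeness(toy_records, ["dam_height_m", "peak_discharge_m3s"])
```

Reporting the count next to the percentage matters most for sparse fields, where a small percentage of the 902 cases can correspond to only a handful of records.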
(9) In Figure 5, the term “parameter certainty” is somewhat misleading because the figure actually represents the number of uncertain parameters rather than certainty itself.
Reply: We thank the reviewer for this helpful comment. We agree that “parameter certainty” was a misleading term, because Figure 5b actually represents the number of parameters recorded as numerical ranges or qualitative descriptions rather than exact values.
In the revised manuscript, we replaced this terminology with “range-qualitative parameters” and revised the Figure 5 title, caption, and corresponding text accordingly. The revised text now defines these parameters as variables recorded as numerical ranges or qualitative descriptions rather than exact deterministic values. This clarification makes clear that Figure 5b shows the number of non-exact parameter records within each case, rather than parameter certainty itself.
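One way to count "range-qualitative parameters" per case is sketched below. The classification rules are assumptions made for illustration (a numeric value or numeric string is "exact", two numbers joined by a hyphen, en dash, or tilde form a "range", and any other non-empty text is "qualitative"); the names `classify` and `count_range_qualitative` and the toy record are hypothetical and do not describe how the authors produced Figure 5b.

```python
import re

# Two non-negative numbers separated by "-", "–", or "~", e.g. "30-50".
_RANGE = re.compile(r"^\s*\d+(\.\d+)?\s*[-–~]\s*\d+(\.\d+)?\s*$")


def classify(value) -> str:
    """Label a raw cell as 'missing', 'exact', 'range', or 'qualitative'."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return "missing"
    if isinstance(value, (int, float)):
        return "exact"
    text = str(value).strip()
    try:
        float(text)  # a plain numeric string counts as an exact value
        return "exact"
    except ValueError:
        pass
    if _RANGE.match(text):
        return "range"
    return "qualitative"


def count_range_qualitative(record: dict) -> int:
    """Number of fields in one case recorded as a range or a description."""
    return sum(1 for v in record.values() if classify(v) in ("range", "qualitative"))


# Toy case with hypothetical field names and values.
toy_case = {
    "dam_height": "30-50",
    "dam_width": 120,
    "breach_depth": "about half the dam height",
    "peak_discharge": "",
    "dam_length": "250.5",
}
n_non_exact = count_range_qualitative(toy_case)
```

Missing values are deliberately excluded from the count, so the result reflects non-exact records rather than gaps in the case.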
(10) Several references appear duplicated or inconsistent. For example, Fan et al. (2021a) and Fan et al. (2021b) seem to refer to the same publication.
Reply: We thank the reviewer for pointing out this inconsistency. We have carefully checked the reference list and the corresponding in-text citations in the revised manuscript. The duplicated Fan et al. (2021a/b) entries, which referred to the same publication, have been merged into a single reference as Fan et al. (2021), and all related in-text citations have been updated accordingly. We also reviewed the remaining references to remove duplicated entries and ensure consistency between the manuscript citations and the reference list.
(11) Some terminology requires standardization. For example, “released water volume,” “outburst volume,” and “released flood volume” may refer to similar concepts but should use consistent wording.
Reply: We thank the reviewer for pointing out this terminology inconsistency. We agree that terms such as “released water volume,” “outburst volume,” and “released flood volume” may refer to similar hydrological quantities but should not be used interchangeably without clear definition.
In the revised manuscript, we have standardized this terminology throughout the text, figures, and tables. We now consistently use “released water volume” to denote the total volume of impounded lake water discharged downstream through the breach channel during the dam-failure process. The parameter definition in Table 1 has also been revised accordingly to ensure that the term is used consistently and unambiguously across the manuscript.
We also checked other related hydrological terms to avoid confusion between released water volume and peak discharge. In the revised text, peak discharge refers to the maximum instantaneous flow rate during dam failure, whereas released water volume refers to the total discharged water volume. This standardization improves the clarity and consistency of the database terminology.
(12) Table 1 would be clearer if parameter definitions or abbreviations were standardized across datasets rather than relying solely on the original naming conventions.
Reply: We thank the reviewer for this helpful suggestion. We agree that parameter names and abbreviations should be standardized across datasets rather than relying solely on the original naming conventions. In the revised manuscript, Table 1 now provides unified parameter names, definitions, physical interpretations, and measurement descriptions for the present inventory.
When original sources or reference datasets used different terms, the parameters were matched according to their physical meanings rather than by exact wording. Contextual descriptions, figures, tables, and methodological explanations were examined to determine the corresponding standardized database field. We also standardized related terminology throughout the manuscript and the cross-dataset comparison, such as consistently using “released water volume” for the total volume of water discharged through the breach channel and reserving “peak discharge” for the maximum instantaneous flow rate. These revisions improve the clarity and reproducibility of the parameter dictionary and cross-dataset comparison.
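The term matching described above can be partly supported by an alias table, sketched here under stated assumptions. The three volume-related source terms come from the reviewer's comment; the standardized field identifiers, the `ALIASES` table, and the `standardize` function are hypothetical. A lookup of this kind covers only wording variants that are already known, and does not replace the contextual check of figures, tables, and methods described in the reply.

```python
from typing import Optional

# Source-term variants mapped to one standardized field identifier.
# The identifiers on the right are hypothetical names chosen for this sketch.
ALIASES = {
    "released water volume": "released_water_volume",
    "outburst volume": "released_water_volume",
    "released flood volume": "released_water_volume",
    "peak discharge": "peak_discharge",
}


def standardize(term: str) -> Optional[str]:
    """Return the standardized field for a source term, or None if unknown."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(term.lower().split())
    return ALIASES.get(key)


matched = standardize("Outburst  Volume")
unmatched = standardize("dam breadth")  # unknown term, left for manual review
```

Returning `None` for unknown terms, rather than guessing the closest match, keeps ambiguous source wording visible for manual interpretation.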
-
AC3: 'Reply on RC2', Xiangang Jiang, 17 Jun 2026
Data sets
Bridging the Data Gap: An Enhanced Global Inventory for Statistical Characterization and Breach Prediction of Landslide Dams Jiangang Jiang, Tao Wen, and Guoqiang Xiao https://doi.org/10.5281/zenodo.19198720
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 439 | 112 | 31 | 582 | 24 | 28 |
Despite the effort involved in putting together this database of landslide-dam breach episodes, the amount of novel data and the relationship between this dataset and similar previously published datasets remain unclear. Below I raise some concerns about the completeness and antecedents of the database under scrutiny that I think must be taken into account:
General points:
Fan et al. (2021, ESR) compiled a global database of landslide dams and extended the record to earlier historical periods. Fan et al. (2012, ESR) provided a comprehensive inventory of 828 landslide dams triggered by the 2008 earthquake in China. Furthermore, Liu et al. (2019, ESR; not cited in the manuscript) and Peng et al. (2012, Landslides) extended the temporal coverage of landslide dam and outburst flood databases in China to before 1400 AD, including parameters similar to those presented in this manuscript. Carlo et al. (2016, Engineering Geology; also not cited in the ms) also summarized approximately 300 landslide dam cases in Italy along with related parameters. These studies account for approximately two thirds of the total sample size of this manuscript, including most of the parameter information considered here. Parameters not included in those previous datasets, such as breach geometry, are provided only for a very limited number of scenarios.
Similarly, Cheng et al. (2025, ESSD; this one is rightly cited in the ms) published a global database of debris flow dams including 555 cases. Why are most of their data not included in the present database? What criteria were used to discriminate?
It therefore seems that the overall contribution in terms of landslide dam data compilation is limited and not clearly explained in the framework of previous works.
The authors emphasize that the main contribution of this dataset lies in filling the gap in dynamic breach parameters within global landslide dam inventories, thereby distinguishing it from previous studies such as those by Shi et al. (2022) and Wu et al. (2022). However, upon examination of the dataset, it appears that the completeness of the more novel dynamic parameters, such as breach top width, bottom width, and breach duration, remains below 5%. This is understandable, since these parameters typically rely on costly monitoring equipment or real-time field observations, but the expectation created in the reader about the “global” scale of the dataset could perhaps be moderated. Instead, the authors could clearly explain the quantitative contribution of their dataset compared to similar studies, to help assess the significance of this compilation.
Another interesting parameter added in this study is the particle size distribution of the dam, abbreviated as PSD. However, among the total of 902 cases, only 8 include this information. Interesting as these measures are, their availability seems insufficient for statistical or comparative hazard analyses across regions or dam types.
Finally, the manuscript mentions a “Data Quality Flag” (DQF) meant to constrain uncertainties. This DQF is highlighted in the abstract as “a point-by-point system incorporated into the dataset, transparently classifying the spatial, geometric, and hydrodynamic uncertainties for every cataloged event”. However, the last columns in the database spreadsheet simply show three levels of uncertainty (“high”, “medium”, “low”) that are not described or discussed in the ms, and they apply to three groups of parameters, not to each parameter separately. These DQFs therefore remain of little use to the end user of the database.
I believe the limitations above should be either clearly stated or satisfactorily solved, to avoid false expectations.
Minor points:
L134: In Figure 4, define "existence time".
Fig 5b and L153-160: Define “uncertain parameters”. Where can this number be seen in the spreadsheet? In that file, the uncertainty is binned into 3 sets of parameters (columns AD to AF).
L165: In this study, dam types are classified into four categories, namely sliding, collapses, flows, and unknown. This intuitive classification lacks clear criteria, or at least some discussion of the biases it could introduce, which is also needed for the reproducibility of the results.
L334: In Figure 13, panels a to c, especially panel c, the number of landslide dam cases shows a clear and rapid increase since around 1980. It is not very clear whether this trend reflects a real increase in events, or if it is mainly due to improvements in observation methods, such as the development of remote sensing and satellite imagery, as well as better reporting and documentation. I suggest the authors could provide further discussion on the possible reasons behind this trend.
L335: In panel d of Figure 13, the Costa database was originally published in 1991, yet the data shown in the figure appear to extend to 2020. A similar issue can also be observed in panel a. Does this underestimate the most recent columns, or are their values extrapolated to the complete period they represent?
References not cited in the ms: