Bridging the Data Gap: An Enhanced Global Inventory for Statistical Characterization and Hazard Assessment of Landslide Dams
Abstract. Landslide dams and their subsequent outburst floods represent cascading geohazards with profound socio-economic and morphological impacts. However, the widespread absence of dynamic breaching parameters in existing global inventories severely constrains quantitative hydrodynamic modeling and downstream risk assessment. To bridge this critical data void, this study presents a comprehensive global landslide dam dataset encompassing 902 rigorously vetted events spanning before 2020. Moving beyond traditional static cataloging, the assembled dataset integrates 11 fundamental morphological and triggering parameters with 6 highly transient breaching metrics. Notably, it significantly improves the data availability of historically scarce variables, including peak discharge, released water volume, and three-dimensional breach geometries. Spatially, the database achieves global coverage, with the highest data densities clustered within the Alpine-Himalayan and Circum-Pacific active belts. To objectively account for observational limitations and chronological biases across different technological eras, a point-by-point Data Quality Flag (DQF) system is incorporated into the dataset, transparently classifying the spatial, geometric, and hydrodynamic uncertainties for every cataloged event. This multi-dimensional and structurally transparent inventory provides a robust empirical foundation for future machine-learning-based hazard susceptibility mapping and physically-based dam-breach simulations. The dataset is publicly available at Zenodo https://doi.org/10.5281/zenodo.19198720 (Jiang et al. 2026).
Despite the effort involved in assembling this database of landslide-dam breach episodes, the amount of novel data remains unclear, as does the relationship between this dataset and similar previously published datasets. Below I raise some concerns about the completeness and antecedents of the database that I think must be taken into account:
General points:
Fan et al. (2021, ESR) compiled a global database of landslide dams and extended the record to earlier historical periods. Fan et al. (2012, ESR) provided a comprehensive inventory of 828 landslide dams triggered by the 2008 earthquake in China. Furthermore, Liu et al. (2019, ESR; not cited in the manuscript) and Peng et al. (2012, Landslides) extended the temporal coverage of landslide dam and outburst flood databases in China to before 1400 AD, including parameters similar to those presented in this manuscript. Carlo et al. (2016, Engineering Geology; also not cited in the ms) also summarized approximately 300 landslide dam cases in Italy along with related parameters. These studies account for approximately two thirds of the total sample size of this manuscript, including most of the parameter information considered here. Parameters not included in those previous datasets, such as breach geometry, are provided only for a very limited number of scenarios.
Similarly, Cheng et al. (2025, ESSD; this one is rightly cited in the ms) published a global database of debris flow dams including 555 cases. Why are most of their data not included in the present database? What criteria were used to decide which cases to include?
It therefore seems that the overall contribution in terms of landslide dam data compilation is limited and is not clearly placed in the context of previous work.
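One way the authors could quantify the overlap with earlier inventories is a simple case-matching exercise. The sketch below is only an illustration under stated assumptions: the field names (`lat`, `lon`, `year`) and the matching rule (coordinates rounded to 0.1 degree plus event year) are hypothetical and do not come from the manuscript or from any of the cited databases.

```python
def case_key(case, decimals=1):
    """Crude matching key (assumption): rounded coordinates plus event year."""
    return (round(case["lat"], decimals), round(case["lon"], decimals), case["year"])


def split_by_overlap(new_cases, previous_cases, decimals=1):
    """Split new_cases into those already present in previous_cases and the rest."""
    known = {case_key(c, decimals) for c in previous_cases}
    shared = [c for c in new_cases if case_key(c, decimals) in known]
    novel = [c for c in new_cases if case_key(c, decimals) not in known]
    return shared, novel
```

Rounding can miss duplicates that fall on either side of a grid boundary, so a distance tolerance and a manual check of the ambiguous cases would be needed in practice. Reporting the resulting shared and novel counts per source inventory would make the added value of the compilation explicit.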
The authors emphasize that the main contribution of this dataset lies in filling the gap in dynamic breach parameters within global landslide dam inventories, thereby distinguishing it from previous studies such as those by Shi et al. (2022) and Wu et al. (2022). However, upon examination of the dataset, it appears that the completeness of the more novel dynamic parameters, such as breach top width, bottom width, and breach duration, remains below 5%. This is understandable, since these parameters typically rely on costly monitoring equipment or real-time field observations, but the expectation created in the reader about the “global” scale of the dataset should perhaps be moderated. Instead, the authors could clearly state the quantitative contribution of their dataset relative to similar studies, to help readers assess the significance of this compilation.
Another interesting parameter added in this study is the particle size distribution of the dam, abbreviated as PSD. However, among the total of 902 cases, only 8 include this information. Interesting as these measures are, their availability seems insufficient for statistical or comparative hazard analyses across regions or dam types.
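A per-parameter completeness table would make the two points above easy for readers to verify. The following is a minimal sketch of how such a table might be computed from the spreadsheet rows; the column name `psd` used in the example and the set of missing-value markers are assumptions and would need to be adapted to the actual file.

```python
from __future__ import annotations

# Hypothetical markers for "no data"; the real spreadsheet may use others.
MISSING = {"", "-", "na", "n/a", "nan", "unknown"}


def is_filled(value) -> bool:
    """True if the cell holds something other than a missing-value marker."""
    if value is None:
        return False
    return str(value).strip().lower() not in MISSING


def completeness(rows, columns: list[str]) -> dict[str, float]:
    """Fraction of rows with a usable value, for each requested column."""
    rows = list(rows)
    if not rows:
        return {col: 0.0 for col in columns}
    return {
        col: sum(is_filled(row.get(col)) for row in rows) / len(rows)
        for col in columns
    }
```

For instance, a parameter reported in 8 of 902 cases has a completeness of about 0.9%. Listing this fraction for every column, alongside the corresponding values in earlier inventories, would let readers judge the quantitative contribution directly.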
Finally, the manuscript mentions a “Data Quality Flag” (DQF) meant to constrain uncertainties. This DQF is highlighted in the abstract as “a point-by-point system incorporated into the dataset, transparently classifying the spatial, geometric, and hydrodynamic uncertainties for every cataloged event”. However, the last columns in the database spreadsheet simply show three levels of uncertainty (“high”, “medium”, “low”) that are not described or discussed in the ms, and they apply to three groups of parameters, not to each parameter separately. These DQFs therefore remain of little use to the end user of the database.
I believe the limitations above should be either clearly stated or satisfactorily solved, to avoid false expectations.
Minor points:
L134: In Figure 4, define "existence time".
Fig 5b and L153-160: Define “uncertain parameters”. Where can this number be seen in the spreadsheet? In that file, uncertainty is binned into three groups of parameters (columns AD to AF).
L165: In this study, dam types are classified into four categories, namely sliding, collapses, flows, and unknown. This intuitive classification lacks clear criteria, or at least some discussion of the biases it could introduce, which is also needed to allow the results to be reproduced.
L334: In Figure 13, panels a to c, especially panel c, the number of landslide dam cases shows a clear and rapid increase since around 1980. It is not clear whether this trend reflects a real increase in events or is mainly due to improvements in observation methods, such as the development of remote sensing and satellite imagery, as well as better reporting and documentation. I suggest the authors provide further discussion of the possible reasons behind this trend.
L335: In panel d of Figure 13, the Costa database was originally published in 1991, yet the data shown in the figure appear to extend to 2020. A similar issue can also be observed in panel a. Does this underestimate the most recent columns, or are their values extrapolated to the complete period they represent?
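If the bins in Figure 13 extend beyond the end of an inventory's coverage, one option is to plot an annual rate computed only over the years each inventory actually covers. The sketch below illustrates the idea; the function name, the bin edges, and the cutoff year are hypothetical, since the manuscript does not state how the final bins were treated.

```python
def annual_rate(count, bin_start, bin_end, cutoff):
    """Cases per year within a bin, counting only years before the inventory cutoff.

    Returns None when the bin lies entirely after the cutoff (no coverage).
    """
    covered_years = min(bin_end, cutoff) - bin_start
    if covered_years <= 0:
        return None
    return count / covered_years
```

With a cutoff of 1991, a bin spanning 1980 to 2000 would be normalized by 11 years rather than 20, and a bin spanning 2000 to 2020 would be left empty instead of being shown as zero.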
References not cited in the ms: