Reply on RC2

This manuscript pulls in several interesting global datasets to add more data and a global perspective to the existing literature on wildfire and landslides. Currently, there are a few relatively large challenges for the manuscript that lead to a general lack of clarity. I will point out several of these challenges and potential solutions that might help the authors refine their description to enhance clarity and, ultimately, the usability of the results.

source, and include the following additional analysis to highlight this issue: Subsequently, the Wilcoxon rank-sum (Mann-Whitney) tests comparing pre-landslide precipitation percentiles are repeated with the data split into high- and low-accuracy groups (<= 1 km and > 1 km, respectively). The number of days with statistically significant differences in precipitation percentile within the 14 days and the 7 days prior to the landslide is computed for each group. Finally, a similar analysis compares debris flows (labeled as 'debris flow' or 'mudslide' in the GLC) with other types of mass movements.
In addition, we include a new figure (figure3a_pvalues_debrisflow) and the analysis described above, comparing the day-of-landslide precipitation percentile of the undifferentiated 'landslide' group with that of landslides specifically labeled as 'mudslide' or 'debris flow': (Fig. 3a panel (d)), the landslide type has limited impact on the number of days with significant differences (p < 0.05) in precipitation in the 14 days prior to the landslide in regions with any such significant differences. For example, in California (Fig. 3a panel (b)), nine days have a statistically significant difference for both groups. In the Intermountain West, eight days have a statistically significant difference for debris flows, compared with six days for other types of mass movements.
The second challenge is the imprecision in the spatial location of your landslide database. Currently you are using a 10 km buffer to see if there are burned areas near the landslide. Shallow landslides can be extremely small (on the order of 10-100 m in cross-hillslope width if you are talking about true landslides and not debris flows). A buffer of 10 km will often be much larger than a wildfire perimeter, so it would be very easy to accidentally classify an unburned landslide as burned, resulting in spurious conclusions. Moreover, in many studies that focus on true landslides after fire, the rainstorms that trigger slides in burned areas also trigger slides in unburned areas (see, for example, Meyer et al., 2001). I suggest that you carve out a small case study to convince readers that you have a handle on the location or can quantify the uncertainty. If you can use a subset of the data with very well-known locations and show the applicability at a known location with post-fire landsliding, I think this will help people trust the generalizations you make.
We appreciate this concern. Additional analysis is included to explore the magnitude of the uncertainty introduced by location errors, as described in the Methods section: To explore the effects of variability in location accuracy and landslide type within the GLC, validation analyses were performed to quantify the extent of errors due to these factors. First, the percentage of burned sites in each region was computed for each location accuracy. Subsequently, the Mann-Whitney hypothesis tests comparing pre-landslide precipitation percentiles were repeated with the data split into high- and low-accuracy groups (<= 1 km and > 1 km, respectively). The number of days with statistically significant differences in precipitation percentile within the 14 days and the 7 days prior to the landslide was computed for each group. Figure 3b shows p-values for Mann-Whitney hypothesis tests comparing precipitation percentiles of the burned and unburned groups for high- and low-accuracy landslide locations, where high accuracy indicates a location accuracy of 1 km or less. Several regions, such as California (Fig. 3b panel (b)), show substantial differences between the high-accuracy and low-accuracy p-values. Sample sizes of burned locations among the high-accuracy locations are low, ranging from 2 to 34 in each region, with only 3.7% of high-accuracy landslides classified as burned overall (below the threshold used to exclude regions from this study). The low percentage of burned sites may partially account for the high p-values in the high-accuracy group. An additional important consideration is the likelihood of a greater number of false positive burned sites in the low-accuracy group. Notably, the percentage of sites identified as burned using this method increases with the location accuracy radius: globally, 12.5% of low-accuracy landslides were identified as burned, in contrast with only 3.7% of high-accuracy landslides.
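As an illustration of the accuracy split described above, the following is a minimal Python sketch, not the code used in the study. The record fields (`region`, `location_accuracy_km`, `burned`) and the function name are hypothetical; it assumes each landslide has already been classified as burned or unburned, and it uses the 1 km threshold from the text (<= 1 km is high accuracy).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LandslideRecord:
    # Hypothetical fields for illustration; not the GLC's actual column names.
    region: str
    location_accuracy_km: float
    burned: bool


def burned_percentages(records, threshold_km=1.0):
    """Percentage of landslides classified as burned, per region and accuracy group.

    High accuracy: location accuracy <= threshold_km; low accuracy: > threshold_km.
    An extra pseudo-region, "all regions", pools every record.
    """
    tallies = defaultdict(lambda: [0, 0])  # (region, group) -> [n_burned, n_total]
    for rec in records:
        group = "high" if rec.location_accuracy_km <= threshold_km else "low"
        for key in ((rec.region, group), ("all regions", group)):
            tallies[key][0] += int(rec.burned)
            tallies[key][1] += 1
    return {key: 100.0 * n_burned / n_total
            for key, (n_burned, n_total) in tallies.items()}
```

Applied to the full catalogue, the pooled ("all regions", "high") and ("all regions", "low") entries would play the role of the global 3.7% and 12.5% figures quoted above.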
Finally, we expand the discussion: Low landslide location accuracy and a lower number of burned landslides may also have contributed to the lack of conclusive results in the Pacific Northwest, Southeast Asia, and Central America. The regions outside the US and Canada tended to have less accurate landslide locations. Furthermore, less accurate locations were also more likely to be marked as burned, with a threefold increase in the percentage of landslides identified as burned between the high- and low-accuracy groups. This occurs because larger landslide radii are more likely to contain burned area by chance alone and hence to become 'false positive' post-wildfire landslides, i.e., landslides that occurred near, but not within, a burned area. This idea is supported by the lower cumulative burned fractions within the regions outside the US and Canada (see Fig. 1 panels (c) and (d)). Though landslide accuracy in the GLC is an approximate measure, introducing the possibility of false negative unburned sites, false positive post-wildfire landslides nonetheless represent an important potential source of uncertainty in this analysis. These uncertainties introduce the possibility that some of the differences in triggering precipitation percentiles between burned and unburned sites may be related to unique qualities of fire-prone areas rather than to fire itself. The degree to which fires and landslides are statistically linked also contributes to the rate of false positives. Some regions may have many false positive burned landslides because there was a larger percentage of low-accuracy locations, or alternatively because there was no significant increase in the probability that a landslide would occur in a burned location. 
Such a low posterior landslide probability given that a fire has occurred would tend to greatly increase the number of false positive burned areas by decreasing the probability that a landslide occurred in the burned section of the landslide radius, thus negating the effects of larger landslide buffers. Future studies using visible and other satellite imagery to pinpoint landslide locations and dates could help further clarify the post-wildfire posterior landslide probability by essentially eliminating the location error.
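The chance-overlap argument can be made concrete with an idealized model. The sketch below is a swapped-in illustration, not part of the analysis: it assumes burned patches are disks of one fixed radius scattered uniformly at random on a flat plane (a Poisson Boolean model), independent of where landslides occur, and it counts a landslide as burned if its location-accuracy buffer touches any patch. The patch radius and density are hypothetical numbers.

```python
import math


def chance_overlap_probability(buffer_radius_km, patch_radius_km, patches_per_km2):
    """Probability that a circular buffer touches at least one burned patch by chance.

    Idealized Poisson Boolean model: burned patches are disks of radius patch_radius_km
    whose centres are scattered uniformly at random (density patches_per_km2),
    independently of where landslides occur. The buffer overlaps a patch whenever that
    patch's centre lies within patch_radius_km + buffer_radius_km of the buffer centre,
    so the number of such centres is Poisson distributed and
    P(at least one) = 1 - exp(-expected count).
    """
    reach_km = patch_radius_km + buffer_radius_km
    expected_patches = patches_per_km2 * math.pi * reach_km ** 2
    return 1.0 - math.exp(-expected_patches)


# Hypothetical values: 2 km patch radius, one burn centre per 10,000 km^2.
for radius_km in (1.0, 5.0, 10.0):
    print(radius_km, round(chance_overlap_probability(radius_km, 2.0, 1e-4), 4))
```

Because 1 - exp(-x) is close to x for small x, the chance-overlap probability in this model grows roughly with the square of the patch radius plus the buffer radius, so large-radius (low-accuracy) records pick up burned area by chance far more often. Any real association between fire and landslides is deliberately left out.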
The third challenge is the timing of the landslides in your database with respect to the wildfire. The issue of timing cross-cuts the first challenge. We know that, in general, shallow landslides happen several years after a wildfire and post-fire debris flows happen very soon after a wildfire, but you do not show the timing of the landsliding in any of your plots, so it is very hard to analyze how precipitation forcing should work based on the differences in those landslides with respect to time since fire. Consequently, explicitly analyzing time since fire will go a long way toward helping readers understand how to interpret your data.
The following text is inserted in the Results section to describe this figure (note that the figure number is a placeholder to avoid confusion with existing figures): Figure 5a shows the p-values of Mann-Whitney tests comparing the precipitation percentiles of groups of mass movements with different timing relative to wildfire with the precipitation percentiles of mass movements at unburned sites. Landslides at burned sites were divided into two groups: those occurring within one year after a wildfire and those occurring between one and three years after a wildfire. In California and the Pacific Northwest of the US (Figure 5a panels (b) and (d)), the p-values are similar for the two timing groups. By contrast, in the Intermountain West of the US (Figure 5a panel (c)), the lower precipitation percentiles at burned sites on the day of the landslide are statistically significant only for landslides occurring 1-3 years after a wildfire. However, precipitation is significantly lower in the 'less than one year' group from seven to three days before the landslide. In Central America, the Himalayas, and Southeast Asia (Figure 5a panels (e), (f), and (g)), differences between burned and unburned sites are not statistically significant for either group.
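A minimal Python sketch of the two timing groups used for Figure 5a follows. The 365-day year, the treatment of same-day events as "within one year", and the exclusion of events more than three years after a fire are assumptions made for illustration; the function name is hypothetical.

```python
from datetime import date

DAYS_PER_YEAR = 365  # assumption: "one year after the fire" means 365 days


def timing_group(fire_date, landslide_date):
    """Assign a landslide at a burned site to a timing group relative to the fire.

    Returns "<1 yr", "1-3 yr", or None when the landslide precedes the fire or
    occurs more than three years after it (outside both groups).
    """
    elapsed = (landslide_date - fire_date).days
    if elapsed < 0:
        return None
    if elapsed < DAYS_PER_YEAR:
        return "<1 yr"
    if elapsed <= 3 * DAYS_PER_YEAR:
        return "1-3 yr"
    return None


print(timing_group(date(2018, 7, 1), date(2019, 1, 9)))
print(timing_group(date(2018, 7, 1), date(2020, 2, 1)))
```

Each group would then be compared against the unburned sites with the same day-by-day test used elsewhere in the paper.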
The following text is inserted into the Discussion: The timing of landslides relative to wildfire may also influence the magnitude of triggering storms. While in some regions, such as California and the Pacific Northwest, timing does not have a major impact on precipitation percentile differences, the Intermountain West of the US displays two distinct behaviors depending on the timing of landslides relative to wildfire. In the year immediately after a fire, the precipitation percentile is lower than for landslides at unburned locations from seven to three days before the landslide, before rising to match the precipitation percentile at unburned locations (see Figure 5a panel (c)). This pattern matches the result from Figure 6 panel (c), in which post-wildfire landslides in this region appear to manifest as a large storm preceded by a period of infrequent precipitation. In contrast, timing appears to make little difference to the precipitation percentile in other regions.
A final general comment is that some of the precipitation analysis is very vague for readers unfamiliar with the type of data you are using. For example, you often refer to changes in percentiles, but it is often unclear what the precipitation is a percentile of. Is it the percentile of the maximum 7-day rainfall, the maximum rainfall in a 38-year record, or something else? More detail in explaining your methods would really help readers. A similar comment applies to the figures: many of them are missing axis labels or labeled tick marks, such as the inset figures in Figure 1, Figure 4 h-u, and Figure 5.
We thank the reviewer for this suggestion and have clarified the Methods section as follows: First, the percentile of the seven-day running total precipitation depth, relative to all seven-day totals within the 30 days surrounding the same day of the year across the full 38-year record (see Sect. 2.4), was used as a proxy for landslide susceptibility.

25: Do Kirschbaum and Stanley reference wildfire?
We thank the reviewer for this observation and revise the text as follows: Mass movement hazards in general may also depend on dynamic factors such as soil moisture, meteorology, and the length of time since the most recent fire (Kirschbaum and Stanley, 2018; McGuire et al., 2021; DeGraff et al., 2015).
DeGraff, J. V., Cannon, S. H., and Gartner, J. E. (2015).

56: Ebel 2012 said that ash holds much more water, not that it reduces infiltration
This sentence is revised as follows:

70: I'd add references to Pelletier and Orem, 2014
We make the following change: Increased likelihood of post-wildfire debris flows has also been associated with the erodibility of fine sediment in the soil, soil organic matter percentage, soil clay percentage, underlying lithology (e.g., sedimentary or granitic rock), watershed area, and watershed relief ratio (

We thank the reviewer for the above two observations. Where possible, we have used the same vocabulary as the GLC, in which 'landslide' refers to all types of rainfall-triggered mass movements. However, this terminology is misleading. We propose to replace the term 'landslide' throughout the manuscript with 'mass movement', or with 'debris flow' where the type is specified, to reduce confusion.

139: Please provide a more detailed definition of the precipitation depth percentile. 224: A 30-day rolling percentile of what?
We appreciate the above two concerns and provide additional details about the percentile calculation below: First, the percentile of the seven-day running total precipitation depth, relative to all seven-day totals within the 30 days surrounding the same day of the year across the full 38-year record (see Sect. 2.4), was used as a proxy for landslide susceptibility.
And further details in Sect. 2.4: Precipitation data were further processed to facilitate the comparison of landslide-triggering events across a variety of seasons and climates. The precipitation values were normalized for both location and time of year by computing a 30-day rolling percentile of the 7-day running precipitation totals, based on 38 years of historical precipitation climatology from 1981-2019 for each location. The percentile was computed from all the 7-day totals from up to 15 days before or after the day of the year (DOY) on which the landslide occurred, across all years in the record. This statistic controls for geographic and seasonal differences across landslide events by producing a normalized precipitation distribution that is uniform regardless of location and time of year. As a result, anomalous precipitation events are highlighted, facilitating the comparison of landslide triggers across locations and seasons.
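The normalization in Sect. 2.4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the processing code: the 7-day totals are taken as trailing windows (the landslide day and the six days before it), the daily record is assumed complete with no gaps, day-of-year distance ignores the leap-day offset, and the percentile counts reference values less than or equal to the target. Function names are hypothetical.

```python
from datetime import date, timedelta


def seven_day_totals(daily):
    """Trailing 7-day running totals (the day itself plus the six preceding days).

    Assumes `daily` maps every calendar day in the record to a precipitation depth (mm).
    """
    days = sorted(daily)
    return {days[i]: sum(daily[d] for d in days[i - 6:i + 1]) for i in range(6, len(days))}


def doy_distance(a, b):
    """Circular distance in day of year between two dates (approximate: ignores leap days)."""
    d = abs(a.timetuple().tm_yday - b.timetuple().tm_yday)
    return min(d, 365 - d)


def climatological_percentile(totals, target, half_window=15):
    """Percentile of the 7-day total on `target` among all 7-day totals within
    +/- half_window days of the same day of year, pooled across every year."""
    reference = [v for d, v in totals.items() if doy_distance(d, target) <= half_window]
    value = totals[target]
    return 100.0 * sum(v <= value for v in reference) / len(reference)


# Synthetic ten-year record with a single storm on a hypothetical landslide date.
start = date(1981, 1, 1)
daily = {start + timedelta(days=k): 0.0 for k in range(3650)}
daily[date(1985, 6, 15)] = 50.0
totals = seven_day_totals(daily)
print(climatological_percentile(totals, date(1985, 6, 15)))
```

Because the reference pool mixes every year's values from the same season, a given depth maps to a high percentile in a dry season or climate and to a lower one in a wet season or climate, which is what makes events comparable across sites.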

236: Again the median percentile of what?
The text is modified as follows, as are all other locations where the percentile was not clarified: The null hypothesis of the Mann-Whitney test was that the median precipitation percentile of the burned sites is greater than or equal to the median precipitation percentile of the unburned sites.
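The one-sided test described above can be sketched without external dependencies, as below, together with a hypothetical helper that counts the days on which p < 0.05. This is an illustration, not the study's code: it uses the large-sample normal approximation with tie and continuity corrections, and the dictionary layout (days before the landslide mapped to lists of percentiles) is an assumption.

```python
import math


def mann_whitney_less(burned, unburned):
    """One-sided Mann-Whitney U test (normal approximation, tie and continuity corrected).

    H0: burned percentiles are not stochastically smaller than unburned ones
    (stated in the text in terms of medians); H1: burned values tend to be smaller.
    Returns (U for the burned sample, one-sided p-value).
    """
    n1, n2 = len(burned), len(unburned)
    pooled = sorted([(v, 0) for v in burned] + [(v, 1) for v in unburned])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average rank shared by a run of tied values
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n = n1 + n2
    rank_sum = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # every value tied: no evidence either way
        return u, 1.0
    z = (u - mean + 0.5) / math.sqrt(var)
    return u, 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z


def count_significant_days(burned_by_lag, unburned_by_lag, alpha=0.05):
    """Count lags (days before the landslide) where burned < unburned at level alpha."""
    return sum(1 for lag in burned_by_lag
               if mann_whitney_less(burned_by_lag[lag], unburned_by_lag[lag])[1] < alpha)
```

In practice, `scipy.stats.mannwhitneyu(burned, unburned, alternative="less")` provides the same one-sided test; for small samples without ties it may report an exact rather than an approximate p-value, so results can differ slightly from this sketch.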

156: It isn't clear how you define those categories (e.g. what defines "rain" versus "downpour")
We have clarified the text below: In order to reduce errors resulting from including a variety of types of rainfall-triggered landslides within the same dataset, the selected landslides were limited to those labeled in the GLC with a 'landslide trigger' value of 'rain,' 'downpour,' 'flooding,' or 'continuous rain.'

179: Previous studies say that debris flow susceptibility increases within six months of a fire, but landsliding can take many years to occur. See Benda and Dunne, 1997
Gartner et al. (2014) found that the increase in debris flow probability in a watershed due to wildfire is greatest immediately after wildfire, but can last a total of 2-5 years. Other studies suggest that the overall mass movement hazard evolves over time in a more complex manner, with debris flow hazards increasing for the year after the fire followed by an increase in the frequency of shallow landslides as tree roots decay in subsequent years (Rengers et al., 2020;Benda and Dunne, 1997).

194: You should acknowledge that severe wildfire is most common in semi-arid regions. Humid regions can have fires, but the severity is limited, and very few fires in humid regions result in landslides or debris flows because they don't reach very high burn severity.
We appreciate this observation and add the following text to this effect: These five studies model the probability of landslides following fire using logistic regressions to demonstrate that both burn severity (Staley et al., 2016) and burn extent within a watershed (Cannon et al., 2010) are associated with increased debris flow likelihood. Notably, burn severity and extent are both increased by drought and low antecedent soil moisture (Westerling et al., 2003), and thus we expect to find more post-wildfire debris flows in dry climates.

196: Would CHIRPS even pick up a storm like the NCFR that hit Montecito, CA in January 2018?
We appreciate this point and elaborate on the choice of the CHIRPS dataset: Time series of precipitation at the landslide sites were obtained from the CHIRPS precipitation dataset (Funk et al., 2015). CHIRPS is a gauge-corrected global precipitation database derived from satellite-based cloud temperature measurements. The CHIRPS dataset was chosen because of its global coverage and relatively long climatological record (1981-present). Though the ~5.5 km resolution of CHIRPS may present challenges in capturing the high-intensity storms that sometimes trigger landslides (Hong et al., 2006), Gupta et al. (2019) found that CHIRPS performed well in detecting extreme precipitation across India. Furthermore, this resolution matches the 5 km resolution of the plurality of records in the GLC. Precipitation was averaged for each landslide location within the radius of the provided location accuracy. Additional pre-processing steps described below were performed to distinguish anomalously high precipitation events from potential seasonal shifts and climatic differences across sites.
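One way to average gridded precipitation within the location-accuracy radius is sketched below in Python. The inclusion rule (grid-cell centres within the radius), the unweighted mean, the fallback to the nearest cell when no centre falls inside a small radius, and the spherical-Earth distance are all assumptions for illustration; the text does not specify them, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_within_radius(cells, lat, lon, radius_km):
    """Average one day's precipitation over grid cells near a landslide.

    cells: iterable of (cell_lat, cell_lon, precip_mm). Cells whose centres lie within
    radius_km are averaged without weights; if none qualify, the nearest cell is used.
    """
    cells = list(cells)
    inside = [p for (clat, clon, p) in cells if haversine_km(lat, lon, clat, clon) <= radius_km]
    if inside:
        return sum(inside) / len(inside)
    nearest = min(cells, key=lambda c: haversine_km(lat, lon, c[0], c[1]))
    return nearest[2]
```

An area-weighted average of the cell fractions inside the circle would be a natural refinement when the radius is comparable to the grid spacing.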

215: Please provide a more detailed description of both CHIRPS and Daymet.
Please see the above changes to the description of the CHIRPS dataset. In addition, we add the following description of Daymet: A comparison with the Daymet precipitation dataset over the same domain revealed that the two precipitation datasets frequently did not agree on these zero-precipitation landslide events, suggesting that the problem largely originated from the precipitation data themselves. Daymet has a higher resolution than CHIRPS (1 km vs. 5.5 km) and is based on precipitation gauge measurements. The extent of Daymet is limited to North America, and thus it is used only for validation in the California area.

440: Be more specific about the length of time you are referring to when you say "a dry spell followed by a sharp uptick in precipitation." Are you talking about decadal drought or a few weeks?
The text is clarified as indicated below: In contrast, in the Intermountain West burned landslide locations appear to be characterized by a dry spell of at least 20 days followed by a sharp uptick in precipitation, suggesting that burned and dry soil may be the most vulnerable to extreme erosion in that region.
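How a dry spell of at least 20 days might be measured is sketched below. The 1 mm dry-day threshold, the three-day storm window ending on the landslide day, and the function name are hypothetical choices for illustration only.

```python
def dry_spell_before(daily_precip, landslide_index, dry_threshold_mm=1.0, storm_days=3):
    """Length in days of the unbroken run of dry days just before the triggering storm.

    daily_precip: list of daily depths (mm); landslide_index: index of the landslide day.
    The storm is assumed to occupy the `storm_days` days ending on the landslide day,
    and a day counts as dry when its depth is below dry_threshold_mm.
    """
    i = landslide_index - storm_days  # last day before the assumed storm window
    run = 0
    while i >= 0 and daily_precip[i] < dry_threshold_mm:
        run += 1
        i -= 1
    return run


series = [5.0] + [0.0] * 25 + [20.0, 30.0, 10.0]
print(dry_spell_before(series, len(series) - 1) >= 20)
```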

446: Since you don't differentiate between debris flows and landslides, it is entirely unclear how to assess your conclusion that you think landslides are caused by isolated intense thunderstorms on dry soil. Wall et al., 2020 offers a really nice overview of literature in the Pacific Northwest about true post-fire landslides (not debris flows). Note that the authors referenced therein often saw landsliding after very wet periods many years after wildfire.
We appreciate this point and have clarified this conclusion as follows: In other regions, such as the Intermountain West and Southeast Asia, landslide seasonality was shifted by 3 or 6 months, suggesting that the physical mechanisms causing landslides at burned and unburned locations in these regions are entirely different. For example, in the Intermountain West we posit that a portion of post-wildfire landslides may be caused by isolated intense thunderstorms on dry soil, producing the observed pattern of landslide-triggering storms at burned locations preceded by at least several weeks of limited precipitation. Among the unburned sites, by contrast, a pattern of mass movements occurring during the wettest part of the year suggests that saturation of the soil is a more important precursor.
Days where a significant difference was found between the burned and unburned groups are indicated in darker colors.
The caption of Figure 5 is changed as follows: DOY of landslides, DOY of fires, and the length of time between fire and mass movement by region. Each horizontal line represents one event, arranged on the y-axis in order of the delay between wildfire and mass movement. Black dots on the right show the day of the year on which the landslide occurred, and horizontal lines represent the time elapsed between the fire and the landslide. Lines are colored by the season of the fire and are ordered by the day of the fire relative to the landslide. The black lines, or rug, at the top of each panel, as well as the colored rug on the left, duplicate the day of year of the fires to highlight seasonal patterns.
We amend the caption of Fig. 6 to explain the legend: Precipitation frequency anomaly relative to the long-term mean, aligned by the landslide date. In panels (a)-(g), frequency is shown both daily and smoothed with a 90-day moving average to highlight shifts. Daily precipitation frequency is represented as thin lines in orange and purple (burned and unburned groups), while the 90-day average is a thicker line. The long-term mean has been removed from all the frequency curves. Landslides are divided into burned and unburned groups, both for each region separately and for all landslides combined. In panels (h)-(n), the kernel density estimate of landslides by time of year is shown for both the burned and unburned groups in a radial plot.
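The radial kernel density in panels (h)-(n) treats day of year as a circular variable. The sketch below uses a von Mises kernel, which is one common choice for circular data; the kernel actually used for Fig. 6 is not stated here, and the concentration parameter `kappa` (roughly a two-week bandwidth at 20) and the function names are hypothetical.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    return sum((x / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))


def doy_density(landslide_doys, doy, kappa=20.0, year_length=365.25):
    """Von Mises kernel density of landslide timing, evaluated at day of year `doy`.

    Days of year are mapped to angles on a circle so that late December and early
    January are neighbours; the result is a density per radian that integrates to 1
    around the full circle.
    """
    theta = 2 * math.pi * doy / year_length
    norm = 2 * math.pi * bessel_i0(kappa) * len(landslide_doys)
    return sum(math.exp(kappa * math.cos(theta - 2 * math.pi * d / year_length))
               for d in landslide_doys) / norm
```

Plotting `doy_density` on polar axes, separately for the burned and unburned groups, gives a radial seasonality plot of the kind described in the caption; larger `kappa` values produce sharper, more seasonal peaks.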
In addition, we propose to clarify the kernel density analysis: Figure 6 shows differences between burned and unburned landslide seasonality on the right and the results of the precipitation frequency analysis on the left. The kernel density estimates on the right show changes in the seasons (e.g., fall or winter) in which landslides at burned and unburned sites occurred. By contrast, the analysis on the left shows when landslides in each group tended to occur relative to the times of year with greater precipitation frequency. While all regions except for Central America…
Please also note the supplement to this comment: https://nhess.copernicus.org/preprints/nhess-2021-111/nhess-2021-111-AC2-supplement.zip