A Global Drought Dataset from Clustering-Based Event Identification with Integrated Population, and GDP Exposure and Socioeconomic Impacts
Abstract. Drought events pose significant challenges to both ecosystems and human societies, requiring precise methodologies for their detection and impact assessment. A key challenge is linking physical drought indicators to socioeconomic consequences, such as the number of people affected or economic losses. This study introduces a robust two-step framework that integrates drought detection with impact analysis. In the first step, a clustering algorithm is used to identify coherent drought events and extract key characteristics such as severity and spatial extent. These events are tracked as spatially and temporally evolving objects. In the second step, the drought events are linked to population and GDP exposure, as well as to impact data from global disaster databases.
To characterize droughts, the study employs two widely used drought indices: the Standardized Precipitation Index (SPI) and the Standardized Precipitation Evapotranspiration Index (SPEI). Precipitation and temperature data from the ERA5 reanalysis are used to compute these indices at four different timescales (1, 3, 6, and 12 months). Drought events are identified for different severity levels (-1, -1.5, and -2). The study also incorporates high resolution gridded datasets of global population and economic activity, alongside disaster impact data on affected populations and economic losses. The resulting drought dataset provides valuable information on the association between drought characteristics, exposure, and recorded impacts.
The analysis shows that a relatively large buffer distance is needed to match the identified drought events to impacts from disaster databases, and that more severe drought thresholds isolate fewer but higher-impact events. Population exposure is found to be highest in Asia, while GDP exposure is largest in North America. This integrated framework (https://doi.org/10.5281/zenodo.17251815; Samantaray & Messori, 2025) bridges the gap between physical drought characteristics, exposure, and documented impacts, supporting vulnerability analyses, improved climate adaptation planning and disaster risk management.
The idea behind this work, of organizing a gridded meteorological drought index into clusters describing spatially widespread dry periods and correlating these with reports of drought impacts, is meritorious, but details of the execution need more justification and improvement:
The manuscript does not appear to explain how potential evapotranspiration (PET) is calculated for the SPEI.
It appears that only point data from GDIS are used to localize drought reports. However, according to its documentation, GDIS provides polygons of the affected provinces in the "geometry" field. Comparing this areal information to the ERA5 meteorological drought extent should show much clearer correspondences compared to only centroids or other individual points from GDIS.
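To illustrate the kind of areal comparison meant here, the following minimal sketch computes the share of grid-cell centres inside a reported province polygon that are flagged as in drought. Everything in it is hypothetical: the function names, the square "province", and the toy grid are assumptions for illustration, and the planar ray-casting test ignores cell-area weighting and map projection, which a real analysis on the ERA5 grid would need to handle.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in planar coordinates."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def drought_fraction(polygon, drought_cells, all_cells):
    """Share of grid-cell centres inside the polygon that are flagged as drought."""
    inside = [c for c in all_cells if point_in_polygon(c[0], c[1], polygon)]
    if not inside:
        return 0.0
    return sum(1 for c in inside if c in drought_cells) / len(inside)


# Hypothetical province polygon and a toy 6 x 6 grid of cell centres.
province = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
all_cells = [(i + 0.5, j + 0.5) for i in range(6) for j in range(6)]
# Hypothetical drought extent covering the eastern part of the grid.
drought_cells = {(i + 0.5, j + 0.5) for i in range(2, 6) for j in range(6)}

fraction = drought_fraction(province, drought_cells, all_cells)
```

An overlap fraction of this kind gives a graded measure of correspondence between a reported province and a drought cluster, whereas a single point either falls inside the cluster or does not.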
At l. 267, "Frequency" seems like the wrong word for the number of months of drought; "Duration" would be more appropriate. Also, the definition of "Severity" is not clear.
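For reference, one common convention (run theory) takes duration as the number of consecutive months below the threshold and severity as the accumulated magnitude of the index over those months. The sketch below assumes that convention, which may differ from what the authors intend; the function name and the sample values are hypothetical.

```python
def drought_runs(index_values, threshold=-1.0):
    """Return (start, duration, severity) for each run of values below threshold.

    Duration is the number of consecutive months below the threshold.
    Severity is the sum of absolute index values over the run (an assumed
    convention, not necessarily the one used in the manuscript).
    """
    runs = []
    start = None
    severity = 0.0
    for i, value in enumerate(index_values):
        if value < threshold:
            if start is None:
                # A new dry run begins at this month.
                start = i
                severity = 0.0
            severity += abs(value)
        elif start is not None:
            # The run ended in the previous month.
            runs.append((start, i - start, severity))
            start = None
    if start is not None:
        # The series ends while still in drought.
        runs.append((start, len(index_values) - start, severity))
    return runs


# Hypothetical monthly SPI values for one grid cell.
spi = [0.2, -1.2, -1.8, -0.4, -1.1, 0.5]
events = drought_runs(spi)
```

With these sample values the series contains two runs, one lasting two months and one lasting a single month, which shows why "duration" describes the month count better than "frequency".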
Figure 3: The caption fails to state what the blue areas in the maps are.
Since "detection percentages are consistently higher for SPEI than for SPI", I recommend that the SPEI-based drought definition be used as the primary one for reporting the results, with SPI-based results given as secondary. At present it is mostly the opposite.
No clear conclusion is drawn as to which SP(E)I timescale should be used to define drought. Most of the figures arbitrarily show only the 1-month timescale, which admittedly can capture a "flash drought" but seems too short to correspond to impactful drought in most cases. I suggest first analyzing which SP(E)I timescale best matches the drought disaster dataset, and then reporting findings primarily for that timescale.
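A simple version of this comparison could look like the sketch below, which scores each timescale by the fraction of reported disasters that coincide with a detected drought. The function name, the (region, month) representation, and all data are hypothetical and chosen only to show the idea.

```python
def detection_rate(reported, detected):
    """Fraction of reported (region, month) disaster pairs that fall in a detected drought."""
    if not reported:
        return 0.0
    hits = sum(1 for item in reported if item in detected)
    return hits / len(reported)


# Hypothetical disaster reports as (region, month) pairs.
reported = {("A", 3), ("A", 4), ("B", 7), ("C", 1)}

# Hypothetical detected drought (region, month) pairs for each SP(E)I timescale in months.
detected_by_timescale = {
    1: {("A", 3)},
    3: {("A", 3), ("A", 4), ("B", 7)},
    6: {("A", 4), ("B", 7)},
    12: {("B", 7)},
}

rates = {
    timescale: detection_rate(reported, detected)
    for timescale, detected in detected_by_timescale.items()
}
best_timescale = max(rates, key=rates.get)
```

A detection rate on its own rewards timescales that flag drought very often, so in practice it would need to be paired with a measure of false alarms before choosing a primary timescale.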
In addition to population and GDP, measures of agricultural intensity may help in predicting the impacts of drought, since agriculture is by far the most water-intensive major economic sector. Oddly, agriculture is not mentioned at all except in the literature review.
Seasonality is also never mentioned. It might be hypothesized that droughts occurring during the growing season have much bigger impacts than those at other times of year.
The mostly 3-D figures (e.g. 5-12) are not interpretable. I strongly recommend finding a different way to show the results and adjusting the discussion in the text accordingly.
2 × 10⁷ USD seems like a small amount of average damage for a large-scale drought in North America, since the USA has experienced quite a number of droughts that each caused multiple billions of dollars in damage. It would be helpful to show more statistical information about the set of EM-DAT droughts included in the analysis, including their minimum, maximum, mean, and median damages, and to compare this to information from other databases.
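The requested summary is straightforward to produce. The sketch below uses made-up damage values (not EM-DAT records) and a hypothetical function name, simply to show how far the mean and median can diverge when a few events dominate the total.

```python
import statistics


def damage_summary(damages_usd):
    """Minimum, maximum, mean, and median of a list of damage values in USD."""
    return {
        "min": min(damages_usd),
        "max": max(damages_usd),
        "mean": statistics.mean(damages_usd),
        "median": statistics.median(damages_usd),
    }


# Hypothetical damage values in USD; not taken from EM-DAT.
damages = [5e6, 2e7, 3e8, 4e9]
summary = damage_summary(damages)
```

Because damage distributions tend to be strongly skewed, reporting both mean and median would make it clear which kind of "average" the stated figure represents.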
Additionally, since this is intended for publication in a data description journal, the paper should say more about the format of the generated dataset, including which fields it contains and what use cases are anticipated.