A Global Drought Dataset from Clustering-Based Event Identification with Integrated Population, and GDP Exposure and Socioeconomic Impacts

Samantaray, Alok Kumar; Messori, Gabriele

doi:10.5194/essd-2025-646

Preprints

03 Feb 2026

Status: this preprint is currently under review for the journal ESSD.

Abstract. Drought events pose significant challenges to both ecosystems and human societies, requiring precise methodologies for their detection and impact assessment. A key challenge is linking physical drought indicators to socioeconomic consequences, such as the number of people affected or economic losses. This study introduces a robust two-step framework that integrates drought detection with impact analysis. In the first step, a clustering algorithm is used to identify coherent drought events and extract key characteristics such as severity and spatial extent. These events are tracked as spatially and temporally evolving objects. In the second step, the drought events are linked to population and GDP exposure, as well as to impact data from global disaster databases.
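
The two steps described above can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so this example substitutes plain 4-connected component labelling on a single monthly grid as a stand-in, and then sums a population grid over each labelled cluster. The function names, the toy grids, and the single-time-step simplification (no tracking through time) are all assumptions made for illustration, not the authors' method.

```python
from collections import deque


def label_drought_events(index_grid, threshold=-1.0):
    """Label 4-connected groups of cells whose index is at or below threshold.

    Returns a grid of the same shape: 0 for non-drought cells and
    1, 2, ... for each connected drought cluster (hypothetical stand-in
    for the clustering step, one time step only).
    """
    rows, cols = len(index_grid), len(index_grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            # Skip cells that are not in drought or are already labelled.
            if index_grid[r][c] > threshold or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            # Breadth-first flood fill over the four direct neighbours.
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not labels[ni][nj]
                            and index_grid[ni][nj] <= threshold):
                        labels[ni][nj] = next_label
                        queue.append((ni, nj))
    return labels


def exposure_per_event(labels, population):
    """Sum the population of all cells belonging to each labelled cluster."""
    totals = {}
    for label_row, pop_row in zip(labels, population):
        for label, pop in zip(label_row, pop_row):
            if label:
                totals[label] = totals.get(label, 0) + pop
    return totals


# Toy 3x4 field of a standardized index for one month (made-up values).
spi = [
    [-1.2, -1.6, 0.3, 0.1],
    [-0.4, -1.1, 0.5, -1.3],
    [0.2, 0.0, 0.4, -2.1],
]
# Made-up population counts on the same grid.
pop = [
    [100, 200, 50, 10],
    [300, 400, 60, 20],
    [70, 80, 90, 30],
]
labels = label_drought_events(spi, threshold=-1.0)
exposure = exposure_per_event(labels, pop)
```

Replacing the population grid with a GDP grid gives the corresponding GDP exposure. Following an event through time would additionally require linking clusters across consecutive months, which this sketch does not attempt.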

To characterize droughts, the study employs two widely used drought indices: the Standardized Precipitation Index (SPI) and the Standardized Precipitation Evapotranspiration Index (SPEI). Precipitation and temperature data from the ERA5 reanalysis are used to compute these indices at four timescales (1, 3, 6, and 12 months). Drought events are identified at three severity thresholds (index values of -1, -1.5, and -2). The study also incorporates high-resolution gridded datasets of global population and economic activity, alongside disaster impact data on affected populations and economic losses. The resulting drought dataset provides information on the association between drought characteristics, exposure, and recorded impacts.
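
As a rough illustration of how a standardized index at a given timescale and the severity thresholds fit together, the sketch below accumulates a monthly precipitation series over a 3-month window and converts the totals to standard-normal scores. It is a simplification under stated assumptions: the conventional SPI fits a parametric distribution (typically gamma) separately for each calendar month, whereas this example uses a rank-based transform with the Gringorten plotting position over the whole series. The function names and the precipitation values are hypothetical, and SPEI (which also needs potential evapotranspiration) is not covered.

```python
from statistics import NormalDist


def rolling_sum(values, window):
    """Accumulate a monthly series over `window` months.

    Entries are None until a full window of data is available.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]))
    return out


def empirical_standardized_index(accumulated):
    """Map accumulated totals to standard-normal scores via their ranks.

    Assumption: rank-based (nonparametric) transform with the Gringorten
    plotting position p = (rank - 0.44) / (n + 0.12), instead of the
    gamma fit used in the conventional SPI. Ties are broken by position.
    """
    valid = [(v, i) for i, v in enumerate(accumulated) if v is not None]
    n = len(valid)
    scores = [None] * len(accumulated)
    for rank, (_, i) in enumerate(sorted(valid), start=1):
        p = (rank - 0.44) / (n + 0.12)
        scores[i] = NormalDist().inv_cdf(p)
    return scores


def months_below(scores, threshold):
    """Indices of months whose score is at or below the threshold."""
    return [i for i, s in enumerate(scores) if s is not None and s <= threshold]


# Made-up monthly precipitation totals (mm) for one grid cell.
monthly_precip = [80, 65, 40, 10, 5, 0, 2, 15, 55, 90, 110, 95]

accumulated = rolling_sum(monthly_precip, window=3)
scores = empirical_standardized_index(accumulated)

# Months flagged at each of the three severity thresholds.
drought_months = {t: months_below(scores, t) for t in (-1.0, -1.5, -2.0)}
```

Stricter thresholds can only flag a subset of the months flagged by milder ones, which is why the more severe levels isolate fewer events.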

The analysis shows that a relatively large buffer distance is needed to match the identified drought events to impacts from disaster databases, and that more severe drought thresholds isolate fewer but higher-impact events. Population exposure is found to be highest in Asia, while GDP exposure is largest in North America. This integrated framework (https://doi.org/10.5281/zenodo.17251815; Samantaray & Messori, 2025) bridges the gap between physical drought characteristics, exposure, and documented impacts, supporting vulnerability analyses, improved climate adaptation planning and disaster risk management.
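
The role of the buffer distance can be sketched as follows. The abstract does not describe the matching procedure or the buffer values used, so this example assumes a simple rule: an impact record is matched to a drought event if its reported location lies within a chosen great-circle distance of any grid cell in the event. The function names, coordinates, and buffer values are hypothetical, and temporal matching is omitted.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_impacts(event_cells, impact_points, buffer_km):
    """Return the names of impact records within buffer_km of any event cell.

    event_cells: list of (lat, lon) cell centres belonging to one event.
    impact_points: list of (name, lat, lon) impact records.
    """
    matched = []
    for name, lat, lon in impact_points:
        if any(haversine_km(lat, lon, clat, clon) <= buffer_km
               for clat, clon in event_cells):
            matched.append(name)
    return matched


# Made-up cell centres of one drought event and two impact records.
event_cells = [(10.0, 20.0), (10.0, 20.25), (10.25, 20.0)]
impact_points = [("A", 10.1, 20.1), ("B", 12.0, 20.0)]

matched_small_buffer = match_impacts(event_cells, impact_points, buffer_km=50)
matched_large_buffer = match_impacts(event_cells, impact_points, buffer_km=250)
```

In this toy case the record roughly 195 km from the event footprint is matched only with the larger buffer, which mirrors the trade-off between capturing reported impacts and attributing them to the wrong event.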

Received: 27 Oct 2025 – Discussion started: 03 Feb 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 18 Apr 2026)

RC1: 'Comment on essd-2025-646', Nir Krakauer, 06 Feb 2026

The idea behind this work, of organizing a gridded meteorological drought index into clusters describing spatially widespread dry periods and correlating these with reports of drought impacts, is meritorious, but details of the execution need more justification and improvement:
How PET is calculated for determining SPEI doesn't seem to be explained.
It appears that only point data from GDIS are used to localize drought reports. However, according to its documentation, GDIS provides polygons of the affected provinces in the "geometry" field. Comparing this areal information to the ERA5 meteorological drought extent should show much clearer correspondences compared to only centroids or other individual points from GDIS.
At l. 267, "Frequency" seems like the wrong word for the number of months of drought; "Duration" would be more appropriate. Also, the definition of "Severity" is not clear.
Figure 3: The caption fails to state what the blue areas in the maps are.
Since "detection percentages are consistently higher for SPEI than for SPI", I recommend that the SPEI-based drought definition be used as the primary one for reporting the results, with SPI-based results given as secondary; at present it is mostly the opposite.
There are no clear conclusions drawn as to which SP(E)I timescale should be used to define drought. Most of the figures arbitrarily show only the 1-month timescale, which admittedly can capture a "flash drought" but seems too short to correspond to impactful drought in most cases. I suggest first analyzing which SP(E)I timescale best matches the drought disaster dataset, and then reporting findings primarily for that timescale.
As well as population and GDP, considering measures of agriculture intensity may be helpful in predicting the impacts of drought, since agriculture is by far the most water intensive major economic sector. Oddly, agriculture is not mentioned at all except in the literature review.
Seasonality is also never mentioned. It might be hypothesized that droughts occurring during the growing season have much bigger impacts than those at other times of year.
The mostly 3-D figures (e.g. 5-12) are not interpretable. I strongly recommend finding a different way to show the results and adjusting the discussion in the text accordingly.
2 × 10⁷ USD seems like a small amount of average damage for a large-scale drought in North America, since the USA has experienced quite a number of droughts that caused multiple billions in damage. It would be helpful to show more statistical information about the set of EM-DAT droughts included in the analysis, including their minimum, maximum, mean, and median damages, and to compare this to information from other databases.
Additionally, since this is for publication in a data description journal, the paper should say more about the format of the generated dataset, including what fields it contains and what are some anticipated use cases.
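
To make the distinction between "Duration" and "Severity" raised in the comment on l. 267 concrete, the sketch below uses one common convention from run theory: duration is the number of consecutive months at or below the threshold, and severity is the sum of the absolute index values over those months. This is an assumption for illustration only; the manuscript's own definition of severity is not given here, and the function names and index values are hypothetical.

```python
def summarize_run(series, start, end):
    """Duration and severity of the drought run covering series[start:end]."""
    segment = series[start:end]
    return {
        "start": start,
        "duration": len(segment),                    # months in drought
        "severity": sum(abs(v) for v in segment),    # accumulated deficit
    }


def drought_runs(index_series, threshold=-1.0):
    """Find runs of consecutive months with index at or below threshold."""
    runs = []
    start = None
    for t, value in enumerate(index_series):
        in_drought = value <= threshold
        if in_drought and start is None:
            start = t                                # a new run begins
        elif not in_drought and start is not None:
            runs.append(summarize_run(index_series, start, t))
            start = None                             # the run has ended
    if start is not None:
        # The series ends while still in drought.
        runs.append(summarize_run(index_series, start, len(index_series)))
    return runs


# Made-up monthly index values for one grid cell.
series = [0.2, -1.1, -1.4, -0.3, -2.0, -1.0, -1.2]
runs = drought_runs(series, threshold=-1.0)
```

With these values the series contains two runs, of 2 and 3 months, so the count of drought months describes how long each event lasts rather than how often droughts occur.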

Citation: https://doi.org/10.5194/essd-2025-646-RC1

Data sets

Reproducible Workflows for Global Drought Clustering with Socioeconomic Exposure, Alok Kumar Samantaray and Gabriele Messori, https://zenodo.org/records/17251815?token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6ImMxZjRkNTk5LWRjZTQtNDc0MS1iOTUxLTRjNTc3NGEwNmUxYiIsImRhdGEiOnt9LCJyYW5kb20iOiJlMTUxNzQ5OGFhOGU2OTQ4MzE3ZDViM2ViMDM3MTQwZCJ9.fhEgvQggJwkVyDVUKEFrQcC5aspzBert5potIVNbLG9SO0FtzuH09SBN9ba3e9DCCGpjwlrRdRgXylKVYJeTIw

Viewed

Total article views: 377 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
241	119	17	377	25	26

Views and downloads (calculated since 03 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	185	90	10	285
Mar 2026	56	29	7	92

Latest update: 18 Mar 2026

Short summary

We present a global dataset that links drought events to human and economic exposure and impacts. Using a two-step approach, we first cluster drought events from precipitation and temperature records to track their severity and extent, and then connect them to population and Gross Domestic Product exposure and to impacts reported in disaster databases. The dataset supports risk planning and mitigation efforts.

