https://doi.org/10.5194/essd-2025-321
18 Aug 2025
Status: this preprint is currently under review for the journal ESSD.

GEMS-GER: A Machine Learning Benchmark Dataset of Long-Term Groundwater Levels in Germany with Meteorological Forcings and Site-Specific Environmental Features

Marc Ohmer, Tanja Liesch, Bastian Habbel, Benedikt Heudorfer, Mariana Gomez, Patrick Clos, Maximilian Nölscher, and Stefan Broda

Abstract. We present GEMS-GER (Groundwater Levels, Environment, Meteorology, Site Properties), the first benchmark dataset specifically designed for machine learning applications in long-term groundwater level modeling in Germany. The dataset comprises 32 years of gapless weekly observations from 3,207 monitoring wells, enriched with meteorological forcing variables and more than 50 site-specific static attributes. All data have undergone extensive preprocessing, including harmonization, outlier removal, and iterative imputation, to ensure high quality and suitability for machine learning applications. The wells are spatially distributed across Germany and cover diverse hydrogeological settings and aquifer types. To demonstrate the utility of the dataset, we provide three initial benchmark models: a single-well CNN model, a global LSTM model using dynamic inputs, and a global LSTM model incorporating both dynamic and static features. The best-performing model achieves satisfactory predictive performance (NSE > 0.5) for more than half (52 %) of the wells, which is considered a strong result in the context of groundwater modeling.

GEMS-GER is available via Zenodo under an open-access license and is accompanied by detailed documentation. By enabling standardized and reproducible evaluation of data-driven groundwater models, the dataset offers a robust foundation for advancing machine learning research in hydrogeology.
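The abstract reports model skill as the share of wells with a Nash-Sutcliffe efficiency (NSE) above 0.5. As an illustration only, the sketch below computes the NSE using its standard definition, NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))², and the fraction of wells exceeding a threshold. The function names `nse` and `share_above` are hypothetical and are not taken from the GEMS-GER code base; the authors' evaluation code may differ in detail.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency of a simulated series against observations."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    # Variance-like term of the observations around their own mean.
    denom = np.sum((obs - obs.mean()) ** 2)
    if denom == 0.0:
        # A constant observed series leaves the NSE undefined.
        return float("nan")
    return float(1.0 - np.sum((obs - sim) ** 2) / denom)


def share_above(scores, threshold=0.5):
    """Fraction of wells whose (non-NaN) score exceeds the threshold."""
    scores = np.asarray(scores, dtype=float)
    valid = scores[~np.isnan(scores)]
    return float(np.mean(valid > threshold))
```

An NSE of 1 indicates a perfect fit, while an NSE of 0 means the model is no better than predicting the mean of the observations. Applied to one NSE value per well, `share_above` would yield the kind of summary quoted in the abstract (52 % of wells with NSE > 0.5).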

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 25 Sep 2025)


Data sets

GEMS-GER: A Machine Learning Benchmark Dataset of Long-Term Groundwater Levels in Germany with Meteorological Forcings and Site-Specific Environmental Features. Marc Ohmer. http://zenodo.org/records/15530171

Model code and software

GEMS-GER code and benchmark models for the publicly available groundwater monitoring dataset of Germany. Marc Ohmer and Tanja Liesch. https://github.com/KITHydrogeology/GEMS-GER

Latest update: 19 Aug 2025
Short summary
We present a public dataset of weekly groundwater levels from more than 3,000 wells across Germany, spanning 32 years. It combines weather data and site-specific environmental information to support forecasting groundwater changes. Three benchmark models of varying complexity show how data and modeling approaches influence predictions. This resource promotes open, reproducible research and helps guide future water management decisions.