Articles | Volume 17, issue 7
https://doi.org/10.5194/essd-17-3141-2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
LakeBeD-US: a benchmark dataset for lake water quality time series and vertical profiles
Bennett J. McAfee
CORRESPONDING AUTHOR
Center for Limnology, University of Wisconsin–Madison, Madison, WI 53706, USA
Aanish Pradhan
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
Abhilash Neog
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
Sepideh Fatemi
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
Robert T. Hensley
National Ecological Observatory Network – Battelle, Boulder, CO 80301, USA
Mary E. Lofton
Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
Anuj Karpatne
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
Cayelan C. Carey
Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
Paul C. Hanson
Center for Limnology, University of Wisconsin–Madison, Madison, WI 53706, USA
Related authors
Kelly S. Aho, Kaelin M. Cawley, Robert T. Hensley, Robert O. Hall Jr., Walter K. Dodds, and Keli J. Goodman
Earth Syst. Sci. Data, 16, 5563–5578, https://doi.org/10.5194/essd-16-5563-2024, 2024
Short summary
Gas exchange is fundamental to many biogeochemical processes in streams and depends on the degree of gas saturation and the gas transfer velocity (k). Currently, k is harder to measure than concentration. Here, we present a processing pipeline to estimate k from tracer-gas experiments conducted in 22 streams by the National Ecological Observatory Network. The processed dataset (n = 339) represents the largest compilation of standardized k estimates available.
Austin Delany, Robert Ladwig, Cal Buelo, Ellen Albright, and Paul C. Hanson
Biogeosciences, 20, 5211–5228, https://doi.org/10.5194/bg-20-5211-2023, 2023
Short summary
Internal and external sources of organic carbon (OC) in lakes can contribute to oxygen depletion, but their relative contributions remain in question. To study this, we built a two-layer model to recreate processes relevant to carbon for six Wisconsin lakes. We found that internal OC was more important than external OC in depleting oxygen. This shows that it is important to consider both the fast-paced cycling of internally produced OC and the slower cycling of external OC when studying lakes.
Malgorzata Golub, Wim Thiery, Rafael Marcé, Don Pierson, Inne Vanderkelen, Daniel Mercado-Bettin, R. Iestyn Woolway, Luke Grant, Eleanor Jennings, Benjamin M. Kraemer, Jacob Schewe, Fang Zhao, Katja Frieler, Matthias Mengel, Vasiliy Y. Bogomolov, Damien Bouffard, Marianne Côté, Raoul-Marie Couture, Andrey V. Debolskiy, Bram Droppers, Gideon Gal, Mingyang Guo, Annette B. G. Janssen, Georgiy Kirillin, Robert Ladwig, Madeline Magee, Tadhg Moore, Marjorie Perroud, Sebastiano Piccolroaz, Love Raaman Vinnaa, Martin Schmid, Tom Shatwell, Victor M. Stepanenko, Zeli Tan, Bronwyn Woodward, Huaxia Yao, Rita Adrian, Mathew Allan, Orlane Anneville, Lauri Arvola, Karen Atkins, Leon Boegman, Cayelan Carey, Kyle Christianson, Elvira de Eyto, Curtis DeGasperi, Maria Grechushnikova, Josef Hejzlar, Klaus Joehnk, Ian D. Jones, Alo Laas, Eleanor B. Mackay, Ivan Mammarella, Hampus Markensten, Chris McBride, Deniz Özkundakci, Miguel Potes, Karsten Rinke, Dale Robertson, James A. Rusak, Rui Salgado, Leon van der Linden, Piet Verburg, Danielle Wain, Nicole K. Ward, Sabine Wollrab, and Galina Zdorovennova
Geosci. Model Dev., 15, 4597–4623, https://doi.org/10.5194/gmd-15-4597-2022, 2022
Short summary
Lakes and reservoirs are warming across the globe. To better understand how lakes are changing and to project their future behavior amidst various sources of uncertainty, simulations with a range of lake models are required. This in turn requires international coordination across different lake modelling teams worldwide. Here we present a protocol for and results from coordinated simulations of climate change impacts on lakes worldwide.
Robert Ladwig, Paul C. Hanson, Hilary A. Dugan, Cayelan C. Carey, Yu Zhang, Lele Shu, Christopher J. Duffy, and Kelly M. Cobourn
Hydrol. Earth Syst. Sci., 25, 1009–1032, https://doi.org/10.5194/hess-25-1009-2021, 2021
Short summary
Using a modeling framework applied to 37 years of dissolved oxygen time series data from Lake Mendota, we identified the timing and intensity of thermal energy stored in the lake water column, the lake's resilience to mixing, and surface primary production as the most important drivers of interannual dynamics of low oxygen concentrations at the lake bottom. Due to climate change, we expect an increase in the spatial and temporal extent of low oxygen concentrations in Lake Mendota.
Yun Dong, Elena Spinei, and Anuj Karpatne
Atmos. Meas. Tech., 13, 5537–5550, https://doi.org/10.5194/amt-13-5537-2020, 2020
Short summary
This paper presents a feasibility study of applying a machine learning technique to derive aerosol properties from a single MAX-DOAS sky scan, which detects sky-scattered UV–visible photons at multiple elevation angles. Evaluation of retrieved aerosol properties shows good performance of the ML algorithm, suggesting several advantages of an ML-based inversion algorithm such as fast data inversion, simple implementation and the ability to extract information not available using other algorithms.
Matthew R. Hipsey, Louise C. Bruce, Casper Boon, Brendan Busch, Cayelan C. Carey, David P. Hamilton, Paul C. Hanson, Jordan S. Read, Eduardo de Sousa, Michael Weber, and Luke A. Winslow
Geosci. Model Dev., 12, 473–523, https://doi.org/10.5194/gmd-12-473-2019, 2019
Short summary
The General Lake Model (GLM) has been developed to undertake simulation of a diverse range of wetlands, lakes, and reservoirs. The model supports the science needs of the Global Lake Ecological Observatory Network (GLEON), a network of lake sensors and researchers attempting to understand lake functioning and address questions about how lakes around the world vary in response to climate and land use change. The paper describes the science basis and application of the model.
Related subject area
Domain: ESSD – Land | Subject: Hydrology
An integrated high-resolution bathymetric model for the Danube Delta system
Benchmark dataset for hydraulic simulations of flash floods in the French Mediterranean region
Transformation rate maps of dissolved organic carbon in the contiguous US
A 1985–2023 time series dataset of absolute reservoir storage in Mainland Southeast Asia (MSEA-Res)
Machine-learning-based reconstruction of long-term global terrestrial water storage anomalies from observed, satellite and land-surface model data
Mapping the world's inland surface waters: an upgrade to the Global Lakes and Wetlands Database (GLWD v2)
One year of high-frequency monitoring of groundwater physico-chemical parameters in the Weierbach experimental catchment, Luxembourg
Discrete global grid system-based flow routing datasets in the Amazon and Yukon basins
GRILSS: opening the gateway to global reservoir sedimentation data curation
A worldwide event-based debris flow barrier dam dataset from 1800 to 2023
CAMELS-DK: hydrometeorological time series and landscape attributes for 3330 Danish catchments with streamflow observations from 304 gauged stations
An in situ daily dataset for benchmarking temporal variability of groundwater recharge
CAMELS-FR dataset: a large-sample hydroclimatic dataset for France to explore hydrological diversity and support model benchmarking
Features of Italian large dams and their upstream catchments
Gridded rainfall erosivity (2014–2022) in mainland China using 1 min precipitation data from densely distributed weather stations
OLIGOTREND, towards a global database of multi-decadal chlorophyll-a and water quality timeseries for rivers, lakes and estuaries
High-resolution hydrometeorological and snow data for the Dischma catchment in Switzerland
A 3-hour, 1-km surface soil moisture dataset for the contiguous United States from 2015 to 2023
CAMELS-IND: hydrometeorological time series and catchment attributes for 228 catchments in Peninsular India
HERA: a high-resolution pan-European hydrological reanalysis (1951–2020)
BCUB – a large-sample ungauged basin attribute dataset for British Columbia, Canada
Comprehensive inventory of large hydropower systems in the Italian Alpine Region
Northern Hemisphere in situ snow water equivalent dataset (NorSWE, 1979–2021)
ESA CCI Soil Moisture GAPFILLED: An independent global gap-free satellite climate data record with uncertainty estimates
Lena River biogeochemistry captured by a 4.5-year high-frequency sampling program
CAMELS-DE: hydro-meteorological time series and attributes for 1582 catchments in Germany
Observational partitioning of water and CO2 fluxes at National Ecological Observatory Network (NEON) sites: a 5-year dataset of soil and plant components for spatial and temporal analysis
A benchmark dataset for global evapotranspiration estimation based on FLUXNET2015 from 2000 to 2022
GRDC-Caravan: extending Caravan with data from the Global Runoff Data Centre
CIrrMap250: annual maps of China's irrigated cropland from 2000 to 2020 developed through multisource data integration
HANZE v2.1: an improved database of flood impacts in Europe from 1870 to 2020
A Copernicus-based evapotranspiration dataset at 100 m spatial resolution over four Mediterranean basins
Gridded dataset of nitrogen and phosphorus point sources from wastewater in Germany (1950–2019)
A globally sampled high-resolution hand-labeled validation dataset for evaluating surface water extent maps
Satellite-based near-real-time global daily terrestrial evapotranspiration estimates
Multivariate characterisation of a blackberry–alder agroforestry system in South Africa: hydrological, pedological, dendrological and meteorological measurements
CAMELS-AUS v2: updated hydrometeorological timeseries and landscape attributes for an enlarged set of catchments in Australia
SHIFT: a spatial-heterogeneity improvement in DEM-based mapping of global geomorphic floodplains
First comprehensive stable isotope dataset of diverse water units in a permafrost-dominated catchment on the Qinghai–Tibet Plateau
LamaH-Ice: LArge-SaMple DAta for Hydrology and Environmental Sciences for Iceland
High-resolution mapping of monthly industrial water withdrawal in China from 1965 to 2020
Evapotranspiration evaluation using three different protocols on a large green roof in the greater Paris area
Simbi: historical hydro-meteorological time series and signatures for 24 catchments in Haiti
CAMELE: Collocation-Analyzed Multi-source Ensembled Land Evapotranspiration Data
A hydrogeomorphic dataset for characterizing catchment hydrological behavior across the Tibetan Plateau
A synthesis of Global Streamflow Characteristics, Hydrometeorology, and Catchment Attributes (GSHA) for large sample river-centric studies
FOCA: a new quality-controlled database of floods and catchment descriptors in Italy
Dams in the Mekong: a comprehensive database, spatiotemporal distribution, and hydropower potentials
A global dataset of the shape of drainage systems
An extensive spatiotemporal water quality dataset covering four decades (1980–2022) in China
Lauranne Alaerts, Jonathan Lambrechts, Ny Riana Randresihaja, Luc Vandenbulcke, Olivier Gourgue, Emmanuel Hanert, and Marilaure Grégoire
Earth Syst. Sci. Data, 17, 3125–3140, https://doi.org/10.5194/essd-17-3125-2025, 2025
Short summary
We created the first comprehensive, high-resolution, and easily accessible bathymetry dataset for the three main branches of the Danube Delta. By combining four data sources, we obtained a detailed representation of the riverbed, with resolutions ranging from 2 to 100 m. This dataset will support future studies on water and nutrient exchanges between the Danube and the Black Sea and provide insights into the delta's buffer role within the understudied Danube–Black Sea continuum.
Juliette Godet, Pierre Nicolle, Nabil Hocini, Eric Gaume, Philippe Davy, Frederic Pons, Pierre Javelle, Pierre-André Garambois, Dimitri Lague, and Olivier Payrastre
Earth Syst. Sci. Data, 17, 2963–2983, https://doi.org/10.5194/essd-17-2963-2025, 2025
Short summary
This paper describes a dataset that includes input, output, and validation data for the simulation of flash flood hazards and three specific flash flood events in the French Mediterranean region. This dataset is particularly valuable as flood mapping methods often lack sufficient benchmark data. Additionally, we demonstrate how the hydraulic method we used, named Floodos, produces highly satisfactory results.
Lingbo Li, Hong-Yi Li, Guta Abeshu, Jinyun Tang, L. Ruby Leung, Chang Liao, Zeli Tan, Hanqin Tian, Peter Thornton, and Xiaojuan Yang
Earth Syst. Sci. Data, 17, 2713–2733, https://doi.org/10.5194/essd-17-2713-2025, 2025
Short summary
We have developed new maps that reveal how organic carbon from soil leaches into headwater streams over the contiguous United States. We use advanced artificial intelligence techniques and a massive amount of data, including observations at over 2500 gauges and a wealth of climate and environmental information. The maps are a critical step in understanding and predicting how carbon moves through our environment, hence making them a useful tool for tackling climate challenges.
Shanti Shwarup Mahto, Simone Fatichi, and Stefano Galelli
Earth Syst. Sci. Data, 17, 2693–2712, https://doi.org/10.5194/essd-17-2693-2025, 2025
Short summary
The MSEA-Res database offers an open-access dataset tracking absolute water storage for 186 large reservoirs across Mainland Southeast Asia from 1985 to 2023. It provides valuable insights into how reservoir storage grew by 130 % between 2008 and 2017, driven by dams in key river basins. Our data also reveal how droughts, like the 2019–2020 event, significantly impacted water reservoirs. This resource can aid water management, drought planning, and research globally.
Nehar Mandal, Prabal Das, and Kironmala Chanda
Earth Syst. Sci. Data, 17, 2575–2604, https://doi.org/10.5194/essd-17-2575-2025, 2025
Short summary
Optimal features among hydroclimatic variables and land surface model (LSM) outputs are selected using a novel Bayesian network (BN) approach for simulating terrestrial water storage anomalies (TWSAs). TWSAs are reconstructed (BNML_TWSA) with grid-specific leader models (among four machine learning models) from January 1960 to December 2022 to generate a continuous global gridded dataset. The uncertainty in the reconstructed BNML_TWSA product is also assessed in terms of standard error.
Bernhard Lehner, Mira Anand, Etienne Fluet-Chouinard, Florence Tan, Filipe Aires, George H. Allen, Philippe Bousquet, Josep G. Canadell, Nick Davidson, Meng Ding, C. Max Finlayson, Thomas Gumbricht, Lammert Hilarides, Gustaf Hugelius, Robert B. Jackson, Maartje C. Korver, Liangyun Liu, Peter B. McIntyre, Szabolcs Nagy, David Olefeldt, Tamlin M. Pavelsky, Jean-Francois Pekel, Benjamin Poulter, Catherine Prigent, Jida Wang, Thomas A. Worthington, Dai Yamazaki, Xiao Zhang, and Michele Thieme
Earth Syst. Sci. Data, 17, 2277–2329, https://doi.org/10.5194/essd-17-2277-2025, 2025
Short summary
The Global Lakes and Wetlands Database (GLWD) version 2 distinguishes a total of 33 non-overlapping wetland classes, providing a static map of the world’s inland surface waters. It contains cell fractions of wetland extents per class at a grid cell resolution of ~500 m. The total combined extent of all classes including all inland and coastal waterbodies and wetlands of all inundation frequencies – that is, the maximum extent – covers 18.2 × 10⁶ km², equivalent to 13.4 % of total global land area.
Karl Nicolaus van Zweel, Laurent Gourdol, Jean François Iffly, Loïc Léonard, François Barnich, Laurent Pfister, Erwin Zehe, and Christophe Hissler
Earth Syst. Sci. Data, 17, 2217–2229, https://doi.org/10.5194/essd-17-2217-2025, 2025
Short summary
Our study monitored groundwater in a Luxembourg forest over a year to understand water and chemical changes. We found seasonal variations in water chemistry, influenced by rainfall and soil interactions. These data help predict environmental responses and manage water resources better. By measuring key parameters like pH and dissolved oxygen, our research provides valuable insights into groundwater behaviour and serves as a resource for future environmental studies.
Chang Liao, Darren Engwirda, Matthew G. Cooper, Mingke Li, and Yilin Fang
Earth Syst. Sci. Data, 17, 2035–2062, https://doi.org/10.5194/essd-17-2035-2025, 2025
Short summary
Discrete global grid systems, or DGGS, are digital frameworks that help us organize information about our planet. Although scientists have used DGGS in areas like weather and nature, using them in the water cycle has been challenging because some core datasets are missing. We created a way to generate these datasets. We then developed the datasets in the Amazon and Yukon basins, which play important roles in our planet's climate. These datasets may help us improve our water cycle models.
Sanchit Minocha and Faisal Hossain
Earth Syst. Sci. Data, 17, 1743–1759, https://doi.org/10.5194/essd-17-1743-2025, 2025
Short summary
Trustworthy and independently verifiable information on declining storage capacity or sedimentation rates worldwide is sparse and suffers from inconsistent metadata and curation to allow global-scale archiving and analyses. The Global Reservoir Inventory of Lost Storage by Sedimentation (GRILSS) dataset addresses this challenge by providing organized, well-curated, and open-source data on sedimentation rates and capacity loss for 1013 reservoirs in 75 major river basins across 54 countries.
Haiguang Cheng, Kaiheng Hu, Shuang Liu, Xiaopeng Zhang, Hao Li, Qiyuan Zhang, Lan Ning, Manish Raj Gouli, Pu Li, Anna Yang, Peng Zhao, Junyu Liu, and Li Wei
Earth Syst. Sci. Data, 17, 1573–1593, https://doi.org/10.5194/essd-17-1573-2025, 2025
Short summary
After reviewing 2519 literature and media reports, we compiled the first comprehensive global dataset of 555 debris flow barrier dams (DFBDs) from 1800 to 2023. Our dataset meticulously documents 38 attributes of DFBDs, and we have utilized Google Earth for validation. Additionally, we discussed the applicability of landslide dam stability and peak-discharge models to DFBDs. This dataset offers a rich foundation of data for future studies on DFBDs.
Jun Liu, Julian Koch, Simon Stisen, Lars Troldborg, Anker Lajer Højberg, Hans Thodsen, Mark F. T. Hansen, and Raphael J. M. Schneider
Earth Syst. Sci. Data, 17, 1551–1572, https://doi.org/10.5194/essd-17-1551-2025, 2025
Short summary
We developed a CAMELS-style dataset in Denmark, which contains hydrometeorological time series and landscape attributes for 3330 catchments (304 gauged). Many catchments in CAMELS-DK are small and at low elevations. The dataset provides information on groundwater characteristics and dynamics, as well as quantities related to the human impact on the hydrological system in Denmark. The dataset is especially relevant for developing data-driven and hybrid physically informed modeling frameworks.
Pragnaditya Malakar, Aatish Anshuman, Mukesh Kumar, Georgios Boumis, T. Prabhakar Clement, Arik Tashie, Hitesh Thakur, Nagaraj Bhat, and Lokendra Rathore
Earth Syst. Sci. Data, 17, 1515–1528, https://doi.org/10.5194/essd-17-1515-2025, 2025
Short summary
Groundwater dynamics depend on groundwater recharge, but daily benchmark data of recharge are scarce. Here we present daily groundwater recharge per unit specific yield (RpSy) data at 485 US groundwater monitoring wells. RpSy can be used to validate the temporal consistency of recharge products from land surface and hydrologic models and facilitate assessment of recharge-driver functional relationships in them.
Olivier Delaigue, Guilherme Mendoza Guimarães, Pierre Brigode, Benoît Génot, Charles Perrin, Jean-Michel Soubeyroux, Bruno Janet, Nans Addor, and Vazken Andréassian
Earth Syst. Sci. Data, 17, 1461–1479, https://doi.org/10.5194/essd-17-1461-2025, 2025
Short summary
This dataset covers 654 rivers all flowing in France. The provided time series and catchment attributes will be of interest to those modelers wishing to analyze hydrological behavior and perform model assessments.
Giulia Evangelista, Paola Mazzoglio, Daniele Ganora, Francesca Pianigiani, and Pierluigi Claps
Earth Syst. Sci. Data, 17, 1407–1426, https://doi.org/10.5194/essd-17-1407-2025, 2025
Short summary
This paper presents the first comprehensive dataset of 528 large dams in Italy. It contains structural characteristics of the dams, such as coordinates, reservoir surface areas and volumes, together with a range of geomorphological, climatological, extreme rainfall, land cover and soil-related attributes of their upstream catchments.
Yueli Chen, Yun Xie, Xingwu Duan, and Minghu Ding
Earth Syst. Sci. Data, 17, 1265–1274, https://doi.org/10.5194/essd-17-1265-2025, 2025
Short summary
Rainfall erosivity maps are crucial for identifying key areas of water erosion. Due to the limited historical precipitation data, there are certain biases in rainfall erosivity estimates in China. This study develops a new rainfall erosivity map for mainland China using 1 min precipitation data from 60 129 weather stations, revealing that areas exceeding 4000 MJ mm ha⁻¹ h⁻¹ yr⁻¹ of annual rainfall erosivity are mainly concentrated in southern China and on the southern Tibetan Plateau.
Camille Minaudo, Andras Abonyi, Carles Alcaraz, Jacob Diamond, Nicholas J. K. Howden, Michael Rode, Estela Romero, Vincent Thieu, Fred Worrall, Qian Zhang, and Xavier Benito
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2025-58, 2025
Revised manuscript accepted for ESSD
Short summary
Many waterbodies undergo nutrient decline globally, called oligotrophication, but a comprehensive dataset to understand ecosystem responses is lacking. The OLIGOTREND database comprises multi-decadal chlorophyll-a and nutrient timeseries from rivers, lakes, and estuaries with 4.3 million observations from 1,894 unique measurement locations. The database provides empirical evidence for oligotrophication responses with a spatial and temporal coverage exceeding previous efforts.
Jan Magnusson, Yves Bühler, Louis Quéno, Bertrand Cluzet, Giulia Mazzotti, Clare Webster, Rebecca Mott, and Tobias Jonas
Earth Syst. Sci. Data, 17, 703–717, https://doi.org/10.5194/essd-17-703-2025, 2025
Short summary
In this study, we present a dataset for the Dischma catchment in eastern Switzerland, which represents a typical high-alpine watershed in the European Alps. Accurate monitoring and reliable forecasting of snow and water resources in such basins are crucial for a wide range of applications. Our dataset is valuable for improving physics-based snow, land surface, and hydrological models, with potential applications in similar high-alpine catchments.
Haoxuan Yang, Jia Yang, Tyson E. Ochsner, Erik S. Krueger, Mengyuan Xu, and Chris B. Zou
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2025-55, 2025
Revised manuscript accepted for ESSD
Short summary
We developed a 3-hour, 1-km surface soil moisture (SSM) dataset for the contiguous United States from 2015 to 2023 using the spatio-temporal fusion method. This dataset combines the distinct advantages of two long-term SSM datasets and is the first hour-level 1-km soil moisture dataset at the continental US scale. The new dataset could provide new insight into the fast changes in soil moisture along with drought and wet spell occurrences.
Nikunj K. Mangukiya, Kanneganti Bhargav Kumar, Pankaj Dey, Shailza Sharma, Vijaykumar Bejagam, Pradeep P. Mujumdar, and Ashutosh Sharma
Earth Syst. Sci. Data, 17, 461–491, https://doi.org/10.5194/essd-17-461-2025, 2025
Short summary
We introduce CAMELS-IND (Catchment Attributes and MEteorology for Large-sample Studies – India), which provides daily hydrometeorological time series and static catchment attributes representing the location, topography, climate, hydrological signatures, land use, land cover, soil, geology, and anthropogenic influences for 472 catchments in Peninsular India to foster large-sample hydrological studies in India and promote the inclusion of Indian catchments in global hydrological research.
Aloïs Tilloy, Dominik Paprotny, Stefania Grimaldi, Goncalo Gomes, Alessandra Bianchi, Stefan Lange, Hylke Beck, Cinzia Mazzetti, and Luc Feyen
Earth Syst. Sci. Data, 17, 293–316, https://doi.org/10.5194/essd-17-293-2025, 2025
Short summary
This article presents a reanalysis of Europe's river streamflow for the period 1951–2020. Streamflow is estimated through a state-of-the-art hydrological simulation framework benefitting from detailed information about the landscape, climate, and human activities. The resulting Hydrological European ReAnalysis (HERA) can be a valuable tool for studying hydrological dynamics, including the impacts of climate change and human activities on European water resources and flood and drought risks.
Daniel Kovacek and Steven Weijs
Earth Syst. Sci. Data, 17, 259–275, https://doi.org/10.5194/essd-17-259-2025, 2025
Short summary
We made a dataset for British Columbia describing the terrain, soil, land cover, and climate of over 1 million watersheds. The attributes are often used in hydrology because they are related to the water cycle. The data are meant to be used for water resources problems that can benefit from lots of watersheds and their attributes. The data and instructions needed to build the dataset from scratch are freely available. The permanent home for the data is https://doi.org/10.5683/SP3/JNKZVT.
Andrea Galletti, Soroush Zarghami Dastjerdi, and Bruno Majone
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-521, 2025
Revised manuscript accepted for ESSD
Short summary
We propose IAR-HP, a detailed inventory of large hydropower systems in Italy's Alpine Region, aimed at improving hydrological modeling for climate impact studies by providing the most relevant information with a consistent level of detail. It includes structural, geographical, and operational data for over 300 hydropower plants and their related reservoirs and water intakes. Validated through modeling, IAR-HP accurately reproduces observed hydropower, capturing 96.2 % of actual production.
Colleen Mortimer and Vincent Vionnet
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-602, 2025
Revised manuscript accepted for ESSD
Short summary
In situ observations of snow water equivalent (SWE) are critical for climate applications and resource management. NorSWE is a dataset of in situ SWE observations covering North America, Finland and Russia over the period 1979–2021. It includes >11 million observations from >10 thousand different locations compiled from nine different sources. Snow depth and derived bulk snow density are included when available.
Wolfgang Preimesberger, Pietro Stradiotti, and Wouter Dorigo
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-610, 2025
Revised manuscript accepted for ESSD
Short summary
We introduce the official ESA CCI Soil Moisture GAPFILLED climate data record. A univariate interpolation algorithm is applied to predict missing data points without relying on ancillary variables. The dataset includes gap-free uncertainty estimates for all predictions and was validated with independent in situ reference measurements. The data are recommended for applications that require global long-term gap-free satellite soil moisture data.
Bennet Juhls, Anne Morgenstern, Jens Hölemann, Antje Eulenburg, Birgit Heim, Frederieke Miesner, Hendrik Grotheer, Gesine Mollenhauer, Hanno Meyer, Ephraim Erkens, Felica Yara Gehde, Sofia Antonova, Sergey Chalov, Maria Tereshina, Oxana Erina, Evgeniya Fingert, Ekaterina Abramova, Tina Sanders, Liudmila Lebedeva, Nikolai Torgovkin, Georgii Maksimov, Vasily Povazhnyi, Rafael Gonçalves-Araujo, Urban Wünsch, Antonina Chetverova, Sophie Opfergelt, and Pier Paul Overduin
Earth Syst. Sci. Data, 17, 1–28, https://doi.org/10.5194/essd-17-1-2025, 2025
Short summary
The Siberian Arctic is warming fast: permafrost is thawing, river chemistry is changing, and coastal ecosystems are affected. We aimed to understand changes in the Lena River, a major Arctic river flowing to the Arctic Ocean, by collecting 4.5 years of detailed water data, including temperature and carbon and nutrient contents. This dataset records current conditions and helps us to detect future changes. Explore it at https://doi.org/10.1594/PANGAEA.913197 and https://lena-monitoring.awi.de/.
Ralf Loritz, Alexander Dolich, Eduardo Acuña Espinoza, Pia Ebeling, Björn Guse, Jonas Götte, Sibylle K. Hassler, Corina Hauffe, Ingo Heidbüchel, Jens Kiesel, Mirko Mälicke, Hannes Müller-Thomy, Michael Stölzle, and Larisa Tarasova
Earth Syst. Sci. Data, 16, 5625–5642, https://doi.org/10.5194/essd-16-5625-2024, 2024
Short summary
The CAMELS-DE dataset features data from 1582 streamflow gauges across Germany, with records spanning from 1951 to 2020. This comprehensive dataset, which includes time series of up to 70 years (median 46 years), enables advanced research on water flow and environmental trends and supports the development of hydrological models.
Einara Zahn and Elie Bou-Zeid
Earth Syst. Sci. Data, 16, 5603–5624, https://doi.org/10.5194/essd-16-5603-2024, 2024
Short summary
Quantifying water and CO2 exchanges through transpiration, evaporation, net photosynthesis, and soil respiration is essential for understanding how ecosystems function. We implemented five methods to estimate these fluxes over a 5-year period across 47 sites. This is the first dataset representing such large spatial and temporal coverage of soil and plant exchanges, and it has many potential applications, such as examining the response of ecosystems to weather extremes and climate change.
Wangyipu Li, Zhaoyuan Yao, Yifan Qu, Hanbo Yang, Yang Song, Lisheng Song, Lifeng Wu, and Yaokui Cui
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-460, 2024
Revised manuscript accepted for ESSD
Short summary
Due to shortcomings such as extensive data gaps and limited observation durations in current ground-based latent heat flux (LE) datasets, we developed a novel gap-filling and prolongation framework for ground-based LE observations, establishing a benchmark dataset for global evapotranspiration (ET) estimation from 2000 to 2022 across 64 sites at various time scales. This comprehensive dataset can strongly support ET modelling, water-carbon cycle monitoring, and long-term climate change analysis.
Claudia Färber, Henning Plessow, Simon Mischel, Frederik Kratzert, Nans Addor, Guy Shalev, and Ulrich Looser
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-427, 2024
Revised manuscript accepted for ESSD
Short summary
Large-sample datasets are essential in hydrological science to support modelling studies and advance process understanding. Caravan is a community initiative to create a large-sample hydrology dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. This dataset is a subset of hydrological discharge data and station-based watersheds from the Global Runoff Data Centre (GRDC), which are covered by an open data policy.
Ling Zhang, Yanhua Xie, Xiufang Zhu, Qimin Ma, and Luca Brocca
Earth Syst. Sci. Data, 16, 5207–5226, https://doi.org/10.5194/essd-16-5207-2024, 2024
Short summary
This study presented new annual maps of irrigated cropland in China from 2000 to 2020 (CIrrMap250). These maps were developed by integrating remote sensing data, irrigation statistics and surveys, and an irrigation suitability map. CIrrMap250 achieved high accuracy and outperformed currently available products. The new irrigation maps revealed a clear expansion of China’s irrigation area, with the majority (61%) occurring in the water-unsustainable regions facing severe to extreme water stress.
Dominik Paprotny, Paweł Terefenko, and Jakub Śledziowski
Earth Syst. Sci. Data, 16, 5145–5170, https://doi.org/10.5194/essd-16-5145-2024, 2024
Short summary
Knowledge about past natural disasters can help adaptation to their future occurrences. Here, we present a dataset of 2521 riverine, pluvial, coastal, and compound floods that have occurred in 42 European countries between 1870 and 2020. The dataset contains available information on the inundated area, fatalities, persons affected, or economic loss and was obtained by extensive data collection from more than 800 sources ranging from news reports through government databases to scientific papers.
Paulina Bartkowiak, Bartolomeo Ventura, Alexander Jacob, and Mariapina Castelli
Earth Syst. Sci. Data, 16, 4709–4734, https://doi.org/10.5194/essd-16-4709-2024, 2024
Short summary
This paper presents the Two-Source Energy Balance evapotranspiration (ET) product driven by Copernicus Sentinel-2 and Sentinel-3 imagery together with ERA5 climate reanalysis data. Daily ET maps are available at 100 m spatial resolution for the period 2017–2021 across four Mediterranean basins: Ebro (Spain), Hérault (France), Medjerda (Tunisia), and Po (Italy). The product is highly beneficial for supporting vegetation monitoring and sustainable water management at the river basin scale.
Fanny J. Sarrazin, Sabine Attinger, and Rohini Kumar
Earth Syst. Sci. Data, 16, 4673–4708, https://doi.org/10.5194/essd-16-4673-2024, 2024
Short summary
Nitrogen (N) and phosphorus (P) contamination of water bodies is a long-term issue due to the long history of N and P inputs to the environment and their persistence. Here, we introduce a long-term and high-resolution dataset of N and P inputs from wastewater (point sources) for Germany, combining data from different sources and conceptual understanding. We also account for uncertainties in modelling choices, thus facilitating robust long-term and large-scale water quality studies.
Rohit Mukherjee, Frederick Policelli, Ruixue Wang, Elise Arellano-Thompson, Beth Tellman, Prashanti Sharma, Zhijie Zhang, and Jonathan Giezendanner
Earth Syst. Sci. Data, 16, 4311–4323, https://doi.org/10.5194/essd-16-4311-2024, 2024
Short summary
Global water resource monitoring is crucial due to climate change and population growth. This study presents a hand-labeled dataset of 100 PlanetScope images for surface water detection, spanning diverse biomes. We use this dataset to evaluate two state-of-the-art mapping methods. Results highlight performance variations across biomes, emphasizing the need for diverse, independent validation datasets to enhance the accuracy and reliability of satellite-based surface water monitoring techniques.
Lei Huang, Yong Luo, Jing M. Chen, Qiuhong Tang, Tammo Steenhuis, Wei Cheng, and Wen Shi
Earth Syst. Sci. Data, 16, 3993–4019, https://doi.org/10.5194/essd-16-3993-2024, 2024
Short summary
Timely global terrestrial evapotranspiration (ET) data are crucial for water resource management and drought forecasting. This study introduces the VISEA algorithm, which integrates satellite data and shortwave radiation to provide daily 0.05° gridded near-real-time ET estimates. By employing a vegetation index–temperature method, this algorithm can estimate ET without requiring additional data. Evaluation results demonstrate VISEA's comparable accuracy with accelerated data availability.
Sibylle Kathrin Hassler, Rafael Bohn Reckziegel, Ben du Toit, Svenja Hoffmeister, Florian Kestel, Anton Kunneke, Rebekka Maier, and Jonathan Paul Sheppard
Earth Syst. Sci. Data, 16, 3935–3948, https://doi.org/10.5194/essd-16-3935-2024, 2024
Short summary
Agroforestry systems (AFSs) combine trees and crops within the same land unit, providing a sustainable land use option which protects natural resources and biodiversity. Introducing trees into agricultural systems can positively affect water resources, soil characteristics, biomass and microclimate. We studied an AFS in South Africa in a multidisciplinary approach to assess the different influences and present the resulting dataset consisting of water, soil, tree and meteorological variables.
Keirnan J. A. Fowler, Ziqi Zhang, and Xue Hou
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-263, 2024
Revised manuscript accepted for ESSD
Short summary
This paper presents Version 2 of the Australian edition of the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) series of datasets. CAMELS-AUS v2 comprises data for an increased number (561) of catchments, each with long-term monitoring, combining hydrometeorological time series with attributes related to geology, soil, topography, land cover, anthropogenic influence and hydroclimatology. It is freely downloadable from https://zenodo.org/doi/10.5281/zenodo.12575680.
Kaihao Zheng, Peirong Lin, and Ziyun Yin
Earth Syst. Sci. Data, 16, 3873–3891, https://doi.org/10.5194/essd-16-3873-2024, 2024
Short summary
We develop a globally applicable thresholding scheme for DEM-based floodplain delineation to improve the representation of spatial heterogeneity. It involves a stepwise approach to estimate the basin-level floodplain hydraulic geometry parameters that best respect the scaling law while approximating the global hydrodynamic flood maps. A ~90 m resolution global floodplain map, the Spatial Heterogeneity Improved Floodplain by Terrain analysis (SHIFT), is delineated with demonstrated superiority.
Yuzhong Yang, Qingbai Wu, Xiaoyan Guo, Lu Zhou, Helin Yao, Dandan Zhang, Zhongqiong Zhang, Ji Chen, and Guojun Liu
Earth Syst. Sci. Data, 16, 3755–3770, https://doi.org/10.5194/essd-16-3755-2024, 2024
Short summary
We present the temporal data of stable isotopes in different waterbodies in the Beiluhe Basin in the hinterland of the Qinghai–Tibet Plateau (QTP) produced between 2017 and 2022. In this article, the first detailed stable isotope data of 359 ground ice samples are presented. This first data set provides a new basis for understanding the hydrological effects of permafrost degradation on the QTP.
Hordur Bragi Helgason and Bart Nijssen
Earth Syst. Sci. Data, 16, 2741–2771, https://doi.org/10.5194/essd-16-2741-2024, 2024
Short summary
LamaH-Ice is a large-sample hydrology (LSH) dataset for Iceland. The dataset includes daily and hourly hydro-meteorological time series, including observed streamflow and basin characteristics, for 107 basins. LamaH-Ice offers most variables that are included in existing LSH datasets and additional information relevant to cold-region hydrology such as annual time series of glacier extent and mass balance. A large majority of the basins in LamaH-Ice are unaffected by human activities.
Chengcheng Hou, Yan Li, Shan Sang, Xu Zhao, Yanxu Liu, Yinglu Liu, and Fang Zhao
Earth Syst. Sci. Data, 16, 2449–2464, https://doi.org/10.5194/essd-16-2449-2024, 2024
Short summary
To fill the gap in the gridded industrial water withdrawal (IWW) data in China, we developed the China Industrial Water Withdrawal (CIWW) dataset, which provides monthly IWWs from 1965 to 2020 at a spatial resolution of 0.1°/0.25° and auxiliary data including subsectoral IWW and industrial output value in 2008. This dataset can help understand the human water use dynamics and support studies in hydrology, geography, sustainability sciences, and water resource management and allocation in China.
Pierre-Antoine Versini, Leydy Alejandra Castellanos-Diaz, David Ramier, and Ioulia Tchiguirinskaia
Earth Syst. Sci. Data, 16, 2351–2366, https://doi.org/10.5194/essd-16-2351-2024, 2024
Short summary
Nature-based solutions (NBSs), such as green roofs, have appeared as relevant solutions to mitigate urban heat islands. The evapotranspiration (ET) process allows NBSs to cool the air. To improve our knowledge about ET assessment, this paper presents some experimental measurement campaigns carried out during three consecutive summers. Data are available for three different (large, small, and point-based) spatial scales.
Ralph Bathelemy, Pierre Brigode, Vazken Andréassian, Charles Perrin, Vincent Moron, Cédric Gaucherel, Emmanuel Tric, and Dominique Boisson
Earth Syst. Sci. Data, 16, 2073–2098, https://doi.org/10.5194/essd-16-2073-2024, 2024
Short summary
The aim of this work is to provide the first hydroclimatic database for Haiti, a Caribbean country particularly vulnerable to meteorological and hydrological hazards. The resulting database, named Simbi, provides hydroclimatic time series for around 150 stations and 24 catchment areas.
Changming Li, Ziwei Liu, Wencong Yang, Zhuoyi Tu, Juntai Han, Sien Li, and Hanbo Yang
Earth Syst. Sci. Data, 16, 1811–1846, https://doi.org/10.5194/essd-16-1811-2024, 2024
Short summary
Using a collocation-based approach, we developed a reliable global land evapotranspiration product (CAMELE) by merging multi-source datasets. The CAMELE product outperformed individual input datasets and showed satisfactory performance compared to reference data. It also demonstrated superiority for different plant functional types. Our study provides a promising solution for data fusion. The CAMELE dataset allows for detailed research and a better understanding of land–atmosphere interactions.
Yuhan Guo, Hongxing Zheng, Yuting Yang, Yanfang Sang, and Congcong Wen
Earth Syst. Sci. Data, 16, 1651–1665, https://doi.org/10.5194/essd-16-1651-2024, 2024
Short summary
We have provided an inaugural version of the hydrogeomorphic dataset for catchments over the Tibetan Plateau. We first provide the width-function-based instantaneous unit hydrograph (WFIUH) for each HydroBASINS catchment, which can be used to investigate the spatial heterogeneity of hydrological behavior across the Tibetan Plateau. It is expected to facilitate hydrological modeling across the Tibetan Plateau.
Ziyun Yin, Peirong Lin, Ryan Riggs, George H. Allen, Xiangyong Lei, Ziyan Zheng, and Siyu Cai
Earth Syst. Sci. Data, 16, 1559–1587, https://doi.org/10.5194/essd-16-1559-2024, 2024
Short summary
Large-sample hydrology (LSH) datasets have been the backbone of hydrological model parameter estimation and data-driven machine learning models for hydrological processes. This study complements existing LSH studies by creating a dataset with improved sample coverage, uncertainty estimates, and dynamic descriptions of human activities, which are all crucial to hydrological understanding and modeling.
Pierluigi Claps, Giulia Evangelista, Daniele Ganora, Paola Mazzoglio, and Irene Monforte
Earth Syst. Sci. Data, 16, 1503–1522, https://doi.org/10.5194/essd-16-1503-2024, 2024
Short summary
FOCA (Italian FlOod and Catchment Atlas) is the first systematic collection of data on Italian river catchments. It comprises geomorphological, soil, land cover, NDVI, climatological and extreme rainfall catchment attributes. FOCA also contains 631 peak and daily discharge time series covering the 1911–2016 period. Using this first nationwide data collection, a wide range of applications, in particular flood studies, can be undertaken within the Italian territory.
Wei Jing Ang, Edward Park, Yadu Pokhrel, Dung Duc Tran, and Ho Huu Loc
Earth Syst. Sci. Data, 16, 1209–1228, https://doi.org/10.5194/essd-16-1209-2024, 2024
Short summary
Dams have burgeoned in the Mekong, but information on dams is scattered and inconsistent. Up-to-date evaluation of dams is unavailable, and basin-wide hydropower potential has yet to be systematically assessed. We present a comprehensive database of 1055 dams, a spatiotemporal analysis of the dams, and a total hydropower potential of 1 334 683 MW. Considering projected dam development and hydropower potential, the vulnerability and the need for better dam management may be highest in Laos.
Chuanqi He, Ci-Jian Yang, Jens M. Turowski, Richard F. Ott, Jean Braun, Hui Tang, Shadi Ghantous, Xiaoping Yuan, and Gaia Stucky de Quay
Earth Syst. Sci. Data, 16, 1151–1166, https://doi.org/10.5194/essd-16-1151-2024, 2024
Short summary
The shape of drainage basins and rivers holds significant implications for landscape evolution processes and dynamics. We used a global 90 m resolution topography to obtain ~0.7 million drainage basins with sizes over 50 km². Our dataset contains the spatial distribution of drainage systems and their morphological parameters, supporting fields such as geomorphology, climatology, biology, ecology, hydrology, and natural hazards.
Jingyu Lin, Peng Wang, Jinzhu Wang, Youping Zhou, Xudong Zhou, Pan Yang, Hao Zhang, Yanpeng Cai, and Zhifeng Yang
Earth Syst. Sci. Data, 16, 1137–1149, https://doi.org/10.5194/essd-16-1137-2024, 2024
Short summary
Our paper provides a repository comprising over 330 000 observations encompassing daily, weekly, and monthly records of surface water quality spanning the period 1980–2022. It includes 18 distinct indicators, gathered at 2384 monitoring sites, ranging from inland locations to coastal and oceanic areas. This dataset will be useful for researchers and decision-makers in the fields of hydrology, ecological studies, climate change, policy development, and oceanography.
Cited articles
Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, https://doi.org/10.5194/hess-21-5293-2017, 2017. a
Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M.: Optuna: A Next-generation Hyperparameter Optimization Framework, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 4–8 August 2019, 2623–2631, https://doi.org/10.1145/3292500.3330701, 2019. a
Angradi, T. R., Ringold, P. L., and Hall, K.: Water clarity measures as indicators of recreational benefits provided by U.S. lakes: Swimming and aesthetics, Ecol. Indic., 93, 1005–1019, https://doi.org/10.1016/j.ecolind.2018.06.001, 2018. a
Apache Arrow Developers: pyarrow: Python library for Apache Arrow, Python Package Index [code], https://pypi.org/project/pyarrow (last access: 5 September 2024), 2024. a
Arend, K. K., Beletsky, D., DePinto, J. V., Ludsin, S. A., Roberts, J. J., Rucinski, D. K., Scavia, D., Schwab, D. J., and Höök, T. O.: Seasonal and interannual effects of hypoxia on fish habitat quality in central Lake Erie, Freshwater Biol., 56, 366–383, https://doi.org/10.1111/j.1365-2427.2010.02504.x, 2011. a
Auguie, B.: gridExtra: Miscellaneous Functions for “Grid” Graphics, The Comprehensive R Archive Network [code], https://CRAN.R-project.org/package=gridExtra (last access: 13 June 2024), 2017. a
Becker, R. A., Wilks, A. R., and Brownrigg, R.: mapdata: Extra Map Databases, The Comprehensive R Archive Network [code], https://CRAN.R-project.org/package=mapdata (last access: 11 March 2024), 2022. a
Becker, R. A., Wilks, A. R., Brownrigg, R., Minka, T. P., and Deckmyn, A.: maps: Draw Geographical Maps, The Comprehensive R Archive Network [code], https://cran.r-project.org/package=maps (last access: 11 March 2024), 2023. a
Bjarke, N. R., Livneh, B., Elmendorf, S. C., Molotch, N. P., Hinckley, E.-L. S., Emery, N. C., Johnson, P. T. J., Morse, J. F., and Suding, K. N.: Catchment-scale observations at the Niwot Ridge long-term ecological research site, Hydrol. Process., 35, e14320, https://doi.org/10.1002/hyp.14320, 2021. a
Carey, C. C., Ward, N. K., Farrell, K. J., Lofton, M. E., Krinos, A. I., McClure, R. P., Subratie, K. C., Figueiredo, R. J., Doubek, J. P., Hanson, P. C., Papadopoulos, P., and Arzberger, P.: Enhancing collaboration between ecologists and computer scientists: lessons learned and recommendations forward, Ecosphere, 10, e02753, https://doi.org/10.1002/ecs2.2753, 2019. a
Carey, C. C., Lewis, A. S. L., Howard, D. W., Woelmer, W. M., Gantzer, P. A., Bierlein, K. A., Little, J. C., and WVWA: Bathymetry and watershed area for Falling Creek Reservoir, Beaverdam Reservoir, and Carvins Cove Reservoir (1), Environmental Data Initiative [data set], https://doi.org/10.6073/pasta/352735344150f7e77d2bc18b69a22412, 2022. a
Carey, C. C., Howard, D. W., Hoffman, K. K., Wander, H. L., Breef-Pilz, A., Niederlehner, B. R., Haynie, G., Keverline, R., Kricheldorf, M., and Tipper, E.: Water chemistry time series for Beaverdam Reservoir, Carvins Cove Reservoir, Falling Creek Reservoir, Gatewood Reservoir, and Spring Hollow Reservoir in southwestern Virginia, USA 2013–2023 (12), Environmental Data Initiative [data set], https://doi.org/10.6073/pasta/7d7fdc5081ed5211651f86862e8b2b1e, 2024. a
Chen, C., Chen, Q., Yao, S., He, M., Zhang, J., Li, G., and Lin, Y.: Combining physical-based model and machine learning to forecast chlorophyll-a concentration in freshwater lakes, Sci. Total Environ., 907, 168097, https://doi.org/10.1016/j.scitotenv.2023.168097, 2024a. a
Chen, L., Wang, L., Ma, W., Xu, X., and Wang, H.: PID4LaTe: a physics-informed deep learning model for lake multi-depth temperature prediction, Earth Sci. Inform., 17, 3779–3795, https://doi.org/10.1007/s12145-024-01377-5, 2024b. a
Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y.: Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), EMNLP 2014, Doha, Qatar, 25–29 October 2014, 1724–1734, https://doi.org/10.3115/v1/D14-1179, 2014. a
Daw, A., Karpatne, A., Watkins, W., Read, J., and Kumar, V.: Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling, arXiv [preprint], https://doi.org/10.48550/arXiv.1710.11431, 28 September 2021. a
Demir, I., Xiang, Z., Demiray, B., and Sit, M.: WaterBench-Iowa: a large-scale benchmark dataset for data-driven streamflow forecasting, Earth Syst. Sci. Data, 14, 5605–5616, https://doi.org/10.5194/essd-14-5605-2022, 2022. a
Du, W., Côté, D., and Liu, Y.: SAITS: Self-attention-based imputation for time series, Expert Syst. Appl., 219, 119619, https://doi.org/10.1016/j.eswa.2023.119619, 2023. a
Durant, M. and Augsperger, T.: fastparquet, Python Package Index [code], https://pypi.org/project/fastparquet (last access: 5 September 2024), 2024. a
Ejigu, M. T.: Overview of water quality modeling, Cogent Engineering, 8, 1891711, https://doi.org/10.1080/23311916.2021.1891711, 2021. a
FAIRsharing.org: Quantities, Units, Dimensions and Types (QUDT), https://doi.org/10.25504/FAIRsharing.d3pqw7, 2022. a
Flanagan, C. M., McKnight, D. M., Liptzin, D., Williams, M. W., and Miller, M. P.: Response of the Phytoplankton Community in an Alpine Lake to Drought Conditions: Colorado Rocky Mountain Front Range, U.S.A., Arct. Antarct. Alp. Res., 41, 191–203, https://doi.org/10.1657/1938-4246-41.2.191, 2009. a
Gerling, A. B., Browne, R. G., Gantzer, P. A., Mobley, M. H., Little, J. C., and Carey, C. C.: First report of the successful operation of a side stream supersaturation hypolimnetic oxygenation system in a eutrophic, shallow reservoir, Water Res., 67, 129–143, https://doi.org/10.1016/j.watres.2014.09.002, 2014. a, b, c
Goodman, K. J., Parker, S. M., Edmonds, J. W., and Zeglin, L. H.: Expanding the scale of aquatic sciences: the role of the National Ecological Observatory Network (NEON), Freshw. Sci., 34, 377–385, https://doi.org/10.1086/679459, 2015. a, b
Gries, C., Hanson, P. C., O'Brien, M., Servilla, M., Vanderbilt, K., and Waide, R.: The Environmental Data Initiative: Connecting the past to the future through data reuse, Ecol. Evol., 13, e9592, https://doi.org/10.1002/ece3.9592, 2023. a
Guo, M., Zhuang, Q., Yao, H., Golub, M., Leung, L. R., Pierson, D., and Tan, Z.: Validation and Sensitivity Analysis of a 1-D Lake Model Across Global Lakes, J. Geophys. Res.-Atmos., 126, e2020JD033417, https://doi.org/10.1029/2020JD033417, 2021. a
Hamilton, D. P., Carey, C. C., Arvola, L., Arzberger, P., Brewer, C., Cole, J. J., Gaiser, E., Hanson, P. C., Ibelings, B. W., Jennings, E., Kratz, T. K., Lin, F.-P., McBride, C. G., David de Marques, M., Muraoka, K., Nishri, A., Qin, B., Read, J. S., Rose, K. C., Ryder, E., Weathers, K. C., Zhu, G., Trolle, D., and Brookes, J. D.: A Global Lake Ecological Observatory Network (GLEON) for synthesising high-frequency sensor data for validation of deterministic ecological models, Inland Waters, 5, 49–56, https://doi.org/10.5268/IW-5.1.566, 2015. a
Hanson, P. C., Carpenter, S. R., Armstrong, D. E., Stanley, E. H., and Kratz, T. K.: Lake Dissolved Inorganic Carbon and Dissolved Oxygen: Changing Drivers from Days to Decades, Ecol. Monogr., 76, 343–363, https://doi.org/10.1890/0012-9615(2006)076[0343:LDICAD]2.0.CO;2, 2006. a, b
Hanson, P. C., Weathers, K. C., and Kratz, T. K.: Networked lake science: how the Global Lake Ecological Observatory Network (GLEON) works to understand, predict, and communicate lake ecosystem response to global change, Inland Waters, 6, 543–554, https://doi.org/10.1080/IW-6.4.904, 2016. a
Hanson, P. C., Stillman, A. B., Jia, X., Karpatne, A., Dugan, H. A., Carey, C. C., Stachelek, J., Ward, N. K., Zhang, Y., Read, J. S., and Kumar, V.: Predicting lake surface water phosphorus dynamics using process-guided machine learning, Ecol. Model., 430, 109136, https://doi.org/10.1016/j.ecolmodel.2020.109136, 2020. a
Hanson, P. C., Ladwig, R., Buelo, C., Albright, E. A., Delany, A. D., and Carey, C. C.: Legacy Phosphorus and Ecosystem Memory Control Future Water Quality in a Eutrophic Lake, J. Geophys. Res.-Biogeo., 128, e2023JG007620, https://doi.org/10.1029/2023JG007620, 2023. a, b
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E.: Array programming with NumPy, Nature, 585, 357–362, https://doi.org/10.1038/s41586-020-2649-2, 2020. a
Helsel, D. R.: Advantages of nonparametric procedures for analysis of water quality data, Hydrolog. Sci. J., 32, 179–190, https://doi.org/10.1080/02626668709491176, 1987. a, b, c
Hipsey, M. R., Bruce, L. C., Boon, C., Busch, B., Carey, C. C., Hamilton, D. P., Hanson, P. C., Read, J. S., de Sousa, E., Weber, M., and Winslow, L. A.: A General Lake Model (GLM 3.0) for linking with high-frequency sensor data from the Global Lake Ecological Observatory Network (GLEON), Geosci. Model Dev., 12, 473–523, https://doi.org/10.5194/gmd-12-473-2019, 2019. a
Jain, S. M.: Hugging Face, in: Introduction to Transformers for NLP: With the Hugging Face Library and Models to Solve Problems, edited by: Jain, S. M., Apress, Berkeley, CA, https://doi.org/10.1007/978-1-4842-8844-3_4, 51–67, 2022. a
Jane, S. F., Hansen, G. J. A., Kraemer, B. M., Leavitt, P. R., Mincer, J. L., North, R. L., Pilla, R. M., Stetler, J. T., Williamson, C. E., Woolway, R. I., Arvola, L., Chandra, S., DeGasperi, C. L., Diemer, L., Dunalska, J., Erina, O., Flaim, G., Grossart, H.-P., Hambright, K. D., Hein, C., Hejzlar, J., Janus, L. L., Jenny, J.-P., Jones, J. R., Knoll, L. B., Leoni, B., Mackay, E., Matsuzaki, S.-I. S., McBride, C., Müller-Navarra, D. C., Paterson, A. M., Pierson, D., Rogora, M., Rusak, J. A., Sadro, S., Saulnier-Talbot, E., Schmid, M., Sommaruga, R., Thiery, W., Verburg, P., Weathers, K. C., Weyhenmeyer, G. A., Yokota, K., and Rose, K. C.: Widespread deoxygenation of temperate lakes, Nature, 594, 66–70, https://doi.org/10.1038/s41586-021-03550-y, 2021. a, b
Jane, S. F., Detmer, T. M., Larrick, S. L., Rose, K. C., Randall, E. A., Jirka, K. J., and McIntyre, P. B.: Concurrent warming and browning eliminate cold-water fish habitat in many temperate lakes, P. Natl. Acad. Sci. USA, 121, e2306906120, https://doi.org/10.1073/pnas.2306906120, 2024. a
Kadkhodazadeh, M. and Farzin, S.: A Novel LSSVM Model Integrated with GBO Algorithm to Assessment of Water Quality Parameters, Water Resour. Manag., 35, 3939–3968, https://doi.org/10.1007/s11269-021-02913-4, 2021. a
Karpatne, A., Atluri, G., Faghmous, J., Steinbach, M., Banerjee, A., Ganguly, A., Shekhar, S., Samatova, N., and Kumar, V.: Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data, IEEE T. Knowl. Data En., 29, 2318–2331, https://doi.org/10.1109/TKDE.2017.2720168, 2017. a
Karpatne, A., Jia, X., and Kumar, V.: Knowledge-guided Machine Learning: Current Trends and Future Prospects, arXiv [preprint], https://doi.org/10.48550/arXiv.2403.15989, 2024. a, b, c
Keeler, B. L., Polasky, S., Brauman, K. A., Johnson, K. A., Finlay, J. C., O'Neill, A., Kovacs, K., and Dalzell, B.: Linking water quality and well-being for improved assessment and valuation of ecosystem services, P. Natl. Acad. Sci. USA, 109, 18619–18624, https://doi.org/10.1073/pnas.1215991109, 2012. a
Lacoste, A., Lehmann, N., Rodriguez, P., Sherwin, E. D., Kerner, H., Lütjens, B., Irvin, J. A., Dao, D., Alemohammad, H., Drouin, A., Gunturkun, M., Huang, G., Vazquez, D., Newman, D., Bengio, Y., Ermon, S., and Zhu, X. X.: GEO-Bench: Toward Foundation Models for Earth Monitoring, arXiv [preprint], https://doi.org/10.48550/arXiv.2306.03831, 23 December 2023. a
Ladwig, R., Hanson, P. C., Dugan, H. A., Carey, C. C., Zhang, Y., Shu, L., Duffy, C. J., and Cobourn, K. M.: Lake thermal structure drives interannual variability in summer anoxia dynamics in a eutrophic lake over 37 years, Hydrol. Earth Syst. Sci., 25, 1009–1032, https://doi.org/10.5194/hess-25-1009-2021, 2021. a, b, c, d
Ladwig, R., Daw, A., Albright, E. A., Buelo, C., Karpatne, A., Meyer, M. F., Neog, A., Hanson, P. C., and Dugan, H. A.: Modular Compositional Learning Improves 1D Hydrodynamic Lake Model Performance by Merging Process-Based Modeling With Deep Learning, J. Adv. Model. Earth Sy., 16, e2023MS003953, https://doi.org/10.1029/2023MS003953, 2024. a, b
Langman, O. C., Hanson, P. C., Carpenter, S. R., and Hu, Y. H.: Control of dissolved oxygen in northern temperate lakes over scales ranging from minutes to days, Aquat. Biol., 9, 193–202, https://doi.org/10.3354/ab00249, 2010. a
Li, X., Nieber, J. L., and Kumar, V.: Machine learning applications in vadose zone hydrology: A review, Vadose Zone J., 23, e20361, https://doi.org/10.1002/vzj2.20361, 2024. a
Lim, K.-Y. and Surbeck, C. Q.: A multi-variate methodology for analyzing pre-existing lake water quality data, J. Environ. Monitor., 13, 2477–2487, https://doi.org/10.1039/C1EM10119F, 2011. a, b, c
Lin, S., Pierson, D. C., and Mesman, J. P.: Prediction of algal blooms via data-driven machine learning models: an evaluation using data from a well-monitored mesotrophic lake, Geosci. Model Dev., 16, 35–46, https://doi.org/10.5194/gmd-16-35-2023, 2023. a
Lofton, M. E., Howard, D. W., Thomas, R. Q., and Carey, C. C.: Progress and opportunities in advancing near-term forecasting of freshwater quality, Glob. Change Biol., 29, 1691–1714, https://doi.org/10.1111/gcb.16590, 2023. a
Lottig, N.: High Frequency Under-Ice Water Temperature Buoy Data – Crystal Bog, Trout Bog, and Lake Mendota, Wisconsin, USA 2016–2020 (3), Environmental Data Initiative [data set], https://doi.org/10.6073/pasta/ad192ce8fbe8175619d6a41aa2f72294, 2022. a
Lottig, N. R. and Dugan, H. A.: North Temperate Lakes-LTER Core Research Lakes Information (1), Environmental Data Initiative [data set], https://doi.org/10.6073/pasta/b9080c962f552029ee2b43aec1410328, 2024. a, b, c
Lunch, C., Laney, C., Mietkiewicz, N., Sokol, E., Cawley, K., and NEON (National Ecological Observatory Network): neonUtilities: Utilities for Working with NEON Data, The Comprehensive R Archive Network [code], https://CRAN.R-project.org/package=neonUtilities (last access: 7 March 2024), 2024. a
Magnuson, J. J., Kratz, T. K., Allen, T. F., Armstrong, D. E., Benson, B. J., Bowser, C. J., Bolgrien, D. W., Carpenter, S. R., Frost, T. M., Gower, S. T., Lillesand, T. M., Pike, J. A., and Turner, M. G.: Regionalization of long-term ecological research (LTER) on north temperate lakes, SIL Proceedings, 1922–2010, 26, 522–528, https://doi.org/10.1080/03680770.1995.11900771, 1997. a
McAfee, B. J., Lofton, M. E., Breef-Pilz, A., Goodman, K. J., Hensley, R. T., Hoffman, K. K., Howard, D. W., Lewis, A. S. L., McKnight, D. M., Oleksy, I. A., Wander, H. L., Carey, C. C., Karpatne, A., and Hanson, P. C.: LakeBeD-US: Ecology Edition – a benchmark dataset of lake water quality time series and vertical profiles, Environmental Data Initiative [data set], https://doi.org/10.6073/pasta/c56a204a65483790f6277de4896d7140, 2024. a, b, c
McKinney, W.: Data Structures for Statistical Computing in Python, in: Proceedings of the 9th Python in Science Conference, SciPy 2010, Austin, Texas, United States, 28 June–3 July 2010, https://doi.org/10.25080/Majora-92bf1922-00a, 56–61, 2010. a
Messager, M. L., Lehner, B., Grill, G., Nedeva, I., and Schmitt, O.: Estimating the volume and age of water stored in global lakes using a geo-statistical approach, Nat. Commun., 7, 13603, https://doi.org/10.1038/ncomms13603, 2016. a
Meyer, M. F., Topp, S. N., King, T. V., Ladwig, R., Pilla, R. M., Dugan, H. A., Eggleston, J. R., Hampton, S. E., Leech, D. M., Oleksy, I. A., Ross, J. C., Ross, M. R. V., Woolway, R. I., Yang, X., Brousil, M. R., Fickas, K. C., Padowski, J. C., Pollard, A. I., Ren, J., and Zwart, J. A.: National-scale remotely sensed lake trophic state from 1984 through 2020, Sci. Data, 11, 77, https://doi.org/10.1038/s41597-024-02921-0, 2024. a
Miller, T., Durlik, I., Adrianna, K., Kisiel, A., Cembrowska-Lech, D., Spychalski, I., and Tuński, T.: Predictive Modeling of Urban Lake Water Quality Using Machine Learning: A 20-Year Study, Appl. Sci., 13, 11217, https://doi.org/10.3390/app132011217, 2023. a
Nguyen, T., Brandstetter, J., Kapoor, A., Gupta, J. K., and Grover, A.: ClimaX: A foundation model for weather and climate, arXiv [preprint], https://doi.org/10.48550/arXiv.2301.10343, 18 December 2023. a
Paerl, H. W. and Huisman, J.: Climate change: a catalyst for global expansion of harmful cyanobacterial blooms, Env. Microbiol. Rep., 1, 27–37, https://doi.org/10.1111/j.1758-2229.2008.00004.x, 2009. a
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S.: PyTorch: an imperative style, high-performance deep learning library, in: Proceedings of the 33rd International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, 8–14 December 2019, 8026–8037, https://dl.acm.org/doi/10.5555/3454287.3455008 (last access: 7 November 2024), 2019. a
Peters, B., Brenner, S. E., Wang, E., Slonim, D., and Kann, M. G.: Putting benchmarks in their rightful place: The heart of computational biology, PLOS Comput. Biol., 14, e1006494, https://doi.org/10.1371/journal.pcbi.1006494, 2018. a
Pollard, A. I., Hampton, S. E., and Leech, D. M.: The Promise and Potential of Continental-Scale Limnology Using the U.S. Environmental Protection Agency's National Lakes Assessment, Limnology and Oceanography Bulletin, 27, 36–41, https://doi.org/10.1002/lob.10238, 2018. a
Pradhan, A., McAfee, B. J., Neog, A., Fatemi, S., Lofton, M. E., Carey, C. C., Karpatne, A., and Hanson, P. C.: LakeBeD-US: Computer Science Edition – a benchmark dataset for lake water quality time series and vertical profiles, Hugging Face [data set], https://doi.org/10.57967/hf/3771, 2024. a, b, c
Preston, D. L., Caine, N., McKnight, D. M., Williams, M. W., Hell, K., Miller, M. P., Hart, S. J., and Johnson, P. T. J.: Climate regulates alpine lake ice cover phenology and aquatic ecosystem structure, Geophys. Res. Lett., 43, 5353–5360, https://doi.org/10.1002/2016GL069036, 2016. a
Rangaraj, A. G., ShobanaDevi, A., Srinath, Y., Boopathi, K., and Balaraman, K.: Efficient and Secure Storage for Renewable Energy Resource Data Using Parquet for Data Analytics, in: Data Management, Analytics and Innovation, International Conference on Data Management, Analytics and Innovation 2021, Online, 263–292, https://doi.org/10.1007/978-981-16-2937-2_19, 2021. a
Read, E. K., Carr, L., De Cicco, L., Dugan, H. A., Hanson, P. C., Hart, J. A., Kreft, J., Read, J. S., and Winslow, L. A.: Water quality data for national-scale aquatic research: The Water Quality Portal, Water Resour. Res., 53, 1735–1745, https://doi.org/10.1002/2016WR019993, 2017. a
Read, J. S., Jia, X., Willard, J., Appling, A. P., Zwart, J. A., Oliver, S. K., Karpatne, A., Hansen, G. J. A., Hanson, P. C., Watkins, W., Steinbach, M., and Kumar, V.: Process-Guided Deep Learning Predictions of Lake Water Temperature, Water Resour. Res., 55, 9173–9190, https://doi.org/10.1029/2019WR024922, 2019. a, b
Richardson, N., Cook, I., Crane, N., Dunnington, D., François, R., Keane, J., Moldovan-Grünfeld, D., Ooms, J., Wujciak-Jens, J., and Apache Arrow: arrow: Integration to “Apache” “Arrow”, GitHub [code], https://github.com/apache/arrow (last access: 7 March 2024), 2024. a
Rodríguez, R., Pastorini, M., Etcheverry, L., Chreties, C., Fossati, M., Castro, A., and Gorgoglione, A.: Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach, Sustainability, 13, 6318, https://doi.org/10.3390/su13116318, 2021. a
Sarkar, A., Yang, Y., and Vihinen, M.: Variation benchmark datasets: update, criteria, quality and applications, Database, 2020, baz117, https://doi.org/10.1093/database/baz117, 2020. a, b
Schür, C., Gasser, L., Perez-Cruz, F., Schirmer, K., and Baity-Jesi, M.: A benchmark dataset for machine learning in ecotoxicology, Sci. Data, 10, 718, https://doi.org/10.1038/s41597-023-02612-2, 2023. a
Slowikowski, K.: ggrepel: Automatically Position Non-Overlapping Text Labels with “ggplot2”, The Comprehensive R Archive Network [code], https://CRAN.r-project.org/package=ggrepel (last access: 11 March 2024), 2024. a
Smith, C.: EDIutils: An API Client for the Environmental Data Initiative Repository in R, GitHub [code], https://github.com/ropensci/EDIutils (last access: 7 March 2024), 2023. a
Snortheim, C. A., Hanson, P. C., McMahon, K. D., Read, J. S., Carey, C. C., and Dugan, H. A.: Meteorological drivers of hypolimnetic anoxia in a eutrophic, north temperate lake, Ecol. Model., 343, 39–53, https://doi.org/10.1016/j.ecolmodel.2016.10.014, 2017. a
Solomon, C. T., Bruesewitz, D. A., Richardson, D. C., Rose, K. C., Van de Bogert, M. C., Hanson, P. C., Kratz, T. K., Larget, B., Adrian, R., Babin, B. L., Chiu, C.-Y., Hamilton, D. P., Gaiser, E. E., Hendricks, S., Istvánovics, V., Laas, A., O'Donnell, D. M., Pace, M. L., Ryder, E., Staehr, P. A., Torgersen, T., Vanni, M. J., Weathers, K. C., and Zhu, G.: Ecosystem respiration: Drivers of daily variability and background respiration in lakes around the globe, Limnol. Oceanogr., 58, 849–866, https://doi.org/10.4319/lo.2013.58.3.0849, 2013. a
Soranno, P. A., Bacon, L. C., Beauchene, M., Bednar, K. E., Bissell, E. G., Boudreau, C. K., Boyer, M. G., Bremigan, M. T., Carpenter, S. R., Carr, J. W., Cheruvelil, K. S., Christel, S. T., Claucherty, M., Collins, S. M., Conroy, J. D., Downing, J. A., Dukett, J., Fergus, C. E., Filstrup, C. T., Funk, C., Gonzalez, M. J., Green, L. T., Gries, C., Halfman, J. D., Hamilton, S. K., Hanson, P. C., Henry, E. N., Herron, E. M., Hockings, C., Jackson, J. R., Jacobson-Hedin, K., Janus, L. L., Jones, W. W., Jones, J. R., Keson, C. M., King, K. B. S., Kishbaugh, S. A., Lapierre, J.-F., Lathrop, B., Latimore, J. A., Lee, Y., Lottig, N. R., Lynch, J. A., Matthews, L. J., McDowell, W. H., Moore, K. E. B., Neff, B. P., Nelson, S. J., Oliver, S. K., Pace, M. L., Pierson, D. C., Poisson, A. C., Pollard, A. I., Post, D. M., Reyes, P. O., Rosenberry, D. O., Roy, K. M., Rudstam, L. G., Sarnelle, O., Schuldt, N. J., Scott, C. E., Skaff, N. K., Smith, N. J., Spinelli, N. R., Stachelek, J., Stanley, E. H., Stoddard, J. L., Stopyak, S. B., Stow, C. A., Tallant, J. M., Tan, P.-N., Thorpe, A. P., Vanni, M. J., Wagner, T., Watkins, G., Weathers, K. C., Webster, K. E., White, J. D., Wilmes, M. K., and Yuan, S.: LAGOS-NE: a multi-scaled geospatial and temporal database of lake ecological context and water quality for thousands of US lakes, GigaScience, 6, gix101, https://doi.org/10.1093/gigascience/gix101, 2017. a, b, c
Spaulding, S. A., Platt, L. R. C., Murphy, J. C., Covert, A., and Harvey, J. W.: Chlorophyll a in lakes and streams of the United States (2005–2022), Sci. Data, 11, 611, https://doi.org/10.1038/s41597-024-03453-3, 2024. a
Stanley, E. H., Collins, S. M., Lottig, N. R., Oliver, S. K., Webster, K. E., Cheruvelil, K. S., and Soranno, P. A.: Biases in lake water quality sampling and implications for macroscale research, Limnol. Oceanogr., 64, 1572–1585, https://doi.org/10.1002/lno.11136, 2019. a
Stoker, J. M. and Miller, B.: The accuracy and consistency of 3D Elevation Program data: A systematic analysis, Remote Sens., 14, 940, https://doi.org/10.3390/rs14040940, 2022. a
Sutskever, I., Vinyals, O., and Le, Q. V.: Sequence to Sequence Learning with Neural Networks, arXiv [preprint], https://doi.org/10.48550/arXiv.1409.3215, 14 December 2014. a
The pandas development team: pandas-dev/pandas: Pandas (v2.2.2), Zenodo [code], https://doi.org/10.5281/zenodo.10957263, 2024. a
Thimijan, R. W. and Heins, R. D.: Photometric, Radiometric, and Quantum Light Units of Measure: A Review of Procedures for Interconversion, Hortic. Sci., 18, 818–822, https://doi.org/10.21273/HORTSCI.18.6.818, 1983. a
Thomas, R. Q., McClure, R. P., Moore, T. N., Woelmer, W. M., Boettiger, C., Figueiredo, R. J., Hensley, R. T., and Carey, C. C.: Near-term forecasts of NEON lakes reveal gradients of environmental predictability across the US, Front. Ecol. Environ., 21, 220–226, https://doi.org/10.1002/fee.2623, 2023. a, b, c
Van Rossum, G. and Drake, F. L.: Python 3 Reference Manual, CreateSpace, Scotts Valley, CA, ISBN 1441412697, 2009. a
Varadharajan, C., Appling, A. P., Arora, B., Christianson, D. S., Hendrix, V. C., Kumar, V., Lima, A. R., Müller, J., Oliver, S., Ombadi, M., Perciano, T., Sadler, J. M., Weierbach, H., Willard, J. D., Xu, Z., and Zwart, J.: Can machine learning accelerate process understanding and decision-relevant predictions of river water quality?, Hydrol. Process., 36, e14565, https://doi.org/10.1002/hyp.14565, 2022. a
Verpoorter, C., Kutser, T., Seekell, D. A., and Tranvik, L. J.: A global inventory of lakes based on high-resolution satellite imagery, Geophys. Res. Lett., 41, 6396–6402, https://doi.org/10.1002/2014GL060641, 2014. a
Virro, H., Amatulli, G., Kmoch, A., Shen, L., and Uuemaa, E.: GRQA: Global River Water Quality Archive, Earth Syst. Sci. Data, 13, 5483–5507, https://doi.org/10.5194/essd-13-5483-2021, 2021. a
Wai, K. P., Chia, M. Y., Koo, C. H., Huang, Y. F., and Chong, W. C.: Applications of deep learning in water quality management: A state-of-the-art review, J. Hydrol., 613, 128332, https://doi.org/10.1016/j.jhydrol.2022.128332, 2022. a
Wander, H. L., Farruggia, M. J., La Fuente, S., Korver, M. C., Chapina, R. J., Robinson, J., Bah, A., Munthali, E., Ghosh, R., Stachelek, J., Khandelwal, A., Hanson, P. C., and Weathers, K. C.: Using Knowledge-Guided Machine Learning To Assess Patterns of Areal Change in Waterbodies across the Contiguous United States, Environ. Sci. Technol., 58, 5003–5013, https://doi.org/10.1021/acs.est.3c05784, 2024. a
Weathers, K. C., Hanson, P. C., Arzberger, P., Brentrup, J., Brookes, J., Carey, C. C., Gaiser, E., Hamilton, D. P., Hong, G. S., Ibelings, B., Istvánovics, V., Jennings, E., Kim, B., Kratz, T., Lin, F.-P., Muraoka, K., O'Reilly, C., Rose, K. C., Ryder, E., and Zhu, G.: The Global Lake Ecological Observatory Network (GLEON): The Evolution of Grassroots Network Science, Limnology and Oceanography Bulletin, 22, 71–73, https://doi.org/10.1002/lob.201322371, 2013. a
Weinstein, B. G., Graves, S. J., Marconi, S., Singh, A., Zare, A., Stewart, D., Bohlman, S. A., and White, E. P.: A benchmark dataset for canopy crown detection and delineation in co-registered airborne RGB, LiDAR and hyperspectral imagery from the National Ecological Observation Network, PLOS Comput. Biol., 17, e1009180, https://doi.org/10.1371/journal.pcbi.1009180, 2021. a, b
Wickham, H.: ggplot2: Elegant Graphics for Data Analysis, Use R!, Springer-Verlag, New York, https://doi.org/10.1007/978-3-319-24277-4, 2016. a, b
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K., and Yutani, H.: Welcome to the tidyverse, Journal of Open Source Software, 4, 1686, https://doi.org/10.21105/joss.01686, 2019. a
Wilke, C. O.: cowplot: Streamlined Plot Theme and Plot Annotations for “ggplot2”, The Comprehensive R Archive Network [code], https://CRAN.r-project.org/package=cowplot (last access: 7 March 2024), 2024. a
Wilkinson, G. M., Walter, J. A., Buelo, C. D., and Pace, M. L.: No evidence of widespread algal bloom intensification in hundreds of lakes, Front. Ecol. Environ., 20, 16–21, https://doi.org/10.1002/fee.2421, 2022. a
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., 't Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B.: The FAIR Guiding Principles for scientific data management and stewardship, Sci. Data, 3, 160018, https://doi.org/10.1038/sdata.2016.18, 2016. a
Willard, J. D., Read, J. S., Appling, A. P., Oliver, S. K., Jia, X., and Kumar, V.: Predicting Water Temperature Dynamics of Unmonitored Lakes With Meta-Transfer Learning, Water Resour. Res., 57, e2021WR029579, https://doi.org/10.1029/2021WR029579, 2021. a
Yang, X., Liang, W., and Zou, J.: Navigating Dataset Documentations in AI: A Large-Scale Analysis of Dataset Cards on Hugging Face, arXiv [preprint], https://doi.org/10.48550/arXiv.2401.13822, 2024. a
Zhao, L., Zhu, R., Zhou, Q., Jeppesen, E., and Yang, K.: Trophic status and lake depth play important roles in determining the nutrient-chlorophyll a relationship: Evidence from thousands of lakes globally, Water Res., 242, 120182, https://doi.org/10.1016/j.watres.2023.120182, 2023. a
Short summary
LakeBeD-US is a dataset of lake water quality data collected by multiple long-term monitoring programs around the United States. This dataset is designed to foster collaboration between lake scientists and computer scientists to improve predictions of water quality. By providing a way to test computer models against real-world lake data, LakeBeD-US creates opportunities for both sciences to grow and to yield new insights into the causes of water quality changes.