https://doi.org/10.5194/essd-2025-27
05 Feb 2025
Status: a revised version of this preprint was accepted for the journal ESSD and is expected to appear here in due course.

LakeBeD-US: a benchmark dataset for lake water quality time series and vertical profiles

Bennett J. McAfee, Aanish Pradhan, Abhilash Neog, Sepideh Fatemi, Robert T. Hensley, Mary E. Lofton, Anuj Karpatne, Cayelan C. Carey, and Paul C. Hanson

Abstract. Water quality in lakes is an emergent property of complex biotic and abiotic processes that differ across spatial and temporal scales. Water quality is also a determinant of the ecosystem services that lakes provide, and thus is of great interest to ecologists. Increasingly, machine learning and other computer science techniques are being used to predict water quality dynamics and to gain a greater understanding of water quality patterns and controls. To benefit both ecology and computer science, we have created a benchmark dataset of lake water quality time series and vertical profiles. LakeBeD-US contains over 500 million unique observations of lake water quality collected by multiple long-term monitoring organizations across 17 water quality variables in 21 lakes in the United States. There are two published versions of LakeBeD-US: an "Ecology Edition" published in the Environmental Data Initiative repository, and a "Computer Science Edition" published in the Hugging Face repository. Each edition is formatted to support the inquiries and analyses specific to its domain. For ecologists, LakeBeD-US provides an opportunity to study the spatial and temporal dynamics of several lakes with varying water quality, ecosystem, and landscape characteristics. For computer scientists, LakeBeD-US acts as a benchmark dataset that enables the advancement of machine learning for water quality prediction.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.