https://doi.org/10.5194/essd-2025-27
05 Feb 2025
Status: this preprint is currently under review for the journal ESSD.

LakeBeD-US: a benchmark dataset for lake water quality time series and vertical profiles

Bennett J. McAfee, Aanish Pradhan, Abhilash Neog, Sepideh Fatemi, Robert T. Hensley, Mary E. Lofton, Anuj Karpatne, Cayelan C. Carey, and Paul C. Hanson

Abstract. Water quality in lakes is an emergent property of complex biotic and abiotic processes that differ across spatial and temporal scales. Water quality is also a determinant of the ecosystem services that lakes provide, and thus is of great interest to ecologists. Increasingly, machine learning and other computer science techniques are being used to predict water quality dynamics as well as to gain a greater understanding of water quality patterns and controls. To benefit both ecology and computer science, we have created a benchmark dataset of lake water quality time series and vertical profiles. LakeBeD-US contains over 500 million unique observations of lake water quality collected by multiple long-term monitoring organizations across 17 water quality variables in 21 lakes in the United States. There are two published versions of LakeBeD-US: an "Ecology Edition" published in the Environmental Data Initiative repository, and a "Computer Science Edition" published in the Hugging Face repository. Each edition is formatted to suit the inquiries and analyses typical of its domain. For ecologists, LakeBeD-US provides an opportunity to study the spatial and temporal dynamics of several lakes with varying water quality, ecosystem, and landscape characteristics. For computer scientists, LakeBeD-US acts as a benchmark dataset that enables the advancement of machine learning for water quality prediction.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 14 Mar 2025)


Data sets

LakeBeD-US: Ecology Edition - a benchmark dataset of lake water quality time series and vertical profiles
Bennett J. McAfee, Mary E. Lofton, Adrienne Breef-Pilz, Keli J. Goodman, Robert T. Hensley, Kathryn K. Hoffman, Dexter W. Howard, Abigail S. L. Lewis, Diane M. McKnight, Isabella A. Oleksy, Heather L. Wander, Cayelan C. Carey, Anuj Karpatne, and Paul C. Hanson
https://doi.org/10.6073/pasta/c56a204a65483790f6277de4896d7140

LakeBeD-US: Computer Science Edition - a benchmark dataset for lake water quality time series and vertical profiles
Aanish Pradhan, Bennett McAfee, Abhilash Neog, Sepideh Fatemi, Mary E. Lofton, Cayelan C. Carey, Anuj Karpatne, and Paul C. Hanson
https://doi.org/10.57967/hf/3771


Viewed

Total article views: 102 (including HTML, PDF, and XML)
  • HTML: 82
  • PDF: 18
  • XML: 2
  • Total: 102
  • BibTeX: 3
  • EndNote: 1
Views and downloads (calculated since 05 Feb 2025)

Viewed (geographical distribution)

Total article views: 112 (including HTML, PDF, and XML), of which 112 have a defined geography and 0 are of unknown origin.
Latest update: 14 Feb 2025
Short summary
LakeBeD-US is a dataset of lake water quality data collected by multiple long-term monitoring programs around the United States. This dataset is designed to foster collaboration between lake scientists and computer scientists to improve predictions of water quality. By offering a way for computer models to be tested against real-world lake data, LakeBeD-US offers opportunities for both sciences to grow and to give new insights into the causes of water quality changes.