Articles | Volume 17, issue 7
https://doi.org/10.5194/essd-17-3141-2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
LakeBeD-US: a benchmark dataset for lake water quality time series and vertical profiles
Bennett J. McAfee
CORRESPONDING AUTHOR
Center for Limnology, University of Wisconsin–Madison, Madison, WI 53706, USA
Aanish Pradhan
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
Abhilash Neog
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
Sepideh Fatemi
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
Robert T. Hensley
National Ecological Observatory Network – Battelle, Boulder, CO 80301, USA
Mary E. Lofton
Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
Anuj Karpatne
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
Cayelan C. Carey
Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
Paul C. Hanson
Center for Limnology, University of Wisconsin–Madison, Madison, WI 53706, USA
Related authors
Kelly S. Aho, Kaelin M. Cawley, Robert T. Hensley, Robert O. Hall Jr., Walter K. Dodds, and Keli J. Goodman
Earth Syst. Sci. Data, 16, 5563–5578, https://doi.org/10.5194/essd-16-5563-2024, 2024
Short summary
Gas exchange is fundamental to many biogeochemical processes in streams and depends on the degree of gas saturation and the gas transfer velocity (k). Currently, k is harder to measure than concentration. Here, we present a processing pipeline to estimate k from tracer-gas experiments conducted in 22 streams by the National Ecological Observatory Network. The processed dataset (n = 339) represents the largest compilation of standardized k estimates available.
Austin Delany, Robert Ladwig, Cal Buelo, Ellen Albright, and Paul C. Hanson
Biogeosciences, 20, 5211–5228, https://doi.org/10.5194/bg-20-5211-2023, 2023
Short summary
Internal and external sources of organic carbon (OC) in lakes can contribute to oxygen depletion, but their relative contributions remain in question. To study this, we built a two-layer model to recreate processes relevant to carbon for six Wisconsin lakes. We found that internal OC was more important than external OC in depleting oxygen. This shows that it is important to consider both the fast-paced cycling of internally produced OC and the slower cycling of external OC when studying lakes.
Malgorzata Golub, Wim Thiery, Rafael Marcé, Don Pierson, Inne Vanderkelen, Daniel Mercado-Bettin, R. Iestyn Woolway, Luke Grant, Eleanor Jennings, Benjamin M. Kraemer, Jacob Schewe, Fang Zhao, Katja Frieler, Matthias Mengel, Vasiliy Y. Bogomolov, Damien Bouffard, Marianne Côté, Raoul-Marie Couture, Andrey V. Debolskiy, Bram Droppers, Gideon Gal, Mingyang Guo, Annette B. G. Janssen, Georgiy Kirillin, Robert Ladwig, Madeline Magee, Tadhg Moore, Marjorie Perroud, Sebastiano Piccolroaz, Love Raaman Vinnaa, Martin Schmid, Tom Shatwell, Victor M. Stepanenko, Zeli Tan, Bronwyn Woodward, Huaxia Yao, Rita Adrian, Mathew Allan, Orlane Anneville, Lauri Arvola, Karen Atkins, Leon Boegman, Cayelan Carey, Kyle Christianson, Elvira de Eyto, Curtis DeGasperi, Maria Grechushnikova, Josef Hejzlar, Klaus Joehnk, Ian D. Jones, Alo Laas, Eleanor B. Mackay, Ivan Mammarella, Hampus Markensten, Chris McBride, Deniz Özkundakci, Miguel Potes, Karsten Rinke, Dale Robertson, James A. Rusak, Rui Salgado, Leon van der Linden, Piet Verburg, Danielle Wain, Nicole K. Ward, Sabine Wollrab, and Galina Zdorovennova
Geosci. Model Dev., 15, 4597–4623, https://doi.org/10.5194/gmd-15-4597-2022, 2022
Short summary
Lakes and reservoirs are warming across the globe. To better understand how lakes are changing and to project their future behavior amidst various sources of uncertainty, simulations with a range of lake models are required. This in turn requires international coordination across different lake modelling teams worldwide. Here we present a protocol for and results from coordinated simulations of climate change impacts on lakes worldwide.
Robert Ladwig, Paul C. Hanson, Hilary A. Dugan, Cayelan C. Carey, Yu Zhang, Lele Shu, Christopher J. Duffy, and Kelly M. Cobourn
Hydrol. Earth Syst. Sci., 25, 1009–1032, https://doi.org/10.5194/hess-25-1009-2021, 2021
Short summary
Using a modeling framework applied to 37 years of dissolved oxygen time series data from Lake Mendota, we identified the timing and intensity of thermal energy stored in the lake water column, the lake's resilience to mixing, and surface primary production as the most important drivers of interannual dynamics of low oxygen concentrations at the lake bottom. Due to climate change, we expect an increase in the spatial and temporal extent of low oxygen concentrations in Lake Mendota.
Short summary
LakeBeD-US is a dataset of lake water quality data collected by multiple long-term monitoring programs around the United States. This dataset is designed to foster collaboration between lake scientists and computer scientists to improve predictions of water quality. By offering a way for computer models to be tested against real-world lake data, LakeBeD-US offers opportunities for both sciences to grow and to give new insights into the causes of water quality changes.