Articles | Volume 14, issue 12
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.
WaterBench-Iowa: a large-scale benchmark dataset for data-driven streamflow forecasting
Department of Civil and Environmental Engineering, University of Iowa, Iowa City, 52246 Iowa, USA
Department of Electrical and Computer Engineering, University of Iowa, Iowa City, 52246 Iowa, USA
Interdisciplinary Graduate Program in Informatics, University of Iowa, Iowa City, 52246 Iowa, USA
We provide a large benchmark dataset, WaterBench-Iowa, with valuable features for hydrological modeling. The dataset is designed to support deep learning studies aimed at more accurate streamflow forecasting. We also propose a modeling task for comparative model studies and provide sample models, with code and results, as a reference benchmark. This helps address the lack of benchmarks in Earth science research.