the Creative Commons Attribution 4.0 License.
The UWO dataset – long-term observations from a full-scale field laboratory to better understand urban hydrology at small spatio-temporal scales
Abstract. Urban drainage systems are integral infrastructural components. However, monitoring them poses considerable challenges owing to the intricate, hazardous nature of the process, which requires substantial resources and expertise. These inherent uncertainties act as deterrents, discouraging researchers and sewer operators from actively engaging in rigorous monitoring and from using the data to comprehensively understand and efficiently manage drainage-related processes. Consequently, a notable absence of openly available urban drainage datasets hampers the exploration of their potential for engineering applications, scientific analysis, and societal benefits. In this study, we present a distinctive dataset from the Urban Water Observatory (UWO) in Fehraltorf, Switzerland. This dataset is unique in terms of its completeness, consistency, extensive observation period, high spatio-temporal resolution, and availability in the public domain. The dataset comprises coherent information from 124 sensors that observe rainfall-runoff processes, wastewater temperatures, and in-sewer atmosphere temperatures. Of these 124 sensors, 89 transmit their signals via a specifically set-up wireless network using long-range, low-power transmission technologies. Sensor data have a temporal resolution of 1–5 minutes and cover a period of three years, from 2019 to 2021. To make the data interpretable and reusable, we provide systematically collected meta-data, data on sewer infrastructure, and associated geo-information, including a validated hydrodynamic rainfall-runoff model. Basic data quality checks were performed, and we motivate future research on the dataset with five selected research opportunities, ranging from detecting anomalies in the data to assessing groundwater infiltration and the capability of the low-power data transmission.
We conclude that robust automated data quality checks, standardized data exchange formats, and systematic meta-data collection are needed to boost the interpretability and usability of urban drainage data. In the future, ontologies and knowledge graphs should be developed to expand the application of sewer observation data in solving scientific and practical problems.
Status: final response (author comments only)
RC1: 'Comment on essd-2024-47', Agnethe Nedergaard Pedersen, 09 Feb 2025
Dear authors
Thank you for a very well-written and descriptive paper on the UWO dataset. The submission describes a comprehensive three-year measuring campaign of the urban drainage system in a small Swiss town, along with an accompanying SWMM model. The dataset is unique and well described in the manuscript, with thorough supplementary material. The manuscript highlights well-thought-out potential application areas of the dataset. The dataset is accessible, with a link to a data viewer. Translating this viewer would be of high value for future users who cannot read German.
The SQL databases are well structured, and the data are easy to understand.
I have a few comments and corrections for the manuscript:
Maybe a question of taste, but why do you use both footnotes and references?
Figure 2, top: It seems that the top figure is missing.
Figure 2, bottom: It is not very clear which combinations of letters and numbers are the manhole names. I suggest using bold text for the names. The light colors are not clear when printed.
Figure 3, left: The small text is not easy to read. Please make it larger. The throttle says <75 L/s, but Figure 2 says 80 L/s. Which one is correct? (Please also correct the text in Line 143.) Are some of the text labels sensor tag names?
Line 142: What does it mean that the nearby villages are “largely” connected to Fehraltorf? Please specify.
Figure 4, left: The legends are hard to read. Are groundwater levels also part of the dataset? If not, consider removing them from the figure or indicate where to find the data. I cannot see the WWTP in the figure. Is it hidden behind the other symbols?
Line 235ff: Is the maintenance information from the utility company available for the specific measuring period?
Line 375ff: You mention the installation of flow-limiting hardware. Can you clarify where the device is installed or give a reference to the supplementary information? Wouldn’t it affect the measurements?
Line 393: Please clarify what “similar to previous work” means.
Line 444: You mention GWI ranges from 10-15 L/s. But where? At the treatment plant or at all sensor-points?
Line 445ff: How do you know how many manholes the high groundwater table affected? Is it based on traceback assumptions? Please elaborate.
Line 465ff: The example of how redundant sensors can be used to check data quality is great, but the text could be improved. I am not sure the issue is incorrect level-sensor processing; it may rather be erroneous sensor settings. Please clarify.
Figure 9, left: The triangles are really hard to read in print version.
Figure 9, middle and right: Please give the figures appropriate titles.
Lines 488 and 489: Should K (-2 K and 5 K) be replaced with degrees Celsius?
Table 2: There is an asterisk; what does it mean?
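The redundant-sensor check raised for Line 465ff and Figure 9 can be illustrated with a minimal sketch. It assumes two time-aligned series sampled at a fixed interval: a water level compared against a weir crest height, and a binary capacitive overflow signal. The function names, the crest height, and the synthetic values are hypothetical and do not come from the UWO data or code.

```python
from typing import Sequence


def overflow_duration_from_level(levels_m: Sequence[float], crest_m: float, dt_min: float) -> float:
    """Total overflow duration (minutes) inferred from a water-level series.

    Counts every sample whose level exceeds the (hypothetical) weir crest height.
    """
    return sum(dt_min for h in levels_m if h > crest_m)


def overflow_duration_from_switch(wet_flags: Sequence[bool], dt_min: float) -> float:
    """Total overflow duration (minutes) from a binary capacitive sensor signal."""
    return sum(dt_min for wet in wet_flags if wet)


def disagreement_ratio(levels_m: Sequence[float], wet_flags: Sequence[bool], crest_m: float) -> float:
    """Fraction of time steps at which the two redundant sensors disagree."""
    if len(levels_m) != len(wet_flags):
        raise ValueError("both series must share the same time stamps")
    mismatches = sum((h > crest_m) != wet for h, wet in zip(levels_m, wet_flags))
    return mismatches / len(levels_m) if levels_m else 0.0


if __name__ == "__main__":
    levels = [0.40, 0.52, 0.61, 0.58, 0.47, 0.45]  # synthetic water levels, m
    wet = [False, False, True, True, True, False]   # synthetic capacitive signal
    crest = 0.50                                    # hypothetical crest height, m
    print(overflow_duration_from_level(levels, crest, dt_min=1.0))  # 3.0
    print(overflow_duration_from_switch(wet, dt_min=1.0))           # 3.0
    print(disagreement_ratio(levels, wet, crest))
```

In this synthetic example both sensors yield the same total overflow duration, yet they disagree on two of six time steps, so comparing the timing of the signals can reveal discrepancies that equal totals would hide.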
Citation: https://doi.org/10.5194/essd-2024-47-RC1
RC2: 'Comment on essd-2024-47', Anonymous Referee #2, 13 Mar 2025
In this work, the authors present the Urban Water Observatory dataset. The dataset includes data from more than 120 sensors over three years at high temporal resolution, complemented by descriptions of the underlying sewer network. Besides the description of the dataset, the data access, the implemented quality checks, and some preliminary analyses are presented. The UWO dataset is a comprehensive dataset that is highly needed in the field of urban drainage networks and can therefore have a significant impact on future research.
The review invitation mentioned that the provided dataset should be interpretable without reading the manuscript. I therefore reviewed the dataset first and then continued with the manuscript. Please find detailed comments below, which should be addressed in a revision of this work.
*Data base*
- The journal instructions specify that “data are distributed under a non-restrictive licence such as CC BY 4.0 or equivalent”. Currently, “No License Provided” is stated on the webpages. Please include a corresponding licence.
- The provided links lead to two databases covering different time spans: the downloadable SQL databases (2019 to 2021) and the database accessible via the web platform, which contains data up to the present. What is the difference between these two databases besides their time spans, what was the motivation for having two different databases, and which one should be used in future? Additionally, it is unclear to me why the downloadable SQL databases end in 2021; extending them closer to the date of data provision would clearly increase the impact of this work. To further increase the impact, I would also suggest providing scripts to access the web-platform database, which would give access to the newer data.
- All the information and files needed to access the databases are provided. However, they are downloadable from and described on separate webpages, and it takes some effort to understand how they are connected. I therefore suggest either placing them all on one webpage or providing an overview of the individual webpages with key points on their content to improve comprehensibility.
- When running the supplementary SWMM files, I get an error about the missing rain file “r02_mm_utc0_1min_corr.dat”. In addition, the coordinates for rus in faf_rus.inp seem to be wrong.
- Additionally, what do A1 to A4 in the file “data_uwo_sqlite_content_overview.csv” mean? They are described in the manuscript but should also be described on the platform.
- I tested the provided Python script for extracting data from the SQL databases. Please also state whether there are any limitations regarding the Python version and, if so, which Python version is needed to run the scripts. To further improve usability, I suggest including example values as comments in the main function, e.g., “# data_uwo_2019.sql in the folder uwo_data_slice” for the add_argument filename and “#data_uwo_sqlite_content_overview.csv in the folder uwo_data_slice” for the add_argument contentlist.
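The suggestion to document example arguments in the main function could look like the following minimal sketch. It assumes that `filename` and `contentlist` are positional command-line arguments and that a version guard is acceptable; the description text, the `MIN_PYTHON` value, and the printed output are hypothetical, since the actual script and its version requirements are not described here.

```python
import argparse
import sys

# Hypothetical minimum version; the real requirement of the UWO script is not stated.
MIN_PYTHON = (3, 8)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Extract sensor data from a UWO SQL database (illustrative sketch)."
    )
    # e.g. data_uwo_2019.sql in the folder uwo_data_slice
    parser.add_argument("filename", help="path to the SQL database file")
    # e.g. data_uwo_sqlite_content_overview.csv in the folder uwo_data_slice
    parser.add_argument("contentlist", help="path to the content overview CSV")
    return parser


def main(argv=None) -> int:
    # Refuse to run on interpreters older than the (assumed) minimum version.
    if sys.version_info < MIN_PYTHON:
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required", file=sys.stderr)
        return 1
    args = build_parser().parse_args(argv)
    print(f"database: {args.filename}")
    print(f"content list: {args.contentlist}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Running, for example, `python extract.py uwo_data_slice/data_uwo_2019.sql uwo_data_slice/data_uwo_sqlite_content_overview.csv` would print both paths (`extract.py` is a placeholder file name). The version guard only helps on interpreters that can parse the file; the f-strings already require Python 3.6 or later.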
*Manuscript*
In general, the manuscript is well written and understandable. However, the structure and content of the individual sections, and how they differ from each other, are not always comprehensible:
- This is, for example, evident for Section 2 Material and Section 3 Methods, both of which cover parts of the data pipeline introduced in Figure 3 (e.g., the sensors and data collection are described in Section 2, whereas the data warehouse with data quality checks is in Section 3). The content is also distributed unevenly between these two sections (e.g., the utilised sensors are described in much more detail, while the description of the data warehouse is very short and refers to the supplementary material for more information). I therefore recommend combining these two sections into one (e.g., Materials and Methods), shortening the text, and focusing on the information most important to readers (detailed information can be referred to the Supplement, as the authors already do) to improve comprehensibility.
- Since the structure of the manuscript is not outlined in the introduction, I expected it to follow the standard article structure, including a results and discussion section. However, the current version lacks a results and discussion section, which makes the results of this work less comprehensible and, in my view, unfortunately also reduces its impact. From the Introduction, I expected the main result of this work to be the presentation of the UWO dataset together with some first analyses, while from the descriptions in the Methods section, it would be the analysis of the performance of the implemented data quality checks. Moreover, results and discussion appear throughout the manuscript (e.g., the evaluation of LoRaWAN ranges in the Methods). I therefore strongly recommend including a “Results and Discussion” section that focuses on the UWO dataset to better present this great dataset, and extending it with the analyses already performed throughout the manuscript. This should also make the conclusions drawn in Section 7 easier to follow.
- The aim of Section 4 is to present possible research opportunities. However, the descriptions mainly present previous research by the authors, while concrete research opportunities for the scientific community are lacking. In my view, this analysis could be part of the Results and Discussion section suggested in the previous comment. Additionally, I strongly recommend also including concrete research questions/tasks in the manuscript.
- The conclusions drawn by the authors are not always comprehensible, as some topics are discussed there for the first time in the manuscript (e.g., learning effects, …). This may already be solved by the recommended inclusion of a Results and Discussion section; otherwise, the authors could include a subsection with a critical discussion of their lessons learned.
- For easier reference to the individual subfigures, I recommend using labels such as (a), (b), … instead of top/bottom.
- Figure and table captions are quite long and should be shortened. Instead, the figure and table content should be described in more detail in the text.
- As stated in the manuscript composition guidelines, footnotes should be avoided in the text.
- L17: Would ‘barriers’ be a better word than ‘deterrents’?
- L32: From the descriptions, it is unclear to me how the development of ontologies and knowledge graphs can help to extend the application of sewer observation data. Moreover, knowledge graphs are only mentioned in the abstract.
- L38 - 52: In my opinion, the listed challenges require more detailed explanations to be comprehensible without detailed knowledge of drainage networks (e.g., the literature review does not make clear why a better understanding of rainfall-runoff processes is needed).
- L43: Please add information which high-resolution data is needed, e.g., temporal, spatial, or both.
- L50: I would suggest rephrasing “often dubious due to the low data literacy of the sewer workforce” with something such as “is influenced by errors in the data knowledge of employees”.
- L61: The description of how the following factor has “fueled” the demand for open datasets could be improved; e.g., in my view, the advancements in low-power electronics have opened new ways of collecting data rather than increasing the need for open datasets.
- Please specify the main novelty of this work compared to the mentioned projects CAMEL and STREAM, as they seem to have the same aim.
- L87: In my view, the description of the provided data packages belongs in the Methods section rather than in the Introduction. Instead, I would expect an outline of how the manuscript is structured.
- 1 and Fig. 5 are quite redundant and could be combined into one.
- Fig. 2: The top part seems to be missing. I also suggest explaining the meaning of the triangles (instead of in the figure caption) and of the circles in the legend.
- L119: Reference is made to Fig 2., but none of this information is shown there.
- L121 - 126: Instead of the historical weather values, the authors could include the measured values from the dataset. Besides, what do high variability and frequency mean?
- L142 + Fig. 3: Is the information about RUB Morgental needed in the manuscript?
- Figure 4: The legend is hard to read; how are sensors and sensor nodes distinguished? Additionally, I recommend including the names of rivers and special structures to provide spatial information in combination with Fig. 2. Moreover, the right part of the figure belongs to the results rather than to Materials.
- L201: What does “in contrast to the Bellinge dataset” mean?
- L263: Is it correct that operating LPWANs has become straightforward? If so, why did the authors reduce the number of sensors, develop the LoRa-based mesh technology, and recommend a monitoring backbone?
- L330: Could the authors give more details about the quality checks? What was the motivation to focus on a range and a gradient check, what was used as the valid range (e.g., system boundaries, measurement range, calibration range), and which maximum or minimum gradient was used to distinguish between measurement fluctuations of ultrasonic sensors and rainfall events, …?
- L350: Please add details about the types of regular checks: how often were they performed, how are data consistency and homogeneity defined, and which conditions have to be fulfilled to classify data as wrong?
- L354 – 357: This paragraph belongs in the Future Research section, as it discusses future research opportunities.
- L363: Could you please describe which data (period, sensors, …) were used for the calibration?
- L375: Is it correct that the calibrated base model has a different underlying hydraulic structure than the provided dataset? If so, could you please describe the modifications and their expected impact (e.g., for which sensors the simulations should correspond to the measurements and for which sensors large differences are expected)? This would be very important information for future work integrating the hydrodynamic model and measurement data.
- L379: Section 3.4 overlaps somewhat with Section 6 Data availability. To avoid this overlap, I recommend combining these two sections into a joint section.
- Figure 9: The total overflow duration is the same in the middle and right charts, although according to L467 there should be a difference. Based on the results, why is the capacitive sensor signal not used directly to estimate the total overflow duration, and why is the level sensor needed?
- L485: Please include a definition of headspace and bulk liquid temperatures and a description how condensation and evaporation affect the sewer heat transfer processes for a better understanding, as it remains unclear from the descriptions.
- L487: Figure 10 shows only the temperature, and it remains unclear how the authors determine whether evaporation or condensation actually occurs. To my understanding, humidity is an important factor affecting these processes. Additionally, it is unclear which medium evaporates or condenses.
- Figure 10: The y-labels have the unit [°C], while the temperature differences in the text are given in K.
- L518: It should be Section 5 instead of Section 6; the same applies to the following sections.
- Author contributions: Andreas Scheidegger and Uwe Schmitt are mentioned here, but they are not listed in L5.
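The range and gradient check asked about for L330 can be sketched as follows, assuming evenly spaced samples and hypothetical thresholds (`lower`, `upper`, `max_gradient_per_min`). It is a generic illustration of such a check, not the check implemented for the UWO data warehouse.

```python
from typing import List, Sequence


def range_gradient_flags(
    values: Sequence[float],
    dt_min: float,
    lower: float,
    upper: float,
    max_gradient_per_min: float,
) -> List[bool]:
    """Return one flag per sample; True means the sample failed a check.

    Range check: the value lies outside [lower, upper] (NaN also fails).
    Gradient check: |x[i] - x[i-1]| / dt_min exceeds max_gradient_per_min.
    """
    flags = []
    for i, x in enumerate(values):
        out_of_range = not (lower <= x <= upper)
        # The first sample has no predecessor, so only the range check applies.
        too_steep = i > 0 and abs(x - values[i - 1]) / dt_min > max_gradient_per_min
        flags.append(out_of_range or too_steep)
    return flags


if __name__ == "__main__":
    # Synthetic 1-min water levels (m) with one spike; thresholds are hypothetical.
    series = [0.20, 0.21, 1.50, 0.22, 0.23]
    print(range_gradient_flags(series, 1.0, 0.0, 1.0, 0.1))
    # [False, False, True, True, False]
```

With these thresholds, the sample after the spike is also flagged, because its drop back down exceeds the gradient limit. Whether that is desirable, and how to set `max_gradient_per_min` so that genuine rainfall-driven rises are not flagged, is the kind of choice the comment asks the authors to document.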
Citation: https://doi.org/10.5194/essd-2024-47-RC2
Data sets
UWO - Field observations (2019 to 2021) Frank Blumensaat, Simon Bloem, Christian Ebi, Andy Disch, Christian Förster, Max Maurer, Mayra Rodriguez, and Jörg Rieckermann https://doi.org/10.25678/00091Y
UWO - Accompanying data (2019 to 2021) Frank Blumensaat, Simon Bloem, Christian Ebi, Andy Disch, Christian Förster, Max Maurer, Mayra Rodriguez, and Jörg Rieckermann https://doi.org/10.25678/000991
UWO - Data viewer (2019 to 2021) Frank Blumensaat, Simon Bloem, Christian Ebi, Andy Disch, Christian Förster, Max Maurer, Mayra Rodriguez, and Jörg Rieckermann https://doi.org/10.25678/00092Z
Model code and software
UWO - Data access (2019 to 2021) Frank Blumensaat, Simon Bloem, Christian Ebi, Andy Disch, Christian Förster, Max Maurer, Mayra Rodriguez, and Jörg Rieckermann https://doi.org/10.25678/000980