the Creative Commons Attribution 4.0 License.
A 225-Year (1799–2024) Homogenized Daily Water Level Series of the Vistula River in Warsaw
Abstract. We present a 225-year (1799–2024) homogenized daily water level series for the Vistula River in Warsaw, comprising 82,453 observations. The construction of this consistent dataset required adjustments for changes in gauge location, shifts in gauge zero, differences in historical measurement units, and calendar discrepancies between the Julian and Gregorian systems. A small number of missing observations were reconstructed using stage–stage relationships established between overlapping periods of observation at the Warsaw gauge and parallel measurements from downstream stations along the Vistula. The resulting dataset offers a robust foundation for long-term hydrological, climatic, and socio-environmental research. The dataset is openly available in the Zenodo repository: https://doi.org/10.5281/zenodo.16919654 (Sobechowicz et al., 2025).
Status: final response (author comments only)
-
RC1: 'Comment on essd-2025-538', Anonymous Referee #1, 18 Jan 2026
-
AC1: 'Reply on RC1', Łukasz Sobechowicz, 11 Mar 2026
We would like to thank the Reviewer for their thorough reading of our manuscript and for providing such constructive feedback. These insightful comments have been immensely helpful in improving the quality of our work.
Specific comments:
- A simple linear regression based on the neighbouring gauges has been used to fill the data. They have also selected specific highs and lows in the data for this purpose. The authors have observed a lag of up to 4 days for the waters to travel to the neighbouring gauges. Apart from this, the authors also use the maximum, minimum, and onset points of the flood data to fill in data that do not represent these conditions. In short, I find the gross assumption of an average time lag and the use of specific points overly simplistic, given the nonlinearity of the hydrodynamics of flows of varying intensity. More evidence of these kinds of assumptions would be useful for the readers to understand the reliability of the filled data. For instance, using cross-correlation between the time series of different gauges to show the time lag.
Authors' Response:
We thank the Reviewer for this insightful and methodologically significant comment. We fully agree that the hydrodynamics of the Vistula River are inherently non-linear and that wave propagation velocity varies depending on discharge intensity. To address the concern regarding the reliability of our lag assumptions, we have conducted an extensive empirical verification using the Cross-Correlation Function (CCF). This analysis was performed on three high-resolution historical datasets to ensure that the chosen lags are not merely "simplistic assumptions" but represent the robust physical characteristics of the river reaches in a daily temporal resolution.
- Methodological Approach: First-Order Differencing
To ensure statistical rigor, all CCF calculations were performed on first-order differenced data. We deliberately avoided using raw water level values because such series are highly non-stationary and exhibit strong autocorrelation. In our preliminary tests, raw data consistently yielded a maximum correlation at Lag 0, which is a known artifact of spurious correlation in hydrological time series (where shared seasonal trends mask the actual wave propagation). By applying differencing, we isolated the short-term hydrological signals (impulses). This methodological rigor allowed us to identify the true physical travel time of the flood wave with high statistical confidence.
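The differencing step described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `lag_correlations`, the maximum lag of 7 days, and the synthetic series are all assumptions made for illustration. The synthetic "upstream" series is a random walk (non-stationary, like raw stages), and the "downstream" series repeats it 4 days later with added noise.

```python
import numpy as np

def lag_correlations(upstream, downstream, max_lag=7):
    """Correlate first-differenced daily series at lags 0..max_lag.

    Lag k pairs the upstream change on day t with the downstream
    change on day t + k. Pairs containing NaN are skipped.
    """
    du = np.diff(np.asarray(upstream, dtype=float))
    dd = np.diff(np.asarray(downstream, dtype=float))
    corrs = {}
    for k in range(max_lag + 1):
        a = du[: len(du) - k]   # upstream increments
        b = dd[k:]              # downstream increments, k days later
        ok = ~(np.isnan(a) | np.isnan(b))
        corrs[k] = float(np.corrcoef(a[ok], b[ok])[0, 1])
    return corrs

# Synthetic demonstration (hypothetical data, not the Vistula records).
rng = np.random.default_rng(42)
true_lag = 4
base = np.cumsum(rng.normal(size=3000 + true_lag))
upstream = base[true_lag:]                                   # leads by 4 days
downstream = base[:-true_lag] + rng.normal(scale=0.1, size=3000)

corrs = lag_correlations(upstream, downstream)
best_lag = max(corrs, key=corrs.get)
print(best_lag, round(corrs[best_lag], 3))
```

On the differenced series the correlation is concentrated at the true travel time, whereas raw random-walk series would correlate strongly at many lags.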
- Empirical Evidence for Travel Time Lags
The analysis was conducted across extensive datasets to capture different hydrological regimes:
- Warsaw – Cypel Mątowski (Reach 1): Analyzed for 1806 and 1818–1828 (approx. 4,400 daily pairs). The CCF reached its maximum at Lag 4, confirming our initial assumption. Monthly heatmap analysis shows that in periods of high activity (e.g., August), the correlation coefficient peaks at exactly 4 days.
- Warsaw – Toruń (Reach 2): To verify this reach, we analyzed two distinct periods:
- Early Period (1817–1830): Approx. 4,900 daily pairs.
- Later Period (1908–1924): Over 6,200 daily pairs.
In both cases, despite the century-long gap and potential changes in the riverbed, the strongest statistical correlation was consistently observed at Lag 2.
- Addressing Hydrodynamic Non-linearity
The Reviewer correctly pointed out the non-linearity of flows. Our monthly heatmap analysis indeed captures this phenomenon:
- During spring freshets or high-flow months (e.g., March), the correlation is sometimes distributed between two or three days.
- However, Lag 2 (for Toruń) and Lag 4 (for Cypel Mątowski) remain the modal values, the most statistically frequent and strongest signals across the entire dataset.
In a model operating on a daily resolution, adopting these modal lags is the most reliable approach for data reconstruction. The stability of these signals across more than 15,000 analyzed days (total) demonstrates that these average values are robust empirical reflections of the river's behavior during the study period.
The suggested revisions will be included in the revised version of the manuscript.
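As a rough illustration of how a fixed modal lag enters a linear stage–stage reconstruction, the sketch below fits a straight line between an upstream target gauge and a downstream reference gauge read `lag` days later, then fills gaps in the target. The function name, the synthetic data, and the coefficients are hypothetical. The linear models in the manuscript may be constructed differently, for example from selected characteristic points.

```python
import numpy as np

def fill_with_lagged_regression(target, reference, lag):
    """Fill NaNs in `target` from `reference` observed `lag` days later."""
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Align the series: reference on day t + lag is paired with target on day t.
    shifted = np.full_like(target, np.nan)
    shifted[: len(target) - lag] = reference[lag:]
    both = ~np.isnan(target) & ~np.isnan(shifted)
    slope, intercept = np.polyfit(shifted[both], target[both], 1)
    filled = target.copy()
    gaps = np.isnan(target) & ~np.isnan(shifted)
    filled[gaps] = slope * shifted[gaps] + intercept
    return filled, slope, intercept

# Synthetic demonstration (hypothetical stages in cm).
rng = np.random.default_rng(1)
lag = 2
truth = 200 + np.cumsum(rng.normal(size=500))
reference = np.full(500, np.nan)
reference[lag:] = 0.5 * truth[:-lag] + 30.0   # downstream responds 2 days later
target = truth.copy()
target[100:110] = np.nan                      # artificial 10-day gap

filled, slope, intercept = fill_with_lagged_regression(target, reference, lag)
```

Because the synthetic relationship is exactly linear, the gap is recovered exactly. With real stages the residual scatter of the fit would indicate the reconstruction uncertainty.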
- The authors could have used robust techniques like Long Short-Term Memory (LSTM) neural networks for data filling, as they can capture non-linear temporal patterns while selectively retaining information relevant to the current output in long time series. The authors need to present the existing variations in time lag across different gauge locations and discuss the associated uncertainties. Please refer to the work done by Ren et al. (2022) in this regard.
Authors' Response:
We would like to thank the Reviewer for suggesting the use of LSTM neural networks. In response, we have incorporated this robust technique into our workflow. Rather than simply replacing the initial linear models, we have chosen to provide both LM and LSTM reconstructions in the final dataset. This enables us to discuss the associated uncertainties more effectively and demonstrates the leap in accuracy achieved by using neural networks for capturing non-linear temporal patterns, as suggested by the Reviewer. We have incorporated the recommended reference (Ren et al., 2022) to support this methodological shift.
1. Model Architecture and Training Protocol
To ensure maximum reliability and avoid the pitfalls of stochastic initialization, we implemented a rigorous training framework:
- Three Specialized Models: Based on data availability and gauge locations, we developed three distinct LSTM models:
- Warsaw–Cypel Mątowski (1799–1828): Reconstructing gaps for 1800–1816.
- Warsaw–Toruń (1817–1830): Reconstructing gaps for 1817.
- Warsaw–Toruń (1908–1924): Reconstructing gaps for 1914–1918.
- Hyperparameter Optimization: We conducted a comprehensive Grid Search to identify optimal configurations for window size (7, 15, 31 days), LSTM units, dropout rates, and batch sizes. The final models were selected based on the lowest Mean Squared Error (MSE) on the validation set.
- Overfitting Prevention: Models were trained for 50 epochs with Early Stopping and Gradient Clipping (clipvalue=1.0) to ensure stability and generalizability.
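The window sizes in the grid search (7, 15, 31 days) determine how the daily series is cut into LSTM input sequences. The sketch below shows one common way to build such sequences with NumPy. The helper name `make_windows`, the choice to align each target with the last day of its window, and the random input are assumptions and are not taken from the authors' code.

```python
import numpy as np

def make_windows(features, target, window):
    """Build (samples, window, n_features) inputs and aligned targets.

    Sample i covers days i .. i + window - 1; its target is the
    value of `target` on the last day of that window.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = len(features) - window + 1
    X = np.stack([features[i : i + window] for i in range(n_samples)])
    y = target[window - 1 :]
    return X, y

# Hypothetical input: 100 days, 3 features per day.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))
target = rng.normal(size=100)

shapes = {}
for window in (7, 15, 31):          # window sizes named in the response
    X, y = make_windows(features, target, window)
    shapes[window] = (X.shape, y.shape)
    print(window, X.shape, y.shape)
```

Longer windows give the model more context per sample but reduce the number of samples available, which is one reason the window size is worth tuning.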
2. Validation methodology
Standard k-fold cross-validation is unsuitable for autocorrelated hydrological time series because it violates the independence assumption. Random partitioning causes 'data leakage', where future observations inform past predictions, resulting in artificially inflated performance metrics. Instead, we employed a Chronological Split (Training/Validation/Testing sets). This ensures that the testing data remain entirely "unseen" by the model during training, providing a true assessment of its predictive performance on continuous historical gaps.
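A chronological split can be sketched in a few lines. The 70/15/15 proportions and the function name below are assumptions chosen for illustration, since the response does not state the fractions used.

```python
def chronological_split(n, train_frac=0.70, val_frac=0.15):
    """Return slices for train, validation and test sets in time order."""
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n)

# Hypothetical series of 1000 consecutive days.
train, val, test = chronological_split(1000)
print(train, val, test)
```

Every training index precedes every validation index, which in turn precedes every test index, so no future observation can leak into an earlier partition.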
3. Feature Engineering and Seasonality
To enhance the models' understanding of river dynamics, we utilized a multi-feature input vector:
- Hydrological Inputs: Raw water levels and daily increments from the reference stations.
- Cyclical Encoding: To account for the strong seasonality of the Vistula (spring snowmelts vs. summer droughts), we applied trigonometric transformations (Sine/Cosine) to the temporal data. This allows the LSTM to process the time of year as a continuous, circular feature, which is crucial for hydrological accuracy.
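The cyclical encoding can be illustrated as follows. This is a minimal sketch using day of year as the temporal variable. Whether the authors encoded day of year, month, or another unit is not stated, so that choice and the function name are assumptions.

```python
import calendar
import math
from datetime import date

def encode_day_of_year(d):
    """Map a date to (sin, cos) on the annual circle."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    day_index = d.timetuple().tm_yday - 1      # 0 on 1 January
    angle = 2 * math.pi * day_index / days_in_year
    return math.sin(angle), math.cos(angle)

jan_1 = encode_day_of_year(date(1851, 1, 1))
dec_31 = encode_day_of_year(date(1850, 12, 31))
jul_1 = encode_day_of_year(date(1850, 7, 1))

# Distance on the circle: 31 December is next to 1 January, far from 1 July.
print(math.dist(dec_31, jan_1), math.dist(jul_1, jan_1))
```

With a plain day number, 31 December (365) and 1 January (1) would look maximally different to the model. The sine/cosine pair keeps them adjacent.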
The suggested revisions will be included in the final version of the manuscript.
- The validation of the proposed linear-regression-based data filling is incomplete. A robust k-fold cross-validation is needed to assess the accuracy of the proposed data-filling methods. The data needs to be split iteratively into training and validation sets. Importantly, the error needs to be reported using RMSE, NSE, KGE, etc.
Authors' Response:
We have evaluated both the original Linear Models (LM) presented in the article and our newly developed LSTM architectures. Detailed results are summarized in the table below. The analysis indicates that the LM models exhibit significant inconsistency, with performance varying substantially across different years. In contrast, the LSTM models demonstrated superior accuracy and higher stability, consistently yielding lower error margins. Consequently, the LSTM-based reconstruction results will be integrated into our database, and all corresponding updates will be incorporated into the revised manuscript.
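For reference, the three error measures requested by the Reviewer can be computed as in the sketch below. The formulas are the standard definitions of RMSE, the Nash–Sutcliffe efficiency, and the Kling–Gupta efficiency in its original 2009 form. The function names and the sample values are hypothetical and do not reproduce the authors' results.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean squared error, in the units of the data."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability and bias ratios."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    return float(1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

# Hypothetical stages in cm: a perfect and a biased reconstruction.
obs = np.array([100.0, 120.0, 150.0, 130.0, 110.0])
biased = obs + 10.0
print(rmse(obs, obs), nse(obs, obs), kge(obs, obs))
print(rmse(obs, biased), nse(obs, biased), kge(obs, biased))
```

A constant offset leaves the correlation and variability terms of KGE untouched and is penalised only through the bias ratio, which is why reporting several metrics together is informative.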
- The authors themselves say that the bed level varied extensively due to gauge relocation and anthropogenic activities. They have taken these into consideration and modified the zero level. They need to comment on the reliability and accuracy of these adjustments.
Authors' Response:
The corrections were based on documented gauge relocations and archival leveling records. The reliability of these corrections is particularly high for data recorded after 1834, as that was when the first professional leveling and connection to the height reference system were established. However, data from before 1834 may contain errors due to the reconstruction of the gauge following winter damage, which lacked a reference to a known benchmark. The precision of subsequent zero-level adjustments to the gauge depended on the accuracy of the geodetic instruments available at the time, and the associated error is not expected to exceed 1 cm.
- Even high-resolution remote-sensing-based digital elevation models may not accurately represent riverbed topography. In addition, extensive cross-sectional surveys are needed to simulate the correct water levels that reflect the corresponding conditions. Therefore, the discharge time series is often highly useful to the hydrological community for validating hydraulic/hydrological models or for climate research. Can the authors provide a discharge time series or comment along these lines?
Authors' Response:
Hydrological data on daily discharges of the Vistula River at the Warsaw gauging station are collected and made available by the IMGW - Institute of Meteorology and Water Management in digital form for the period from 1 November 1950 to 30 October 2024. They are available in Open Access format at: https://danepubliczne.imgw.pl/.
Reconstructing a discharge time series for the Vistula River in Warsaw for the period before 1950 is well justified but extremely difficult; it is an interesting challenge that we would like to take on in the future. This work will require a lengthy search for information on historical hydrometric measurements (channel cross-sections, flow velocities) for the Vistula River held in various archives in Poland, Germany and Russia. Due to the very complicated history of Poland, these measurements have been widely dispersed and, at this stage, their storage locations for the 19th century have not been identified. Russian archives in particular are currently inaccessible to us. To sum up, in the future we will attempt to reconstruct and make available a series of historical discharge values for the Vistula River in Warsaw.
The minor comments are as follows,
- Line 68, “km XXX of the Vistula River”, is difficult to understand. Please change similar lines in the manuscript.
- Line 127, what do the authors mean by “km 421 + 600”? Please modify similar lines to provide greater clarity to readers.
Authors' Response:
The location refers to the official river chainage (river kilometer system) used in Poland, measured along the river course upstream from the river mouth. We have changed the indicated sections to improve readability.
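To make the notation concrete, the sketch below converts a chainage string such as "km 421+600" to decimal kilometres. It assumes the usual engineering convention that the number after "+" is a distance in metres added to the whole kilometre. The helper name is hypothetical.

```python
def chainage_to_km(text):
    """Convert 'km 421+600' (kilometres + metres) to decimal kilometres."""
    km_part, metre_part = text.replace("km", "").split("+")
    return int(km_part) + int(metre_part) / 1000

decimal_km = chainage_to_km("km 421+600")
print(decimal_km)
```

Under that assumption, "km 421+600" denotes a point 421.6 km upstream of the river mouth, and the variant with spaces ("km 421 + 600") parses to the same value.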
- In Table 4, what does the last column indicate?
Authors' Response:
The last column contains information about the number of observation pairs used to construct each LM.
- Line 290, please use a clearer term than “early measurements”.
Authors' Response:
We have changed the indicated sections to improve readability.
- Lines 389 -417: the summary need not include the uses of the dataset.
Authors' Response:
When writing the summary, we followed the guidelines for authors. One of the elements was to indicate the potential use of the dataset.
-
RC2: 'Comment on essd-2025-538', Anonymous Referee #2, 30 Mar 2026
Long-term quality-assured data sets are not only a valuable asset but even indispensable for climate change research. Processing and quality assurance of existing heterogeneous historical data is a cumbersome and time-consuming task. Thus, the data set compiled by Sobechowicz et al. including a detailed protocol of data processing steps and the underlying sources is a valuable contribution. My criticism concerns only minor details.
- Please refer to the figures in the text using either “Fig.” or “Figure” consistently.
- 57: Please add the information that river kilometre marking starts at the mouth because it is done the other way round for many other river systems.
- 125 – l. 134 and Table 1: Better replace “river km 421+600” etc. by standard decimal values.
- 149-150: Are dredging and sand extraction the only reasons for enhanced riverbed erosion and the subsequent decrease of river stage since the 1940s? Was it done to improve navigability? The data suggest that this process has been ongoing until today - is that true? Can you give recommendations on how to consider this long-term shift in subsequent data analyses or in hydrological models?
- 149: Do you mean “old readings” or “new readings” which “would be approximately 200 cm higher”?
- 151, l. 156: Not every reader might know the Kronstadt 60 (East European Reference System) and EVRF2007 (European Vertical Reference System) references. Please add a short explanation.
- Providing a long-term time series of river stage is a merit on its own and is highly appreciated. I fully understand that converting stage data to discharge is not trivial and likely will not be possible for historical data at all. Nevertheless, I would suggest including references to available Vistula discharge data for more recent periods at least.
- 290: I highly appreciate the accuracy assessments using Unique Daily Values (UDV) and Days Without Change (DWC)!
- 297-298, Figure 4: The selection of periods for the Welch’s t-test (1800-1830, and 2000-2023) seems to be highly arbitrary. Rather, I would suggest comparing the pre- and post-1915 period for UDV (see identified breakpoints) and performing a standard trend analysis for DWC.
- 299: I guess “M” denotes the mean, and “SD” the standard deviation, is that right?
- 311-314: Incomplete phrase, please correct.
- Figure 4A and Figure 5: What is the difference between these figures other than using either single dots or lines, and indication of the regression line or of the break points? I would suggest merging them. In addition, I urge you to delete the regression line – it is not adequate given the clear stepwise increase around 1915.
- Figure 2 and 7 are not referred to in the text.
- Section “4 Overview of the dataset” should rather be titled “Summary”. In contrast, Section “6 Summary” should precede the Summary under a new title, e.g., “Suggested use” or the like, but not without a warning with respect to the long-term trend since the 1940s, and maybe with some suggestions on how to deal with it in data analysis or modelling studies.
Citation: https://doi.org/10.5194/essd-2025-538-RC2 -
AC2: 'Reply on RC2', Łukasz Sobechowicz, 21 May 2026
Publisher’s note: the supplement to this comment was edited on 22 May 2026. The adjustments were minor without effect on the scientific meaning.
We are grateful to the reviewer for the thorough and constructive reading of the manuscript. Here, we address in particular the problem of channel-bed erosion of the Vistula River in Warsaw, as well as previous attempts to reconstruct discharges for historical water-level records.
Increasingly, attempts have been made to reconstruct characteristic discharge values for the Vistula River in Warsaw in the first half of the twentieth century, based on historical water levels and archival hydrometric measurements conducted along the Warsaw reach of the Vistula in 1919–1929 (e.g., Jankowski & Stolarska, 1978; Bogdanowicz et al., 2000; Wierzbicki, 2001; Fal & Dąbrowski, 2001b; Magnuszewski et al., 2012). In addition, attempts have also been made to reconstruct, through so-called retro-modelling, the maximum discharges of the Vistula during the extreme floods of 1813, 1844, and 1884 (e.g., Kuźniar, 1997; Fal & Dąbrowski, 2001a; Wierzbicki, 2001; Kuźniar, 2008; Kuźniar & Magnuszewski, 2010; Magnuszewski et al., 2012; Magnuszewski & Moran, 2015). Examples of the results of such studies for maximum flood events in the nineteenth century and the first half of the twentieth century are presented in the supplement.
In Polish hydrological literature, the phenomenon of channel-bed lowering along the Warsaw reach of the Vistula was noted relatively early. As early as 1924–1932, the average rate of this process was 6.5 cm per year (Pomianowski, 1938), while in 1950–1959 a total bed lowering of 40 cm was recorded (Zielińska, 1960). Since the 1940s, a weakening of the relationship between low water levels and the corresponding discharges has also been observed. Although mean low flows have remained at a similar level over the last century, water levels have been steadily decreasing. This indicates that the cause of this trend is not a reduction in river supply, but rather the deepening of the river channel within the urban area (Fal & Dąbrowski, 2001b). For a low-flow discharge of Q = 150 m³/s, the corresponding water level in Warsaw decreased by 205 cm between 1919 and 2010 (Magnuszewski et al., 2012).
Hydrologists identify three main factors responsible for the acceleration of this process. The first was the regulation of the Vistula channel within the urban reach, carried out from 1885 onwards, together with the construction of flood embankments that formed the so-called “Warsaw corset”. As a result of flow concentration at a discharge of approximately 670 m³/s, the mean flow velocity in the cross-section increased from 0.36 sazhen per second in October 1887 to 0.48 sazhen per second in October 1895, i.e. from 0.77 m/s to 1.02 m/s, while the active cross-section of the channel was reduced by one quarter (Szymański, 1897). At the same time, the spacing between the embankments decreased from about 1,500 m to 470–480 m, which reduced the area available for the passage of major floodwaters to only 20–30% of the former floodplain (Magnuszewski et al., 2012). The second important factor was the further narrowing of the channel during the reconstruction of Warsaw after the Second World War. At that time, the river channel and riverbed, especially the spaces between the former training groynes, were used as dumping sites for rubble, which led to a reduction in flood-flow cross-sections (Gutry-Korycka, 2010). Today, these areas are largely covered by riparian vegetation. The third factor identified as contributing to the acceleration of channel incision was the extraction of aggregate from the Vistula riverbed. This was noted by Z. Kornacki (1960), who considered it one of the main causes of channel deepening. While changes at gauge cross-sections located far upstream and downstream of Warsaw are minor, a clear lowering of the riverbed has been observed within the Warsaw reach and in its immediate vicinity. The highest rates of change were recorded at the following cross-sections: Warszawa–Nadwilanówka, with an average of 5–6 cm per year; Żerań, 4–6 cm per year; and Warsaw, approximately 2.5 cm per year (Fal & Dąbrowski, 2001b). 
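The velocity figures quoted from Szymański (1897) can be checked with a short unit conversion. The sketch assumes the standard Russian sazhen of 7 English feet (2.1336 m). The constant and function names are illustrative.

```python
SAZHEN_IN_METRES = 2.1336  # 1 sazhen = 7 English feet (assumed standard value)

def sazhen_per_s_to_m_per_s(velocity):
    """Convert a velocity from sazhen per second to metres per second."""
    return velocity * SAZHEN_IN_METRES

v_1887 = round(sazhen_per_s_to_m_per_s(0.36), 2)   # October 1887
v_1895 = round(sazhen_per_s_to_m_per_s(0.48), 2)   # October 1895
relative_increase = 0.48 / 0.36 - 1                # about one third

print(v_1887, v_1895, round(relative_increase, 2))
```

The converted values agree with the 0.77 m/s and 1.02 m/s given in the text, and the velocity increase of about one third is of the order expected for a constant discharge passing through a cross-section reduced by one quarter.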
At the turn of the twentieth and twenty-first centuries, eight companies were extracting aggregate from the Vistula riverbed along the Warsaw reach. In 1997, the total volume extracted amounted to approximately 1.5 million m³, mainly sand and gravel, which substantially exceeded the amount of material transported by the river in the near-bed zone, estimated at about 500,000 m³, or around 870,000 tonnes per year (Jacewicz, 2000; Biernacki, 2000). In addition, 3,000 m³ of boulders were removed from the river channel in 1989–1991 to maintain a navigable depth of 1.2–1.5 m along the urban reach of the Vistula (Kowalski et al., 2013).
References:
Biernacki Z., 2000, Geomorfologia i wody powierzchniowe, [w:] Wisła w Warszawie (red. J. Lickiewicz, J. Pawlak, W. Pietrusiewicz), Biuro Zarządu m. st. Warszawy. Wydział Planowania Przestrzennego i Architektury, Warszawa, s. 22-70.
Bogdanowicz E., Fal B., Dobrzyńska I., 2000, Charakterystyki hydrologiczne, [w:] Wisła w Warszawie (red. J. Lickiewicz, J. Pawlak, W. Pietrusiewicz), Biuro Zarządu m. st. Warszawy. Wydział Planowania Przestrzennego i Architektury, Warszawa, s. 9-13.
Fal B., Dąbrowski P., 2001a, Dwieście lat obserwacji i pomiarów hydrologicznych Wisły w Warszawie. Część I Obserwacje stanów wody, [w:] Gospodarka Wodna, nr 11, s. 461-467.
Fal B., Dąbrowski P., 2001b, Dwieście lat obserwacji i pomiarów hydrologicznych Wisły w Warszawie. Część II Przepływy Wisły w Warszawie, [w:] Gospodarka Wodna, nr 12, s. 503-510.
Gutry-Korycka M., 2010, Katastrofalne powodzie Wisły poniżej Warszawy w zarysie historycznym, [w:] Hydrologia w ochronie i kształtowaniu środowiska (red. A. Magnuszewski), t. 2, Polska Akademia Nauk. Komitet Inżynierii Środowiska, Monografie, 69, s. 93-108.
Jacewicz A., 2000, Ocena i propozycja zabudowy hydrotechnicznej koryta Wisły, [w:] Wisła w Warszawie (red. J. Lickiewicz, J. Pawlak, W. Pietrusiewicz), Biuro Zarządu m. st. Warszawy. Wydział Planowania Przestrzennego i Architektury, Warszawa, s. 154-171.
Jankowski W., Stolarska A., 1978, Ocena możliwości wystąpienia tysiącletniej fali powodziowej w rejonie Warszawy, Materiały Badawcze. Seria: Hydrologia i Oceanologia, IMGW, Warszawa.
Kornacki Z., 1960, Przyczyny obniżania się dna Wisły w Warszawie, [w:] Gospodarka Wodna, nr 7, s. 305-307.
Kowalski H., Kuźniar P., Magnuszewski A., 2013, Najniższe stany wody Wisły w Warszawie i podwodne odkrycia archeologiczne, [w:] Gospodarka Wodna, nr 1, s. 25-30.
Kuźniar P., 1997, Woda 500-letnia w Warszawie w świetle materiałów historycznych i symulacji komputerowych, [w:] Forum Naukowo-Techniczne - Powódź 1997, IMGW, Warszawa, s. 143-155.
Kuźniar P., Magnuszewski A., 2010, Przepływ wód wielkich Wisły w Warszawie - rekonstrukcja powodzi historycznych, [w:] Hydrologia w ochronie i kształtowaniu środowiska (red. A. Magnuszewski), t. 2, Polska Akademia Nauk. Komitet Inżynierii Środowiska, Monografie, 69, s. 109-118.
Magnuszewski A., Gutry-Korycka M., Mikulski Z., 2012, Historyczne i współczesne warunki przepływu wód wielkich Wisły w Warszawie. Część I, [w:] Gospodarka Wodna, nr 1, s. 9-18.
Magnuszewski A., Moran S., 2015, Vistula River bed erosion processes and their influence on Warsaw’s flood safety, [w:] Proceedings of IAHS, 367, s. 147-154.
Pomianowski K., 1938, W sprawie jazu kanalizacyjnego na Wiśle pod Bielanami w Warszawie, [w:] Gospodarka Wodna, nr 4, s. 179-183.
Siebauer S., 1947, Charakterystyczne stany wody i objętości przepływu w przekrojach wodowskazowych rzeki Wisły, [w:] Wiadomości Służby Hydrologiczno-Meteorologicznej, 1, s. 24-36.
Szymański E., 1897, Roboty regulacyjne na rzece Wiśle pod Warszawą od r. 1885 do 1895, według pracy inż. L. Kwicińskiego, [w:] Przegląd Techniczny, 35, s. 11-17.
Wierzbicki J., 2001, Stałość pionowego układu koryta Wisły oraz położenia zwierciadła wód małych i wielkich na odcinku miejskim w Warszawie, [w:] Gospodarka Wodna, nr 4, s. 143-149.
Zielińska M., 1960, Zmiany niwelety dna Wisły w Warszawie na tle zmian profilu podłużnego Środkowej Wisły, [w:] Gospodarka Wodna, nr 11, s. 477-480.
Data sets
Daily Water Levels of the Vistula River at Warsaw, 1799–2024: A Complete and Homogenized Long-Term Record Ł. Sobechowicz et al. https://doi.org/10.5281/zenodo.16919654
The authors have done an impressive job of creating a 225-year dataset of daily water levels in Warsaw. Such a long data record is crucial for climate research, hydrological modelling, and flood risk management. They have exhaustively included data from publications and yearbooks (sometimes in different units and from different locations) prior to 1981. Nevertheless, there are some serious technical concerns that need to be addressed before publication. They are as follows,
Reference
Ren, H., Cromwell, E., Kravitz, B., and Chen, X.: Technical note: Using long short-term memory models to fill data gaps in hydrological monitoring networks, Hydrol. Earth Syst. Sci., 26, 1727–1743, https://doi.org/10.5194/hess-26-1727-2022, 2022.