This work is distributed under the Creative Commons Attribution 4.0 License.
Reconstructing Nineteenth-Century River Water Levels with Transformer-Based Computer Vision
Abstract. We convert nineteenth-century Bavarian Danube gauge charts (1826–1894) into daily water-level series referenced to gauge zero through a novel semi-automated workflow combining light document pre-processing, dewarping, transformer-based line extraction, pixel-to-curve calibration, and targeted human checks. A curated ground-truth sample supported benchmarking and uncertainty quantification. Across three representative gauges (Neu-Ulm, Vilshofen, Passau), the pipeline attains high series-level accuracy (mean composite score 0.979) while reducing manual effort by roughly an order of magnitude relative to full manual digitisation. Outputs include versioned datasets with page-level provenance, confidence scores, and methodological descriptors to ensure transparency and reuse. The approach offers a replicable template for rescuing analogue hydrometric records and enabling long-term analyses of extremes, regulation impacts, and ecological context. Data are openly available under CC BY 4.0 (Rehbein (2025); DOI: https://doi.org/10.5281/zenodo.17296750).
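The pixel-to-curve calibration step named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function, reference points, and values below are hypothetical, assuming a linear gauge axis fixed by two known reference marks:

```python
# Hypothetical sketch of pixel-to-curve calibration: map pixel rows of an
# extracted chart line to water levels (cm above gauge zero) using two
# known axis reference marks. Not the paper's actual code.

def calibrate(y_pixels, y_ref_px, level_ref_cm):
    """Linearly map pixel rows to water levels via two axis reference points."""
    (y0, y1), (l0, l1) = y_ref_px, level_ref_cm
    scale = (l1 - l0) / (y1 - y0)  # cm per pixel (negative if y grows downward)
    return [l0 + (y - y0) * scale for y in y_pixels]

# Example: hypothetical axis marks at pixel row 1200 (0 cm) and row 200 (500 cm)
levels = calibrate([700, 450], y_ref_px=(1200, 200), level_ref_cm=(0, 500))
# levels -> [250.0, 375.0]
```

In practice, more than two reference marks per page would allow a least-squares fit and a residual-based confidence score of the kind the abstract mentions.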
Status: open (until 16 Feb 2026)
RC1: 'Comment on essd-2025-633', Anonymous Referee #1, 09 Jan 2026
AC1: 'Reply on RC1', Malte Rehbein, 17 Jan 2026
Thank you very much for your valuable and competent comments. I appreciate the time you invested in helping to improve this research.
1. Could you provide technical information about the image quality required for the proposed DARE methodology? Is there a minimum resolution (DPI) required? I only found some generalities on this matter. For instance, in lines 238-239 you can read: “Typical workflows comprise high-resolution scanning”.
The State Archives from which I obtained the images provide scans with a minimum resolution of 300 ppi (measured against the original), which is a standard in cultural heritage digitisation, as defined for instance by the German Research Foundation (DFG) and many others. I will add this information to the manuscript. However, this is not to say that this resolution is a requirement for the workflow to be effective; I would place more importance on a flattened surface and uniform, adequate lighting.

2. In connection to the previous comment, I wonder if you worked (or plan to work) with photographs rather than scans. I'm familiar with rescuing meteorological data, where it is advised to photograph large amounts of documents because this imaging procedure is many times faster than scanning (see Section 2.4.2 of Wilkinson, 2019). You developed an efficient way of obtaining machine-readable data from graphs, and it would also make sense to accompany it with an efficient imaging strategy. Although this imaging issue does not matter much if all the Danube water level charts were already scanned at high resolution.
You are right; I was too sloppy with the word "scanning." To my knowledge, the archive typically uses a system with a one-shot DSLR camera mounted on what we call a "Reprostativ" (repro stand) with external lighting for this kind of material. In my own lab, we additionally use scanners with feeders for smaller formats, also for historical material, which is a huge step forward in efficiency. However, the State Archive would not allow the use of such a system due to conservation concerns (which I do not share). We also experimented with a scan tent (with a mounted smartphone camera) and with simple smartphone camera shots. Due to the size of the material, neither delivered sufficient quality for the processing, with the achieved resolution being a lesser problem than inconsistent lighting and warping.
Overall, I suggest rephrasing l. 237 to: "Typical workflows comprise creating high-quality images from the originals (normally digital photographs at 300 ppi resolution as standard, with a flattened surface, orthogonal alignment, and adequate uniform lighting)."
I do not see a comprehensive imaging strategy within the scope of this paper; there are too many parameters to discuss systematically, including questions of material, size, cost-benefit, and restrictions laid down by the owner of the materials. But I may add a reference to the strategies typically employed in cultural heritage digitisation projects.
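As context for the 300 ppi standard discussed above, the effective resolution of a capture relative to the original can be verified from the image's pixel width and the original's physical size. A minimal sketch with hypothetical dimensions (not values from the paper):

```python
# Hypothetical check: does a capture meet the 300 ppi cultural-heritage
# digitisation standard, measured against the original's physical size?

def effective_ppi(image_px, original_mm):
    """Pixels per inch relative to the original (1 inch = 25.4 mm)."""
    return image_px / (original_mm / 25.4)

# Example: a 7000 px wide image of a 500 mm wide chart
ppi = effective_ppi(7000, 500)   # 355.6 ppi
meets_standard = ppi >= 300      # True
```

This kind of check matters for camera captures, where the achieved ppi depends on working distance rather than a fixed scanner setting.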
3. In the second paragraph of Section 3.1 you can also mention the methods developed to extract subdaily data from strip charts of meteorological instruments such as thermographs and barographs (e.g., Sušin and Peer, 2018).
Thank you very much for this. I will do some more research here and add the references.
4. In section 3.2.1 entitled “Americas” you can cite recent works that rescued long Paraná River hydrometric records, which start in 1875. Indeed Antico et al. (2018) manually digitized daily water levels from a hand-drawn chart similar to the one shown in the upper left panel of Fig. 5 of the revised manuscript. More recently Antico et al. (2020) found the tabulated version of these data and compared tabulated values with those digitized from the chart.
Thanks a lot also for these references, which I had missed; I will add them. Tabulated data and its alignment with the charts is also on my to-do list for further work, so it is good to have a nearby reference now. Tabulated historical data is currently widely discussed thanks to the progress made in automated reading by multimodal LLMs.
5. Did you consider using documentary sources (e.g., newspapers) or metadata provided by the charts to correct time misalignments of positive and negative peaks (floods and drought)? This could be a useful correction, as knowing the exact date of these peaks is important for many studies.
6. Similarly, documentary sources may inform the exact river levels attained during these peaks. That is, these sources may serve to correct these levels.

This is a great suggestion! I would like to think about it, but as a follow-up project. Thanks to modern OCR, LLMs, and progress in natural language processing, obtaining such information from newspapers at large scale appears feasible. However, at least in Germany, the state of digitisation of local newspapers from the 19th century is not what it could and should be (incomplete, partly of low quality, decentralised, and difficult to access). I will look into Passau (gauge 30), which is my home town, with easy access to the municipal archive, but I would rather leave this for a follow-up.
Thank you also for your minor comments which I will implement!
Malte
Citation: https://doi.org/10.5194/essd-2025-633-AC1
Data sets
Bavarian Danube Water Level Reconstruction (1826–1894). Creator: Malte Rehbein. https://doi.org/10.5281/ZENODO.17296750
Viewed

| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 282 | 170 | 21 | 473 | 21 | 16 |