the Creative Commons Attribution 4.0 License.
CAMELS-DE: hydro-meteorological time series and attributes for 1555 catchments in Germany
Abstract. Comprehensive large sample hydrological datasets, particularly the CAMELS datasets (Catchment Attributes and Meteorology for Large-sample Studies), have advanced hydrological research and education in recent years. These datasets integrate extensive hydrometeorological observations with landscape features, such as geology and land use, across numerous catchments within a national framework. They provide harmonised large sample data for various purposes, such as assessing the impacts of climate change or testing hydrological models on a large number of catchments. Furthermore, these datasets are essential for the rapid progress of data-driven models in hydrology in recent years. Despite Germany's extensive hydrometeorological measurement infrastructure, it has lacked a consistent, nationwide hydrological dataset, largely due to its decentralised management across different federal states. This fragmentation has hindered cross-state studies and made the preparation of hydrological data labour-intensive. The introduction of CAMELS-DE represents a step forward in bridging this gap. CAMELS-DE includes 1555 streamflow gauges with hydro-meteorological time series data covering up to 70 years (median length of 46 years and a minimum length of 10 years), from January 1951 to December 2020. It includes consistent catchment boundaries with areas ranging from 5 to 15,000 km2 along with detailed catchment attributes covering soil, land cover, hydrogeologic properties and data about human influences. Furthermore, it includes a regionally trained Long Short-Term Memory (LSTM) network and a locally trained conceptual model that were used as quality control and that can be used to fill gaps in discharge data or act as baseline models for the development and testing of new hydrological models.
Given the large number of catchments, including numerous relatively small ones (617 catchments < 100 km2), and the time series length of up to 70 years (156 catchments), CAMELS-DE is one of the most comprehensive national CAMELS datasets available and offers new opportunities for research, particularly in studying long-term trends, runoff formation in small catchments and in analysing catchments with strong human influences.
Status: closed
RC1: 'Comment on essd-2024-318', Anonymous Referee #1, 26 Aug 2024
CAMELS datasets have been developed in several countries and regions and have become widely utilized in hydrological research. This manuscript introduces a new CAMELS dataset, which includes hydrometeorological time series and attributes for 1,555 catchments in Germany. The manuscript is well-organized, with clear and comprehensive data descriptions. I have a few minor comments for the authors:
- I suggest improving the quality of the figures, as Figure 1 and Figure 5 are difficult to read. Instead of using a continuous color bar, the authors might consider using a discrete or categorized color bar to better distinguish spatial differences in values.
- In line 387, the phrase "In terms of model training" could be more accurately described as "calibration of the lumped hydrological model." The term "calibration" is clearer when referring to hydrological models, rather than "training."
Citation: https://doi.org/10.5194/essd-2024-318-RC1
AC1: 'Reply on RC1', Ralf Loritz, 27 Sep 2024
Dear Reviewer,
Thank you very much for your positive evaluation of our manuscript and data set. Following your suggestions, we have improved the quality of Figures 1 and 5.
Best regards,
Ralf Loritz (on behalf of all authors)
Citation: https://doi.org/10.5194/essd-2024-318-AC1
RC2: 'Comment on essd-2024-318', Juliane Mai, 30 Aug 2024
Review of
“CAMELS-DE: hydro-meteorological time series and attributes for 1555 catchments in Germany”
by Loritz et al. (ESSD)
This manuscript introduces a CAMELS dataset compiled for Germany, including more than 1500 catchments, their attributes, and relevant hydro-meteorological time series.
Congratulations to the authors for compiling this impressive new dataset and for the meticulous documentation provided in the well-structured and well-written manuscript. The dataset represents a significant contribution to the field, and the thoroughness with which it has been derived and documented greatly enhances its usability and impact. The manuscript was a pleasure to read, showcasing the dataset's potential and the care taken in its development. I have a few minor comments; none of them are critical. Overall, this work is a commendable achievement and a valuable resource for researchers, especially those whose research focuses on Germany.
Best regards,
Juliane Mai
Minor:
- L68: “a local conceptual hydrological model” —> at this point it could maybe be named? The LSTM is explicitly named. Also, probably add a citation for LSTM too.
- L102: Maybe finish this paragraph by stating that all these datasets are available through open access (right?) to avoid the need to check all references.
- L115: I think 5 km2 is fine. But if 5x5 km2 of meteorological forcings is the reasoning, shouldn’t the minimum area be set to 25 km2?
- L155: “Compared to other CAMELS datasets, CAMELS-DE includes a large number of relatively small catchments with an area of less than 100 km2 (i.e. 617 catchments).” —> Since you mention other CAMELS datasets, I think it would require mentioning at least 1-2 other datasets and list the number of basins less than 100 km2 they contain.
- L158: “They tend to be higher in regions with minimal topography …” —> That totally makes sense. A map with DEM and markers at each location indicating errors might be nice to support this statement. Just an idea. I leave it up to the authors to include this as panel 2c.
- L172-173: I found this sentence confusing. I would just name the three dataset components as (A) observed hydrologic time series (i.e., station discharge and water levels), (B) observed (even though these are probably interpolated and hence not really observed…) meteorologic time series (i.e., precipitation, temperature, humidity, and radiation), and (C) simulated hydro-meteorologic time series (i.e., discharge simulated by LSTM and conceptual model, and derived PET). It would be easier to follow what you are talking about afterwards.
- L179-195: I am not familiar with the Hyras dataset. Hence, it is difficult to understand whether this dataset is a station dataset or already interpolated. Is the interpolation done by the authors? A clear statement right at the beginning of this paragraph would be much appreciated.
- Line 212: Is HYRAS-DE-TAS the mean daily temperature? Please state this clearly.
- Line 214: “mean, median, and standard deviation of temperature” refers to spatial mean, median, and std-dev, right? I think the paragraphs about the meteorological forcings would benefit from always being clear about whether something is a temporal or a spatial aggregation. Table 3 really helped me to understand what is what because the description is crystal clear there.
- Line 238: “ranges” —> “range”
- Line 250: “not-a-number (NaN)”
- Line 252-256: This part does not fit under the section “Discharge and water levels”. One idea would be to move lines 245 (starting with “The quality control…”) to 256 into a separate section 4.5 called “Quality control”.
- Line 296: “the day on which the” —> “the number of days after which the”?
- Section 5.3: It would be nice to state the spatial resolution of the dataset. Is it only available for 2018? How easy would it be to get the same statistics for a different year (given it is available)?
- Line 339: “Germany” —> “Germany”
- Line 351: I would only use singular in the caption as it is one of each and not multiple, right? Same for line 377.
- Line 361-362: Why did you choose some periods as water years and some as calendar years or a mix of both?
- Line 378: This is the first time the conceptual model is revealed while it is clear all along that the data-driven model is an LSTM. I would either always say data-driven/conceptual or LSTM/SHM throughout. But don’t do LSTM/conceptual… (e.g., lines 40, 68, and 475).
- Line 390: “the LSTM the SHM” —> “the LSTM, the SHM”
- Table 1: Make sure the first row is called “observed” to make the difference from the third entry, which is “simulated”, clearer. In the description I would avoid stating gridded resolutions for the meteorological data. It is confusing. Maybe say, e.g., original resolution of 5x5 km2.
- Figure 3a: I think a slightly thinner line for the catchment boundary would show better which grid cells of the precipitation product were extracted and could be used to support your description of the handling of partially contributing cells in lines 190-192.
- Figure 3b: “Possible gaps in the data are not taken into account in the time series length; in this case, the time series length is the number of years from the first available value to the last available value of a station.” —> Does this mean a station that has one datapoint available in 1950 and nothing else besides one datapoint in 2019 would have a time series length of 70 years? Wouldn’t it be much more helpful to have the x-axis as “available datapoints between 1951 to 2020”? The example above would then be listed with exactly 2 datapoints instead of ~70*365.
- Figure 6: I would slightly rearrange the figure such that all text in the green, blue, and yellow boxes is readable and not hidden behind another box.
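To make the Figure 3b point above concrete, here is a minimal sketch (not taken from the manuscript or the dataset's processing code) contrasting the two ways of measuring record length. The function names `span_years` and `valid_count`, the `(date, value)` tuple layout, and the inclusive calendar-year count are assumptions made purely for illustration.

```python
from datetime import date

def span_years(observations):
    """Calendar years from the first to the last valid value (gaps ignored).

    Assumption: the span is counted inclusively, so 1951..2020 gives 70.
    """
    valid_dates = sorted(d for d, q in observations if q is not None)
    if not valid_dates:
        return 0
    return valid_dates[-1].year - valid_dates[0].year + 1

def valid_count(observations):
    """Number of days that actually carry a discharge value."""
    return sum(1 for _, q in observations if q is not None)

# Hypothetical gauge: one value in 1951, one in 2020, nothing in between.
sparse = [(date(1951, 1, 1), 3.2), (date(2020, 12, 31), 2.8)]

print(span_years(sparse))   # 70
print(valid_count(sparse))  # 2
```

Under these assumptions, the span-based length reports 70 years for a gauge with only two datapoints, whereas the count of valid datapoints reports 2, which is the difference the comment asks the figure to expose.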
Citation: https://doi.org/10.5194/essd-2024-318-RC2
AC2: 'Reply on RC2', Ralf Loritz, 27 Sep 2024
Dear Juliane Mai,
Thank you very much for your positive evaluation of our manuscript and data set. Following your suggestions, we have improved the quality of several figures and enhanced the description of the dataset in the manuscript accordingly.
Thank you again for your review.
Best regards,
Ralf Loritz (on behalf of all authors)
Citation: https://doi.org/10.5194/essd-2024-318-AC2
Data sets
CAMELS-DE: hydrometeorological time series and attributes for 1555 catchments in Germany A. Dolich, E. A. Espinoza, P. Ebeling, B. Guse, J. Götte, S. Hassler, C. Hauffe, J. Kiesel, I. Heidbüchel, M. Mälicke, H. Müller-Thomy, M. Stölzle, L. Tarasova, and R. Loritz https://doi.org/10.5281/zenodo.12733968