the Creative Commons Attribution 4.0 License.
GRDC-Caravan: extending Caravan with data from the Global Runoff Data Centre
Abstract. Large-sample datasets are essential in hydrological science to support modelling studies and advance process understanding. However, these datasets often lack standardization, which impedes their combination. Caravan is a community initiative to create a large-sample hydrology dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Compared to existing large-sample hydrology datasets, the focus of Caravan is to use globally consistent forcing and attribute data to facilitate global studies. Caravan is a community project designed to be expanded by members of the hydrological community using a common cloud-based framework. This dataset is currently the 6th extension to Caravan, based on a subset of hydrological discharge data and station-based watersheds from the Global Runoff Data Centre (GRDC), which are covered by an open data policy. The GRDC is an international data centre operating under the auspices of the World Meteorological Organization (WMO), which collects quality-controlled river discharge data and associated metadata from the National Meteorological and Hydrological Services (NMHS) of WMO Member States. The dataset covers stations from 5,356 catchments and 25 countries, and spans the years 1950–2023. This takes the total number of Caravan catchments (core dataset plus extensions) to 22,372 (1589 catchments accounting for duplicates). This extension is released under a CC-BY-4.0 license that allows redistribution and is publicly available on Zenodo: https://zenodo.org/records/14006282 (Färber et al., 2024). We encourage additional NMHS to make their data available under open licenses so that it can be included in future versions of the extension.
Status: open (until 27 Dec 2024)
RC1: 'Comment on essd-2024-427', Thiago Nascimento, 21 Nov 2024
Referee report for: GRDC-Caravan: Extending Caravan with Data from the Global Runoff Data Centre
General Assessment
This manuscript introduces the GRDC-Caravan dataset, which extends the Caravan hydrology dataset by incorporating discharge data from the Global Runoff Data Centre (GRDC). By expanding spatial and temporal coverage to over 5,000 stations in 25 countries, this work clearly makes an important contribution to large-sample hydrology (LSH). The dataset is particularly valuable for underrepresented regions, providing a solid foundation for global hydrological research while aligning with open science principles. Personally, I am impressed by both the valuable work of the GRDC as an institution and the initiative of the team to contribute further with this Caravan extension. I am sure that this will provide a powerful new dataset for the hydrology community. While the manuscript is undeniably significant and deserves publication, there are a few minor areas that need improvement to meet the standards of Earth System Science Data (ESSD). Addressing these issues will help improve clarity, structure, and reproducibility. With these adjustments, I believe that the paper will be ready for the journal.
Suggestions for improvement
- Manuscript Structure:
The manuscript lacks a clear distinction between methods and results, making it difficult to follow.
- Please consider splitting the "Dataset Description" section into two separate sections: "Methodology" and "Dataset Description."
- Figures and Captions
I have the impression that while the figures are visually informative, they lack context and detailed descriptions.
- Improve captions to explain the figures' relevance to the dataset's purpose.
- Provide more discussion of the figures in the main text to guide readers.
- Comparisons with Existing Datasets
I feel that the paper presents a limited discussion of how GRDC-Caravan compares with other LSH datasets.
- Add a detailed overview of LSH releases. Where did it start? MOPEX? CAMELS? Which are the main datasets currently available? Which countries are covered? How did Caravan come into existence? What about other datasets and similar initiatives? GSIM? What is the state of LSH datasets at the end of 2024?
- Include a comparative analysis highlighting GRDC-Caravan’s unique contributions (e.g., coverage in underrepresented regions, temporal/spatial resolution, data quality). Remember that you should sell the idea of why GRDC-Caravan matters.
- Data Records and Repository Description
I feel that there is still insufficient detail about the organization of the Zenodo repository and dataset files.
- Describe the repository structure, including folders and file organization. Of course, a further description of the data included, and units is already provided by Caravan, but for users, it is always beneficial to know what to expect after the download.
- L245: Specify the version number of the dataset being referred to on Zenodo. Pay attention to always place the specific version being described in the paper, not the general DOI for all versions.
- Reproducibility and Code Availability
The authors state that the GRDCFlowTools R package is only available upon request, limiting reproducibility.
- If possible, publish the package on a public repository (e.g., GitHub or CRAN).
- Include usage examples to facilitate adoption and reproducibility.
- Duplicates
While the authors acknowledge that duplicates exist, the duplicated catchments are not explicitly labeled in the dataset.
- If possible, consider adding a column that identifies duplicated catchments in the dataset.
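As an illustration of the kind of flag suggested above, the following is a minimal sketch only, not the authors' procedure. It assumes that a gauge counts as a duplicate when it lies within a distance threshold of a gauge already in Caravan and the two catchment areas agree within a relative tolerance. The field names (`gauge_id`, `lat`, `lon`, `area`), the output columns (`is_duplicate`, `duplicate_of`), the thresholds, and the sample records are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def flag_duplicates(new_gauges, existing_gauges, max_dist_km=1.0, max_area_diff=0.05):
    """Return copies of new_gauges with hypothetical duplicate columns added.

    A gauge is flagged when an existing gauge is closer than max_dist_km
    and the relative difference in catchment area is at most max_area_diff.
    Both thresholds are assumptions chosen for illustration.
    """
    flagged = []
    for gauge in new_gauges:
        match = None
        for other in existing_gauges:
            dist = haversine_km(gauge["lat"], gauge["lon"], other["lat"], other["lon"])
            rel_diff = abs(gauge["area"] - other["area"]) / max(gauge["area"], other["area"])
            if dist <= max_dist_km and rel_diff <= max_area_diff:
                match = other["gauge_id"]
                break
        flagged.append({**gauge, "is_duplicate": match is not None, "duplicate_of": match})
    return flagged


# Made-up records, for illustration only.
new = [
    {"gauge_id": "grdc_A", "lat": 50.000, "lon": 8.000, "area": 100.0},
    {"gauge_id": "grdc_B", "lat": 10.000, "lon": 20.000, "area": 500.0},
]
existing = [
    {"gauge_id": "other_X", "lat": 50.001, "lon": 8.001, "area": 102.0},
]
result = flag_duplicates(new, existing)
```

A column of this kind would let users drop or keep duplicates with a single filter, whatever matching rule the authors actually applied.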
- Dataset Analysis and Validation
This paper already gives a clear description of the dataset being published. But to be fair to other similar dataset papers published in ESSD, it lacks an overview and preliminary analysis of the dataset, which could also act as a technical validation of what is being published. While I see that the GRDC makes available data which has been checked by the NMHS, I think that future users could benefit from a general overview, which could also act as a validation. Personally, I do not see the need to perform any modelling analysis with the data as previous CAMELS papers have done.
- Include plots showing spatial distributions of key streamflow signatures and climatic indices (e.g., Q mean, BFI, aridity).
- Try to include a creative plot showing an overview of this new discharge data for users. Why not some small boxplots of the Q mean across the GEnS climatic zones? Something inspired by Figure 4 of Marvin Höge et al. 2024 in CAMELS_CH? Take this as a suggestion (inspiration), not a requirement.
- Validate catchment boundaries with visual aids or references to prior successful applications of MERIT Hydro and delineator.py.
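The signatures mentioned above (Q mean, BFI) can be computed from a daily discharge series with very little code. The following is a minimal sketch under stated assumptions: the series is gap-free and daily, and the baseflow index uses a single forward pass of a Lyne-Hollick-type recursive filter with parameter 0.925, which is a simplification of the multi-pass variants common in the literature. The function names and the sample series are hypothetical and are not taken from the manuscript or from Caravan.

```python
def q_mean(discharge):
    """Mean daily discharge, in the same units as the input series."""
    return sum(discharge) / len(discharge)


def baseflow_index(discharge, alpha=0.925):
    """Baseflow index from a single forward pass of a recursive digital filter.

    Quickflow is filtered recursively and clipped at zero, baseflow is the
    remainder, and BFI is total baseflow divided by total discharge.
    """
    quick_prev = 0.0
    baseflow_total = discharge[0]  # first step: all flow treated as baseflow
    for t in range(1, len(discharge)):
        quick = alpha * quick_prev + 0.5 * (1 + alpha) * (discharge[t] - discharge[t - 1])
        quick = max(quick, 0.0)  # quickflow cannot be negative
        baseflow = min(discharge[t] - quick, discharge[t])  # baseflow cannot exceed flow
        baseflow_total += baseflow
        quick_prev = quick
    return baseflow_total / sum(discharge)


# Made-up daily series with one flood peak, for illustration only.
flow = [1.0, 1.0, 5.0, 3.0, 2.0, 1.0, 1.0]
mean_flow = q_mean(flow)
bfi = baseflow_index(flow)

# A constant series has no quickflow, so all flow is baseflow.
bfi_constant = baseflow_index([2.0] * 10)
```

Per-catchment values computed this way could then be grouped by climatic zone to produce the maps or boxplots suggested in the list above.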
- Outlook Section
The outlook could be more concise and impactful.
- Consider placing more emphasis on the relevance of GRDC-Caravan for global hydrology. Describe the need for new Caravan extensions and the importance of open data policies from NMHS.
- Streamline the section to improve rhythm and flow.
Specific points:
General:
Some sentences are long and complex, making them hard to follow. Please check for such cases when revising the manuscript. For example:
L 171: Consider rephrasing "After the calculation of all potential pour points was completed, a rating score R was calculated..." to: "Once all potential pour points were calculated, a rating score (R) was assigned based on the following formula."
Introduction
L33–34: Please consider rephrasing "is the primary objective of the principal reason" to something along the lines of "The primary objective of the Global Runoff Data Centre (GRDC) is to (…)."
Dataset description
L115: Acknowledge that Figure 3 adapts elements from the original Caravan publication. Add a note in the caption, e.g., "This figure is adapted from the original Caravan publication (Kratzert et al., 2023)."
Citation: https://doi.org/10.5194/essd-2024-427-RC1
Data sets
GRDC-Caravan: extending the original dataset with data from the Global Runoff Data Centre Claudia Färber, Henning Plessow, Frederik Kratzert, Nans Addor, Guy Shalev, and Ulrich Looser https://zenodo.org/records/14006282
Viewed
- HTML: 145
- PDF: 26
- XML: 6
- Total: 177
- BibTeX: 4
- EndNote: 4