the Creative Commons Attribution 4.0 License.
The use of GRDC gauging stations for calibrating large-scale hydrological models
Mikhail Smilovic
Abstract. The Global Runoff Data Centre provides time series of observed discharges that are valuable for calibrating and validating the results of hydrological models. We address a common issue in large-scale hydrology that has been investigated several times but not satisfactorily solved. To compare simulated and observed discharge, grid-based hydrological models must fit reported station locations to the resolution-dependent gridded river network. We introduce an Intersection over Union ratio approach to select station locations on a coarser grid scale, reducing the errors in assigning stations to the correct upstream basin. We update the 10-year-old database of watershed boundaries with additional stations based on a high-resolution (3 arc seconds) river network and provide source codes and high- and low-resolution watershed boundaries. The dataset is stored on Zenodo with the associated DOI https://doi.org/10.5281/zenodo.6906577.
Peter Burek and Mikhail Smilovic
Status: closed
-
RC1: 'Comment on essd-2022-231', Anonymous Referee #1, 05 Oct 2022
General comments
This manuscript describes new location data of GRDC stations for calibrating large-scale hydrological models. The calibration of these models relies on the accuracy with which stations are allocated on gridded river networks of different spatial resolutions. The data is developed based on the ‘intersection over union ratio approach’, which quantifies the similarity of watershed shapes between low- and high-resolution gridded watersheds. The new dataset's accuracy was reasonably good compared to the previous versions; thus, the dataset is of interest to the hydrological community and is worth publishing. Although the estimated precision is promising, I think the manuscript does not contain an adequate description for a data paper. Given that the location of GRDC stations is widely used, essential information in hydrology and earth system science studies, I think the manuscript is worth publishing in ESSD after corrections to a few ambiguous parts.
Specific comments:
(1) L 89. Is “Upstream area error” used in the subsequent manuscript?
(2) L93-94. The unit of the distance was not described here. The values of OC calculated with the described equation differed depending on the unit of the distance used. Because the values of (1-'area accordance') are always less than one, the OC values are predominantly determined by the distance if the unit is in kilometers or meters (usually larger than 1). At least, the second term of the equation should be normalized to the range from zero to one. Also, the authors should explain how the weighting of the second term, 2, was determined.
(3) L 141-143: The choice of the weighting factor for calculating ED should be explained.
(4) L 144: Figure 1 (and Figure 2 as well) is not an appropriate example for describing the automatic upscaling process based on the similarity of shape, because it seems that the 'area accordance' alone suffices for the selection of station 7. I would suggest the authors select a more appropriate example. I think Figure 7 better explains the problem of mismatch in the upscaled stations and how it is solved with the proposed procedure. Upon revising, please consider the necessity of Figure 2, as it was not referred to in the manuscript.
Technical corrections:
(1) L18-21: I think this part is not relevant to the main context. Suggest deleting.
(2) L 92: Delete “)”.
(3) L 133: The number of cells for each resolution is not consistent with those described in line 303. I think this line should be corrected.
(4) Figure 3 was not referred to in the manuscript.
Citation: https://doi.org/10.5194/essd-2022-231-RC1
AC1: 'Reply on RC1', Peter Burek, 22 Mar 2023
Dear reviewer,
Thank you for reviewing our paper, for your constructive comments and for your attention to detail. We appreciate your voluntary effort and have revised the manuscript according to your comments. The comments have been addressed as follows:
Specific comments:
Q: (1) L 89. Is “Upstream area error” used in the subsequent manuscript?
Reply: We used both upstream area error and upstream accordance in the paper, which might lead to confusion. We removed the term “Upstream area error” and replaced it with upstream area accordance. We also removed the term similarity index and replaced it with “Intersection over Union ratio”. Both indices range from 0 (low) to 1 (high).
Q: (2) L93-94. The unit of the distance was not described here. The values of OC calculated with the described equation differed depending on the unit of the distance used. Because the values of (1-'area accordance') are always less than one, the OC values are predominantly determined by the distance if the unit is in kilometers or meters (usually larger than 1). At least, the second term of the equation should be normalized to the range from zero to one. Also, the authors should explain how the weighting of the second term, 2, was determined.
Reply: Thank you for drawing attention to this. We followed the approach of Lehner (2012), but we did not describe his method properly. The two terms are normalized to fit together, and the equations and the weighting are also taken from Lehner (2012). We now point this out.
We changed the description in L90f:
- A rectangular search radius of 165 arcsec (~5 km) was defined for each station.
- For each grid cell in this rectangle, the upstream drainage area (UPA) from the network of Yamazaki et al. (2019) was compared to the area reported in the GRDC, and the upstream area accordance was computed:
Upstream area accordance = GRDC reported UPA / gridded network UPA
(where: GRDC reported UPA < gridded network UPA)
Upstream area accordance = gridded network UPA / GRDC reported UPA
(where: GRDC reported UPA ≥ gridded network UPA)
- All cells with an upstream area accordance of less than 50% were dismissed from further evaluation.
- A first ranking scheme – area discrepancy (RA) – was calculated with values from 0 (best fit) to 50: RA = 100 - Upstream area accordance [%]
- For the second ranking scheme – distance (RD) – the distance of the cell to the reported station location in the GRDC database was calculated and normalized to give the value 0 at the station location and 50 at a distance of 5 km.
- An objective criterion (OC) for ranking was computed as OC = RA + 2 * RD. The equation and weighting were taken from Lehner (2012).
- The grid cell with the lowest OC value was taken as the corresponding grid cell for the station location on the high-resolution network.
- If no station location was found in this step, the search radius was increased to 5’ (~10 km), OC was calculated as OC = RA + RD, and the grid cell with the lowest OC value was taken as the corresponding grid cell.
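The ranking steps listed above can be sketched in Python. This is a minimal illustration and not the authors' published code: the function names are hypothetical, and the linear scaling of distance to the range 0 to 50 is an assumption, since the reply only states that RD is 0 at the station and 50 at 5 km.

```python
def upstream_area_accordance(reported_upa, network_upa):
    """Ratio of the smaller to the larger upstream area, in ]0, 1]."""
    if reported_upa <= 0 or network_upa <= 0:
        raise ValueError("upstream areas must be positive")
    return min(reported_upa, network_upa) / max(reported_upa, network_upa)


def objective_criterion(reported_upa, network_upa, distance_km,
                        max_distance_km=5.0, distance_weight=2.0):
    """OC = RA + distance_weight * RD, or None if the cell is dismissed."""
    accordance_pct = 100.0 * upstream_area_accordance(reported_upa, network_upa)
    if accordance_pct < 50.0:
        return None  # cells below 50 % accordance are not evaluated further
    ra = 100.0 - accordance_pct                  # 0 (best fit) to 50
    rd = 50.0 * distance_km / max_distance_km    # assumed linear: 0 at station, 50 at 5 km
    return ra + distance_weight * rd


def best_cell(reported_upa, candidates, **kwargs):
    """candidates: iterable of (cell_id, network_upa_km2, distance_km)."""
    scored = []
    for cell_id, network_upa, distance_km in candidates:
        oc = objective_criterion(reported_upa, network_upa, distance_km, **kwargs)
        if oc is not None:
            scored.append((oc, cell_id))
    if not scored:
        return None  # caller would then widen the search window
    return min(scored, key=lambda item: item[0])[1]


# Hypothetical station with a reported UPA of 1000 km2 and three candidate cells.
example = [("a", 950.0, 1.0), ("b", 1000.0, 2.0), ("c", 400.0, 0.0)]
print(best_cell(1000.0, example))  # 'a': OC is about 25, versus 40 for 'b'; 'c' is dismissed
```

Passing distance_weight=1.0 gives the OC = RA + RD form mentioned for the widened search. How RD is normalized in that wider window is not stated in the reply, so max_distance_km would have to be chosen to match.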
Q: (3) L 141-143: The choice of the weighting factor for calculating ED should be explained.
Reply: The objective criterion upstream area accordance has a range of ]0,1], and the Intersection over Union ratio criterion has a range of [0,1]. Therefore we decided to give a weighting factor of 1 to each. We added the text:
"Both objective criteria have a range between 0 and 1. Therefore, we decided to use a weighting factor of 1 for both criteria."
Q: (4) L 144: Figure 1 (and Figure 2 as well) is not an appropriate example for describing the automatic upscaling process based on the similarity of shape, because it seems that the 'area accordance' alone suffices for the selection of station 7. I would suggest the authors select a more appropriate example. I think Figure 7 better explains the problem of mismatch in the upscaled stations and how it is solved with the proposed procedure. Upon revising, please consider the necessity of Figure 2, as it was not referred to in the manuscript.
Reply: Yes, we agree that for Passau/Inn the area accordance would be enough, but it is a good example showing that a) the cell containing the station is not the most appropriate one and b) moving one cell away from the station yields very different basins. We think it is suitable for illustrating the method. In the results part (Figure 7) we show two examples where the Intersection over Union ratio really matters.
We changed figure 2 and added an explanation of fig 1 and 2 in the text:
“Figure 1 illustrates this method for low resolution 5’ and for cell location No. 7, which is one 5’ cell south of the cell where the station “Passau/Inn” is located (see the zoom in the upper left part of figure 1). Even if this cell is not representing the cell where the station is located, this cell fits the upstream area accordance and the Intersection over Union ratio best of all 25 cells around the station location.”
“Figure 2 shows four examples out of the 25 cell locations around station “Passau/Inn”. Figure 2a uses the cell where the station is located. This cell represents not only the Inn but the Danube and the Inn basins together. Figure 2b includes only a small tributary of the Inn and figure 2c contains only the Danube basin but not the Inn basin. Figure 2d shows the best location (one grid cell south of the grid cell with the station – same as in figure 1).”
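The selection among the 25 candidate cells described in these replies can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the manuscript: basins are represented as sets of cell indices on a common grid, all cells are treated as having equal area, and the two criteria are combined as a weighted sum with both weights equal to 1. The replies state the equal weights but not the exact combining formula, and all names here are hypothetical.

```python
def intersection_over_union(basin_a, basin_b):
    """IoU of two basins given as collections of cell indices on one grid."""
    a, b = set(basin_a), set(basin_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pick_coarse_cell(candidates, weight_area=1.0, weight_iou=1.0):
    """candidates: dict of cell_id -> (upstream_area_accordance, iou), both in 0..1.

    Returns the cell id with the highest weighted sum (assumed combination).
    """
    def combined(cell_id):
        accordance, iou = candidates[cell_id]
        return weight_area * accordance + weight_iou * iou
    return max(candidates, key=combined)


# Two hypothetical candidates: one matches the area well but drains a
# different basin, the other matches both area and shape.
candidates = {
    "neighbouring_basin": (0.98, 0.05),
    "matching_basin": (0.90, 0.85),
}
print(pick_coarse_cell(candidates))  # matching_basin
```

With the area accordance alone, the first candidate would win. Adding the shape term is what separates a cell with a similar area in a neighbouring basin from the cell that actually drains the station's basin.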
Technical corrections:
Q: (1) L18-21: I think this part is not relevant to the main context. Suggest deleting.
Reply: Thanks for the correction. We think you are right. The lines have been deleted.
Q: (2) L 92: Delete “)”. Reply: Done
Q: (3) L 133: The number of cells for each resolution is not consistent with those described in line 303. I think this line should be corrected.
Reply: We corrected the numbers in L133: “(e.g., ≥ 9,000 km2 for 30’ (~3 cells), ≥ 1,000 km2 for 5’ (~12 cells)).” and in L303 - (~12 grid cells on 5’).
Thanks for pointing this out. At the equator, a 5’ grid cell has an area of 85.8 km2, and a 30’ grid cell an area of 3087.6 km2.
Q: (4) Figure 3 was not referred to in the manuscript.
Reply: We put in a description:
“Figure 3 shows the global distribution of GRDC stations (status: March 2022) with a high concentration of stations in North America and Europe and a lower and more clustered distribution in Africa and Asia.”
Citation: https://doi.org/10.5194/essd-2022-231-AC1
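The grid-cell areas and cell counts quoted in the reply to technical correction (3) can be checked with a few lines of arithmetic. The sketch below assumes square cells at the equator and uses the length of 3’’ at the equator (92.61 m) that the authors give elsewhere in this discussion; the function name is hypothetical.

```python
# Length of one arc second at the equator, from the 3'' = 92.61 m value
# quoted by the authors in their reply to Referee #2.
METRES_PER_ARCSEC = 92.61 / 3.0


def cell_area_km2(resolution_arcsec):
    """Area of a square grid cell at the equator for a given resolution."""
    side_km = resolution_arcsec * METRES_PER_ARCSEC / 1000.0
    return side_km ** 2


area_5min = cell_area_km2(5 * 60)     # 5 arc minutes
area_30min = cell_area_km2(30 * 60)   # 30 arc minutes

print(round(area_5min, 1))    # 85.8
print(round(area_30min, 1))   # 3087.6

# Minimum upstream areas expressed in grid cells.
print(round(1000.0 / area_5min))    # 12
print(round(9000.0 / area_30min))   # 3
```

Away from the equator the cells become narrower in the east-west direction, so the same area thresholds correspond to more cells at higher latitudes.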
-
RC2: 'Comment on essd-2022-231', Anonymous Referee #2, 22 Feb 2023
The manuscript describes the procedure used to generate a dataset of station locations of observed discharge to be used at different resolutions for calibrating large-scale hydrological models. The authors update the 10-year-old database of GRDC watershed boundaries and provide source codes and high- and low-resolution watershed boundaries. The manuscript is interesting and the results are useful for scientific purposes. However, before it can be considered for publication, the authors need to undertake a thorough revision process to better explain the objectives of the manuscript and the steps needed to achieve them. In addition, figures and tables must be clearly explained in the text. Symbols and acronyms should be used consistently.
General remarks:
- Authors should describe the dataset of river GRDC discharge data in the introduction. GRDC is not only a river discharge time series dataset but it also contains information on hydrometric station location, upstream basin area… In this way the readers can better follow the manuscript and in particular it is easy to understand the meaning of “reported upstream area” (e.g., point e in lines 57-59 and line 71).
- The authors should better describe the objective of the manuscript, which is not only to revise and correct the shapefiles of the GRDC stations, but also to provide a Python code to easily select stations for calibration/validation of LSM models.
- The Methods section should be revised. The authors should clearly state the objectives of the study and the steps needed to achieve them. Some sentences should be added before line 75 to introduce paragraph 2.3 and its sub-paragraphs. For example, when reading line 161, it is not immediately clear what the authors mean by “For the next selection step…”.
- L 84. The last part of the sentence “or with no upstream area record” is misleading. If I understand correctly, this part should be deleted as in the paragraph 2.1.1 only stations with an upstream drainage area are considered.
- L 120. Please be consistent with the notation. According to L. 44, “30 arc minutes resolution”, should be recalled here as 30’. Please modify the sentence.
- L 132. Please be consistent with the notation. According to L. 87, the upstream area was abbreviated as “UPA”. The sentence on lines 132-133 can be changed as “We defined a minimum UPA for the station we wanted to use in the low-resolution hydrological model (e.g., UPA ≥ 9,000 km2 for 30’ (~180 cells), UPA ≥ 1,000 km2 for 5’ (~100 cells)).”
- L 134-135. This sentence is hard to understand. Please, rephrase it.
- L 139. Please define the range of variability for the Intersection over Union ratio.
- L 143. What does “𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦” stand for? Does it refer to the Intersection over Union ratio?
- L 144. Please explain Figure 1 clearly.
- Table 2 should be better explained in the text.
- L 201 vs L.107. How many stations do not have a catchment shapefile? 228 as stated in line 201 or 352 as is written in line 107? Please check.
- L 224. Should be Table 3 instead of Table 1.
- L 225. What does “distance median” mean? Please add details.
- L 272. Should be Lokoja station not Lokojo.
- L 316 should be Figure 7b. The caption for Figure 7 should clearly describe both the a) and b) panels.
- Figure 2 is not described in the manuscript. Please explain the figure or delete it.
- Figure 3 is not described in the manuscript. Please explain the figure or delete it. This figure could be moved to section 2.
Minor corrections:
- L 52. A parenthesis is missing at the end of the sentence.
- L 53. Please, define the ISIMIP acronym.
- L 67. Please delete the parenthesis before the “3 arcseconds”. Please be consistent with the conversion of 3” to metres. Here it is indicated that “3”~93m” whereas in Line 76 it is written “3 arc seconds (~100 m)”.
- L 67. Please, enter the corresponding metres to 15 arc seconds.
- L 70. Please, enter the corresponding metres to 5 arc minutes.
- L 92. Please delete the parenthesis at the end of the sentence.
Citation: https://doi.org/10.5194/essd-2022-231-RC2
AC2: 'Reply on RC2', Peter Burek, 22 Mar 2023
Thank you for your detailed review of the manuscript and the constructive comments. Thanks to your effort, we were able to improve the paper. The comments have been addressed as follows:
General remarks:
- Authors should describe the dataset of river GRDC discharge data in the introduction. GRDC is not only a river discharge time series dataset but it also contains information on hydrometric station location, upstream basin area… In this way the readers can better follow the manuscript and in particular it is easy to understand the meaning of “reported upstream area” (e.g., point e in lines 57-59 and line 71).
Reply: We added in the abstract: “The Global Runoff Data Centre provides time series of observed discharges and information on hydrometric stations that are valuable for calibrating and validating the results of hydrological models.”
We added in the introduction: “The GRDC database of river discharge comes with information about the stations from the data providers, like the location of the station, name of the station and the river, upstream area, elevation, mean discharge, and more. Especially the location and the upstream area are very important to compare model results from hydrological models with station discharge data.”
- The authors should better describe the objective of the manuscript, which is not only to revise and correct the shapefiles of the GRDC stations, but also to provide a Python code to easily select stations for calibration/validation of LSM models.
Reply: We added in the abstract: “we provide source codes and high- and low-resolution watershed boundaries to easily select stations for calibration/validation of hydrological models.”
We added in the introduction: “The objective of this paper is to provide a Python code to easily select stations for calibration/validation of hydrological models by addressing these possible errors and giving examples of how to correct them.”
- The Methods section should be revised. The authors should clearly state the objectives of the study and the steps needed to achieve them. Some sentences should be added before line 75 to introduce paragraph 2.3 and its sub-paragraphs. For example, when reading line 161, it is not immediately clear what the authors mean by “For the next selection step…”.
Reply: We added some lines before L75 to describe the stepwise approach of the methods:
“The methods can be split up into three main groups, each group building upon the results of the previous one. The first method describes allocating a station location from the GRDC database to fit best on a high-resolution network. This method reproduces the approach from Lehner (2012). The second method describes how to upscale the station location from a high-resolution network to a low-resolution network used in standard land-surface hydrological routing models by comparing upstream area and similarity of the station upstream areas in high and low resolution. The third method describes how to select the most appropriate stations for calibrating hydrological models, depending on the metadata of the stations and the chosen model grid resolution.”
- L 84. The last part of the sentence “or with no upstream area record” is misleading. If I understand correctly, this part should be deleted as in the paragraph 2.1.1 only stations with an upstream drainage area are considered.
Reply: We added some numbers in L84f and added “reported upstream area”.
“For the evaluation, we used all stations with a reported upstream area greater than or equal to 10 km2 (124 stations have an upstream area smaller than 10 km2) or with no reported upstream area record (327 have no upstream area record in the GRDC dataset).”
We kept the stations with no reported upstream area in the GRDC dataset because most of them could be clearly identified by location and most of them (201 stations) are in Africa and Asia, which are underrepresented anyway.
- L 120. Please be consistent with the notation. According to L. 44, “30 arc minutes resolution”, should be recalled here as 30’. Please modify the sentence.
Reply: We now use 30’ or 5’ instead of arc minutes from L44 on. The same applies to 3 arc seconds: we use 3’’ after introducing 3’’ in L 72.
- L 132. Please be consistent with the notation. According to L. 87, the upstream area was abbreviated as “UPA”. The sentence on lines 132-133 can be changed as “We defined a minimum UPA for the station we wanted to use in the low-resolution hydrological model (e.g., UPA ≥ 9,000 km2 for 30’ (~180 cells), UPA ≥ 1,000 km2 for 5’ (~100 cells)).”
Reply: We followed your advice and used UPA from L 87 onwards, in lines 132-133 but also everywhere else in the text and figures.
- L 134-135. This sentence is hard to understand. Please, rephrase it.
Reply: To find the grid cell on the coarse resolution network which fits best to the upstream area and shape of the high-resolution network, we calculated two objective criteria for all coarse grid cells with a distance <= 2 coarse cell distance (altogether 25 grid cells) to the location of the station on the high-resolution network.
- L 139. Please define the range of variability for the Intersection over Union ratio.
Reply: It is [0,1]. We added to the text in L 162f: “The Intersection over Union ratio can have a value between [0,1]. The closer the Intersection over Union ratio is to 1, the more similar the shapes are.” We also added the range of the upstream area accordance in L 154: “The upstream area accordance can have a value between ]0,1], with 1 meaning that the GRDC and coarse areas are the same.” (We write ]0,1] because 0 is outside the interval.)
- L 143. What does “𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦” stand for? Does it refer to the Intersection over Union ratio?
Reply: Yes, similarity = Intersection over Union ratio. We replaced similarity with Intersection over Union ratio in the text and figures.
- L 144. Please explain Figure 1 clearly.
Reply: We added explanation for figure 1
“Figure 1 illustrates this method for low resolution 5’ and for cell location No. 7, which is one 5’ cell south of the cell where the station “Passau/Inn” is located (see the zoom in the upper left part of figure 1). Even if this cell is not representing the cell where the station is located, this cell fits the upstream area accordance and the Intersection over Union ratio best of all 25 cells around the station location.”
- Table 2 should be better explained in the text.
Reply: We explained table 2: “If a station had an Intersection over Union ratio or upstream area accordance higher than 80%, it got one scoring point for every 2%. Stations earn scoring points for every five additional years of time series length and for end dates of the time series after 1985. For missing data in the time series, scoring points are subtracted (see Table 2 for the scoring criteria). The station with the higher scoring points is chosen. These criteria are subjective and can be changed in the Python code.”
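One possible reading of the first scoring rule can be written as a short Python function. This is only a sketch: the function name is hypothetical, it covers a single criterion, and it assumes that only full 2% steps above 80% earn a point. The points for time series length, end date and missing data depend on Table 2, which is not reproduced in this discussion, so they are left out.

```python
import math


def accordance_points(score_percent, threshold=80.0, step=2.0):
    """Scoring points for one criterion (Intersection over Union ratio or
    upstream area accordance), given in percent.

    Assumption: one point per full `step` percent above `threshold`.
    """
    if score_percent <= threshold:
        return 0
    return int(math.floor((score_percent - threshold) / step))


print(accordance_points(90.0))   # 5
print(accordance_points(83.0))   # 1
print(accordance_points(75.0))   # 0
```

Keeping the threshold and step as parameters reflects the authors' remark that these criteria are subjective and can be changed.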
- L 201 vs L.107. How many stations do not have a catchment shapefile? 228 as stated in line 201 or 352 as is written in line 107? Please check.
Reply: Thank you for checking. We corrected L 107:
“For 2.2% of the stations (228 stations), we could not find an adequate location on the high-resolution network”
- L 224. Should be Table 3 instead of Table 1.
Reply: Yes, you are right; changed to Table 3.
- L 225. What does “distance median” mean? Please add details.
Reply: Added:
“(Here the distance is the distance in metres between the reported station location in the GRDC dataset and the location represented in the 3’’ MERIT network. The median of distance is calculated as the median of all distances in each row of Table 3.)”
- L 272. Should be Lokoja station not Lokojo. Reply: Changed
- L 316 should be Figure 7b. The caption for Figure 7 should clearly describe both the a) and b) panels.
Reply: Done L316 and we changed the caption:
Mismatch of basin allocation because of selection from upstream area only. a) shows the South Platte River, USA at 30’ resolution b) shows the river Pisuerga in Spain at 5’ resolution. © OpenStreetMap contributors 2022. Distributed under the Open Data Commons Open Database License (ODbL) v1.0.
- Figure 2 is not described in the manuscript. Please explain the figure or delete it.
Reply: We added:
“Figure 2 shows four examples out of the 25 cell locations around station “Passau/Inn”. Figure 2a uses the cell where the station is located. This cell represents not only the Inn but the Danube and the Inn basins together. Figure 2b includes only a small tributary of the Inn and figure 2c contains only the Danube basin but not the Inn basin. Figure 2d shows the best location (one grid cell south of the grid cell with the station – same as in figure 1).”
- Figure 3 is not described in the manuscript. Please explain the figure or delete it. This figure could be moved to section 2.
Reply: We did not move this to the methods section because the methods part is independent of the actual number of stations in the GRDC database. In the results part we show the application of the methods to the GRDC database of March 2022, which included 10701 stations at that time. We put in a description: “Figure 3 shows the global distribution of GRDC stations (status: March 2022) with a high concentration of stations in North America and Europe and a lower and more clustered distribution in Africa and Asia.”
Minor corrections:
- 52. A parenthesis is missing at the end of the sentence. Reply: Done
- L 53. Please, define the ISIMIP acronym.
Reply: Deleted ISIMIP here because it does not add information at this point. ISIMIP is explained in section 2.2.
- L 67. Please delete the parenthesis before the “3 arcseconds”. Please be consistent with the conversion of 3” to metres. Here it is indicated that “3”~93m” whereas in Line 76 it is written “3 arc seconds (~100 m)”.
Reply: Done. We stick to 3’’ ~100 m, but we also give the exact value of 3’’ at the equator = 92.61 m.
- L 67. Please, enter the corresponding metres to 15 arc seconds. Reply: Done
- 70. Please, enter the corresponding metres to 5 arc minutes. Reply: Done
- L 92. Please delete the parenthesis at the end of the sentence. Reply: Done
Citation: https://doi.org/10.5194/essd-2022-231-AC2 - Authors should describe the dataset of river GRDC discharge data in the introduction. GRDC is not only a river discharge time series dataset but it also contains information on hydrometric station location, upstream basin area…In this way the readers can better follow the manuscript and in particular it is easy to understand the meaning of “reported upstream area” (e.g., point e in lines 57-59 and line 71).
Status: closed
-
RC1: 'Comment on essd-2022-231', Anonymous Referee #1, 05 Oct 2022
General comments
This manuscript describes the new location data of GRDC stations for calibrating large-scale hydrological models. The calibration of these models relies on the accuracy of stations allocated on the gridded river network of different spatial resolutions. The data is developed based on the idea of the ‘intersection over union ratio approach’, which quantifies the similarity of the watershed shapes between low- and high-resolution gridded watersheds. The new dataset's accuracy was reasonably good compared to the previous versions; thus, the dataset attracts interest in hydrological communities and is worth publishing. Although the estimated precision is promising, however, I think the manuscript does not contain an adequate description as a data paper. Given the location of GRDC stations is a widely used essential information in hydrology and earth system science studies, I think the manuscript is worth publishing on ESSD, after corrections on a few ambiguous parts.
Specific comments:
(1) L 89. Is “Upstream area error” used in the subsequent manuscript?
(2) L93-94. The unit of the distance was not described here. The values of OC calculated with the described equation differed depending on the unit of the distance used. Because the values of (1-'area accordance') are always less than one, the OC values are predominantly determined by the distance if the unit is in kilometers or meters (usually larger than 1). At least, the second term of the equation should be normalized to the range from zero to one. Also, the authors should explain how the weighting of the second term, 2, was determined.
(3) L 141=143: The choice of the weighting factor for calculating ED should be explained.
(4) L 144: Figures 1 (and Figure 2 as well) are not appropriate for the examples describing the automatic upscaling process based on the similarity of shape, because it seems that only the 'area accordance' suffice for the selection of station 7. I would suggest the authors select a more appropriate example. I think Figure 7 worth explains the problem of mismatch in the upscaled stations and how they are solved with the proposed procedures. Upon revising, please consider the necessity of Figure 2 as it was not referred to in the manuscript.
Technical corrections:
(1) L18-21: I think this part is not relevant to the main context. Suggest deleting.
(2) L 92: Delete “)”.
(3) L 133: The number of cells for each resolution does not consistent with those described in line 303. I think this line should be corrected.
(4) Figure 3 was not referred to in the manuscript.
Citation: https://doi.org/10.5194/essd-2022-231-RC1 -
AC1: 'Reply on RC1', Peter Burek, 22 Mar 2023
Dear reviewer,
Thank you for reviewing our paper, your constructive comments and your attention to details. We appreciate your voluntary effort and we revised the manuscript according to your comments. The comments have been addressed as following:Specific comments:
Q: (1) L 89. Is “Upstream area error” used in the subsequent manuscript?
Reply: We used both upstream area error and upstream accordance in the paper, which might lead to confusion. We removed the term “Upstream area error” and replaced it with upstream area accordance. We also removed the term similarity index and replaced it with “Intersection over Union ratio”. Both indices score low with 0 and high with 1
Q: (2) L93-94. The unit of the distance was not described here. The values of OC calculated with the described equation differed depending on the unit of the distance used. Because the values of (1-'area accordance') are always less than one, the OC values are predominantly determined by the distance if the unit is in kilometers or meters (usually larger than 1). At least, the second term of the equation should be normalized to the range from zero to one. Also, the authors should explain how the weighting of the second term, 2, was determined.
Reply: Thank you for paying attention on this. We followed the approach of Lehner (2012), but we did not describe his method properly. The two terms are normalized to fit together, and the equations and the weighting are from Lehner (2012), too. We point this out.
We changed the description in L90f:
- A rectangular search radius of 165 arcsec (~5 km) for each station was defined.
- For each grid in this rectangle, the upstream drainage area (UPA) from the network from Yamazaki et al. (2019) was compared to the area reported in the GRDC, and the upstream area accordance is computed:
- Upstream area accordance = GRDC reported UPA / gridded network UPA
(where: GRDC reported UPA < gridded network UPA)
Upstream area accordance = gridded network UPA / GRDC reported UPA
(where: GRDC reported UPA ≥ gridded network UPA)
- Upstream area accordance = GRDC reported UPA / gridded network UPA
- All cells with an upstream area accordance of less than 50% were dismissed from further evaluation.
- A first ranking scheme – area discrepancy (RA) - was calculated with values between 0 (best fit) to 50: RA = 100 - Upstream area accordance[%]
- For the second ranking scheme – distance (RD) – the distance of the cell to the reported station location in the GRDC database was calculated and normalized to get the value 0 at the station location and 50 in 5 km distance.
- An objective criterion (OC) for ranking was computed by OC = RA + 2 * RD. The equation and weighting were taken from Lehner (2012).
- The grid cell with the lowest OC value was taken as the corresponding grid cell for the station location on a high-resolution network
- If no station location was found in this step, the search radius was increased to 5’ (~10 km), OC was calculated as OC = RA + RD, and the lowest OC value was taken as the corresponding grid cell.
Q: (3) L 141-143: The choice of the weighting factor for calculating ED should be explained.
Reply: The upstream area accordance criterion has a range of ]0,1], and the Intersection over Union ratio criterion has a range of [0,1]. Therefore, we decided to give each a weighting factor of 1. We added the text:
"Both objective criteria have a range between 0 and 1. Therefore, we decided to use a weighting factor of 1 for both criteria."
Q: (4) L 144: Figure 1 (and Figure 2 as well) is not an appropriate example for describing the automatic upscaling process based on the similarity of shape, because it seems that the 'area accordance' alone suffices for the selection of station 7. I would suggest the authors select a more appropriate example. I think Figure 7 explains well the problem of mismatch in the upscaled stations and how it is solved with the proposed procedures. Upon revising, please consider the necessity of Figure 2, as it was not referred to in the manuscript.
Reply: Yes, we agree that for Passau/Inn the area accordance would be enough, but it is a good example showing that a) the cell containing the station is not the most appropriate one and b) moving one cell away from the station yields very different basins. We think it is fine for illustrating the method. In the results part (Figure 7), we show two examples where the Intersection over Union ratio really matters.
We changed figure 2 and added an explanation of fig 1 and 2 in the text:
“Figure 1 illustrates this method for low resolution 5’ and for cell location No. 7, which is one 5’ cell south of the cell where the station “Passau/Inn” is located (see the zoom in the upper left part of figure 1). Even if this cell is not representing the cell where the station is located, this cell fits the upstream area accordance and the Intersection over Union ratio best of all 25 cells around the station location.”
“Figure 2 shows four examples out of the 25 cell locations around station “Passau/Inn”. Figure 2a uses the cell where the station is located. This cell represents not only the Inn, but also the Danube and the Inn basin. Figure 2b includes only a small tributary of the Inn and figure 2c contains only the Danube basin but not the Inn basin. Figure 2d shows the best location (one grid cell south of the grid cell with the station – same as in figure 1).”
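The two criteria compared across the candidate cells can be sketched as follows. This is a minimal illustration and not the code from the authors' repository: basins are represented as hypothetical sets of high-resolution cell indices, the cell count is used as a stand-in for area, the accordance is taken as the smaller area over the larger (an assumption consistent with the stated range ]0,1]), and the equally weighted Euclidean distance to the ideal point (1, 1) is an assumed way of combining the two criteria, since the exact formula is not reproduced in this discussion.

```python
import math


def intersection_over_union(basin_a, basin_b):
    """IoU of two basins given as sets of (row, col) cells on a common grid."""
    union = basin_a | basin_b
    if not union:
        return 0.0
    return len(basin_a & basin_b) / len(union)


def upstream_area_accordance(area_a, area_b):
    # Assumption: min/max ratio, so the result lies in ]0,1].
    return min(area_a, area_b) / max(area_a, area_b)


def combined_distance(accordance, iou):
    # Assumption: both criteria weighted with 1, distance to the ideal (1, 1).
    return math.hypot(1.0 - accordance, 1.0 - iou)


def best_candidate(high_res_basin, candidates):
    """Pick the candidate label whose basin is closest to the high-res basin."""
    if not candidates:
        return None

    def score(label):
        basin = candidates[label]
        acc = upstream_area_accordance(len(high_res_basin), len(basin))
        iou = intersection_over_union(high_res_basin, basin)
        return combined_distance(acc, iou)

    return min(candidates, key=score)


# Hypothetical example: a 4 x 4 reference basin and two candidate basins.
reference = {(r, c) for r in range(4) for c in range(4)}
candidates = {
    "cell_a": {(r, c) for r in range(4) for c in range(1, 4)},  # smaller, inside
    "cell_b": {(r, c) for r in range(4) for c in range(2, 6)},  # same size, shifted
}
chosen = best_candidate(reference, candidates)
```

In this made-up example `cell_b` matches the reference area exactly but overlaps it only partly, so a selection based on area alone would prefer it, whereas the combined criterion selects `cell_a`.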
Technical corrections:
Q: (1) L18-21: I think this part is not relevant to the main context. Suggest deleting.
Reply: Thanks for the correction. We think you are right. The lines have been deleted.
Q: (2) L 92: Delete “)”. Reply: Done
Q: (3) L 133: The number of cells for each resolution is not consistent with those described in line 303. I think this line should be corrected.
Reply: We corrected the numbers in L133: “(e.g., ≥ 9,000 km2 for 30’ (~3 cells), ≥ 1,000 km2 for 5’ (~12 cells)).” and in L303: “(~12 grid cells on 5’)”.
Thanks for pointing this out. At the equator, a 5’ grid cell has an area of 85.8 km2, and a 30’ grid cell of 3087.6 km2.
Q: (4) Figure 3 was not referred to in the manuscript.
Reply: We put in a description:
“Figure 3 shows the global distribution of GRDC stations (status: March 2022) with a high concentration of stations in North America and Europe and a lower and more clustered distribution in Africa and Asia.”
Citation: https://doi.org/10.5194/essd-2022-231-AC1
-
AC1: 'Reply on RC1', Peter Burek, 22 Mar 2023
-
RC2: 'Comment on essd-2022-231', Anonymous Referee #2, 22 Feb 2023
The manuscript describes the procedure used to generate a dataset of station locations of observed discharge to be used at different resolutions for calibrating large-scale hydrological models. The authors update the 10-year-old database of GRDC watershed boundaries and provide source codes and high- and low-resolution watershed boundaries. The manuscript is interesting and the results are useful for scientific purposes. However, before it can be considered for publication, the authors need to undertake a thorough revision process to better explain the objectives of the manuscript and the steps needed to achieve them. In addition, figures and tables must be clearly explained in the text. Symbols and acronyms should be used consistently.
General remarks:
- Authors should describe the dataset of river GRDC discharge data in the introduction. GRDC is not only a river discharge time series dataset but it also contains information on hydrometric station location, upstream basin area…In this way the readers can better follow the manuscript and in particular it is easy to understand the meaning of “reported upstream area” (e.g., point e in lines 57-59 and line 71).
- The authors should better describe the objective of the manuscript, which is not only to revise and correct the shapefiles of the GRDC stations, but also to provide a Python code to easily select stations for calibration/validation of LSM models.
- The Methods section should be revised. The authors should clearly state the objectives of the study and the steps needed to achieve them. Some sentences should be added before line 75 to introduce paragraph 2.3 and its sub-paragraphs. For example, when reading line 161, it is not immediately clear what does the authors mean by “For the next selection step…”.
- 84. The last part of the sentence “or with no upstream area record” is misleading. If I understand correctly, this part should be deleted as in the paragraph 2.1.1 only stations with an upstream drainage area are considered.
- 120. Please be consistent with the notation. According to L. 44, “30 arc minutes resolution”, should be recalled here as 30’. Please modify the sentence.
- 132. Please be consistent with the notation. According to L. 87, the upstream area was abbreviated as “UPA”. The sentence on lines 132-133 can be changed as “We defined a minimum UPA for the station we wanted to use in the low-resolution hydrological model (e.g., UPA ≥ 9,000 km2 for 30’ (~180 cells), UPA ≥ 1,000 km2 for 5’ (~100 cells)).”
- L 134-135. This sentence is hard to understand. Please, rephrase it.
- 139. Please define the range of variability for the Intersection over Union ratio.
- 143. What does “𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦” stand for? Does it refer to the Intersection over Union ratio?
- L 144. Please explain Figure 1 clearly.
- Table 2 should be better explained in the text.
- L 201 vs L.107. How many stations do not have a catchment shapefile? 228 as stated in line 201 or 352 as is written in line 107? Please check.
- 224. Should be Table 3 instead of Table 1.
- L 225. What does “distance median” mean? Please add details.
- 272. Should be Lokoja station not Lokojo.
- L 316 should be Figure 7b. The caption for Figure 7 should clearly describe both the a) and b) panels.
- Figure 2 is not described in the manuscript. Please explain the figure or delete it.
- Figure 3 is not described in the manuscript. Please explain the figure or delete it. This figure could be moved to section 2.
Minor corrections:
- 52. A parenthesis is missing at the end of the sentence.
- L 53. Please, define the ISIMIP acronym.
- 67. Please delete the parenthesis before the “3 arcseconds”. Please be consistent with the conversion of 3” to metres. Here is it indicated that “3”~93m” whereas in Line 76 it is written “3 arc seconds (~100 m)”.
- L 67. Please, enter the corresponding metres to 15 arc seconds.
- 70. Please, enter the corresponding metres to 5 arc minutes.
- L 92. Please delete the parenthesis at the end of the sentence.
Citation: https://doi.org/10.5194/essd-2022-231-RC2
-
AC2: 'Reply on RC2', Peter Burek, 22 Mar 2023
Thank you for your detailed review of the manuscript and the constructive comments. Thanks to your effort, we could improve the paper. The comments have been addressed as follows:
General remarks:
- Authors should describe the dataset of river GRDC discharge data in the introduction. GRDC is not only a river discharge time series dataset but it also contains information on hydrometric station location, upstream basin area…In this way the readers can better follow the manuscript and in particular it is easy to understand the meaning of “reported upstream area” (e.g., point e in lines 57-59 and line 71).
Reply: We added in the abstract: “The Global Runoff Data Centre provides time series of observed discharges and information on hydrometric stations that are valuable for calibrating and validating the results of hydrological models.”
We added in the introduction: “The GRDC database of river discharge comes with information about the stations from the data providers, like the location of the station, name of the station and the river, upstream area, elevation, mean discharge, and more. Especially the location and the upstream area are very important to compare model results from hydrological models with station discharge data.”
- The authors should better describe the objective of the manuscript, which is not only to revise and correct the shapefiles of the GRDC stations, but also to provide a Python code to easily select stations for calibration/validation of LSM models.
Reply: We added in the abstract: “we provide source codes and high- and low-resolution watershed boundaries to easily select stations for calibration/validation of hydrological models.”
We added in the introduction: “The objective of this paper is to provide a Python code to easily select stations for calibration/validation of hydrological models by addressing these possible errors and giving examples of how to correct them.”
- The Methods section should be revised. The authors should clearly state the objectives of the study and the steps needed to achieve them. Some sentences should be added before line 75 to introduce paragraph 2.3 and its sub-paragraphs. For example, when reading line 161, it is not immediately clear what does the authors mean by “For the next selection step…”.
Reply: We added some lines before L75 to describe the stepwise approach of the methods:
“The methods can be split up into three main groups, each group building upon the results of the previous one. The first method describes allocating a station location from the GRDC database to fit best on a high-resolution network. This method reproduces the approach from Lehner (2012). The second method describes how to upscale the station location from a high-resolution network to a low-resolution network used in standard land-surface hydrological routing models by comparing upstream area and similarity of the station upstream areas in high and low resolution. The third method describes how to select the most appropriate stations for calibrating hydrological models, depending on the metadata of the stations and the chosen model grid resolution.”
- 84. The last part of the sentence “or with no upstream area record” is misleading. If I understand correctly, this part should be deleted as in the paragraph 2.1.1 only stations with an upstream drainage area are considered.
Reply: We added some numbers in L84f and added reported upstream area.
“For the evaluation, we used all stations with a reported upstream area greater than or equal to 10 km2 (124 stations have an upstream area smaller than 10 km2) or with no reported upstream area record (327 have no upstream area record in the GRDC dataset).”
We kept the stations with no reported upstream area in the GRDC dataset, because most of them could be clearly identified by location and most of them (201 stations) are in Africa and Asia, which are underrepresented anyway.
- 120. Please be consistent with the notation. According to L. 44, “30 arc minutes resolution”, should be recalled here as 30’. Please modify the sentence.
Reply: We now use 30’ or 5’ instead of arc minutes from L44 on. The same applies to 3 arc seconds: we use 3’’ after introducing it in L 72.
- 132. Please be consistent with the notation. According to L. 87, the upstream area was abbreviated as “UPA”. The sentence on lines 132-133 can be changed as “We defined a minimum UPA for the station we wanted to use in the low-resolution hydrological model (e.g., UPA ≥ 9,000 km2 for 30’ (~180 cells), UPA ≥ 1,000 km2 for 5’ (~100 cells)).”
Reply: We followed your advice and used UPA from L 87 on, in lines 132-133 and everywhere else in the text and figures.
- L 134-135. This sentence is hard to understand. Please, rephrase it.
Reply: To find the grid cell on the coarse resolution network which fits best to the upstream area and shape of the high-resolution network, we calculated two objective criteria for all coarse grid cells within a distance of <= 2 coarse cells (altogether 25 grid cells) of the location of the station on the high-resolution network.
- 139. Please define the range of variability for the Intersection over Union ratio.
Reply: It is [0,1]. We put in the text in L 162f: “The Intersection over Union ratio can have a value between [0,1]. The closer to 1 the value of Intersection over Union ratio is, then the more similar the shapes are.” We also put in the range of upstream area accordance in L 154: “The upstream area accordance can have a value between ]0,1] with 1 having GRDC and coarse area the same value.” (written as ]0,1] because 0 is outside of the interval).
- 143. What does “𝑠𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦” stand for? Does it refer to the Intersection over Union ratio?
Reply: Yes, similarity = Intersection over Union ratio. We replaced similarity by Intersection over Union ratio in text and figures.
- L 144. Please explain Figure 1 clearly.
Reply: We added explanation for figure 1
“Figure 1 illustrates this method for low resolution 5’ and for cell location No. 7, which is one 5’ cell south of the cell where the station “Passau/Inn” is located (see the zoom in the upper left part of figure 1). Even if this cell is not representing the cell where the station is located, this cell fits the upstream area accordance and the Intersection over Union ratio best of all 25 cells around the station location.”
- Table 2 should be better explained in the text.
Reply: We explained table 2: “If a station had an Intersection over Union ratio or upstream area accordance higher than 80%, it got one scoring point for every 2%. Stations earn scoring points for every five additional years of time series length and for end dates of the time series after 1985. For missing data in the time series scoring points are subtracted (see Table 2 for the scoring criteria). The station with the higher scoring points is chosen. These criteria are subjective and can be changed in the Python code.”
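The first rule in this explanation, one scoring point for every 2% above 80%, can be sketched as below. The function name and the choice to count only completed 2% steps are assumptions made for illustration; the points for time series length, end date and missing data depend on Table 2, which is not reproduced in this discussion, so they are left out.

```python
def accordance_points(value_percent, threshold=80.0, step=2.0):
    """Scoring points for an IoU ratio or UPA accordance given in percent.

    Assumption: only completed steps above the threshold earn a point.
    """
    if value_percent <= threshold:
        return 0
    return int((value_percent - threshold) // step)
```

The same function would be applied to both criteria before the remaining Table 2 points are added and the station totals are compared.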
- L 201 vs L.107. How many stations do not have a catchment shapefile? 228 as stated in line 201 or 352 as is written in line 107? Please check.
Reply: Thank you for checking. We corrected L 107:
“For 2.2% of the stations (228 stations), we could not find an adequate location on the high-resolution network”
- 224. Should be Table 3 instead of Table 1.
Reply: Yes, you are right, changed to table 3.
- L 225. What does “distance median” mean? Please add details.
Reply: Added:
“(Here the distance is the distance in metres between the reported station location in the GRDC dataset and the location represented in the 3’’ MERIT network. The median of distance is calculated as the median of all distances in each row of table 3.)”
- 272. Should be Lokoja station not Lokojo. Reply: Changed
- L 316 should be Figure 7b. The caption for Figure 7 should clearly describe both the a) and b) panels.
Reply: Done in L316, and we changed the caption:
Mismatch of basin allocation because of selection from upstream area only. a) shows the South Platte River, USA at 30’ resolution b) shows the river Pisuerga in Spain at 5’ resolution. © OpenStreetMap contributors 2022. Distributed under the Open Data Commons Open Database License (ODbL) v1.0.
- Figure 2 is not described in the manuscript. Please explain the figure or delete it.
Reply: We added:
“Figure 2 shows four examples out of the 25 cell locations around station “Passau/Inn”. Figure 2a uses the cell where the station is located. This cell represents not only the Inn, but also the Danube and the Inn basin. Figure 2b includes only a small tributary of the Inn and figure 2c contains only the Danube basin but not the Inn basin. Figure 2d shows the best location (one grid cell south of the grid cell with the station – same as in figure 1).”
- Figure 3 is not described in the manuscript. Please explain the figure or delete it. This figure could be moved to section 2.
Reply: We did not move this to the method section, because the method part is independent of the actual number of stations of the GRDC database. In the result part we show the application of the methods to the GRDC database of March 2022, which included 10701 stations at that time.
We put in a description: “Figure 3 shows the global distribution of GRDC stations (status: March 2022) with a high concentration of stations in North America and Europe and a lower and more clustered distribution in Africa and Asia.”
Minor corrections:
- 52. A parenthesis is missing at the end of the sentence. Reply: Done
- L 53. Please, define the ISIMIP acronym.
Reply: Deleted ISIMIP here because it does not add information here. ISIMIP is explained in section 2.2.
- 67. Please delete the parenthesis before the “3 arcseconds”. Please be consistent with the conversion of 3” to metres. Here it is indicated that “3”~93m” whereas in Line 76 it is written “3 arc seconds (~100 m)”.
Reply: Done, we stick to 3’’ ~100m, but we also gave the exact value of 3’’ at the equator = 92.61 m.
- L 67. Please, enter the corresponding metres to 15 arc seconds. Reply: Done
- 70. Please, enter the corresponding metres to 5 arc minutes. Reply: Done
- L 92. Please delete the parenthesis at the end of the sentence. Reply: Done
Citation: https://doi.org/10.5194/essd-2022-231-AC2
Data sets
The use of GRDC gauging stations for calibrating large-scale hydrological models Peter Burek, Mikhail Smilovic https://doi.org/10.5281/zenodo.6906577
Model code and software
The use of GRDC gauging stations for calibrating large-scale hydrological models Peter Burek, Mikhail Smilovic https://github.com/iiasa/CWATM_grdc_calibration_stations
Viewed
- HTML: 520
- PDF: 261
- XML: 26
- Total: 807
- BibTeX: 15
- EndNote: 20