<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="data-paper">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">ESSD</journal-id><journal-title-group>
    <journal-title>Earth System Science Data</journal-title>
    <abbrev-journal-title abbrev-type="publisher">ESSD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Earth Syst. Sci. Data</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1866-3516</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/essd-15-4389-2023</article-id><title-group><article-title>Geospatial dataset for hydrologic analyses in India (GHI): a quality-controlled dataset on river gauges, catchment boundaries and hydrometeorological time series</article-title><alt-title>GHI dataset</alt-title>
      </title-group><?xmltex \runningtitle{GHI dataset}?><?xmltex \runningauthor{G.~Goteti}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes">
          <name><surname>Goteti</surname><given-names>Gopi</given-names></name>
          <email>saagu.neeru@gmail.com</email>
        <ext-link>https://orcid.org/0000-0001-6092-0667</ext-link></contrib>
        <aff id="aff1"><institution>independent researcher: 5741 NW 92nd Ct, Johnston, IA 50131, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Gopi Goteti (saagu.neeru@gmail.com)</corresp></author-notes><pub-date><day>5</day><month>October</month><year>2023</year></pub-date>
      
      <volume>15</volume>
      <issue>10</issue>
      <fpage>4389</fpage><lpage>4415</lpage>
      <history>
        <date date-type="received"><day>4</day><month>February</month><year>2023</year></date>
           <date date-type="rev-request"><day>13</day><month>June</month><year>2023</year></date>
           <date date-type="rev-recd"><day>23</day><month>August</month><year>2023</year></date>
           <date date-type="accepted"><day>30</day><month>August</month><year>2023</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2023 Gopi Goteti</copyright-statement>
        <copyright-year>2023</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023.html">This article is available from https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023.html</self-uri><self-uri xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023.pdf">The full text article is available as a PDF file from https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d1e77">Streamflow gauging stations not only track the pulse of rivers but also act as common reference points for hydrologic and other environmental analyses. As such, streamflow data and metadata on gauging stations – Geographic Information System (GIS) data on station locations, their upstream catchment boundaries and river flow networks – are critical for analyses. However, for India's river basins, the availability of such data is limited; when available, data are not in an analysis-ready format and can have substantial errors. Studies often use available information from India's water agencies as is, without checking its validity. This study addresses the above limitations by building a new dataset using existing metadata (from the Central Water Commission, CWC, and the Water Resources Information System, WRIS) and checking it against publicly available information from global data sources (e.g., World Wildlife Fund, Multi-Error-Removed Improved-Terrain Hydro and Copernicus) and online maps (e.g., Google Maps). The quality control process categorizes existing metadata based on their consistency with these sources; also, existing metadata are supplemented with additional information where needed. The new dataset developed here is called the “Geospatial dataset for Hydrologic analyses in India” (GHI) and uses Hydrological data and maps based on Shuttle Elevation Derivatives at multiple Scales (HydroSHEDS) data as the underlying template. GHI has both geospatial and time series information. In this initial version of GHI, the spatial domain includes only the river basins of Peninsular India where daily streamflow data are publicly available.</p>

      <p id="d1e80">Following the quality control process, the CWC's 645 stations in Peninsular India were categorized into three groups: Group 1 (reliable metadata and adequate daily streamflow data; 213 stations), Group 2 (reliable metadata and inadequate or no daily streamflow data; 259 stations) and Group 3 (missing or unreliable metadata; 173 stations). For each of the 472 stations falling into groups 1 and 2, catchment-specific annual and monthly time series spanning 71 water years (1950–2020) of the following were compiled: observed precipitation from the India Meteorological Department (IMD); observed streamflow from WRIS; estimated precipitation, evapotranspiration (ET) and streamflow from ERA5-Land; and ET from the Global Land Evaporation Amsterdam Model (GLEAM). A preliminary analysis of catchment-scale time series of data indicates that, while the compiled data appear reasonable over most of the study domain, spurious runoff–precipitation ratios were observed in the hilly coastal regions of Western India. This adds yet another data-related obstacle to those faced by the hydrologic community. In order to quantify historical changes and reconcile them with anticipated future changes, the community needs robust and reliable hydrographic and hydrometeorological datasets as well as unrestricted access to such datasets. The goal of this study is to highlight the limitations of existing datasets and pave the way for a community-led effort towards building the needed datasets. GHI serves as a placeholder until such datasets become available. Potential improvements to GHI are discussed. GHI is publicly available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.7563599" ext-link-type="DOI">10.5281/zenodo.7563599</ext-link> <xref ref-type="bibr" rid="bib1.bibx10" id="paren.1"/>.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<?pagebreak page4390?><sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e98">Water resources assessments and other large-scale hydrologic analyses are important and useful means to objectively quantify the water budget. Such analyses often require reconciling observed streamflow at a gauging station with estimated or modeled fluxes over the upstream catchment area. As such, having accurate Geographic Information System (GIS) data on gauging station locations and catchment boundaries is critical. However, for India's watersheds, such metadata – GIS data on river gauging station locations, river networks and catchment area boundaries, etc. – are publicly available only to a limited extent. The primary sources of such information are the Central Water Commission (CWC) and the online Water Resources Information System (WRIS, <uri>https://indiawris.gov.in/wris</uri>, last access: 21 August 2022). Data from CWC, if available, are buried within various CWC reports; thus, users need to piece together the needed information from such reports. WRIS addresses some of CWC's deficiencies; however, catchment area boundaries from WRIS are available only for the large river basins, and contributing catchment areas outside of India are excluded by WRIS.</p>
      <p id="d1e104">There are further data-related challenges when it comes to hydrologic analyses over India. Streamflow data are available through WRIS only if the river basins are entirely within India's boundaries. Hence, WRIS data are only available for the basins of Peninsular India (shaded regions in Fig. <xref ref-type="fig" rid="Ch1.F1"/>a). For river basins such as the Ganga, Brahmaputra and Indus, which have catchment areas spanning multiple countries, data are “classified” by CWC and are not publicly available (unshaded regions in Fig. <xref ref-type="fig" rid="Ch1.F1"/>). Thus, streamflow data for a large portion of India are not readily available. Analysts could use compiled information from established sources such as the Global Runoff Data Center (GRDC, <uri>https://www.bafg.de/GRDC/</uri>, last access: 21 August 2022), the Global Monthly River Discharge Data Set (RivDIS) <xref ref-type="bibr" rid="bib1.bibx29" id="paren.2"/>, the Global Streamflow Indices and Metadata Archive (GSIM) <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx11" id="paren.3"/> or other global databases. RivDIS and the GRDC contain legacy data only for a small fraction of the gauging stations currently operated by CWC. There is limited or no information available on the specific Indian entities that supplied these data to GRDC or RivDIS nor on the extent of missing streamflow data in these sources. GSIM's streamflow data for India are based solely on “non-classified” information already available from WRIS.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><?xmltex \currentcnt{1}?><?xmltex \def\figurename{Figure}?><label>Figure 1</label><caption><p id="d1e122"><bold>(a)</bold> Major river basins of India and those used in this study (lightly shaded). <bold>(b)</bold> Composite basins of Peninsular India identified within the Geospatial dataset for Hydrologic analyses in India (GHI). Some basin names within the map are abbreviated for ease of display but are defined next to the map.</p></caption>
        <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f01.png"/>

      </fig>

      <p id="d1e137">Due to a lack of readily available metadata or hydrographic data, studies often compile the needed information, using whatever is available from CWC and WRIS as a reference. For instance, <xref ref-type="bibr" rid="bib1.bibx26" id="text.4"/> calibrated and validated their hydrologic model at 18 stations across India; most of their data were from CWC/WRIS, and the remaining data were from RivDIS. It is not known whether <xref ref-type="bibr" rid="bib1.bibx26" id="text.5"/> accounted for any of the missing streamflow data in their study. <xref ref-type="bibr" rid="bib1.bibx17" id="text.6"/> assessed uncertainties in modeled fluxes using 20 stations from WRIS whose streamflow was minimally affected by human interventions. The station metadata from WRIS were used as is without checking their validity. CWC used more than a hundred stations across India within their water resources assessment <xref ref-type="bibr" rid="bib1.bibx2" id="paren.7"/> but did not provide any GIS data on the stations or their catchment boundaries. <xref ref-type="bibr" rid="bib1.bibx9" id="text.8"/> noted some discrepancies in estimated catchment areas from <xref ref-type="bibr" rid="bib1.bibx2" id="text.9"/>.</p>
      <p id="d1e159"><xref ref-type="bibr" rid="bib1.bibx8" id="text.10"/> used catchment boundaries compiled by GSIM to quantify hydrometeorological variables of interest in their analysis of drought in Peninsular India. GSIM's goals were similar to those of this study but for thousands of stations across many countries <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx11" id="paren.11"/>. As such, GSIM could not perform manual verification of available station metadata. GSIM tailored its boundary delineation process by relocating the stations onto the river network such that the final catchment area estimates matched the “reference” catchment area estimates from CWC. Catchment boundaries were derived for select stations available from CWC's inventory from 2012. While relocating stations onto river networks to match the reference catchment area is a reasonable approach (and is also used by GRDC and other studies), the reference metadata from CWC or WRIS have several limitations and are not completely reliable. Station locations are prone to substantial errors, meaning that catchment area estimates are inconsistent with other sources (and sometimes inconsistent with CWC's own reports). The following is a discussion of these limitations.</p>
<sec id="Ch1.S1.SS1">
  <label>1.1</label><title>Issues with existing data</title>
<sec id="Ch1.S1.SS1.SSS1">
  <label>1.1.1</label><title>Spurious station locations</title>
      <p id="d1e181">Actual station locations can sometimes be hundreds of kilometers away from the locations given by CWC's published coordinates. Some examples are identified in Fig. <xref ref-type="fig" rid="Ch1.F2"/> and Table <xref ref-type="table" rid="Ch1.T1"/>. In some instances, current station locations fall in the Bay of Bengal (Station S2 in Table <xref ref-type="table" rid="Ch1.T1"/>) or in the Arabian Sea (Station S5 in Table <xref ref-type="table" rid="Ch1.T1"/>).</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><?xmltex \currentcnt{2}?><?xmltex \def\figurename{Figure}?><label>Figure 2</label><caption><p id="d1e194">Select stations used as examples to illustrate issues with existing metadata. Panel <bold>(a)</bold> presents the stations discussed in Table <xref ref-type="table" rid="Ch1.T1"/>. The current locations from CWC are shown as open red circles, whereas the potential correct locations are shown as shaded red circles. Panel <bold>(b)</bold> shows the stations discussed in Table <xref ref-type="table" rid="Ch1.T2"/> (squares), Table <xref ref-type="table" rid="Ch1.T3"/> (triangles) and Table <xref ref-type="table" rid="Ch1.T4"/> (circles).</p></caption>
            <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f02.png"/>

          </fig>

      <p id="d1e218">One particular example is further illustrated in Fig. <xref ref-type="fig" rid="Ch1.F3"/>. According to CWC, the Eturunagaram station is on the Godavari River. However, when overlaid on Google Maps, the current location falls on the Manair River, near the Manair Bridge. Moreover, the current location of Eturunagaram almost coincides with the current location of another CWC station (Somanpally station). The potential correct location is more than 60 km to the southeast, by the town of Eturunagaram, close to the Godavari River. Based on the visual pattern of current locations and their potential correct locations in Fig. <xref ref-type="fig" rid="Ch1.F2"/>a, it appears that these errors could be due to typographical errors in<?pagebreak page4391?> station coordinates – the potential correct locations are generally along the latitude or the longitude passing through the current location. As such, the station coordinates from CWC are not reliable, and they individually need to be verified with other sources, such as Google Maps, before one can proceed with any analysis.</p>
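      <p>The single-coordinate typo hypothesis above can be checked mechanically. The following is a minimal Python sketch and is not part of the GHI workflow: the function names, the 0.05° tolerance and the two coordinate pairs are illustrative assumptions (the coordinates are made up and are not CWC values). It reports which coordinate, if any, is shared between a published location and a candidate corrected location as well as the distance between the two points.</p>
<preformat preformat-type="code">
```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def shared_coordinate(published, candidate, tol_deg=0.05):
    """Say which coordinate agrees within tol_deg (an assumed tolerance).

    Both arguments are (latitude, longitude) tuples in decimal degrees.
    Returns 'both', 'latitude', 'longitude' or 'neither'.
    """
    same_lat = tol_deg >= abs(published[0] - candidate[0])
    same_lon = tol_deg >= abs(published[1] - candidate[1])
    if same_lat and same_lon:
        return "both"
    if same_lat:
        return "latitude"
    if same_lon:
        return "longitude"
    return "neither"


# Hypothetical coordinates for illustration only (not taken from CWC).
published = (18.30, 79.80)
candidate = (18.31, 80.45)

print(shared_coordinate(published, candidate))
print(round(haversine_km(*published, *candidate), 1), "km apart")
```
</preformat>
      <p>A result of “latitude” or “longitude” combined with a large separation would be consistent with an error in only one of the two published coordinates; it does not by itself confirm the corrected location, which still needs to be verified against an independent source.</p>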

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><?xmltex \currentcnt{1}?><label>Table 1</label><caption><p id="d1e229">Examples of stations with spurious locations, labeled S1 through S5 in Fig. <xref ref-type="fig" rid="Ch1.F2"/>a. Catchment area is in square kilometers (<inline-formula><mml:math id="M1" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">ID</oasis:entry>
         <oasis:entry colname="col2">CWC ID</oasis:entry>
         <oasis:entry colname="col3">Site name</oasis:entry>
         <oasis:entry colname="col4">River/Basin</oasis:entry>
         <oasis:entry colname="col5">Catchment area</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">S1</oasis:entry>
         <oasis:entry colname="col2">CW1NAM001443</oasis:entry>
         <oasis:entry colname="col3">Awalighat</oasis:entry>
         <oasis:entry colname="col4">Narmada</oasis:entry>
         <oasis:entry colname="col5">45 598</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S2</oasis:entry>
         <oasis:entry colname="col2">CW1VAM000996</oasis:entry>
         <oasis:entry colname="col3">Sirjholi</oasis:entry>
         <oasis:entry colname="col4">Sirjoli Nala</oasis:entry>
         <oasis:entry colname="col5">460</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S3</oasis:entry>
         <oasis:entry colname="col2">CW1PRA000667</oasis:entry>
         <oasis:entry colname="col3">Eturunagaram</oasis:entry>
         <oasis:entry colname="col4">Godavari</oasis:entry>
         <oasis:entry colname="col5">270 600</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S4</oasis:entry>
         <oasis:entry colname="col2">CW1TUL000762</oasis:entry>
         <oasis:entry colname="col3">Hoovinahole</oasis:entry>
         <oasis:entry colname="col4">Krishna</oasis:entry>
         <oasis:entry colname="col5">2585</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S5</oasis:entry>
         <oasis:entry colname="col2">CW1PAR000447</oasis:entry>
         <oasis:entry colname="col3">Aruvipuram</oasis:entry>
         <oasis:entry colname="col4">Neyyar</oasis:entry>
         <oasis:entry colname="col5">194</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><?xmltex \gdef\@currentlabel{1}?></table-wrap>

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><?xmltex \currentcnt{3}?><?xmltex \def\figurename{Figure}?><label>Figure 3</label><caption><p id="d1e377"><bold>(a)</bold> Erroneous station location from CWC for the Eturunagaram station in the Godavari Basin. The potential correct location is at least 60 km southeast of CWC's current location and is verified using Google Maps – black square indicates a reference landmark corresponding to the town of Eturunagaram. <bold>(b)</bold> CWC's current location for the Eturunagaram station suspiciously coincides with CWC's location for the Somanpally station in the Godavari Basin. <bold>(c)</bold> The potential correct location for Eturunagaram station is on the Godavari River, by the town of Eturunagaram, matching CWC's site description of the station.</p></caption>
            <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f03.png"/>

          </fig>

</sec>
<sec id="Ch1.S1.SS1.SSS2">
  <label>1.1.2</label><title>Spurious catchment area estimates</title>
      <p id="d1e402">Drawbacks of catchment area estimates from CWC are illustrated using examples in Tables <xref ref-type="table" rid="Ch1.T2"/>, <xref ref-type="table" rid="Ch1.T3"/> and <xref ref-type="table" rid="Ch1.T4"/>. In some cases, the catchment areas for different stations in the basin are identical – for instance, the six stations in the Cauvery Basin identified in Table <xref ref-type="table" rid="Ch1.T2"/> have an identical catchment area of 6410 <inline-formula><mml:math id="M2" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>. In other cases, the catchment area for an upstream<?pagebreak page4392?> station is larger than that for a station downstream of it. For instance, Simga station in the Mahanadi Basin is upstream of Jondhra and should have a smaller catchment area, but the opposite is the case: 30 761 <inline-formula><mml:math id="M3" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> at Simga vs. 29 645 <inline-formula><mml:math id="M4" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> at Jondhra (Table <xref ref-type="table" rid="Ch1.T3"/>). The catchment area estimates from this study, discussed in the subsequent sections, do not have such issues.</p>
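      <p>The two consistency checks described above (identical areas reported for different stations and an upstream area that is not smaller than the downstream area) can be sketched as follows. This is a minimal Python illustration that uses only the CWC values quoted in Tables <xref ref-type="table" rid="Ch1.T2"/> and <xref ref-type="table" rid="Ch1.T3"/>; the function names and the list of upstream–downstream pairs are assumptions made for the example and do not represent the procedure used to build GHI.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict

# CWC catchment areas (km^2) as quoted in Tables 2 and 3.
cwc_area = {
    "Beluru": 6410,
    "Bettadamane": 6410,
    "Jannapura": 6410,
    "Mukkkodlu": 6410,
    "Napoklu": 6410,
    "Thoreshettyhalli": 6410,
    "Simga": 30761,
    "Jondhra": 29645,
}

# (upstream, downstream) pairs; only the relation stated in the text is used.
upstream_downstream = [("Simga", "Jondhra")]


def duplicated_areas(areas):
    """Map each area value reported for more than one station to those stations."""
    groups = defaultdict(list)
    for name, area in areas.items():
        groups[area].append(name)
    return {area: sorted(names) for area, names in groups.items() if len(names) > 1}


def inverted_pairs(areas, pairs):
    """Return pairs whose upstream area is not smaller than the downstream area."""
    return [(up, down) for up, down in pairs if areas[up] >= areas[down]]


print(duplicated_areas(cwc_area))
print(inverted_pairs(cwc_area, upstream_downstream))
```
</preformat>
      <p>Both checks only flag candidates for review: identical values can in principle be legitimate for co-located stations, so flagged entries would still need to be compared with an independent catchment delineation.</p>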

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2" specific-use="star"><?xmltex \currentcnt{2}?><label>Table 2</label><caption><p id="d1e452">Catchment area estimates from CWC for select stations in the Cauvery Basin. The locations of these stations are shown as squares in Fig. <xref ref-type="fig" rid="Ch1.F2"/>b.  Catchment area is in square kilometers (<inline-formula><mml:math id="M5" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">CWC ID</oasis:entry>
         <oasis:entry colname="col2">Site name</oasis:entry>
         <oasis:entry colname="col3">River/Basin</oasis:entry>
         <oasis:entry colname="col4">Catchment area</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">CW1CAU001108</oasis:entry>
         <oasis:entry colname="col2">Beluru</oasis:entry>
         <oasis:entry colname="col3">Hemavathy</oasis:entry>
         <oasis:entry colname="col4">6410</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CW1CAU000978</oasis:entry>
         <oasis:entry colname="col2">Bettadamane</oasis:entry>
         <oasis:entry colname="col3">Cauvery</oasis:entry>
         <oasis:entry colname="col4">6410</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CW1CAU001188</oasis:entry>
         <oasis:entry colname="col2">Jannapura</oasis:entry>
         <oasis:entry colname="col3">Cauvery</oasis:entry>
         <oasis:entry colname="col4">6410</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CW1CAU001271</oasis:entry>
         <oasis:entry colname="col2">Mukkkodlu</oasis:entry>
         <oasis:entry colname="col3">Cauvery</oasis:entry>
         <oasis:entry colname="col4">6410</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CW1CAU000906</oasis:entry>
         <oasis:entry colname="col2">Napoklu</oasis:entry>
         <oasis:entry colname="col3">Cauvery</oasis:entry>
         <oasis:entry colname="col4">6410</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CW1CAM001272</oasis:entry>
         <oasis:entry colname="col2">Thoreshettyhalli</oasis:entry>
         <oasis:entry colname="col3">Cauvery</oasis:entry>
         <oasis:entry colname="col4">6410</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><?xmltex \gdef\@currentlabel{2}?></table-wrap>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T3" specific-use="star"><?xmltex \currentcnt{3}?><label>Table 3</label><caption><p id="d1e597">Potentially erroneous catchment area in square kilometers (<inline-formula><mml:math id="M6" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) from CWC for Simga station in the Mahanadi Basin. The corresponding values from GHI and MERIT (estimated in Sect. <xref ref-type="sec" rid="Ch1.S3"/>) are also shown. The ID for each station corresponds to its location in Fig. <xref ref-type="fig" rid="Ch1.F2"/>b.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">ID</oasis:entry>
         <oasis:entry colname="col2">CWC ID</oasis:entry>
         <oasis:entry colname="col3">Site name</oasis:entry>
         <oasis:entry colname="col4">River/Basin</oasis:entry>
         <oasis:entry colname="col5">CWC</oasis:entry>
         <oasis:entry colname="col6">GHI</oasis:entry>
         <oasis:entry colname="col7">MERIT</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">X1</oasis:entry>
         <oasis:entry colname="col2">CW1MAU000247</oasis:entry>
         <oasis:entry colname="col3">Simga</oasis:entry>
         <oasis:entry colname="col4">Mahanadi</oasis:entry>
         <oasis:entry colname="col5">30 761</oasis:entry>
         <oasis:entry colname="col6">16 903</oasis:entry>
         <oasis:entry colname="col7">16 803</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">X2</oasis:entry>
         <oasis:entry colname="col2">CW1MAU000515</oasis:entry>
         <oasis:entry colname="col3">Jondhra</oasis:entry>
         <oasis:entry colname="col4">Mahanadi</oasis:entry>
         <oasis:entry colname="col5">29 645</oasis:entry>
         <oasis:entry colname="col6">29 822</oasis:entry>
         <oasis:entry colname="col7">29 620</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><?xmltex \gdef\@currentlabel{3}?></table-wrap>

      <p id="d1e717">Another issue with the catchment areas from CWC is the extent of rounding used in the presentation of these estimates. Table <xref ref-type="table" rid="Ch1.T4"/> shows nine select stations in various basins whose catchment area varies from about 1000 <inline-formula><mml:math id="M7" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> to about 40 000 <inline-formula><mml:math id="M8" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>. Based on these catchment areas, on those presented in Tables <xref ref-type="table" rid="Ch1.T1"/>, <xref ref-type="table" rid="Ch1.T2"/> and <xref ref-type="table" rid="Ch1.T3"/>, and on a general examination of other CWC estimates, the rounding used by CWC appears to be arbitrary. While the rounding used by CWC can be reasonable in some cases (e.g., Jamsholaghat station, Y3), in other cases it can result in large departures from actual values (stations Y7 and Y9) or be completely unreasonable (station Y1).</p>
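      <p>The size of these departures can be quantified directly from the values in Table <xref ref-type="table" rid="Ch1.T4"/>. The short Python sketch below computes the signed percentage departure of each CWC estimate from the corresponding GHI estimate and ranks the stations by its magnitude. Treating GHI as the reference value and expressing the departure as a percentage are assumptions made for this illustration; the variable and function names are likewise illustrative.</p>
<preformat preformat-type="code">
```python
# Catchment areas (km^2) from Table 4: station ID -> (CWC, GHI).
areas = {
    "Y1": (2500, 1495),
    "Y2": (13000, 12944),
    "Y3": (16000, 15979),
    "Y4": (2400, 2566),
    "Y5": (1800, 1865),
    "Y6": (15000, 15555),
    "Y7": (40000, 38980),
    "Y8": (2300, 2508),
    "Y9": (1200, 1364),
}


def percent_departure(cwc, reference):
    """Signed departure of the CWC value from the reference value, in percent."""
    return 100.0 * (cwc - reference) / reference


departures = {sid: percent_departure(cwc, ghi) for sid, (cwc, ghi) in areas.items()}

# Rank stations from the largest to the smallest absolute departure.
ranked = sorted(departures, key=lambda sid: abs(departures[sid]), reverse=True)

for sid in ranked:
    print(f"{sid}: {departures[sid]:+.1f} %")
```
</preformat>
      <p>The same calculation could be repeated with the MERIT estimates as the reference to see whether the ranking depends on the choice of reference dataset.</p>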

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T4" specific-use="star"><?xmltex \currentcnt{4}?><label>Table 4</label><caption><p id="d1e754">Select catchment area estimates in square kilometers (<inline-formula><mml:math id="M9" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) from CWC to showcase the rounding used by CWC. The ID for each station corresponds to its location in Fig. <xref ref-type="fig" rid="Ch1.F2"/>b.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">ID</oasis:entry>
         <oasis:entry colname="col2">CWC ID</oasis:entry>
         <oasis:entry colname="col3">Site name</oasis:entry>
         <oasis:entry colname="col4">River/Basin</oasis:entry>
         <oasis:entry colname="col5">CWC</oasis:entry>
         <oasis:entry colname="col6">GHI</oasis:entry>
         <oasis:entry colname="col7">MERIT</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Y1</oasis:entry>
         <oasis:entry colname="col2">CW1MAM000657</oasis:entry>
         <oasis:entry colname="col3">Thettatanagar</oasis:entry>
         <oasis:entry colname="col4">Mahanadi</oasis:entry>
         <oasis:entry colname="col5">2500</oasis:entry>
         <oasis:entry colname="col6">1495</oasis:entry>
         <oasis:entry colname="col7">1461</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Y2</oasis:entry>
         <oasis:entry colname="col2">CW1NAU000327</oasis:entry>
         <oasis:entry colname="col3">Mandla</oasis:entry>
         <oasis:entry colname="col4">Narmada</oasis:entry>
         <oasis:entry colname="col5">13 000</oasis:entry>
         <oasis:entry colname="col6">12 944</oasis:entry>
         <oasis:entry colname="col7">12 919</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Y3</oasis:entry>
         <oasis:entry colname="col2">CW1SUB000352</oasis:entry>
         <oasis:entry colname="col3">Jamsholaghat</oasis:entry>
         <oasis:entry colname="col4">Subarnarekha</oasis:entry>
         <oasis:entry colname="col5">16 000</oasis:entry>
         <oasis:entry colname="col6">15 979</oasis:entry>
         <oasis:entry colname="col7">15 821</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Y4</oasis:entry>
         <oasis:entry colname="col2">CW1MHL000680</oasis:entry>
         <oasis:entry colname="col3">Pingalwada</oasis:entry>
         <oasis:entry colname="col4">Dhadhar (independent river)</oasis:entry>
         <oasis:entry colname="col5">2400</oasis:entry>
         <oasis:entry colname="col6">2566</oasis:entry>
         <oasis:entry colname="col7">2426</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Y5</oasis:entry>
         <oasis:entry colname="col2">CW1BHT000681</oasis:entry>
         <oasis:entry colname="col3">Madhuban Dam</oasis:entry>
         <oasis:entry colname="col4">Damanganga</oasis:entry>
         <oasis:entry colname="col5">1800</oasis:entry>
         <oasis:entry colname="col6">1865</oasis:entry>
         <oasis:entry colname="col7">1817</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Y6</oasis:entry>
         <oasis:entry colname="col2">CW1GDM000115</oasis:entry>
         <oasis:entry colname="col3">Purna</oasis:entry>
         <oasis:entry colname="col4">Godavari</oasis:entry>
         <oasis:entry colname="col5">15 000</oasis:entry>
         <oasis:entry colname="col6">15 555</oasis:entry>
         <oasis:entry colname="col7">15 492</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Y7</oasis:entry>
         <oasis:entry colname="col2">CW1IND000089</oasis:entry>
         <oasis:entry colname="col3">Pathagudem</oasis:entry>
         <oasis:entry colname="col4">Godavari</oasis:entry>
         <oasis:entry colname="col5">40 000</oasis:entry>
         <oasis:entry colname="col6">38 980</oasis:entry>
         <oasis:entry colname="col7">38 774</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Y8</oasis:entry>
         <oasis:entry colname="col2">CW1TUU000620</oasis:entry>
         <oasis:entry colname="col3">Byaladahalli</oasis:entry>
         <oasis:entry colname="col4">Krishna</oasis:entry>
         <oasis:entry colname="col5">2300</oasis:entry>
         <oasis:entry colname="col6">2508</oasis:entry>
         <oasis:entry colname="col7">2445</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Y9</oasis:entry>
         <oasis:entry colname="col2">CW1PAM000468</oasis:entry>
         <oasis:entry colname="col3">Theni</oasis:entry>
         <oasis:entry colname="col4">Vaigai</oasis:entry>
         <oasis:entry colname="col5">1200</oasis:entry>
         <oasis:entry colname="col6">1364</oasis:entry>
         <oasis:entry colname="col7">1319</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><?xmltex \gdef\@currentlabel{4}?></table-wrap>

</sec>
<sec id="Ch1.S1.SS1.SSS3">
  <label>1.1.3</label><title>Other issues</title>
      <?pagebreak page4393?><p id="d1e1055">As mentioned earlier, daily streamflow data are available through WRIS only for stations in Peninsular India <xref ref-type="bibr" rid="bib1.bibx32" id="paren.12"/>. Such data can have missing information. While seasonal stations have data available only for a portion of the year, perennial stations can sometimes have a number of days with missing observations (i.e., blanks in the raw data). Ignoring such values when estimating monthly or annual statistics could lead to the underestimation of aggregate streamflow and the potential misrepresentation of the regional water balance.</p>
      <p id="d1e1061">Daily, monthly and annual streamflow data from WRIS-OL can be downloaded as spreadsheets. Within these spreadsheets, river gauging station names, parent river and parent river basin information, and other relevant details are provided. However, the station identification codes developed by CWC are not used. It appears that WRIS contains streamflow data not only for CWC stations but also for stations from other agencies (such as state and regional agencies). As such, the user needs to manually match the WRIS stations to the CWC stations based on the available station description information. Given that WRIS contains duplicates and sometimes conflicting information on some stations, the user is burdened with inferring the streamflow data corresponding to the desired station, checking for missing values and filling in missing values. There appears to be a latency of at least a few years in the data provided by WRIS; data for the current season are not available. The individual values within the downloaded spreadsheets do not have any quality control flags associated with them, such as flags indicating overbank flow, gauge malfunction or outlier data.</p>
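      <p>The effect of blank daily values on aggregate statistics, noted above, can be illustrated with a minimal sketch. The function name <monospace>monthly_mean_flow</monospace>, the input layout (a mapping from calendar date to flow, with <monospace>None</monospace> for a blank) and the 5 d missing-data threshold are assumptions made for illustration only; they do not describe the procedure used to build GHI.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import calendar
from collections import defaultdict
from datetime import date


def monthly_mean_flow(daily, max_missing_days=5):
    """Aggregate daily flow to monthly means while tracking missing days.

    daily: dict mapping datetime.date -> flow value, or None for a blank.
    Days absent from the dict are also counted as missing.
    Returns a dict mapping (year, month) -> (mean flow or None, n_missing).
    The threshold max_missing_days is a hypothetical choice.
    """
    observed = defaultdict(list)
    months = set()
    for day, flow in daily.items():
        key = (day.year, day.month)
        months.add(key)
        if flow is not None:
            observed[key].append(flow)

    result = {}
    for year, month in sorted(months):
        n_days = calendar.monthrange(year, month)[1]
        values = observed[(year, month)]
        n_missing = n_days - len(values)
        if not values or n_missing > max_missing_days:
            # Too many blanks: report no mean rather than a biased one.
            result[(year, month)] = (None, n_missing)
        else:
            # Mean of the observed days only; a plain sum would be biased low.
            result[(year, month)] = (sum(values) / len(values), n_missing)
    return result
```
]]></preformat>
      <p>Returning the number of missing days alongside each monthly mean lets the analyst decide later whether a month is usable, instead of silently treating blanks as zero flow.</p>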
</sec>
</sec>
<sec id="Ch1.S1.SS2">
  <label>1.2</label><title>Motivation for this study</title>
      <p id="d1e1073">In the presence of the abovementioned limitations, analysts have to rely on their individual ability to clean the data and compile the needed information. The lack of reliable metadata on river gauges and catchment boundaries affects the compilation and subsequent analysis of catchment-scale hydrometeorological variables. India's river basins have witnessed a rise in average temperature, a decrease in monsoon precipitation and a rise in droughts, along with a number of other hydroclimatological changes <xref ref-type="bibr" rid="bib1.bibx13" id="paren.13"/>. In order to better understand and quantify such changes, the research community needs robust hydrometeorological datasets across all river basins of India. Reliable hydrographic data are a fundamental building block of such datasets. Increasingly available satellite and other high-resolution data products can be leveraged to build the needed datasets. Moreover, high-resolution river discharge measurements from the new Surface Water and Ocean Topography mission (SWOT, <uri>https://swot.jpl.nasa.gov/</uri>, last access: 1 February 2023) can be reconciled with historical information only when there are robust hydrographic data. Leveraging state-of-the-art remote-sensing data to build the needed hydrographic datasets is the motivation behind this study.</p>
      <p id="d1e1082">The National Hydrography Dataset (NHD, <uri>https://www.usgs.gov/national-hydrography</uri>, last access: 1 February 2023) from the US Geological Survey (USGS) is an excellent example of a robust hydrography dataset. It is not feasible for one individual to build such a dataset for India's river basins; instead, this requires an agency-level or community-wide effort. Based on the recent reports available from WRIS, it appears that WRIS is progressing towards creating such a dataset. In the interim, hydrologists and other analysts still need a reliable dataset on India's river gauging stations, catchment boundaries and other relevant data. By using the widely used Hydrological data and maps based on Shuttle Elevation Derivatives at multiple Scales (HydroSHEDS) dataset as the underlying template, this study leverages the valuable and abundant data resources<?pagebreak page4394?> available via HydroSHEDS. The specific goal of this study is to build a quality-controlled analysis-ready dataset from publicly available resources for use in hydrologic analyses while highlighting the major limitations of existing datasets.</p>
      <p id="d1e1088">The remainder of this paper is organized as follows: in Sect. <xref ref-type="sec" rid="Ch1.S2"/>, the data sources used in this study are discussed; in Sect. <xref ref-type="sec" rid="Ch1.S3"/>, the methodology used to differentiate the more reliable gauging stations from the less reliable stations is described; the final product from this study, called the Geospatial dataset for Hydrologic analyses in India (GHI), is described in Sect. <xref ref-type="sec" rid="Ch1.S4"/>; in Sect. <xref ref-type="sec" rid="Ch1.S5"/>, a preliminary analysis of GHI's time series data is discussed; Sect. <xref ref-type="sec" rid="Ch1.S6"/> provides a discussion of the limitations of GHI and potential next steps; and the conclusions from this study are presented in Sect. <xref ref-type="sec" rid="Ch1.S8"/>.</p>
</sec>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Data</title>
      <p id="d1e1113">A number of datasets were used in this study; they are described in this section and outlined in Table <xref ref-type="table" rid="Ch1.T5"/>. These datasets pertain to metadata on river gauging stations, publicly available hydrography products, online maps, and observed or modeled hydrometeorological data. The two main agencies providing metadata on India's river gauging stations and streamflow data are CWC and WRIS. Several reports from these agencies were used in this study, and the following notation is used to distinguish between the various reports from these agencies: CWC-21 refers to a list of the latest active streamflow gauging stations as of 2021 <xref ref-type="bibr" rid="bib1.bibx3" id="paren.14"/>; CWC-19 is a water resources assessment by CWC and WRIS conducted in 2019 <xref ref-type="bibr" rid="bib1.bibx2" id="paren.15"/>; CWC-YB refers to annual yearbooks published by CWC that contain statistical summaries on select streamflow gauging stations <xref ref-type="bibr" rid="bib1.bibx4" id="paren.16"/>; WRIS-GIS denotes GIS data on major river basin boundaries used by WRIS, obtained from WRIS via data request <xref ref-type="bibr" rid="bib1.bibx31" id="paren.17"/>; WRIS-OL refers to the online system from WRIS to disseminate streamflow and other hydrological data <xref ref-type="bibr" rid="bib1.bibx32" id="paren.18"/>; and WRIS-BR represents basin reports published by WRIS in 2014 <xref ref-type="bibr" rid="bib1.bibx30" id="paren.19"/>.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T5" specific-use="star"><?xmltex \currentcnt{5}?><label>Table 5</label><caption><p id="d1e1140">Datasets used in this study and relevant information.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="3cm"/>
     <oasis:colspec colnum="4" colname="col4" align="justify" colwidth="4cm"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Data type</oasis:entry>
         <oasis:entry colname="col2">Raw format</oasis:entry>
         <oasis:entry colname="col3">Source</oasis:entry>
         <oasis:entry colname="col4">Purpose</oasis:entry>
         <oasis:entry colname="col5">Vintage</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Station metadata</oasis:entry>
         <oasis:entry colname="col2">PDF<inline-formula><mml:math id="M13" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">CWC-21</oasis:entry>
         <oasis:entry colname="col4">Used within GHI</oasis:entry>
         <oasis:entry colname="col5">Sep 2021</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Station metadata</oasis:entry>
         <oasis:entry colname="col2">Spreadsheet</oasis:entry>
         <oasis:entry colname="col3">WRIS-OL</oasis:entry>
         <oasis:entry colname="col4">Comparison with GHI and CWC</oasis:entry>
         <oasis:entry colname="col5">Jul 2022</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Hydrography</oasis:entry>
         <oasis:entry colname="col2">GIS<inline-formula><mml:math id="M14" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">b</mml:mi></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">HydroSHEDS</oasis:entry>
         <oasis:entry colname="col4">Used within GHI</oasis:entry>
         <oasis:entry colname="col5">2020</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Hydrography</oasis:entry>
         <oasis:entry colname="col2">GIS<inline-formula><mml:math id="M15" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">b</mml:mi></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">MERIT</oasis:entry>
         <oasis:entry colname="col4">Comparison with GHI and CWC</oasis:entry>
         <oasis:entry colname="col5">Jul 2022</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Composite river basin boundaries</oasis:entry>
         <oasis:entry colname="col2">GIS<inline-formula><mml:math id="M16" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">b</mml:mi></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">WRIS-GIS</oasis:entry>
         <oasis:entry colname="col4">Comparison with GHI</oasis:entry>
         <oasis:entry colname="col5">Dec 2021</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">River network</oasis:entry>
         <oasis:entry colname="col2">GIS<inline-formula><mml:math id="M17" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">b</mml:mi></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">
                    <xref ref-type="bibr" rid="bib1.bibx16" id="text.20"/>
                  </oasis:entry>
         <oasis:entry colname="col4">Comparison with GHI</oasis:entry>
         <oasis:entry colname="col5">Dec 2021</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Online maps</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">Google Maps and OpenStreetMap</oasis:entry>
         <oasis:entry colname="col4">Verification of station <?xmltex \hack{\hfill\break}?>locations and river networks</oasis:entry>
         <oasis:entry colname="col5">2022</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Observed streamflow</oasis:entry>
         <oasis:entry colname="col2">Spreadsheet</oasis:entry>
         <oasis:entry colname="col3">WRIS-OL</oasis:entry>
         <oasis:entry colname="col4">Used within GHI</oasis:entry>
         <oasis:entry colname="col5">Aug 2022</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Observed streamflow</oasis:entry>
         <oasis:entry colname="col2">PDF<inline-formula><mml:math id="M18" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">CWC-YB</oasis:entry>
         <oasis:entry colname="col4">Comparison with WRIS-OL</oasis:entry>
         <oasis:entry colname="col5">Dec 2020</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Precipitation</oasis:entry>
         <oasis:entry colname="col2">NetCDF<inline-formula><mml:math id="M19" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">c</mml:mi></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">IMD</oasis:entry>
         <oasis:entry colname="col4">Used within GHI</oasis:entry>
         <oasis:entry colname="col5">May 2022</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Modeled estimates (Precip., ET and runoff)</oasis:entry>
         <oasis:entry colname="col2">NetCDF<inline-formula><mml:math id="M20" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">c</mml:mi></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">ERA5-Land</oasis:entry>
         <oasis:entry colname="col4">Used within GHI</oasis:entry>
         <oasis:entry colname="col5">Oct 2022</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">ET</oasis:entry>
         <oasis:entry colname="col2">NetCDF<inline-formula><mml:math id="M21" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">c</mml:mi></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">GLEAM</oasis:entry>
         <oasis:entry colname="col4">Used within GHI</oasis:entry>
         <oasis:entry colname="col5">Jun 2022</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d1e1143"><inline-formula><mml:math id="M10" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">a</mml:mi></mml:msup></mml:math></inline-formula> PDF is portable document format; <inline-formula><mml:math id="M11" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">b</mml:mi></mml:msup></mml:math></inline-formula> GIS includes a variety of formats such as shapefiles, raster files and geodatabases; <inline-formula><mml:math id="M12" display="inline"><mml:msup><mml:mi/><mml:mi mathvariant="normal">c</mml:mi></mml:msup></mml:math></inline-formula> NetCDF is a file format often used to store and share gridded meteorological data.</p></table-wrap-foot><?xmltex \gdef\@currentlabel{5}?></table-wrap>

<?xmltex \hack{\newpage}?>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Station metadata</title>
<sec id="Ch1.S2.SS1.SSS1">
  <label>2.1.1</label><title>CWC</title>
      <p id="d1e1524">India's river gauging stations are maintained by CWC as well as regional agencies <xref ref-type="bibr" rid="bib1.bibx1" id="paren.21"/>. However, metadata on these gauging stations are limited to stations maintained by CWC and are available via a number of reports from CWC and WRIS. For instance, the annual yearbooks published by CWC (CWC-YB) contain information on station location and also streamflow measurements for select stations. The basin reports published by WRIS (WRIS-BR) contain maps of river basins along with the location of select gauging stations. However, the underlying GIS data on basin boundaries and other relevant information are not available from WRIS (personal communication, 4 January 2022). The only publication from CWC that contains the latest information on all of its active river gauging stations is in the form of a PDF file from 2021 (CWC-21). This PDF file was used in this study as the definitive source of station metadata from CWC.</p>
</sec>
<sec id="Ch1.S2.SS1.SSS2">
  <label>2.1.2</label><title>WRIS</title>
      <p id="d1e1538">The online WRIS portal (WRIS-OL) provides station coordinates within the downloadable streamflow data files. A snippet of these data is shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F14"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/>. A preliminary analysis indicated that station coordinates from WRIS-OL matched those from CWC-21 in some instances, but there were many instances where blatant errors were found, such as coordinates falling in the Southern Hemisphere or the same station name appearing with differing coordinate values. For these reasons, station coordinates from WRIS-OL were not used in this study.</p>
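      <p>The two kinds of coordinate error described above can be screened for automatically. The following is a minimal sketch rather than the procedure used in this study: the function name <monospace>flag_suspect_coordinates</monospace>, the rough bounding box for India, the 0.01 degree tolerance and the record layout (station name, latitude, longitude) are all assumptions introduced for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Rough bounding box for India in decimal degrees (assumed, illustrative).
LAT_RANGE = (6.0, 38.0)
LON_RANGE = (68.0, 98.0)


def flag_suspect_coordinates(records, tolerance_deg=0.01):
    """Flag stations with implausible or conflicting coordinates.

    records: iterable of (station_name, latitude, longitude) tuples.
    Returns a dict mapping the normalized station name to a list of issues.
    """
    issues = defaultdict(list)
    seen = defaultdict(list)
    for name, lat, lon in records:
        key = name.strip().lower()
        in_lat = LAT_RANGE[0] <= lat <= LAT_RANGE[1]
        in_lon = LON_RANGE[0] <= lon <= LON_RANGE[1]
        if not (in_lat and in_lon):
            # Catches, for example, a latitude in the Southern Hemisphere.
            if "outside bounding box" not in issues[key]:
                issues[key].append("outside bounding box")
        seen[key].append((lat, lon))

    for key, coords in seen.items():
        lats = [c[0] for c in coords]
        lons = [c[1] for c in coords]
        spread = max(max(lats) - min(lats), max(lons) - min(lons))
        if spread > tolerance_deg:
            # Same name, but the listed coordinates disagree.
            issues[key].append("conflicting coordinates")
    return dict(issues)
```
]]></preformat>
      <p>A screen of this kind only identifies candidates for manual review; it cannot determine which of several conflicting coordinate pairs is correct.</p>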
</sec>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Hydrography</title>
<sec id="Ch1.S2.SS2.SSS1">
  <label>2.2.1</label><title>Major river basin boundaries</title>
      <p id="d1e1561">CWC and other water agencies subdivide the entire country into major basins for administrative and data<?pagebreak page4395?> management purposes. Due to the presence of river deltas, coastal basins and islands and due to national boundaries not always following topography, river basins are conveniently lumped to create “composite” river basins. CWC often uses such composite basins to cross-reference its data. WRIS-OL also uses similar composite basins from CWC to reference the stations associated with its streamflow data. However, other water agencies of India appear to have developed their own composite basins (<uri>https://indiawris.gov.in/wiki/doku.php?id=river_basins</uri>, last access: 1 September 2022). This study also uses such composite basins for ease of illustration and cross-referencing (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>).</p>
</sec>
<sec id="Ch1.S2.SS2.SSS2">
  <label>2.2.2</label><title>HydroSHEDS</title>
      <p id="d1e1577">The Hydrological data and maps based on Shuttle Elevation Derivatives at multiple Scales (HydroSHEDS) dataset <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx15" id="paren.22"/> is one of the few high-resolution, global, quality-controlled and analysis-ready datasets currently available to the scientific community. It has been widely used by a number of studies worldwide and is continually being improved. HydroSHEDS version 1, used in the creation of GHI, is derived primarily from Shuttle Radar Topography Mission (SRTM) elevation data at a 3 arcsec resolution (about 90 m at the Equator).</p>
      <p id="d1e1583">The suite of products from HydroSHEDS forms the basis of the geospatial information within GHI. The core products from HydroSHEDS are raster datasets, such as the digital elevation model (DEM) and derived flow direction. The secondary products from HydroSHEDS are derived from the core products and include vector products, such as watershed boundaries (HydroBASINS) and river networks (HydroRIVERS). This study makes extensive use of HydroBASINS and HydroRIVERS. HydroBASINS is a series of vectorized polygon layers on watershed boundaries nested within larger river basins. Within HydroBASINS, the Pfafstetter coding system is used to represent the hierarchical nesting of watersheds from level 1 (PF-1, large river basins) through level 12 (PF-12, smallest watersheds). A sample graphic illustrating watersheds at various PF levels is given in Appendix A (Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F15"/>). HydroRIVERS is a vectorized line network of river paths in which the upstream catchment area is at least 10 <inline-formula><mml:math id="M22" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> or in which the estimated average streamflow is at least 0.1 <inline-formula><mml:math id="M23" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">s</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, or both. Both HydroBASINS and HydroRIVERS are derived from the core HydroSHEDS products after aggregating to a 15 arcsec resolution (about 500 m at the Equator). Thus, the underlying resolution for spatial products within GHI is about 500 m.</p>
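      <p>The approximate ground distances quoted for the 3 and 15 arcsec grids follow from simple spherical arithmetic. The sketch below assumes a sphere with the WGS84 equatorial radius and considers only the east–west cell size; the function name <monospace>arcsec_to_metres</monospace> is introduced here for illustration. Under these assumptions, 3 arcsec corresponds to roughly 93 m and 15 arcsec to roughly 464 m at the Equator, which is consistent with the rounded values of 90 and 500 m given in the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

EQUATORIAL_RADIUS_M = 6378137.0  # WGS84 semi-major axis, in metres


def arcsec_to_metres(arcsec, latitude_deg=0.0):
    """Approximate east-west ground size of an angular grid spacing.

    Uses a spherical approximation; the cell narrows with the cosine
    of latitude, so the value is largest at the Equator.
    """
    angle_rad = math.radians(arcsec / 3600.0)
    return EQUATORIAL_RADIUS_M * angle_rad * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    for spacing in (3, 15):
        print(spacing, "arcsec:", round(arcsec_to_metres(spacing)), "m at the Equator")
```
]]></preformat>
      <p>Because of the cosine term, the east–west size of a 15 arcsec cell is somewhat smaller than the equatorial value at the latitudes of India.</p>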
</sec>
<sec id="Ch1.S2.SS2.SSS3">
  <label>2.2.3</label><title>MERIT</title>
      <?pagebreak page4396?><p id="d1e1627">While several products similar to HydroSHEDS are available, including Hydrologic Derivatives for Modeling and Analysis (HDMA) <xref ref-type="bibr" rid="bib1.bibx28" id="paren.23"/>, Multi-Error-Removed Improved-Terrain Hydro (MERIT) <xref ref-type="bibr" rid="bib1.bibx33" id="paren.24"/>, river networks <xref ref-type="bibr" rid="bib1.bibx34" id="paren.25"/> and river drainage density <xref ref-type="bibr" rid="bib1.bibx16" id="paren.26"/>, such products neither have the breadth of data nor the widespread usage that HydroSHEDS currently has. MERIT-based products include relatively fewer sub-datasets, but these products are growing. MERIT was developed at a 3 arcsec resolution (about 90 m at the Equator) after correcting for errors using multiple satellite datasets and filtering techniques. While upstream catchment area is readily available from MERIT, river networks are not. However, <xref ref-type="bibr" rid="bib1.bibx16" id="text.27"/> used MERIT to delineate high-resolution river networks at a global scale. This river network from <xref ref-type="bibr" rid="bib1.bibx16" id="text.28"/> was used in this study as a proxy for river network data from MERIT. The information from MERIT serves as an independent check on the HydroSHEDS data.</p>
</sec>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Online maps</title>
      <p id="d1e1658">Landmarks such as rivers, bridges and highways are often used as reference points by CWC to describe the location of its gauging stations. As publicly available maps from Google (Google Maps) and OpenStreetMap (OSM) contain such landmarks, they can be used to verify CWC's station description. The names of nearby towns and cities are also available from these sources and serve as reference landmarks. Such landmarks can be independently verified by users and are useful in the validation of station coordinates and other metadata. Google Maps products typically contain more information on cities and towns than OSM. However, OSM typically has more information than Google Maps on rivers and other waterbodies. An example graphic illustrating some of these differences between Google Maps and OSM is shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F16"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/>. The publicly available QGIS software (<uri>https://qgis.org/en/site/</uri>, last access: 1 January 2022) has plug-ins for both Google Maps and OSM. QGIS was used throughout this study for GIS analyses, and the plug-ins for online maps were used wherever needed.</p>
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Streamflow</title>
<sec id="Ch1.S2.SS4.SSS1">
  <label>2.4.1</label><title>WRIS</title>
      <p id="d1e1683">Daily runoff data from WRIS-OL were used to estimate monthly and annual runoff as needed. WRIS provides daily river stage and/or daily river flow information for gauging stations. WRIS enforces a limit of 1 calendar year's worth of data for each download. Thus, the user has to go through the tedious process of downloading 1 year at a time. Such data were downloaded for more than 70 years for all available stations. Only daily river flow data were used here. River stage information was not used because appropriate stage–discharge relationships were not readily available. A sample of raw data from WRIS is shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F14"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/>.</p>
</sec>
<sec id="Ch1.S2.SS4.SSS2">
  <label>2.4.2</label><title>Other sources</title>
      <p id="d1e1698">Streamflow data are also available through global databases, but these were not used in this study because there is only limited information on the quality of such datasets. Only a brief description is provided here to make the reader aware of their existence. GRDC contains legacy streamflow data for only a handful of stations in India. Moreover, usage of data from the GRDC is permitted only after approval of a written request. Neither the specific Indian agency (CWC or other) that provided data on India's gauging stations to GRDC nor the quality of such data is known. RivDIS contains monthly discharge data for 1018 stations around the world, including some in the GHI domain. However, this database has not been updated in more than 20 years. Moreover, the source of the streamflow data for India's gauging stations and the quality of such data are unknown. For these reasons, data from neither GRDC nor RivDIS were used in this study. GSIM contains streamflow data at gauging stations, among other metrics for use in hydrologic analyses. GSIM's streamflow data for India are based solely on information available from WRIS from around the year 2017. Thus, GSIM does not contain any information on India's river basins that is not already available from WRIS, and it was not used in this study.</p>
</sec>
</sec>
<sec id="Ch1.S2.SS5">
  <label>2.5</label><title>Hydrometeorological data</title>
      <p id="d1e1710">The hydrometeorological products chosen for this study are long-term products that span the entire study domain, are often cited in the scientific literature as reasonable products for use in hydrologic analyses, and are currently being maintained and/or developed. The gridded precipitation dataset from the India Meteorological Department (IMD) <xref ref-type="bibr" rid="bib1.bibx23" id="paren.29"/> has become the benchmark precipitation dataset and has been used by a number of studies <xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx25 bib1.bibx27" id="paren.30"><named-content content-type="pre">e.g.,</named-content></xref>. <xref ref-type="bibr" rid="bib1.bibx18" id="text.31"/> evaluated ERA5 products (ERA5 is the predecessor of ERA5-Land) and found that ERA5 is superior to the other reanalysis products analyzed in their study and is suitable for hydrologic assessments over India. <xref ref-type="bibr" rid="bib1.bibx9" id="text.32"/> used ET from ERA5-Land and the Global Land Evaporation Amsterdam Model (GLEAM), among other products, for water resources assessment in the Godavari and Krishna river basins. This study specifically uses data from IMD, ERA5-Land and GLEAM. Sample precipitation and ET data for water year (WY) 2020 are shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F17"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/>.</p>
<sec id="Ch1.S2.SS5.SSS1">
  <label>2.5.1</label><title>IMD</title>
      <p id="d1e1739">The IMD dataset used here is the monthly total precipitation for the period of 1950–2020 on a 0.25<inline-formula><mml:math id="M24" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> latitude–longitude grid (about 25 km at the Equator).</p>
</sec>
<sec id="Ch1.S2.SS5.SSS2">
  <label>2.5.2</label><title>ERA5-Land</title>
      <p id="d1e1759">ERA5-Land is based on the land component of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 climate reanalysis <xref ref-type="bibr" rid="bib1.bibx21" id="paren.33"/>. The dataset used here is the post-processed monthly data on a 0.10<inline-formula><mml:math id="M25" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> latitude–longitude grid (about 10 km at the Equator) for the period of 1950–2020. In the remainder of this paper, ERA5-Land is referred to as ERA unless otherwise specified.</p>
</sec>
<sec id="Ch1.S2.SS5.SSS3">
  <label>2.5.3</label><title>GLEAM</title>
      <p id="d1e1782">The Global Land Evaporation Amsterdam Model (GLEAM) is a set of algorithms that separately estimate the different<?pagebreak page4397?> components of ET, including transpiration, bare-soil evaporation, interception loss, open-water evaporation and sublimation <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx20" id="paren.34"/>. The dataset used here is the GLEAM v3.6a global dataset (<uri>https://www.gleam.eu/</uri>, last access: 1 July 2022) which provides monthly total ET for the period of 1980–2020 on a 0.25<inline-formula><mml:math id="M26" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> latitude–longitude grid (about 25 km at the Equator).</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Methodology</title>
      <p id="d1e1810">GHI contains both geospatial data and time series data. The geospatial component of GHI includes three layers, and the time series component includes annual and monthly observations or modeled estimates of hydrometeorological variables such as precipitation, ET and runoff. The end product includes the above information in typical data-sharing formats, such as shapefiles and plain-text files. An overview of the input datasets used within GHI, the methods applied to prepare the outputs, the specific quality control measures used during the preparation and the final outputs from GHI are outlined in Fig. <xref ref-type="fig" rid="Ch1.F4"/>a. The specific sequence of steps taken to get to the final GHI product is outlined in Fig. <xref ref-type="fig" rid="Ch1.F4"/>b. The remainder of this section describes the creation of the different components of GHI.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><?xmltex \def\figurename{Figure}?><label>Figure 4</label><caption><p id="d1e1819">Overview of GHI: <bold>(a)</bold> input datasets used within GHI, the specific quality control measures used to compile the needed outputs and the final outputs from GHI; <bold>(b)</bold> the sequence of steps followed within GHI to create the final GHI outputs.</p></caption>
        <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f04.png"/>

      </fig>

<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Composite basins</title>
      <p id="d1e1841">The study domain comprises only those river basins of India where daily streamflow data are publicly available (Fig. <xref ref-type="fig" rid="Ch1.F1"/>). For ease of analysis and to be consistent with WRIS-GIS, the study domain was separated into composite river basins. Such composite basins serve as a regional spatial reference. Every gauging station within GHI is tagged with the parent composite basin information. Starting with PF-12 watersheds from HydroSHEDS, composite basins were manually delineated using the QGIS software. It was ensured that the boundaries of these basins followed topographic divides, and the resulting basins were consistent with those from WRIS-GIS to the extent possible. The study domain was grouped into 15 composite hydrologic basins (Fig. <xref ref-type="fig" rid="Ch1.F1"/>). The names of these composite basins are generally consistent with those from WRIS-GIS, but they follow those from CWC-19 more closely. Most composite basins are contiguous and are shown in gray shading in Fig. <xref ref-type="fig" rid="Ch1.F1"/>. Basins that are not contiguous are shaded in color; these include the East-Flowing Rivers North Basin (EFR North, dark blue) and the East-Flowing Rivers South Basin (EFR South, light blue).</p>
      <p id="d1e1850">The catchment areas of these GHI composite basins are compared with those of equivalent basins from other sources in Table <xref ref-type="table" rid="Ch1.T6"/>. The categorization of basins is not the same across sources; hence, a new category (West-Flowing Rivers, or WFR, North and South) was created to facilitate comparison. For ease of comparison, the different sources are compared against WRIS-GIS. The discrepancies between the different sources are typically less than 5 %. GHI-estimated catchment areas are much closer to CWC-19 than to WRIS-GIS. Discrepancies between the various estimates are attributed to differences in the spatial resolution of the underlying topographic data, the land vs. ocean demarcation used within the data, and the GIS projection system and/or coordinate system used.</p>
      <p id="d1e1855">Within GIS analyses, it is customary to choose an appropriate coordinate projection system that determines how three-dimensional landscapes can be projected onto two dimensions for analysis and visualization. Estimation of area on the surface of the Earth, such as catchment area, is affected by the choice of the projection system or coordinate system and is based on a number of factors (e.g., <uri>https://pro.arcgis.com/en/pro-app/latest/help/mapping/properties/coordinate-systems-and-projections.htm</uri>, last access: 1 September 2022). While the basin boundaries from WRIS-GIS are in the Lambert conformal conic projection (<uri>https://pro.arcgis.com/en/pro-app/latest/help/mapping/properties/lambert-conformal-conic.htm</uri>, last access: 1 September 2022), it is not known whether this is the best or recommended projection system for India's river basins. Data sources such as MERIT estimate catchment areas using the World Geodetic System 1984 (WGS84) coordinate reference system (<uri>https://pro.arcgis.com/en/pro-app/latest/help/mapping/properties/specify-a-coordinate-system.htm</uri>, last access: 1 September 2022), without using a specific projection system. Gridded meteorological data often use the WGS84 coordinate reference system as well. For the sake of convenience, this study also uses the WGS84 coordinate reference system.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T6" specific-use="star"><?xmltex \currentcnt{6}?><label>Table 6</label><caption><p id="d1e1871">Summary of catchment area in square kilometers (<inline-formula><mml:math id="M27" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) by composite basin from various sources. The numbers in parentheses indicate the percentage deviation from WRIS-GIS values.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Composite basin</oasis:entry>
         <oasis:entry colname="col2">WRIS-GIS</oasis:entry>
         <oasis:entry colname="col3">CWC-19</oasis:entry>
         <oasis:entry colname="col4">GHI</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Brahmani–Baitarani</oasis:entry>
         <oasis:entry colname="col2">51 897</oasis:entry>
         <oasis:entry colname="col3">53 902 (4 %)</oasis:entry>
         <oasis:entry colname="col4">54 890 (6 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Cauvery</oasis:entry>
         <oasis:entry colname="col2">85 626</oasis:entry>
         <oasis:entry colname="col3">85 167 (<inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> %)</oasis:entry>
         <oasis:entry colname="col4">85 361 (0 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">EFR North</oasis:entry>
         <oasis:entry colname="col2">79 920</oasis:entry>
         <oasis:entry colname="col3">82 073 (3 %)</oasis:entry>
         <oasis:entry colname="col4">82 680 (3 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">EFR South</oasis:entry>
         <oasis:entry colname="col2">102 288</oasis:entry>
         <oasis:entry colname="col3">101 657 (<inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> %)</oasis:entry>
         <oasis:entry colname="col4">102 082 (0 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Godavari</oasis:entry>
         <oasis:entry colname="col2">302 063</oasis:entry>
         <oasis:entry colname="col3">312 150 (3 %)</oasis:entry>
         <oasis:entry colname="col4">311 732 (3 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Krishna</oasis:entry>
         <oasis:entry colname="col2">254 750</oasis:entry>
         <oasis:entry colname="col3">259 439 (2 %)</oasis:entry>
         <oasis:entry colname="col4">260 565 (2 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Mahanadi</oasis:entry>
         <oasis:entry colname="col2">139 651</oasis:entry>
         <oasis:entry colname="col3">144 905 (4 %)</oasis:entry>
         <oasis:entry colname="col4">144 888 (4 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Mahi</oasis:entry>
         <oasis:entry colname="col2">38 052</oasis:entry>
         <oasis:entry colname="col3">39 566 (4 %)</oasis:entry>
         <oasis:entry colname="col4">40 226 (6 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Narmada</oasis:entry>
         <oasis:entry colname="col2">93 494</oasis:entry>
         <oasis:entry colname="col3">96 660 (3 %)</oasis:entry>
         <oasis:entry colname="col4">97 476 (4 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Pennar</oasis:entry>
         <oasis:entry colname="col2">54 243</oasis:entry>
         <oasis:entry colname="col3">54 905 (1 %)</oasis:entry>
         <oasis:entry colname="col4">55 270 (2 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Sabarmati</oasis:entry>
         <oasis:entry colname="col2">30 679</oasis:entry>
         <oasis:entry colname="col3">31 901 (4 %)</oasis:entry>
         <oasis:entry colname="col4">31 642 (3 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Subernarekha</oasis:entry>
         <oasis:entry colname="col2">25 792</oasis:entry>
         <oasis:entry colname="col3">26 804 (4 %)</oasis:entry>
         <oasis:entry colname="col4">26 844 (4 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Tapi</oasis:entry>
         <oasis:entry colname="col2">63 432</oasis:entry>
         <oasis:entry colname="col3">65 806 (4 %)</oasis:entry>
         <oasis:entry colname="col4">66 268 (4 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">WFR North and South</oasis:entry>
         <oasis:entry colname="col2">111 629</oasis:entry>
         <oasis:entry colname="col3">112 591 (1 %)</oasis:entry>
         <oasis:entry colname="col4">113 792 (2 %)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><?xmltex \hack{\hspace{5mm}}?> WFR North</oasis:entry>
         <oasis:entry colname="col2">NA</oasis:entry>
         <oasis:entry colname="col3">58 360</oasis:entry>
         <oasis:entry colname="col4">58 923</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"><?xmltex \hack{\hspace{5mm}}?> WFR South</oasis:entry>
         <oasis:entry colname="col2">NA</oasis:entry>
         <oasis:entry colname="col3">54 231</oasis:entry>
         <oasis:entry colname="col4">54 869</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Total</oasis:entry>
         <oasis:entry colname="col2">1 433 516</oasis:entry>
         <oasis:entry colname="col3">1 467 526 (2 %)</oasis:entry>
         <oasis:entry colname="col4">1 473 716 (3 %)</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d1e1885">NA denotes that catchment area information was not available.</p></table-wrap-foot><?xmltex \gdef\@currentlabel{6}?></table-wrap>

</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Gauging station verification and landmark identification</title>
      <p id="d1e2211">The second geospatial layer in GHI contains metadata on river gauging stations. Station-specific metadata – latitude, longitude, site name and river name – were used to verify the location of each CWC station within Google Maps and/or OSM. CWC's stations are typically named after nearby towns, cities or other landmarks. A reference landmark was identified for each station whenever possible, and the coordinates of the landmark are included within GHI. Such landmarks provide a definitive reference location which can be reverified by other users and/or repositioned if needed. The distance between CWC's station location and corresponding GHI landmark as well as the direction of the landmark relative to the CWC station are also noted.</p>
      <p id="d1e2214">An example of the abovementioned site verification and landmark identification is presented in Fig. <xref ref-type="fig" rid="Ch1.F5"/> for the Neeleeswaram station on Periyar River in the WFR South Basin. The CWC location is shown as a red circle. The town of Neeleeswaram is about 4 km to the west of the CWC location and is chosen as the landmark (black square) for this station. The CWC locations do not fall on the pixel-based river network from either HydroSHEDS or MERIT due to the approximate nature of these networks. For feasibility of catchment<?pagebreak page4398?> area delineation and other subsequent analyses, the original station location needs to be relocated onto these river networks such that the relocated station is in the middle of a HydroSHEDS (or MERIT) pixel. The relocated station is shown by the blue circle on the river network from HydroSHEDS (Fig. <xref ref-type="fig" rid="Ch1.F5"/>a) and by the green circle on the river network from MERIT (Fig. <xref ref-type="fig" rid="Ch1.F5"/>b).</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5" specific-use="star"><?xmltex \currentcnt{5}?><?xmltex \def\figurename{Figure}?><label>Figure 5</label><caption><p id="d1e2225">Example showing GHI site verification and landmark identification for Neeleeswaram station on Periyar River. <bold>(a)</bold> CWC's station (red circle) is on the right of the graphic. The center of the nearest HydroSHEDS pixel on the GHI river network is the relocated location (blue circle). The landmark is with reference to the CWC location and is on the left (black square). Panel <bold>(b)</bold> is the same as panel <bold>(a)</bold> but for the MERIT network. The text box below panel <bold>(c)</bold> shows metadata on landmarks and relocated locations included within GHI.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f05.png"/>

        </fig>

</sec>
<?pagebreak page4399?><sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Catchment boundary delineation</title>
      <p id="d1e2254">The third geospatial layer in GHI contains station-specific catchment boundaries and river networks. Catchment boundaries were derived using the PF-12 watersheds from HydroSHEDS. Using information on the upstream watersheds associated with each PF-12 watershed, all of the PF-12 watersheds upstream of the relocated station were recursively identified. The polygons corresponding to the most downstream PF-12 watershed and all of the identified upstream watersheds were merged (or “dissolved” in GIS jargon) to create a catchment boundary topographically consistent with HydroSHEDS. At this juncture, the delineated boundary includes the most downstream PF-12 watershed in its entirety. However, the portion downstream of the relocated station is not needed. Using pixel-specific flow direction information from HydroSHEDS, only the portion contributing to flow at the relocated location was extracted and the remainder was discarded. The boundary delineation procedure just described is conceptually similar to standard procedures available within GIS software. The specific procedure used here ensures that the final delineated boundary is consistent with HydroSHEDS' underlying topographic data and is also accurate to the extent feasible. The station-specific river network was identified by extracting (or “clipping” in GIS jargon) the HydroSHEDS river network present within the upstream catchment boundary.</p>
      <p id="d1e2257">A schematic illustrating the abovementioned catchment boundary delineation procedure is presented in Fig. <xref ref-type="fig" rid="Ch1.F6"/> for the Watrak Dam station in the Sabarmati River basin. The original CWC location (red circle), the relocated GHI station (blue circle) and the GHI river network (blue lines) are overlaid on Google Maps information (Fig. <xref ref-type="fig" rid="Ch1.F6"/>a). The gray polygons are the HydroSHEDS PF-12 watersheds upstream of the Watrak Dam station. The portion downstream of the station, not contributing to flow at the station, was discarded, and the final catchment boundary was delineated.</p>
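      <p>The recursive identification of upstream PF-12 watersheds described above can be sketched as a simple graph traversal. The following is a minimal illustration only, not the code used to build GHI: it assumes that each watershed carries the identifier of the watershed immediately downstream of it, and the function name, the dictionary layout and the toy identifiers are all hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def upstream_watersheds(next_down, outlet_id):
    """Return the outlet watershed plus every watershed that drains into it.

    next_down maps a watershed id to the id of the watershed immediately
    downstream (hypothetical layout; 0 marks an outlet to the sea).
    """
    # Invert the downstream links so each watershed lists its direct
    # upstream neighbours.
    upstream_of = defaultdict(list)
    for ws_id, down_id in next_down.items():
        upstream_of[down_id].append(ws_id)

    # Walk upstream with an explicit stack; this is equivalent to a
    # recursive search but is not limited by the recursion depth.
    found = set()
    stack = [outlet_id]
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        stack.extend(upstream_of[current])
    return found


# Toy network: 1 and 2 drain to 3; 3 and 4 drain to 5; 5 and 7 drain to 6.
toy_next_down = {1: 3, 2: 3, 3: 5, 4: 5, 5: 6, 6: 0, 7: 6}
catchment_ids = upstream_watersheds(toy_next_down, 5)
```
]]></preformat>
      <p>In this sketch, the polygons whose identifiers are returned would then be dissolved into a single boundary, after which the part of the most downstream watershed lying below the relocated station would still need to be removed using the flow direction grid.</p>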

      <?xmltex \floatpos{t}?><fig id="Ch1.F6" specific-use="star"><?xmltex \currentcnt{6}?><?xmltex \def\figurename{Figure}?><label>Figure 6</label><caption><p id="d1e2266">Example illustrating GHI's boundary delineation process: <bold>(a)</bold> CWC's Watrak Dam station (red circle) on the Sabarmati River and GHI relocated location (blue circle); <bold>(b)</bold> initial delineated catchment boundary using PF-12 watersheds; and <bold>(c)</bold> final catchment boundary after discarding the portion downstream of the relocated station.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f06.png"/>

        </fig>

      <p id="d1e2285">The catchment area associated with each station was obtained by estimating the area enclosed by the delineated HydroSHEDS-based catchment boundary. An additional estimate of catchment area was obtained using the MERIT 90 m raster layer on upstream catchment area. CWC's stations were relocated onto the MERIT river network. The catchment area corresponding to the station was estimated as the raster value at the 90 m pixel containing the relocated location. The HydroSHEDS and MERIT relocated station locations as well as the estimated upstream catchment areas are included within the metadata available through GHI.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F7" specific-use="star"><?xmltex \currentcnt{7}?><?xmltex \def\figurename{Figure}?><label>Figure 7</label><caption><p id="d1e2290">Overview of GHI's quality control process used in the categorization of gauging stations. Stations in CWC's inventory are placed in groups 1, 2 or 3 based on the outcome (“P” for pass or “F” for fail) of each quality check.</p></caption>
          <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f07.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Quality control</title>
      <p id="d1e2307">The quality control process used to assess the reliability of station-wise metadata and hydrography data involves several steps (as outlined in Fig. <xref ref-type="fig" rid="Ch1.F7"/>) and is described in the following text. There are 10 quality checks (QCs) and 1 data check (DC), and each station is assigned a “P” (for pass) or an “F” (for fail) for each of these 11 checks. A station is placed in Group 3 if it fails any one of the 10 QCs. Otherwise, stations are placed in Group 1 or Group 2 based on the availability of streamflow<?pagebreak page4400?> data (i.e., based on the status of the only DC). Thus, metadata associated with stations in Group 3 should be considered the least reliable. Stations in Group 1 and Group 2 are equally reliable when it comes to metadata, but only Group 1 has reliable daily streamflow data available.</p>
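      <p>The grouping rule described above can be written compactly. The following is a minimal sketch, assuming that the 11 flags are stored as the strings “P” and “F”; the function name and argument layout are hypothetical and are not part of GHI.</p>
      <preformat><![CDATA[
```python
def assign_group(qc_flags, dc_flag):
    """Return the GHI group (1, 2 or 3) from ten QC flags and one DC flag."""
    if len(qc_flags) != 10:
        raise ValueError("expected flags for QC1 to QC10")
    # Failing any single quality check places the station in Group 3.
    if any(flag == "F" for flag in qc_flags):
        return 3
    # All QCs passed: the data check separates Group 1 from Group 2.
    return 1 if dc_flag == "P" else 2


all_pass = ["P"] * 10
one_fail = ["P"] * 9 + ["F"]
groups = [
    assign_group(all_pass, "P"),
    assign_group(all_pass, "F"),
    assign_group(one_fail, "P"),
]
```
]]></preformat>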
      <p id="d1e2312">QC1 is on the availability and reliability of station coordinates – latitude and longitude. Stations with missing or spurious coordinates are placed in Group 3. Latitude and longitude from CWC are presented in “DDMMSS” format. The numerical values of minutes and seconds in such a format should lie between 0 and 60. Coordinates that violate these bounds are considered spurious. QC2 ensures that CWC's description of the station is broadly consistent with its coordinates. If the river basin specified by CWC did not match the composite basin associated with the station's coordinates, then the station was placed in Group 3.</p>
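      <p>A check along the lines of QC1 could be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for GHI: it assumes that each coordinate is a six-character string with two digits each for degrees, minutes and seconds, and the function name is hypothetical.</p>
      <preformat><![CDATA[
```python
def ddmmss_to_decimal(ddmmss):
    """Convert a 'DDMMSS' string to decimal degrees; return None if spurious."""
    text = str(ddmmss).strip()
    if len(text) != 6 or not text.isdigit():
        return None  # missing or malformed coordinate
    degrees = int(text[:2])
    minutes = int(text[2:4])
    seconds = int(text[4:])
    # Bounds follow the text (0 to 60); a stricter check would also
    # reject a value of exactly 60.
    if minutes > 60 or seconds > 60:
        return None
    return degrees + minutes / 60.0 + seconds / 3600.0


valid = ddmmss_to_decimal("101030")     # 10 deg 10 min 30 s
spurious = ddmmss_to_decimal("107530")  # 75 minutes is out of bounds
```
]]></preformat>
      <p>A station for which either coordinate yields no value in such a check would fail QC1.</p>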
      <p id="d1e2315">QC3 ensures that CWC's description of the station is verifiable using either Google Maps or OSM. CWC's description includes the name of the station and the name of the main river (and sometimes a tributary) on which the station is located. Within a general vicinity of 50 km around the station coordinates, a visual search was performed for the namesake river body and a namesake village, town or other landmark (such as a bridge or a highway). The visual search was performed first using Google Maps and then, if needed, OSM. An approximate match was also considered acceptable when searching for names of rivers and places. Sometimes, CWC's stations were nowhere near a waterbody, and sometimes one station's location exactly coincided with the location of another station with a different name. There were also instances in which a station was present at a river confluence and it was not evident if the station was intended to be downstream of the confluence or upstream of the confluence (on one of the river branches). One such example is presented in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F18"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/>. All such ambiguous situations resulted in the station being placed in Group 3.</p>
      <p id="d1e2322">QC4 is related to QC3 but is an independent check on CWC's station names. QC4 ensures that a reference landmark matching the station name can be found, regardless of whether a river body is present in the vicinity or not. If a<?pagebreak page4401?> reliable landmark was not found, the station was placed in Group 3. QC5 ensures that the identified reference landmark is not far from the original CWC station. Some of CWC's stations began operation more than 50 years ago. It is possible that population centers (and their names) have experienced changes during this time and do not always reflect those shown within present-day Google Maps or OSM. A distance of 20 km was selected as a reasonable threshold based on an examination of the typical distances between stations and their landmarks. A majority of the stations meet this criterion and were within 5 km of the corresponding GHI landmark (Fig. <xref ref-type="fig" rid="Ch1.F8"/>a). If a station was more than 20 km away from its landmark, it was placed in Group 3.</p>
      <p id="d1e2328">During the catchment delineation process, stations were relocated onto the pixel-based river networks.
QC6 ensures that the relocated station is in the proximity of the original CWC location. Typically, the relocated stations are only a few pixels away from the original location. However, the relocation distance was much larger on some occasions. Given that some of the larger rivers can have channels spanning multiple HydroSHEDS pixels, a distance of 5 km (approximately 10 HydroSHEDS pixels) was assumed to be a reasonable relocation distance. A majority of the stations meet this criterion and were relocated less than 1 km (Fig. <xref ref-type="fig" rid="Ch1.F8"/>b). If a station was relocated more than 5 km, it was placed in Group 3.</p>
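      <p>The distance thresholds used in QC5 (20 km) and QC6 (5 km) can be illustrated with a great-circle distance calculation. The text does not state how distances were computed, so the haversine formula, the Earth radius, the function names and the coordinates below are all assumptions made for this sketch.</p>
      <preformat><![CDATA[
```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_check(point_a, point_b, max_km):
    """Return 'P' when two (lat, lon) points are within max_km, else 'F'."""
    return "P" if haversine_km(*point_a, *point_b) <= max_km else "F"


# Hypothetical coordinates: a landmark roughly 4 km west of a station.
station = (10.18, 76.50)
landmark = (10.18, 76.4635)
qc5 = distance_check(station, landmark, max_km=20.0)  # landmark threshold
```
]]></preformat>
      <p>The same helper would serve for QC6 by passing the relocated station location and a threshold of 5 km.</p>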

      <?xmltex \floatpos{t}?><fig id="Ch1.F8" specific-use="star"><?xmltex \currentcnt{8}?><?xmltex \def\figurename{Figure}?><label>Figure 8</label><caption><p id="d1e2335">Summary of select quality control metrics for stations in groups 1 and 2: <bold>(a)</bold> QC5 – distance between GHI landmark and the corresponding CWC station; <bold>(b)</bold> QC6 – distance between GHI relocated station and the corresponding CWC station; <bold>(c)</bold> QC8 – discrepancy in estimated catchment area between GHI and CWC; and <bold>(d)</bold> QC9 – discrepancy in estimated catchment area between GHI and MERIT. The numbers within each panel indicate the number of stations associated with each bar of the bar plot.</p></caption>
          <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f08.png"/>

        </fig>

      <p id="d1e2356">QC7, QC8 and QC9 pertain to the availability and reliability of catchment area estimates from CWC.
Catchment area was not available for certain CWC stations, and QC7 ensures that such stations are placed in Group 3. As discussed in Sect. <xref ref-type="sec" rid="Ch1.S1.SS1"/>, CWC's catchment area estimates are not reliable, and they could sometimes have more than a 50 % discrepancy. The discrepancies between GHI's catchment area estimates and those from CWC and MERIT are shown in Fig. <xref ref-type="fig" rid="Ch1.F8"/>. Based on this chart and an examination of other discrepancies, a discrepancy of 80 % between GHI and CWC (GHI relative to CWC) was considered an appropriate threshold. As MERIT-based catchment area estimates are considered as reliable as those from GHI, a smaller discrepancy of 10 % between GHI and MERIT (GHI relative to MERIT) was considered an appropriate threshold. QC8 and QC9 ensure that the discrepancies in the catchment area estimates from GHI stay below these thresholds.</p>
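      <p>The three catchment area checks can be sketched as follows. This is an illustration only: it assumes that the discrepancy is the absolute difference expressed as a percentage of the reference (CWC or MERIT) value and that a discrepancy exactly equal to the threshold passes. Neither detail is specified in the text, and the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
def relative_discrepancy(ghi_area, reference_area):
    """Absolute discrepancy of the GHI area relative to a reference, in %."""
    return 100.0 * abs(ghi_area - reference_area) / reference_area


def area_checks(ghi_area, cwc_area, merit_area,
                cwc_limit=80.0, merit_limit=10.0):
    """Return the flags for QC7, QC8 and QC9 as a dict of 'P'/'F' strings."""
    if cwc_area is None:
        # Without a CWC area, QC7 fails and QC8 cannot be evaluated;
        # this sketch treats QC8 as failed as well.
        qc7, qc8 = "F", "F"
    else:
        qc7 = "P"
        within = relative_discrepancy(ghi_area, cwc_area) <= cwc_limit
        qc8 = "P" if within else "F"
    within = relative_discrepancy(ghi_area, merit_area) <= merit_limit
    qc9 = "P" if within else "F"
    return {"QC7": qc7, "QC8": qc8, "QC9": qc9}


# Hypothetical areas in square kilometres.
flags = area_checks(ghi_area=1200.0, cwc_area=1000.0, merit_area=1000.0)
```
]]></preformat>
      <p>In this hypothetical case the 20 % discrepancy is acceptable relative to CWC but not relative to MERIT, so the station would fail QC9.</p>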
      <p id="d1e2363">QC10 is on the adequacy of the delineated HydroSHEDS-based river network upstream of the gauging station. While<?pagebreak page4402?> the HydroSHEDS river network typically resembles the actual river network based on Google Maps or OSM, it can be grossly inadequate, particularly in the presence of braided river channels, distributaries or other complex flow regimes. The reasoning behind the use of QC10 is to qualitatively assess the adequacy of the delineated network by visually comparing it with actual river flow paths from Google Maps and OSM. A number of stations in the river deltas of the Mahanadi and Cauvery basins and a number of stations on smaller tributaries did not pass this check. One such example is presented in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F18"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/>.</p>
      <p id="d1e2370">The final check is on the adequacy of daily streamflow data. As discussed earlier, for many of CWC's stations in Peninsular India, daily streamflow data are available via WRIS. However, there are often many missing records. It should be noted that a flow value of zero was not considered to be missing data in this study. Missing values from WRIS are blanks in the downloaded data spreadsheets. When available, daily data were aggregated to monthly and annual data. Only months with a maximum of 5 missing days and years with a maximum of 60 missing days were used. This means that data are available for at least 80 % of the days in a typical month and for at least 80 % of the days in a year. Finally, because water resources assessments and other hydrologic studies often require sufficiently long time series, only those stations with a minimum of 5 years of annual streamflow data were selected. These 5 years need not be sequential.</p>
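      <p>The missing-data rules above can be sketched as follows. The text does not state whether flows were averaged or summed during aggregation, so the use of the mean is an assumption of this sketch, as are the function names and the representation of missing days as None.</p>
      <preformat><![CDATA[
```python
def aggregate_period(daily_values, max_missing):
    """Mean of the non-missing values, or None when too many are missing."""
    # Zero flow is valid data; only None counts as missing.
    valid = [v for v in daily_values if v is not None]
    n_missing = len(daily_values) - len(valid)
    if n_missing > max_missing or not valid:
        return None
    return sum(valid) / len(valid)


def has_enough_years(annual_values, min_years=5):
    """True when at least min_years annual values exist (any order)."""
    return sum(v is not None for v in annual_values) >= min_years


# A 30-day month with 5 missing days is kept; a sixth gap would drop it.
month = [0.0] * 10 + [12.0] * 15 + [None] * 5
monthly_mean = aggregate_period(month, max_missing=5)
```
]]></preformat>
      <p>Annual values would be obtained in the same way by passing the daily values of a year with max_missing set to 60.</p>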
      <p id="d1e2374">The compiled annual time series was checked against statistics presented by CWC-YB for select stations to ensure consistency between the two sets of estimates. A<?pagebreak page4403?> self-consistency check was also performed to ensure that there were no spuriously large values. After a visual examination of annual time series of streamflow across all available stations, a value of 2 times the 95th percentile was deemed to be a reasonable threshold. Two stations – Pathagudem in the Godavari Basin and Sadalga in the Krishna Basin – had annual values that exceeded the abovementioned threshold for WY 2017. Both annual and monthly data for WY 2017 for these two stations were discarded.</p>
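      <p>A self-consistency check of this kind could be sketched as follows. The text does not specify whether the 95th percentile was computed per station or how the percentile was interpolated; the sketch assumes a per-station series and linear interpolation between ranked values. The function names and the data are hypothetical.</p>
      <preformat><![CDATA[
```python
def percentile(values, q):
    """Percentile with linear interpolation between ranked values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def flag_spurious(annual, factor=2.0, q=95.0):
    """Return the years whose value exceeds factor times the q-th percentile."""
    threshold = factor * percentile(list(annual.values()), q)
    return sorted(year for year, value in annual.items() if value > threshold)


# Hypothetical 20-year series with one implausibly large final value.
annual_flow = {1998 + i: 100.0 + i for i in range(19)}
annual_flow[2017] = 5000.0
suspect_years = flag_spurious(annual_flow)
```
]]></preformat>
      <p>Because a very large value raises the percentile itself, a check of this form is only informative for reasonably long series; in this toy example, 20 years are sufficient for the final year to be flagged.</p>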

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T7"><?xmltex \currentcnt{7}?><label>Table 7</label><caption><p id="d1e2380">Distribution of gauging stations by GHI group within each composite basin.</p></caption><oasis:table frame="topbot"><?xmltex \begin{scaleboxenv}{.95}[.95]?><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Basin</oasis:entry>
         <oasis:entry colname="col2">Group 1</oasis:entry>
         <oasis:entry colname="col3">Group 2</oasis:entry>
         <oasis:entry colname="col4">Group 3</oasis:entry>
         <oasis:entry colname="col5">Total</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Brahmani–Baitarani</oasis:entry>
         <oasis:entry colname="col2">7</oasis:entry>
         <oasis:entry colname="col3">8</oasis:entry>
         <oasis:entry colname="col4">9</oasis:entry>
         <oasis:entry colname="col5">24</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Cauvery</oasis:entry>
         <oasis:entry colname="col2">20</oasis:entry>
         <oasis:entry colname="col3">9</oasis:entry>
         <oasis:entry colname="col4">25</oasis:entry>
         <oasis:entry colname="col5">54</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">EFR North</oasis:entry>
         <oasis:entry colname="col2">5</oasis:entry>
         <oasis:entry colname="col3">10</oasis:entry>
         <oasis:entry colname="col4">5</oasis:entry>
         <oasis:entry colname="col5">20</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">EFR South</oasis:entry>
         <oasis:entry colname="col2">12</oasis:entry>
         <oasis:entry colname="col3">13</oasis:entry>
         <oasis:entry colname="col4">12</oasis:entry>
         <oasis:entry colname="col5">37</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Godavari</oasis:entry>
         <oasis:entry colname="col2">40</oasis:entry>
         <oasis:entry colname="col3">70</oasis:entry>
         <oasis:entry colname="col4">30</oasis:entry>
         <oasis:entry colname="col5">140</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Krishna</oasis:entry>
         <oasis:entry colname="col2">42</oasis:entry>
         <oasis:entry colname="col3">22</oasis:entry>
         <oasis:entry colname="col4">8</oasis:entry>
         <oasis:entry colname="col5">72</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Mahanadi</oasis:entry>
         <oasis:entry colname="col2">19</oasis:entry>
         <oasis:entry colname="col3">19</oasis:entry>
         <oasis:entry colname="col4">17</oasis:entry>
         <oasis:entry colname="col5">55</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Mahi</oasis:entry>
         <oasis:entry colname="col2">6</oasis:entry>
         <oasis:entry colname="col3">10</oasis:entry>
         <oasis:entry colname="col4">3</oasis:entry>
         <oasis:entry colname="col5">19</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Narmada</oasis:entry>
         <oasis:entry colname="col2">18</oasis:entry>
         <oasis:entry colname="col3">34</oasis:entry>
         <oasis:entry colname="col4">19</oasis:entry>
         <oasis:entry colname="col5">71</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Pennar</oasis:entry>
         <oasis:entry colname="col2">6</oasis:entry>
         <oasis:entry colname="col3">6</oasis:entry>
         <oasis:entry colname="col4">0</oasis:entry>
         <oasis:entry colname="col5">12</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Sabarmati</oasis:entry>
         <oasis:entry colname="col2">2</oasis:entry>
         <oasis:entry colname="col3">9</oasis:entry>
         <oasis:entry colname="col4">2</oasis:entry>
         <oasis:entry colname="col5">13</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Subernarekha</oasis:entry>
         <oasis:entry colname="col2">4</oasis:entry>
         <oasis:entry colname="col3">7</oasis:entry>
         <oasis:entry colname="col4">4</oasis:entry>
         <oasis:entry colname="col5">15</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Tapi</oasis:entry>
         <oasis:entry colname="col2">4</oasis:entry>
         <oasis:entry colname="col3">21</oasis:entry>
         <oasis:entry colname="col4">15</oasis:entry>
         <oasis:entry colname="col5">40</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">WFR North</oasis:entry>
         <oasis:entry colname="col2">4</oasis:entry>
         <oasis:entry colname="col3">13</oasis:entry>
         <oasis:entry colname="col4">5</oasis:entry>
         <oasis:entry colname="col5">22</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">WFR South</oasis:entry>
         <oasis:entry colname="col2">24</oasis:entry>
         <oasis:entry colname="col3">8</oasis:entry>
         <oasis:entry colname="col4">19</oasis:entry>
         <oasis:entry colname="col5">51</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">All basins</oasis:entry>
         <oasis:entry colname="col2">213</oasis:entry>
         <oasis:entry colname="col3">259</oasis:entry>
         <oasis:entry colname="col4">173</oasis:entry>
         <oasis:entry colname="col5">645</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup><?xmltex \end{scaleboxenv}?></oasis:table><?xmltex \gdef\@currentlabel{7}?></table-wrap>

      <?pagebreak page4404?><p id="d1e2713">The number of stations falling within each group is tabulated in Table <xref ref-type="table" rid="Ch1.T7"/>. Out of the 645 stations analyzed here, 213 are in Group 1, 259 are in Group 2 and the remaining 173 are in Group 3. Thus, 472 of the 645 stations (about 73 %) are in Group 1 or Group 2. Except for the Pennar Basin, all river basins have at least one station falling in Group 3. All of the basins have at least a few stations in Group 1. Figure <xref ref-type="fig" rid="Ch1.F9"/> shows the spatial distribution of stations across groups. Figure <xref ref-type="fig" rid="Ch1.F9"/>d shows a cluster of stations in the Mahanadi and Cauvery deltas. These stations are associated with river networks that are too complex for GHI to delineate. Stations in the WFR South Basin correspond to catchment areas that are typically much smaller than those in other basins. This is due to the presence of the Western Ghats (or mountains) along the coast, resulting in many small coastal watersheds along the western coast of India. Several such catchments of the WFR South Basin have catchment area discrepancies larger than the thresholds defined in the quality control process (QC8 and QC9 in Fig. <xref ref-type="fig" rid="Ch1.F7"/>). Such stations typically have catchment areas of a few hundred square kilometers or less. As discussed in Sect. <xref ref-type="sec" rid="Ch1.S6"/>, the 500 m resolution of HydroSHEDS data is probably too coarse to adequately resolve the boundaries and river networks of such catchments.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F9"><?xmltex \currentcnt{9}?><?xmltex \def\figurename{Figure}?><label>Figure 9</label><caption><p id="d1e2728">Stations falling within each GHI group: <bold>(a)</bold> all stations, <bold>(b)</bold> Group 1, <bold>(c)</bold> Group 2 and <bold>(d)</bold> Group 3.</p></caption>
          <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f09.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS5">
  <label>3.5</label><title>Time series data compilation</title>
      <p id="d1e2757">Station-wise monthly and annual time series of select hydrometeorological variables are compiled only for stations falling into Group 1 or Group 2. The variables compiled include observed streamflow; observation-based precipitation from IMD; model-based precipitation, ET and runoff from ERA; and ET from GLEAM.</p>
      <p id="d1e2760">In order to facilitate comparison across these variables, they are expressed in the same units of volume. As various reports from CWC (e.g., CWC-19 and CWC-YB) present summary statistics on streamflow (and other hydrometeorological variables) in units of “<inline-formula><mml:math id="M30" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">MCM</mml:mi></mml:mrow></mml:math></inline-formula>” (or <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M32" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), this unit was chosen to facilitate cross-checking. For the purposes of discussion and graphical display, sometimes the unit of billion cubic meters (“<inline-formula><mml:math id="M33" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">BCM</mml:mi></mml:mrow></mml:math></inline-formula>”, equivalent to <inline-formula><mml:math id="M34" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) was also used. Observed daily streamflow, available in units of cubic meters per second (<inline-formula><mml:math id="M35" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">s</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), was aggregated to cumulative monthly and annual volumes (MCM per month and MCM per year, respectively).
Gridded data on precipitation, modeled ET and runoff, available in units of depth per month (e.g., millimeters per month), were also aggregated to cumulative monthly and annual volumes. The process of aggregating grid-based products to a catchment scale involves identifying the spatial overlap between the grids and the catchment. Such relationships were identified using a GIS analysis. Grid-specific fractional areas were used to aggregate the gridded data to each catchment. A schematic illustrating the process of aggregation is given in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F19"/>.</p>
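      <p>The unit conversions described above can be sketched as follows. This is a minimal illustration rather than the code used to build GHI: the function names, the list-based inputs and the assumption that each grid cell comes with its area and the fraction of that area lying inside the catchment are hypothetical choices made for the example, and the daily series is assumed to be complete.</p>
      <preformat preformat-type="code"><![CDATA[
```python
SECONDS_PER_DAY = 86_400
M3_PER_MCM = 1_000_000  # 1 MCM = 10^6 m^3


def daily_flow_to_mcm(daily_flow_m3s):
    """Cumulative volume (MCM) from a complete series of daily mean flows (m^3/s)."""
    total_m3 = sum(q * SECONDS_PER_DAY for q in daily_flow_m3s)
    return total_m3 / M3_PER_MCM


def gridded_depth_to_mcm(depth_mm, cell_area_km2, fraction_in_catchment):
    """Catchment volume (MCM) from per-cell depths (mm), weighted by overlap area.

    All three inputs are hypothetical parallel lists, one entry per grid cell
    that intersects the catchment.
    """
    volume_m3 = 0.0
    for depth, area, frac in zip(depth_mm, cell_area_km2, fraction_in_catchment):
        # mm -> m (factor 1e-3) and km^2 -> m^2 (factor 1e6)
        volume_m3 += (depth * 1e-3) * (area * 1e6) * frac
    return volume_m3 / M3_PER_MCM
```
]]></preformat>
      <p>For instance, a steady flow of 10 cubic meters per second over a 30 d month amounts to 25.92 MCM, and a depth of 1 mm over 1 square kilometer corresponds to 0.001 MCM.</p>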
      <p id="d1e2835">Instead of the calendar year (January–December), a water year (WY) is often used in hydrological analyses. Consistent with CWC-19 and other CWC publications, the water year in this study is defined as the period starting on 1 June and ending on 31 May of the following calendar year. For example, WY 2020 spans the period from 1 June 2020 through 31 May 2021.</p>
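      <p>The water year definition given above translates directly into a date rule. The sketch below is illustrative only; the function names are hypothetical and are not taken from the GHI files.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from datetime import date


def water_year(d):
    """Return the water year (1 June to 31 May) that contains date d."""
    # June to December belong to the water year named after the same calendar
    # year; January to May belong to the water year that began the previous June.
    return d.year if d.month >= 6 else d.year - 1


def water_year_bounds(wy):
    """Return the first and last calendar dates of water year wy."""
    return date(wy, 6, 1), date(wy + 1, 5, 31)
```
]]></preformat>
      <p>With this rule, both 1 June 2020 and 31 May 2021 map to WY 2020, whereas 31 May 2020 maps to WY 2019.</p>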
      <p id="d1e2838">The time span of GHI's time series of data is WY 1950 to 2020 (71 water years), whenever data are available. Precipitation data from IMD and all data from ERA span the 71 WYs, whereas GLEAM data span WY 1980 to 2020 (41 WYs). The availability of streamflow data from WRIS is station dependent, and no station spans the entirety of the above 71 WYs. The compiled streamflow observations for stations in Group 1 span WY 1965 to 2017, with individual station record lengths varying from 6 WYs to 53 WYs (median length of 34 WYs). There are 175 stations within GHI that have at least 20 WYs of streamflow data.</p>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Final product</title>
      <p id="d1e2850">GHI is publicly available <xref ref-type="bibr" rid="bib1.bibx10" id="paren.35"/> and includes metadata on stations, GIS data on catchment boundaries and river networks, station-wise time series of data on hydrometeorological variables, and station-wise summary graphics. Additional information on files included within GHI is provided in Sect. <xref ref-type="sec" rid="Ch1.S7"/>.</p>
      <p id="d1e2858">While users of this dataset can filter down to the specific station or river basin of interest, it is useful to have readily available summary graphics that provide an overview of catchment-scale water balance metrics. Such summary graphics not only provide a visual check on the data created for each station but could also be useful tools for water managers and other stakeholders. Two sets of summary graphics are included within GHI: a station-wise annual time series chart and a monthly time series chart. An example for the Mancherial station in the Godavari Basin is shown in Figs. <xref ref-type="fig" rid="Ch1.F10"/> and <xref ref-type="fig" rid="Ch1.F11"/>. Figure <xref ref-type="fig" rid="Ch1.F10"/> includes a map of the station location and also a map of the upstream catchment area and the river network. The charts show the time series of catchment-averaged hydrometeorological data available through GHI.</p>
      <p id="d1e2867">From Fig. <xref ref-type="fig" rid="Ch1.F10"/>, it is evident that annual precipitation from IMD and ERA are generally consistent with each other. Estimated ET values from ERA and GLEAM are also temporally<?pagebreak page4405?> consistent with each other but consistently differ in magnitude for this station. While the specific causes of discrepancies between these two datasets are unknown, the reader is referred to <xref ref-type="bibr" rid="bib1.bibx21" id="text.36"/> and the references therein for further discussion. Annual runoff from ERA tends to be lower than observed runoff prior to 1990 but higher than observed runoff after 1990. While the exact cause of this is unknown, it is speculated that flow regulation by dams and reservoirs could be one of the reasons. ERA does not account for flow regulation by dams and reservoirs and is not expected to match the observed flow. Overall, ERA reasonably captures annual-scale observed precipitation (from IMD) and observed runoff (streamflow from WRIS).</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F10" specific-use="star"><?xmltex \currentcnt{10}?><?xmltex \def\figurename{Figure}?><label>Figure 10</label><caption><p id="d1e2878">One-page summary of final GHI output for Mancherial station in the Godavari Basin. Panel <bold>(a)</bold> shows the location of the Mancherial catchment within the Godavari Basin. Panel <bold>(b)</bold> displays the Mancherial catchment and river network. Panels <bold>(c)</bold>, <bold>(d)</bold> and <bold>(e)</bold> present respective time series of precipitation, evapotranspiration and runoff by water year (WY).</p></caption>
        <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f10.png"/>

      </fig>

      <p id="d1e2902">The monthly time series of hydrometeorological variables for the Mancherial station is shown for the most recent 10-year period in Fig. <xref ref-type="fig" rid="Ch1.F11"/>. The monthly time series for each year in the chart begin in June of the current year and end in May of the following calendar year, consistent with the definition of water year used throughout this study. Similar to Fig. <xref ref-type="fig" rid="Ch1.F10"/>, precipitation from ERA and IMD are highly consistent with each other. ET from ERA is higher than that from GLEAM, particularly at the beginning of the water year (summer and fall months). As mentioned earlier, ERA does not account for flow regulation, and this could be one of the reasons why ERA's runoff is higher than observed flow. The monthly time series of precipitation, ET and runoff reflect the seasonality imposed by the southwest monsoons – wet season from June to September, followed by a dry season.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F11" specific-use="star"><?xmltex \currentcnt{11}?><?xmltex \def\figurename{Figure}?><label>Figure 11</label><caption><p id="d1e2911">Same as Fig. <xref ref-type="fig" rid="Ch1.F10"/> but showing monthly time series of compiled hydrometeorological data for WY 2011–2020.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f11.png"/>

      </fig>

      <p id="d1e2922">Similar graphics are available for the rest of the stations in Group 1 and Group 2. While users can visually examine graphics on individual stations, it is not feasible to assess the overall adequacy of the compiled hydrometeorological data through visual examination alone. A preliminary analysis using the compiled hydrometeorological data is presented in Sect. <xref ref-type="sec" rid="Ch1.S5"/>. The aim of this analysis was to check for the presence (or absence) of patterns expected based on the hydrology and climate of the study domain. Such an analysis would also reveal any spurious patterns within the compiled time series of data and help understand the consistency between different datasets.</p>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Preliminary analysis using GHI</title>
      <p id="d1e2936">The analysis presented here focuses only on catchment-averaged annual-scale metrics for the sake of simplicity. Metrics examined include the correlation between precipitation from ERA and IMD, the correlation between ET from ERA and GLEAM, and the ratio of observed runoff to precipitation. Only stations in Groups 1 and 2 (472 stations) were considered when analyzing precipitation and ET, while only stations in Group 1 (213 stations) were considered when analyzing observed runoff.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F12" specific-use="star"><?xmltex \currentcnt{12}?><?xmltex \def\figurename{Figure}?><label>Figure 12</label><caption><p id="d1e2941"><bold>(a)</bold> Correlation between annual catchment-averaged precipitation from IMD and ERA and <bold>(b)</bold> correlation between annual catchment-averaged ET from GLEAM and ERA.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f12.png"/>

      </fig>

      <p id="d1e2955">The linear correlation coefficient (Pearson correlation using pairwise complete data) between IMD's annual precipitation and ERA's annual precipitation is shown in Fig. <xref ref-type="fig" rid="Ch1.F12"/>. For a majority of the stations (331 out of 472), the correlation is between 0.50 and 0.75. The correlation is greater than 0.75 for a small number of stations (31 out of 472), whereas it is lower in the southwestern portion of the study domain where hilly terrain is typical – this includes southern Krishna Basin, western Cauvery Basin, parts of WFR North Basin, and almost the entirety of WFR South Basin. The general consistency between IMD and ERA precipitation was also reported by <xref ref-type="bibr" rid="bib1.bibx18" id="text.37"/> and <xref ref-type="bibr" rid="bib1.bibx9" id="text.38"/>.</p>
      <p id="d1e2967">The correlation between annual ET from GLEAM and ERA is also shown in Fig. <xref ref-type="fig" rid="Ch1.F12"/>. For a majority of the stations (250 out of 472), the correlation is greater than 0.75. As with precipitation, the correlation is lower in the southwestern portion of the study domain than elsewhere. The consistently high correlation between ET estimates from GLEAM and ERA was also noted by <xref ref-type="bibr" rid="bib1.bibx9" id="text.39"/>.</p>
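      <p>The pairwise-complete Pearson correlation used in the precipitation and ET comparisons can be sketched as below. This is an illustrative implementation, not the author's code; it assumes that the two annual series are aligned by water year and that missing years are marked by None or NaN, and the function name is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def pearson_pairwise(x, y):
    """Pearson r using only the positions where both series have a value."""
    pairs = [
        (a, b)
        for a, b in zip(x, y)
        if a is not None and b is not None
        and not math.isnan(a) and not math.isnan(b)
    ]
    n = len(pairs)
    if n < 2:
        return float("nan")  # too few complete pairs to define a correlation
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)
```
]]></preformat>
      <p>Dropping a year only when one of the two values is missing keeps as many years as possible for each station, which matters because record lengths differ between data sources.</p>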

      <?xmltex \floatpos{t}?><fig id="Ch1.F13" specific-use="star"><?xmltex \currentcnt{13}?><?xmltex \def\figurename{Figure}?><label>Figure 13</label><caption><p id="d1e2977"><bold>(a)</bold> Median ratio of observed annual runoff and IMD annual catchment-averaged precipitation and <bold>(b)</bold> median ratio of observed annual runoff and ERA annual catchment-averaged precipitation.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f13.png"/>

      </fig>

      <p id="d1e2991">Figure <xref ref-type="fig" rid="Ch1.F13"/> shows the median ratio of observed runoff to precipitation (<inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula>), with precipitation from IMD and ERA. For each station, <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> was computed for every year with available data, and the median of these annual values is plotted in Fig. <xref ref-type="fig" rid="Ch1.F13"/>. In general, for a majority of the stations (125 out of 213 for IMD and 136 out of 213 for ERA), median <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> is less than 0.33. This is expected given the semiarid climate of a large part of the study domain. In such climates, a large portion of the annual precipitation would go towards satisfying the evaporative demand, resulting in low <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> ratios. For the southwestern portion of the study domain, where hilly terrain and wet climate are prevalent, <inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> ratios are expected to be higher. This was the case for several stations. However, median <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> is greater than 1.0 for certain stations, regardless of whether the precipitation was from IMD or ERA. Such stations are present mostly in the hilly regions along the western coast of India.
An <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> value greater than 1.0 implies that total annual runoff from the catchment exceeds the total annual precipitation. Such a finding may seem “absurd” at first glance and warrants further discussion.</p>
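      <p>The station-wise median runoff ratio discussed above can be sketched as follows, assuming that annual runoff and precipitation volumes are supplied in the same units (e.g., MCM per year) and aligned by water year, with None marking a missing year. The function name and input layout are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import median


def median_runoff_ratio(runoff, precip):
    """Median of annual R/P over the years where both values exist and P > 0."""
    ratios = [
        r / p
        for r, p in zip(runoff, precip)
        if r is not None and p is not None and p > 0
    ]
    # None signals that no year had both runoff and precipitation available.
    return median(ratios) if ratios else None
```
]]></preformat>
      <p>A returned value above 1.0 flags a station where annual runoff typically exceeds annual precipitation and therefore merits a closer look.</p>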
      <p id="d1e3083">From basic hydrologic balance, in the absence of any human intervention, the annual volume of runoff from a river basin is expected to be less than the annual volume of precipitation. Hence, values of <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> greater than 1.0 suggest either substantial human intervention to the water cycle (e.g., carryover reservoir storage) or errors associated with data. Such errors could be with catchment area delineation, erroneous data compilation, or erroneous underlying runoff or precipitation data. Catchment area discrepancy between GHI and CWC-21 for stations where <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> is greater than 1.0 is within 5 % for most of the stations. Moreover, precipitation and observed streamflow data compiled from this study are consistent with independent compilations from other studies (CWC-YB and CWC-19). Hence, data compilation errors can be ruled out.</p>
      <p id="d1e3110">It is speculated that the “absurdity” of <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> being greater than 1.0 is either due to carryover reservoir flow or due to<?pagebreak page4406?> errors with the underlying streamflow or precipitation data. It is not known whether there are any gross measurement inaccuracies associated with streamflow data. Other studies have indicated the complexities of capturing precipitation in this region where orographic effects play a major role in creating intense precipitation events <xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx27" id="paren.40"><named-content content-type="pre">e.g.,</named-content></xref>. The estimated catchment areas of these stations with spurious <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> ratios are typically less than 1000 <inline-formula><mml:math id="M47" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>. As IMD's individual grids are about 25 km <inline-formula><mml:math id="M48" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 25 km or 625 <inline-formula><mml:math id="M49" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> in size, IMD's data may not be suited to capture the necessary spatial variability in precipitation. 
However, ERA's precipitation suffers from a similar issue despite being at a higher resolution (grid size of about 100 <inline-formula><mml:math id="M50" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>). It should also be noted that this is not the first study to encounter spurious <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>/</mml:mo><mml:mi>P</mml:mi></mml:mrow></mml:math></inline-formula> values in this region. CWC-19 tabulated annual runoff values greater than annual precipitation for some of the catchments of the WFR South Basin (<xref ref-type="bibr" rid="bib1.bibx2" id="altparen.41"/>, see their Appendix R, Tables R-1 to R-10) but did not delve into the underlying causes. A further discussion of this topic is beyond the scope of this paper and will be addressed in the future.</p>
</sec>
<sec id="Ch1.S6">
  <label>6</label><title>Limitations and potential improvements</title>
      <p id="d1e3206">GHI's adoption of 15 arcsec (500 m) HydroSHEDS data as the underlying template imposes a limit on the spatial<?pagebreak page4407?> accuracy of delineated catchment boundaries and river networks. For larger river basins, this resolution is probably adequate for most practical purposes. However, for smaller river basins, such as those less than 100 <inline-formula><mml:math id="M52" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (equivalent to about 400 HydroSHEDS pixels), higher-resolution topographic data might be more appropriate. A new version of HydroSHEDS (v2, <uri>https://www.hydrosheds.org/</uri>, last access: 1 February 2023) based on 12 m topographic data is scheduled to be released in 2023. It is expected that this new version would improve the spatial accuracy of the entire suite of HydroSHEDS products. GHI's catchment boundaries and river networks could be updated with this latest dataset. While significant effort was made to accurately identify the catchment boundaries corresponding to each station, there were instances where the delineated boundary included a few 500 m pixels not contributing to flow at the station. The effect of such pixels on delineated catchment boundaries and area estimates is minimal, but they need to be discarded to make the boundaries more accurate. In future revisions, this issue will be addressed.</p>
      <p id="d1e3223">Some of the quality checks used in the development of GHI use subjective thresholds and were devised to separate the more reliable data from the less reliable. One could end up with a different number of stations within each group if<?pagebreak page4408?> those subjective thresholds were changed. Out of the 645 stations analyzed here, 173 stations were placed in Group 3 because sufficient reliable metadata could not be compiled for many of these stations. It is hoped that CWC will corroborate metadata on these stations so that they can eventually be moved into Group 1 or Group 2. An obvious next step is to extend the domain of GHI from Peninsular India to the whole of India. However, streamflow data for the rest of India are not publicly available. Such data are critical for hydrologic assessments, climate change studies and other modeling analyses. It is also hoped that this study will encourage CWC and other custodians of streamflow data to make such data publicly available. Extending GHI to the whole of India also raises the problem of drainage boundaries crossing international boundaries, together with the prevailing uncertainty about these drainage boundaries. An analysis on the Ganga's basin boundaries delineated using HydroSHEDS data and the discrepancies between such boundaries and those available from CWC and WRIS is currently being pursued by the author.</p>
      <p id="d1e3226">Missing daily streamflow values were filled in using the average of available daily data. Such a procedure will not adequately capture the rising and falling limbs of the daily hydrograph. Future revisions could use established methods such as the time series interpolation of <xref ref-type="bibr" rid="bib1.bibx6" id="text.42"/>, which is readily available within many data analysis software packages. Currently, hydrometeorological data within GHI include precipitation, modeled ET and runoff, and observed streamflow. Additional variables, such as catchment-averaged soil moisture, could be useful in water balance studies. Modeled runoff was aggregated to the catchment without accounting for travel time from individual grids to the catchment outlet, assuming that the resulting monthly and annual hydrographs are unaffected by channel routing. Such an assumption may not be appropriate for the downstream locations of large river basins. Agriculture is the biggest land use in India <xref ref-type="bibr" rid="bib1.bibx22" id="paren.43"/> and is responsible for about 70 % of water abstractions (CWC-19). Inclusion of annual maps of cropland and irrigated area would be useful in the estimation of consumptive water use within water resources assessments, such as those by CWC-19 and <xref ref-type="bibr" rid="bib1.bibx9" id="text.44"/>.</p>
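      <p>The limitation of mean-based gap filling noted above can be illustrated with a small sketch that contrasts it with plain linear interpolation. Both functions are hypothetical illustrations: the first assumes that the mean is taken over whatever values are passed in (e.g., one month of daily flows), and the second is a generic linear interpolation, not the specific method cited above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def fill_with_mean(values):
    """Replace None entries with the mean of the available values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # nothing to average; leave the gaps as they are
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def fill_linear(values):
    """Linearly interpolate interior runs of None; edge gaps stay None."""
    filled = list(values)
    known_idx = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of observed days and fill the days between.
    for left, right in zip(known_idx, known_idx[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled
```
]]></preformat>
      <p>For a rising limb such as 10, missing, missing, 40, the mean-based fill inserts 25 on both missing days, whereas linear interpolation yields 20 and 30 and so preserves the rise.</p>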
      <p id="d1e3238">While event-scale precipitation over the western coast of India has been analyzed before, such as the extreme precipitation resulting in the floods of 2018 <xref ref-type="bibr" rid="bib1.bibx12" id="paren.45"><named-content content-type="pre">e.g.,</named-content></xref>, this is probably the first study to report potential annual-scale underrepresentation of precipitation in this region. This issue needs to be investigated further. High-resolution datasets on precipitation, such as those from the Indian Monsoon Data Assimilation and Analysis (IMDAA) <xref ref-type="bibr" rid="bib1.bibx25" id="paren.46"/> and Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) <xref ref-type="bibr" rid="bib1.bibx7" id="paren.47"/>, could be helpful in addressing such issues. Human management of water could have a substantial effect on streamflow and the hydrologic cycle. The inclusion of data on dams and reservoirs (both metadata and live storage) would be helpful in quantifying the effects of flow regulation. WRIS-OL provides live storage information for select reservoirs, and these data could be included within GHI in future revisions.</p>
</sec>
<sec id="Ch1.S7">
  <label>7</label><title>Data availability</title>
      <p id="d1e3260">GHI is publicly available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.7563599" ext-link-type="DOI">10.5281/zenodo.7563599</ext-link> <xref ref-type="bibr" rid="bib1.bibx10" id="paren.48"/> and includes the following files: (1) a “Readme.txt” file that outlines the available files, the format of these files and the data fields within these files; (2) plain-text files on station metadata (station locations from CWC, relocated locations, landmarks, catchment areas and other attributes)<?pagebreak page4409?> and hydrometeorological data time series (monthly and annual files); (3) shapefiles (GIS) on composite river basin boundaries, catchment boundaries and catchment-specific river networks for stations present in Group 1 and Group 2; and (4) PDF files showing station-wise summary maps and time series (monthly and annual files).</p>
</sec>
<sec id="Ch1.S8" sec-type="conclusions">
  <label>8</label><title>Conclusions</title>
      <p id="d1e3277">GIS data on river gauging stations, their upstream catchment area boundaries and river networks comprise a fundamental building block of hydrologic analyses. Information on India’s river basins is insufficient and is limited by drawbacks such as ambiguous station locations and inconsistent or erroneous catchment area estimates, among others. The goal of this study is to highlight the limitations of existing data and to build a publicly available hydrographic dataset using state-of-the-art global resources. The dataset developed by this study, GHI, categorizes available information from India's water agencies based on its consistency with global data sources. Existing metadata are supplemented with additional information where needed. The quality control aspect of GHI includes the following: verifying the station description using online maps, checking for visual consistency of delineated river network with online maps, comparing GHI-estimated catchment areas with those from CWC and MERIT, and checking streamflow data for missing records and extreme outliers.</p>
      <p id="d1e3280">The current version of GHI is limited to 645 stations in 15 river basins of Peninsular India. Out of these 645 stations, 472 were deemed reliable for subsequent analyses and 173 were not. While the geospatial information within GHI includes GIS data on gauge locations, catchment boundaries and river networks, the time series information within GHI includes precipitation, ET and runoff at monthly and annual timescales for each of the above 472 stations. A preliminary analysis using GHI's time series of data suggests that, while the compiled data are reasonable over most of the study domain, spurious runoff–precipitation ratios were observed in the hilly coastal regions of Western India. This issue needs to be investigated further. Building a robust hydrographic and hydrometeorological dataset is beyond the scope of one individual. Until such datasets become available, GHI is intended to serve as a building block and a reliable reference for hydrologic analyses on India's river basins.</p><?xmltex \hack{\clearpage}?>
</sec>

      
      </body>
    <back><app-group>

<?pagebreak page4410?><app id="App1.Ch1.S1">
  <?xmltex \currentcnt{A}?><label>Appendix A</label><title>Additional graphics on GHI's data sources</title>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F14"><?xmltex \currentcnt{A1}?><?xmltex \def\figurename{Figure}?><label>Figure A1</label><caption><p id="d1e3297">Snippet showing the metadata information available from CWC and WRIS: <bold>(a)</bold> data from CWC-21 and <bold>(b)</bold> raw downloaded streamflow data from WRIS-OL for the year 2007. Daily data from WRIS-OL are sometimes available as the gauge level and/or discharge. Note the discrepancy in latitude and longitude for the Saradaput station; also note the spurious latitude and longitude for the Sitalpur station (which corresponds to a location in the Southern Hemisphere).</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f14.png"/>

      </fig>

<?xmltex \hack{\clearpage}?><?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F15"><?xmltex \currentcnt{A2}?><?xmltex \def\figurename{Figure}?><label>Figure A2</label><caption><p id="d1e3317">HydroSHEDS watersheds (gray lines) across the Pennar River basin (black line) at different Pfafstetter (PF) levels: PF-6, PF-8, PF-10 and PF-12.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f15.png"/>

      </fig>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F16"><?xmltex \currentcnt{A3}?><?xmltex \def\figurename{Figure}?><label>Figure A3</label><caption><p id="d1e3331">Example showing the typical information available from <bold>(a)</bold> Google Maps and <bold>(b)</bold> OSM for a selected region in the Mahanadi River basin. Google Maps typically has more town and city names, whereas OSM has more natural features such as rivers and waterbodies.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f16.png"/>

      </fig>

<?xmltex \hack{\clearpage}?><?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F17"><?xmltex \currentcnt{A4}?><?xmltex \def\figurename{Figure}?><label>Figure A4</label><caption><p id="d1e3351">Total annual precipitation (<inline-formula><mml:math id="M53" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">yr</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) for WY 2020 (June 2020 through May 2021) from <bold>(a)</bold> IMD and <bold>(b)</bold> ERA. Total ET (<inline-formula><mml:math id="M54" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">yr</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) for WY 2020 from <bold>(c)</bold> ERA and <bold>(d)</bold> GLEAM. “NA” in the legend indicates missing or unavailable data. IMD and GLEAM data are shown on the native 0.25<inline-formula><mml:math id="M55" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> grid, whereas ERA data are shown on the native 0.10<inline-formula><mml:math id="M56" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> grid.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f17.png"/>

      </fig>

<?xmltex \hack{\clearpage}?><?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F18"><?xmltex \currentcnt{A5}?><?xmltex \def\figurename{Figure}?><label>Figure A5</label><caption><p id="d1e3430"><bold>(a1)</bold> Kishan Nagar station (red circle) is on one of the distributaries within the delta of the Mahanadi River (background from Google Maps). <bold>(a2)</bold> The HydroSHEDS river network (blue line) cannot capture such complex features. <bold>(b1)</bold> Jannapura station in the Cauvery Basin (red circle, background from OSM) could not be decisively relocated onto the HydroSHEDS network (blue line in <bold>b2</bold>) because it was unclear whether the station was intended to be upstream or downstream of the confluence.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=284.527559pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f18.png"/>

      </fig>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F19"><?xmltex \currentcnt{A6}?><?xmltex \def\figurename{Figure}?><label>Figure A6</label><caption><p id="d1e3455">Example illustrating the area-weighted averaging procedure used within GHI to obtain basin-aggregated hydrometeorological variables. Gridded products typically have many grid cells (red lines) spanning a basin (black line): some cells lie entirely within the basin (100 % fractional extent), whereas the rest lie only partially within it (<inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula> % fractional extent). The fractional extent of each cell is accounted for when estimating basin-scale average values of hydrometeorological variables.</p></caption>
        <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://essd.copernicus.org/articles/15/4389/2023/essd-15-4389-2023-f19.png"/>

      </fig>
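
      <p>The area-weighted averaging illustrated in Fig. A6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the GHI implementation: the function name <monospace>basin_average</monospace>, its arguments and the sample values are hypothetical, grid cells are assumed to have equal area unless areas are supplied, and cells with missing data are assumed to be skipped rather than treated as zero.</p>
<preformat>
```python
from typing import Optional, Sequence


def basin_average(
    values: Sequence[Optional[float]],
    fractions: Sequence[float],
    cell_areas: Optional[Sequence[float]] = None,
) -> float:
    """Area-weighted mean of gridded values over a basin (illustrative only).

    values     -- one value per grid cell (None marks missing data)
    fractions  -- fraction of each cell lying inside the basin, 0.0 to 1.0
    cell_areas -- optional cell areas; equal areas are assumed if omitted
    """
    if cell_areas is None:
        cell_areas = [1.0] * len(values)
    if not (len(values) == len(fractions) == len(cell_areas)):
        raise ValueError("inputs must have the same length")

    weighted_sum = 0.0
    weight_total = 0.0
    for value, fraction, area in zip(values, fractions, cell_areas):
        if value is None:
            continue  # assumption: missing cells are skipped, not zero-filled
        weight = fraction * area  # area of the cell that lies inside the basin
        weighted_sum += weight * value
        weight_total += weight

    if weight_total == 0.0:
        raise ValueError("no valid grid cells overlap the basin")
    return weighted_sum / weight_total


# Three hypothetical cells: one fully inside the basin, two partially inside.
precip_mm = [1000.0, 800.0, 600.0]
inside_fraction = [1.0, 0.5, 0.25]
basin_mean = basin_average(precip_mm, inside_fraction)
```
</preformat>
      <p>In this sketch the weight of each cell is its fractional extent multiplied by its area, so a cell that is only one-quarter inside the basin contributes one-quarter as much as a fully contained cell of the same size. On a latitude–longitude grid, cell area varies with latitude, which is why the optional <monospace>cell_areas</monospace> argument is provided.</p>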

<?xmltex \hack{\vspace*{12cm}}?>
</app>
  </app-group><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e3480">The author has declared that there are no competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d1e3486">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
  </notes><ack><title>Acknowledgements</title><p id="d1e3493">A number of publicly available datasets were used in this study and are cited wherever applicable. Software used in this study includes the R statistical computing and graphics software for data analysis (<uri>https://www.r-project.org/</uri>, last access: 1 June 2021) and QGIS for GIS analysis (<uri>https://qgis.org/en/site/</uri>, last access: 1 January 2022). Political boundaries for India were obtained from the Survey of India (<uri>https://surveyofindia.gov.in/pages/outline-maps-of-india</uri>, last access: 1 June 2021) and used for illustration only.</p></ack><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e3507">This paper was edited by Yuanzhi Yao and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><?xmltex \def\ref@label{{Chatterjee and Sinha(2014)}}?><label>Chatterjee and Sinha(2014)</label><?label chatterjee2014water?><mixed-citation>
Chatterjee, R. and Sinha, S.: Water Resources Database–Development and
Management, Proc. Indian Natn. Sci. Acad., 80, 713–730, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx2"><?xmltex \def\ref@label{{CWC-19(2019)}}?><label>CWC-19(2019)</label><?label cwc19?><mixed-citation>CWC-19: Reassessment of Water Availability in India using Space Inputs,
Central Water Commission, Basin Planning and Management Organisation,
<uri>http://www.cwc.gov.in/water-resource-estimation</uri> (last access: 1 June 2021), 2019.</mixed-citation></ref>
      <ref id="bib1.bibx3"><?xmltex \def\ref@label{{CWC-21(2021)}}?><label>CWC-21(2021)</label><?label cwc21?><mixed-citation>CWC-21: Hydrological Observation Stations in India under Central Water
Commission, September 2021,
<ext-link xlink:href="http://cwc.gov.in/hydrological-observation-stations-india-under-central-water-commission-september-2021">http://cwc.gov.in/hydrological-observation-stations-india-under-central-water-commission-september-2021</ext-link> (last access: 1 September 2022), 2021.</mixed-citation></ref>
      <ref id="bib1.bibx4"><?xmltex \def\ref@label{{CWC-YB(2021)}}?><label>CWC-YB(2021)</label><?label cwc_yb?><mixed-citation>CWC-YB: Hydrological Year Books, <uri>http://www.cwc.gov.in/publications</uri> (last access: 1 December 2022),
2021.</mixed-citation></ref>
      <ref id="bib1.bibx5"><?xmltex \def\ref@label{{Do et~al.(2018)}}?><label>Do et al.(2018)</label><?label do2018global?><mixed-citation>Do, H. X., Gudmundsson, L., Leonard, M., and Westra, S.: The Global Streamflow Indices and Metadata Archive (GSIM) – Part 1: The production of a daily streamflow archive and metadata, Earth Syst. Sci. Data, 10, 765–785, <ext-link xlink:href="https://doi.org/10.5194/essd-10-765-2018" ext-link-type="DOI">10.5194/essd-10-765-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx6"><?xmltex \def\ref@label{{Fritsch and Carlson(1980)}}?><label>Fritsch and Carlson(1980)</label><?label fritsch1980monotone?><mixed-citation>Fritsch, F. N. and Carlson, R. E.: Monotone piecewise cubic interpolation, SIAM
J. Numer. Anal., 17, 238–246,
<ext-link xlink:href="https://doi.org/10.1137/0717021" ext-link-type="DOI">10.1137/0717021</ext-link>, 1980.</mixed-citation></ref>
      <ref id="bib1.bibx7"><?xmltex \def\ref@label{{Funk et~al.(2014)}}?><label>Funk et al.(2014)</label><?label funk2014quasi?><mixed-citation>Funk, C. C., Peterson, P. J., Landsfeld, M. F., Pedreros, D. H., Verdin, J. P.,
Rowland, J. D., Romero, B. E., Husak, G. J., Michaelsen, J. C., and Verdin,
A. P.: A quasi-global precipitation time series for drought
monitoring, US Geological Survey data series, 832, 1–12,
<ext-link xlink:href="https://doi.org/10.3133/ds832" ext-link-type="DOI">10.3133/ds832</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx8"><?xmltex \def\ref@label{{Ganguli et~al.(2022)}}?><label>Ganguli et al.(2022)</label><?label ganguli2022climate?><mixed-citation>Ganguli, P., Singh, B., Reddy, N. N., Raut, A., Mishra, D., and Das, B. S.:
Climate-catchment-soil control on hydrological droughts in peninsular India,
Sci. Rep., 12, 1–14,
<ext-link xlink:href="https://doi.org/10.1038/s41598-022-11293-7" ext-link-type="DOI">10.1038/s41598-022-11293-7</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx9"><?xmltex \def\ref@label{{Goteti(2022)}}?><label>Goteti(2022)</label><?label goteti2022estimation?><mixed-citation>
Goteti, G.: Estimation of water resources availability (WRA) using gridded
evapotranspiration data: A simpler alternative to Central Water
Commission’s WRA assessment, J. Earth Syst. Sci., 131, 1–24, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx10"><?xmltex \def\ref@label{{Goteti(2023)}}?><label>Goteti(2023)</label><?label gotetighi?><mixed-citation>Goteti, G.: Geospatial dataset for Hydrologic analyses in India (GHI): A
quality controlled dataset on river gauges, catchment boundaries and
hydrometeorological time series, Zenodo [data set],
<ext-link xlink:href="https://doi.org/10.5281/zenodo.7563599" ext-link-type="DOI">10.5281/zenodo.7563599</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx11"><?xmltex \def\ref@label{{Gudmundsson et~al.(2018)}}?><label>Gudmundsson et al.(2018)</label><?label gudmundsson2018global?><mixed-citation>Gudmundsson, L., Do, H. X., Leonard, M., and Westra, S.: The Global Streamflow Indices and Metadata Archive (GSIM) – Part 2: Quality control, time-series indices and homogeneity assessment, Earth Syst. Sci. Data, 10, 787–804, <ext-link xlink:href="https://doi.org/10.5194/essd-10-787-2018" ext-link-type="DOI">10.5194/essd-10-787-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx12"><?xmltex \def\ref@label{{Hunt and Menon(2020)}}?><label>Hunt and Menon(2020)</label><?label hunt20202018?><mixed-citation>Hunt, K. M. and Menon, A.: The 2018 Kerala floods: a climate change
perspective, Clim. Dynam., 54, 2433–2446,
<ext-link xlink:href="https://doi.org/10.1007/s00382-020-05123-7" ext-link-type="DOI">10.1007/s00382-020-05123-7</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx13"><?xmltex \def\ref@label{{Krishnan et~al.(2020)}}?><label>Krishnan et al.(2020)</label><?label krishnan2020assessment?><mixed-citation>Krishnan, R., Sanjay, J., Gnanaseelan, C., Mujumdar, M., Kulkarni, A., and
Chakraborty, S.: Assessment of climate change over the Indian region: a
report of the ministry of earth sciences (MOES), government of India,
Springer Nature,
<uri>https://library.oapen.org/handle/20.500.12657/39973</uri> (last access: 1 September 2021), 2020.</mixed-citation></ref>
      <ref id="bib1.bibx14"><?xmltex \def\ref@label{{Lehner and Grill(2013)}}?><label>Lehner and Grill(2013)</label><?label lehner2013global?><mixed-citation>
Lehner, B. and Grill, G.: Global river hydrography and network routing:
baseline data and new approaches to study the world's large river systems,
Hydrol. Process., 27, 2171–2186, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx15"><?xmltex \def\ref@label{{Lehner et~al.(2008)}}?><label>Lehner et al.(2008)</label><?label lehner2008new?><mixed-citation>Lehner, B., Verdin, K., and Jarvis, A.: New global hydrography derived from
spaceborne elevation data, Eos, Transactions American Geophysical Union, 89,
93–94, <ext-link xlink:href="https://doi.org/10.1029/2008EO100001" ext-link-type="DOI">10.1029/2008EO100001</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx16"><?xmltex \def\ref@label{{Lin et~al.(2021)}}?><label>Lin et al.(2021)</label><?label lin2021new?><mixed-citation>Lin, P., Pan, M., Wood, E. F., Yamazaki, D., and Allen, G. H.: A new
vector-based global river network dataset accounting for variable drainage
density, Sci. Data, 8, 1–9,
<ext-link xlink:href="https://doi.org/10.1038/s41597-021-00819-9" ext-link-type="DOI">10.1038/s41597-021-00819-9</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx17"><?xmltex \def\ref@label{{Madhusoodhanan et~al.(2017)}}?><label>Madhusoodhanan et al.(2017)</label><?label madhusoodhanan2017assessment?><mixed-citation>Madhusoodhanan, C., Sreeja, K., and Eldho, T.: Assessment of uncertainties in
global land cover products for hydro-climate modeling in India, Water
Resour. Res., 53, 1713–1734,
<ext-link xlink:href="https://doi.org/10.1002/2016WR020193" ext-link-type="DOI">10.1002/2016WR020193</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx18"><?xmltex \def\ref@label{{Mahto and Mishra(2019)}}?><label>Mahto and Mishra(2019)</label><?label mahto2019does?><mixed-citation>Mahto, S. S. and Mishra, V.: Does ERA-5 outperform other reanalysis products
for hydrologic applications in India?, J. Geophys. Res.-Atmos., 124, 9423–9441,
<ext-link xlink:href="https://doi.org/10.1029/2019JD031155" ext-link-type="DOI">10.1029/2019JD031155</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx19"><?xmltex \def\ref@label{{Martens et~al.(2017)}}?><label>Martens et al.(2017)</label><?label martens2017gleam?><mixed-citation>Martens, B., Miralles, D. G., Lievens, H., van der Schalie, R., de Jeu, R. A. M., Fernández-Prieto, D., Beck, H. E., Dorigo, W. A., and Verhoest, N. E. C.: GLEAM v3: satellite-based land evaporation and root-zone soil moisture, Geosci. Model Dev., 10, 1903–1925, <ext-link xlink:href="https://doi.org/10.5194/gmd-10-1903-2017" ext-link-type="DOI">10.5194/gmd-10-1903-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx20"><?xmltex \def\ref@label{{Miralles et~al.(2011)}}?><label>Miralles et al.(2011)</label><?label miralles2011global?><mixed-citation>Miralles, D. G., Holmes, T. R. H., De Jeu, R. A. M., Gash, J. H., Meesters, A. G. C. A., and Dolman, A. J.: Global land-surface evaporation estimated from satellite-based observations, Hydrol. Earth Syst. Sci., 15, 453–469, <ext-link xlink:href="https://doi.org/10.5194/hess-15-453-2011" ext-link-type="DOI">10.5194/hess-15-453-2011</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx21"><?xmltex \def\ref@label{{Mu{\~{n}}oz-Sabater et~al.(2021)}}?><label>Muñoz-Sabater et al.(2021)</label><?label munoz2021era5?><mixed-citation>Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., Boussetta, S., Choulga, M., Harrigan, S., Hersbach, H., Martens, B., Miralles, D. G., Piles, M., Rodríguez-Fernández, N. J., Zsoter, E., Buontempo, C., and Thépaut, J.-N.: ERA5-Land: a state-of-the-art global reanalysis dataset for land applications, Earth Syst. Sci. Data, 13, 4349–4383, <ext-link xlink:href="https://doi.org/10.5194/essd-13-4349-2021" ext-link-type="DOI">10.5194/essd-13-4349-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx22"><?xmltex \def\ref@label{{NRSC(2007)}}?><label>NRSC(2007)</label><?label nrsc05?><mixed-citation>NRSC: National Land Use and Land Cover Mapping Using Multi-Temporal AWiFS data,
Second Cycle Report, 2005-06, Bhuvan thematic
services, <uri>https://bhuvan-app1.nrsc.gov.in/2dresources/bhuvanstore.php</uri> (last access: 1 June 2021),
2007.</mixed-citation></ref>
      <ref id="bib1.bibx23"><?xmltex \def\ref@label{{Pai et~al.(2014)}}?><label>Pai et al.(2014)</label><?label pai2014development?><mixed-citation>Pai, D., Sridhar, L., Rajeevan, M., Sreejith, O., Satbhai, N., and
Mukhopadhyay, B.: Development of a new high spatial resolution (<inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msup><mml:mn mathvariant="normal">0.25</mml:mn><mml:mo>∘</mml:mo></mml:msup><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">0.25</mml:mn><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>) long period (1901–2010) daily gridded rainfall data set over India and
its comparison with existing data sets over the region, Mausam, 65, 1–18,
2014.</mixed-citation></ref>
      <ref id="bib1.bibx24"><?xmltex \def\ref@label{{Rana et~al.(2015)}}?><label>Rana et al.(2015)</label><?label rana2015precipitation?><mixed-citation>Rana, S., McGregor, J., and Renwick, J.: Precipitation seasonality over the
Indian subcontinent: An evaluation of gauge, reanalyses, and satellite
retrievals, J. Hydrometeorol., 16, 631–651,
<ext-link xlink:href="https://doi.org/10.1175/JHM-D-14-0106.1" ext-link-type="DOI">10.1175/JHM-D-14-0106.1</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx25"><?xmltex \def\ref@label{{Rani et~al.(2021)}}?><label>Rani et al.(2021)</label><?label rani2021imdaa?><mixed-citation>Rani, S. I., Arulalan, T., George, J. P., Rajagopal, E., Renshaw, R., Maycock,
A., Barker, D. M., and Rajeevan, M.: IMDAA: High-Resolution Satellite-Era
Reanalysis for the Indian Monsoon Region, J. Climate, 34, 5109–5133,
<ext-link xlink:href="https://doi.org/10.1175/JCLI-D-20-0412.1" ext-link-type="DOI">10.1175/JCLI-D-20-0412.1</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx26"><?xmltex \def\ref@label{{Shah and Mishra(2016)}}?><label>Shah and Mishra(2016)</label><?label shah2016hydrologic?><mixed-citation>Shah, H. L. and Mishra, V.: Hydrologic changes in Indian subcontinental river
basins (1901–2012), J. Hydrometeorol., 17, 2667–2687,
<ext-link xlink:href="https://doi.org/10.1175/JHM-D-15-0231.1" ext-link-type="DOI">10.1175/JHM-D-15-0231.1</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx27"><?xmltex \def\ref@label{{Thakur et~al.(2019)}}?><label>Thakur et al.(2019)</label><?label thakur2019new?><mixed-citation>Thakur, M. K., Kumar, T., Koteswara Rao, K., Barbosa, H., and Rao, V. B.: A new
perspective in understanding rainfall from satellites over a complex
topographic region of India, Sci. Rep., 9, 1–10,
<ext-link xlink:href="https://doi.org/10.1038/s41598-019-52075-y" ext-link-type="DOI">10.1038/s41598-019-52075-y</ext-link>, 2019.</mixed-citation></ref>
      <?pagebreak page4415?><ref id="bib1.bibx28"><?xmltex \def\ref@label{{Verdin(2017)}}?><label>Verdin(2017)</label><?label verdin2017hydrologic?><mixed-citation>Verdin, K. L.: Hydrologic Derivatives for Modeling and Analysis – A new global
high-resolution database, Tech. rep., US Geological Survey,
<ext-link xlink:href="https://doi.org/10.3133/ds1053" ext-link-type="DOI">10.3133/ds1053</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx29"><?xmltex \def\ref@label{{Vorosmarty et~al.(1998)}}?><label>Vorosmarty et al.(1998)</label><?label vorosmarty1998global?><mixed-citation>Vorosmarty, C., Fekete, B., and Tucker, B.: Global River Discharge, 1807–1991,
Version 1.1 (RivDIS), ORNL DAAC, Oak Ridge, Tennessee, USA,
<ext-link xlink:href="https://doi.org/10.3334/ORNLDAAC/199" ext-link-type="DOI">10.3334/ORNLDAAC/199</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx30"><?xmltex \def\ref@label{{WRIS-BR(2014)}}?><label>WRIS-BR(2014)</label><?label wris14?><mixed-citation>WRIS-BR: River Basin Reports, <uri>https://indiawris.gov.in/wris/#/Basin</uri> (last access: 1 September 2021),
2014.</mixed-citation></ref>
      <ref id="bib1.bibx31"><?xmltex \def\ref@label{{WRIS-GIS(2021)}}?><label>WRIS-GIS(2021)</label><?label wrisgis?><mixed-citation>WRIS-GIS: GIS data on major river basin boundaries of India, data obtained via
email request, 14 December 2021, National Water Informatics Centre,
<uri>https://indiawris.gov.in/wris/</uri> (last access: 14 December 2021), 2021.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx32"><?xmltex \def\ref@label{{WRIS-OL(2022)}}?><label>WRIS-OL(2022)</label><?label wrisol?><mixed-citation>WRIS-OL: India Water Resources Information System,
<uri>https://indiawris.gov.in/wris/</uri> (last access: 21 August 2022), 2022.</mixed-citation></ref>
      <ref id="bib1.bibx33"><?xmltex \def\ref@label{{Yamazaki et~al.(2019)}}?><label>Yamazaki et al.(2019)</label><?label yamazaki2019merit?><mixed-citation>Yamazaki, D., Ikeshima, D., Sosa, J., Bates, P. D., Allen, G. H., and Pavelsky,
T. M.: MERIT Hydro: A high-resolution global hydrography map based on latest
topography dataset, Water Resour. Res., 55, 5053–5073,
<ext-link xlink:href="https://doi.org/10.1029/2019WR024873" ext-link-type="DOI">10.1029/2019WR024873</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx34"><?xmltex \def\ref@label{{Yan et~al.(2019)}}?><label>Yan et al.(2019)</label><?label yan2019data?><mixed-citation>Yan, D., Wang, K., Qin, T., Weng, B., Wang, H., Bi, W., Li, X., Li, M., Lv, Z., Liu, F., He, S., Ma, J., Shen, Z., Wang, J., Bai, H., Man, Z., Sun, C., Liu, M., Shi, X., Jing, L., Sun, R., Cao, S., Hao, C., Wang, L., Pei, M., Dorjsuren, B., Gedefaw, M., Girma, A., and Abiyu, A.: A data set of global river networks and corresponding water
resources zones divisions, Sci. Data, 6, 1–11,
<ext-link xlink:href="https://doi.org/10.1038/s41597-019-0243-y" ext-link-type="DOI">10.1038/s41597-019-0243-y</ext-link>, 2019.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Geospatial dataset for hydrologic analyses in India (GHI): a quality-controlled dataset on river gauges, catchment boundaries and hydrometeorological time series</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Chatterjee and Sinha(2014)</label><mixed-citation>
      
Chatterjee, R. and Sinha, S.: Water Resources Database–Development and
Management, Proc. Indian Natn. Sci. Acad., 80, 713–730, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>CWC-19(2019)</label><mixed-citation>
      
CWC-19: Reassessment of Water Availability in India using Space Inputs,
Central Water Commission, Basin Planning and Management Organisation,
<a href="http://www.cwc.gov.in/water-resource-estimation" target="_blank"/> (last access: 1 June 2021), 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>CWC-21(2021)</label><mixed-citation>
      
CWC-21: Hydrological Observation Stations in India under Central Water
Commission, September 2021,
<a href="http://cwc.gov.in/hydrological-observation-stations-india-under-central-water-commission-september-2021" target="_blank">http://cwc.gov.in/hydrological-observation-stations-india-under-central-water-commission-september-2021</a> (last access: 1 September 2022), 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>CWC-YB(2021)</label><mixed-citation>
      
CWC-YB: Hydrological Year Books, <a href="http://www.cwc.gov.in/publications" target="_blank"/> (last access: 1 December 2022),
2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Do et al.(2018)</label><mixed-citation>
      
Do, H. X., Gudmundsson, L., Leonard, M., and Westra, S.: The Global Streamflow Indices and Metadata Archive (GSIM) – Part 1: The production of a daily streamflow archive and metadata, Earth Syst. Sci. Data, 10, 765–785, <a href="https://doi.org/10.5194/essd-10-765-2018" target="_blank">https://doi.org/10.5194/essd-10-765-2018</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Fritsch and Carlson(1980)</label><mixed-citation>
      
Fritsch, F. N. and Carlson, R. E.: Monotone piecewise cubic interpolation, SIAM
J. Numer. Anal., 17, 238–246,
<a href="https://doi.org/10.1137/0717021" target="_blank">https://doi.org/10.1137/0717021</a>, 1980.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Funk et al.(2014)</label><mixed-citation>
      
Funk, C. C., Peterson, P. J., Landsfeld, M. F., Pedreros, D. H., Verdin, J. P.,
Rowland, J. D., Romero, B. E., Husak, G. J., Michaelsen, J. C., and Verdin,
A. P.: A quasi-global precipitation time series for drought
monitoring, US Geological Survey data series, 832, 1–12,
<a href="https://doi.org/10.3133/ds832" target="_blank">https://doi.org/10.3133/ds832</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Ganguli et al.(2022)</label><mixed-citation>
      
Ganguli, P., Singh, B., Reddy, N. N., Raut, A., Mishra, D., and Das, B. S.:
Climate-catchment-soil control on hydrological droughts in peninsular India,
Sci. Rep., 12, 1–14,
<a href="https://doi.org/10.1038/s41598-022-11293-7" target="_blank">https://doi.org/10.1038/s41598-022-11293-7</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Goteti(2022)</label><mixed-citation>
      
Goteti, G.: Estimation of water resources availability (WRA) using gridded
evapotranspiration data: A simpler alternative to Central Water
Commission’s WRA assessment, J. Earth Syst. Sci., 131, 1–24, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Goteti(2023)</label><mixed-citation>
      
Goteti, G.: Geospatial dataset for Hydrologic analyses in India (GHI): A
quality controlled dataset on river gauges, catchment boundaries and
hydrometeorological time series, Zenodo [data set],
<a href="https://doi.org/10.5281/zenodo.7563599" target="_blank">https://doi.org/10.5281/zenodo.7563599</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Gudmundsson et al.(2018)</label><mixed-citation>
      
Gudmundsson, L., Do, H. X., Leonard, M., and Westra, S.: The Global Streamflow Indices and Metadata Archive (GSIM) – Part 2: Quality control, time-series indices and homogeneity assessment, Earth Syst. Sci. Data, 10, 787–804, <a href="https://doi.org/10.5194/essd-10-787-2018" target="_blank">https://doi.org/10.5194/essd-10-787-2018</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Hunt and Menon(2020)</label><mixed-citation>
      
Hunt, K. M. and Menon, A.: The 2018 Kerala floods: a climate change
perspective, Clim. Dynam., 54, 2433–2446,
<a href="https://doi.org/10.1007/s00382-020-05123-7" target="_blank">https://doi.org/10.1007/s00382-020-05123-7</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Krishnan et al.(2020)</label><mixed-citation>
      
Krishnan, R., Sanjay, J., Gnanaseelan, C., Mujumdar, M., Kulkarni, A., and
Chakraborty, S.: Assessment of climate change over the Indian region: a
report of the ministry of earth sciences (MOES), government of India,
Springer Nature,
<a href="https://library.oapen.org/handle/20.500.12657/39973" target="_blank"/> (last access: 1 September 2021), 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Lehner and Grill(2013)</label><mixed-citation>
      
Lehner, B. and Grill, G.: Global river hydrography and network routing:
baseline data and new approaches to study the world's large river systems,
Hydrol. Process., 27, 2171–2186, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Lehner et al.(2008)</label><mixed-citation>
      
Lehner, B., Verdin, K., and Jarvis, A.: New global hydrography derived from
spaceborne elevation data, Eos, Transactions American Geophysical Union, 89,
93–94, <a href="https://doi.org/10.1029/2008EO100001" target="_blank">https://doi.org/10.1029/2008EO100001</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Lin et al.(2021)</label><mixed-citation>
      
Lin, P., Pan, M., Wood, E. F., Yamazaki, D., and Allen, G. H.: A new
vector-based global river network dataset accounting for variable drainage
density, Sci. Data, 8, 1–9,
<a href="https://doi.org/10.1038/s41597-021-00819-9" target="_blank">https://doi.org/10.1038/s41597-021-00819-9</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Madhusoodhanan et al.(2017)</label><mixed-citation>
      
Madhusoodhanan, C., Sreeja, K., and Eldho, T.: Assessment of uncertainties in
global land cover products for hydro-climate modeling in India, Water
Resour. Res., 53, 1713–1734,
<a href="https://doi.org/10.1002/2016WR020193" target="_blank">https://doi.org/10.1002/2016WR020193</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Mahto and Mishra(2019)</label><mixed-citation>
      
Mahto, S. S. and Mishra, V.: Does ERA-5 outperform other reanalysis products
for hydrologic applications in India?, J. Geophys. Res.-Atmos., 124, 9423–9441,
<a href="https://doi.org/10.1029/2019JD031155" target="_blank">https://doi.org/10.1029/2019JD031155</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Martens et al.(2017)</label><mixed-citation>
      
Martens, B., Miralles, D. G., Lievens, H., van der Schalie, R., de Jeu, R. A. M., Fernández-Prieto, D., Beck, H. E., Dorigo, W. A., and Verhoest, N. E. C.: GLEAM v3: satellite-based land evaporation and root-zone soil moisture, Geosci. Model Dev., 10, 1903–1925, <a href="https://doi.org/10.5194/gmd-10-1903-2017" target="_blank">https://doi.org/10.5194/gmd-10-1903-2017</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Miralles et al.(2011)</label><mixed-citation>
      
Miralles, D. G., Holmes, T. R. H., De Jeu, R. A. M., Gash, J. H., Meesters, A. G. C. A., and Dolman, A. J.: Global land-surface evaporation estimated from satellite-based observations, Hydrol. Earth Syst. Sci., 15, 453–469, <a href="https://doi.org/10.5194/hess-15-453-2011" target="_blank">https://doi.org/10.5194/hess-15-453-2011</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Muñoz-Sabater et al.(2021)</label><mixed-citation>
      
Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., Boussetta, S., Choulga, M., Harrigan, S., Hersbach, H., Martens, B., Miralles, D. G., Piles, M., Rodríguez-Fernández, N. J., Zsoter, E., Buontempo, C., and Thépaut, J.-N.: ERA5-Land: a state-of-the-art global reanalysis dataset for land applications, Earth Syst. Sci. Data, 13, 4349–4383, <a href="https://doi.org/10.5194/essd-13-4349-2021" target="_blank">https://doi.org/10.5194/essd-13-4349-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>NRSC(2007)</label><mixed-citation>
      
NRSC: National Land Use and Land Cover Mapping Using Multi-Temporal AWiFS data,
Second Cycle Report, 2005-06, Bhuvan thematic
services, <a href="https://bhuvan-app1.nrsc.gov.in/2dresources/bhuvanstore.php" target="_blank"/> (last access: 1 June 2021),
2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Pai et al.(2014)</label><mixed-citation>
      
Pai, D., Sridhar, L., Rajeevan, M., Sreejith, O., Satbhai, N., and
Mukhopadhyay, B.: Development of a new high spatial resolution (0.25×0.25) long period (1901–2010) daily gridded rainfall data set over India and
its comparison with existing data sets over the region, Mausam, 65, 1–18,
2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Rana et al.(2015)</label><mixed-citation>
      
Rana, S., McGregor, J., and Renwick, J.: Precipitation seasonality over the
Indian subcontinent: An evaluation of gauge, reanalyses, and satellite
retrievals, J. Hydrometeorol., 16, 631–651,
<a href="https://doi.org/10.1175/JHM-D-14-0106.1" target="_blank">https://doi.org/10.1175/JHM-D-14-0106.1</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Rani et al.(2021)</label><mixed-citation>
      
Rani, S. I., Arulalan, T., George, J. P., Rajagopal, E., Renshaw, R., Maycock,
A., Barker, D. M., and Rajeevan, M.: IMDAA: High-Resolution Satellite-Era
Reanalysis for the Indian Monsoon Region, J. Climate, 34, 5109–5133,
<a href="https://doi.org/10.1175/JCLI-D-20-0412.1" target="_blank">https://doi.org/10.1175/JCLI-D-20-0412.1</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Shah and Mishra(2016)</label><mixed-citation>
      
Shah, H. L. and Mishra, V.: Hydrologic changes in Indian subcontinental river
basins (1901–2012), J. Hydrometeorol., 17, 2667–2687,
<a href="https://doi.org/10.1175/JHM-D-15-0231.1" target="_blank">https://doi.org/10.1175/JHM-D-15-0231.1</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Thakur et al.(2019)</label><mixed-citation>
      
Thakur, M. K., Kumar, T., Koteswara Rao, K., Barbosa, H., and Rao, V. B.: A new
perspective in understanding rainfall from satellites over a complex
topographic region of India, Sci. Rep., 9, 1–10,
<a href="https://doi.org/10.1038/s41598-019-52075-y" target="_blank">https://doi.org/10.1038/s41598-019-52075-y</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Verdin(2017)</label><mixed-citation>
      
Verdin, K. L.: Hydrologic Derivatives for Modeling and Analysis – A new global
high-resolution database, Tech. rep., US Geological Survey,
<a href="https://doi.org/10.3133/ds1053" target="_blank">https://doi.org/10.3133/ds1053</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Vorosmarty et al.(1998)</label><mixed-citation>
      
Vorosmarty, C., Fekete, B., and Tucker, B.: Global River Discharge, 1807–1991,
Version 1.1 (RivDIS), ORNL DAAC, Oak Ridge, Tennessee, USA,
<a href="https://doi.org/10.3334/ORNLDAAC/199" target="_blank">https://doi.org/10.3334/ORNLDAAC/199</a>, 1998.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>WRIS-BR(2014)</label><mixed-citation>
      
WRIS-BR: River Basin Reports, <a href="https://indiawris.gov.in/wris/#/Basin" target="_blank">https://indiawris.gov.in/wris/#/Basin</a> (last access: 1 September 2021),
2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>WRIS-GIS(2021)</label><mixed-citation>
      
WRIS-GIS: GIS data on major river basin boundaries of India, data obtained via
email request, 14 December 2021, National Water Informatics Centre,
<a href="https://indiawris.gov.in/wris/" target="_blank">https://indiawris.gov.in/wris/</a> (last access: 14 December 2021), 2021.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>WRIS-OL(2022)</label><mixed-citation>
      
WRIS-OL: India Water Resources Information System,
<a href="https://indiawris.gov.in/wris/" target="_blank">https://indiawris.gov.in/wris/</a> (last access: 21 August 2022), 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Yamazaki et al.(2019)</label><mixed-citation>
      
Yamazaki, D., Ikeshima, D., Sosa, J., Bates, P. D., Allen, G. H., and Pavelsky,
T. M.: MERIT Hydro: A high-resolution global hydrography map based on latest
topography dataset, Water Resour. Res., 55, 5053–5073,
<a href="https://doi.org/10.1029/2019WR024873" target="_blank">https://doi.org/10.1029/2019WR024873</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Yan et al.(2019)</label><mixed-citation>
      
Yan, D., Wang, K., Qin, T., Weng, B., Wang, H., Bi, W., Li, X., Li, M., Lv, Z., Liu, F., He, S., Ma, J., Shen, Z., Wang, J., Bai, H., Man, Z., Sun, C., Liu, M., Shi, X., Jing, L., Sun, R., Cao, S., Hao, C., Wang, L., Pei, M., Dorjsuren, B., Gedefaw, M., Girma, A., and Abiyu, A.: A data set of global river networks and corresponding water
resources zones divisions, Sci. Data, 6, 1–11,
<a href="https://doi.org/10.1038/s41597-019-0243-y" target="_blank">https://doi.org/10.1038/s41597-019-0243-y</a>, 2019.

    </mixed-citation></ref-html>--></article>
