Global River Radar Altimetry Time Series (GRRATS): new river elevation earth science data records for the hydrologic community

The capabilities of radar altimetry to measure inland water bodies are well established, and several river altimetry datasets are available. Here we produced a globally distributed dataset, the Global River Radar Altimeter Time Series (GRRATS), using Envisat and Ocean Surface Topography Mission (OSTM)/Jason-2 radar altimeter data spanning the time period 2002–2016. We developed a method that runs unsupervised, without requiring parameterization at the measurement location, dubbed virtual station (VS) level, and applied it to all altimeter crossings of ocean-draining rivers with widths > 900 m (> 34 % of the global drainage area). We evaluated every VS, either quantitatively for VS locations where in situ gages are available or qualitatively using a grade system. We processed nearly 1.5 million altimeter measurements from 1478 VSs. After quality control, the final product contained 810 403 measurements distributed over 932 VSs located on 39 rivers. Available in situ data allowed quantitative evaluation of 389 VSs on 12 rivers. The median standard deviation of river elevation error is 0.93 m, Nash–Sutcliffe efficiency is 0.75, and correlation coefficient is 0.9. GRRATS is a consistent, well-documented dataset with a user-friendly data visualization portal, freely available for use by the global scientific community. Data are available at https://doi.org/10.5067/PSGRA-SA2V1 (Coss et al., 2016).


Introduction
Despite growing demand from emerging large-scale hydrologic science and applications, global and freely available observations of river water levels are still scarce (Hannah et al., 2011;Pavelsky et al., 2014;Shiklomanov et al., 2002). Advances in remote sensing and computing capabilities have enabled new areas of global fluvial research that are dependent upon river elevations, including global hydrologic quantification of carbon and nitrogen fluxes (e.g., Allen and Pavelsky, 2018;Oki and Yasuoka, 2008) and characteriza-tion of flood risk for future climate scenarios (Schumann et al., 2018;Smith et al., 2015). Evaluation of these global river elevation models requires global datasets of river elevation time series, but in situ river water levels are scarce, as they are often not shared outside specific government agencies. Thus model evaluation and calibration increasingly relies on remotely sensed data (Overton, 2015;Pavelsky et al., 2014;Sampson et al., 2015). Newer radar altimeter missions like Sentinel-3 are improving the contemporary record with features like automated processing, alleviating the need for retracking and other postprocessing to generate useful mea-surements. In addition, the Surface Water and Ocean Topography (SWOT; https://swot.jpl.nasa.gov/, last access: 24 October 2019) satellite mission, scheduled for launch in 2021, will observe global river elevations with an unprecedented global spatial resolution despite variation within its measurement swath. Establishing robust global river elevation datasets for the pre-SWOT period is critical to prepare for the SWOT mission and for the study of hydrology more broadly.
Satellite radar altimetry data have enabled important scientific advances in hydrology (Birkett et al., 2002;Bjerklie et al., 2005;Calmant et al., 2008;Jung et al., 2010;Getirana et al., 2009;Birkinshaw et al., 2014;Frappart et al., 2015;Becker et al., 2018;Emery et al., 2018, among many others), but spatial coverage is limited. This is for two primary reasons: inclination or latitude coverage limits of radar altimeter orbits (orbits with better temporal resolution have worse spatial coverage), and technical measurement challenges associated with retrieving elevation over seasonally varying rivers. Indeed, radar altimeter orbits and elevation retrieval technology were originally designed for characterizing ocean surface topography. The orbital characteristics of historic and contemporary radar altimetry missions used for hydrology tend to follow either the 10 d TOPEX/POSIEDON, Jason-1, Jason-2, and Jason-3 orbit with relatively high temporal resolution but low spatial coverage or the 35 d ERS-1, ERS-2, Envisat, and SARAL-AltiKa orbits with low temporal resolution but higher spatial coverage. Neither of these orbit paradigms captures all global rivers (Alsdorf et al., 2007).
The second fundamental cause of poor global coverage of river radar altimeter observation availability is rooted in the measurement itself. There are a set of criteria, such as river width, nearby topography, and groundcover, associated with successful water surface level retrieval, but none have been shown to be fully predictive of water level accuracy (Maillard et al., 2015). Most of Earth's rivers are too narrow to be accurately measured by satellite radar altimeters: Lettenmaier et al. (2015) suggest that rivers should be wider than 1000 m for optimal retrieval, primarily due to the 1-2 km footprint size of pulse-limited satellite altimeters. Radar altimeter effective footprint size is a function of the surface characteristics and pulse emission mode. For example, in low-resolution mode (LRM), which was commonly used for satellite altimeters until ∼ 2016, footprints typically range from 1.5 to 6.0 km in diameter, depending on the land topography near rivers. Thus, all but the widest rivers are (technically) subfootprint features in LRM. Radar altimetry retrieval of river surface elevations thus relies on the fact that rivers reflect more radar signal than does land, due to the high dielectric constant of water. Some studies have developed methods to process radar altimetry data for far narrower rivers with LRM altimeters (e.g., ∼ 100 m) for a particular location (e.g., Santo da Silva, 2010; Maillard et al., 2015;Boergens et al., 2016;Biancamaria et al., 2017). Since ∼ 2016, retrieving water levels over narrow rivers is increasingly common with the synthetic aperture radar (SAR) altimetry missions (e.g., Cryosat-2 and Sentinel-3) for which the equivalent footprint (300 m wide along flight track band) enables much easier detection and processing of radar returns from rivers.
Regardless of the specifics of a particular measurement location, altimeter range data (direct sensor measurement) require a great deal of processing to be converted into usable surface heights. Measurements of ocean height rely on an onboard processor known as a "tracker" to dynamically estimate the approximate range of the target (i.e., the sea surface) in order to map received radar pulses to precise surface elevations. The onboard tracker works well for measuring ocean surface elevations, but it is unsuitable for estimating continental surface elevations. It thus requires further processing steps, known as "retracking". Using retracked river observations, inland radar altimetry can accurately measure changing river surface elevation (Koblinsky et al., 1993;Berry et al., 2005;Frappart et al., 2006;Alsdorf et al., 2007;Santos da Silva et al., 2010;Papa et al., 2010;Dubey et al., 2015;Tourian et al., 2016;Verron et al., 2018). While custom retrackers have been derived and tested in particular locations (Huang et al., 2018;Maillard et al., 2015;Sulistioadi et al., 2015) the ICE-1 retracker (Wingham et al., 1986) is arguably the best compromise between being consistently reliable and available for many altimeter missions (Biancamaria et al., 2017;Frappart et al., 2006;Santos da Silva et al., 2010). While available globally, the ICE-1 retracked data must be extracted over river targets, and carefully filtered, to make them useful to global hydrological modeling applications.
The four currently available radar altimeter datasets for rivers represent tremendous technical achievements: (1) Hydroweb (http://hydroweb.theia-land.fr/, last access: 3 January 2019), (2) Database for Hydrological Time Series over Inland Waters (DAHITI) (https://dahiti.dgfi.tum.de/en/, last access: 3 January 2019), (3) River&Lake Near Real Time (NRT) (https://web.archive.org/web/20181006062149/http:// tethys.eaprs.cse.dmu.ac.uk/RiverLake/shared/main, last access: 3 January 2019), and (4) HydroSat (http://hydrosat.gis. uni-stuttgart.de/php/index.php, last access: 3 January 2019). However, they are not optimized for the specific needs of global hydrologic modelers, who require global coverage and enhanced ease of use (accessibility and metadata). Note that River&LakeNRT is no longer online but we compare against it for historical reasons (an archive link has been provided). Existing datasets have several characteristics that make them challenging to use for global hydrologic modeling. First, they tend to include dense coverage where altimeters perform well (e.g., over large, tropical rivers) or are based on programmatic priorities of funding agencies. Hydroweb has 991 river VSs in South America alone, for example, primarily in the Amazon basin, while most include few Arctic rivers. One challenge of including Arctic rivers involves the complicating effect of river ice, which is widespread for much of the year. Three of the four datasets (Hydroweb being the exception) cannot be downloaded in bulk but require repetitive clicking via web interface.
In this study, we determined what fraction of available altimeter data would be useful for global rivers using retracked data available from the official distribution of the instrument data (the geophysical data records, GDRs), unsupervised methods, and automatic data filtering processes. The result is the Global River Radar Altimetry Time Series (GRRATS), a global river altimetry dataset comprised of an opportunistic exploitation of VSs on the world's largest rivers specifically suited for the needs of global hydrological applications. GRRATS uses the VS as its fundamental organizational element. VSs are locations where ground tracks of exact repeat altimetry mission orbits cross rivers, enabling the development of a time series of water elevation observations. VSs can be thought of much in the same way as an in situ river gaging station but are entirely derived from remotely sensed measurements of river surface elevation. GR-RATS is an Earth Science Data Record (ESDR) hosted at the Physical Oceanography Distributed Active Archive Center (PO.DAAC) with a focus on conforming to data management and stewardship best practices (Wilkinson et al., 2016). GRRATS currently spans 2002-2016 and includes global ocean-draining rivers greater than 900 m in width: these collectively drain a total of > 34 % of global land area. GR-RATS follows data management best practices as outlined by Wilkinson et al. (2016), and it includes extensive metadata. In developing GRRATS, our purpose is to create an accurate dataset, and also to create a better data product focused on ease of use.

Methods
There are four major steps in building GRRATS (Coss et al., 2016): (1) identification of potential VSs on global rivers, (2) extraction of altimeter observations from the geophysical data records (GDRs), (3) filtering out noisy returns from the altimetry, and (4) performing either quantitative of qualitative evaluation. The philosophy and overview of GRRATS methods are reviewed here, whereas details of GRRATS production are more thoroughly described in the user handbook (ftp://podaac-ftp.jpl.nasa.gov/allData/ preswot_hydrology/L2/rivers/docs/, last access: 3 January 2019).

Identification of potential VSs
We began by identifying potential VSs for GDR extraction by identifying locations on global ocean-draining rivers where altimeter orbital ground tracks cross river locations greater than 900 m in width. We chose 900 m as our lower width limit as previous work has shown that VSs with widths > 1 km present a higher probability of good performance (Birkett et al., 2002;Frappart et al., 2006;Kuo and Kao, 2011;Papa et al., 2012). This selection of rivers is spatially varied and large enough to provide a sensible constraint on global models. We used the intersection of the nominal al-timeter ground tracks with the Global River Widths from Landsat (GRWL) dataset to identify such locations (Allen and Pavelsky, 2018).

GDR extraction
We extracted altimeter observations at the VS from the GDRs; this consisted of three steps. First we spatially joined Landsat imagery (selected from times of mean river discharge) compiled for the Global River Widths from Landsat (GRWL) river centerlines dataset Pavelsky, 2015, 2018) with satellite ground tracks to define the width extent of the mask used for the extraction of water elevations. Each mask was constructed using the width extent and upstream and downstream limits that were 2 km perpendicular to the crossing location. We extracted all altimeter returns with centroids falling within each mask for each pass from Jason-2 GDR version D (Dumont et al., 2009) and the Envisat GDR, version 2.1 or later (Soussi and Féménias, 2009), using corrections outlined in product documentation. We extracted ICE-1 retracked ranges from the GDR (Gommenginger et al., 2011;Wingham et al., 1986). To get ellipsoidal heights, we applied the standard combination of parameters and corrections. We then converted these ellipsoidal heights to an orthometric height above the geoid, using the Earth Gravitational Model 2008 (EGM08; Pavlis et al., 2012).

Data filtering
We filtered altimetry data in a six-step process. First, we filtered using an a priori digital elevation model (DEM) data baseline elevation (median of all best available DEM values falling within the extraction polygon) at each VS. We used Shuttle Radar Topography Mission (SRTM), Global Multi-Resolution Terrain Elevation Data (GMTED), and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), in that order of preference (Abrams, 2000;Danielson and Gesch, 2011;Van Zyl, 2001). We filtered out elevations 15 m above or 10 m below the constrained baseline elevation. We arrived at these limits by examining over 150 United States Geological Survey (USGS) gages with upstream drainage areas larger than 20 000 km 2 and changing the upper filter limit (responsible for 90.5 % of data points filtered due to height) to 14 or 16 m, resulting in a 4.2 % increase and 3.8 % decrease in filtered points respectively. We determined these limits should reasonably encompass any measurements of the river surface as the Amazon flood wave is capped around 15 m from trough to peak (Trigg et al., 2009). Second, we applied an additional elevation filter removing any elevations that fell 2 m or more below the 5th percentile of surface elevations in the time series (0.03 % of total returns). We obtained low-end filter criteria for removing observations impacted by near-river topography at low flow by trial and error. Third, we flagged and removed elevations from times of likely ice cover. Ice cover dates were de- termined from USGS and Environment and Climate Change Canada (ECCC) data when available. If ice breakup data were not available, we applied broad date limits regionally, using observations from the Pavelsky and Smith (2004) study of Arctic river ice breakup timing. Breakup dates range from late September to early June. Fourth, remaining elevations were averaged for each cycle at each VS. Fifth, we removed any potential VS with < 25 % or 50 % of available cycles for rivers with and without ice cover, respectively. Finally, we determined a flow distance limit for tidal VSs (those where the tidal signal was dominant) using visual inspection of the time series on each river and removed VSs below that point.

Data evaluation
We acquired evaluation stage data from 65 stream gages (on 12 rivers) (Environment Canada, 2016;Jacobs, 2002;Martinez, 2003;USGS, 2016). All stage data are publicly available with the exception of data from the Congo, Ganges, Brahmaputra, and Zambezi, which were provided by the authors. Note that VSs rarely fall in the same location as a stream gage; thus, most studies recommend some VS-in situ stream gage distance (e.g., 200 km) beyond which comparisons are not performed (Michailovsky et al., 2012). Analyses showed that VS-stream gage distance was often not an accurate predictor of height anomaly differences. This is likely due to the hydraulics (width, nearby dams, confluences) of a more distant gage being more similar to the location of the VS than the most proximal gage. Thus, in this study, we compared each virtual station with all in situ gages available on the main channel of that river. At each VS, we reported error metrics for the best, median, and the spatially closest comparison. For completeness, we included VSs with poor error metrics; users can then select which of the VSs to use, based on their reported error statistics and the user applications. Following the normal practice in the field (e.g., Berry and Benveniste, 2010;Schwatke et al., 2015), we compare relative heights between VSs and gages, as opposed to absolute heights, in order to avoid the influence of the difference in datum and the lack of correspondence between satellite ground tracks and gage locations. We calculated relative heights by removing the long-term mean between the sample pairs of VS heights and the stage measured by the stream gages. Error metrics in GRRATS include the correlation coefficient (R), Nash-Sutcliffe efficiency (NSE), and standard deviation of the errors (SDEs). NSE is typically employed to describe the goodness of fit for a modeled result with measured values, so our use here is nontraditional. Nonetheless, we use NSE because, as opposed to R and SDE, NSE normalizes error with variation from the mean in the observed, or in our case, in situ data, by comparing error to actual variability. For example, 1 m of error can be an issue of varying severity when rivers can have height variation ranging from > 10 m (Amazon) to < 5 m (Saint Lawrence River). It is also an established metric for goodness of fit within the altimeter literature Tourian et al., 2016).
While qualitative grades are not as reproducible as best fit statistics, they have been used in the past to guide users to preferable time series when no other error metrics are available (Birkett et al., 2002). For the remainder of our VSs (without stage gages), we performed a qualitative evaluation of the station represented by a letter grade ranging from A (highest level of confidence on the data quality) to D (lowest level of confidence). The criteria used in the assignment of letter grades were based on the presence of obvious outliers, number of data points in the time series, and time series continuity with nearby VSs. We determined outliers by visual inspection. Letter grades take into consideration all of these criteria, but in general VSs with an A rating would have one or fewer obvious outliers per year, would have no more than two cycles filtered out per year, and would fit nicely above VS downriver and below VS upriver. A D rating might be applied to a VS with three or more outliers per year and with five or more cycles missing per year, and it might fall below VS downriver from it and above VS upriver from it. We explicitly recorded and document which VSs in GRRATS are evaluated using this qualitative approach.

Results and discussion
GRRATS processing produced a total of 932 globally distributed virtual stations (Fig. 1). The 39 GRRATS rivers account for 50 million km 2 (> 34 %) of the global drainage area and include 13 Arctic rivers. To attain these results, we extracted and processed a total of 1.5 million individual radar returns at 1478 potential VS locations.

Filtering returns
We removed 309 700 altimetry returns with our height filters (steps 1 and 2 of our filtering process), leaving 1.1 million (78.2 %) viable measurements. Our ice filter removed an additional 296 900 of the remaining returns (step 3) resulting in 810 400 viable returns (57.2 %). Averaging all height returns within the river polygons for each pass at each VS (step 4) led to a total of 102 300 (21 900 on Arctic rivers) pass-averaged measurements. VSs were required to retain 50 % (without ice) or 25 % (with ice) of their passes postfiltering to be included in the final data product, resulting in the removal of 465 potential VS locations (step 5). VSs were also removed by visual inspection if they were tidal, resulting in the removal of an additional 45 stations (step 6). While many VSs were filtered heavily, 72.8 % of the total returns for all VSs in the final product passed all filters (the median VS value being 97.7 %) and 227 VSs lost no returns. The filtering process resulted in a total 932 VSs for evaluation derived from standard retracked data (ICE 1). These VSs had a dataset-wide average of ∼ 16 measurements per year (9.5 for Envisat VSs and 35.8 for Jason2 VSs). Maximum NSE (best fit) plotted in yellow to red (shown on all rivers with gage data) and qualitative grades plotted in teal to dark purple. In both cases, darker colors indicate better evaluation results. Each river is evaluated using only one of these methods. Figure 2 shows example GRRATS time series for the Mackenzie and Amazon Rivers and corresponding in situ gages. Error bars represent the range of the values that were averaged to generate each data point (does not include filtered data points). Data necessary to compute error bars are a part of the data product. Comparison between the Jason-2 time series and the gage on the Mackenzie River produced SDE = 0.5 m, NSE = 0.41, and R = 0.64. In this case, the gage used for evaluation was located ∼ 700 km upriver (Fig. 2a). The SDE is approximately consistent with what is expected from the literature (Asadzadeh Jarihani et al., 2013;Frappart et al., 2006). However, the SDE is relatively large in comparison with the overall annual range in the time series (typically ∼ 2 m) observed from the gage (see Fig. 2a), leading to a relatively low NSE. Additionally, several cycles have far larger errors, reaching up to 2 m in some cases. There are a total of three in situ gages used for evaluation on the Mackenzie River. Across the three gage comparisons, this VS had median statistics of 0.58 m, 0.35, and 0.64 for SDE, NSE, and R, respectively. Comparing the VS data to the gage on the Amazon River yields SDE = 0.98 m, NSE = 0.94 and R = 0.97, with the evaluation gage 263 km upriver from the VS (Fig. 2b). Despite the SDE being nearly twice as large, the magnitude of change on the Amazon allowed for a much better fit due to the large interannual variability of the Amazon flood wave (> 10 m). Most of the error was from times of low flow near the ends of the calendar year in 2009, 2011, and 2012. There are six in situ gages on the Amazon River. Across these comparisons, this VS had median statistics of 0.94 m, 0.95, and 0.98 for SDE, NSE, and R, respectively.

GRRATS evaluation across all rivers
We compared GRRATS against in situ evaluation data on a total of 12 rivers. This provided evaluation of 380 of the 920 virtual stations (42 %). On each river, the total number of time series evaluations was the product of the number of VSs and the number of gages (Fig. 1). Thus, the total number of time series evaluations (summed across all 12 rivers) was 1915 (Table 1). A total of 72.5 % of the quantitatively evaluated virtual stations had an NSE greater than 0.4 when compared with at least one gage. The highest maximum NSE (Fig. 3a) was 0.98, from an Envisat VS in the upper reaches of the Amazon. The median value for maximum NSEs for all VSs was 0.75 (0.67 from closet gage comparison Fig. 3c). A total of 341 of the 389 (87.7 %) virtual stations had a maximum NSE > 0 (Fig. 3a) .The highest median NSE ( Fig. 3b and values were 0.96 at two Envisat VSs on the Orinoco river (lower and middle). A total of 277 of 389 (71.2 %) had a median NSE > 0.  The smallest minimum SDE (to two significant digits) was 0.11 m and occurred at an Envisat VS on the upper Congo. The median value for minimum SDE (Fig. 3d) for all VSs was 0.93 m (1.08 m from closest gage comparison Fig. 3f). The minimum and median values for median SDE (Fig. 3e) were 0.31 m and 1.3 m respectively. Our SDE error statistics are greater than previous work reporting accuracies ranging from 0.14 to 0.43 m for Envisat data and 0.19 to 0.31 m for Jason-2 data (Frappart et al., 2006;Kuo and Kao, 2011;Papa et al., 2012;Santos da Silva et al., 2010). This discrepancy is likely because GRRATS includes VSs on rivers where evaluations have not previously been reported in the literature and because of the fact that we do not fine-tune processing or filtering to each VS due to the global nature of the dataset.
Some locations with relatively low SDE values showed poor performance in terms of NSE, particularly for rivers with relatively low water elevation variability. VSs on the Saint Lawrence River had a minimum SDE ranging from 0.58 to 3.27 m. The VS with a 0.58 m SDE corresponded with a maximum NSE value of −0.27, indicating quite poor performance in resolving river variations (standard deviation of 0.35 m). The Saint Lawrence River is anomalous in other ways as well. For two potential VSs (one each from Jason-2 and Envisat), the unprocessed data (ICE-1 retracked GDR data) showed a bias of several tens of meters above the baseline height, and thus no data for these VSs are included in GRRATS. Closer examination of these VSs seems to indicate that the onboard tracking window was often tens of meters outside of the river surface range, making retrievals from the surface impossible. This case is particularly odd as such errors are not expected for wider rivers; the Saint Lawrence River is between 2 and 7 km wide where we sampled it. Such errors are more commonly associated with altimeter returns from near-river topography on narrow rivers (Biancamaria et al., 2017;Frappart et al., 2006;Maillard et al., 2015;Santos da Silva et al., 2010). Moderately poor performance from the remainder of VSs in terms of NSE and SDE on the river is likely due to the river lacking enough variation in height to allow for retrieval of a good signal outside the error range of radar altimeters. However, these low-variation data can still be quite useful to modelers for determining if their results show excessive change in the annual cycle of water elevations.
The median of the maximum R values (Fig. 3g) for each station is 0.9 (0.87 from closest gage comparison Fig. 3i).
The maximum R value plot shows left skewness, similar to the NSE results. The lowest maximum R value of −0.15 occurred at an Envisat VS on the middle Saint Lawrence River, which was the only virtual station to display a negative correlation. The best maximum R value was 0.99 for an Envisat station near the mouth of the Ganges River that also displayed high NSE and low SDE. The median value of the median R (Fig. 3h) is 0.69. The values range from −0.18 (an Envisat VS on the lower Saint Lawrence River) to 0.99 (an Envisat VS on the lower Brahmaputra).
For 27 of the 39 rivers in the GRRATS dataset, no in situ data are available for evaluation. We gave the remaining 27 rivers qualitative letter grades based on number of missing data points, obvious outliers, and agreement with nearby stations. These grades are included with the data for end users ( Table 2). The majority of rivers evaluated this way fall into the B or C category (∼ 61 %), with only ∼ 15 % getting an A rating.

Towards quantitative performance prediction
As is evident above, radar altimeter performance varies dramatically across rivers and across VSs. Generally, measurements from wide rivers without large topographic features in the altimeter footprints that have large seasonal water elevation variations tend to result in better altimeter performance. In order to identify conditions that may contribute to poor return quality, we compared both VS width and percentage of original returns postfiltering, near-river topography, and river height variation with all three fit statistics. We found no statistically significant relationships in this evaluation, a finding that supports existing literature on quantitative prediction of altimeter performance (Maillard et al., 2015). Indeed, we found many examples of counterintuitive performance in our examination. The Saint Lawrence River (described above) is an example of unexpectedly poor performance; typical predictors such as width (smallest VS ∼ 1.5 km wide) and the lack of extreme proximal topography led to an expectation of accurate performance that was not met. Meanwhile, other rivers defied the normal pattern by showing good fit metrics while being far narrower. The Mississippi River was consistently at our lower limit for river width. The VS widths ranged from 509.1 to 2 608.0 m and had an average width of just 955.3 m. The average near-river relief ranged from 10 to 60 m. The Mississippi maximum NSE values ranged from −0.22 to 0.96, with an average of 0.43. Minimum SDE values ranged from 0.34 to 2.22 m, with an average of 1.18 m. Additionally, we computed average error statistics across all VSs along each river. Some rivers stood out as particularly good or poor performers (Table 3), but no broad geographical patterns emerged. For this reason, we recommend using the median (dataset wide) value for the evaluated SDE (0.93 m) as an error estimate for VSs without evaluation data, as this is representative of 42 % of all of the VSs in the dataset. While we do not provide error estimates at the individual data point level, we suggest that individual VS data point errors be treated as the SDE of the time series they are a component of.

Comparison to other altimetry datasets
While it is outside the scope of this study to compare GR-RATS exhaustively with existing datasets, we find it appropriate to demonstrate that our dataset is comparable. Therefore, we compared three VS locations that are in each of the four datasets discussed (one on the Amazon, Congo, and Brahmaputra). Figure 4a-c show time series anomaly at each VS and the closest gage. Note that time series lengths are limited to the shortest time series in the comparison and do not match the coverage of any particular mission. Also note that River&LakeNRT data were unavailable for the VS location shown on the Brahmaputra. GRRATS, DAHITI, and Hydroweb are similar and fit with the in situ gage well (Table 4). DAHITI is missing data on the Amazon time series. HydroSat and River&LakeNRT are frequently out of phase, particularly on the Amazon River (Fig. 4a). Performance is similar on ungaged rivers when compared (Fig. 5). GRRATS and DAHITI showed good agreement on the Paraná River (Fig. 5a). HydroSat and Hydroweb (Fig. 5b-c) are differentiated from GRRATS on the Ob and Lena rivers, as they show heights from a frozen river that GRRATS flags and removes. During overlap, HydroSat and GRRATS were similar at the Ob River VS. Hydroweb data on the Lena is similar to GRRATS, with the exception of the 2006 peak flow, which is missing. Note that much of the rising limb is missing in these time series as it occurs during times of ice cover. Unfiltered data and ice flags are available to data users if needed. This process demonstrated that our quasi-automated methods produce a dataset with global coverage and performance that approximates the accuracy of regional altimetry datasets.

Data availability
GRRATS (https://doi.org/10.5067/PSGRA-SA2V1) for noncommercial use only (Coss et al., 2016). Data are provided in NetCDF format. For a file content description please see Appendix A. An interactive map of the data is located at http://research.bpcrc.osu.edu/grrats/ (last access: 6 November 2018; Gou, 2017). This tool is intended for exploration only and may not reflect the most-up-to-date version of the data. As with Fig. 2, error bars represent the range of the values that were averaged to generate each data point (does not include filtered data points). Data necessary to compute error bars are a part of the data product.

Conclusions
We find that uniform altimeter data processing produces usable data with accessible documentation for end users. Encouraging end user understanding of how these kinds of data are produced is critical in fostering its use across the scientific and stakeholder communities. GRRATS considers only ocean-draining (highest order) rivers, while other datasets in-  clude some VSs on large tributaries. However, our use of the GRWL dataset allowed for a comprehensive selection of altimeter crossings on a global scale. These features should enable broad use by the scientific community. This resulted in GRRATS having the best coverage available for North American rivers as well. We produced GRRATS with ease of use in mind. VS metadata are included and the product can be downloaded in bulk.
On the whole, the median value of the error standard deviation is 0.93 m, which is similar to or slightly larger than values reported for the rivers that are most commonly studied using radar altimetry (e.g., the Amazon and Congo). Our philosophy in constructing the dataset was to maximize the spatial coverage of altimeter crossings, to construct the product in a uniform way, and to provide an evaluation of quality for each VS. Thus, users can decide whether each VS is useful given their data needs. Note that a total of 77.2 % of virtual stations evaluated against in situ data had an NSE > 0.4. Our uniform production method allowed us to evaluate whether river width or the height of bluffs proximal to rivers at altimeter crossings correlates with altimeter performance, as was expected in the literature. However, we were unable to identify a predictive model for altimeter performance and leave this exercise for future work.
The GRRATS dataset maximizes traceability: all of the information needed to reprocess these VSs is included in the final data product. It is our expectation that other researchers could implement other methods of filtering and processing to achieve derived data products tailored to their applications.  Table A1. The global variables are longitude and latitude of the center of the virtual station, the virtual station ID, the satellite name, flow distance, sampling rate, the satellite pass number and a suite of fit statistics, or a qualitative letter grade. Qualitative letter grades were assigned based on amount of data points, seasonal pattern, and similarity to nearby VS. This was done only when validation data were unavailable. When validation was possible, the VS was evaluated with all gages on the river through relative height comparison. Maximum Nash-Sutcliffe efficiency (NSE), average NSE, maximum R (correlation coefficient), minimum standard deviation of error (SDE), and average SDEs are reported.  Table A2. This includes the data from each return: long and lat, the height of the water level in meters, the signal strength, sigma0, in decibels, a "peakiness" value, the cycle number, the time of the return, and filter flags that signal 1 for data that should be included and 0 for data that should be excluded. The flags are for a height filter, an ice filter, and the logical intersection of the two (allfilter), with 1 denoting returns that pass through the filter and 0 denoting returns that do not.  Table A5. These are the filter data; nNODATA gives the number of cycles that have no data because of a lack of data in the GDR and/or data that are filtered out. riverh gives the river elevation extracted from a 30 arcsec DEM of the region. This is used for the height filter. maxh and minh are the upper and lower bounds of river heights included in the filtered data; we set an elevation of +15 m or −10 m from the DEM river elevation as a first pass, and we then removed any data that was 5 m below the 5th percentile of river stage heights. icethaw and icefreeze are the thaw and freeze dates, respectively, for the years included in the altimetry dataset. The DEM used refers to the DEM that the baseline height was taken from.

Filter
Variable Dimension Data type Units Name nNODATA -int32 count number of cycles without data riverh Z double meters above EGM2008 geoid river elevation from filter file maxh Z double meters above EGM2008 geoid max elevation allowed by filter minh Z double meters above EGM2008 geoid min elevation allowed by filter icethaw T double days since 1 Jan 1900 00:00:00 thaw dates for river icefreeze T double days since 1 Jan 1900 00:00:00 freeze dates for river DEMused DEM char -DEM used in height filter Author contributions. SCo developed and finalized processing algorithms, performed the methods exploration, was the primary data manager, analyzed the final product, and finalized the manuscript. MD developed algorithms, performed the quality analysis, and provided editorial and graphical assistance. YY performed the GDR extraction and geodetic corrections and was the primary unprocessed data manager. YJ performed the methods exploration. QG performed the methods exploration. ST developed algorithms and performed the methods exploration. CKS provided technical expertise and editorial assistance. GHA and TP provided access to the GRWL width data used to find the virtual station targets. SCa provided technical expertise, in situ data, and editorial assistance.