HydroSat: a repository of global water cycle products from spaceborne geodetic sensors

. Against the backdrop of global change, both in terms of climate and demography, there is a pressing need for monitoring the global water cycle. The publicly available global database is very limited in its spatial and temporal coverage worldwide. Moreover, the acquisition of in situ data and their delivery to the database are in decline since the late 1970s, be it for economical or political reasons. Given the insufﬁcient monitoring from in situ gauge networks, and with no outlook for improvement, spaceborne approaches have been under investigation for some years now. Satellite-based Earth observation with 5 its global coverage and homogeneous accuracy has been demonstrated to be a potential alternative to in situ measurements. This paper presents HydroSat as a repository (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) database of global water cycle products from spaceborne geodetic sensors. HydroSat provides time series and their uncertainty of: water level from satellite altimetry, surface water extent from satellite imagery, terrestrial water storage anomaly from satellite gravimetry, lake and reservoir water storage anomaly from a combination of satellite altimetry and imagery, and river discharge from either satellite altimetry or imagery. These products can contribute to 10 understanding the global water cycle within the Earth system in several ways. They can act as inputs to hydrological models, they can play a complementary role to current and future spaceborne observations, and they can deﬁne indicators of the past and future state of the global freshwater system. The repository (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) HydroSat (cid:58) is publicly available through http://hydrosat.gis. products come with an improved temporal resolution relying multi-mission

.6292/ 2135.75 1 Nodal day The OLTC was first implemented on the Poseidon-3 altimeter onboard the Jason 2 satellite to improve the on-board tracker through properly setting the reception window of the return echoes. This can be realized by OLTC tables consisting of the overflown surface elevation (Le Gac et al., 2019) Such an adjustment helps the altimeter to successfully track inland water 95 bodies, especially within rough topography. After the successful experience with Jason-2, OLTC was also implemented in SARAL/AltiKa, Jason-3 and Sentinel-3. On the other hand, inland water monitoring using satellite altimetry has benefited extremely from the Delay Doppler technique or the concept of Synthetic ApertureRadar (SAR), first used in CryoSat-2. In the delay Doppler technique, after delay compensation, the height estimates are sorted by Doppler frequencies and integrated in parallel, thus accumulating more looks than a conventional altimeter and a relatively small along-track footprint (ca.

100
) (Raney, 1998). Sentinel-3A was the first mission to operate in SAR mode and in OLTC nearly globally.
Inspired by recent developments, the abundance of altimetry missions, and the SWOT mission in view, the development of repositories and services to provide inland water level time series to supply data for Earth system understanding, hydrologic cycle monitoring, and hydraulic studies is becoming more important than ever. Table 1 lists currently available websites or repositories for providing water level time series of inland water bodies.   (Coss et al., 2020). Within a rather unprecedented framework, the 120 open source web application, AlteEx allows for exploring altimetry datasets and generating water level time series on-the-fly.
All measurements inside a lake or reservoir, for instance, could belong to the same virtual station. This is to assume that the along-track geoid height and altimetric corrections are properly known. In case of a river, the assumption further implies that the spatial dynamics of the water body is locally negligible. Defining a VS, in effect, allows for reducing random noise carried 180 by the altimetric measurements.
HydroSat follows a rather flexible approach in defining a VS. The boundaries can be determined using static or dynamic shapefiles, a radial extent around a specific point of interest, or an intersection of the two. Furthermore, HydroSat employs auxiliary sources of information like the water occurrence frequency derived from Landsat imagery by Pekel et al. (2016). This allows for removing measurements over areas with a water occurrence frequency below a certain threshold. Figure S1 shows 185 the block diagram for generating SR water level time series, where the properties of the VS is the basic setup information to be introduced. Defining the VS setup is in fact the only subjective choice made throughout the whole procedure. Nevertheless, flexibility in defining a VS is tolerated as long as reproducibility of the results is guaranteed.
The altimetric water level is initially determined for each sample within the VS. First, range measurements are corrected for geophysical effects (solid earth tide and pole tide) and path delays caused by the atmosphere (wet tropospheric, dry tropospheric 190 and ionospheric). The water level is then calculated by subtracting the corrected range from the satellite altitude. In the next step, the reference height is changed to geoid according to static gravity field models from XGM2019e (Pail et al., 2018), EGM2008 (Pavlis et al., 2012), or EIGEN6C3 (Förste et al., 2012). To ensure a robust estimation at each overpass, the median of orthometric heights inside the VS is chosen to be the representative height.
It is important to notice that regardless of the choice of the VS, retracker, corrections, and statistics by which a typical 195 altimetric water level time series is generated, the result may be affected by many outliers. As already mentioned, missions supporting OLTC and SAR mode are less likely to provide erroneous range measurements. However, neither the OLTC nor the delay/Doppler concept are capable of compensating for all unwanted radar processes over inland water bodies. Their usefulness may even be restricted by a number of known factors -e.g. crossing angle for SAR missions. An outlier identification algorithm is, therefore, required to clean the final water level time series from erroneous measurements.

200
Flowchart of obtaining SR altimetric water level time series for a single virtual station.
In order to identify the outliers, HydroSat uses an automated, data-driven outlier identification methodology designed within an iterative, non-parametric adjustment scheme. As indicated in Figure S1, the inputs to the algorithm are the water level time series and stochastic information derived from a number of outlier indicators (e.g. standard deviation of along-track estimates within a VS). The algorithm uses Singular Spectrum Analysis (SSA) for gap-filling, Savitzky-Golay filtering for 205 smoothing, and a specially developed outlier identification method (Behnia et al., 2021b). The identification method benefits from applying a local kernel derived based on a local definition of an outlier. The identified outliers are not discarded instantly.
The overall scheme allows for a possible correction of the outlying estimation through a retracking method based on Leading Edge Identification with Prior Information (LEIPI) (Behnia et al., 2021b) ::: that :::::::: identifies ::: the :::::: leading ::::: edge :: by ::::::::: benefiting ::::: from :::: prior :::::::::: information. Such possibility is assessed via comparing the single water level estimations inside the VS with the water 210 level estimation coming from an ensemble model. The model, which is a by-product of the outlier identification algorithm, is used to bound the search area for identifying the true leading edge of the waveform. After retracking, the newly estimated heights are verified by their similarity to the along-track pattern of outlier-free cycles. It is important to notice that after rejecting or correcting the outlying measurements, HydroSat does not low-pass filter the water level time series. It can therefore fully capture the high frequency behavior within the limitations of satellite sampling.

230
However, this strategy may come at the cost of a higher error level for some stations, e.g. the Mississippi river. A SR water level time series is the basic altimetry product of HydroSat. It is the input to algorithms which provide HR water level time series over lakes, reservoirs, and rivers.

235
2.3 High-Rate water level time series from satellite altimetry 2.2.1 ::::::::: High-Rate ::::: water ::::: level :::: time ::::: series ::::: from ::::::: satellite :::::::: altimetry In order to obtain a HR product and cope with the limitation of temporal sampling of single-satellite inland water monitoring, multi-mission altimetry is applied. The multi-mission altimetry for lake monitoring is now a standard approach practiced by various studies and data providers (Crétaux et al., 2013). Assuming that a lake surface is an equipotential surface, allows 240 to perform even calibration studies over lakes. However, for multi-mission altimetry one challenge is posed by the intersatellite biases, which impedes a straightforward combination of water level measurements. Moreover, inaccurate atmospheric corrections (wet tropospheric) may cause large biases of several decimeters, which is even more pronounced for rivers due to inhomogeneous neighboring topography (Fernandes et al., 2014). Unlike lakes, multi-mission studies over rivers are very limited. Only a few studies have been dedicated to water level monitoring of rivers using a multi-mission approach with the 245 focus on improving the temporal resolution Boergens et al., 2017). Here the challenge is to combine measurements from different missions at different locations with dissimilar dynamic behavior and hydraulic parameters. In general, HydroSat follows two different approaches for obtaining the HR product over lakes and rivers, outlined in the following sub-sections.
2.2.2 High-Rate altimetric water level over lakes and reservoirs 250 2.2.2 ::::::::: High-Rate ::::::::: altimetric ::::: water :::: level ::::: over :::: lakes :::: and ::::::::: reservoirs HR lake and reservoir level time series are provided based on the well-known multi-mission concept. As already mentioned, however, the integration of single water level time series is hampered by unknown biases, typically referred to as inter-satellite biases. Studies have been performed to tackle the problem, some targeting the specific case of lakes and reservoirs. Crétaux satellite biases based on a maximum likelihood approach either for two missions with overlapping periods or by introducing ICESat-2 as intermediary for non-overlapping ones.
HydroSat resolves biases in a relative, generic, regional and mission independent manner (see Figure S2). First, SR water level time series are categorized into groups of temporally overlapping and non-overlapping time series. For a group of overlapping time series, relative biases are estimated by minimizing a cost function for the merged time series. The cost function 265 represents the difference between the power content of individual SR time series. If stationarity holds, minimizing such a cost function ensures the estimation of a correct relative bias. If stationarity does not hold, or in case any unresolved bias remains (scenario 2), remotely sensed surface area time series are used to act as an anchor of biased time series, allowing for estimation of the relevant biases. Here, a 2D cost function in surface area-water level coordinates is minimized within a Gauss-Helmert adjustment scheme. Such a cost function is also used in case measurements have no overlap (scenario 1), leading to bias removed 270 SR lake water level time series. It is important to notice that relative biases are not necessarily estimated between missions but between time series. This allows for considering the inaccuracies of geoid or altimetry corrections within one lake and over different tracks (Behnia et al., 2021a).
Flowchart of bias correction for obtaining SR and HR altimetric water level time series over lakes.
A relative solution is preferred because absolute estimation of biases requires along-track in situ measurements. The absolute 275 inter-satellite biases over a specific region, on the other hand, are not necessarily applicable elsewhere due to inhomogeneity of correction models at global scale. It shall also be mentioned that a great portion of lakes and reservoirs are monitored by a few altimetry missions, often times with less than sufficient periods of overlap. For instance, the long-term, overlapping, 10-day-revisiting Jason series, only monitor a small number of lakes given their coarse ground-track pattern. Our proposed method is therefore designed to be least affected by these restricting conditions. 280 Figure 2 shows 2.2.3 ::::::::: High-Rate ::::::::: altimetric ::::: water :::: level ::::: over ::::: rivers : HR water level time series over a selection of five lakes and reservoirs of very diverse characteristics. While lake water level time series from HydroSat, DAHITI, Hydroweb, and G-REALM are in acceptable agreement with one another and with in situ measurements, some differences are noticeable. Over lake Erie, for instance, HydroSat better captures measurements at the 285 tails of the water level distribution meaning that the actual fluctuations in lake level are better presented. This is due to the fact that besides outlier rejection, as described in Section 2.2.1, no further smoothing is applied. The HR water level time series of lake Urmia signals yet another difference. All altimetric time series seem to have captured the long-term depletion of the lake level, followed by the recent and ongoing restoration period (Saemian et al., 2020). The majority, however, overestimate the lake level between late 2010 and early 2019. The altimetric measurements during this period are mainly from Jason-2 290 and Jason-3 missions. The satellite's ground track happens to cross the shallow south east of the lake. During this 9-year low water period, the altimeters have measured the range to the salt pan that remain in the southern part after the lake desiccation.
Excluding such a measurement without incorporating auxiliary sources of data is rather impossible. HydroSat can deal with this issue within the inter-satellite bias estimation ( Figure S2) by taking the surface area into account. Any SR water level time series that exhibits inconsistency with an expected behavior fails to contribute to the HR lake water level time series. altimeters along a river hydraulically and statistically. As an example, Figure S4 shows individual SR time series from different satellite missions along the Weser River in Germany. The idea of a HR product is to combine all these measurements for an arbitrary location along the river into a time series with improved temporal resolution. Water level estimates from different satellite altimetry missions along the Weser River in Germany. Figure S3 shows the main processing steps of the densification method. Initially, the time lag due to streamflow between 305 the altimetric virtual stations and the selected location is estimated. The average river width from satellite imagery together with the slope derived from satellite altimetry are used as inputs to a simple empirical hydraulic equation, which ultimately estimates the average flow velocity and thus the time lag Bjerklie et al., 2003) Flowchart of obtaining HR product over rivers through densification of individual altimetric water level time series along a river 310 Using the estimated time lag, the water level hydrographs of all measurements are shifted and stacked at the selected location.
The stacked time series is then normalized according to its statistical distribution with the 3rd percentile assigned to 0 and the 85th percentile to 1. Outliers are then identified and removed from the normalized time series by a Student's t-test for a sliding time window of one month. All measurements outside the confidence limit are identified as outliers and removed from the measurements. The outlier-free normalized time series is then rescaled according to the statistical water level distribution of 315 the selected site. Details on the implementation of the method can be found in . correlations are reported. It is worth mentioning that the Weser is a river with an average width of ca. and a maximum of ca.

320
, over which the water level agrees with the discharge with a rank correlation coefficient of 0.71. Similarly, the Vistula River with its average width of is an utterly challenging river for satellite altimetry. However, the HR water level time series shows a rank correlation coefficient of 0.65 with in situ river discharge. Note that the large uncertainty in the HR time series is due to the large discrepancy between measurements from different satellite missions along the river (see Figure S4). 3 Surface water extent from satellite imagery 330 Surface water storage is an important component of the hydrological cycle and its accurate monitoring requires a realistic representation of the surface water extent . Lack of such observations, however, has obscured the proper quantification of the freshwater storage and its spatio-temporal dynamics over many water bodies. With their global coverage and fine temporal resolution, satellite images provide an opportunity to monitor the surface water extent on a global scale and for almost all river basins. To this end, attempts have been made to generate dynamic water masks from different spaceborne 335 missions with various temporal and spatial resolution. Some studies (Klein et al., 2017;Zhang and Gao, 2016;Khandelwal et al., 2017) take advantage of MODIS images to generate time series of surface water mask with a fine temporal resolution.

High
Due to the coarse spatial resolution of MODIS images, however, small water bodies are excluded from their dataset. On the other hand, some studies (Donchyts et al., 2016;Pekel et al., 2016;Schwatke et al., 2020) use Landsat images to generate time series of surface water bodies. In comparison to MODIS, Landsat images have a better spatial resolution (30 m). Nevertheless, 340 their coarse temporal resolution is a limiting factor for monitoring the fast dynamics of the water bodies.
Similar to altimetric water level time series, Hydroweb is the first website to provide lake surface area time series (Crétaux et al., 2011). Recently DAHITI  has also boosted its database by the time series of lake area from optical satellite images (Table 2). Moreover, since recent years the Bluedot observatory provides reliable and timely information about surface area of lakes and reservoirs based on Sentinel-2 imagery, globally.

Canyon Ferry Lake
McConaughy Lake Harlan County Lake Mark Twain Lake Rathbun Lake Stockton Lake Arkabutla Lake Barren Lake Dale Hollow Lake

380
Hence, in addition to spatial correlations, strong temporal correlations can be used as an additional source of information.
In order to derive a river mask, Elmi et al. (2016)

estimate a Maximum A Posterior solution of a Markov Random Field
(MAP-MRF), in which the spatial interactions between pixels and temporal variation of the pixel values is considered. Figure   S5 (top panel) presents the block diagram of the proposed method. First, the cloud-covered images are removed. Initial water masks are then generated by applying a dynamic threshold. The procedure continues by developing the joint conditional models, rearranging the problem as one of energy minimization, and developing a graph. In the next step the MAP solution is found by applying the graph cuts technique. Using the MAP solution, the initial water masks and the frequency coverage map are updated, allowing for modifying the already developed graph. The final river mask is then obtained by finding the MAP solution for the modified graph. The uncertainty of the derived masks are estimated by marginalizing the final residual graph Elmi, 2019)) :::::::::::::::::::::::: Elmi, 2019).

390
Flowchart of the proposed method for generating time series of surface water extent from MODIS images (top pannel).
Flowchart of the proposed algorithm for generating time series of surface water extent from Landsat-based GSWD product (bottom panel) Although MODIS provides homogeneous daily snapshots of the Earth's surface for more than 20 years, its coarse spatial resolution is a limiting factor for generating dynamic river masks of relatively narrow river reaches and small lakes. To tackle from this dataset, the algorithm described in Figure S5 (bottom panel)is developed . The algorithm performs 405 similar steps as for MODIS relying on the GSWD masks instead of the original images.
Using the described algorithms, HydroSat provides surface water extent time series of lakes, reservoirs and river reaches from optical satellite images. From the obtained river reach area an effective river width can be determined by dividing the area by the length of the river reach. Figure 4 shows time series of surface water extent for rivers and lakes derived from Landsat imagery in the Mississippi River Basin.

410
Time series of river reach area of selected river reaches in the Mississippi River basin are in the middle panel.
To analyze the performance of the algorithm, we compare the time series of monthly river masks from 9 river sections (average length of ) in the Mississippi River basin with in situ discharge measurements from nearby USGS stream gauging stations. In all river reaches, we observe a relatively high rank correlation. However, in some river sections, as in Figure 4(g, f), the sections are too narrow (about 25 and ), even for Landsat imagery with spatial resolution.

415
For lakes and reservoirs (second set of time series in Figure 4), the time series of surface water area are compared with altimetric water level time series. For this comparison, 9 small to medium-sized lakes or reservoirs in the Mississippi River basin are selected. In general, water level time series and surface water area show a good agreement represented by the reported rank correlation coefficients. For water bodies with smaller size, it is expected that the rank correlation decreases. As shown in Figure 4, the rank correlation falls off to 0.57 and 0.61 for Rathbun and Barren Lakes with average areas below and .

Water storage anomaly from GRACE and GRACE Follow On
Global observation of total water storage change is vital for understanding the water cycle and climate system dynamics.
The variations of water storage indirectly reflect the Earth's energy storage, ocean heat content, land surface water storage, biogeochemical, and ice-sheet response to global warming (Tapley et al., 2019;Famiglietti, 2004). Water storage variation, 425 both globally and regionally, influences our societies as it affects agricultural, industrial, and domestic water use. ::::::::: Moreover, :: the ::::::::: dynamics :: of ::: lake :::: and ::::::: reservoir ::::::: storage : is :: a ::: key :::::::: parameter :: in :::::: studies ::::: about ::: the :::::: global :::::::::: hydrological ::::: cycle. : Nevertheless, for a long time, monitoring of :::::::: terrestrial ::: and :::: lake : water storage changes relied on insufficient site measurements, which was costly and time-consuming. Moreover, on continental scales, it was not possible to map water storage because of the sparseness of the station networks (Rodell and Famiglietti, 1999). Furthermore, measurements of water storage changes by gauging groundwater 430 level and soil water saturation changes are not reliable due to the lack of accurate storage coefficients (Strassberg et al., 2007;Riegger et al., 2012). Hydrological and land surface models have alleviated the problem to some extent. Such models estimate TWS and its components via simplifying real-world systems. The model outputs, however, are subject to high uncertainty and low accuracy due to the lack of global and systematic hydrological data (Jiang et al., 2014). (GFZ), provides quality-controlled data from Level-0 (KBR range data) to Level-3 (grids). Moreover, Level-3 data from the mascons approach in terms of TWSA can be accessed from UT-CSR, JPL, and NASA-GSFC, while the latter does not pro-vide GRACE-FO observations. Other than Level-3 data, several centers help to visualize GRACE TWSA. The JPL and GSFC mascons, for instance, can be visualized using the Mascon Visualization Tool from the University of Colorado Boulder, and 455 the basin-wise variability of TWS can be obtained from the Gravity over basins Information Service (GravIS) website. Furthermore, several data browsers allow the interactive retrieval of GRACE and GRACE-FO data, including the one developed within the International Center for Global Earth Models (ICGEM) project, the GRACE Plotter, and NASA data Analysis Tool. Table 3 lists the above mentioned centers and products, including HydroSat. Table 3. List of centers which provide Level-3 TWSA from GRACE and GRACE-FO.
Each monthly solution contains the full hydrological, cryospheric, and Glacial Isostatic Adjustment (GIA) signal in the form of fully normalized Spherical Harmonic (SH) coefficients, after removing the contributions from other phenomena like tides (ocean, solid earth, and atmospheric), atmospheric and non-tidal oceanic mass changes.
Furthermore, the monthly solutions are contaminated by noise from different sources including the high-frequency noise in the spherical harmonic coefficients due to the orbit geometry and sensor noise. In order to reduce the high-frequency noise and retrieve mass changes, Gaussian filtering with a radius of 400 km is applied (Wahr et al., 1998). Moreover, to correct for leakage due to filtering, existing methodologies are employed e.g data-driven approach proposed by Vishwakarma et al. (2017) and forward-modelling (Chen et al., 2013b). Finally, HydroSat corrects the GIA signal following (A et al., 2012). The

Lake and reservoir water storage anomaly
The dynamics of lake and reservoir storage is a key parameter in studies about the global hydrological cycle. The variation of 505 lakes and reservoirs ::::::: variation ::: of water height and surface area :: of :::: lakes :::: and :::::::: reservoirs : have been successfully monitored using spaceborne measurements, as demonstrated in previous studies. Table 4 lists some of these studies providing either time series of water volume change or lake and reservoir bathymetry. Table 4. Overview of studies proving ::::::: providing : time series of lakes and reservoirs water volume variations Study :::: study Water :::: water level Surface ::::: surface area Remark ::::: remark ::::::::::::: The mentioned studies follow almost the same strategy to generate water volume anomaly time series or a bathymetry map. In these studies, after collecting the simultaneous surface water area and level measurements, the empirical relationship 510 between lake water level and area is developed. Then the water volume variations are estimated using by the so-called end-area formula or pyramid formula.

525
The algorithm starts by acquiring the water area from satellite imagery (explained in Section 3) and the time series of the water level (explained in Section 2). To derive the time series of the water volume anomaly and also the water level-area-volume model, the following steps are performed: -Generating monthly water level time series with the same temporal sampling as the surface area time series -Creating the scatterplot of the simultaneous surface water area and level and removing the blunders in the scatterplot

530
-Defining the water surface area-level model through either parametric and non-parametric approaches -Estimating the water volume variations by assuming that between two successive pair of measurements (water level H and lake area S), the lake morphology is regular and has a pyramidal shape, (Abileah et al., 2011). ::::::::::::::::::::::::::: ::::::::::::: (1) -Obtaining the water area-level-volume model 535 Flowchart of obtaining water volume anomaly using surface water area from satellite imagery and water level time series from satellite altimetry Figure 6 shows time series of water volume anomaly for some small, medium and large lakes with different climate characteristics. A statistically representative joint time period for altimetry and imagery data is crucial to obtain a reliable estimate of water volume. For example Tana, Sobradinho, and Dale Hollow Lakes, simultaneous surface water area and water 540 level measurements are available for more than 10 years, resulting in a reliable area-volume model. Over the Arkabutla Lake and Barren Lake, the joint time period is rather short but representative as it covers the entire statistical distribution of both variables. On the other hand, the time series of lake volume anomaly of McConaughy, Harlan County, Rathbun and Mark Twain lakes carry mismodelling error as their joint time periods are short and non-representative of the entire distribution.
Time series of water volume change of selected lakes and reservoirs. The location of these lakes can be found in figures 2 545 and 4

River discharge from space
Monitoring of river discharge, defined as the volume of water passing a river section in a given time, is a critically important part to understanding a broad range of science questions focused on hydrology, hydraulics, biogeochemistry and water resources management. Especially the river discharge quantification in ungauged basins anywhere and anytime is the holy 550 grail of hydrology. River discharge reflects the drainage basin dynamics and affects environmental conditions like currents and hydrography in coastal waters; it is a function of precipitation and meteorological elements controlling evapotranspiration, geology, relief, and vegetation.
Despite its importance, the publicly available in situ river discharge database has been declining steadily over the past decades due mainly to economic and political reasons. From about 8000 (pre-1970), the number of available gauging stations 555 has decreased to less than 1000 (around the year 2015) (Lorenz et al., , 2015Tourian et al., 2017b).
Given the insufficient monitoring from in situ gauge networks, and without any outlook for improvement, spaceborne approaches come to the rescue. Satellite-based Earth observation with its global coverage has been demonstrated to be a potential alternative to in situ measurements. In future, the SWOT mission with wide-swath altimetry is expected to attain global river discharge given its unprecedented temporal resolution, spatial coverage and the synchronous availability of river height, width 560 and slope (Biancamaria et al., 2016;Durand et al., 2016).
5.1 :::::::: HydroSat :::::::: products ::: for ::::: river :::::::: discharge In HydroSat, discharge estimates are available from both altimetric river water level and imagery-based effective river width also in the two modes of Standard-Rate (SR) and High-Rate (HR). While the SR product relies on standard temporal resolution of the spaceborne data, the HR data comes with a higher temporal resolution through an assimilation process. To the best of 565 our knowledge, there is no repository or website providing similar space-based river discharge estimates.
5.2.1 ::::::::::::: Standard-Rate :::: river ::::::::: discharge 590 Standard-Rate river discharge Q at selected gauges can be estimated from space-based water level H (or river width W ) measurements using an empirical relationship between river height (or width) and discharge (Kouraev et al., 2004;Birkinshaw et al., 2010). The most common form of this relationship is the so-called rating curve Q = F (H) or Q = F (W ). Rating curves are conventionally generated for simultaneous measurements of space-based water level (or river width) and in situ river discharge. Once the model is developed, the discharge can be determined from water level or width measurements. The 595 restriction however remains that for deriving a rating curve, simultaneous measurements are required, meaning the availability of in situ measurements during the satellite era.
Globally a great portion of existing gauges are not active during the satellite era although they provide a wealth of legacy data. For such gauges, Tourian et al. (2013) suggest a statistical approach based on quantile mapping of in situ discharge and altimetry water level measurements. Since the quantile functions of discharge and river water level (width) have a same x-600 axis (cumulative probability), it is possible to connect their y-axis directly and obtain F (.). As this approach does not involve the time coordinate explicitly, the requirement for synchronous datasets is obsolete. This is to say that the pre-satellite river discharge data records can be salvaged and turned into usable data for the satellite altimetry or imagery time frame.
The method is further improved by Elmi et al. (2021b) to infer a nonparametric model for estimating the river discharge and its uncertainty. The algorithm employs a stochastic quantile mapping function scheme by iteratively 1) generating realizations 605 of river discharge and height (width) time series using a Monte Carlo simulation, 2) obtaining a collection of quantile mapping functions by matching all possible permutations of simulated river discharge and height (width) quantile functions, and 3) adjusting the measurement uncertainties according to the point cloud scatter. The flowchart in Figure S8 describes the procedure of SR discharge estimation using spaceborne river height or width.
Adopted from Elmi et al. (2021b): Flowchart of discharge estimation algorithm for generating the quantile mapping function 610 5.2.2 ::::::::: High-Rate ::::: river :::::::: discharge Figure 7 presents the discharge estimated from river width, using the stochastic quantile mapping function algorithm over four different river reaches along the Niger, Congo and Po rivers (Elmi et al., 2021b). High NSE values for the Niger River reaches (Figure 7 (a,b)) shows that the developed method can accurately estimate the discharge given that both discharge and width measurements have a representative statistical distribution in the training period. The performance of the method significantly 615 decreases over the Congo River reach (Figure 7 (c)) mainly because of the complex relationship between river width and discharge in this part of the river. The performance of the algorithm over the Po River reach is only minimally acceptable (NSE=0.13). Here, the discharge is not estimated accurately due to insufficient width measurements at high discharge and low signal-to-noise ratio of the same measurements. The obtained rating curve over the Po River, however, highlights the advantage of using a non-parametric model through the quantile mapping function compared to choosing a parametric model. With the 620 help of the proposed method, spaceborne discharge estimates can be obtained for all non-active gauges in politically-ungauged basins (see Gleason and Durand (2020)).
Adopted from Elmi et al. (2021b): Four river reaches defined as case studies (a)

High-Rate river discharge
HydroSat provides High-Rate (HR) river discharge time series based on the method developed by Tourian et al. (2017a) that goes beyond the conventional one-on-one relationship between virtual station (or reach) and (legacy) in situ station explored in SR discharge products. A multitude of altimetric discharge time series over a river network are used in this approach to 630 estimate time series of daily river discharge. This is fundamentally done via assimilating multiple altimetric discharge -the SR time series -and in situ measurements using a linear dynamic system. The dynamic system consists of a stochastic process model that benefits from the cyclostationarity of discharge. This model is informed by the covariance and cross-covariance generated out of old in situ data. The process model is then combined with observation equations fed by several altimetric and in situ discharge time series to form a linear dynamic system. Ultimately, the system is solved using the Kalman filter, 635 followed by smoothing the solutions using the Rauch-Tung-Striebel (RTS) scheme (Rauch et al., 1965). In fact, the Kalman filter produces an a posteriori discharge estimate with a likelihood function of discharge based on the available observations and the prior information derived from the stochastic process model. In case of an observation gap, the posterior estimates rely on the stochastic process model and its cyclostationary mean discharge. Figure S9 represents the flowchart for estimating the HR river discharge. The implementation details of the method can be found in Tourian et al. (2017a). Flowchart of estimation 640 of High-Rate river discharge using a Kalman filter approach Figure 8 shows an example of a HR river discharge time series over River Niger at Lokoja. The inputs to the dynamic system are the altimeric and in situ river discharges. During the period when the only source of observations is the in situ data (before 1992), the Kalman filter estimates a discharge that matches the in situ data with a relative RMSE of less than 2% (not visible in the figure). After 1992, when altimetric river discharge is available, the Kalman filter is less dependent on the in situ data, leading to a relative RMSE up to 50%. The validation over the 645 entire Niger basin and 22 gauges along the main stem show an average correlation of 0.9, an average relative RMSE, a relative bias of about 15%, and a NSE greater than 0.5 for 15 gauges.
Daily river discharge over the Niger River at Lokoja by assimilating altimetric discharge with available in situ data The proposed method is applicable in all river networks with available legacy in situ data. It allows for obtaining a smoothed daily discharge time series without data outages at any given location along a river.

650
6 Summary, conclusion and outlook The development of repositories and services to provide global water cycle products from spaceborne sensors is getting more attention than ever before, which is motivated by the urgent need for more hydrological evidence, the absence of perspective for improving in situ data, the existence of an abundance of satellite missions, the prospect of cutting-edge missions such as the SWOT mission, and also the promise of operational satellites in space. Such products support studies focused on understanding 655 the water cycle and the Earth system in general. HydroSat as a repository of global water cycle products provides estimates and uncertainty of the following water cycle variables: -surface water extent of lakes and rivers: HydroSat provides surface water extent time series of both lakes and rivers from optical satellite images. For generating dynamic water masks, region-based classification is employed, which benefits from the spatio-temporal behavior of pixel-intensity. This allows us to deal with the complexities in extracting 660 dynamic river masks. Moreover, such an algorithm setup allows obtaining a probabilistic water mask leading to an estimate for surface water extent uncertainty. While datasets of surface water extent variation over lakes are available from various data centers, HydroSat additionally provides time series of river width for major river basin. For a quality assessment, time series of surface water extent over lakes are compared with available in situ and or altimetric water level time series. Over rivers, such a quality assessment is predominantly done through comparing the time series against in 665 situ river discharge.
-water level time series of lakes and river: HydroSat provides water level time series of rivers, lakes and reservoirs in two modes, standard-Rate (SR) and High-Rate (HR), with their uncertainty estimates. For water level time series HydroSat uses an automated, data-driven outlier rejection methodology designed within an iterative, non-parametric adjustment scheme. The outlier-free measurements form the final time series without any further smoothing. While water level time 670 series over inland water bodies are available from similar data centers, HydroSat additionally provides HR water level time series over rivers through densifying individual SR time series along a river. For the HR products over the lakes, inter-satellite biases are removed through a hybrid approach by incorporating lake surface area information. The quality of water level time series is assessed through a validation against in situ water level or proxy data like river discharge, river width or lake surface area.

675
-terrestrial water storage anomaly: HydroSat provides Terrestrial Water Storage Anomaly (TWSA) time series and its uncertainty over global major river basins using GRACE and GRACE-FO observations. To estimate TWSA time series from level-2 data (shperical harmonics up to degree/order 96), the C 2,0 and C 3,0 are replaced and degree-1 is added from the corresponding SLR estimates. Moreover, ellipsoidal and GIA correction are followed together with a smoothing Gaussian filter with a radius of 400 km. The final TWSA in HydroSat are corrected for tidal aliasing error and leaked 680 signals. For the field product, the Gaussian filtering is applied together with a de-striping filter and leakage is corrected using forward modeling approach. The quality of TWSA time series is assessed through a comparison with two mascons products, CSR RL06 version 02 and JPL RL06 version 02.
-water storage anomaly of lakes and reservoirs: HydroSat provides time series of surface water storage anomaly for lakes and reservoirs using the time series of water level and surface area measurements. For developing the surface 685 water level-area-volume model, the scatterplot of simultaneous water level and area measurements is obtained. HydroSat performs an iterative data snooping procedure to obtain a reliable empirical relationship between surface water level and area. In this way, the quality of the obtained time series of water storage anomaly is ensured, since the non-representative measurements are eliminated.
-river discharge estimates for large and small rivers: HydroSat provides SR and HR river discharge estimates together 690 with their uncertainties. To obtain the SR products, HydroSat relies on a non-parametric quantile mapping approach that salvages gauging stations that are no longer updated with in situ measurements. Since no model assumptions are required under the non-parametric approach, the HydroSat discharge time series are less contaminated by a mis-modeling. For the uncertainty estimation HydroSat applies a stochastic quantile mapping function algorithm supported by a Monte Carlo simulation. The availability of enough SR time series over a river network allows approximating the spatio-temporal 695 dynamics of a river system by a linear dynamical system. The HR products are the solutions of such a dynamic system by Kalman filters obtained with up to daily temporal resolution at potentially any location along the river. For the quality assessment, SR and HR discharge time series are compared with in situ and spatial river discharges, water levels, and river widths.
Global hydrological modeling has been improved in terms of modeled processes leading to a better recognition of modeling uncertainties. Such recognition clearly signals that the modeling uncertainties are not reduced. To reduce the uncertainty one solution is indeed, to use the best available input data for modelling. Space-based data together with process knowledge allow a realistic modelling of water flows and storages in the different compartments. Moreover, spaceborne measurements can be 705 used to calculate indicators of the past and future state of the global freshwater system to assess risks under 1.5 • C and 2 • C global warming and support decision making (Döll et al., 2018).
In addition, variables of water level, surface water storage anomaly and surface water extent support downscaling of mass transport monitoring in time and space. The success of GRACE has created a new demand for scientists and decision makers for a sustained observation of the terrestrial water storage change (Pail et al., 2015). Although the utility of GRACE data has been 710 mainly limited to large catchments, understanding of water storage changes in regions with some local weak signatures play an important role within the Earth system and the sustainable development of water resources (Lorenz et al., 2015). Therefore, an urgent priority is to mitigate the spatial resolution limitation of GRACE through incorporating additional hydrological variables such as surface water storage and river and lake water level variation. This will improve in particular our understanding of the water cycle in many small vulnerable catchment areas with large populations.

715
Another added value of the HydroSat data is its complementary role to the SWOT mission. Over rivers, SWOT will estimate discharge from multiple algorithms as well as consensus values computed over multiple individual algorithms (Stuurman and Pottier, 2020). The majority of algorithms are Bayesian, relying on prior data. Hydrologic variables provided by HydroSat can effectively be used as potential prior information for each of the discharge algorithms through available water levels (SR and HR), river width from satellite imagery, and discharge estimates (SR and HR). Over lakes and reservoirs, our estimates 720 of surface water extent and volume anomaly will boost SWOT's estimates both in terms of temporal resolution and coverage.
This supports studies aiming to understand long-term behavior of lakes and reservoirs.

Data availability
The data are publicly available in the HydroSat database via http://hydrosat.gis.uni-stuttgart.de. All time series in this paper are assigned a number, the so-called HydroSat ID, by which the time series can be found in the HydroSat database via the search 725 field. In this database all data can be browsed, visualized and analyzed without registration. However, registration is required to download the data. A snapshot of all the data (taken in April 2021) is available in GFZ Data Services, which is accessible during the peer review process via https://dataservices.gfz-potsdam.de/panmetaworks/review/a250099b6bb14c162399cf78b1a3182b1e4420b2db8 (see Files) and will be published at https://doi.org/10.5880/fidgeo.2021.017 (Tourian et al., 2021).