GloFAS-ERA5 operational global river discharge reanalysis 1979–present

Estimating how much water is flowing through rivers at the global scale is challenging due to a lack of observations in space and time. A way forward is to optimally combine the global network of earth system observations with advanced numerical weather prediction (NWP) models to generate consistent spatio-temporal maps of land, ocean, and atmospheric variables of interest, which is known as a reanalysis. While the current generation of NWP models output runoff at each grid cell, they currently do not produce river discharge at catchment scales directly and thus have limited utility in hydrological applications such as flood and drought monitoring and forecasting. This is overcome in the Global Flood Awareness System (GloFAS; http://www.globalfloods.eu/,


Introduction
A key challenge in hydrology is estimating past, present, and future hydrological conditions in rivers around the world. This is largely due to severe temporal and spatial gaps in the global river discharge observing network. In many parts of the world, there are simply not enough long-term river discharge observations at high enough spatial density, and in the vast majority of countries hydrometric data are not available in real time (Lavers et al., 2019). The lack of observations is therefore a major barrier to our ability to provide monitoring and early warning of hydrological extremes such as floods and droughts, which have, for example, implications for progressing international disaster risk reduction (UNDRR, 2015). A way forward pioneered in the field of meteorology and climate has been to optimally combine in situ and satellite earth system observations, together with advanced numerical weather prediction (NWP) models, to generate a reanalysis of land, ocean, and atmospheric variables of interest, thus providing consistent spatio-temporal "maps without gaps" (Hersbach et al., 2020). Several global hydrological products have been developed that provide estimates of runoff or river discharge with a wide range of forcing and methodological approaches (e.g. Fekete et al., 2002; Döll et al., 2003; Qian et al., 2006; Sperna Weiland et al., 2010; Reichle et al., 2011; Yamazaki et al., 2011; Beck et al., 2017; Ghiggi et al., 2019; Lin et al., 2019). While these datasets can be used to understand past variability and change in the terrestrial hydrological cycle, they are currently not produced in an operational environment in near real time and so cannot be used for monitoring current global river conditions or providing initial conditions to hydrometeorological forecasting systems.
A long-term and near-real-time river discharge reanalysis is produced operationally as part of the Global Flood Awareness System (GloFAS; http://www.globalfloods.eu/), which bridges this gap. GloFAS is the global flood service of the European Commission's Copernicus Emergency Management Service (CEMS), an operational system for monitoring and forecasting floods across the world with over 5000 registered users. GloFAS was developed together by the Joint Research Centre (JRC) of the European Commission, the University of Reading, and the European Centre for Medium-Range Weather Forecasts (ECMWF). The system went preoperational in July 2011 (Alfieri et al., 2013), becoming a fully operational, 24/7 supported service in April 2018 (version 1.0, upgraded to version 2.0 in November 2018). GloFAS is provided through a free and open licence and is designed for decision makers and forecasters in national and international water authorities, water resources management, hydropower companies, civil protection authorities, and international humanitarian aid organizations. A recent example of the use of GloFAS was for supporting the humanitarian response to the devastating floods that affected large parts of Mozambique, Malawi, and Zimbabwe in the wake of tropical cyclone Idai in March 2019 (Magnusson et al., 2019). Given the large amount of openly available data that is generated by GloFAS, including a long-term, near-real-time river discharge reanalysis, a large set of reforecasts, and real-time flood and seasonal forecasts, it is also used by researchers and commercial industries for a wide range of projects and for developing value-added products.
In GloFAS, ensemble river discharge forecasts are produced each day at a daily time step and provide probabilities of flood thresholds being exceeded for a given river section with a lead time of up to 30 d ahead (GloFAS 30 d; Alfieri et al., 2013). There is also a seasonal component, GloFAS-Seasonal (Emerton et al., 2018), that provides forecasts once per month at a weekly time step with a lead time of up to 4 months ahead. The river discharge reanalysis is used for two core tasks within GloFAS. First, flood thresholds at 2-, 5-, and 20-year return periods for each river cell are derived from the long-term reanalysis series. This allows the magnitude of the real-time ensemble river discharge forecasts to be directly compared to the magnitude of the long-term flood thresholds, raising awareness of a flood signal if a threshold is exceeded. Second, it provides the basis to derive initial hydrometeorological conditions for both GloFAS 30 d and GloFAS-Seasonal real-time forecasts. Estimating initial conditions is a key step to determine the current status of soil moisture, groundwater, snow cover, and initial state of water within rivers and other waterbodies, and it has been identified as one of the major challenges in continental- and global-scale flood forecasting given the limited availability of observational data at these scales (Emerton et al., 2016).
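The threshold derivation described above can be illustrated with a short sketch. The text does not state here which extreme value method is used, so the Gumbel distribution fitted by the method of moments below is an assumption chosen for simplicity, and the annual maxima and forecast value are hypothetical numbers.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_return_levels(annual_maxima, return_periods=(2, 5, 20)):
    """Estimate discharge thresholds (same units as the input) per return period.

    Fits a Gumbel distribution by the method of moments. This fitting choice
    is an assumption made for illustration only.
    """
    if len(annual_maxima) < 2:
        raise ValueError("need at least two annual maxima")
    scale = stdev(annual_maxima) * math.sqrt(6.0) / math.pi
    location = mean(annual_maxima) - EULER_GAMMA * scale
    levels = {}
    for period in return_periods:
        non_exceedance = 1.0 - 1.0 / period  # annual non-exceedance probability
        levels[period] = location - scale * math.log(-math.log(non_exceedance))
    return levels


# Hypothetical annual maximum discharges (m3/s) for one river cell.
annual_maxima = [820.0, 1010.0, 760.0, 1340.0, 905.0,
                 1120.0, 980.0, 1500.0, 870.0, 1055.0]
thresholds = gumbel_return_levels(annual_maxima)

# A forecast value can then be compared with each threshold.
forecast_discharge = 1250.0  # hypothetical forecast value (m3/s)
exceeded = [p for p, level in thresholds.items() if forecast_discharge > level]
```

With these hypothetical values the forecast exceeds the 2- and 5-year thresholds but not the 20-year one. In an ensemble setting the same comparison would be repeated per member to give an exceedance probability.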
The aim of this data paper is to describe the newly produced operational river discharge reanalysis dataset as part of the launch of GloFAS v2.1 on 5 November 2019 (see GloFAS technical documentation for details on upgrades: https://confluence.ecmwf.int/display/COPSRV/Global+Flood+Awareness+System, last access: 28 August 2020). GloFAS river discharge reanalysis is based on ERA5 (Hersbach et al., 2020), ECMWF's latest global atmospheric reanalysis which extends back to 1979 and was officially released in January 2019. An innovation of ERA5 is that it is produced in near real time in an operational environment, allowing for the production of GloFAS-ERA5 reanalysis with a latency of 2 to 5 d behind real time. This has a major advantage for GloFAS in that the initial hydrometeorological conditions can now be derived from the same product from which the long-term flood thresholds are derived, which will ensure a much better consistency with real-time forecasts compared to previous GloFAS model configurations. Uniquely, the global river discharge product is over 40 years long, produced in near real time, and is freely available to download for the community through the Copernicus Climate Change Service (C3S) Copernicus Climate Data Store (CDS): https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview (C3S, 2019), opening multitudes of hydroclimate applications across the world.
Section 2 outlines the production of the dataset and Sect. 3 describes its main attributes including available variables and file format. An evaluation of the dataset against a global network of observations is conducted in Sect. 4. The dissemination of the data through the CDS is shown in Sect. 5 before key conclusions and future work are offered in Sect. 6.
Pappenberger et al. (2010) first demonstrated that it was possible to achieve useful river discharge predictions by coupling a river routing scheme with the land surface model of the ECMWF global numerical weather prediction (NWP) system. The GloFAS-ERA5 river discharge reanalysis uses this concept and is produced by coupling the land surface model runoff component of the ECMWF ERA5 global reanalysis (Hersbach et al., 2020) with the LISFLOOD hydrological and channel routing model (van der Knijff et al., 2010). In ERA5, the runoff (m d⁻¹) from one cell is not connected to neighbouring cells; hence, it is not possible to estimate river discharge (m³ s⁻¹) at the catchment scale. Coupling ERA5 runoff with LISFLOOD allows for the lateral connectivity of grid cells with runoff routed through the river channel to produce river discharge. A schematic of the key components in the production of the GloFAS-ERA5 reanalysis is provided in Fig. 1. The open-access scientific publications and model documentation that describe the full methodological detail for each key component are provided in Table 1 and summarized below.
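Because runoff is a depth per unit time and river discharge is a volume per unit time, relating the two requires a contributing area. The following is a minimal sketch of that unit relation only, using a hypothetical cell area; it does not represent the LISFLOOD routing, which also delays and connects the flow between cells.

```python
SECONDS_PER_DAY = 86_400


def runoff_to_discharge(runoff_m_per_day, area_m2):
    """Convert a runoff depth rate (m/d) over an area (m2) to a flow (m3/s)."""
    return runoff_m_per_day * area_m2 / SECONDS_PER_DAY


# Hypothetical example: 1 mm/d of runoff generated over a 100 km2 area.
runoff = 0.001  # m/d
area = 100.0e6  # m2 (100 km2)
discharge = runoff_to_discharge(runoff, area)  # roughly 1.16 m3/s
```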

ERA5 runoff
ERA5 runoff is produced from the HTESSEL land surface model (Hydrology Tiled ECMWF Scheme for Surface Exchanges over Land; Balsamo et al., 2009) as used within the ECMWF Integrated Forecasting System (IFS). HTESSEL computes the surface water and energy fluxes and the temporal evolution of soil temperature, soil moisture, and snowpack. Excess precipitation and snowmelt are partitioned as surface runoff or infiltrated into a four-layer soil column (7 cm depth for top layer and then 21, 72, and 189 cm) at each ERA5 grid cell before draining from the bottom of the soil column as sub-surface runoff (Balsamo et al., 2009). ERA5 uses an advanced land data assimilation system to assimilate conventional in situ and satellite observations for land surface variables such as soil moisture, soil temperature, snow water equivalent, snow density, and snow temperature, as outlined in de Rosnay et al. (2014).
ERA5 benefits from a decade's worth of numerical weather prediction developments in model physics, numerics, and data assimilation by using ECMWF IFS model cycle 41r2 (2016) compared to model cycle 31r2 (2006) as used in its predecessor, ERA-Interim (Dee et al., 2011). ERA5 has a horizontal resolution of approximately 31 km at the Equator (native octahedral grid) and since January 2019 has been openly available from 1979 to present. A key novelty of ERA5 is its operational production that makes available an intermediate timely product, ERA5T, in near real time, allowing the production of the GloFAS-ERA5 river discharge reanalysis operationally with a latency of between 2 and 5 d behind real time.

LISFLOOD river discharge
River discharge is currently not calculated by HTESSEL. Instead, surface and sub-surface runoff from the HTESSEL land surface model is coupled with a simplified global version of LISFLOOD, a spatially distributed grid-based hydrological and channel routing model. The details of the global version of LISFLOOD used within GloFAS v2.1 and its calibration can be found in Hirpa et al. (2018) but are briefly summarized here for context. The sub-surface runoff from HTESSEL is used as input for the LISFLOOD groundwater module, which consists of two parallel linear reservoirs that store and subsequently transport water to the river channel with a time delay. The upper zone represents quick groundwater and sub-surface flow, while the lower zone represents slow groundwater flow that generates base flow. In Hirpa et al. (2018), the upper zone time constant was given a default value of 10 d with a lower (upper) bound of 3 d (40 d) during calibration, and the lower zone time constant was given a default value of 200 d with a lower (upper) bound of 40 d (500 d). The surface runoff from HTESSEL is used as input for the LISFLOOD river channel routing module. This is a two-stage process whereby the surface runoff for each cell is first routed to the nearest downstream river channel cell, then the water in the channel is routed through the river network using the kinematic wave approach. Groundwater and river routing parameters in GloFAS were calibrated against daily river discharge observations for 1287 catchments globally by Hirpa et al. (2018). A key feature of LISFLOOD is the ability to represent features that can severely alter the timing and magnitude of river discharge, such as lakes, reservoirs, and human water use. A total of 463 of the largest lakes (surface area > 100 km²) and 667 of the largest reservoirs were incorporated into the GloFAS river network by Zajac et al. (2017).
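The delaying effect of the two groundwater zones can be sketched with a pair of linear reservoirs, each releasing its storage divided by its time constant per day. This is a simplified illustration and not the LISFLOOD implementation: the explicit daily update, the function names, and the equal split of inflow between the two zones (`upper_fraction`) are assumptions, while the 10 d and 200 d time constants are the default values quoted above.

```python
def step_linear_reservoir(storage_mm, inflow_mm, time_constant_days):
    """Advance one linear reservoir by one day (explicit update).

    Outflow is proportional to the storage held at the start of the day.
    """
    outflow_mm = storage_mm / time_constant_days
    return storage_mm + inflow_mm - outflow_mm, outflow_mm


def simulate_groundwater(subsurface_runoff_mm, upper_fraction=0.5,
                         t_upper=10.0, t_lower=200.0):
    """Route daily sub-surface runoff through two parallel linear reservoirs.

    upper_fraction is a hypothetical split of inflow between the zones.
    Returns the daily outflow series and the final storages.
    """
    s_upper = s_lower = 0.0
    outflow = []
    for runoff in subsurface_runoff_mm:
        s_upper, q_upper = step_linear_reservoir(
            s_upper, upper_fraction * runoff, t_upper)
        s_lower, q_lower = step_linear_reservoir(
            s_lower, (1.0 - upper_fraction) * runoff, t_lower)
        outflow.append(q_upper + q_lower)
    return outflow, s_upper, s_lower


# Hypothetical input: a single 10 mm pulse followed by 29 dry days.
runoff_series = [10.0] + [0.0] * 29
baseflow, s_upper, s_lower = simulate_groundwater(runoff_series)
```

After 30 days the upper zone has drained most of its share, while the lower zone still holds most of its water, which is the behaviour that makes a long spin-up necessary for the slow zone.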
To generate the GloFAS-ERA5 river discharge reanalysis, the LISFLOOD model is forced with daily HTESSEL surface and sub-surface runoff from ERA5 starting from 1 January 1979 (Fig. 1). In order to be consistent with the operational GloFAS procedure, the runoff fields from ERA5 were downscaled using the simple nearest neighbour method from the native ERA5 grid to the 0.1° GloFAS grid. To avoid the need for very long spin-up periods, LISFLOOD calculates a steady-state storage amount for the lower groundwater zone during a long-term "pre-run" and thus reduces the lower zone's spin-up time. LISFLOOD was therefore given a 1-year model spin-up using preliminary ERA5 output for 1978. To produce GloFAS-ERA5 reanalysis in near real time operationally, the latest available ERA5T data are used.

Table 1. Key components in the production of the GloFAS-ERA5 reanalysis and their reference documentation.
| Component | Description | Reference |
| ERA5 runoff | Surface and sub-surface runoff within ERA5 generated using the HTESSEL land surface model | Balsamo et al. (2009) |
| LISFLOOD river discharge | River discharge generated using LISFLOOD hydrological and channel routing model to route runoff into and through the river network and provide groundwater storage, including lake, reservoir, and human water use routines | Burek et al. (2013) |
| Lakes and reservoirs used in GloFAS | Incorporated 463 lakes and 667 reservoirs into the GloFAS river network | Zajac et al. (2017) |
| Calibration of LISFLOOD used in GloFAS | LISFLOOD was calibrated against daily river discharge from 1287 observation stations worldwide | Hirpa et al. (2018) |
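The nearest neighbour downscaling step can be sketched as follows. For simplicity the sketch assumes the coarse field sits on a regular latitude-longitude grid, whereas the native ERA5 grid is octahedral, and it ignores longitude wrap-around; the function names and the tiny example grids are hypothetical.

```python
def nearest_index(coords, value):
    """Index of the coordinate closest to value."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))


def nearest_neighbour_regrid(field, src_lats, src_lons, dst_lats, dst_lons):
    """Copy, for every target cell, the value of the closest source cell.

    field is a list of rows indexed as field[lat_index][lon_index].
    """
    rows = [nearest_index(src_lats, lat) for lat in dst_lats]
    cols = [nearest_index(src_lons, lon) for lon in dst_lons]
    return [[field[r][c] for c in cols] for r in rows]


# Hypothetical 2 x 2 coarse runoff field regridded to a 4 x 4 finer grid.
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
fine = nearest_neighbour_regrid(
    coarse,
    src_lats=[0.3, 0.0], src_lons=[0.0, 0.3],
    dst_lats=[0.3, 0.2, 0.1, 0.0], dst_lons=[0.0, 0.1, 0.2, 0.3],
)
```

Nearest neighbour regridding introduces no new values: each fine cell simply inherits the runoff of the coarse cell it falls closest to, so the fine field appears as blocks of repeated values.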

Data description
The key attributes of the current operational version (v2.1) of the GloFAS-ERA5 river discharge reanalysis dataset are shown in Table 2. The daily reanalysis is global in coverage, except for Antarctica, with a horizontal grid resolution of 0.1° (approximately 11 km at the Equator). The dataset is over 40 years long starting on 1 January 1979. An innovative aspect of the dataset is its operational production allowing it to be available 2 to 5 d behind real time, shortly after ERA5T becomes available. The intermediate ERA5T data are not quality assured due to their timely nature. Consequently, there will be two reanalysis streams available: GloFAS (consolidated) is the final product based on the consolidated ERA5 from 1 January 1979 until 2 to 3 months behind real time, updated on the CDS on a monthly basis; and GloFAST (intermediate) is the timely product based on the intermediate ERA5T from 1 August 2019 until 2 to 5 d behind real time, updated on the CDS on a daily basis whenever ERA5T becomes available.
The GloFAS-ERA5 reanalysis dataset includes the variables river discharge and the upstream area for each GloFAS grid cell (Table 3). Data are stored in NetCDF format with one file per day containing the 24 h mean river discharge (00:00 UTC to 00:00 UTC). Each daily filename follows the convention "CEMS_ECMWF_dis24_<YYYYMMDD>_glofas<T>_v2.1.nc" whereby the date stamp represents the end of the 24 h averaging period. So, for example, the file "CEMS_ECMWF_dis24_20190101_glofas_v2.1.nc" contains the daily mean flow for the 24 h period 00:00 UTC 31 December 2018 to 00:00 UTC 1 January 2019. Appendix A shows the header metadata information contained within the example NetCDF file. Each daily NetCDF file for the whole globe has an uncompressed size of ∼ 21.7 MB; therefore, the estimated size of the dataset from January 1979 to October 2019 is ∼ 320 GB. Figure 2 maps the mean daily river discharge from 1979 to 2018 for each GloFAS river with an upstream area greater than 1000 km², revealing the main river arteries of the world. An example hydrograph of the long-term near-real-time reanalysis against available river discharge observations is shown in Fig. 3 for the Teles Pires River in the Amazon basin, Brazil.
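Since the date stamp marks the end of the averaging window rather than its start, it is easy to be off by one day when pairing files with observations. The following sketch builds a filename and the corresponding averaging period; the function names and the `stream_suffix` parameter (standing in for the `<T>` placeholder, whose exact form is not assumed here) are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone


def glofas_filename(end_date, stream_suffix="", version="2.1"):
    """Build a daily filename following the convention quoted in the text.

    stream_suffix stands in for the <T> placeholder and is empty for the
    consolidated stream.
    """
    return f"CEMS_ECMWF_dis24_{end_date:%Y%m%d}_glofas{stream_suffix}_v{version}.nc"


def averaging_period(end_date):
    """Return (start, end) of the 24 h window ending at 00:00 UTC on end_date."""
    end = datetime.combine(end_date, time(0, 0), tzinfo=timezone.utc)
    return end - timedelta(days=1), end


name = glofas_filename(date(2019, 1, 1))
start, end = averaging_period(date(2019, 1, 1))
```

For 1 January 2019 this reproduces the example filename given above, with a window running from 00:00 UTC on 31 December 2018 to 00:00 UTC on 1 January 2019.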

Evaluation and limitations
GloFAS-ERA5 v2.1 river discharge reanalysis was evaluated against a global network of daily river discharge observations. As part of GloFAS, a database of global hydrological observations for 2042 stations is held, consisting predominantly (i.e. ∼ 75 %) of data from the Global Runoff Data Centre (GRDC) and supplemented by data collected through collaboration with GloFAS partners worldwide to improve spatial coverage. A number of criteria were used to select stations for the evaluation: (i) at least 4 years of daily data available between 1979 and 2018, not necessarily contiguous (78 stations removed); (ii) a minimum upstream area of 500 km² (4 stations removed); (iii) a difference within 20 % between the catchment area supplied by the data provider and the upstream area of the corresponding cell on the GloFAS river network (93 stations removed); (iv) a first-order visual quality check on observed river discharge time series to remove stations with erroneous data, for example, time series truncated above a threshold, severe inhomogeneities, or series monitoring an artificial canal instead of a river (39 stations removed); and (v) retention of the station with the longest record when multiple observation stations were matched to the same GloFAS river cell (27 stations removed).
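The station counts quoted for each criterion can be cross-checked with simple arithmetic. The short sketch below, with hypothetical labels for each criterion, confirms that the removals are consistent with the 1801 stations used in the evaluation.

```python
initial_stations = 2042

# Stations removed by each selection criterion, as quoted in the text.
removed = {
    "fewer than 4 years of daily data": 78,
    "upstream area below 500 km2": 4,
    "catchment area mismatch above 20 %": 93,
    "failed visual quality check": 39,
    "duplicate match to the same river cell": 27,
}

remaining_stations = initial_stations - sum(removed.values())
```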
This filtering procedure resulted in the selection of 1801 stations with drainage areas ranging from 575 to 4 664 200 km² and a median of 30 046 km². Individual metadata of all 1801 stations are given in Table S1 in the Supplement. Care must be taken over the spatial representativeness of the following evaluation results as the observation network is sparse in some regions of the world, particularly in large parts of Africa and Asia. Performance at the daily scale was assessed using the modified Kling-Gupta efficiency metric (KGE; Gupta et al., 2009; Kling et al., 2012). The KGE is gaining popularity as the standard performance metric in hydrology (e.g. Harrigan et al., 2018; Lin et al., 2019) and can be decomposed into three components important for assessing hydrological dynamics: temporal errors through correlation, bias errors, and variability errors:
$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2},$$
$$\beta = \frac{\mu_s}{\mu_o},$$
$$\gamma = \frac{\sigma_s / \mu_s}{\sigma_o / \mu_o},$$
where r is the Pearson correlation coefficient between reanalysis simulations (s) and observations (o), β is the bias ratio, γ is the variability ratio, µ is the mean discharge, and σ is the discharge standard deviation. The KGE and its three decomposed components (correlation, bias ratio, and variability ratio) are all dimensionless with an optimum value of 1. In order to evaluate the hydrological simulation skill of GloFAS-ERA5 reanalysis, its performance is compared against a simpler benchmark. Here the observed mean flow is used as a benchmark as proposed by Knoben et al. (2019). This is not a difficult benchmark to beat but should arguably be the minimum reference for any hydrological system to be compared against. Here we represent KGE as a skill score, KGESS, to evaluate the performance of GloFAS-ERA5 river discharge reanalysis against the mean flow benchmark simulation, given as
$$\mathrm{KGESS} = \frac{\mathrm{KGE}_{\mathrm{glofas}} - \mathrm{KGE}_{\mathrm{bench}}}{\mathrm{KGE}_{\mathrm{perf}} - \mathrm{KGE}_{\mathrm{bench}}},$$
where KGE_glofas is the value of KGE for the GloFAS-ERA5 reanalysis against observations, KGE_bench is the value of KGE for the mean flow benchmark against observations (KGE_bench = 1 − √2 ≈ −0.41; Knoben et al., 2019), and KGE_perf is the value of KGE for a perfect simulation, which is 1.
A KGESS = 0 means the GloFAS-ERA5 reanalysis is no better than the mean flow benchmark and so has no skill, KGESS > 0 means the reanalysis is considered skilful, and KGESS < 0 means the performance is worse than the benchmark and so has negative skill. Performance metrics for all 1801 stations are included in Table S1.
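The metric and its skill score can be computed in a few lines. The sketch below follows the standard form of the modified KGE (Kling et al., 2012), with the variability ratio taken as the ratio of coefficients of variation, and uses the mean flow benchmark value of 1 − √2 from Knoben et al. (2019); the function names and the example series are hypothetical.

```python
import math
from statistics import mean, pstdev

KGE_BENCH = 1.0 - math.sqrt(2.0)  # KGE of the mean flow benchmark, about -0.41
KGE_PERFECT = 1.0


def modified_kge(sim, obs):
    """Return (kge, r, beta, gamma) for paired simulated and observed flows."""
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = mean([(s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)])
    r = cov / (sd_s * sd_o)                # Pearson correlation
    beta = mu_s / mu_o                     # bias ratio
    gamma = (sd_s / mu_s) / (sd_o / mu_o)  # variability ratio (ratio of CVs)
    kge = 1.0 - math.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)
    return kge, r, beta, gamma


def kge_skill_score(kge):
    """Skill of a KGE value relative to the mean flow benchmark."""
    return (kge - KGE_BENCH) / (KGE_PERFECT - KGE_BENCH)


# Hypothetical series: the simulation is 20 % too low but otherwise perfect.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [8.0, 16.0, 24.0, 32.0]
kge, r, beta, gamma = modified_kge(sim, obs)
kgess = kge_skill_score(kge)
```

In this example only the bias term departs from 1 (β = 0.8), giving KGE = 0.8. Because the skill score is a linear rescaling of KGE, a KGE of 0.31 maps to a KGESS of about 0.51.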

Overall performance
Results for overall performance show that the GloFAS-ERA5 river discharge reanalysis is skilful in 86 % of catchments (Fig. 4a). The global median KGESS (KGE) is 0.51 (0.31) with an interquartile range (IQR) of 0.30 (0.00) to 0.66 (0.52). Performance is best in Brazil (particularly the Amazon basin), central Europe, and the eastern and western regions of the United States (Fig. 5). GloFAS-ERA5 reanalysis performance is poor (i.e. KGESS < 0) in many catchments in Africa and the North American Great Plains extending into Mexico with notable patches in eastern Brazil, Thailand, and southern Spain. Results will be biased towards regions with a larger number of stations, especially when well-performing large basins contain many sub-catchments (e.g. Amazon and Rhine basins).

Decomposition into correlation, bias, and variability
An advantage of the KGE is that it can be decomposed into three constituent components so that greater insights can be gained into which aspects of the GloFAS-ERA5 reanalysis are driving poor and good skill. Almost all (99 %) catchments show a positive correlation (Figs. 4b and 6a) with a global median Pearson correlation coefficient of 0.61 (IQR = 0.44, 0.74). Figure 4c shows that river discharge reanalysis is negatively biased in 64 % of catchments (i.e. bias ratio < 1) with a global median bias ratio of 0.84 (IQR = 0.62, 1.21). In the evaluation of their global river simulation, Lin et al. (2019) consider a percentage bias within ±20 % (equivalent to a bias ratio within 0.8 to 1.2) to be very good. Whilst only 28 % of stations meet this criterion for the GloFAS-ERA5 reanalysis, results are in line with simulations in Lin et al. (2019). The worst performing catchments (dark red KGESS dots in Fig. 5) are predominantly driven by very large positive biases (dark blue dots in Fig. 6b) in drier rivers of the central United States, Africa, and eastern Brazil, as well as the western coast of South America; in total 12 % of catchments have a bias ratio > 2 (equivalent to a percent bias > 100 %). Figure 4d (shown spatially in Fig. 6c) shows lower variability in GloFAS-ERA5 reanalysis than observations in 61 % of catchments (i.e. variability ratio < 1), but errors in variability are less severe than bias errors with a global median variability ratio of 0.91 (IQR = 0.69, 1.15).
It is also important to look at the average magnitude of errors, as a small over- or underestimation in dry rivers can produce large percentage biases (and hence bias ratios). This was done by converting the units of both the reanalysis and observation time series from cubic metres per second (m³ s⁻¹) to runoff depth across the catchment area in millimetres per day (mm d⁻¹) to allow direct comparison between catchments of different sizes and then computing the mean absolute error (MAE) metric (Fig. 7). The global median MAE is 0.41 mm d⁻¹ (IQR = 0.18 mm d⁻¹, 0.72 mm d⁻¹). Most areas with a bias ratio > 2 (in Fig. 6b), namely much of Africa, the central United States, and eastern Brazil, have in fact a low absolute magnitude of errors given their dry locations. Other notable areas with a low absolute magnitude of errors include large parts of India, South East Asia, and Australia. There are, however, catchments on the western coast of South America, in Sudan and Ethiopia, and on tributaries of the River Ganges with a large MAE.

Performance by month
Figure 8 shows the global performance of GloFAS-ERA5 reanalysis for each month across all 1801 stations. Hydrological simulation skill is relatively consistent across each month with median KGESS ranging from 0.32 to 0.41 (Fig. 8a). The April to October months have the highest skill, with November to March having a higher proportion of catchments with negative skill. When the KGE is decomposed into correlation, bias, and variability components at the monthly scale (Fig. 8b-d, respectively), it shows that the months with higher incidence of negative KGESS are driven by a higher proportion of catchments with large positive biases in those months. Correlation and variability error metrics do not vary much from one month to the next in comparison to bias errors.
Results are grouped into Northern Hemisphere (n = 1268 stations) and Southern Hemisphere (n = 533 stations) in Fig. 9. The overall GloFAS-ERA5 monthly performance in each hemisphere does not change substantially from the global analysis (Fig. 8). Nevertheless, there are some differences. The KGESS and bias ratio from the Northern Hemisphere (Fig. 9a and c, respectively) tend to follow the global analysis most strongly (i.e. Fig. 8a and c, respectively), which is not surprising given 70 % of all stations are located in the Northern Hemisphere. However, a higher proportion of Southern Hemisphere stations show large positive biases from April to June compared to November to March in the Northern Hemisphere. The largest proportion of stations with negative KGESS in the Southern Hemisphere is found from August to October (Fig. 9a). These months correspond with a lower Southern Hemisphere correlation (Fig. 9b) and a higher proportion of stations with large positive variability ratios (i.e. GloFAS-ERA5 has higher variability than observed river discharge).

Performance by catchment area
The skill of GloFAS-ERA5 river discharge reanalysis grouped into seven catchment area categories is shown in Fig. 10. In general, skill is lowest for catchments in the three categories < 10 000 km² with median KGESS = 0.21 (n = 39), 0.4 (n = 41), and 0.42 (n = 53), respectively. Performance improves as catchment size increases with median KGESS = 0.56 for catchments > 50 000 km². It must be noted that results are affected by uneven samples of catchment sizes available within the GloFAS observation database, with catchments between 10 000 and 50 000 km² being dominant (n = 1013) and smaller catchments being under-represented.

Limitations
This first evaluation has found the dataset to be hydrologically skilful in the vast majority of catchments tested, although the strength of skill can vary considerably depending on location. The degradation in skill, as defined using KGESS, is the combination of (lower) correlation, (larger) bias errors, and (larger) variability errors. The evaluation provides users with an overview of the global-scale quality of the dataset, although users are advised to undertake a more in-depth evaluation of the dataset for their region of interest. A key limitation of the dataset is the large biases identified in several regions (see above). The attribution of such biases in the GloFAS-ERA5 reanalysis is outside the scope of this data paper, but ongoing investigations such as Zsoter et al. (2019) have shown that biases can be introduced by the real-time land data assimilation within the HTESSEL land surface model. Another expected cause of differences between river discharge reanalysis and observations is human modification within catchments and river channels (e.g. Harrigan et al., 2014). It is estimated that just 37 % of rivers remain free-flowing globally with the construction of reservoirs and dams the main contributor to loss of connectivity (Grill et al., 2019).
While GloFAS-ERA5 reanalysis does represent major dams and reservoirs on the modelled river network, simplified reservoir operating parameters were used based on expert opinion (outlined in Zajac et al., 2017) due to lack of availability of global operational release records. Given the fundamental dependence of the dataset on ERA5, it would be pertinent for users to be aware of the known ERA5 issues, which can be found in the ERA5 documentation: https://confluence.ecmwf.int/display/CKB/ERA5 (last access: 28 June 2020). In particular, "rain bombs" are known to occur from time to time in the numerical weather prediction model used by ERA5 whereby extremely large rainfall totals are generated, although these are rare (∼ 10 episodes per year) and happen mostly in isolated grid points over orographic areas in Africa (Hersbach et al., 2020). However, their impact on hydrology has not been assessed. As with any reanalysis product, care must be taken when calculating long-term trends in river discharge as discontinuities may be present in the record due to changes in the global observing system entering ERA5.

Figure caption (partial): [...], bias ratio (c), and variability ratio (d). Boxes represent the IQR and the horizontal grey line the median. Whiskers extend to the most extreme data point unless that point is more than 1.5 times the IQR from the box, in which case it is shown as an outlier (grey diamond).
Figure 11. The GloFAS-ERA5 river discharge reanalysis landing page in the C3S Climate Data Store (CDS; https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview).

Data availability
The GloFAS-ERA5 river discharge reanalysis is provided through the European Commission Copernicus Emergency Management Service (CEMS) and follows the Copernicus open data policy that users shall have free, full, and open access to Copernicus service information. With the drive for open data come challenges. In the era of big data, it is clear that traditional ways of hosting and disseminating large earth system datasets are no longer fit for purpose. An exciting development in the way large climate datasets are discovered, accessed, and used is the Copernicus Climate Change Service (C3S) Climate Data Store (CDS; https://cds.climate.copernicus.eu/cdsapp#!/home, last access: 28 June 2020). The CDS hosts various global and regional reanalysis products, gridded records for essential climate variables (ECVs), in which river discharge is included as a key terrestrial ECV, and much more. The CDS requires standardization of data and metadata so that datasets are more useable and discoverable through the CDS metadata pages. The CDS website provides easy access to data through user-friendly download forms. There is also a CDS Python application programming interface (API) to allow programmatic access to data. An innovative feature of the CDS is the Toolbox, which makes it easier to handle large volumes of data by allowing users to make custom applications, filter data by geographical region and date range, and finally present the data using maps and charts directly through the CDS cloud infrastructure.
The GloFAS-ERA5 river discharge reanalysis product is available on the CDS: https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview with the following DOI: https://doi.org/10.24381/cds.a4fdd6b9 (C3S, 2019). The CDS landing page for the GloFAS-ERA5 reanalysis dataset is shown in Fig. 11. Both the long-term consolidated and the near-real-time intermediate reanalysis data are available in two ways. First, through the "Download data" tab whereby users can manually select options in a form for which data they would like to download. Second, data can be retrieved through the dedicated Python CDS API; an example API retrieval script is shown in Appendix B. Note that users must register for a CDS account (for free) before gaining access.
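A programmatic retrieval might look like the sketch below. It is not the script from Appendix B: the dataset name is taken from the landing page URL and `cdsapi.Client().retrieve(name, request, target)` is the standard call of the third-party `cdsapi` package, but every request key and value shown here is an assumption and should be replaced with the request generated by the CDS download form. Running the download also requires a registered CDS account and a configured API key.

```python
DATASET_NAME = "cems-glofas-historical"  # taken from the CDS landing page URL


def build_glofas_request(year, month, day, consolidated=True):
    """Assemble a request dictionary for one day of river discharge.

    All keys and values are assumptions for illustration; copy the real ones
    from the request shown on the CDS download form.
    """
    return {
        "variable": "river_discharge_in_the_last_24_hours",  # assumed name
        "dataset": "consolidated_reanalysis" if consolidated
                   else "intermediate_reanalysis",           # assumed names
        "version": "2.1",
        "year": f"{year:04d}",
        "month": f"{month:02d}",
        "day": f"{day:02d}",
        "format": "netcdf",                                  # assumed value
    }


def download_glofas(request, target_path):
    """Submit the request to the CDS (needs cdsapi and valid credentials)."""
    import cdsapi  # third-party package, imported lazily

    client = cdsapi.Client()
    client.retrieve(DATASET_NAME, request, target_path)


request = build_glofas_request(2019, 1, 1)
# download_glofas(request, "glofas_20190101.nc")  # uncomment to download
```

Keeping the request construction separate from the download call makes it possible to inspect or log the request without contacting the CDS.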

Conclusions
This paper outlines the production, description, and evaluation of, and access to, the new GloFAS-ERA5 operational global river discharge reanalysis dataset available from 1979 and updated in near real time. This dataset is central to two key steps within GloFAS: (i) the calculation of flood thresholds against which real-time ensemble forecasts are compared to determine the probability of a flood signal and (ii) more consistent hydrometeorological initial conditions for the real-time flood and seasonal forecasts. The evaluation against observations showed that the product is skilful in 86 % of catchments according to the modified Kling-Gupta efficiency skill score against a mean flow benchmark. However, skill varies considerably with location, with several regions such as the central United States, Africa, eastern Brazil, and the western coast of South America having large systematic positive biases. The results from the evaluation are comparable with other long-term global river discharge products (e.g. Lin et al., 2019). The attribution of such biases in the GloFAS-ERA5 reanalysis is outside the scope of this data paper, but ongoing investigations such as Zsoter et al. (2019) on the biases introduced by the real-time land data assimilation within the HTESSEL land surface model will help us to better understand existing limitations. GloFAS is an operational system which undergoes constant development with intensive research on future versions of the model. It is foreseen that a new model version will be made operational in 2021 based on the full LISFLOOD hydrological model and an improved model calibration (Alfieri et al., 2020).
The long-term and operational nature of the GloFAS-ERA5 reanalysis dataset opens avenues for further applications. Forecast evaluation activities within GloFAS now include skill assessment over longer time periods, and the dataset has allowed a new operational forecast verification suite to be developed whereby the performance of the forecasts can be tracked in near real time for every river in the world. Other applications are envisaged for monitoring the global status of flood and drought conditions, the identification of hydroclimatic variability and change, and as raw input for postprocessing and machine learning methods that can add further value.