River runoff is an
essential climate variable as it is directly linked to the terrestrial water
balance and controls a wide range of climatological and ecological processes.
Despite its scientific and societal importance, there are to date no
pan-European observation-based runoff estimates available. Here we employ a
recently developed methodology to estimate monthly runoff rates on a regular
spatial grid in Europe. For this we first assemble an unprecedented
collection of river flow observations, combining information from three
distinct databases. Observed monthly runoff rates are
subsequently tested for homogeneity and
then related to gridded atmospheric variables (E-OBS version 12) using
machine learning. The resulting statistical model is then used to estimate
monthly runoff rates (December 1950–December 2015) on a
0.5

River flow is one of the best monitored components of the terrestrial water
cycle

In this paper we present a new monthly estimate of the amount of water
draining from 0.5

In contrast to GS15, in which we developed and tested the methodology, we focus here on expanding the observational basis. More specifically, we assemble an unprecedented collection of observed river flow data which is subject to automated quality control and statistical homogeneity assessment. In addition we rely on the latest generation of station-based precipitation and temperature grids to estimate gridded runoff time series for Europe. Finally, the accuracy of the derived runoff estimates is assessed in terms of cross validation and its potential limitations are discussed in the context of example applications.

This paper presents a dataset that estimates the monthly amount of water
draining from 0.5

To estimate this quantity we rely on river- and streamflow observations from
relatively small catchments (catchment area

The presented dataset is developed using a collection of streamflow
observations that is assembled from three major databases. Two of these are
international collections which contain observations from many European
countries (Sects.

Prior to further computations, daily and monthly river flow time series were converted into daily runoff rates, expressed in millimetres per day, using the catchment areas provided by the respective databases.
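The conversion from discharge to runoff depth is a change of units divided by the catchment area. A minimal sketch, assuming discharge is supplied in cubic metres per second and catchment area in square kilometres (the units delivered by each database are not restated here); the same function applies to daily values and to monthly mean discharge:

```python
SECONDS_PER_DAY = 86_400.0


def discharge_to_runoff_mm_per_day(q_m3_per_s: float, area_km2: float) -> float:
    """Convert a mean discharge (m^3/s) into a runoff depth (mm/day).

    Assumes discharge in cubic metres per second and catchment area in
    square kilometres.
    """
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    volume_per_day_m3 = q_m3_per_s * SECONDS_PER_DAY  # m^3 per day
    area_m2 = area_km2 * 1.0e6                        # km^2 -> m^2
    depth_m_per_day = volume_per_day_m3 / area_m2     # water depth in metres per day
    return depth_m_per_day * 1000.0                   # metres -> millimetres
```

For example, a mean discharge of 10 m³ s⁻¹ from a 100 km² catchment corresponds to 8.64 mm per day.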

The Global Runoff Data Centre (GRDC;

Stations should

be located in the WMO region 6 (Europe);

be within the following geographical domain: 25

not be located in Spain (see Sects.

have a minimum of 10 years of observations.

In February 2016 this resulted in a total of 1722 stations with daily values and
2047 stations with monthly values which were ordered from the GRDC. In many
cases monthly data are computed by the GRDC on the basis of available daily
values. There are, however, instances where only monthly data that were not
computed by the GRDC are available (referred to as

The EWA has been assembled by the European Flow
Regimes from International Experimental and Network Data (Euro-FRIEND)
project (

Spanish streamflow data were retrieved from the digital hydrological
yearbook (Anuario de aforos digital 2010–2011, AFD), which provides observations
until 2010–2011 and is freely accessible online
(

Gridded observations of precipitation and temperature were obtained from the
E-OBS (version 12) dataset

As the considered data stem from heterogeneous data sources, it is likely
that individual daily observations differ in quality. To get first-order
estimates of their credibility, all daily river flow observations were
flagged according to a set of rules. As we are not aware of quality control
(QC) procedures for runoff that are applicable to a large number of time
series and are documented in the scientific literature, we adapt QC
techniques that were developed for climatological records. More specifically,
the set of rules described below is based on criteria mentioned by

Days for which

Days for which

are flagged as

Values with

As the presented data product is derived on the basis of monthly values,
daily time series were aggregated to monthly means. Prior to the computation
of monthly mean runoff rates, daily values flagged as

Both GRDB and EWA provide data in daily as well as monthly resolution. In
order to increase the spatial and temporal coverage of the observations
underlying the presented data product, we aim to use originally monthly
data to fill in missing values in monthly time series that were computed on
the basis of quality controlled daily values (Sect.

Include the unmodified originally daily values in the final collection if only these are available.

Include the unmodified originally monthly values in the final collection if only these are available.

If originally monthly data are available at time steps
without originally daily data:

Determine the number of overlapping time steps (

If

Use cumulative distribution function (CDF) matching

Use the transformed originally monthly data to infill missing values of the originally daily series.

If

This procedure resulted in a total of 1892 monthly time series for GRDB and 3320 monthly time series for EWA that combine information from the originally daily and the originally monthly data.
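The infilling step can be illustrated with empirical quantile mapping, one common way of implementing CDF matching. This is a sketch under assumptions: both series share a monthly time axis with NaN marking gaps, both empirical CDFs are estimated from the overlapping months only, and `min_overlap` is a hypothetical stand-in for the overlap threshold, whose value is not given here.

```python
import numpy as np


def cdf_match(values, ref_source, ref_target):
    """Empirical quantile mapping from the `ref_source` to the `ref_target` distribution."""
    src = np.sort(np.asarray(ref_source, dtype=float))
    tgt = np.sort(np.asarray(ref_target, dtype=float))
    # plotting-position estimates of the non-exceedance probability of each sorted value
    p_src = (np.arange(1, src.size + 1) - 0.5) / src.size
    p_tgt = (np.arange(1, tgt.size + 1) - 0.5) / tgt.size
    # value -> probability under the source CDF -> value under the target CDF
    prob = np.interp(values, src, p_src)
    return np.interp(prob, p_tgt, tgt)


def infill_monthly(daily_based, monthly_orig, min_overlap=24):
    """Fill NaN months of the daily-based series with CDF-matched originally monthly data.

    Both inputs share one monthly time axis. `min_overlap` is a hypothetical threshold;
    with too little overlap the daily-based series is returned unchanged (an assumption).
    """
    d = np.asarray(daily_based, dtype=float).copy()
    m = np.asarray(monthly_orig, dtype=float)
    overlap = np.isfinite(d) & np.isfinite(m)
    gaps = np.isnan(d) & np.isfinite(m)
    if overlap.sum() < min_overlap or not gaps.any():
        return d
    # estimate both CDFs on the overlapping months only
    d[gaps] = cdf_match(m[gaps], m[overlap], d[overlap])
    return d
```

Because `np.interp` clamps at the ends of its input range, originally monthly values outside the range seen during the overlap are mapped to the extremes of the daily-based sample; a parametric CDF would extrapolate instead.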

Spanish data are available directly from the Spanish authorities (see
Sect.

The GRDB and the EWA are to some extent populated with data from the same gauging stations. Therefore both databases need to be linked in order to avoid duplicated information. Unfortunately, linking the two databases is not straightforward, as there is no common database identifier. In addition, differences in naming conventions, inconsistent spelling of river and station names, round-off errors in station coordinates and typographical errors hamper the unambiguous linkage of the EWA and the GRDB. Further, both the GRDB and the EWA exhibit duplicated entries, which is likely related to their complex history, including irregular manual updates.

To overcome these issues we employ deduplication and record linkage
techniques

Almost the same procedure is used for deduplication and record linkage. For convenience, the following description is formulated for the deduplication task, in which the entries of a single database are compared to each other (for record linkage, the entries of two different databases are compared; the differences between deduplication and record linkage are highlighted in Step 3):

the first step of
deduplication is based on analysing the similarity of the

The similarity between the river names and the station names is
measured using the Jaro–Winkler distance,

The geographical proximity was quantified using

where

in a second step, the monthly
river runoff series of the candidate duplicates that were identified
in Step 1 are analysed in terms of their temporal overlap and their
coefficient of determination (squared correlation coefficient),

: different merging procedures were
applied for deduplication and record linkage:

if duplicated entries were identified, the entry with more data points in the streamflow time series was kept and the other entry was discarded. No attempt was made to merge the time series, as this would only have affected a small number of stations with similar record lengths.

if two entries of GRDB and EWA were found to
be very likely identical the time series were merged as follows:

If

If

The deduplication procedure identified 18 very likely duplicates in the EWA and 16 very likely duplicates in the GRDB collection. Linking the deduplicated records from GRDB and EWA resulted in the identification of 4384 unique stations.
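Steps 1 and 2 rely on a handful of simple similarity measures. A minimal sketch follows, under assumptions: names are compared after the caller normalises case and whitespace; the Jaro–Winkler distance is taken as one minus the Jaro–Winkler similarity, with the usual prefix scale of 0.1 and a maximum prefix of four characters; geographical proximity is computed as a great-circle (haversine) distance, a stand-in because the measure used in the text is not reproduced above; and no decision thresholds are hard-coded, since their values are not given here.

```python
import math

import numpy as np


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(max(n1, n2) // 2 - 1, 0)
    used1, used2 = [False] * n1, [False] * n2
    matches = 0
    for i, ch in enumerate(s1):
        # characters only match if they lie within the search window
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # half the number of matched characters that appear in a different order
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / n1 + matches / n2 + (matches - transpositions) / matches) / 3


def jaro_winkler_distance(s1: str, s2: str, p: float = 0.1) -> float:
    """1 - Jaro-Winkler similarity; a common prefix (max. 4 characters) raises similarity."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return 1.0 - (sim + prefix * p * (1.0 - sim))


def great_circle_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Haversine distance between two points given in decimal degrees (assumed measure)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def overlap_r2(x, y):
    """Number of overlapping months and squared correlation over those months (Step 2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    ok = np.isfinite(x) & np.isfinite(y)
    n = int(ok.sum())
    if n < 3:
        return n, float("nan")
    return n, float(np.corrcoef(x[ok], y[ok])[0, 1] ** 2)
```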

Percentiles of selected statistics of the monthly runoff database. Shown are the fraction of missing months (Fraction missing), the time series length in months (Length) as well as the catchment area in square kilometres (Area).

Locations of streamflow stations, stemming from the three
considered data collections. Records from the EWA and the GRDB
that were identified as

Spatial and temporal coverage of available streamflow observations. The top row shows the date of the first and the date of the last available observation at each station. The bottom panel shows the total number of stations with observations for each month.

The 4384 linked records from the EWA and the GRDB were combined with the 1184
stations from AFD (Fig.

Overview of the spatial and seasonal distribution of missing months. Shown are the fraction of missing months at each station (left), the month with, on average, the most missing values at each station (centre) and the regional frequency distribution of the months with the most missing values (right). NM indicates no missing values.

Climate records can exhibit changes which do not reflect real climatic
or environmental change. In the context of river flow, such
breakpoints could, for example, be related to changes in instrumentation,
gauge restoration, re-calibration of rating curves, flow regulation
or channel engineering. In the climatological literature such effects
are commonly referred to as inhomogeneities. While a substantial body
of literature is devoted to the treatment of inhomogeneities in
atmospheric variables

Identification of inhomogeneities in large data collections is usually
based on tests that aim at identifying breakpoints in the considered
time series. Such breakpoints can, for example, be a sudden shift in the mean, variance or higher-order moments. For the presented data
product the test battery for inhomogeneity detection that is used by
EAC&D13 is employed:

Standard normal homogeneity test

Buishand range test

Pettitt test

Von Neumann ratio test
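The four test statistics can be sketched with NumPy using their standard formulations. This is illustrative only: critical values depend on series length and are normally read from published tables, which are not reproduced here; only the Pettitt test comes with a widely used closed-form approximation of its p-value. Ranks in the Pettitt test are assigned without averaging ties, a simplification of this sketch.

```python
import numpy as np


def snht(y):
    """Standard normal homogeneity test: returns (T0, k), k = number of values before the break."""
    y = np.asarray(y, float)
    n = y.size
    z = (y - y.mean()) / y.std()
    k = np.arange(1, n)
    csum = np.cumsum(z)[:-1]           # sum of z[0:k] for k = 1 .. n-1
    z1 = csum / k                      # mean of z before the break
    z2 = (z.sum() - csum) / (n - k)    # mean of z after the break
    t = k * z1 ** 2 + (n - k) * z2 ** 2
    i = int(np.argmax(t))
    return float(t[i]), int(k[i])


def buishand_range(y):
    """Buishand range statistic R / sqrt(n) from the rescaled adjusted partial sums."""
    y = np.asarray(y, float)
    s = np.concatenate(([0.0], np.cumsum(y - y.mean())))  # S*_0 .. S*_n
    return float((s.max() - s.min()) / y.std() / np.sqrt(y.size))


def pettitt(y):
    """Pettitt test: returns (K, k, approximate p-value)."""
    y = np.asarray(y, float)
    n = y.size
    ranks = np.argsort(np.argsort(y, kind="stable"), kind="stable") + 1  # ties broken by order
    k = np.arange(1, n + 1)
    x = 2 * np.cumsum(ranks) - k * (n + 1)
    i = int(np.argmax(np.abs(x)))
    big_k = float(abs(x[i]))
    p = min(1.0, 2.0 * np.exp(-6.0 * big_k ** 2 / (n ** 3 + n ** 2)))
    return big_k, int(k[i]), float(p)


def von_neumann_ratio(y):
    """Von Neumann ratio N; close to 2 for a homogeneous iid series, small after a break."""
    y = np.asarray(y, float)
    return float(np.sum(np.diff(y) ** 2) / np.sum((y - y.mean()) ** 2))
```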

The considered tests are based on the assumption that the data points of the
time series are independent and identically distributed (iid). To approximate
this assumption, the monthly mean time series (Sect.

As runoff usually has a skewed distribution, the monthly time series were log-transformed. As the logarithm is not defined for zero values, 0.01 was added before transformation.

To remove the seasonal cycle and to reduce the influence of monotonic trends, the log-transformed monthly time series were detrended for each month separately. For this, a linear least-squares trend was fitted to all Januaries, Februaries, etc. and subsequently subtracted from the corresponding months.

The detrended runoff residuals can still exhibit a high degree of
serial correlation, violating the iid assumption. Therefore the
residuals were further pre-whitened. For this we followed previous
studies
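The three preprocessing steps can be chained as in the following sketch. The log offset of 0.01 and the per-month linear detrending follow the text; the pre-whitening scheme is an assumption (lag-1 autoregressive pre-whitening, a common choice), since the exact procedure of the cited studies is not reproduced here. Time is measured in months since the start of the series, which yields the same residuals as fitting each month's trend against years.

```python
import numpy as np


def log_transform(runoff, offset=0.01):
    """Log-transform monthly runoff; `offset` avoids log(0)."""
    return np.log(np.asarray(runoff, float) + offset)


def detrend_by_month(y, months):
    """Remove a separate linear least-squares trend from each calendar month.

    `months` holds calendar months (1-12) aligned with `y`; NaN values stay NaN.
    Subtracting each month's fitted line also removes its mean, i.e. the seasonal cycle.
    """
    y = np.asarray(y, float)
    months = np.asarray(months)
    t = np.arange(y.size, dtype=float)
    resid = np.full_like(y, np.nan)
    for m in range(1, 13):
        sel = (months == m) & np.isfinite(y)
        if sel.sum() >= 2:
            slope, intercept = np.polyfit(t[sel], y[sel], 1)
            resid[sel] = y[sel] - (slope * t[sel] + intercept)
    return resid


def prewhiten_ar1(x):
    """Assumed AR(1) pre-whitening: e_t = x_t - r1 * x_{t-1}.

    Returns the whitened series (one value shorter than `x`) and the lag-1
    autocorrelation r1, estimated from pairs where both values are present.
    """
    x = np.asarray(x, float)
    prev, curr = x[:-1], x[1:]
    ok = np.isfinite(prev) & np.isfinite(curr)
    r1 = float(np.corrcoef(prev[ok], curr[ok])[0, 1])
    return curr - r1 * prev, r1
```

A full chain would be `prewhiten_ar1(detrend_by_month(log_transform(runoff), months))`, after which the test statistics are computed on the finite values of the whitened series.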

The four tests were subsequently applied to the preprocessed time
series. Following EAC&D13, the credibility of time series is
classified based on the number of tests that reject the null
hypothesis of no breakpoint:

The test battery was applied to monthly runoff series that had at least
24 monthly values from 1950 onwards, corresponding to the time window of the
presented data product. Figure

Number of stations for which 0, 1, …, 4 of the
considered tests reject the null hypothesis of no
breakpoint (1 % level) at monthly resolution. Stations with more
than one rejection are marked as

Homogeneity testing: number of tests that reject
the null hypothesis of no breakpoint at each station considered at
the 1 % level. Stations marked blue (zero or one rejection) are
considered

The methodology for estimating runoff at ungauged locations proposed by GS15
relies on assigning gauging stations with relatively small catchments to
regular spatial grids. Here the monthly mean runoff rates of the selected
stations were assigned to the 0.5

Assigning stations to the 0.5

Select stations:

Only stations with catchment areas

Only stations with at least 24 non-missing months from 1950 onwards are selected

Only stations that are labelled

Only stations with a long-term mean runoff less than 10 000 mm year

Assign stations to the grid cells which include the station coordinates.

Compute the weighted mean runoff rate of all stations within a grid cell, using the catchment areas of the available stations as weights. The weights are calculated for each time step separately to account for irregular temporal coverage of the stations.
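The station selection and grid-cell averaging can be sketched as below. Assumptions: the catchment-area limit and the homogeneity label are passed in because their exact values are not given here (`max_area_km2` is a hypothetical parameter name); the 0.5° grid is anchored at 180° W, 90° S, which may differ from the origin of the E-OBS grid; and the long-term mean runoff is given in millimetres per year.

```python
import numpy as np


def keep_station(area_km2, n_valid_months_since_1950, homogeneous, mean_mm_per_year, max_area_km2):
    """Selection criteria; `max_area_km2` is a hypothetical placeholder for the area limit."""
    return (
        area_km2 <= max_area_km2
        and n_valid_months_since_1950 >= 24
        and homogeneous
        and mean_mm_per_year < 10_000
    )


def cell_index(lon, lat, res=0.5, lon0=-180.0, lat0=-90.0):
    """Column/row of the res x res cell containing a point (grid origin assumed)."""
    return int(np.floor((lon - lon0) / res)), int(np.floor((lat - lat0) / res))


def group_by_cell(lons, lats):
    """Map each grid cell to the list of station indices located inside it."""
    cells = {}
    for s, (lon, lat) in enumerate(zip(lons, lats)):
        cells.setdefault(cell_index(lon, lat), []).append(s)
    return cells


def area_weighted_mean(runoff, areas):
    """Catchment-area-weighted mean per time step.

    runoff: (n_stations, n_months) with NaN for missing months; areas: (n_stations,).
    Weights are renormalised at each time step over the stations that have data.
    """
    runoff = np.asarray(runoff, float)
    w = np.broadcast_to(np.asarray(areas, float)[:, None], runoff.shape)
    valid = np.isfinite(runoff)
    w_sum = np.where(valid, w, 0.0).sum(axis=0)
    weighted = np.where(valid, runoff * w, 0.0).sum(axis=0)
    out = np.full(runoff.shape[1], np.nan)
    has_data = w_sum > 0
    out[has_data] = weighted[has_data] / w_sum[has_data]
    return out
```

Renormalising the weights at every time step means that a month with only one reporting station simply takes that station's value, instead of being pulled towards zero by the missing ones.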

This procedure resulted in a total of 2771 selected stations which were
assigned to 1073 grid cells, implying that there are on average about 2.6 stations
assigned to each grid cell. Figure

Frequency distribution of grid cells with 1, 2, …, 24 stations.

Spatial and temporal coverage after assigning the monthly
runoff series to the 0.5

As the above-described procedure can assign data from several stations with
different temporal coverage to one grid cell, it can happen that the resulting
time series exhibits sudden jumps or other inhomogeneities. To reduce the
influence of such artifacts the homogeneity testing that was applied to the
station data (Sect.

Table

Final selection of grid cells with observations. Only grid cells with homogeneous time series were selected. See text for details.

Number of grid cells for which 0, 1, …, 4 of the considered tests reject the null hypothesis of no breakpoint (1 % level) at monthly resolution. Grid cells with more than one rejection are excluded from the analysis.

The technique used to estimate gridded runoff time series is identical to the
approach introduced by GS15. For convenience we provide here a brief
overview of this method. For a full description of the employed
methods we refer to GS15. Following GS15, we aim to model the
monthly runoff rate

As in GS15, model selection and validation are conducted using two independent cross-validation experiments. For the first experiment, the grid cells with observations were randomly split into 10 equally sized subsamples. The model was then trained using 9 of the 10 subsamples and subsequently used to predict the remaining subsample. This procedure, referred to as cross validation in space, was repeated until each subsample had been left out once. It focuses on the accuracy of estimates at locations that were not used for model training. The second experiment, cross validation in time, focuses on the accuracy at time steps that were not used for model training. For this, the available data were split into 10 consecutive time blocks. The model was then trained using 9 of the 10 time blocks and subsequently used to predict the time block that had been left out. This procedure was repeated until each time block had been left out once.
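The fold construction for both experiments can be sketched with NumPy. The random seed is a choice of this sketch, as is the use of `np.array_split`, which yields subsamples whose sizes differ by at most one when the number of grid cells or months is not divisible by 10.

```python
import numpy as np


def spatial_folds(n_cells, k=10, seed=0):
    """Randomly split grid-cell indices into k (near-)equally sized subsamples."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_cells), k)


def temporal_folds(n_months, k=10):
    """Split the monthly time axis into k consecutive blocks."""
    return np.array_split(np.arange(n_months), k)


def leave_one_fold_out(folds):
    """Yield (train, test) index pairs so that every fold is left out exactly once."""
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, test
```

For cross validation in space the indices refer to grid cells, for cross validation in time to months; in both cases a model is fitted on `train` and used to predict `test`, and the pooled predictions are compared with the observations.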

Like any other machine learning tool, RFs have a number of parameters that
control the trade-off between the flexibility and the reliability of the
resulting model. While GS15 used the default parameters recommended by

To investigate the effect of different values of

As in GS15, model selection is based on the global root mean square error
(RMSE), computed over all time steps and grid cells. Uncertainty in the
RMSE is quantified in terms of 95 % bootstrap confidence intervals (2000
replications). The optimal values of

For all

Identify RMSE

Choose any larger

If the results between cross validation in space and cross
validation in time differ, choose the smaller

Choose the smallest
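The global RMSE and its bootstrap confidence interval can be computed as follows. This is a sketch assuming a percentile bootstrap that resamples individual (grid cell, month) pairs with replacement; the resampling unit used for the published intervals is not specified here.

```python
import numpy as np


def rmse(obs, pred):
    """Root mean square error over all (grid cell, month) pairs with data."""
    obs, pred = np.ravel(np.asarray(obs, float)), np.ravel(np.asarray(pred, float))
    ok = np.isfinite(obs) & np.isfinite(pred)
    return float(np.sqrt(np.mean((obs[ok] - pred[ok]) ** 2)))


def bootstrap_rmse_ci(obs, pred, n_boot=2000, level=0.95, seed=0):
    """RMSE with a percentile bootstrap confidence interval (assumed resampling unit)."""
    obs, pred = np.ravel(np.asarray(obs, float)), np.ravel(np.asarray(pred, float))
    ok = np.isfinite(obs) & np.isfinite(pred)
    sq_err = (obs[ok] - pred[ok]) ** 2
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for b in range(n_boot):  # a loop keeps memory use at O(n) for large grids
        sample = rng.integers(0, sq_err.size, size=sq_err.size)
        boot[b] = np.sqrt(sq_err[sample].mean())
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot, [alpha, 1.0 - alpha])
    return float(np.sqrt(sq_err.mean())), float(lower), float(upper)
```

Running this once per candidate parameter value and per cross-validation experiment gives the point estimates and intervals on which the selection rules operate.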

Cross-validation error for different values of
the node size parameter (

Spatial distributions and box plots (whiskers: 10th
and 90th percentiles; box: interquartile range; bar: median) of

Figure

We employ here the same performance metrics that have been used by
GS15 to quantify the accuracy of the gridded runoff estimate. For
convenience we reproduce here the definition of the
considered metrics, where

Spatial distributions and box plots (whiskers: 10th and 90th
percentiles; box: interquartile range; bar: median) of

The seasonal cycle skill score

where

The model efficiency

where

The relative model bias

which has an optimal value of zero. Positive and negative values indicate overestimation and underestimation, respectively.

The coefficient of determination (squared correlation
coefficient),

The coefficient of determination between the observed and the
modelled mean annual cycle,

The coefficient of determination between the monthly anomalies
(i.e. monthly time series with the long-term mean of each month
removed),
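Several of these metrics have standard closed forms, which the sketch below implements for a single grid cell. The equations of the text are not reproduced above, so the forms are assumptions: model efficiency is written in the Nash–Sutcliffe form, the relative bias as the difference of means divided by the observed mean (consistent with the sign convention stated above), and the seasonal cycle skill score as the skill relative to the observed mean annual cycle used as a benchmark. The latter in particular should be checked against the definition in GS15.

```python
import numpy as np


def _r2(a, b):
    """Squared Pearson correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)


def _climatology(x, months):
    """Mean annual cycle (12 values) for calendar months 1-12."""
    return np.array([x[months == m].mean() for m in range(1, 13)])


def skill_metrics(obs, pred, months):
    """Accuracy metrics for one grid cell; `months` holds calendar months 1-12.

    Assumes complete series (no NaN) that cover every calendar month.
    """
    obs, pred, months = (np.asarray(a) for a in (obs, pred, months))
    obs, pred = obs.astype(float), pred.astype(float)
    clim_obs, clim_pred = _climatology(obs, months), _climatology(pred, months)
    sse = np.sum((obs - pred) ** 2)
    return {
        # Nash-Sutcliffe-type efficiency: 1 is perfect, 0 is no better than the mean
        "ME": 1.0 - sse / np.sum((obs - obs.mean()) ** 2),
        # assumed form: skill relative to the observed mean annual cycle as benchmark
        "SCSS": 1.0 - sse / np.sum((obs - clim_obs[months - 1]) ** 2),
        # relative bias: > 0 overestimation, < 0 underestimation
        "BIAS": (pred.mean() - obs.mean()) / obs.mean(),
        "R2": _r2(obs, pred),
        "R2_clim": _r2(clim_obs, clim_pred),
        "R2_anom": _r2(obs - clim_obs[months - 1], pred - clim_pred[months - 1]),
    }
```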

Figures

Long-term mean of the presented gridded runoff field as well as the month of the maximum and minimum of the mean annual cycle.

The final observation-based gridded runoff dataset is created by first
training the model using all available stations and E-OBS precipitation and
temperature. Subsequently the model is used to estimate monthly runoff rates
[mm day

The spatial and temporal extent of the data is determined by the coverage of the forcing data.

A consequence of the time-lag operator in Eq. (

Most station data are located in central and western Europe, suggesting that the estimates are most accurate in these regions. In other regions the reliability of the data is expected to decrease gradually. Therefore, special care should be taken when analysing the data in regions with low station coverage.

The E-OBS dataset also covers parts of the Caspian Sea and other large inland water bodies. Although it might not be physically meaningful to provide runoff estimates for these locations, we opted not to remove the corresponding grid cells from the dataset. The rationale underlying this decision is that the definition of shorelines in gridded data products depends on several assumptions, and we want to allow users to make such choices according to their needs.

In the following we present two example applications of the newly developed dataset. These applications closely follow the ones presented in GS15.

Standardised runoff anomalies for selected drought events in Europe.

Figure

As runoff reflects the excess water that is available to ecosystems, it is an
interesting candidate for drought monitoring. To assess droughts, we follow
previous studies (
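Standardised anomalies of the gridded runoff can be computed per calendar month. The sketch below assumes simple z-scores (subtract the monthly mean and divide by the monthly standard deviation, both estimated over an optional reference period); the standardisation used in the cited studies may differ, for example by first fitting a probability distribution. Negative values indicate below-normal runoff.

```python
import numpy as np


def standardised_anomalies(runoff, months, ref=None):
    """Per-calendar-month z-scores of monthly runoff.

    `months` holds calendar months 1-12. `ref` is an optional boolean mask selecting
    the reference period for the monthly mean and standard deviation (default: all).
    """
    x = np.asarray(runoff, float)
    months = np.asarray(months)
    ref = np.ones(x.size, bool) if ref is None else np.asarray(ref, bool)
    z = np.full_like(x, np.nan)
    for m in range(1, 13):
        sel = months == m
        base = x[sel & ref]
        z[sel] = (x[sel] - base.mean()) / base.std()
    return z
```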

Figure

In conclusion, we presented an observational dataset that provides monthly
pan-European runoff estimates covering the period from December 1950 to December 2015.
The data are a significant update of our previous
assessment (GS15), which only included data up to 2001. The dataset is
based on a unique collection of streamflow observations from small
catchments which were upscaled on a 0.5

The data are publicly available in NetCDF format

The streamflow observations collected in the ERDB (Sect.

In the following, the different fields of this meta-data table are briefly described. For convenience, we partition the description of the meta-data into three blocks, labelled Part A to Part C:

summarises
information on names, spatial location and temporal coverage:

The database identifier used to organise ERDB. This
identifier is structured as

Country code.

Name of the river or stream.

Name of the station.

Longitude of the station in decimal degrees.

Latitude of the station in decimal degrees.

Altitude of the station in metres above sea level.

Catchment area in square kilometres.

Date of the first entry in the time series.

Date of the last entry in the time series.

Time series length in number of months.

Number of months with non-missing data.

The fraction of missing months.

summarises the
results of the record linkage procedure described in Section

Database identifier of EWA, if any EWA record is assigned to the entry.

Database identifier of GRDB, if any GRDB record is assigned to the entry.

Database identifier of AFD, if any AFD record is assigned to the entry.

The value of

The value of

The value of

The value of

The value of

summarises the
results of the homogeneity assessment (Sect.

The results of the standard normal homogeneity
test. The following values are possible:

The results of the Buishand range test. See

The results of the Pettitt test. See

The results of the Von Neumann ratio test. See

The support of the ERC DROUGHT-HEAT (contract no. 617518) and DROUGHT-R&SPI
projects (contract no. 282769) is acknowledged. We acknowledge the E-OBS
dataset from the EU-FP6 project ENSEMBLES
(