The dataset presented here consists of an ensemble of 10 global
hydrological and land surface models for the period 1979–2012 using a
reanalysis-based meteorological forcing dataset (0.5
Water security concerns all global economies, rich and poor
Only a limited number of global reanalysis datasets that can support water
resources analysis is available. Pioneered by GLDAS,
(
We use the WFDEI dataset to force a set of 10 global models, both land
surface models (LSMs) and global hydrological models (GHMs). By using a
sizeable set of models we take steps to mitigate some of the errors and
uncertainties that are introduced in individual models by the simplified
representation of spatially heterogeneous real world processes like water and
energy balances, river routing and seasonally varying vegetation cover
The multi-model ensemble presented here inherits a number of models from the
WATCH project supplemented by additional models, a new forcing dataset
(WFDEI), a WFDEI-derived reference potential evapotranspiration dataset and a
new modelling protocol. Furthermore, we introduce the data repository where
the results are stored in an open format including all data needed for other
groups to perform a similar exercise. In the end this can contribute to a
better understanding of the characteristics of the increasing number of
global models
In this paper we present the first version of the dataset, which is based on the current state of the art of the contributing modelling systems and will provide a benchmark to evaluate improvements made to the models and forcing data in the coming years. The main goal of this paper is to provide a multi-decadal dataset of water balance components from an ensemble of models that is open and of use for further research and applications. Secondly, we investigate whether the ensemble mean in this dataset is superior to the individual models given the diverse set of models, and if so, for which variables.
First, we describe the methods and models we have used. Secondly, we
examine the characteristics of the resulting dataset using the
multi-model signal-to-noise ratio (SNR) to assess multi-model agreement and
the tools from The International Land Model Benchmarking Project
Each of the models used produced results for the period 1979–2012 based on
the provided meteorological forcing. In total 10 models were used, both
large-scale hydrological models and land surface models with extended
hydrological schemes (see the list below and Table
Overview of models and summary of processes included.
HBV-SIMREG, SWBM, LISFLOOD and WaterGAP3 have all been calibrated against observed runoff data in previous studies, although those calibrations used different forcing datasets (see the respective model papers listed above). The other models rely on a priori parameter estimation alone.
The data used to force the models were from the WFDEI dataset
A list of the most important output variables is presented in
Table
Overview of the meteorological forcing used in the simulations, and
the corrections applied to the original ERA-Interim during the WFDEI
processing
List of most important output variables and conventions. If a standard name is not available the name will be used in the respective netCDF files.
Similar to other global forcing datasets (
The simulations were performed from 1 January 1979 to
31 December 2012 in a continuous run. With respect to static fields (e.g.
soil physical parameters, land cover type) each modelling group used their
own datasets, as this is considered to be part of the modelling system, and
exchanging these fields between models is not straightforward. Two simple
quality control tests were applied to the data: (i) generic metadata and
quality control (including mass balance checks) and (ii) comparison of minimum,
maximum and mean fields. The first test is automatic (script available at:
To ensure uniform input and use of the data the project's servers have been
configured to host the forcing data and also provide an interchange platform
for the project using a THREDDS data server
Flow of input and model data via the eartH2Observe Water Cycle
Integrator (WCI). All data are accessible to users via a number of open
protocols and a tailor-made user interface at
All model outputs passed the metadata consistency checks. The simulations
from WaterGAP3, PCR-GLOBWB and JULES have grid points with residuals above
the defined threshold (5.
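The mass balance check can be illustrated with a minimal sketch. It assumes the residual at each grid point is precipitation minus evapotranspiration, runoff and storage change over the same period; this formulation, the function names and the threshold argument are illustrative assumptions and not the project's actual quality control script.

```python
def water_balance_residual(precip, evap, runoff, storage_change):
    """Return P - ET - R - dS for one grid point.

    All terms are assumed to be totals over the same period in kg m-2.
    """
    return precip - evap - runoff - storage_change


def flag_residuals(cells, threshold):
    """Return the indices of grid points whose absolute residual exceeds threshold.

    cells: iterable of (precip, evap, runoff, storage_change) tuples.
    threshold: hypothetical tolerance, in the same units as the fluxes.
    """
    flagged = []
    for index, (p, et, r, ds) in enumerate(cells):
        if abs(water_balance_residual(p, et, r, ds)) > threshold:
            flagged.append(index)
    return flagged
```

Grid points returned by `flag_residuals` would then be inspected manually, since a large residual may point to a storage term that is not included in the reported output rather than to a model error.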
An important component of a multi-model dataset is the possibility to
characterize the multi-model agreement or consistency. While such
characteristics do not imply quality or skill, they can provide an overview of
the regions and variables where datasets strongly disagree. This information
can be used either by the modelling community to focus on particular aspects
of their models or by users as a first order uncertainty estimate of the
multi-model ensemble. The agreement metric selected here is the
SNR, which compares the signal and noise levels by
relating the ensemble's variance to that of the individual members. The SNR has
been widely used as a classical measure of predictability in seasonal
forecasting
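As an illustration, the SNR at a single grid point can be sketched as follows. The sketch assumes the definition commonly used in seasonal forecasting: the signal is the temporal variance of the ensemble mean, and the noise is the time-averaged variance of the members around the ensemble mean. The function name and input layout are hypothetical, and the exact formulation used for this dataset may differ in detail.

```python
from statistics import fmean, pvariance


def signal_to_noise(members):
    """Return the signal-to-noise ratio for one grid point.

    members: list of equally long anomaly time series, one per ensemble member.
    Signal (assumed): variance in time of the ensemble mean.
    Noise (assumed): time mean of the member variance around the ensemble mean.
    """
    n_time = len(members[0])
    # Ensemble mean at every time step.
    ensemble_mean = [fmean([m[t] for m in members]) for t in range(n_time)]
    signal = pvariance(ensemble_mean)
    # Spread of the members around the ensemble mean, averaged over time.
    noise = fmean([pvariance([m[t] for m in members]) for t in range(n_time)])
    if noise == 0.0:
        return float("inf")  # all members identical: no noise
    return signal / noise
```

Under this definition, values well above 1 indicate that the common signal dominates the inter-model spread, whereas values below 1 indicate that the models disagree more than they co-vary.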
We performed the calculations with monthly mean anomalies to focus on the
model agreement in terms of intra-seasonal to inter-annual variability. Since
all models were driven by the same atmospheric conditions, low values of
SNR can be directly associated with differences in the representation of
processes such as energy partitioning and runoff generation – i.e. ensemble
uncertainty. However, since one single forcing was used, the ensemble is
missing an important source of uncertainty: the driving data. Precipitation
is likely the main source of uncertainty; it is very important for the
terrestrial water balance while at the same time it is difficult to observe
both locally and remotely. To put our results into perspective, we also
computed the SNR of an ensemble of precipitation datasets including three
atmospheric reanalysis datasets (ERA-Interim:
The SNRs were computed for the period January 1980 to December 2012 by
removing the mean annual cycle in each grid point from each ensemble member
such that
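Removing the mean annual cycle can be sketched for a single grid point and ensemble member as below. The sketch assumes a monthly series that starts in January and covers whole years, as in the January 1980 to December 2012 period; the function name is hypothetical.

```python
from statistics import fmean


def monthly_anomalies(series):
    """Return monthly anomalies with the mean annual cycle removed.

    series: monthly values starting in January and covering whole years.
    """
    if len(series) % 12 != 0:
        raise ValueError("series must cover a whole number of years")
    # Mean annual cycle: average of all Januaries, all Februaries, and so on.
    climatology = [fmean(series[month::12]) for month in range(12)]
    return [value - climatology[i % 12] for i, value in enumerate(series)]
```

Applying this to every grid point and every ensemble member yields anomaly series that retain only intra-seasonal to inter-annual variability.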
The multi-model consistencies in terms of inter-annual variability evaluated
by the SNR are shown in Fig.
Signal-to-noise ratio of monthly mean anomalies of
evapotranspiration
Distribution of the SNR of monthly anomalies over
different BIOMES (horizontal axis; see Fig.
We use the ILAMB system (The International Land Model Benchmarking Project;
ILAMB provides a scoring system to relate modelled results to reference
datasets. In the ILAMB system multiple performance metrics are calculated,
and additionally these metrics are converted to scores ranging between 0 and
1 to facilitate comparison and averaging. In this study three performance
metrics are calculated for each of the five model variables evaluated (ET,
TWSA, SMA, SWE, SC): total bias, root mean square error (RMSE) and phase
difference (difference in months between peak values); furthermore a total of
five 0–1 scores are calculated, for global bias, RMSE, seasonal cycle,
spatial distribution and inter-annual variability, plus a 0–1 overall score
that summarizes them. The metrics and scoring are explained in detail in the
ILAMB documentation
(
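The three performance metrics can be sketched for one regional or grid point time series as follows. This is a simplified illustration and not the ILAMB implementation: it ignores area weighting and the conversion to 0–1 scores, and all function names are hypothetical. Both series are assumed to be monthly, to start in January and to cover the same whole years.

```python
from math import sqrt
from statistics import fmean


def bias(model, reference):
    """Difference between the time means of model and reference."""
    return fmean(model) - fmean(reference)


def rmse(model, reference):
    """Root mean square error between two equally long series."""
    return sqrt(fmean([(m - r) ** 2 for m, r in zip(model, reference)]))


def peak_month(series):
    """Index (0 = January) of the maximum of the mean annual cycle."""
    climatology = [fmean(series[month::12]) for month in range(12)]
    return climatology.index(max(climatology))


def phase_difference(model, reference):
    """Signed difference in months between peaks, wrapped to the range -6 to 5.

    Negative values mean that the model peaks earlier than the reference.
    """
    diff = peak_month(model) - peak_month(reference)
    return (diff + 6) % 12 - 6
```

The wrap-around in `phase_difference` ensures that a model peaking in December against a reference peaking in January is reported as one month early rather than eleven months late.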
BIOMES used in calculating regional averages. These are: AUST (Australia), EQAS (equatorial Asia), SEAS (Southeast Asia), CEAS (central Asia), BOAS (boreal Asia), SHAF (Southern Hemisphere Africa), NHAF (Northern Hemisphere Africa), MIDE (Middle East), EURO (Europe), SHSA (Southern Hemisphere South America), NHSA (Northern Hemisphere South America), CEAM (Central America), TENA (temperate North America) and BONA (boreal North America).
Components used in total water storage estimation for each model. The definition of the variables can be found in Table
Although there are a number of uncertainties associated with TWSA as
estimated by GRACE measurements
Terrestrial water storage anomaly metrics for the ensemble mean, from top to bottom: SD of bias, phase difference (months) and root mean square error.
For evapotranspiration our results compare better to the GLEAM products (mean
model total score of 0.83 for both products) but less so to the MODIS product
(mean model score of 0.78; see Table
Model mean evapotranspiration compared to the MODIS and GLEAM-V2a/GLEAM-V3b products. The difference in model mean annual ET in the last three rows is due to different periods used for the comparison (GLEAM-V2a 1980–2011, GLEAM-V3b 2003–2012, MODIS 2000–2012).
All models provided SWE, while only six models provided SC. Total performance
against the reference dataset was highest for ORCHIDEE, WaterGAP3 and
LISFLOOD although the bias is fairly large for all models (see
Fig.
Bias in kg m
Difference in peak snow cover (SC) month for the models compared to the IMS dataset.
Although current satellite-derived surface soil moisture products that cover
a long period have a number of limitations
Averaging depth of surface moisture and root zone moisture in the models.
Soil moisture anomaly dynamics and climatology over Australia and Southeast Asia for all models compared to ESA CCI SM.
Table
Table
Global mean yearly precipitation
Mean evaporation and runoff for the whole period compared to the
change in storage of the total moisture component of each model. Mean
precipitation for the whole period using the common land surface mask was 863
(kg m
Comparison of mean annual total of terrestrial precipitation, evapotranspiration and runoff with previous studies.
Average runoff plotted against average total evaporation (both
expressed in kg m
All data are made available via the eartH2Observe server which can be
accessed via the WCI portal (
For most of the variables we have found that the ensemble of land-surface and hydrological models gives satisfactory results and the agreement between the models is good for large parts of the terrestrial earth. The multi-model agreement in terms of monthly anomalies using the SNR provided an insight into the main regions/variables where the dataset shows a reduced multi-model agreement: (i) snow-dominated regions (in all three variables – evapotranspiration, runoff and root zone soil moisture) and (ii) tropical rainforest and monsoon regions (only for evapotranspiration). Furthermore, the SNR of an ensemble of precipitation datasets was calculated, indicating a large uncertainty of precipitation in the tropics, which is not reflected in the ensemble runoff from the models. In cold regions the precipitation uncertainty derived from the available datasets is small compared to the uncertainty of the multi-model simulations. This suggests that the model cold processes are an important factor in this multi-model disagreement. However, in these regions there are no satellite estimates and a limited number of rain gauges, which means that the current global datasets most probably underestimate the precipitation uncertainty in those regions.
The ability of the multi-model ensemble to model total water storage dynamics
at the scale of the GRACE data is generally good although models predict the
peak in total water storage earlier in all regions. The fact that the phase
difference is largest in the cold zones also indicates that there are
difficulties in modelling the snow pack. This is in line with the observation
of
Although the scores indicate a good performance of the ensemble, the evapotranspiration estimates are higher than those of the benchmark datasets. This, combined with the large spread within the ensemble itself, indicates that the ET estimates have a large uncertainty and that further work is needed to improve the results. It also shows that in future versions of the dataset potential ET (PET) and net radiation should be reported by all models, as the choice of PET calculation method and net radiation estimate may be large contributors to the spread in ET estimates.
The current study shows a wide spread in runoff into the oceans derived from the set of models used. The large range stems from a combination of different total evaporation values and different storage dynamics in the models due to the different concepts and parameterization of runoff generation. Given the large spread it seems plausible that the ensemble mean provides the most reliable estimate of the global water fluxes although there is no independent way of testing this assumption.
At the global level the multi-model ensemble mean provides the best (or close
to the best) performance for most of the variables we investigated using the
ILAMB system although caution should always be used.
The above shows a couple of areas of importance for further development of
global models and datasets and the current set in particular: precipitation
estimates in the tropics, cold weather processes and evapotranspiration
losses. This does not mean that other processes are already properly
represented in the global models or that the influence of these processes is
not important or not reflected in the current results. For snow
precipitation in particular we rely mostly on reanalysis, so the uncertainty in the SWE
estimates could also stem from the snow input. Work on improving the
precipitation estimates has been started by creating a merged precipitation
product
Constraining models with soil moisture may reduce the spread in
evapotranspiration rates and discharge estimates (see e.g.
One way of making the forcing data and model results more relevant for
basin-scale studies is by including higher-resolution model runs. Several of
the models will be running at a higher resolution in a future set of runs and
the common resolution will be increased to 0.25
A literature search reveals that the data we produced have already been used
by several other researchers including a study investigating
vegetation–atmosphere coupling
The SNR of the multi-model ensemble was computed as
the ratio between the external variance (
Global variables summary.
Diagnostic summary for soil moisture anomaly: model vs. ESA-CCI.
Diagnostic summary for evapotranspiration: model vs. GLEAM-V3B.
Diagnostic summary for snow water equivalent: model vs. GLOBSNOW.
Diagnostic summary for snow cover: model vs. IMS.
Diagnostic summary for terrestrial water storage anomaly: model vs. GRACE.
JS wrote most of the text, performed the global land surface water budget analysis and interpreted the validation with external datasets. ED performed the SNR analysis and the HTESSEL-CaMa runs and supplied the model information. AMdlT performed the JULES runs, supplied the model information and performed the evaluation with the ILAMB system. GB supported the HTESSEL-CaMa runs. AvD performed the W3RA runs, supplied the model information and supported the editing of the paper. FSW supported the editing of the paper and performed the PET calculations. MM, J-CC and BD performed the SURFEX-TRIP runs and supplied the model information. SE, GF and MF performed the WaterGAP3 runs and supplied the model information. SP and RvB performed the PCR-GLOBWB runs and supplied the model information. JP performed the ORCHIDEE runs, supplied the model information and assisted in the terrestrial water budget calculation. HB performed the LISFLOOD and HBV-SIMREG runs and supplied the model information. RO performed the SWBM runs and supplied the model information. BC developed and maintained the data server and portal and supplied the portal information. SB supplied the WaterWorld information. WD supplied the soil moisture data and assisted in the analysis. GPW supplied the WFDEI forcing data and information and assisted in the analysis.
The authors declare that they have no conflict of interest.
This research received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 603608, “Global Earth Observation for integrated water resource assessment”: eartH2Observe. The National Snow and Ice Data Center is acknowledged for providing the IMS snow cover data. We thank Tim Stockdale for the suggestions on the use of the SNR analysis.
Edited by: David Carlson
Reviewed by: two anonymous referees