Articles | Volume 15, issue 7
Data description paper
04 Jul 2023
Data description paper |  | 04 Jul 2023

An ensemble of 48 physically perturbed model estimates of the 1∕8° terrestrial water budget over the conterminous United States, 1980–2015

Hui Zheng, Wenli Fei, Zong-Liang Yang, Jiangfeng Wei, Long Zhao, Lingcheng Li, and Shu Wang

Terrestrial water budget (TWB) data over large domains are of high interest for various hydrological applications. Spatiotemporally continuous and physically consistent estimations of TWB rely on land surface models (LSMs). As an augmentation of the operational North American Land Data Assimilation System Phase 2 (NLDAS-2) four-LSM ensemble, this paper describes a dataset simulated from an ensemble of 48 physics configurations of the Noah LSM with multi-physics options (Noah-MP). The 48 Noah-MP physics configurations are selected to give a representative cross-section of commonly used LSMs for parameterizing runoff, atmospheric surface layer turbulence, soil moisture limitation on photosynthesis, and stomatal conductance.

The dataset spans from 1980 to 2015 over the conterminous United States (CONUS) at a monthly temporal resolution and a 1/8 spatial resolution. The dataset variables include total evapotranspiration and its constituents (canopy evaporation, soil evaporation, and transpiration), runoff (the surface and subsurface components), as well as terrestrial water storage (snow water equivalent, four-layer soil water content from the surface down to 2 m, and the groundwater storage anomaly). The dataset is available at (Zheng et al.2022). Evaluations carried out in this study and previous investigations show that the ensemble performs well in reproducing the observed terrestrial water storage, snow water equivalent, soil moisture, and runoff. Noah-MP complements the NLDAS models well, and adding Noah-MP consistently improves the NLDAS estimations of the above variables in most areas of CONUS. Besides, the perturbed-physics ensemble facilitates the identification of model deficiencies. The parameterizations of shallow snow, spatially varying groundwater dynamics, and near-surface atmospheric turbulence should be improved in future model versions.

1 Introduction

Estimates of the terrestrial water budgets (TWBs) – evapotranspiration, runoff, terrestrial water storage, and their constituents – over continental domains are of high interest for a broad range of hydrological applications. Publicly available data have been applied to investigate the state of the terrestrial water cycle (Trenberth and Fasullo2013a; Rodell et al.2015; Scanlon et al.2018; Yin and Roderick2020); to understand the interactions among hydrological processes, vegetation, climate, and human activities (Trenberth and Fasullo2013b; LaFontaine et al.2015; Ward et al.2014; Levia et al.2020); to examine the availability and variability of water resources and use (Wu et al.2021; Hejazi et al.2014; Scanlon et al.2012; Voss et al.2013; Lv et al.2019; Le et al.2011; Rodell et al.2009); and to assess the risk of extreme events such as droughts (Peters-Lidard et al.2021; Prudhomme et al.2014; Dai2013; Su et al.2021) and floods (Emerton et al.2017; Lin et al.2018).

As the applications have expanded, the availability of TWB estimates has increased rapidly (Peters-Lidard et al.2018; Saxe et al.2021; Zhang et al.2018). Commonly used estimation methods include remote sensing, in situ observations, and model simulations (Saxe et al.2021; McCabe et al.2017; Pan et al.2012; Gao et al.2010; Trenberth et al.2007). Among these methods, land surface models (LSMs) are apt for continuously producing physically consistent TWBs over a large domain and long period, and their characteristics are particularly favorable for certain circumstances. For instance, LSMs can estimate various TWB components simultaneously; whereas, for some components, such as runoff (Lin et al.2019; Beck et al.2017), root-zone soil moisture (Xia et al.2015b, a), and transpiration (Lian et al.2018), direct remote sensing is either unavailable or highly uncertain. Additionally, LSMs are valuable in remote or topographically complex regions because of the sparseness of in situ observations (Kim et al.2021). Estimations based on remote sensing and in situ observations are often impeded by scale mismatches and observation gaps, whereas these issues are rarely an impairment for LSM simulations. Besides, LSM simulations can complement remote sensing and in situ observations well. Combinations of estimates from different techniques can improve the estimation accuracy (Zhang et al.2018; Pan et al.2012; Zhao and Yang2018), while comparisons between model-simulated estimates and observations can reveal the impacts of human activities (Zaussinger et al.2019) and underground processes (Zheng et al.2020).

Several operational LSM simulation systems have been set up over different regions of the globe (Xia et al.2019; Shi et al.2011; Carrera et al.2015; Rodell et al.2004). The systems combine an ensemble of LSMs to utilize the competitive strengths of different LSMs and eliminate the weakness associated with individual ones. Among them, the North American Land Data Assimilation System (NLDAS) (Xia et al.2012a, b; Mitchell et al.2004) stands as a pioneering and successful one. The NLDAS Phase 2 (NLDAS-2) operates over the conterminous United States (CONUS) from 1979 to near real time at a spatial resolution of 1/8. The system generates a set of multi-source synthesized data of surface meteorology, vegetation, and soils and uses them to drive an ensemble of four different LSMs. The four LSMs – namely Noah version 2.8 (Ek et al.2003; Chen and Dudhia2001a, b; Chen et al.1997), Variable Infiltration Capacity (VIC) version 4.0.3 (Liang et al.1994), Mosaic (Koster and Suarez1992), and Sacramento Soil Moisture Accounting (SAC) model (Burnash et al.1973), were selected to give a good cross-section of the diverse range of LSMs (Mitchell et al.2004). The models have varying strengths and weaknesses in process parameterizations and modeling skills (Kumar et al.2017). A combination of multiple models can produce an aggregated estimate that outperforms most of the individual constituents (Fei et al.2021; Beck et al.2017; Guo et al.2007; Ajami et al.2007). An ensemble can also quantify the estimation uncertainty resulting from different model formulations (Troin et al.2021; Cloke and Pappenberger2009). Evaluations of the NLDAS-2 four-LSM ensemble estimates have shown satisfactory performance in matching the observed evapotranspiration (ET) (Zhang et al.2020; Xia et al.2012b; Kumar et al.2018), runoff (Xia et al.2012a), and soil moisture (Xia et al.2015b, a).

We have enriched the NLDAS-2 four-model ensemble with 48 perturbed-physics configurations of the Noah LSM with multi-physics options (Noah-MP) (Fei et al.2021; Zheng et al.2020, 2019). Noah-MP has more physically realistic representations of vertical stratification than the NLDAS-2 models have. A column of land in Noah-MP consists of a vegetation canopy layer, three snowpack layers, four soil layers, and a groundwater component (Niu et al.2011; Yang et al.2011). Conceptual (e.g., the five water tanks of SAC) and lumped (e.g., the combined vegetation–soil surface layer of Noah) representations of the stratification of vegetation and soil, as used in the NLDAS-2 models, are minimized. Moreover, Noah-MP has a more comprehensive representation of various land surface processes that are evident at different depths. The modeled processes include snow accumulation and ablation, infiltration, percolation, retention, freeze–thaw of snow or soil water, groundwater recharge/discharge, and energy constraints (Niu et al.2011). These improvements in vertical stratification and process parameterizations are expected to better estimate TWBs. Indeed, previous comparisons between Noah-MP and the four NLDAS-2 LSMs have shown that Noah-MP is comparable or better when it comes to estimating soil moisture (Cai et al.2014b), runoff (Cai et al.2014a; Fei et al.2021), and ET (Zhang et al.2020).

Our enrichment also features a single-model perturbed-physics ensemble, which is different from the widely used multi-model ensemble approach. The Noah-MP ensemble is constructed by shuffling the available parameterization options of several selected processes. The ensemble size grows exponentially as a multiplication of the available parameterization options of different processes (Yang et al.2011; Zhang et al.2016; Gan et al.2019). A large ensemble should give a broad cross-section of feasible model formulations to account for the model uncertainty in TWB estimation (Telteu et al.2021; Mitchell et al.2004) and is critical for a statistically reliable estimation of the probability of hydrological events such as floods and droughts (Troin et al.2021). The single-model perturbed-physics ensemble also facilitates uncertainty attribution and reduction. The ensemble consists of pairs that are different in the parameterization of one process and the same for another. Variance analysis of the ensemble can quantify the contribution of the parameterization of a process and compare the relative importance of two processes (Zheng et al.2019; Clark et al.2011). Such quantification could inform further model development to reduce the model uncertainty. However, there are also pitfalls unique to the single-model perturbed-physics ensemble. Fei et al. (2021) found that the ensemble members generated by naive perturbation of the Noah-MP physics are not independent enough from each other. The low independence hinders the skill gained from the ensemble method. The finding suggests that advanced techniques of physics perturbations should be developed to maximize the ensemble skill and minimize the ensemble size. An open-accessible dataset should facilitate the research that leverages the advantages and addresses the issues of the perturbed-physics ensemble.

We have previously evaluated the runoff and compared it with NLDAS-2 (Fei et al.2021). This paper describes the estimation of all the TWB variables, along with the description of the spread among the ensemble members, the difference with the NLDAS models, and the performance with reference to various observations are presented. The paper is organized as follows. Section 2 presents the information necessary for using the dataset, including the dataset variables, file organization, and the source data and models used for data generation. Section 3 describes the intercomparison methods, evaluation metrics, and reference datasets. Section 4 presents the results and discussion. Finally, after stating the data availability in Sect. 5, Sect. 6 draws conclusions.

2 Data description

The dataset contains gridded water budget variables over CONUS. Section 2.1 describes the dataset variables and their physical relationships. The 48 Noah-MP physics configurations used to create the dataset are detailed in Sect. 2.2. Section 2.3 briefly covers the atmospheric forcing, the static parameters of vegetation and soil, and the simulation settings.

2.1 Dataset variables

Table 1 lists the dataset variables. The variables are available at each 1/8 grid point in NLDAS-2, indicated by a land–water mask (X). The surface water budgets of each grid cell are represented as follows.

Table 1The dataset variables.

Download Print Version | Download XLSX

Neglecting horizontal water exchange between adjacent grids, the water budget closure can be obtained among the precipitation (P; kgm-2s-1), ET (E), runoff (R), and terrestrial water storage change (W=dWdt, kgm-2s-1) (Zheng et al.2020):

(1) P = E + R + W ,

where precipitation (P) is from NLDAS-2 (described in Sect. 1) and used as the model input.

Noah-MP resolves the components of the water budget closure Eq. (1). ET (E) consists of canopy evaporation (Ecan), ground evaporation (Egnd), and transpiration (Etran):

(2) E = E can + E gnd + E tran .

Runoff (R) has a surface (Rsrf) and subsurface (Rsub) component:

(3) R = R srf + R sub .

Terrestrial water storage (TWS; W) is the sum of snow water equivalent (SWE; Wsnow), groundwater storage in unconfined aquifers (Wgw), and soil water content in the four model layers (Wsoil,i):

(4) W = W snow + W gw + i = 1 N soil W soil , i ,

where Nsoil=4 is the number of soil layers. Soil water storage (Wsoil,i) is not included in the dataset but can be calculated from the volumetric water content (wsoil,i) as follows:

(5) W soil , i = ρ wat w soil , i Δ z soil , i for i = 1 , , N soil ,

where ρwat=1000kgm-3 is the water density; and Δzsoil,1=0.1 m, Δzsoil,2=0.3 m, Δzsoil,3=0.6 m, and Δzsoil,4=1 m are the thicknesses of the four soil layers.

2.2 The 48 Noah-MP physics configurations

The Noah-MP LSM version 3.6 is used. We perturbed the parameterization of runoff, stomatal conductance, soil moisture stress factor, and near-surface atmospheric turbulence. The processes are selected, as they directly control runoff generation and evapotranspiration. Their importance has been shown in global simulations (Yang et al.2011). The perturbation creates an ensemble of 48 members (48 = 4 runoff × 2 stomatal conductance × 3 soil moisture stress × 2 turbulence). Limited by computational resources, we did not perturb the parameterization of the cryosphere hydrological processes such as snow albedo (Chen et al.2014; He et al.2019) and rain–snow partitioning (Wang et al.2019), which may limit the usage of the dataset in cryosphere hydrology. Appendix A details the formulation of and parameters in the selected parameterization. The parameters use the Noah-MP default values.

2.3 Domain, temporal span, atmospheric forcings, and static parameters

The simulation domain covers the CONUS (25–53N, 125–67W) at a spatial resolution of 0.125.

The hourly NLDAS-2 atmospheric forcings were used to drive the 48 Noah-MP configurations. This study used seven forcing variables: downward solar radiation, downward longwave radiation, air temperature, surface pressure, specific humidity, wind speed, and precipitation rate. The static datasets, including topography (, last access: 6 February 2016), predominant vegetation class (, last access: 6 February 2016), and soil texture type (, last access: 6 February 2016), are also the same as those in NLDAS-2. We used the default Noah-MP lookup tables to convert the input vegetation and soil types to parameter values.

The simulation spans 36 years from January 1980 to December 2015 at a time step of 15 min. The initial states on 1 January 1980 were obtained by cycling the year 1979 one hundred times.

3 Intercomparison and evaluation methods

The evaluations and intercomparisons in this paper are performed for 12 River Forecast Centers (RFCs): Northeast (NE), Mid-Atlantic (MA), Ohio (OH), Lower Mississippi (LM), Southeast (SE), North Central (NC), Northwest (NW), Arkansas (AB), Missouri (MB), West Gulf (WG), California–Nevada (CN), and Colorado (CB). Figure S1 in the Supplement displays the geographical delineation of the RFCs. More details on the RFCs, such as their multi-year average precipitation, potential evaporation, and topography, can be found in Fei et al. (2021, Fig. 1).

The intercomparison and evaluations were conducted at different timescales – the long-term climatological mean, annual cycle, and interannual anomaly. Section 3.1 details how the temporal variations and ensemble spread are derived for the timescales. We utilized the Taylor diagram and Taylor skill score (TSS) to measure the performance of Noah-MP against various reference datasets. The evaluation methods are shown in Sect. 3.2, and the reference datasets are described in Sect. 3.4. In addition to the intercomparisons and evaluations, Sect. 3.3 introduces the Sobol' sensitivity index for the process sensitivity analysis.

3.1 Temporal variability and ensemble spread

We calculated the total temporal variability (σtotal), the variability of the annual cycle (σancy), and the interannual variability (σanom) for each ensemble member and the ensemble arithmetic mean following Dirmeyer et al. (2006):


where xclim, xancy, and xanom are the climatology, annual cycle, and interannual anomaly of the monthly time series xy,m (month m of the year y; m=1,,12; y=1,,Y).

The ensemble spread for the total time series (σˇtotal), annual cycle (σˇancy), and interannual anomaly (σˇanom) are derived also following Dirmeyer et al. (2006):


where σ(x)=n=1N(x-xˇ)2/N denotes the standard deviation across the ensemble members at each time step. xˇ denotes the arithmetic ensemble mean.

We calculated the ratio (r=σˇ/σ) of ensemble spread σˇ to temporal variability σ for the total monthly time series (with a subscript of “total”), annual cycle (“ancy”), and interannual anomaly (“anom”). The ratio enables the intercomparison of the ensemble spread among the selected timescales. A grade is assigned according to the ratio following Dirmeyer et al. (2006): a grade “A” for r<0.316; “B” for 0.316r<1; “C” for 1r<3.16; “D” for 3.16r<10; and “E” for r>10. A lower (higher) r or a higher (lower) grade denotes a lower (higher) ensemble spread.

3.2 Taylor diagram and skill score

The Taylor diagram (Taylor2001) is a graphical representation of how closely a model simulation matches observations in terms of correlation coefficient (R), normalized unbiased root mean square error (nuRMSE), and normalized standard deviation (σ^f). The TSS is an index that measures the distance between a model simulation and the observations in the Taylor diagram. The TSS is defined as follows:


where σf and σo are the standard deviations of the model simulation and the observation, and R0 is the maximum correlation coefficient attainable (in this study, R0=1). The value range of TSS is from 0 to 1. A higher TSS indicates a higher overall performance of model prediction with reference to the observations.

3.3 Sobol's total sensitivity index

The sensitivity of the Noah-MP ensemble to a physical process can be quantified by the Sobol' total sensitivity index (Sobol'1993; Zheng et al.2019). The Sobol' total sensitivity index measures the proportion of the variance of different processes to the total variance, which is defined as follows:

(14) S Λ = E Λ ( Var Λ ( Y | Λ ) ) Var ( Y ) ,

where SΛ is the Sobol' total sensitivity index for one process Λ; ∼Λ represents the other processes except for Λ; Y represents the 48 Noah-MP ensemble members; Var(Y) is the variance of Y; VarΛ(Y|Λ) denotes the variance among different parameterization schemes of the process Λ, and E(∼Λ) denotes the arithmetic average across all combinations of the other processes except for Λ. Detailed calculations and examples can be found in Zheng et al. (2019, Appendix A).

3.4 Reference data

3.4.1 Terrestrial water storage

We used the 1×1 monthly Gravity Recovery and Climate Experiment (GRACE) land water-equivalent thickness surface-mass anomaly, level-3, Release 6.0, version 04, as the reference of TWS (W in Table 1). The GRACE products from different processing centers are slightly different. To reduce the noise of the estimates (Sakumura et al.2014), we used the arithmetic average of the products from three centers: (1) GeoForschungsZentrum Potsdam (or the German Research Center for Geosciences) (, last access: 10 February 2022), (2) the Center for Space Research at the University of Texas, Austin (, last access: 10 February 2022), and (3) NASA's Jet Propulsion Laboratory (, last access: 10 February 2022). The GRACE satellites began orbiting Earth on 17 March 2002. We selected the period from 2003 to 2015 for the evaluation. There are 14 missing values during the period, which were filled with a simple linear interpolation.

The GRACE products experience signal leakages between land and water (Save et al.2016). Such signal leakage could impact the estimation in the RFCs that are adjacent to the Great Lakes (Ma et al.2017) and oceans. To alleviate the impacts of the Great Lakes, we corrected the GRACE TWS estimates in the NC, OH, and NE RFCs using the water level variations observed by the National Oceanic and Atmospheric Administration (NOAA). Appendix B details the algorithm.

3.4.2 Soil moisture

We used the daily North American Soil Moisture Database (NASMD;, last access: 6 February 2016) (Quiring et al.2016) as the reference for the simulated soil moisture (Wsoil,i in Table 1), similar to previous NLDAS evaluations (Xia et al.2015b, a). NASMD assembles the soil moisture time series at multiple depths of more than 2200 stations of 24 networks with quality control. The observation depth varies with the network. We interpolated the observations to the centers of the Noah-MP soil layers, which are 0.05, 0.25, 0.7, and 1.5 m, respectively. The interpolation is performed only when a valid observation is exactly at, or two valid observations exist above and below, the given depth; otherwise, a missing value is given. We excluded the soil layers with more than 50 % missing values to minimize the impacts of missing values on the evaluation, after which 264 wsoil,1, 214 wsoil,2, 95 wsoil,3, and 23 wsoil,4 valid time series remained. Daily data from 1996 to 2013 were then aggregated into monthly values. For any month, if less than 10 d of valid data are available, a missing value is assigned.

3.4.3 Snow water equivalent

We used the daily Snow Data Assimilation System (SNODAS;, last access: 13 April 2021) as the reference of SWE (Wsnow in Table 1). SNODAS is a data assimilation system developed by the NOAA National Weather Service's National Operational Hydrologic Remote Sensing Center. This system aims to provide a physically consistent framework to combine snow modeling and observations from satellites, airborne platforms, and ground stations (National Operational Hydrologic Remote Sensing Center, 2004). The original spatial resolution is 1 km × 1 km, and we aggregated the data to the 0.125 NLDAS grids. SNODAS began on 28 September 2003, and we selected the period from September 2004 to August 2015 that spans 11 whole snow seasons. Clow et al. (2012) showed that the SNODAS SWE performs well in the forest areas of the Colorado Rocky Mountains but performs poorly in the alpine areas.

3.4.4 Evapotranspiration

We used plot-scale AmeriFlux observations and two gridded products as the reference ET (E in Table 1). The two gridded products are derived from different methods, and a common evaluation period from 1982 to 2015 is selected for this study. The gridded products have different spatial resolutions. We downscaled the data to the NLDAS grids and then aggregated them to the 12 RFCs.

We selected 25 AmeriFlux sites (, last access: 18 September 2021). The 25 selected sites were selected because they have the longest observation periods for seven major land cover types (i.e., evergreen forest, deciduous forest, mixed forest, shrubland, savanna, grassland, and cropland). Figure S1 and Table S1 detail the selected sites. The data have been widely used in LSM evaluations (Cai et al.2014a; Zhang et al.2020). AmeriFlux provides hourly or half-hourly latent heat measurements. We converted the measurement to ET by dividing the latent heat of water vaporization (2.5104×106 J kg−1). The hourly or half-hourly ET is then aggregated into monthly values. In the process of aggregation, if there were less than 8 valid hours in a day, a missing day was marked; if there were fewer than 10 valid days in a month, a missing month was assigned; if there was less than 50 % valid months, the time series was dropped. In this study, the data serve as the ground truth for the gridded ET products and model estimation.

Table 2The climatological mean, ensemble spread, and temporal variability of different dataset variables. xˇ¯ denotes the climatological average of the ensemble mean. σˇancy, σˇanom, and σˇtotal denote the spread of the 48 Noah-MP configurations in simulating the multi-year averaged annual cycle, interannual anomaly, and total 36-year monthly time series, respectively. σancy, σanom, and σtotal denote the temporal variability of the annual cycle, interannual anomaly, and the total time series, respectively. The rating of the ratio σˇ/σ is defined in Sect. 3.1. The units of the variables are presented in Table 1. W̃ (W̃gw) denotes the terrestrial water storage (groundwater) anomaly (kg m−2), whereas W (Wgw) denotes the monthly terrestrial water storage (groundwater) change (kgm-2s-1).

Download Print Version | Download XLSX

The first gridded ET product is FLUXNET multi-tree ensemble (MTE) (Jung et al.2009) (, last access: 3 May 2017). FLUXNET MTE ET is a monthly dataset produced from the FLUXNET eddy covariance measurements, remote sensing, and meteorological data using the multi-tree ensemble statistical method (Jung et al.2009). The product is widely used in LSM evaluations (Cai et al.2014a; Ma et al.2017; Xia et al.2016; Jung et al.2019; Fang et al.2020; Zhang et al.2020; Pan et al.2020). FLUXNET MTE ET has a spatial resolution of 0.5× 0.5. We remap the data to the 0.125× 0.125 NLDAS grids with a first-order conservative method. We use the FLUXNET ET as the reference for the annual cycle, since it replicates the AmeriFlux observations best among the two gridded products (Tables S2–S4 in the Supplement).

The second gridded ET product is the Global Land Evaporation Amsterdam Model (GLEAM), version 3.3a (, last access: 7 May 2019), which is another widely used ET product (Xu et al.2019). GLEAM estimates transpiration, canopy evaporation, soil evaporation, open-water evaporation, and sublimation separately and then sums them as ET. The method aims to maximize the utilization of satellite information. The product estimates monthly ET at a spatial resolution of 0.25× 0.25. We bilinearly interpolated the data to the NLDAS grids. The GLEAM ET is used as the reference for the interannual anomaly, as it better replicates the AmeriFlux-observed anomaly than FLUXNET (Tables S2–S4).

3.5 NLDAS ensemble

We used three NLDAS-2 models – namely Noah-2.8, VIC-4.0.3, and Mosaic – as the benchmark of the Noah-MP ensemble. Their outputs can be publicly downloaded from the NASA Goddard Earth Sciences Data and Information Services Center (, last access: 6 February 2016). More information on the NLDAS-2 models, the forcing and static datasets, and simulation settings can be found in Xia et al. (2012b, a). The NLDAS-2 datasets have been proven to perform soundly for regional hydrological simulations (Xia et al.2012b, a, 2016, 2015b, a) and are widely used as a benchmark for LSM evaluations (Cai et al.2014b; Fei et al.2021).

4 Results and discussions

In this section, we begin by intercomparing the ensemble spread of the dataset variables (Sect. 4.1). Then, Sect. 4.24.5 examines the performance and ensemble spread of the TWS anomaly (TWSA), SWE, soil moisture, and ET, in comparison with the NLDAS ensemble. The performance of runoff can be found in our previous papers Fei et al. (2021) and Zheng et al. (2020).

4.1 Comparison of the ensemble spread

We aggregated the dataset variables across CONUS. Table 2 summarizes the ensemble spread and temporal variability of the ensemble mean for all the dataset variables. Among the variables, runoff (including surface and subsurface components) shows the largest spread, accounting for one-fifth to one-third of the climatological mean. The ensemble spread is comparable to or larger than the temporal variability. The magnitude is similar to that estimated in GSWP-2. The spread of ET and TWS is significantly smaller than those observed in GSWP-2. The high consistency among the Noah-MP configurations could be a sign of the limited sampling of available process parameterizations but could also be a result of continuous model improvements. ET might be the former case, since the ensemble does not perturb several important processes such as roughness sublayer (Abolafia-Rosenzweig et al.2021) and plant hydraulics (Li et al.2021). TWS is likely the latter case, since the parameters of different groundwater schemes are calibrated using GRACE (Niu et al.2007). The spread of snow water equivalent depicts the smallest spread, which is 2.5 % of the climatological mean. The small spread reflects a limited sampling of the cryosphere processes as discussed in Sect. 2.2. For soil moisture, the ensemble spread is marginal at the surface and increases significantly in the deep layers. The difference hints that the controlling processes vary with depth. The surface soil moisture is tightly controlled by the atmospheric forcings, whereas the spread of the subsurface soil moisture hints at the complex interplay among various land surface processes (e.g., root water uptake and subsurface runoff) (Koster2015).

Table 2 also shows the comparison between the annual cycle and interannual anomaly for different variables. For runoff and soil moisture, the ensemble spread in the annual cycle is larger (i.e., σˇancyσanom). Whereas for transpiration (Etran), groundwater storage (W^gw and Wgw), and snow water equivalent (Wsnow), the spread is larger for the interannual anomaly (i.e., σˇancyσanom). The dynamics at different times are modulated by different processes (Dickinson et al.2003). Since the timescale of the interannual anomaly is smaller than that of the annual cycle, the land surface memory and the associated land model parameterizations may contribute a larger part to the annual cycle, whereas the interannual anomaly is more tightly controlled by atmospheric forcing. The smaller ensemble spread in the annual cycle than in the interannual anomaly is a sign of insufficient representation of the physics uncertainty.

4.2 Terrestrial water storage anomaly

Figure 1 shows the annual cycle of the TWSA estimated from GRACE, Noah-MP, and NLDAS. Figure 2 presents the TSS. In the 12 RFCs over CONUS, the TWSA peaks in spring, declines rapidly in summer, reaches a minimum in autumn, and recovers in winter. In terms of the timing of the peak and trough, Noah-MP and the NLDAS models perform similarly. In terms of the amplitude of variation, Noah-MP generally produces higher values in all RFCs. Previous studies have attributed this difference to the inclusion of a bucket groundwater component in Noah-MP (Cai et al.2014b; Ma et al.2017). However, we found the Noah-MP configurations without a groundwater component can still produce a higher amplitude, especially considering the structural similarity between Noah-MP and Noah. Further investigation of the model difference is necessary.

Figure 1Annual cycle of TWSA from the model estimates and GRACE in the 12 RFCs for the period 2003–2015. The TWSA is calculated from the monthly TWS by subtracting the 13-year average (2003 to 2015). Black dots denote the GRACE observations. Error bars show the standard deviation of the year-to-year differences. The shaded areas denote the range between the maxima and minima of the 48 Noah-MP estimates. The solid red line denotes the Noah-MP multi-physics ensemble mean. The three NLDAS models (Noah, Mosaic, VIC) and their ensemble means are denoted by the blue, green, cyan, and dark golden lines, respectively. The 12 RFCs are sorted based on climatic aridity, i.e., the most humid RFC in the top left and the driest RFC in the bottom right.


Figure 2 shows the Taylor diagram for the annual cycle of TWSA. The Noah-MP configurations generally outperform the NLDAS models in most of the RFCs, which results in the superior performance of the ensemble mean (shown in Table S5). Detailed examination of the TSS reveals that Noah-MP and NLDAS have similar correlation coefficients. Their difference is manifested in the modeled standard deviation (i.e., the amplitude of variation). In NE, MA, NW, and CN, Noah-MP underperforms compared with NLDAS, mainly due to overestimating the standard deviation. However, the interpretation of the overestimation is multifaceted. First, Noah-MP could overestimate the variability due to unsuitable parameters. For instance, specific yield is an important parameter for the simulation of groundwater storage and water level (Lv et al.2021). The parameter is calibrated as 0.2 by global simulations (Niu et al.2007). The globally calibrated value may be overestimated in the RFCs, leading to an overestimation in TWSA. Second, the GRACE data could underestimate the temporal variability at these coastal RFCs due to signal leakage from the ocean (Cai et al.2014b). In AB and MB, Noah-MP performs slightly better than the NLDAS models, but both underestimate the standard deviation. The Ogallala Aquifer encompasses the two RFCs. Noah-MP can better present the groundwater changes than the NLDAS models due to an unconfined aquifer module. But Noah-MP does not include the confined aquifers, leading to an underestimation of the groundwater storage variability.

Figure 2Taylor diagram of TWSA's annual cycle for the 48 Noah-MP configurations (red dots) and three NLDAS models (blue dots for Noah, green stars for VIC, and cyan square for Mosaic) at the 12 RFCs. Black stars denote the observations. The ensemble mean of Noah-MP and NLDAS is presented by purple and dark golden diamonds, respectively. The distance between the model and observation presents the nuRMSE. The radial lines show the correlation coefficient, while the distance to the origin along the line denotes the normalized variability.


Figure 3As in Fig. 2, but for the interannual anomaly of TWSA.


Figure 4Monthly surface (0–0.1 m) and root-zone (0–1 m) soil moisture (left column) and the annual cycle (right column) from the Noah-MP ensemble, the NLDAS models, and the NASMD observations for the period 1996–2013. Only the RFCs with more than 10 observational sites are considered. Black dots denote the arithmetic average of the valid NASMD observations. Error bars in the right columns denote the standard deviation of the year-to-year differences. The shaded areas denote the range between the maxima and minima of the 48 Noah-MP estimates. The solid red line denotes the Noah-MP multi-physics ensemble mean. The three NLDAS models (Noah, Mosaic, VIC) and their ensemble mean are denoted by the blue, green, cyan, and dark golden lines, respectively. The five RFCs are sorted based on climatic aridity.


Figure 5The first column shows the TSS of the NLDAS ensemble mean in simulating the annual cycle of soil moisture at different depths (0–0.1, 0.1–0.4, 0.4–1, 1–2 m). The second column presents the difference in TSS between the Noah-MP ensemble mean and the NLDAS ensemble mean. The third column depicts the TSS difference between the arithmetic average of the Noah-MP ensemble mean and the NLDAS models to the NLDAS three-model mean. P is the percentage of sites at which a higher TSS appears relative to the NLDAS ensemble mean. P1 (P2) is the percentage of sites at which a higher TSS appear given that the Noah-MP ensemble mean outperforms (underperforms) the NLDAS ensemble mean. The evaluation period is 1996–2013.

Figure 6As in Fig. 5, but for interannual anomaly.

Figure 3 shows the Taylor diagram for the interannual anomaly of TWSA. Compared with the annual cycle (Fig. 2), both the Noah-MP configurations and the NLDAS models degrade significantly. Noah-MP still performs better than NLDAS in most RFCs. However, the superiority is marginal. In three RFCs – namely NE, MA, and OH – Noah-MP notably underperforms NLDAS. The underperformance is mainly due to higher variability than GRACE. Similar to the annual cycle, possible reasons include: (1) Noah-MP overestimated the variability due to unsuitable parameter values, and (2) GRACE underestimated the variability due to the signal leaked from oceans.

4.3 Soil moisture

Figure 4 presents the time series of the surface (0–0.1 m) and root-zone (0–1.0 m) soil moisture in NC, NW, AB, WG, and CB. These RFCs were selected as they have more than 10 valid sites. Table S6 presents the corresponding TSSs. The Noah-MP configurations are consistent in estimating the surface soil moisture, having a spread remarkably smaller than that among the three NLDAS models. The NLDAS models have more diverse representations of soil moisture. Noah is the same as Noah-MP. Both solve Richards' equation to present the soil moisture dynamics in four layers (Niu et al.2011). Mosaic also solves Richards' equation but at three soil layers. The top layer is further divided into tiles to better represent spatial heterogeneity (Koster and Suarez1992). VIC is different from them, utilizing a conceptual soil water tank (Liang et al.1994). It could be that Noah-MP underestimated the ensemble spread, especially when considering its inability to represent the subgrid heterogeneity. The NLDAS models could also overestimate the spread when considering the conceptual representation of VIC. The spread among the Noah-MP configurations increases significantly from the surface (Fig. 4e) to the root zone (Fig. 4k). The ensemble spread in the root-zone soil moisture reflects the difference in modeling root-water uptake for plant transpiration and soil-bottom drainage as described in Appendix A. Further investigation (Figs. S2–S5) shows that, in the deep layers (the third and fourth layers), Noah-MP has a comparable or greater spread than NLDAS.

Comparison between Noah-MP and Noah shows that they perform similarly in AB (Fig. 4e and f) and WG (Fig. 4g and h) but are different in NC (Fig. 4a and b), NW (Fig. 4c and d), and CB (Fig. 4i and j). The similarity in AB and WG is reasonable, since the two models have similar soil layer structures and parameterizations. The dissimilarity in NC, NW, and CB is most pronounced in winter. It could result from the different snow parameterizations in Noah-MP and Noah, which is investigated in Sect. 4.4.

In the RFCs and soil layers examined in Fig. 4, the Noah-MP ensemble mean performs similarly or better than the NLDAS ensemble mean except in AB and CB. The superiority of the Noah-MP in simulating soil moisture is also reported in previous evaluations (Cai et al.2014b). In AB and CB, individual NLDAS models do not show a consistent superiority over Noah-MP. In AB, the best NLDAS model is VIC in winter and Noah in summer. The performance of VIC in winter corresponds to the best-performing snow estimation (Fig. 8h). In CB, the best NLDAS model is Noah in winter and Mosaic in summer. Mosaic carefully considers the subgrid variability of soil moisture, which could lead to better skill in RFCs with complex topography such as CB and NW. The NLDAS ensemble mean takes the advantage of wintertime soil moisture from VIC in AB and summertime soil moisture from Mosaic in CB. Both the advantage of VIC and Mosaic come from the representation of subgrid heterogeneity.

Figures 5 and 6 compare the TSS between the NLDAS and Noah-MP ensemble mean at each NAMSD site for the annual cycle and interannual anomaly, respectively. The comparison varies significantly with site and soil layer depth, revealing two major patterns. First, similar to Fig. 4, NLDAS tends to outperform Noah-MP in AB and CB. The high skill of the NLDAS ensemble mean is likely a result of a high ensemble skill gain (Fei et al.2021) related to the diversity among the NLDAS models. Noah-MP has both a low ensemble spread and an inadequate representation of subgrid heterogeneity in the two RFCs. Second, Noah-MP outperforms NLDAS in other RFCs. In NC, OH, and MA, the low performance of NLDAS is related to the anomaly in wintertime soil moisture (Fig. S2). The anomaly suggests that the NLDAS models generally have difficulty in modeling snow and snow–soil moisture interactions (refer to Sect. 4.4 for more information). On the other hand, Noah-MP has a better snow module, leading to a higher soil moisture estimation skill.

To maximize the utilization of the NLDAS model diversity and Noah-MP physics improvements, we combine the Noah-MP ensemble mean and the three NLDAS models. The right-hand columns of Figs. 5 and 6 show that the four estimates' arithmetic average outperforms the three-model NLDAS ensemble mean at most NASMD sites. The outperformance suggests an added value of the Noah-MP data. If the Noah-MP ensemble mean already outperforms the NLDAS ensemble mean, the added value appears in almost every site. If the Noah-MP ensemble mean underperforms the NLDAS ensemble mean, the added value can still show up at approximately one-third (one-fourth) of the sites for the annual cycle (interannual anomaly).

4.4 Snow water equivalent

Figure 7a presents the spatial patterns of the multi-year averaged SWE (Wsnow) from SNODAS. Snow is mainly distributed in the northeast (NE, NC, OH, and MA) and in the mountains of the west (the Cascade Mountains, Rocky Mountains, and Sierra Nevada in NW, AB, MB, WG, CN, and CB). Figure 7b (c) shows the geographical difference between the Noah-MP (NLDAS) ensemble mean and SNODAS. Both Noah-MP and NLDAS exhibit a considerable underestimation in most areas of CONUS. However, the underestimation of Noah-MP is generally smaller. Figure 7d confirms that both Noah-MP and the NLDAS models tend to underestimate SWE in most areas but exhibit an overestimation when snow is extremely thick (SWE is greater than 400 mm). Noah-MP performs better than the NLDAS models in most cases. The superiority of Noah-MP is likely attributable to the three-layer snowpack module, which can represent snow dynamics better in a wide range of snow depth than the single-layer Noah and Mosaic snow module and the quasi-two-layer VIC snow module. A careful consideration of the surface energy balance can also contribute to Noah-MP's superiority over VIC and Mosaic. Consequently, Noah-MP captures the spatial patterns better than NLDAS, with a spatial correlation of 0.87 versus 0.43. Further examination reveals that the superiority of Noah-MP appears in all elevation bands and is the most significant between 1000 to 2000 m with a spatial correlation of 0.85 versus 0.38 (0.77 versus 0.76 below 1000 m, and 0.89 versus 0.75 above 2000 m).

Figure 7(a) Spatial distribution of the 11-year averaged (September 2004 to August 2015) SWE from SNODAS. (b) Difference between the NLDAS ensemble mean and SNODAS. (c) Difference between the Noah-MP ensemble mean and SNODAS. The spatial correlation coefficients between the NLDAS–Noah-MP ensemble mean and SNODAS are also presented.

Figure 8As in Fig. 1, but for SWE between September 2004 and August 2015.


Figure 9TSS of the ensemble means in simulating the annual cycle (left column) and interannual anomaly (right column) of SWE. (a, b) TSS of the NLDAS ensemble mean. (c, d) The difference in TSS between the Noah-MP ensemble mean and the NLDAS ensemble mean. (e, f) The difference between the arithmetic average of the Noah-MP ensemble mean and three NLDAS models to the NLDAS ensemble mean. The evaluation period is 2004–2015.

Figure 8 compares the annual cycle estimated from SNODAS, NLDAS, and Noah-MP. The annual cycle in the 12 RFCs exhibits a similar pattern: it accumulates in winter, peaks in spring, and melts from late spring to summer. The snow season in the northeastern RFCs (i.e., NE, MA, OH, and NC) spans from October to May, whereas the snow season is longer in the mountainous RFCs of the west (i.e., NW, AB, MB, WG, CN, and CB), lasting to June. From the comparison between Noah-MP and NLDAS, we make three observations. First, consistent with Fig. 7, the NLDAS models underestimate the SWE in all RFCs. Among the three NLDAS models, Noah performs the best in the northeast (e.g., NE, MA, and OH), whereas VIC shows some advantages in the western mountainous RFCs (e.g., NW, CN, and CB). Possible reasons for the Noah superiority in the northeast include the careful consideration of the surface energy balance, whereas, in the western mountains, the elevation bands of VIC can better capture the spatial heterogeneity. Second, Noah-MP outperforms the NLDAS models in almost all RFCs, except for showing a marginal degradation in AB and MB. In these two RFCs, VIC outperforms not only the NLDAS models but also Noah-MP. The terrain is hilly, and the elevation-banded parameterization enables a better representation of subgrid snow variability. Another reason is related to the shallow snow in these two RFCs. Noah-MP has known been experiencing negative biases with shallow snow (i.e., AB, MB, and WG). The bias is attributable to the shallow snow albedo as discussed in Dang et al. (2019) and Wang et al. (2020). Third, the ensemble spread among the Noah-MP configurations is small. The estimates of SWE and its uncertainty should be improved in the future by considering processes such as rain–snow partitioning Wang et al. (2019), snow albedo (Wang et al.2020; Dang et al.2019), and roughness length (He et al.2019).

Figure 9 shows the TSS of the NLDAS and Noah-MP ensembles in estimating the annual cycle and interannual anomaly of SWE. Table S7 summarizes the skill scores for the 12 RFCs. The annual cycle and interannual anomaly exhibit similar spatial patterns. NLDAS performs well in most parts of the CONUS, with TSSs higher than 0.75. In MB, TSS reaches a minimum of 0.60. In comparison with NLDAS, Noah-MP performs notably better in the east and west but marginally worse in parts of the central CONUS. The location of Noah-MP's underperformance coincides well with shallow snow and relatively flat terrain. The coincidence hints at the weakness of Noah-MP in simulating shallow snow and the advantage of VIC in representing subgrid snow variability as discussed previously.

We further averaged the 48 Noah-MP configurations and added their average to the three-model NLDAS ensemble. Figure 9e and f show that the four-estimate ensemble mean outperforms the three-model NLDAS ensemble mean in nearly all areas of CONUS, again proving the added value of the data provided in this paper.

Figure 10As in Fig. 1, but for ET in the period of 1982–2011.


Figure 11Differences between the Noah-MP ensemble mean and GLEAM in total ET (E), canopy evaporation (Ecan), ground evaporation (Egnd), and transpiration (Etran). The units are millimeters per day (mm d−1).

Figure 12Probability distribution of ET's TSS for the annual cycle (a, b) and interannual anomaly (c, d). The left column compares the Noah-MP (orange) and NLDAS (cyan) ensemble means. The right column reveals the Noah-MP (magenta) and NLDAS (dark blue) ensembles. The upper, middle, and lower quantile lines of the magenta boxes show the 75th, 50th, and 25th percentile values of the Noah-MP ensemble. The upper, middle, and lower lines of the dark blue boxes show the three NLDAS models. The blue, green, and cyan dots denote Noah, VIC, and Mosaic, respectively. The evaluation period can be found in Table S1.


4.5 Evapotranspiration

Figure 10 compares the annual cycle estimated from FLUXNET MTE, NLDAS, and Noah-MP in the 12 RFCs. We choose FLUXNET MTE as the reference here, since its performance is superior when compared to AmeriFlux (Tables S2–S4). In all the 12 RFCs, ET peaks during summer and is lowest during winter. Noah-MP successfully captures the timing of the peak in humid RFCs (i.e., NE, MA, OH, LM, SE, NC, and NW) but shows a 1-month lead in a few semi-arid and arid RFCs (i.e., MB, WG, and CN). The average of the three NLDAS models better reproduces the timing of the peak, but the models differ from each other significantly. Among the Noah-MP and NLDAS ensembles, VIC and Mosaic are notably different. VIC exhibits a systematic underestimation, while Mosaic shows an overall overestimation. The 48 Noah-MP configurations and Noah perform closely during autumn and winter, whereas their differences are pronounced during spring and summer. During spring and summer, Noah is the closest to FLUXNET MTE in most RFCs except NE and MA, whereas Noah-MP constantly overestimates the ET in all RFCs.

The overestimation of Noah-MP was investigated by separately comparing the three components of ET (i.e., transpiration, canopy evaporation, and ground evaporation) with GLEAM (Fig. S6). Figure 11 shows the overestimation of total ET is closely linked to the overestimation of ground evaporation, which could be partially attributable to the overly high roughness length for heat and water, as described in Appendix A8 and A9. Besides, the lack of a litter layer (Decker et al.2017) in Noah-MP could also play a part.

Figure 13Normalized ensemble spread of the multi-year averaged annual (first row) and seasonal (second–fifth rows) ET from NLDAS (first column) and Noah-MP (second column). The ensemble spread is normalized by the temporal variability of the FLUXNET MTE ET calculated using Eq. (6).

Figure 14The Sobol' total sensitivity of the multi-year averaged and seasonal (spring – MAM, summer – JJA, autumn – SON, winter – DJF) ET to the four parameterizations: stomatal conductance, soil moisture limitation to transpiration (β factor), turbulence, and runoff. Higher values indicate higher sensitivities.

Figure 12 evaluates Noah-MP and NLDAS using the 25 AmeriFlux sites. The NLDAS ensemble mean outperforms the Noah-MP ensemble mean for the annual cycle, and this outperformance results from two causes. First, an NLDAS member, Noah, performs the closest to the observations, as shown in Figs. 12b and 10. Second, the three NLDAS models are remarkably different from each other. The diversity of the ensemble gives a higher skill gain by combining them, as shown by the difference between the ensemble mean skill (Fig. 12a) and the median TSS (Fig. 12b). On the other hand, the Noah-MP configurations are too similar to each other, and all have a positive bias (Fig. 10). However, for the interannual anomaly, the Noah-MP ensemble mean slightly outperforms the NLDAS ensemble mean (Fig. 12c). Figure 12d shows that the Noah-MP configurations marginally outperform the NLDAS models. Among the NLDAS models, VIC performs the best, and Noah does not exhibit the same superiority shown in the annual cycle. The difference between the NLDAS ensemble mean performance (Fig. 12c) and median performance (Fig. 12d) is marginal, suggesting that the NLDAS ensemble skill gains are not notable for the interannual anomaly.

Figure 13 examines the ensemble spread of Noah-MP and NLDAS. The ensemble spread is normalized by the temporal variability calculated using the FLUXNET MTE ET. NLDAS has a significant spread in the southeast and west in all seasons, while spring shows the largest value. As seen in Fig. 10, the NLDAS ensemble spread mainly reflects the differences between VIC and Mosaic. The Noah-MP ensemble has a notably smaller spread than NLDAS. The Noah-MP ensemble spread is manifested in spring and summer in the southeastern (SE, LM, and WG) and western (CN, CB, and NW) RFCs.

We can decompose the Noah-MP ensemble spread and pinpoint the dominant process using Sobol' sensitivity analysis (Zheng et al.2019). Figure 14 delineates the Sobol' total sensitivity index of total ET to the four processes described in Appendix A. In spring and summer, for the regions where the Noah-MP configurations show significant spread (SE, LM, WG, CB, CN, and NW) (Fig. 13), ET is most sensitive to the parameterization of stomatal conductance (Fig. 14e and i) and then to the β factor (Fig. 14f and j). However, for regions with positive biases (NC, OH, and LM, as shown in Fig. 11b and c), the Noah-MP estimation is more sensitive to the turbulence parameterizations (Fig. 14c and g). During autumn and winter, the parameterizations of stomatal conductance (Fig. 14m and q) and β factor (Fig. 14n and r) still have significant impacts on the estimation of ET, and these impacts could be a result of the “memory” of TWS (Zheng et al.2019). Besides these two processes, the runoff parameterization is dominant during autumn in the east (Fig. 14p), and the turbulence parameterization is dominant during winter (Fig. 14s).

5 Code and data availability

The dataset is freely available for download from the Zenodo online repository at (Zheng et al., 2022). The dataset (along with datasets on which it is based) is subject to a Creative Commons BY (attribution) license agreement (, last access: 16 August 2021).

6 Conclusions

This paper describes a 1/8 dataset of the TWB over the CONUS from 1980 to 2015 simulated from an ensemble of 48 perturbed-physics configurations of Noah-MP. This Noah-MP multi-physics ensemble features an enrichment of the NLDAS-2 four-model ensemble and brings convenience for multi-model comparison. The dataset has already been used in the monitoring of groundwater storage change (Rateb et al.2020), the analysis of LSM parameterization sensitivity (Zheng et al.2019), the development of model evaluation method (Zheng et al.2020), and hydrological ensemble simulations (Fei et al.2021). This paper details the Noah-MP parameterizations employed and evaluates the estimated TWSA, soil moisture, SWE, and ET in comparison with the NLDAS ensemble.

The spread of the ensemble estimation is the largest for the runoff. The spread in surface runoff accounts for 34 % of its climatological mean. The spread is comparable to the previous estimates for multi-model ensembles (Dirmeyer et al.2006). The ensemble spread in snow water equivalent is the smallest, 2.5 % of its climatological mean. The ensemble has not included different parameterizations of several snow processes such as rain–snow partitioning, snow albedo, and roughness length, which could lead to an underestimation of the ensemble spread. The underestimation of the ensemble spread becomes more apparent when the Noah-MP ensemble mean is biased (Fig. 8). The bias is more pronounced relative to the NLDAS models in parts of AB and MB where snow is shallow and the terrain is hilly (Fig. 9). VIC performs better there, suggesting the importance of considering subgrid variability.

Evaluation against various reference data shows that Noah-MP generally performs better than the NLDAS models. The augmented three-layer snow module of Noah-MP significantly improves the estimation of snow and also wintertime soil moisture. On the other hand, the NLDAS models outperform Noah-MP in AB and CB for surface soil moisture (Figs. 4 to 6), in AB and MB for snow (Figs. 8 and 9), and in OH, LM, and NC for the annual cycle of ET (Fig. 10). The outperformance of NLDAS is likely attributable to the consideration of subgrid variability in soil moisture with Mosaic and in snow with VIC. The Noah-MP ensemble could be improved by increasing the spatial resolution or developing parameterizations of subgrid heterogeneity. Noah-MP also underestimates the temporal variability of TWS in coastal RFCs (Figs. 2 and 3). Correction of the ocean signal leakage in the GRACE data and representation of spatially varying unconfined aquifers' parameters should be beneficial. For the annual cycle of ET, there is a systematic overestimation in spring and summer. Sobol' sensitivity analysis of the Noah-MP ensemble reveals that the bias is mainly related to the parameterization of turbulence. We have examined the code and found that the implementation is inconsistent with the literature. The parameterization of the roughness length of heat and water vapor likely contributes to the ET overestimation.

The Noah-MP ensemble shares the same atmospheric forcing and static parameters with the NLDAS models. The similarity enables the comparison between the multi-model and perturbed-physics ensemble methods as shown in Fei et al. (2021). Besides, Noah-MP complements the NLDAS models well. Adding Noah-MP to the NLDAS ensemble can consistently improve the TWB variables in most areas of CONUS.

The Noah-MP model has been undergoing rapid development. New components such as plant hydraulics (Li et al.2021), roughness sublayer (Abolafia-Rosenzweig et al.2021), crops (Liu et al.2016), and dynamic rooting depth (Liu et al.2020) have been added. New schemes for the processes such as rain–snow partitioning (Wang et al.2019) have been included. Parameterizations of surface roughness length (He et al.2019; Zhang et al.2021), snow albedo (Wang et al.2020), and vertical soil layers (Zhao et al.2022; Shellito et al.2020) have been refined. The dataset can be improved by using the updated model and including more perturbations.

Appendix A: Formulation of the used Noah-MP parameterization schemes

A1 SIMGM runoff parameterization scheme

SIMGM is a TOPMODEL-based runoff model (Niu et al.2007). The scheme parameterizes runoff (Rsrf and Rsub) as an exponential function of groundwater table depth (zwt, m, positive down) as follows.


where Qsoil,srf is the water incident on the soil surface (the sum of precipitation throughfall, snowmelt, and dewfall; kgm-2s-1); ffrz,i is the fractional frozen area of the ith soil layer (m2 m−2), which is parameterized using the frozen water content of the soil layer following Niu and Yang (2006); fsat is the saturation fraction of the grid cell (m2 m−2); and zbot is the depth of the soil column bottom (2 m in this study), and zwt is the groundwater table depth (m), which is converted from the groundwater storage by a specific-yield parameter. The groundwater storage is predicted using a dynamic groundwater model interacting with the soil column bottom (Niu et al.2007).

The scheme has four calibratable parameters: (1) fsat,max, the maximum saturation area fraction (m2 m−2), which is defined as the cumulative distribution function of the topographic index when the grid-cell-mean water table depth is zero; (2) f, a runoff decay factor (unitless); (3) Rsub,max, the maximum subsurface runoff when the grid-cell-mean water table depth is zero (kgm-2s-1); and (4) Λ, the grid-cell-mean topographic index (unitless). In this study, the parameters have the following values: fsat,max=0.38m2 m−2, f=6, Rsub,max=5kgm-2s-1, and Λ=10.5.

A2 SIMTOP runoff parameterization scheme

SIMTOP is also a TOPMODEL-based runoff parameterization scheme, the same as SIMGM (Eqs. A1A3). The major difference between SIMTOP and SIMGM is that SIMTOP parameterizes the groundwater table depth (zwt) using the soil liquid water content by assuming the water head is at equilibrium throughout the soil column down to the water table (Niu et al.2005). Although SIMTOP and SIMGM share the same conceptual model of runoff generation, implementation differences exist. First, in contrast to Eqs. (A2) and (A3), SIMTOP does not use the soil column bottom depth (zbot) in calculating the saturation area fraction (fsat) and subsurface runoff:


Second, parameter values are slightly different for the runoff decay factor and maximum subsurface runoff: f=2, and Rsub,max=4kgm-2s-1.

A3 NOAHR runoff parameterization scheme

The Noah runoff (NOAHR) scheme parameterizes surface runoff (Rsrf) as infiltration excess:

(A6) R srf = Q soil , srf - Q soil , in ,

where Qsoil,in is the infiltration into the soil (kgm-2s-1). The infiltration is derived from the approximate solution to the Richards equation following Philip (1969) with additional considerations of the spatial variability of precipitation and infiltration capacity. By assuming exponential and independent distributions of precipitation and infiltration capacity within a model grid cell, NOAHR formulates the soil infiltration as follows:


where Ic is the soil infiltration capacity of the model grid cell (kg m−2), wd is the water deficit of the soil column (kg m−2), and Δt is the model time step (s). Following Chen and Dudhia (2001a), the parameter is assumed as propositional to the saturated hydraulic conductivity of the first soil layer (Ksat,1; kgm-2s-1):

(A10) K Δ t = K Δ t , ref k ref K sat , 1 ,

where KΔt,ref and kref are two parameters. In Noah-MP (and Noah), KΔt,ref=386400s-1, and kref=2×10-3 kg m−2 s−1. Ksat,1 is assigned using a soil parameter lookup table according to the soil texture type.

NOAHR assumes free drainage at the soil column bottom. The subsurface runoff is calculated as

(A11) R sub = α slope K soil , N soil ,

where αslope is the terrain slope index, which is arbitrarily given as 0.1 in the adopted version of Noah-MP. Ksoil,Nsoil is the hydraulic conductivity of the bottom soil layer, which is parameterized following Clapp and Hornberger (1978).

A4 BATS runoff parameterization scheme

The BATS scheme parameterizes surface runoff (Rsrf) as a function of soil wetness (Yang and Dickinson1996):


where θ is the averaged wetness throughout the soil column (m3 m−3).

Similar to NOAHR, the BATS scheme also assumes a free drainage boundary condition at the soil column bottom. Subsurface runoff (Rsub) is parameterized as follows:

(A15) R sub = 1 - max i = 1 , , N soil ( f frz , i ) K soil , N soil .

A5 Ball–Berry scheme of stomatal resistance

Leaf stomata are the small pores typically found on the underside of leaves. They control the gas exchange of CO2, H2O, and O2 between the internal leaf structure and the external atmosphere. In LSMs, the opening and closing of the stomata are characterized by stomatal conductance.

The Ball–Berry scheme for parameterizing the stomatal conductance (gs) for H2O is as follows:

(A16) g s = m A c s e s e i P atm + b ,

where gs is the leaf stomatal conductance (µmolm-2s-1), m is a vegetation-type dependent parameter (unitless), A is the leaf photosynthesis rate, cs is the CO2 partial pressure at the leaf surface (Pa), es is the water vapor pressure at the leaf surface (Pa), ei is the saturated water vapor at the stomata (Pa), Patm is the ambient air pressure (Pa), and b is the stomatal conductance at zero photosynthesis (µmolm-2s-1). The parameters m and b are assigned from a lookup table using the vegetation type.

A6 Jarvis scheme of stomatal resistance

The Jarvis scheme for parameterizing the canopy resistance (Rc) based on the product of four stress factors (s m−1) is calculated as follows (Chen et al.1996; Sellers et al.1996; Jacquemin and Noilhan1990; Jarvis1976):


where f1, f2 and f3 are the stress factors of solar radiation, vapor pressure deficit, and air temperature, respectively (unitless), which are unitless and range from 0 to 1; β is the soil moisture stress factor, which is detailed in Sect. A7; Rg is the incoming solar radiation (W m−2) for unit leaf area index; Tl is leaf temperature (K). qsat(Tl) (kg kg−1) and qa (kg kg−1) are the saturated humidity at the temperature of Tl and ambient humidity, respectively. In literature (Chen et al.1996; Jacquemin and Noilhan1990), qsat(Tl) and qa are specific humidity, whereas in Noah-MP v3.6, they are implemented as mixing ratio.

The scheme has five parameters: Rc,min, the minimum stomatal resistance (s m−1) per unit leaf area index; Rc,max, the maximum resistance; Rgl, a radiation scaling factor (unitless); hs, a humidity scaling factor (unitless); Tref, the optimum temperature (K). Among these parameters, Rc,min, Rgl, and hs are assigned using a vegetation-parameter lookup table, while Rc,max and Tref are assumedly assigned to 5000 s m−1 and 298 K, respectively.

A7 Three soil moisture stress factor schemes

The Noah β-factor (NOAHB) scheme parameterizes the soil moisture stress factor controlling transpiration (β factor) as a function of soil moisture, which is calculated as follows:

(A22) β = i = 1 N root Δ z soil , i z root min 1 , θ i - θ wilt θ ref - θ wilt ,

where Nroot is the total number of soil layers that contain roots, zroot is the total depth of the root-zone layer (m), and θi is the volumetric soil moisture of the ith soil layer (m3 m−3). NOAHB has two parameters: θref, the field capacity (m3 m−3); and θwilt, the wilting volumetric soil moisture (m3 m−3).

The Community Land Model (CLM) scheme (Oleson et al.2004) parameterizes β as a function of soil matric potential, which is calculated as follows:

(A23) β = i = 1 N root Δ z soil , i z root min 1 , ψ wilt - ψ i ψ wilt - ψ sat ,

where ψi is the water pressure head of the ith soil layer (m), and ψi is converted from θi using the formula of Clapp and Hornberger (1978). CLM has two parameters: ψsat, the saturated water pressure head (m); and ψwilt, the wilting pressure head (m).

The Simplified Simple Biosphere (SSiB) scheme (Xue et al.1991) also parameterizes the β factor as a function of the soil pressure head, similar to CLM. However, the formula is different, as follows:

(A24) β = i = 1 N root Δ z soil , i z root min 1 , 1 - exp - c 2 ln ( ψ wilt ψ i ) .

SSiB has two parameters: ψwilt, the wilting pressure head (m); and c2, a unitless coefficient.

In Noah-MP version 3.6, the parameters θsat, θwilt, and ψsat are assigned using a soil parameter lookup table (Table 2 of Chen and Dudhia2001a); ψwilt is 10 m, independent of vegetation and soil types (Niu et al.2011); c2 is assumed constant at 5.8, whereas in SSiB, this parameter varies with vegetation type (Xue et al.1991, Table 2).

A8 Chen97 near-surface turbulence scheme

The Chen97 scheme (Chen et al.1997) parameterizes the surface exchange coefficient for heat (Ch) as follows:

(A25) C h = κ 2 ln z z 0 m - Ψ m z L + Ψ m z 0 m L - 1 ln z z 0 h - Ψ h z L + Ψ h z 0 h L - 1 ,

where κ=0.4 is the von Kármán constant; L is the Monin–Obukhov (M–O) length (m); z is the reference height (m); Ψm and Ψh are the similarity theory-based stability functions for momentum and heat, respectively; z0m is the roughness length for momentum (m) and depends on the land cover/land-use type; and z0h is the roughness length for heat (m). Niu et al. (2011) parameterized z0h=z0mexp(-κCRe*), where C=0.1 and Re* is the roughness Reynolds number. However, in the code of Noah-MP version 3.6, z0h=z0m.

A9 M–O near-surface turbulence scheme

The M–O scheme is based on the M–O similarity theory (Brutsaert1982), which parameterizes Ch as follows:

(A26) C h = κ 2 ln z - d 0 z 0 m - Ψ m z - d 0 L - 1 ln z - d 0 z 0 h - Ψ h z - d 0 L - 1 ,

where z0h=z0m. d0 is the zero-displacement height (m),

(A27) d 0 = 0.64 z ct ,

where zct is the canopy top height (m).

Appendix B: Estimation of the terrestrial water storage for the RFCs neighboring the Great Lakes

The TWS estimation for the NC, OH, and NE RFCs is performed in two steps: (1) aggregate the GRACE TWS over both the RFC land area and neighboring lakes (lakes Superior, Michigan, and Huron for NC; Erie for OH; and Ontario for NE); and (2) subtract the lake water storage anomaly from the aggregated TWS. The lake water storage in the second step is calculated as the product of the observed water level and the lake area.

The lake water level is an arithmetic average of selected NOAA in situ observations (, last access: 15 February 2022). For Lake Superior, five observation stations were selected: Point Iroquois, Marquette C.G., Ontonagon, Duluth, and Grand Marais. For Lake Michigan, seven stations were selected: Ludington, Holland, Calumet Harbor, Milwaukee, Kewaunee, Sturgeon Bay Canal, and Port Inland. For Lake Huron, five stations were selected: Lakeport, Harbor Beach, Essexville, Mackinaw City, and De Tour Village. For Lake Erie, eight stations were selected: Buffalo, Sturgeon Point, Erie, Fairport, Cleveland, Marblehead, Toledo, and Fermi Power Plant. And for Lake Ontario, four stations were selected: Cape Vincent, Oswego, Rochester, and Olcott.

The lake area is estimated from the lake boundary data provided by the United States Geological Survey (, last access: 15 February 2022). Only the area within the United States is considered, which is within a 150 km radius from the studied RFCs. The lake areas are calculated as follows: 52 441 km2 for Lake Superior within the USA, 57 509 km2 for Lake Michigan, 23 185 km2 for Lake Huron within the USA, 25 494 km2 for Lake Erie, and 18 871 km2 for Lake Ontario. Month-to-month variations in lake area are neglected in this study for simplicity.


The supplement related to this article is available online at:

Author contributions

ZLY initiated and funded the study. HZ conducted the simulation and generated the data. WF analyzed the data and created the figures. JW, LZ, LL, and SW contributed to the validation of the data. All authors contributed to creating the dataset and drafting the paper.

Competing interests

The contact author has declared that none of the authors has any competing interests.


The data are provided as is with no warranties.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Financial support

This research has been supported by the National Natural Science Foundation of China (grant nos. 42075165, 41375088, and 41605062) and the Beijing Natural Science Foundation (grant no. 8204072).

Review statement

This paper was edited by Christof Lorenz and reviewed by two anonymous referees.


Abolafia-Rosenzweig, R., He, C., Burns, S. P., and Chen, F.: Implementation and Evaluation of a Unified Turbulence Parameterization throughout the Canopy and Roughness Sublayer in Noah-MP Snow Simulations, J. Adv. Model. Earth Sy., 13, e2021MS002665,, 2021. a, b

Ajami, N. K., Duan, Q., and Sorooshian, S.: An Integrated Hydrologic Bayesian Multimodel Combination Framework: Confronting Input, Parameter, and Model Structural Uncertainty in Hydrologic Prediction, Water Resour. Res., 43, W01403,, 2007. a

Beck, H. E., van Dijk, A. I. J. M., de Roo, A., Dutra, E., Fink, G., Orth, R., and Schellekens, J.: Global evaluation of runoff from 10 state-of-the-art hydrological models, Hydrol. Earth Syst. Sci., 21, 2881–2903,, 2017. a, b

Brutsaert, W.: Evaporation into the Atmosphere: Theory, History, and Applications, Springer, Dordrecht,, 1982. a

Burnash, R. J. C., Ferral, R. L., and McGuire, R. A.: A Generalized Streamflow Simulation System: Conceptual Modeling for Digital Computers, Technical Report, Joint Federal-State River Forecast Center, U.S. National Weather Service and California Department of Water Resources, Sacramento, California, USA, (last access: 6 February 2016), 1973. a

Cai, X., Yang, Z.-L., David, C. H., Niu, G.-Y., and Rodell, M.: Hydrological Evaluation of the Noah-MP Land Surface Model for the Mississippi River Basin, J. Geophys. Res.-Atmos., 119, 23–38,, 2014a. a, b, c

Cai, X., Yang, Z.-L., Xia, Y., Huang, M., Wei, H., Leung, L. R., and Ek, M. B.: Assessment of Simulated Water Balance from Noah, Noah-MP, CLM, and VIC over CONUS Using the NLDAS Test Bed, J. Geophys. Res.-Atmos., 119, 13751–13770,, 2014b. a, b, c, d, e

Carrera, M. L., Bélair, S., and Bilodeau, B.: The Canadian Land Data Assimilation System (CaLDAS): Description and Synthetic Evaluation Study, J. Hydrometeorol., 16, 1293–1314,, 2015. a

Chen, F. and Dudhia, J.: Coupling an Advanced Land Surface– Hydrology Model with the Penn State–NCAR MM5 Modeling System. Part I: Model Implementation and Sensitivity, Mon. Weather Rev., 129, 569–585,<0569:CAALSH>2.0.CO;2, 2001a. a, b, c

Chen, F. and Dudhia, J.: Coupling an Advanced Land Surface– Hydrology Model with the Penn State–NCAR MM5 Modeling System. Part II: Preliminary Model Validation, Mon. Weather Rev., 129, 587–604,<0587:CAALSH>2.0.CO;2, 2001b. a

Chen, F., Mitchell, K. E., Schaake, J., Xue, Y., Pan, H. L., Koren, V., Duan, Q., Ek, M., and Betts, A. K.: Modeling of Land Surface Evaporation by Four Schemes and Comparison with FIFE Observations, J. Geophys. Res.-Atmos., 101, 7251–7268,, 1996. a, b

Chen, F., Janjić, Z., and Mitchell, K.: Impact of Atmospheric Surface-Layer Parameterizations in the New Land-Surface Scheme of the NCEP Mesoscale Eta Model, Bound.-Lay. Meteorol., 85, 391–421,, 1997. a, b

Chen, F., Barlage, M., Tewari, M., Rasmussen, R., Jin, J., Lettenmaier, D. P., Livneh, B., Lin, C., Miguez-Macho, G., Niu, G.-Y., Wen, L., and Yang, Z.-L.: Modeling Seasonal Snowpack Evolution in the Complex Terrain and Forested Colorado Headwaters Region: A Model Intercomparison Study, J. Geophys. Res.-Atmos., 119, 2014JD022167,, 2014. a

Clapp, R. B. and Hornberger, G. M.: Empirical Equations for Some Soil Hydraulic Properties, Water Resour. Res., 14, 601–604,, 1978. a, b

Clark, M. P., Kavetski, D., and Fenicia, F.: Pursuing the Method of Multiple Working Hypotheses for Hydrological Modeling, Water Resour. Res., 47, 1–16,, 2011. a

Cloke, H. and Pappenberger, F.: Ensemble Flood Forecasting: A Review, J. Hydrol., 375, 613–626,, 2009. a

Clow, D. W., Nanus, L., Verdin, K. L., and Schmidt, J.: Evaluation of SNODAS Snow Depth and Snow Water Equivalent Estimates for the Colorado Rocky Mountains, USA, Hydrol. Process., 26, 2583–2591,, 2012. a

Dai, A.: Increasing Drought under Global Warming in Observations and Models, Nat. Clim. Change, 3, 52–58,, 2013. a

Dang, C., Zender, C. S., and Flanner, M. G.: Intercomparison and improvement of two-stream shortwave radiative transfer schemes in Earth system models for a unified treatment of cryospheric surfaces, The Cryosphere, 13, 2325–2343,, 2019. a, b

Decker, M., Or, D., Pitman, A. J., and Ukkola, A.: New Turbulent Resistance Parameterization for Soil Evaporation Based on a Pore-Scale Model: Impact on Surface Fluxes in CABLE, J. Adv. Model. Earth Sy., 9, 220–238,, 2017. a

Dickinson, R. E., Wang, G., Zeng, X., and Zeng, Q.: How Does the Partitioning of Evapotranspiration and Runoff between Different Processes Affect the Variability and Predictability of Soil Moisture and Precipitation?, Adv. Atmos. Sci., 20, 475–478,, 2003. a

Dirmeyer, P. A., Gao, X., Zhao, M., Guo, Z., Oki, T., and Hanasaki, N.: GSWP-2: Multimodel Analysis and Implications for Our Perception of the Land Surface, B. Am. Meteorol. Soc., 87, 1381–1398,, 2006. a, b, c, d

Ek, M. B., Mitchell, K. E., Lin, Y., Rogers, E., Grunmann, P., Koren, V., Gayno, G., and Tarpley, J. D.: Implementation of Noah Land Surface Model Advances in the National Centers for Environmental Prediction Operational Mesoscale Eta Model, J. Geophys. Res.-Atmos., 108, 8851,, 2003. a

Emerton, R. E., Cloke, H. L., Stephens, E. M., Zsoter, E., Woolnough, S. J., and Pappenberger, F.: Complex Picture for Likelihood of ENSO-driven Flood Hazard, Nat. Commun., 8, 14796,, 2017. a

Fang, B., Lei, H., Zhang, Y., Quan, Q., and Yang, D.: Spatio-Temporal Patterns of Evapotranspiration Based on Upscaling Eddy Covariance Measurements in the Dryland of the North China Plain, Agr. Forest Meteorol., 281, 107844,, 2020. a

Fei, W., Zheng, H., Xu, Z., Wu, W.-Y., Lin, P., Tian, Y., Guo, M., She, D., Li, L., Li, K., and Yang, Z.-L.: Ensemble Skill Gains Obtained from the Multi-Physics versus Multi-Model Approaches for Continental-Scale Hydrological Simulations, Water Resour. Res., 57, e2020WR028846,, 2021. a, b, c, d, e, f, g, h, i, j, k

Gan, Y., Liang, X.-Z., Duan, Q., Chen, F., Li, J., and Zhang, Y.: Assessment and Reduction of the Physical Parameterization Uncertainty for Noah-MP Land Surface Model, Water Resour. Res., 55, 5518–5538,, 2019. a

Gao, H., Tang, Q., Ferguson, C. R., Wood, E. F., and Lettenmaier, D. P.: Estimating the Water Budget of Major US River Basins via Remote Sensing, Int. J. Remote Sens., 31, 3955–3978,, 2010. a

Guo, Z., Dirmeyer, P. A., Gao, X., and Zhao, M.: Improving the Quality of Simulated Soil Moisture with a Multi-Model Ensemble Approach, Q. J. Roy. Meteor. Soc., 133, 731–747,, 2007. a

He, C., Chen, F., Barlage, M., Liu, C., Newman, A., Tang, W., Ikeda, K., and Rasmussen, R.: Can Convection-Permitting Modeling Provide Decent Precipitation for Offline High-Resolution Snowpack Simulations over Mountains?, J. Geophys. Res.-Atmos., 124, 12631–12654,, 2019. a, b, c

Hejazi, M. I., Edmonds, J., Clarke, L., Kyle, P., Davies, E., Chaturvedi, V., Wise, M., Patel, P., Eom, J., and Calvin, K.: Integrated assessment of global water scarcity over the 21st century under multiple climate change mitigation policies, Hydrol. Earth Syst. Sci., 18, 2859–2883,, 2014. a

Jacquemin, B. and Noilhan, J.: Sensitivity Study and Validation of a Land Surface Parameterization Using the HAPEX-MOBILHY Data Set, Bound.-Lay. Meteorol., 52, 93–134,, 1990. a, b

Jarvis, P. G.: The Interpretation of the Variations in Leaf Water Potential and Stomatal Conductance Found in Canopies in the Field, Philos. T. Roy. Soc. Lond. B, 273, 593–610,, 1976. a

Jung, M., Reichstein, M., and Bondeau, A.: Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model, Biogeosciences, 6, 2001–2013,, 2009. a, b

Jung, M., Koirala, S., Weber, U., Ichii, K., Gans, F., Camps-Valls, G., Papale, D., Schwalm, C., Tramontana, G., and Reichstein, M.: The FLUXCOM Ensemble of Global Land-Atmosphere Energy Fluxes, Sci. Data, 6, 74,, 2019. a

Kim, R. S., Kumar, S., Vuyovich, C., Houser, P., Lundquist, J., Mudryk, L., Durand, M., Barros, A., Kim, E. J., Forman, B. A., Gutmann, E. D., Wrzesien, M. L., Garnaud, C., Sandells, M., Marshall, H.-P., Cristea, N., Pflug, J. M., Johnston, J., Cao, Y., Mocko, D., and Wang, S.: Snow Ensemble Uncertainty Project (SEUP): quantification of snow water equivalent uncertainty across North America via ensemble land surface modeling, The Cryosphere, 15, 771–791,, 2021. a

Koster, R. D.: “Efficiency Space”: A Framework for Evaluating Joint Evaporation and Runoff Behavior, B. Am. Meteorol. Soc., 96, 393–396,, 2015. a

Koster, R. D. and Suarez, M. J.: Modeling the Land Surface Boundary in Climate Models as a Composite of Independent Vegetation Stands, J. Geophys. Res.-Atmos., 97, 2697–2715,, 1992. a, b

Kumar, S., Holmes, T., Mocko, M. D., Wang, S., and Peters-Lidard, C.: Attribution of Flux Partitioning Variations between Land Surface Models over the Continental U.S., Remote Sensing, 10, 751,, 2018. a

Kumar, S. V., Wang, S., Mocko, D. M., Peters-Lidard, C. D., and Xia, Y.: Similarity Assessment of Land Surface Model Outputs in the North American Land Data Assimilation System, Water Resour. Res., 53, 8941–8965,, 2017. a

LaFontaine, J. H., Hay, L. E., Viger, R. J., Regan, R. S., and Markstrom, S. L.: Effects of Climate and Land Cover on Hydrology in the Southeastern U.S.: Potential Impacts on Watershed Planning, J. Am. Water Resour. Assoc., 51, 1235–1261,, 2015. a

Le, P. V. V., Kumar, P., and Drewry, D. T.: Implications for the Hydrologic Cycle under Climate Change Due to the Expansion of Bioenergy Crops in the Midwestern United States, P. Natl. Acad. Sci. USA, 108, 15085–15090,, 2011. a

Levia, D. F., Creed, I. F., Hannah, D. M., Nanko, K., Boyer, E. W., Carlyle-Moses, D. E., van de Giesen, N., Grasso, D., Guswa, A. J., Hudson, J. E., Hudson, S. A., Iida, S., Jackson, R. B., Katul, G. G., Kumagai, T., Llorens, P., Ribeiro, F. L., Pataki, D. E., Peters, C. A., Carretero, D. S., Selker, J. S., Tetzlaff, D., Zalewski, M., and Bruen, M.: Homogenization of the Terrestrial Water Cycle, Nat. Geosci., 13, 656–658,, 2020. a

Li, L., Yang, Z.-L., Matheny, A. M., Zheng, H., Swenson, S. C., Lawrence, D. M., Barlage, M., Yan, B., McDowell, N. G., and Leung, L. R.: Representation of Plant Hydraulics in the Noah-MP Land Surface Model: Model Development and Multiscale Evaluation, J. Adv. Model. Earth Sy., 13, e2020MS002 214,, 2021. a, b

Lian, X., Piao, S., Huntingford, C., Li, Y., Zeng, Z., Wang, X., Ciais, P., McVicar, T. R., Peng, S., Ottlé, C., Yang, H., Yang, Y., Zhang, Y., and Wang, T.: Partitioning Global Land Evapotranspiration Using CMIP5 Models Constrained by Observations, Nat. Clim. Change, 8, 640–646,, 2018. a

Liang, X., Lettenmaier, D. P., Wood, E. F., and Burges, S. J.: A Simple Hydrologically Based Model of Land Surface Water and Energy Fluxes for General Circulation Models, J. Geophys. Res.-Atmos., 99, 14415–14428,, 1994. a, b

Lin, P., Hopper, L. J., Yang, Z.-L., Lenz, M., and Zeitler, J. W.: Insights into Hydrometeorological Factors Constraining Flood Prediction Skill during the May and October 2015 Texas Hill Country Flood Events, J. Hydrometeorol., 19, 1339–1361,, 2018. a

Lin, P., Pan, M., Beck, H. E., Yang, Y., Yamazaki, D., Frasson, R., David, C. H., Durand, M., Pavelsky, T. M., Allen, G. H., Gleason, C. J., and Wood, E. F.: Global Reconstruction of Naturalized River Flows at 2.94 Million Reaches, Water Resour. Res., 55, 6499–6516,, 2019. a

Liu, X., Chen, F., Barlage, M., Zhou, G., and Niyogi, D. S.: Noah-MP-Crop: Introducing Dynamic Crop Growth in the Noah-MP Land Surface Model, J. Geophys. Res.-Atmos., 121, 13953–13972,, 2016. a

Liu, X., Chen, F., Barlage, M., and Niyogi, D.: Implementing Dynamic Rooting Depth for Improved Simulation of Soil Moisture and Land Surface Feedbacks in Noah-MP-Crop, J. Adv. Model. Earth Sy., 12, e2019MS001786,, 2020. a

Lv, M., Ma, Z., Li, M., and Zheng, Z.: Quantitative Analysis of Terrestrial Water Storage Changes under the Grain for Green Program in the Yellow River Basin, J. Geophys. Res.-Atmos., 124, 1336–1351,, 2019. a

Lv, M., Xu, Z., Yang, Z.-L., Lu, H., and Lv, M.: A Comprehensive Review of Specific Yield in Land Surface and Groundwater Studies, J. Adv. Model. Earth Sy., 13, e2020MS002270,, 2021. a

Ma, N., Niu, G.-Y., Xia, Y., Cai, X., Zhang, Y., Ma, Y., and Fang, Y.: A Systematic Evaluation of Noah-MP in Simulating Land-Atmosphere Energy, Water, and Carbon Exchanges over the Continental United States, J. Geophys. Res.-Atmos., 122, 12245–12268,, 2017. a, b, c

McCabe, M. F., Rodell, M., Alsdorf, D. E., Miralles, D. G., Uijlenhoet, R., Wagner, W., Lucieer, A., Houborg, R., Verhoest, N. E. C., Franz, T. E., Shi, J., Gao, H., and Wood, E. F.: The future of Earth observation in hydrology, Hydrol. Earth Syst. Sci., 21, 3879–3914,, 2017. a

Mitchell, K. E., Lohmann, D., Houser, P. R., Wood, E. F., Schaake, J. C., Robock, A., Cosgrove, B. A., Sheffield, J., Duan, Q., Luo, L., Higgins, R. W., Pinker, R. T., Tarpley, J. D., Lettenmaier, D. P., Marshall, C. H., Entin, J. K., Pan, M., Shi, W., Koren, V., Meng, J., Ramsay, B. H., and Bailey, A. A.: The Multi-Institution North American Land Data Assimilation System (NLDAS): Utilizing Multiple GCIP Products and Partners in a Continental Distributed Hydrological Modeling System, J. Geophys. Res.-Atmos., 109, D07S90,, 2004. a, b, c

Niu, G.-Y. and Yang, Z.-L.: Effects of Frozen Soil on Snowmelt Runoff and Soil Water Storage at a Continental Scale, J. Hydrometeorol., 7, 937–952,, 2006. a

Niu, G.-Y., Yang, Z.-L., Dickinson, R. E., and Gulden, L. E.: A Simple TOPMODEL-based Runoff Parameterization (SIMTOP) for Use in Global Climate Models, J. Geophys. Res.-Atmos., 110, D21106,, 2005. a

Niu, G.-Y., Yang, Z.-L., Dickinson, R. E., Gulden, L. E., and Su, H.: Development of a Simple Groundwater Model for Use in Climate Models and Evaluation with Gravity Recovery and Climate Experiment Data, J. Geophys Res., 112, D07103,, 2007. a, b, c, d

Niu, G.-Y., Yang, Z.-L., Mitchell, K. E., Chen, F., Ek, M. B., Barlage, M., Kumar, A., Manning, K., Niyogi, D., Rosero, E., Tewari, M., and Xia, Y.: The Community Noah Land Surface Model with Multiparameterization Options (Noah-MP): 1. Model Description and Evaluation with Local-Scale Measurements, J. Geophys. Res.-Atmos., 116, D12109,, 2011. a, b, c, d, e

Oleson, K. W., Dai, Y., Bonan, G. B., Bosilovich, M., Dirmeyer, P. A., Hoffman, F. M., Houser, P. R., Levis, S., Niu, G.-Y., Thornton, P. E., Vertenstein, M., Yang, Z.-L., and Zeng, X.: Technical Description of the Community Land Model (CLM), Tech. rep., National Center for Atmospheric Research, Boulder, Colorado,, 2004. a

Pan, M., Sahoo, A. K., Troy, T. J., Vinukollu, R. K., Sheffield, J., and Wood, E. F.: Multisource Estimation of Long-Term Terrestrial Water Budget for Major Global River Basins, J. Climate, 25, 3191–3206,, 2012. a, b

Pan, S., Pan, N., Tian, H., Friedlingstein, P., Sitch, S., Shi, H., Arora, V. K., Haverd, V., Jain, A. K., Kato, E., Lienert, S., Lombardozzi, D., Nabel, J. E. M. S., Ottlé, C., Poulter, B., Zaehle, S., and Running, S. W.: Evaluation of global terrestrial evapotranspiration using state-of-the-art approaches in remote sensing, machine learning and land surface modeling, Hydrol. Earth Syst. Sci., 24, 1485–1509,, 2020. a

Peters-Lidard, C. D., Hossain, F., Leung, L. R., McDowell, N., Rodell, M., Tapiador, F. J., Turk, F. J., and Wood, A.: 100 Years of Progress in Hydrology, Meteorol. Monogr., 59, 25.1–25.51,, 2018. a

Peters-Lidard, C. D., Mocko, D. M., Su, L., Lettenmaier, D. P., Gentine, P., and Barlage, M.: Advances in Land Surface Models and Indicators for Drought Monitoring and Prediction, B. Am. Meteorol. Soc., 102, E1099–E1122,, 2021. a

Philip, J. R.: Theory of Infiltration, in: Advances in Hydroscience, Elsevier, vol. 5, 215–296,, 1969. a

Prudhomme, C., Giuntoli, I., Robinson, E. L., Clark, D. B., Arnell, N. W., Dankers, R., Fekete, B. M., Franssen, W., Gerten, D., Gosling, S. N., Hagemann, S., Hannah, D. M., Kim, H., Masaki, Y., Satoh, Y., Stacke, T., Wada, Y., and Wisser, D.: Hydrological Droughts in the 21st Century, Hotspots and Uncertainties from a Global Multimodel Ensemble Experiment, P. Natl. Acad. Sci. USA, 111, 3262–3267,, 2014. a

Quiring, S. M., Ford, T. W., Wang, J. K., Khong, A., Harris, E., Lindgren, T., Goldberg, D. W., and Li, Z.: The North American Soil Moisture Database: Development and Applications, B. Am. Meteorol. Soc., 97, 1441–1459,, 2016. a

Rateb, A., Scanlon, B. R., Pool, D. R., Sun, A., Zhang, Z., Chen, J., Clark, B., Faunt, C. C., Haugh, C. J., Hill, M., Hobza, C., McGuire, V. L., Reitz, M., Schmied, H. M., Sutanudjaja, E. H., Swenson, S., Wiese, D., Xia, Y., and Zell, W.: Comparison of Groundwater Storage Changes from GRACE Satellites with Monitoring and Modeling of Major U.S. Aquifers, Water Resour. Res., 56, e2020WR027556,, 2020. a

Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., Arsenault, K., Cosgrove, B., Radakovich, J., Bosilovich, M., Entin, J. K., Walker, J. P., Lohmann, D., and Toll, D.: The Global Land Data Assimilation System, B. Am. Meteorol. Soc., 85, 381–394,, 2004. a

Rodell, M., Velicogna, I., and Famiglietti, J. S.: Satellite-Based Estimates of Groundwater Depletion in India, Nature, 460, 999–1002,, 2009. a

Rodell, M., Beaudoing, H. K., L'Ecuyer, T. S., Olson, W. S., Famiglietti, J. S., Houser, P. R., Adler, R., Bosilovich, M. G., Clayson, C. A., Chambers, D., Clark, E., Fetzer, E. J., Gao, X., Gu, G., Hilburn, K., Huffman, G. J., Lettenmaier, D. P., Liu, W. T., Robertson, F. R., Schlosser, C. A., Sheffield, J., and Wood, E. F.: The Observed State of the Water Cycle in the Early Twenty-First Century, J. Climate, 28, 8289–8318,, 2015. a

Sakumura, C., Bettadpur, S., and Bruinsma, S.: Ensemble Prediction and Intercomparison Analysis of GRACE Time-Variable Gravity Field Models, Geophys. Res. Lett., 41, 1389–1397,, 2014. a

Save, H., Bettadpur, S., and Tapley, B. D.: High-Resolution CSR GRACE RL05 Mascons, J. Geophys. Res.-Sol. Ea., 121, 7547–7569,, 2016. a

Saxe, S., Farmer, W., Driscoll, J., and Hogue, T. S.: Implications of model selection: a comparison of publicly available, conterminous US-extent hydrologic component estimates, Hydrol. Earth Syst. Sci., 25, 1529–1568,, 2021. a, b

Scanlon, B. R., Faunt, C. C., Longuevergne, L., Reedy, R. C., Alley, W. M., McGuire, V. L., and McMahon, P. B.: Groundwater Depletion and Sustainability of Irrigation in the US High Plains and Central Valley, P. Natl. Acad. Sci. USA, 109, 9320–9325,, 2012. a

Scanlon, B. R., Zhang, Z., Save, H., Sun, A. Y., Schmied, H. M., van Beek, L. P. H., Wiese, D. N., Wada, Y., Long, D., Reedy, R. C., Longuevergne, L., Döll, P., and Bierkens, M. F. P.: Global Models Underestimate Large Decadal Declining and Rising Water Storage Trends Relative to GRACE Satellite Data, P. Natl. Acad. Sci. USA, 115, E1080–E1089,, 2018. a

Sellers, P. J., Randall, D. A., Collatz, G. J., Berry, J. A., Field, C. B., Dazlich, D. A., Zhang, C., Collelo, G. D., and Bounoua, L.: A Revised Land Surface Parameterization (SiB2) for Atmospheric GCMs. Part I: Model Formulation, J. Climate, 9, 676–705,<0676:ARLSPF>2.0.CO;2, 1996. a

Shellito, P. J., Kumar, S. V., Santanello, J. A., Lawston-Parker, P., Bolten, J. D., Cosh, M. H., Bosch, D. D., Holifield Collins, C. D., Livingston, S., Prueger, J., Seyfried, M., and Starks, P. J.: Assessing the Impact of Soil Layer Depth Specification on the Observability of Modeled Soil Moisture and Brightness Temperature, J. Hydrometeorol., 21, 2041–2060,, 2020. a

Shi, C., Xie, Z., Qian, H., Liang, M., and Yang, X.: China Land Soil Moisture EnKF Data Assimilation Based on Satellite Remote Sensing Data, Sci. China Earth Sci., 54, 1430–1440,, 2011. a

Sobol', L. M.: Sensitivity Estimates for Nonlinear Mathematical Models, Mathematical Modelling and Computational Experiment, 1, 407–414, 1993. a

Su, L., Cao, Q., Xiao, M., Mocko, D. M., Barlage, M., Li, D., Peters-Lidard, C. D., and Lettenmaier, D. P.: Drought Variability over the Conterminous United States for the Past Century, J. Hydrometeorol., 22, 1153–1168,, 2021. a

Taylor, K. E.: Summarizing Multiple Aspects of Model Performance in a Single Diagram, J. Geophys. Res.-Atmos., 106, 7183–7192,, 2001. a

Telteu, C.-E., Müller Schmied, H., Thiery, W., Leng, G., Burek, P., Liu, X., Boulange, J. E. S., Andersen, L. S., Grillakis, M., Gosling, S. N., Satoh, Y., Rakovec, O., Stacke, T., Chang, J., Wanders, N., Shah, H. L., Trautmann, T., Mao, G., Hanasaki, N., Koutroulis, A., Pokhrel, Y., Samaniego, L., Wada, Y., Mishra, V., Liu, J., Döll, P., Zhao, F., Gädeke, A., Rabin, S. S., and Herz, F.: Understanding each other's models: an introduction and a standard representation of 16 global water models to support intercomparison, improvement, and communication, Geosci. Model Dev., 14, 3843–3878,, 2021. a

Trenberth, K. E. and Fasullo, J. T.: North American Water and Energy Cycles, Geophys. Res. Lett., 40, 365–369,, 2013a. a

Trenberth, K. E. and Fasullo, J. T.: Regional Energy and Water Cycles: Transports from Ocean to Land, J. Climate, 26, 7837–7851,, 2013b. a

Trenberth, K. E., Smith, L., Qian, T., Dai, A., and Fasullo, J.: Estimates of the Global Water Budget and Its Annual Cycle Using Observational and Model Data, J. Hydrometeorol., 8, 758–769,, 2007. a

Troin, M., Arsenault, R., Wood, A. W., Brissette, F., and Martel, J.-L.: Generating Ensemble Streamflow Forecasts: A Review of Methods and Approaches over the Past 40 Years, Water Resour. Res., 57, e2020WR028392,, 2021. a, b

Voss, K. A., Famiglietti, J. S., Lo, M., de Linage, C., Rodell, M., and Swenson, S. C.: Groundwater Depletion in the Middle East from GRACE with Implications for Transboundary Water Management in the Tigris-Euphrates-Western Iran Region, Water Resour. Res., 49, 904–914,, 2013. a

Wang, W., Yang, K., Zhao, L., Zheng, Z., Lu, H., Mamtimin, A., Ding, B., Li, X., Zhao, L., Li, H., Che, T., and Moore, J. C.: Characterizing Surface Albedo of Shallow Fresh Snow and Its Importance for Snow Ablation on the Interior of the Tibetan Plateau, J. Hydrometeorol., 21, 815–827,, 2020. a, b, c

Wang, Y.-H., Broxton, P., Fang, Y., Behrangi, A., Barlage, M., Zeng, X., and Niu, G.-Y.: A Wet-Bulb Temperature-Based Rain-Snow Partitioning Scheme Improves Snowpack Prediction over the Drier Western United States, Geophys. Res. Lett., 46, 13825–13835,, 2019. a, b, c

Ward, P. J., Jongman, B., Kummu, M., Dettinger, M. D., Sperna Weiland, F. C., and Winsemius, H. C.: Strong Influence of El Niño Southern Oscillation on Flood Risk around the World, P. Natl. Acad. Sci. USA, 111, 15659–15664,, 2014. a

Wu, W.-Y., Yang, Z.-L., and Barlage, M.: The Impact of Noah-MP Physical Parameterizations on Modeling Water Availability during Droughts in the Texas–Gulf Region, J. Hydrometeorol., 22, 1221–1233,, 2021. a

Xia, Y., Mitchell, K., Ek, M., Cosgrove, B. A., Sheffield, J., Luo, L., Alonge, C., Wei, H., Meng, J., Livneh, B., Duan, Q., and Lohmann, D.: Continental-Scale Water and Energy Flux Analysis and Validation for North American Land Data Assimilation System Project Phase 2 (NLDAS-2): 2. Validation of Model-Simulated Streamflow, J. Geophys. Res.-Atmos., 117, D03110,, 2012a. a, b, c, d

Xia, Y., Mitchell, K., Ek, M., Sheffield, J., Cosgrove, B. A., Wood, E. F., Luo, L., Alonge, C., Wei, H., Meng, J., Livneh, B., Lettenmaier, D. P., Koren, V., Duan, Q., Mo, K. C., Fan, Y., and Mocko, D.: Continental-Scale Water and Energy Flux Analysis and Validation for the North American Land Data Assimilation System Project Phase 2 (NLDAS-2): 1. Intercomparison and Application of Model Products, J. Geophys. Res.-Atmos., 117, D03109,, 2012b. a, b, c, d

Xia, Y., Ek, M. B., Wu, Y., Ford, T., and Quiring, S. M.: Comparison of NLDAS-2 Simulated and NASMD Observed Daily Soil Moisture. Part II: Impact of Soil Texture Classification and Vegetation Type Mismatches, J. Hydrometeorol., 16, 1981–2000,, 2015a. a, b, c, d

Xia, Y., Ek, M. B., Wu, Y., Ford, T., and Quiring, S. M.: Comparison of NLDAS-2 Simulated and NASMD Observed Daily Soil Moisture. Part I: Comparison and Analysis, J. Hydrometeorol., 16, 1962–1980,, 2015b. a, b, c, d

Xia, Y., Cosgrove, B. A., Mitchell, K. E., Peters-Lidard, C. D., Ek, M. B., Brewer, M., Mocko, D., Kumar, S. V., Wei, H., Meng, J., and Luo, L.: Basin-Scale Assessment of the Land Surface Water Budget in the National Centers for Environmental Prediction Operational and Research NLDAS-2 Systems, J. Geophys. Res-Atmos., 121, 2750–2779,, 2016. a, b

Xia, Y., Hao, Z., Shi, C., Li, Y., Meng, J., Xu, T., Wu, X., and Zhang, B.: Regional and Global Land Data Assimilation Systems: Innovations, Challenges, and Prospects, J. Meteorol. Res., 33, 159–189,, 2019. a

Xu, T., Guo, Z., Xia, Y., Ferreira, V. G., Liu, S., Wang, K., Yao, Y., Zhang, X., and Zhao, C.: Evaluation of Twelve Evapotranspiration Products from Machine Learning, Remote Sensing and Land Surface Models over Conterminous United States, J. Hydrol., 578, 124105,, 2019. a

Xue, Y., Sellers, P. J., Kinter, J. L., and Shukla, J.: A Simplified Biosphere Model for Global Climate Studies, J. Climate, 4, 345–364,<0345:ASBMFG>2.0.CO;2, 1991. a, b

Yang, Z.-L. and Dickinson, R. E.: Description of the Biosphere-Atmosphere Transfer Scheme (BATS) for the Soil Moisture Workshop and Evaluation of Its Performance, Global Planet. Change, 13, 117–134,, 1996. a

Yang, Z.-L., Niu, G.-Y., Mitchell, K. E., Chen, F., Ek, M. B., Barlage, M., Longuevergne, L., Manning, K., Niyogi, D., Tewari, M., and Xia, Y.: The Community Noah Land Surface Model with Multiparameterization Options (Noah-MP): 2. Evaluation over Global River Basins, J. Geophys. Res.-Atmos., 116, D12110,, 2011. a, b, c

Yin, D. and Roderick, M. L.: Inter-annual variability of the global terrestrial water cycle, Hydrol. Earth Syst. Sci., 24, 381–396,, 2020. a

Zaussinger, F., Dorigo, W., Gruber, A., Tarpanelli, A., Filippucci, P., and Brocca, L.: Estimating irrigation water use over the contiguous United States by combining satellite and reanalysis soil moisture data, Hydrol. Earth Syst. Sci., 23, 897–923,, 2019. a

Zhang, B., Xia, Y., Long, B., Hobbins, M., Zhao, X., Hain, C., Li, Y., and Anderson, M. C.: Evaluation and Comparison of Multiple Evapotranspiration Data Models over the Contiguous United States: Implications for the next Phase of NLDAS (NLDAS-Testbed) Development, Agr. Forest Meteorol., 280, 107810,, 2020. a, b, c, d

Zhang, G., Chen, F., and Gan, Y.: Assessing Uncertainties in the Noah-MP Ensemble Simulations of a Cropland Site during the Tibet Joint International Cooperation Program Field Campaign, J. Geophys. Res.-Atmos., 121, 9576–9596,, 2016. a

Zhang, X., Chen, L., Ma, Z., and Gao, Y.: Assessment of Surface Exchange Coefficients in the Noah-MP Land Surface Model for Different Land-Cover Types in China, Int. J. Climatol., 41, 2638–2659,, 2021. a

Zhang, Y., Pan, M., Sheffield, J., Siemann, A. L., Fisher, C. K., Liang, M., Beck, H. E., Wanders, N., MacCracken, R. F., Houser, P. R., Zhou, T., Lettenmaier, D. P., Pinker, R. T., Bytheway, J., Kummerow, C. D., and Wood, E. F.: A Climate Data Record (CDR) for the global terrestrial water budget: 1984–2010, Hydrol. Earth Syst. Sci., 22, 241–263,, 2018. a, b

Zhao, L. and Yang, Z.-L.: Multi-Sensor Land Data Assimilation: Toward a Robust Global Soil Moisture and Snow Estimation, Remote Sens. Environ., 216, 13–27,, 2018.  a

Zhao, L., Yang, K., He, J., Zheng, H., and Zheng, D.: Potential of Mapping Global Soil Texture Type from SMAP Soil Moisture Product: A Pilot Study, IEEE T. Geosci. Remote, 60, 1–10,, 2022. a

Zheng, H., Yang, Z.-L., Lin, P., Wei, J., Wu, W.-Y., Li, L., Zhao, L., and Wang, S.: On the Sensitivity of the Precipitation Partitioning into Evapotranspiration and Runoff in Land Surface Parameterizations, Water Resour. Res., 55, 95–111,, 2019. a, b, c, d, e, f, g

Zheng, H., Yang, Z.-L., Lin, P., Wu, W.-Y., Li, L., Xu, Z., Wei, J., Zhao, L., Bian, Q., and Wang, S.: Falsification-Oriented Signature-Based Evaluation for Guiding the Development of Land Surface Models and the Enhancement of Observations, J. Adv. Model.Earth Sy., 12, e2020MS002132,, 2020. a, b, c, d, e

Zheng, H., Fei, W., Yang, Z.-L., Wei, J., Zhao, L., and Li, L.: An Ensemble of 48 Physically Perturbed Model Estimates of the 1/8 Terrestrial Water Budget over the Conterminous United States, 1980–2015, Zenodo [data set],, 2022. a

Short summary
An ensemble of evapotranspiration, runoff, and water storage is estimated here using the Noah-MP land surface model by perturbing model parameterization schemes. The data could be beneficial for monitoring and understanding the variability of water resources. Model developers could also gain insights by intercomparing the ensemble members.
Final-revised paper