eFLaG: enhanced future FLows and Groundwater. A national dataset of hydrological projections based on UKCP18

Abstract. This paper presents an ‘enhanced future FLows and Groundwater’ (eFLaG) dataset of nationally consistent hydrological projections for the UK, based on the latest UK Climate Projections (UKCP18). The hydrological projections are derived from a range of river flow models (Grid-to-Grid, PDM, GR4J and GR6J), to provide an indication of hydrological model uncertainty, as well as groundwater level (AquiMod) and groundwater recharge (ZOODRM) models. A 12-member ensemble of transient projections of present and future (up to 2080) daily river flows, groundwater levels and groundwater recharge was produced using bias-corrected data from the UKCP18 Regional (12 km) climate ensemble. Projections are provided for 200 river catchments, 54 groundwater level boreholes and 558 groundwater bodies, all sampling across the diverse hydrological and geological conditions of the UK. An evaluation was carried out to appraise the quality of hydrological model simulations against observations, and also to appraise the reliability of hydrological models driven by the RCM ensemble in terms of their capacity to reproduce hydrological regimes in the current period. The dataset was originally conceived as a prototype climate service for drought planning for the UK water sector, so it has been developed with drought, low river flow and low groundwater level applications as the primary focus. The evaluation metrics show that river flows and groundwater levels are, for the majority of catchments and boreholes, well simulated across the flow and level regime, meaning that the eFLaG dataset could be applied to a wider range of water resources research and management contexts, pending a full evaluation for the designated purpose.


UKCP09. More recently, Kay et al. (2021a,b,c) and Lane & Kay (2021) provided future assessments of potential changes in seasonal mean river flows, high flows and low flows using various UKCP18 products with the G2G hydrological model. They found potential increases in winter mean flows and high flows, and decreases in summer and low flows, albeit with wide uncertainty ranges. To date, and to the authors' knowledge, there have been no published assessments of future groundwater levels or groundwater recharge using UKCP18.

In summary, there have been substantial scientific advances in hydrological projections for the UK since Watts et al. (2015) and FFGWL, including some research on future indicators relevant for water resource availability and drought. However, relatively few datasets have been made available to the community since FFGWL. While MaRIUS and EdGE provide complementary hydrological datasets, there remains a need for an accessible dataset based on UKCP18. Existing UKCP18 studies have focused on time-slice projections and used a single hydrological model (e.g. Kay et al., 2021a,b,c), so there will be significant benefit arising from the eFLaG dataset of transient projections from a range of hydrological models covering river flows, groundwater levels and groundwater recharge.

The whole project workflow is illustrated in Fig. 1. eFLaG is driven by the UKCP18 dataset, specifically the 'Regional' 12km projections, to which a bias correction is applied. Section 3 describes the processing of the climate projections, including the bias correction method. The UKCP18 projections are used as input to three river flow models (GR, PDM and G2G), one

The question of uncertainty in climate impacts modelling is a challenging one that has been explored in a whole range of studies, going back as far as climate projections have been routinely produced from the 1980s. There are inherent uncertainties at every step of the process, from climate emissions scenarios through to climate modelling, and on to environmental modelling (in our case hydrological modelling, which itself has a vast literature when it comes to uncertainty estimation) and then to wider impacts modelling (e.g. in water supply systems). Recently, Smith et al. (2018) presented this issue as a 'cascade of uncertainty' (using widely adopted terminology, e.g. Wilby and Dessai, 2010). Within eFLaG, as with the majority of climate impact applications, it is not possible to sample across all sources of uncertainty. Following Smith et al. (2019) we adopted a pragmatic approach to 'crystalising' the uncertainty within the available time and resource constraints. In Table 1, we consider the sources of uncertainty, and our approach to sampling from them. The focus in eFLaG is on uncertainty arising from initial/boundary conditions. Additionally, for the river flow simulations, the uncertainty arising from model choice is also accounted for, and within this, model structure is accounted for by considering two versions of one of the models.

The regional climate projections were created using perturbed-parameter runs of the Hadley Centre global climate model (GCM) and regional climate models (HadGEM3-GC3.05 and HadREM3-GA705 respectively). These provide a set of 12 high resolution (12km) spatially

3. The change factor grids were then smoothed to prevent spatial discontinuities, by updating each grid cell using a weighted combination of the original grid-cell value and neighbouring values, as in Guillod et al. (2018).
4. To produce bias-corrected precipitation estimates, the RCM simulated precipitation time-series were multiplied by the bias-correction factor grid for each month (i.e. all January precipitation was multiplied by the January bias-correction grid, February precipitation by the February correction grid, etc.).

The bias-corrected precipitation products were then downscaled from 12km to 1km based on
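Steps 3 and 4 of the bias correction can be illustrated with a minimal sketch. It assumes the monthly factor grids have already been derived (the earlier steps are not reproduced here), and the function names, the 8-neighbour window and the `centre_weight` value are illustrative assumptions rather than the weights used in eFLaG or in Guillod et al. (2018).

```python
from datetime import date


def smooth_grid(grid, centre_weight=0.5):
    """Blend each cell with the mean of its available neighbours.

    centre_weight is a hypothetical value, not the weighting used in eFLaG.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Up to eight surrounding cells, clipped at the grid edges.
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if neighbours:
                mean_nb = sum(neighbours) / len(neighbours)
                out[r][c] = centre_weight * grid[r][c] + (1.0 - centre_weight) * mean_nb
            else:
                out[r][c] = grid[r][c]
    return out


def apply_monthly_factors(dates, precip, factors_by_month):
    """Multiply each daily value by the factor for its calendar month (1-12)."""
    return [p * factors_by_month[d.month] for d, p in zip(dates, precip)]


# Example for a single grid cell: January values doubled, February halved.
factors = {m: 1.0 for m in range(1, 13)}
factors[1], factors[2] = 2.0, 0.5
corrected = apply_monthly_factors(
    [date(2000, 1, 15), date(2000, 2, 15)], [10.0, 10.0], factors
)
```

In a gridded application the same multiplication would be applied cell by cell, using the smoothed factor grid for the relevant month.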

Accounting for snowmelt processes
A simple snow module was applied to account for snow-melt processes (Bell et al., 2016). The snow module converted the 1km bias-corrected precipitation into rainfall plus snowmelt (i.e. available precipitation), based on temperature. This used the minimum and maximum daily temperatures provided by each RCM ensemble member, which were first scaled from a 12km resolution to 1km using a lapse rate based on elevation data. The parameters used in the snow module are given in Supplementary Info (Table S1).
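The general idea of a temperature-based snow module can be sketched as follows. This is a generic degree-day formulation under stated assumptions, not the Bell et al. (2016) module itself: the lapse rate, temperature thresholds and melt factor are hypothetical placeholders (the eFLaG values are in Table S1), and the use of the daily mean of minimum and maximum temperature is also an assumption.

```python
def lapse_adjust(temp_c, elev_coarse_m, elev_fine_m, lapse_rate_c_per_m=0.006):
    """Shift a coarse-grid temperature to the elevation of a fine-grid cell.

    The lapse rate is an illustrative value, not the one used in eFLaG.
    """
    return temp_c - lapse_rate_c_per_m * (elev_fine_m - elev_coarse_m)


def available_precipitation(precip, tmin, tmax,
                            snow_threshold_c=1.0,
                            melt_threshold_c=0.0,
                            melt_factor_mm_per_c=4.0):
    """Convert daily precipitation (mm) into rainfall plus snowmelt (mm).

    All thresholds and the melt factor are hypothetical parameters.
    Returns the daily 'available precipitation' and the final snowpack.
    """
    pack = 0.0
    available = []
    for p, lo, hi in zip(precip, tmin, tmax):
        tmean = 0.5 * (lo + hi)
        if tmean <= snow_threshold_c:
            pack += p          # precipitation accumulates as snow
            rain = 0.0
        else:
            rain = p           # precipitation falls as rain
        # Degree-day melt, limited by the snow currently stored.
        melt = min(pack, max(0.0, melt_factor_mm_per_c * (tmean - melt_threshold_c)))
        pack -= melt
        available.append(rain + melt)
    return available, pack


series, remaining = available_precipitation(
    precip=[10.0, 0.0, 0.0], tmin=[-4.0, -4.0, 1.0], tmax=[-2.0, -2.0, 3.0]
)
```

Whatever parameter values are chosen, the scheme conserves mass: the total available precipitation plus the remaining snowpack equals the total input precipitation.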

Potential evapotranspiration
Potential evapotranspiration (PET) was not directly available as an RCM output, and was therefore generated using a range of variables from the RCM-PPE climate time-series (Table S2). The calculation for PET was based on the CHESS method (Robinson et al., 2016), with some details, in particular an interception correction, introduced from the MORECS method

The PET data were then copied down from a 12km to 1km resolution.

Outputs
The 1km gridded time-series of 'available precipitation' and PET were then used to produce the time-series of catchment-averages required for each of the eFLaG river catchments and groundwater boreholes. For the river catchments, the catchment average values were derived using the standard UK National River Flow Archive approach for catchment average rainfalls, as described in NRFA (2021). For the boreholes, following Mackay et al. (2014a), averages were taken over the representative aquifer length, which was determined as the groundwater flow path between the borehole and a single discharge point on a river based on the catchment geometry and hydrogeology. For the grid-based models, ZOODRM and G2G, the gridded data were used directly.

The bias-corrected climate outputs are part of the eFLaG dataset described further in Section 9. For each river catchment and groundwater borehole, bias-corrected data are available for the observational period, for the purposes of evaluation of the hydrological model outputs, and for the future. In addition, the gridded bias-corrected climatology will be made available as a separate dataset in future.

Catchment selection
The UK is fortunate to have one of the densest hydrometric networks in the world, with a legacy of strong commitment to data quality and completeness. There are more than 1,500 river flow

River Flows
To support selection, a metadatabase was assembled for all NRFA gauging stations in the UK, 'MaRIUS' projects, that used several of the models used by eFLaG (specifically G2G, GR4J).

In this regard we ensured that 165 eFLaG catchments overlapped with at least one DWS project.

Selection also focused on data quality. Longer record lengths were prioritised and hydrometric quality was evaluated where possible. Given the extent of hydrometric issues (at low flows especially) it is not possible for all sites to have the highest quality data, but where decisions were made on similar sites, quality was considered as a tiebreaker. The selection included 80 Benchmark catchments, but did not seek to focus entirely on natural catchments given the limited range of variability they capture (being mostly small and clustered in headwaters), and also included large and disturbed sites known to be important for water industry purposes.

Catchment representativeness was also considered, enabling the eFLaG dataset to sample the hydrological variability of the UK. Representativeness was considered by comparing the distribution of eFLaG potential selections relative to various catchment descriptors from the

https://doi.org/10.5194/essd-2022-40

Groundwater Levels
Boreholes were selected to ensure a number of essential criteria were met. Firstly, only those boreholes with the highest-quality records of groundwater level were considered. This required regular (at least monthly) and continuous (at least 10 years in length) records of data from boreholes that are in zones which are not significantly affected by groundwater abstraction.

projects. Accordingly, the selection aimed to ensure good coherence with these studies also. Thirdly, as with river flow catchment selection, an additional activity focused on ensuring water industry relevance, both at the national scale, through consultation with stakeholders at the eFLaG workshop, and through consultation with key demonstrator partners (Dwr Cymru/Welsh Water and Thames Water) who identified strategically important boreholes that would strengthen the outputs for long-term drought risk assessment to support the water resources planning case study. Through this activity, several additional boreholes were identified.

These selection criteria identified over 70 'candidate' boreholes for the eFLaG project. A final quality assurance procedure was then applied, in which AquiMod's ability to capture low groundwater levels was assessed at each borehole via visual inspection of the simulated hydrographs. A final set of 54 boreholes was selected (Fig. 3b). They represent a significant advance in aquifer coverage compared to the 24 NGLA boreholes used in FFGWL, 15 of which are used in both.

Groundwater Recharge
The gridded groundwater recharge simulations have been aggregated over 558 'groundwater

In the following sub-sections, we describe each of these models in turn, providing information on the model set-up, calibration and past approaches to evaluation. A consistent approach was applied to the model application and evaluation across all these models where possible. However, it is important to emphasise that while some aspects were common, insofar as possible (e.g. model driving data), it was necessary to apply different approaches to suit the model in question. Calibration was done according to past applications and best practice. Hence, the calibration approach described below is similar for the GR suite and PDM, but different for AquiMod, and by its nature G2G requires no specific calibration here. Identical approaches to evaluation were adopted across all river flow models, but minor differences applied with groundwater, as described below.

There are two sets of model output in eFLaG, described below; this terminology is adopted throughout.
- simobs: observation-driven simulation (i.e. simulations for the observed period, driven by observational climate datasets, described below). The simobs period varies between models, but covers at least the January 1961 to December 2018 period.
- simrcm: UKCP18 RCM-driven simulation (12 ensemble members) (i.e. simulations driven by the UKCP18 RCM bias-corrected dataset as described in Section 3). These are available for 1980 to 2080. The simrcm runs from the observed period could then be evaluated against the simobs data.

Common driving data was applied across all models for the simobs runs. Accepted national-standard observational climate products were used, including:

groundwater. For Stage 1, a range of metrics are available and widely used to assess how well rainfall-runoff or groundwater models perform against observations. Within eFLaG, a range of different metrics were used to assess performance (Table 3). For river flows, these metrics focus on low flows (e.g. NSE on log-transformed flows), but some do evaluate performance across the flow regime. For groundwater levels, a generalised NSE score was used which provides an overall assessment of process realism and fit to groundwater level data. The simulated and observed Standardized Groundwater level Index (SGI) were also compared using the NSE (NSESGI), which focuses on groundwater extremes including droughts.
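As an illustration of the metrics named above, the following minimal sketch implements the standard Nash-Sutcliffe Efficiency, its log-transformed variant, and a modified Kling-Gupta Efficiency. It assumes that KGE2 denotes the modified form of Kling et al. (2012), built from the correlation coefficient, the bias ratio and the ratio of coefficients of variation; the function names are illustrative and this is not the code used to produce the eFLaG evaluation.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _std(x):
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - sum((Q_i - q_i)^2) / sum((Q_i - mean(Q))^2)."""
    mean_obs = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def log_nse(obs, sim):
    """NSE on log-transformed values (emphasises low flows); needs positive data."""
    return nse([math.log(o) for o in obs], [math.log(s) for s in sim])


def kge2(obs, sim):
    """Modified Kling-Gupta Efficiency (assumed form, after Kling et al., 2012)."""
    mo, ms = _mean(obs), _mean(sim)
    so, ss = _std(obs), _std(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (so * ss)                 # correlation coefficient
    beta = ms / mo                      # bias ratio
    gamma = (ss / ms) / (so / mo)       # variability ratio (ratio of CVs)
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)


observed = [1.0, 2.0, 3.0, 4.0]
perfect = nse(observed, observed)            # a perfect simulation scores 1
mean_only = nse(observed, [2.5] * 4)         # simulating the mean scores 0
```

The same `nse` function applies unchanged to groundwater levels or to SGI series, since only the input variable differs.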
[Table 3: evaluation metrics. The equations were not recovered; the surviving entries are listed here.]
Metrics and focus: Nash-Sutcliffe Efficiency (high flows / generalised groundwater levels); Nash-Sutcliffe Efficiency log (low flows).
$Q_i$ and $q_i$ are the observed and modelled flow for day $i$ of an $n$-day record. $\bar{Q}$ is the mean observed flow.
$H_i$ and $h_i$ are the observed and modelled groundwater level for day $i$ of an $n$-day record. $\bar{H}$ is the mean observed groundwater level.
$SGI_i$ and $sgi_i$ are the observed and modelled SGI for day $i$ of an $n$-day record. $\overline{SGI}$ is the mean observed SGI.
where is the correlation coefficient, is the bias ratio
Here $Q_{j,i}$ is the observed flow for day $i$ of hydrological year $j$ for a record of $n$ years (low flows).

Recently, GR6J has increasingly been applied in UK water resources applications (e.g. Anglian Water Drought Plan, 2021). For eFLaG, it was decided, therefore, that using both GR4J and GR6J would be beneficial. Both GR4J and GR6J were calibrated using the inbuilt automatic calibration function, with the modified Kling

weight evenly across the flow regime. The airGR snowmelt module "CemaNeige" was not applied, as a simple snow module was applied to the climate data to pre-process the precipitation data into rainfall and snowmelt based upon temperature (see Section 3).

Grid-to-Grid
The Grid-to-Grid (G2G) hydrological model is an established area-wide distributed model that has been used to investigate the spatial coherence and variability of floods and droughts at catchment,

Within the model, a soil water store with a distribution of water absorption capacities controls runoff production through a saturation excess process; stored water is also lost to evaporation. In one configuration, all runoff enters a surface store (the fast pathway) while a groundwater store (the slow pathway) is recharged by soil water drainage. In an alternative configuration, the runoff is split between the two stores according to a fixed fraction. Water in the surface and groundwater stores is routed using a non-linear storage equation (powers of 1, 2 and 3 were trialled under eFLaG), or, for the surface store, a cascade of two linear reservoirs, before being combined to produce the modelled flow at the catchment outlet. Water is conserved within the model, whilst a multiplicative factor (equal to 1 if not required) is applied to the input precipitation. Alternatively, a Groundwater Extension (Moore and Bell, 2002) may be invoked to allow modelling of underflow at the catchment outlet, external springs, pumped abstractions, and the incorporation of well level data. Multiple hydrological response zones within a catchment can also be represented (not trialled under eFLaG). PDM may be thought of as a toolkit of model components representing a range of runoff production and flow routing behaviours, and with a choice of time-step.

Under eFLaG, single zone PDM models were invoked with a daily time-step. The model stores were initialised using the mean observed flow over the period of record, and the first two years of model flow discarded to allow for model spin-up. Nineteen different combinations of the above-mentioned toolkit options were systematically trialled for each catchment. Parameter estimation was performed using an automatic calibration procedure that applied a simplex optimisation scheme (Nelder and

For each borehole, the AquiMod parameters and structure were calibrated to achieve the most efficient simulation of available historical groundwater level data using the Nash-Sutcliffe Efficiency (NSE), which provides a reliable assessment of overall process realism and goodness of fit to groundwater

parameters that could be related to catchment information (e.g. relating to known land cover and soil type) were fixed. The remaining parameters were then calibrated, using six different saturated zone model structures including a one-layer model (fixed hydraulic conductivity and specific yield); two- and three-layer models with variable hydraulic conductivity and fixed specific yield; two- and three-layer models with variable hydraulic conductivity and variable specific yield; and a 'cocktail glass' representation of hydraulic conductivity variation with depth (Williams et al., 2006). The optimal structure-parameter combination was obtained for each borehole using the Shuffled Complex Evolution global optimisation algorithm.

The calibrated models were then evaluated for their ability to capture groundwater level extremes using

evaporation and soil moisture deficit from rainfall and calculates potential recharge as a fraction of the excess water using a runoff coefficient value. The model was driven by daily rainfall and potential evaporation data. The model was primarily parameterised using available national scale data including data relating to the soil hydrology (Boorman et al., 1995), vegetation (LCM2000, NERC) and surface topography. The latter of these was used to route surface water runoff.

The runoff coefficient, which defines the proportion of excess soil water that drains overland via surface runoff, is an unknown parameter which must be calibrated. This was done in two stages. Firstly, the calibration problem was simplified by defining zones of equal runoff coefficient. In total 35 zones were used in ZOODRM, which were based on UK hydrogeological and geological maps (DiGMapGB-625, 2008). Then, the runoff coefficient for each zone was manually calibrated by comparing simulated runoff to observed river flows minus baseflow, which was calculated using a well-established baseflow separation method (Gustard et al., 1992). This was done using monthly mean flows given that ZOODRM does not have a sophisticated runoff routing scheme, and it is not expected, therefore, to capture daily variability in runoff. The comparison to monthly flows does, however, provide a useful means to evaluate the seasonal water balance of the model, which serves as the best available proxy for the accuracy of the recharge simulations. In total, 41 gauging stations were used to assess the model performance.

The only hydrological process that needs initialisation in ZOODRM is the soil moisture deficit. As all simulations start in January, which is a wet month with minimal potential evaporation, it is assumed that the initial soil moisture deficit is equal to zero. Even so, a warm-up period of one year is used to initialise the model.
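The role of the runoff coefficient can be illustrated with a deliberately simplified daily soil moisture deficit balance. This is not the ZOODRM algorithm: it assumes evaporation always proceeds at the potential rate, that the deficit is unbounded, and that all excess water is split between runoff and recharge by a single coefficient. The function name and the `warmup_days` argument are hypothetical.

```python
def recharge_series(rain, pet, runoff_coeff, initial_smd=0.0, warmup_days=0):
    """Split daily excess water (mm) into surface runoff and potential recharge.

    Simplified sketch: rainfall first satisfies potential evaporation, then
    any soil moisture deficit (SMD); what remains is the excess water.
    """
    smd = initial_smd
    runoff, recharge = [], []
    for p, e in zip(rain, pet):
        net = p - e
        if net >= 0.0:
            excess = max(0.0, net - smd)   # water left after refilling the soil
            smd = max(0.0, smd - net)
        else:
            excess = 0.0
            smd += -net                    # evaporation deepens the deficit
        runoff.append(runoff_coeff * excess)
        recharge.append((1.0 - runoff_coeff) * excess)
    # Discard the warm-up period so that results do not depend on initial_smd.
    return runoff[warmup_days:], recharge[warmup_days:]


runoff_mm, recharge_mm = recharge_series(
    rain=[10.0, 0.0, 5.0], pet=[2.0, 3.0, 1.0], runoff_coeff=0.25
)
```

Under these assumptions, calibrating the coefficient for a zone amounts to adjusting `runoff_coeff` until the monthly means of the simulated runoff match observed flow minus baseflow.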

Hydrological model evaluation (Stage 1 evaluation)
This section provides a brief summary of the outputs of the Stage 1 evaluation. Note that for river flows, model evaluation was undertaken at the same gauged locations and for the same period of time used for model calibration, except G2G which is not specifically calibrated.

River Flows
For G2G, again, good performance was observed overall (medians for NSE/ logNSE/ sqrtNSE/ KGE2 ≥ 0.7). However, the performance was generally lower than for GR or PDM because G2G is not calibrated to individual catchments, and G2G simulates natural flows, whereas the lumped models are calibrated to the observations used for performance assessment. In catchments with a high degree of anthropogenic disturbance, G2G is less able to simulate observed flows, whereas the calibration of the other hydrological models will implicitly account for such artificial impacts, to a degree.

This distinction highlights an important benefit of eFLaG: PDM and GR4J/GR6J are calibrated to present-day flows and hence simulated flows are not natural, as they implicitly include artificial impacts. These runs do not, therefore, allow users to separate natural flows and artificial influences in the baseline period, nor to project how they may change relative to each other in future. On the other hand, although not used here, G2G has the capability of including artificial influences separately (e.g.

Groundwater recharge
ZOODRM demonstrates an ability to efficiently capture monthly mean river flows, as is reflected by the medians for NSE and KGE2, which both exceed 0.75, and the median absolute percent bias, which is 12.7% (Fig. 6). Fig. S6 shows the distributed recharge model results at the 41 gauging stations across the country. The model uses a simplistic overland routing approach, which is implemented to check the water balance on a monthly basis, noting that large scale spatial recharge values are most commonly used to drive groundwater flow models using monthly stress periods.

Frome (53006; Fig. S8), and Lud (29003; Fig. S7). For certain catchments such as the Stringside (33029; Fig. 7) and Lud (29003; Fig. S7), although there appears to be greater RCM uncertainty in river flows than for other catchments, the differences tend to be exaggerated in smaller, drier catchments with lower flows across the flow regime. The logarithmic y-axis is also a contributing factor to this, and also accounts for the seemingly larger RCM uncertainty in low flows than high flows across all catchments. These findings are also consistent across the four hydrological models, with no systematic differences identified for a given hydrological

in which the lowest river flows derived from the RCM ensemble are much lower than those in the model observations (e.g. 23004 South Tyne (Fig. S7) and 67018 Welsh Dee (Fig. S8) for GR6J, 33029 Stringside (Fig. 7) for G2G).
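The comparison between RCM-driven and observation-driven runs discussed above rests on duration curves. The following minimal sketch builds empirical duration curves for each ensemble member and flags the percentiles at which a simobs curve falls outside the simrcm range. The function names and the simple rank-based percentile estimator are assumptions, not the procedure used to produce the eFLaG figures.

```python
def duration_curve(values, percentiles):
    """Value equalled or exceeded for roughly p percent of the time.

    percentiles must be integers in 0-100; a simple rank-based estimate is used.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [ordered[min(n - 1, (p * n) // 100)] for p in percentiles]


def outside_envelope(simobs_curve, ensemble_curves):
    """True at each percentile where simobs lies outside the ensemble range."""
    flags = []
    for k, value in enumerate(simobs_curve):
        lower = min(curve[k] for curve in ensemble_curves)
        upper = max(curve[k] for curve in ensemble_curves)
        flags.append(not (lower <= value <= upper))
    return flags


percentiles = [5, 50, 95]
# Two synthetic "ensemble members" and two synthetic "simobs" series.
member_a = [float(v) for v in range(1, 101)]
member_b = [2.0 * v for v in member_a]
ensemble = [duration_curve(m, percentiles) for m in (member_a, member_b)]

inside = outside_envelope(duration_curve([1.5 * v for v in member_a], percentiles), ensemble)
biased = outside_envelope(duration_curve([0.5 * v for v in member_a], percentiles), ensemble)
```

The same functions apply to groundwater levels, in which case the curves are level duration curves rather than flow duration curves.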

Groundwater level duration curves
Overall, an analysis of the groundwater level duration curves (GLDCs) at all boreholes (Figs. S10-S15) shows close correspondence between the simrcm and simobs runs whereby the simobs GLDC typically lies within the range of the simrcm GLDCs. However, there are some different behaviours across the boreholes which are summarised in Fig. 8. Fig. 8a

including the Heathlanes borehole situated in the Permo-Triassic Sandstone (Fig. 8b). These appear to be associated with boreholes which are known to respond relatively slowly to climate due to local hydrogeological conditions. For example, Heathlanes is known to be representative of a relatively low hydraulic diffusivity aquifer. For some boreholes there are areas of the GLDCs where the simobs GLDC does not lie within the range of the simrcm GLDC. In the most extreme cases, systematic biases across almost the entire GLDC can be seen (e.g. Fig. 8c).

The eFLaG dataset is presented as a nationally consistent dataset of future river flow, groundwater and groundwater recharge, using the latest available climate projections, from UKCP18. In this article, we have described the dataset and its evaluation against observational hydrological datasets, to give some confidence in the use of eFLaG as a dataset that can be used to assess the potential impacts of climate change on UK hydrology for a very wide range of applications.

The eFLaG dataset was developed specifically as a demonstration climate service for use by the water industry for water resources and drought planning, and hence by design is focused on future projections of drought, low river flows and low groundwater levels. We therefore present eFLaG primarily as a dataset for this purpose. Work is underway to demonstrate the utility of eFLaG for future drought projections (Parry et al. in prep.) and for future drought/water resources planning in practice (Counsell et al. in prep.). The predecessor product, FFGWL, has been widely used within the water industry to provide insight into the future evolution of river flows and groundwater levels through the 21st century to support water resources management plans, and also supported significant academic

groundwater behaviours across much of the hydrological range suggests that this product could also find application in a whole range of impact studies, subject to additional evaluation for the purposes in mind. While not validated specifically for floods, the encouraging evaluation outputs for higher flow percentiles suggest users can analyse high flow metrics and variability (e.g. frequency of flows above a threshold), even if not annual maximum peak flows.

As with FFGWL, there are a number of advantages of using eFLaG for future projections: it is a spatially coherent dataset, meaning that future changes in hydrological variables can be compared between catchments, boreholes and aquifers at the regional-to-national scale. This is a key benefit for both research and practical water resources planning. Spatially coherent projections are needed to address the spatio-temporal dynamics of droughts (e.g. Tanguy et al. 2021), how these may change in future, and what this may mean for water resources planning, where, in practice, water resources management plans often involve transfers between regions (e.g. Murgatroyd et al. 2021). Another key benefit of eFLaG is that transient time series (daily data from 1980 to 2080) allow users to explore the future evolution of river flow and groundwater variability on interannual and decadal timescales, rather than just using 'Change Factor' approaches that compare between future time slices and the baseline.

The use of an ensemble of outputs enables users to consider uncertainty in driving data (via the 12-member RCM ensemble) as well as, for river flows, hydrological model uncertainty. In addition, different models provide different benefits: G2G performs less well against observations than the (calibrated) lumped catchment models, but does enable the characterisation of natural flows, which is vital for some uses (and against which artificial influences can be modelled separately in future).

should consult all the provided evaluation metrics when considering which catchments to use (and which models to use) in their analyses.

Users must also be aware that while there is some consideration of uncertainty through the adoption of the RCM PPE, and the use of multiple models for river flows, there are many other sources of uncertainty not sampled in eFLaG. While the PPE gives a range of 12 outcomes, it is only one UKCP18 product and one emissions scenario, so does not sample the full range of outcomes in UKCP18. Furthermore, only one bias correction approach is used. Although we use a range of hydrological models, clearly other hydrological models could provide different outcomes than the set used here, and we have also not considered other sources of uncertainty in the hydrological modelling (e.g.

Finally, eFLaG only provides projections for a subset of the UK gauging station network (200 catchments from some 1200 on the NRFA, for example). This is an inevitable constraint, as with the original FFGWL product (300 locations). While we have tried to sample UK hydrology to give users as much scope as possible, there will still be a need to transpose projections to sites of interest for some users. One of the benefits of eFLaG is that gridded river flow and recharge models are used. While these gridded datasets are not made available here, future initiatives will be looking to exploit them for providing projections at ungauged locations.

9. Data Availability
The eFLaG dataset is associated with a Digital Object Identifier. This must be referenced fully for every use of the eFLaG data as: https://doi.org/10.5285/1bb90673-ad37-4679-90b9-0126109639a9

All eFLaG files are available through the UKCEH Environmental Informatics Data Centre: https://catalogue.ceh.ac.uk/documents/1bb90673-ad37-4679-90b9-0126109639a9

The data are stored as .csv files in the folder structure shown in the Guidance note available at Hannaford et al. (2022). In total there are 3304 files: one for each variable, model and catchment/borehole combination. They can be broadly split into two groups of files (Table 4), simobs and simrcm, as follows.

Conditions of Use
The eFLaG dataset is available under a licensing condition agreement. For non-commercial use, the products are available free of charge. For commercial use, the data may be made available subject to a fee to be agreed with the UKCEH and NERC BGS licensing teams, the owners of the IPR of the datasets and products.