CAMELS-GB: Hydrometeorological time series and landscape attributes for 671 catchments in Great Britain


Section 6.3: Provide a reference to the method used to calculate streamflow elasticity.
We added a reference to the exact equation in Table 2.
"streamflow precipitation elasticity (sensitivity of streamflow to changes in precipitation at the annual timescale, using the mean daily discharge as reference). See equation 7 in , with the last element being $\bar{P}/\bar{Q}$ not $\bar{Q}/\bar{P}$"
Section 6.3: Good that two base flow indices are provided. I personally liked using the BFIHOST, but the latter is not directly derived from the streamflow record and therefore might not fit in this sub dataset. Might it fit somewhere else? Or was there another reason that it was excluded?
We also very much like using BFIHOST. However, as you rightly point out, it doesn't fit in the hydrological attributes (as it is derived primarily from soils data rather than streamflow data) and we decided not to include it elsewhere as the source data for BFIHOST are not open access.
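For orientation, the annual streamflow-precipitation elasticity described in the Table 2 entry above is commonly written as a nonparametric median estimator. The form below is a sketch under that assumption; the symbols $Q_t$ and $P_t$ (annual streamflow and precipitation) and $\bar{Q}$ and $\bar{P}$ (their long-term means) are notation introduced here for illustration and are not taken from the manuscript.

```latex
e_P = \operatorname{median}_t \left( \frac{Q_t - \bar{Q}}{P_t - \bar{P}} \cdot \frac{\bar{P}}{\bar{Q}} \right)
```

The final factor is the ratio of mean precipitation to mean streamflow, which makes the elasticity dimensionless.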
Section 6.3: I think zero_q_freq is the fraction and not the percentage of time with zero flow. Please check. This might also explain why you have some NAs in the slope of the FDC. Please check as well.
You are correct: this is the fraction, not the percentage. We have changed the text in Table 2 to reflect this.

"fraction of days with Q = 0"
This is also the reason for some NAs in the slope of the FDC (as you suggested) and we updated Table 2 to make the user aware.
"slope of the flow duration curve (between the log-transformed 33rd and 66th streamflow percentiles). There can be NAs in this metric when over a third of the flow time series are zeros (see zero_q_freq)"
Section 6.3: Any reasoning / reference on why you chose 9 times median flow or 0.2 times mean flow as thresholds for high flow / low flow events?
We followed the definitions adopted by the previous CAMELS datasets. These thresholds were originally suggested by Clausen and Biggs (2000) and Westerberg and McMillan (2015). We added these two references to Table 2. UKBN catchments were labelled as suitable for low, medium and high flows. These data are available as open access here (https://nrfa.ceh.ac.uk/benchmark-network) so can be easily included as part of any analyses in conjunction with CAMELS-GB. As discussed above, we will be waiting for a significant update to CAMELS-GB before including additional attributes.
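The signature definitions discussed in the Section 6.3 responses above (zero-flow fraction, flow duration curve slope, and the 9 x median and 0.2 x mean thresholds) can be illustrated with a minimal Python sketch. It is not the code used to build CAMELS-GB: the function names are hypothetical, the percentile convention (non-exceedance quantiles from `numpy.quantile`) is an assumption, and the high- and low-flow results are returned as fractions of days rather than in whatever units the dataset uses.

```python
import numpy as np


def _clean(q):
    """Return the flow series as a float array with missing values dropped."""
    q = np.asarray(q, dtype=float)
    return q[~np.isnan(q)]


def zero_q_freq(q):
    """Fraction (not percentage) of days with Q = 0."""
    q = _clean(q)
    return float(np.mean(q == 0))


def slope_fdc(q):
    """Slope between the log-transformed 33rd and 66th flow percentiles.

    Returns NaN when the 33rd percentile is zero, i.e. when roughly a third
    or more of the record is zero flow and the log transform is undefined.
    """
    q = _clean(q)
    q33, q66 = np.quantile(q, [0.33, 0.66])
    if q33 <= 0:
        return float("nan")
    return float((np.log(q66) - np.log(q33)) / (0.66 - 0.33))


def high_low_flow_fraction(q):
    """Fraction of days above 9 x median flow and below 0.2 x mean flow."""
    q = _clean(q)
    high = np.mean(q > 9 * np.median(q))
    low = np.mean(q < 0.2 * np.mean(q))
    return float(high), float(low)
```

With a record in which 40% of days have zero flow, `zero_q_freq` returns 0.4 and `slope_fdc` returns NaN, which is the behaviour the revised Table 2 text warns users about.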
Section 6.8: I completely understand the uncertainties with regard to the human interventions, which are nicely outlined in the limitations. However, what should be commented on is that some of the benchmark catchments seem to be relatively heavily impacted by a human intervention of some sort, e.g., the occurrence of significant amounts of abstractions or the presence of several reservoirs. As a user of the dataset, does this mean that I should interpret some of the benchmark catchments with care? Or that I should be extra careful when interpreting the abstraction and reservoir information?
Both the UKBN classifications and the abstractions information should be treated with care, as there are limitations in both.
As noted by Harrigan et al. (2018), the UKBN sought to be a 'best available' classification of human disturbances based on available information and expert judgment input from the gauging authorities. However, inevitably it is not perfect, and some compromises had to be made (i.e. some impacts tolerated), especially in the heavily impacted south and east of the UK, to ensure coverage in these regions (especially because good hydrometric data quality was also a key criterion in the UKBN selections, so the pool of potential stations was smaller than in CAMELS-GB). In such otherwise sparsely covered areas, abstractions, discharges etc. were sometimes tolerated on a case-by-case basis if (i) there was a pressing need for a catchment to fill a gap (either geographically or in terms of representativeness) and available information suggested that they (ii) had a modest overall influence on flows or (iii) were known to be stable over time.
Similarly, the Artificial Influences dataset generated for CAMELS-GB is also not without limitations as noted in the paper (6.8.2 and 6.8.3). The dataset provides gross totals that point to the possible influence of abstractions (or reservoirs etc) but does not actually quantify the net influence of these impacts on the actual flow regime (unlike other artificial influence schemes, as discussed). It would be possible to have high potential influences in a catchment without them manifesting themselves as a major detectable influence on the streamflow regime. Moreover, all UK artificial influence datasets are subject to quality limitations, as outlined in these sections.
We have clarified this in Section 6.8.1 and 6.8.2.
Section 6.8: It might be useful to additionally add the Factors affecting runoff codes, which are presented in the UK hydrometric register, as indicative information on the type of human influence that might have been present at some point in time in the catchment? Factors affecting runoff codes might also be highly uncertain, but together with the already presented data on abstractions and reservoirs, they might provide some additional clues on possible human influences.
The FAR classification was originally intended as a way of highlighting the presence of impacts that would affect the water balance in a catchment. While it can provide a crude guide to the presence of impacts, it does not give information on the extent of these impacts, nor does its absence indicate a lack of impacts. It was also created a long time ago and has not been routinely (nor systematically) updated. While it is another source of information, it is not one we would want to include in CAMELS-GB given the focus on the new impact datasets which are quantitative. However, we should have signposted this dataset in the manuscript and have now done so at the beginning of Section 6.8.
"We focused on providing quantitative data of human impacts in CAMELS-GB, however it is important to note that additional datasets are available that qualitatively characterise human impacts in GB including the Factors Affecting Runoff (FAR) codes available from the National River Flow Archive."
Line 514-518: State that this is annual average precipitation. For me, it would be enough to just mention mm / year (and delete mm / day).
We have added that it is an annual average but want to keep both figures as the numbers provided in the dataset are mm/day.
"The wettest areas of the UK are in mountainous regions with a maximum of 9.6 mm day⁻¹ (annual average of 3500 mm year⁻¹) in the north-west."
Line 524: Add space between number and unit (mm)

Changed.
Figure 1a: As you already have a second y-axis on the right, you might also consider adding a second x-axis on the top that indicates the accumulated amount of years with missing data.
Thanks for the suggestion but we think this will make the plot too busy.

Figure 2e: The distribution of total reservoir volume might be more informative (although prob. not very different from the number of reservoirs).
Thanks, we have altered Figure 2e to display total reservoir capacity in each region and have altered the text in section 6.9.
"Large reservoir capacity is concentrated in the more mountainous northern and western regions of GB, particularly in Western Scotland (Figure 2e)."
For interest, the figure below shows the difference between total reservoir capacity per region and number of reservoirs. While the pattern is broadly the same (i.e. the northern and western regions have the largest number of reservoirs and total reservoir capacity), Western Scotland interestingly has fewer but larger reservoirs compared to the North-East.

Supplement (S2): Nice that all these maps are added. It would be helpful if they had titles, so you do not always need to read the caption.
Agreed. We will make this change in the revised manuscript.
Dataset: Clear and easy to process (in my case with R, but I am sure that this will be the same for other software). It would be nice to have one .zip file in the parent directory, which allows you to directly download all the time series data at once.
Agreed. The method of download has been changed so you can simply download a .zip file.

Reviewer 2
This study presents the first large-scale comprehensive hydrometeorological dataset for Great Britain. The authors synthesize a range of different data types (time and space support) from allied science fields into a single, ready-to-use database in the well-known CAMELS format. The sources and structure of the data are well described; data aggregation procedures within the selected watersheds are specified. Comments on the possible limitations (quality and uncertainties) for some variables are given. The format of the database is simple and self-describing. The manuscript is well structured and easy to read.
We thank Reviewer 2 for their positive assessment of the paper and their helpful comments. Please see our detailed responses below.
However, some critical comments must be noted: 1) In comparison with CUAHSI ODM, for instance, the CAMELS metadata schema is poorly developed. The database schema does not provide interoperation with data sources and feedback from the community. There is no version control for observations, derived values and data series.
We have used the principal data centre for UK Freshwater research data (the Environmental Information Data Centre) to host the data. The EIDC provides DOIs but does not currently provide mechanisms for versioning or for community feedback. The data within CAMELS-GB are from primary data sources which are themselves versioned at the dataset level but do not provide information on versions or changes at the observation level.
It is important to note that CAMELS-GB (and the other CAMELS datasets) are the result of grassroots efforts led by individual hydrologists. Other initiatives supported by larger institutions and sustained funding rely on a more developed data management scheme, which we recognise the value of and find inspiring for future development stages. However, we would like to stress that no budget was available to produce and release CAMELS-GB. We anticipate that after this first phase, focussed on the release of national CAMELS datasets such as CAMELS-GB, we will be able to focus our efforts on increasing the consistency among the CAMELS datasets, as well as their interoperability and their data standards (see Addor et al., 2019). We have added a sentence to the conclusions to make this clear.
"We are also striving to increase the consistency among the CAMELS datasets (in terms of time series, catchment attributes, naming conventions and data format, see Addor et al., 2019), and to create a dataset that is globally consistent. We anticipate that this will happen as part of a second phase, which will build upon the current first phase that is focussed on the release of national products, such as CAMELS-GB."
2) Addition of a drainage network layer would facilitate navigation through the data and trace the hierarchy of nesting watersheds.
We agree that this would be a useful dataset to include. Currently, however, there are no open access, high-resolution river networks for Great Britain to share as part of the CAMELS-GB dataset so this is not possible.
3) Please give an explicit comment on data spatial aggregation: have nested watershed areas been included or not?

Abstract
We present the first large-sample catchment hydrology dataset for Great Britain, CAMELS-GB (Catchment Attributes and MEteorology for Large-sample Studies). CAMELS-GB collates river flows, catchment attributes and catchment boundaries from the UK National River Flow Archive together with a suite of new meteorological timeseries and catchment attributes. These data are provided for 671 catchments that cover a wide range of climatic, hydrological, landscape and human management characteristics across Great Britain. Daily timeseries covering 1970-2015 (a period including several hydrological extreme episodes) are provided for a range of hydro-meteorological variables including rainfall, potential evapotranspiration, temperature, radiation, humidity, and river flow. A comprehensive set of catchment attributes is quantified including topography, climate, hydrology, land cover, soils, and hydrogeology. Importantly, we also derive human management attributes (including attributes summarising abstractions, returns and reservoir capacity in each catchment), as well as attributes describing the quality of the flow data including the first set of discharge uncertainty estimates (provided at multiple flow quantiles) for Great Britain. CAMELS-GB (Coxon et al., 2020; available at https://doi.org/10.5285/8344e4f3-d2ea-44f5-8afa-86d2987543a9) is intended for the community as a publicly available, easily accessible dataset to use in a wide range of environmental and modelling analyses.

Introduction
Data underpin our knowledge of the hydrological system. They advance our understanding of water dynamics over a wide range of spatial and temporal scales and are the foundation for water resource planning and regulation. With the emergence of new digital technologies and increased monitoring of the earth system via satellites and sensors, we now have greater access to data than ever before. This proliferation of data has been reflected in recent projects where there has been a focus on sharing data and collaborative research (SWITCH-ON; Ceola et al., 2015), collecting new datasets through the creation of terrestrial environmental observatories (TERENO; Zacharias et al., 2011) or the Critical Zone Observatories (CZO; Brantley et al., 2017), and cloud-based resources for modelling and visualising large datasets such as the Environmental Virtual Observatory (EVO; Emmett et al., 2014) and the CUAHSI HydroDesktop (Ames et al., 2012).
To synthesize hydrologically relevant data and learn from differences between catchments, several large-sample hydrological datasets have been produced over the last decades. These datasets rely on complementary data sources to provide the community with hydrometeorological time series and landscape attributes enabling the characterisation of dozens to thousands of catchments (see Addor et al., 2019 for a review). Many studies have demonstrated the importance of large-sample catchment datasets for understanding regional variability in model performance (Coxon et al., 2019; Kollat et al., 2012; Lane et al., 2019; Newman et al., 2015; Perrin et al., 2003), testing model behaviour and robustness under changing climate conditions (Coron et al., 2012; Fowler et al., 2016; Werkhoven et al., 2008), understanding variability in catchment behaviour including hydrologic signatures and classification (Sawicz et al., 2011; Yadav et al., 2007), assessing trends in hydro-climatic extremes (Berghuijs et al., 2017; Blöschl et al., 2017; Gudmundsson et al., 2019; Hannaford and Buys, 2012; Stahl et al., 2010), exploring model and data uncertainty (Coxon et al., 2014; Westerberg et al., 2016) and regionalising model structures and parameters (Lee et al., 2005; Merz and Blöschl, 2004; Mizukami et al., 2017; Parajka et al., 2005; Pool et al., 2019; Singh et al., 2014).
However, while the number of studies involving data from large samples of catchments is rapidly increasing, publicly available large-sample catchment datasets are still rare. Researchers spend considerable time and effort compiling large-sample catchment datasets, yet these datasets are rarely made available to the community due to data licensing restrictions, strict access policies or because of the time required to make these datasets readily usable (Addor et al., 2019; Hannah et al., 2011; Hutton et al., 2016; Nelson, 2009; Viglione et al., 2010). Notable exceptions of open-source, large-sample catchment datasets include the MOPEX dataset that includes hydro-meteorological timeseries and catchment attributes for 438 US catchments (Duan et al., 2006), the CAMELS dataset that covers 671 US catchments (Catchment Attributes and MEteorology for Large-Sample studies, Addor et al., 2017; Newman et al., 2015), the CAMELS-CL dataset that contains data for 516 catchments across Chile (Alvarez-Garreton et al., 2018) and the Canadian model parameter experiment (CANOPEX) database (Arsenault et al., 2016). Because daily streamflow records often cannot be redistributed, researchers have computed streamflow indices (hydrological signatures) and made them publicly available together with catchment attributes. This is the approach taken for the Global Streamflow Indices and Metadata Archive (Do et al., 2018; Gudmundsson et al., 2018), which includes >35,000 catchments globally, and the dataset produced by Kuentz et al. (2017) which includes data for >30,000 catchments across Europe. Overall, datasets for large samples of catchments are vital to advance knowledge on hydrological processes (Falkenmark and Chapman, 1989; Gupta et al., 2014; McDonnell et al., 2007; Wagener et al., 2010), to underpin common frameworks for model evaluation across complex domains (Ceola et al., 2015) and ensure hydrological research is reusable and reproducible through the use of common datasets and code (Buytaert et al., 2008; Hutton et al., 2016).
In Great Britain, there is a wide availability of gridded, open source datasets and free access to quality-controlled river flow data via the UK National River Flow Archive (NRFA). While this is a large resource of open data by international standards, these datasets have not yet been combined and processed over a consistent set of catchments and made publicly available in a single location. Further, these are dynamic datasets subject to change which cannot support consistent repeatable analysis. Finally, the range of variables and catchment attributes is more limited than other large-sample datasets such as CAMELS.
To address this data gap, we produced the CAMELS-GB dataset (Coxon et al., 2020). CAMELS-GB collates river flows, catchment attributes and catchment boundaries from the NRFA together with a suite of new meteorological timeseries and catchment attributes for 671 catchments across Great Britain. In the following sections we describe the key objectives behind CAMELS-GB and how they have shaped the content of the dataset. We also provide a comprehensive description of all data contained within CAMELS-GB including 1) its source data, 2) how the timeseries and attributes were produced and 3) a discussion of the associated limitations.

Objectives
CAMELS (Catchment Attributes and MEteorology for Large-sample Studies) began as an initiative to provide hydro-meteorological timeseries (Newman et al., 2015) and catchment attributes covering climatic indices, hydrologic signatures, land cover, soil and geology (Addor et al., 2017) for the contiguous United States. Since then, the dataset has been used widely in other studies (e.g. Gnann et al., 2019; Pool et al., 2019; Tyralis et al., 2019) and has provided the framework for the production of similar datasets. CAMELS for Chile (CAMELS-CL, Alvarez-Garreton et al., 2018) was released and CAMELS datasets for other countries are in production (Brazil and Australia). While each CAMELS dataset has unique features (for example CAMELS-CL provides snow water equivalent estimates and CAMELS-GB characterises uncertainties in streamflow timeseries), all the CAMELS datasets consistently apply the same core objective: to make hydrometeorological time series and landscape attributes for a large sample of catchments publicly available. They strive to use the same open-source code, variable names and datasets in order to increase the comparability and reproducibility of hydrological studies. In creating the CAMELS-GB dataset, we wanted to build on the successful CAMELS blueprint to provide a large-sample catchment dataset for Great Britain based on four core objectives.
Firstly, we wanted to build on the wealth of data already available for GB catchments but synthesize the diverse range of data into a single, consistent, up-to-date dataset. The UK has a rich history of leading research in catchment hydrology and integrating large samples of data for many catchments. For example, the Flood Studies Report (NERC, 1975) extracted high rainfall events, peak flows and catchment characteristics for 138 catchments to support flood estimation using catchment characteristics. The UK NRFA contains a wealth of data (including flow timeseries, catchment attributes, catchment masks) for the UK gauging station network which contains approximately 1,500 gauging stations as summarised in the UK Hydrometric Register (Marsh and Hannaford, 2008). Where possible, we have made use of the existing data available on the NRFA in CAMELS-GB to ensure consistency and to avoid duplicating efforts. We also build on these existing datasets by providing new catchment attributes and timeseries that are currently not available on the NRFA (e.g. potential evapotranspiration, temperature, soils, and human impacts).
Secondly, we wanted to provide a large-sample catchment dataset for Great Britain based on information that i) are sufficiently detailed to enable the exploration of hydrological processes at the catchment scale, ii) are well documented (ideally in open-access peer-reviewed journals), iii) rely on state-of-the-art methods and iv) include recent observations. Consequently, some catchment attributes currently available on the NRFA have been re-calculated for CAMELS-GB as better quality or higher spatial resolution datasets are now available (e.g. to derive land cover and hydrogeological attributes). This also means that we have primarily used the best available national datasets for the derivation of the catchment timeseries and attributes. These timeseries and attributes can be compared at a later stage to estimates to be derived from global datasets.
Thirdly, we wanted to provide qualitative and quantitative estimates of the limitations/uncertainties of the data provided in CAMELS-GB. Characterising data uncertainties is crucial as different data collection techniques or quality standards can bias comparisons between catchments. By providing quantitative estimates of uncertainty (including the first set of national discharge uncertainty estimates), we hope to raise awareness and encourage users of the dataset to consider these uncertainties in their analyses.
Finally, where possible, we have ensured that the underlying datasets (such as gridded geophysical and meteorological data) are publicly available to allow reproducibility and reusability.

Catchments
The catchments included in the CAMELS-GB dataset were selected from the UK NRFA Service Level Agreement (SLA) Network. Approximately half of the NRFA gauging stations are designated as SLA stations in collaboration with measuring authorities (as described in Dixon et al., 2013; Hannaford, 2004), embracing catchments which are considered to contribute most to the overall strategic utility of the gauging network. Selection criteria include hydrometric performance, representativeness of the catchment, length of record and degree of artificial disturbance to the natural flow regime. The flow records for these SLA stations are subject to an additional level of validation on the NRFA and are also used to calculate performance metrics that quantify completeness and quality (see the methods and metrics outlined in Dixon et al., 2013 and Dixon, 2014). This process focuses on the credibility of flows in the extreme ranges and the need to maintain sensibly complete time series, thus providing good quality and long time series for CAMELS-GB. All gauges from the UK SLA network are included in CAMELS-GB except catchments from Northern Ireland (due to a lack of consistent meteorological datasets across the UK) and two gauges where no suitable surface area catchment could be derived (e.g. a groundwater spring for which surface catchment area is not hydrologically relevant). This results in a total of 671 catchments (includes nested catchments, see Supplement Fig S1) covering a wide range of climatic and hydrologic diversity across GB that is representative of the wider gauging network (see Supplement Fig S2 and S3 for a comparison of key attributes for the CAMELS-GB catchments and all GB gauged catchments).
In keeping with the CAMELS-CL dataset (Alvarez-Garreton et al., 2018), we chose to include both non-impacted and human impacted catchments in the dataset complemented with catchment attributes on the size and type of human impacts these catchments experience. Human impacted catchments are provided to support the current IAHS Panta Rhei decade which is focused on how the water cycle is impacted by human activities (McMillan et al., 2016; Montanari et al., 2013) and also enable national scale hydrological modelling and analyses across catchments that are impacted by reservoirs, abstractions and land use change.

Catchment Masks
Catchment masks are provided in the dataset to allow other users to create their own catchment hydro-meteorological timeseries and attributes from gridded datasets not used in this study. The catchment masks were derived from the UK Centre for Ecology & Hydrology (CEH)'s 50m Integrated Hydrological Digital Terrain Model (IHDTM; Morris and Flavin, 1990) and a set of 50m flow direction grids. The flow direction grids are based on a Digital Elevation Model and contours from the UK Ordnance Survey Land-Form Panorama dataset (now withdrawn and superseded by OS Terrain 50) and hydrologically corrected by "burning in" rivers using CEH's 1:50K digital river network (Moore et al., 2000). The catchment boundaries were created using bespoke code for identifying all IHDTM cells upstream of the most appropriate grid cell to represent the gauging station location and generating a meaningful "real-world" boundary around these cells. In a few cases, where the topographical data makes automated definition difficult, catchment masks were manually derived. Catchment masks are provided as shapefiles in the OSGB 1936 co-ordinate system (British National Grid).
ASCII files were generated from the shapefiles by converting the shapefile onto a 50m raster grid and then exporting the rasters to individual ASCII files. These files are used to calculate all catchment averaged time series and attributes in CAMELS-GB. To calculate the catchment average timeseries/attribute for each dataset, the 50m grid cells in each catchment mask were assigned a value from the respective dataset grid cell (determined by which dataset grid cell the lower left-hand corner of the mask grid cell lay within) and an arithmetic mean of these values was calculated (unless specified otherwise). This ensures a weighted average is calculated that accounts for the differences in grid cell sizes between the catchment mask (on a 50m grid) and any other datasets (often on a 1km grid). This is particularly important for smaller catchments in areas of highly variable data.
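The averaging procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to produce CAMELS-GB: the function name `catchment_average`, its parameters, and the array layout (row 0 is the southernmost row, origins are the lower left-hand corners of each grid, both grids share one projected coordinate system) are all hypothetical.

```python
import numpy as np


def catchment_average(mask, mask_x0, mask_y0, data, data_x0, data_y0,
                      mask_res=50.0, data_res=1000.0):
    """Catchment-average value of a coarse grid using a fine catchment mask.

    mask    : 2D boolean array on the fine grid, True inside the catchment
    data    : 2D array on the coarse grid (for example one day of rainfall)
    *_x0/y0 : coordinates of the lower left-hand corner of each grid
    *_res   : cell sizes in the same units as the coordinates
    """
    rows, cols = np.nonzero(mask)

    # Lower left-hand corner of every mask cell inside the catchment.
    x = mask_x0 + cols * mask_res
    y = mask_y0 + rows * mask_res

    # Index of the coarse cell that contains each of those corners.
    j = np.floor((x - data_x0) / data_res).astype(int)
    i = np.floor((y - data_y0) / data_res).astype(int)

    # Arithmetic mean over mask cells: coarse cells that overlap the
    # catchment more are sampled more often, giving an area weighting.
    return float(np.mean(data[i, j]))
```

For example, a catchment that covers all of one 1km cell and half of its neighbour samples the first cell twice as often as the second, so the result is weighted two thirds to one third.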
It is important for users to note that as the topographical boundaries are used throughout the study to quantify the hydrometeorological timeseries and attributes, this could mean significant errors where the catchment area is poorly defined.

Daily meteorological and hydrological time series data are provided for the 671 CAMELS-GB catchments including flow, rainfall, potential evapotranspiration, temperature, short-wave radiation, long-wave radiation, specific humidity and wind speed (summarised in Table 1). These datasets were chosen for inclusion in CAMELS-GB to cover the common forcing and evaluation data needed for catchment hydrological modelling, to allow users to derive different estimates of potential evapotranspiration and to provide the key hydro-meteorological data for catchment characterisation.
Hydro-meteorological timeseries data for the 671 catchments were obtained from a number of datasets for a 45-year time period from 1st October 1970 to 30th September 2015. These long time series enable the dataset's use in trend analysis, provide a valuable dataset for model forcing and evaluation and ensure the robust calculation of hydro-climatic signatures. These long time series also cover a wide range of nationally important climatic events such as the 1976 drought and 2007 floods (see summaries of UK drought and flood episodes for a more extensive review including Folland et al., 2015; Marsh et al., 2007; Stevens et al., 2016). From previous analyses, it is important to note that there are key known non-stationarities over this period in hydro-meteorological data and human activity (see for example Hannaford and Marsh, 2006) for GB. For example, seasonal changes in precipitation have been well documented (Jenkins et al., 2009) and linked to changes in river flow (Hannaford and Buys, 2012; Harrigan et al., 2018).

Meteorological Timeseries
Meteorological timeseries were derived from high-quality national gridded products chosen for their high spatial resolution (1 km²), long time series availability and because they are based on UK observational networks. For each of the meteorological datasets, daily time series of catchment areal averages were calculated using the catchment masks and methods described in Section 3. These timeseries are available for all CAMELS-GB catchments with no missing data.
Daily rainfall timeseries were derived from the CEH Gridded Estimates of Areal Rainfall dataset (CEH-GEAR) (Keller et al., 2015; Tanguy et al., 2016). This dataset consists of 1 km² gridded estimates of daily rainfall for Great Britain and Northern Ireland from 1st January 1961 to 31st December 2015. The daily rainfall grids are derived using natural neighbour interpolation of a national database of quality-controlled, observed precipitation from the Met Office UK rain gauge network. It should be noted that the rainfall timeseries available in CAMELS-GB use the same underlying data but are not identical to catchment average rainfall series available from the NRFA which are derived using only 1km grid cells with >50% of their area within the catchment boundary.
Daily meteorological timeseries were derived from the Climate Hydrology and Ecology research Support System meteorology dataset (CHESS-met; Robinson et al., 2017a). The CHESS-met dataset consists of daily 1 km² gridded estimates for Great Britain from 1st January 1961 to 31st December 2015 and includes several meteorological variables derived from observational data (see Table 1). CHESS-met was derived from the observation-based MORECS, which is a 40 km resolution gridded dataset derived by interpolating daily station data (Hough and Jones, 1997; Thompson et al., 1981).
The CHESS-met variables are obtained by downscaling MORECS variables to 1 km resolution and adjusting for local topography using lapse rates, modelled wind speeds and empirical relationships. CHESS-met air temperature and wind speed were directly downscaled from MORECS, specific humidity was calculated from MORECS vapour pressure, and downward short-wave radiation was calculated from MORECS sunshine hours, while long-wave radiation was calculated from the downscaled temperature, vapour pressure and sunshine hours (see Robinson et al., 2017b for details).
Daily potential evapotranspiration timeseries were derived from the Climate Hydrology and Ecology research Support System Potential Evapotranspiration dataset (CHESS-PE; Robinson et al., 2016). The CHESS-PE dataset consists of daily 1 km² gridded estimates of potential evapotranspiration for Great Britain from 1st January 1961 to 31st December 2015. Potential evapotranspiration is calculated using the Penman-Monteith equation and the CHESS-met datasets (see Robinson et al., 2017b). In recognition of the uncertainty in PET estimates, we provide two estimates of potential evapotranspiration available from CHESS-PE. The first estimate (PET) is calculated using the Penman-Monteith equation for FAO-defined well-watered grass (Allen et al., 1998) and is used to calculate all subsequent PET catchment attributes provided in CAMELS-GB. This estimate only accounts for transpiration and does not allow for canopy interception. The second estimate (PETI) uses the same meteorological data and the Penman-Monteith equation for well-watered grass, but a correction is added for interception on days where rainfall has occurred (Robinson et al., 2017b). The seasonal differences between these two data products can be seen in Figure S120b (supplementary information). Generally, the PETI estimate with the interception correction is higher because interception is a more effective flux than transpiration under the same meteorological conditions. CHESS PETI can be 5% to 25% higher than CHESS PET at the grid-box level, whereas at a regional level CHESS PETI is 7% higher than PET in England and 11% higher than PET in Scotland overall (Robinson et al., 2017b). In comparison to other PET products commonly used in GB, the CHESS PETI estimate is similar to grass-only MORECS (the United Kingdom Meteorological Office rainfall and evaporation calculation system; Hough and Jones, 1997), which has its own interception correction.
It is important to note that the effect of seasonal land cover is not accounted for in the CHESS-PE products. This means that for arable agriculture, which may have bare soil for part of the year, or deciduous trees, which lose leaves and thus reduce both transpiration and interception, the potential evapotranspiration could be lower during winter than is estimated here. This leads to a varying difference between the PET and PETI of grass and other land cover types throughout the year (Beven, 1979).

Daily streamflow data for the 671 gauges were obtained from the UK NRFA on 27th March 2019 using the NRFA API (https://nrfaapps.ceh.ac.uk/nrfa/nrfa-api.html, last access: 11 December 2019). These data are collected by measuring authorities including the Environment Agency (EA), Natural Resources Wales (NRW) and the Scottish Environment Protection Agency (SEPA) and then quality controlled, on an ongoing annual cycle, before being uploaded to the NRFA site. It is important to note that, on occasion, these flow timeseries are reprocessed (for example when a rating curve is revised), and so there may be differences between the flow timeseries on the NRFA website and those contained in CAMELS-GB. If users wish to extend the timeseries beyond that available in CAMELS-GB, we suggest downloading and using the extended flow timeseries available from the NRFA website and re-calculating the hydrological signatures using the code we have archived. Data are provided in m³ s⁻¹ and mm day⁻¹, with the latter calculated using catchment areas derived from the catchment boundaries described in Section 4.

Catchment Attributes
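As an illustration of the unit conversion mentioned above, the following minimal sketch converts a flow in m³ s⁻¹ to an equivalent depth in mm day⁻¹ over the catchment. The function name and arguments are hypothetical and are not taken from the archived CAMELS-GB code; the only inputs assumed are a flow value and a catchment area in km².

```python
def m3s_to_mm_per_day(flow_m3s, area_km2):
    """Convert a flow rate (m3/s) to a catchment-average depth (mm/day).

    Hypothetical helper for illustration; area_km2 is the catchment area.
    """
    seconds_per_day = 86400.0
    volume_m3_per_day = flow_m3s * seconds_per_day
    area_m2 = area_km2 * 1.0e6          # 1 km2 = 1e6 m2
    depth_m_per_day = volume_m3_per_day / area_m2
    return depth_m_per_day * 1000.0     # metres to millimetres
```

Under these assumptions, a flow of 1 m³ s⁻¹ from an 86.4 km² catchment corresponds to 1 mm day⁻¹.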

Location, Area and Topographic Data
Catchment attributes describing the location and topography were extracted for each catchment from the NRFA (see Table 2). Catchment areas are calculated from the catchment masks described in Section 4. Catchment elevation is extracted from CEH's 50 m Integrated Hydrological Digital Terrain Model (IHDTM), and the minimum, mean and maximum catchment elevation within the catchment mask are provided alongside different percentiles (10th, 50th and 90th). On occasion the minimum elevation may differ slightly from the gauge elevation attribute. Gauge elevations are as reported to the NRFA by the measuring authorities and are derived in a variety of ways with different levels of accuracy. Furthermore, some may refer to the bank top, the gauge minimum, or a local datum. The minimum elevation attribute provides a more consistent metric (though itself limited in accuracy due to the 50 m grid representation). Mean drainage path slope is also provided. This catchment attribute was developed for the Flood Estimation Handbook (Bayliss, 1999) and provides an index of overall catchment steepness by calculating the mean of all inter-nodal slopes from the IHDTM for the catchment. For two catchments (18011 and 26006), where automatic derivation of the catchment boundary from the IHDTM for the gauge location was not possible and catchment masks were manually derived, no appropriate pre-computed values for the mean elevation or mean drainage path slope were available.

Climatic indices were derived using the catchment daily rainfall, potential evapotranspiration and temperature time series described in section 5.1 (see Table 2). The Penman-Monteith formulation without correction for interception is used to calculate all PET catchment attributes provided in CAMELS-GB as it has more consistency with other global and national PET products. To provide consistency with previous CAMELS datasets, we compute the same climatic indices for all catchments in CAMELS-GB. However, it is important to note that in CAMELS-GB climatic indices are calculated for the full meteorological timeseries available in CAMELS-GB (water years from 1st Oct 1970 to 30th Sept 2015), whereas CAMELS and CAMELS-CL both use the water years from 1990 to 2009. The meteorological timeseries and code (https://github.com/naddor/camels, last access: 11 December 2019) are provided for users to calculate indices over different time periods if required.
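For users who wish to recompute indices over a different period, the sketch below shows one way to subset daily series by water year and derive mean precipitation, mean PET and the aridity index (taken here as mean PET divided by mean precipitation). It is not the archived R code; the function names are hypothetical, and labelling a water year by the calendar year in which it ends is an assumption of this example.

```python
from datetime import date, timedelta


def water_year(day):
    """Water year (1 Oct to 30 Sep), labelled by its ending calendar year.

    The labelling convention is an assumption made for this sketch.
    """
    return day.year + 1 if day.month >= 10 else day.year


def climatic_indices(days, precip, pet, first_wy, last_wy):
    """Mean daily precipitation, mean daily PET and aridity over a period."""
    keep = [i for i, d in enumerate(days)
            if first_wy <= water_year(d) <= last_wy]
    if not keep:
        raise ValueError("no data in the requested water years")
    p_mean = sum(precip[i] for i in keep) / len(keep)
    pet_mean = sum(pet[i] for i in keep) / len(keep)
    return {"p_mean": p_mean, "pet_mean": pet_mean,
            "aridity": pet_mean / p_mean}


# Synthetic two-water-year record used only to exercise the functions.
start = date(1970, 10, 1)
days = [start + timedelta(n) for n in range((date(1972, 9, 30) - start).days + 1)]
precip = [2.0 if water_year(d) == 1971 else 4.0 for d in days]
pet = [1.0] * len(days)
```

Calling `climatic_indices(days, precip, pet, 1971, 1971)` restricts the calculation to the first water year of the synthetic record.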

Hydrologic Signatures
Hydrologic signatures were derived using the catchment daily discharge and rainfall time series described in sections 5.1 and 5.2 (see Table 2). To provide consistency with the previous CAMELS datasets, we compute the same hydrologic signatures for all catchments in CAMELS-GB but add an additional formulation of baseflow index developed by the UK Centre for Ecology & Hydrology and commonly used in Great Britain (Gustard et al., 1992; see Appendix A and Figure S10a). Hydrologic signatures are calculated for the flow timeseries available during water years from 1st Oct 1970 to 30th Sept 2015 (previous CAMELS datasets calculated these metrics during water years from 1990 to 2009) using code available on GitHub (https://github.com/naddor/camels, last access: 11 December 2019). We advise users to take the length of the flow timeseries and the percentage of missing data (available in the hydrometry catchment attributes, see section 6.7) into account when comparing hydrologic signatures across catchments.
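To give a sense of how a smoothed-minima baseflow index of the kind described by Gustard et al. (1992) can be computed, a minimal sketch follows. It is not the code used for CAMELS-GB (Appendix A remains the definitive description); it assumes a gap-free daily series, five-day blocks and a turning-point factor of 0.9, and the function name is hypothetical.

```python
def baseflow_index(flow, block=5, factor=0.9):
    """Smoothed-minima baseflow index for a gap-free daily flow list.

    Illustrative only: missing data handling is deliberately omitted.
    """
    n_blocks = len(flow) // block
    minima = []  # (day index, flow) of the minimum in each block
    for b in range(n_blocks):
        segment = flow[b * block:(b + 1) * block]
        offset = min(range(block), key=segment.__getitem__)
        minima.append((b * block + offset, segment[offset]))
    # A block minimum is a turning point when factor * value lies below
    # the minima of both neighbouring blocks.
    turning = [
        minima[i]
        for i in range(1, len(minima) - 1)
        if factor * minima[i][1] < minima[i - 1][1]
        and factor * minima[i][1] < minima[i + 1][1]
    ]
    if len(turning) < 2:
        return float("nan")
    base_volume = 0.0
    total_volume = 0.0
    # Interpolate linearly between turning points, capping at observed flow.
    for (d0, q0), (d1, q1) in zip(turning, turning[1:]):
        for day in range(d0, d1):
            interpolated = q0 + (q1 - q0) * (day - d0) / (d1 - d0)
            base_volume += min(interpolated, flow[day])
            total_volume += flow[day]
    last_day, last_q = turning[-1]
    base_volume += min(last_q, flow[last_day])
    total_volume += flow[last_day]
    return base_volume / total_volume
```

In this sketch the index is evaluated only between the first and last turning points, so very short records return NaN rather than an unreliable value.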

Land Cover Attributes
Land cover attributes for each catchment were derived from the UK Land Cover Map 2015 (LCM2015) produced by CEH (Rowland et al., 2017). While other land cover maps are available from CEH for 1990, 2000 and 2007, attributes are only provided for LCM2015 as different methods have been used to derive each of the land cover maps, preventing straightforward analysis of changes in land cover over time. LCM2015 was chosen as it contains the most up-to-date data and methodology used to derive the land cover. LCM2015 uses a random forest classification of Landsat-8 satellite images based on the Joint Nature Conservation Committee (JNCC) Broad Habitats, encompassing the range of UK habitats.
In this study, the 1 km percentage target class product from LCM2015 is used, consisting of a 1 km raster with 21 bands giving the percentage cover of different target classes that represent Broad Habitats. This is a large number of land cover classes, and so the 21 target classes were mapped to eight land cover classes: deciduous woodland, evergreen woodland, grass and pasture, shrubs, crops, suburban and urban, inland water, and bare soil and rocks (see Appendix B). These are the same as the eight land cover classes used when running the JULES model with the CHESS meteorological driving data, and so provide consistency with other national scale efforts across Great Britain (Best et al., 2011; Blyth et al., 2019; Clark et al., 2011). For each catchment, the percentage of the catchment covered by each of the eight land cover types was calculated and is provided in CAMELS-GB, alongside the most dominant land cover type (see Table 2).
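The aggregation from target classes to land cover classes can be sketched as follows. The mapping shown covers only three target classes and is purely illustrative (the full 21-to-8 mapping is given in Appendix B); the function name and the representation of grid cells as dictionaries of percentages are assumptions of this example.

```python
# Illustrative subset of a target-class to land-cover-class mapping.
# The real mapping for all 21 target classes is listed in Appendix B.
EXAMPLE_MAPPING = {
    "broadleaved woodland": "deciduous woodland",
    "arable and horticulture": "crops",
    "improved grassland": "grass and pasture",
}


def catchment_land_cover(cells, mapping):
    """Average per-cell percentages over a catchment and aggregate classes.

    cells: list of dicts {target class: percentage cover} for each 1 km
    cell inside the catchment mask (equal cell weights assumed here).
    Returns (percentage per aggregated class, dominant class).
    """
    totals = {}
    for cell in cells:
        for target, pct in cell.items():
            cls = mapping[target]
            totals[cls] = totals.get(cls, 0.0) + pct
    percentages = {cls: total / len(cells) for cls, total in totals.items()}
    dominant = max(percentages, key=percentages.get)
    return percentages, dominant


example_cells = [
    {"arable and horticulture": 60.0, "improved grassland": 40.0},
    {"arable and horticulture": 20.0, "improved grassland": 30.0,
     "broadleaved woodland": 50.0},
]
```

For the two example cells, crops cover 40% of the area and are returned as the dominant class.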
Key limitations of this dataset are that the land cover attributes reflect a snapshot of the land cover in time and are subject to uncertainties in the Landsat-8 satellite images and the random forest classification. It is important to note that the land cover attributes provided in CAMELS-GB are different from those provided on the NRFA website, which use LCM2000 and different land use groupings.

Soil Attributes
Soil attributes for each catchment were derived from the European Soil Database Derived Data product (Hiederer, 2013a, 2013b). As this dataset only characterises the top soil layers (up to 1.3 m), we also used the Pelletier et al. (2016) modelled soil depth dataset to give an indication of the depth to unweathered bedrock extending up to 50 m depth. Soil attributes for depth available to roots, percentage sand, silt and clay content, organic carbon content, bulk density and total available water content were calculated from the ESDB. We additionally estimated the saturated hydraulic conductivity and porosity (saturated volumetric water content) using two pedotransfer functions, with the aim of providing one estimate consistent with CAMELS and a best estimate for European soil types. These were (1) the widely applied regressions based on sand and clay fractions first proposed by Cosby et al. (1984) based on soil samples across the United States, and (2) the HYPRES continuous pedotransfer functions using silt and clay fractions, bulk density and organic matter content, developed using a large database of European soils (Wösten et al., 1999, 2001; Wösten, 2000) (see Appendix C for equations).
To estimate average values of all soil properties with depth, we calculated a weighted mean of the topsoil and subsoil data for each 1 km grid cell. Weights were assigned based on the topsoil/subsoil proportion of the overall soil depth for that cell. Catchment average soil properties were calculated by taking the arithmetic mean (or harmonic mean for saturated hydraulic conductivity, as advised in Samaniego et al., 2010) of all 1 km grid cells that fell within the catchment boundaries. To give an indication of the distribution of soil properties across the catchment, the 5th, 50th and 95th percentile values of all grid cell values falling within the catchment boundaries were also calculated for all soil attributes apart from percentage sand, silt and clay. There were some grid cells where no soil data were available. Rather than set default values for these grid cells, we chose to exclude them from the calculations of catchment-average properties and provide the percentage of no-data cells within a catchment as an indication of the data availability of the catchment-average properties.
There are some key limitations associated with these datasets. Firstly, the soils information given on a 1 km grid is only representative of the dominant soil typological class within that area. This means that much of the soil information is not represented in the soil maps, and the variation of soil properties within the 1 km grid is lost. The high spatial heterogeneity of soils data means that correlations between soil property values given in the soil product and ground soil measurements are likely to be low (Hiederer, 2013a, 2013b). Secondly, as can be seen from Figure S120c-d in the supplement, there are large uncertainties relating to the choice of pedotransfer function. Care should be taken when interpreting results for saturated hydraulic conductivity, as the HYPRES equation is relatively inaccurate with a low R² value of 0.19, and application of a single continuous pedotransfer function may give poor results for some soil types (Wösten et al., 2001). Finally, it is important to be aware that measured soils data were unavailable for some urban areas including London, and these areas have been gap-filled (Hiederer, 2013a, 2013b).
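The depth weighting and catchment averaging described in this section can be sketched as below. This is a simplified illustration rather than the processing code: function names are hypothetical, no-data cells are represented by `None`, and all grid cells are given equal weight.

```python
def depth_weighted_mean(top_value, sub_value, top_depth, sub_depth):
    """Weight topsoil and subsoil values by their share of total soil depth."""
    total_depth = top_depth + sub_depth
    return (top_value * top_depth + sub_value * sub_depth) / total_depth


def catchment_average(cell_values, harmonic=False):
    """Catchment mean of 1 km cell values, excluding no-data cells (None).

    Returns (mean, percentage of no-data cells). Set harmonic=True for
    saturated hydraulic conductivity; values must then be positive.
    """
    valid = [v for v in cell_values if v is not None]
    pct_missing = 100.0 * (len(cell_values) - len(valid)) / len(cell_values)
    if not valid:
        return None, pct_missing
    if harmonic:
        mean = len(valid) / sum(1.0 / v for v in valid)
    else:
        mean = sum(valid) / len(valid)
    return mean, pct_missing
```

For example, `catchment_average([1.0, 4.0, None, 4.0], harmonic=True)` excludes the no-data cell, reports 25% missing cells and gives a harmonic mean of 2.0, lower than the arithmetic mean of 3.0 because low-conductivity cells dominate.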

Hydrogeological Attributes
Hydrogeological attributes for each catchment were derived from the UK bedrock hydrogeological map (BGS, 2019) and a new superficial deposits productivity map, both developed by the British Geological Survey. The UK bedrock hydrogeological map is an open source dataset that provides detailed information (at 1:625,000 scale) on the aquifer potential based on an attribution of lithology with seven classes of primary and secondary permeability and productivity (see Appendix D). The superficial deposits productivity map is a new dataset of similarly attributed superficial deposits aquifer potential across Great Britain (at 1:625,000 scale). These two datasets were chosen as they are the only two spatially continuous, consistently attributed hydrogeological maps of the bedrock and superficial deposits at the national scale for GB.
These two datasets were combined by superimposing the superficial deposits layer on top of the bedrock layer to provide catchment attributes for CAMELS-GB that characterise the uppermost geological layer (i.e. superficial deposits where present and bedrock where superficial deposits are absent). Combining the two datasets gave a total of nine hydrogeological productivity classes (see Appendix D). For each catchment, the percentage of the nine hydrogeological classes was calculated and is provided in CAMELS-GB (see Table 2). These nine classes indicate the influence of hydrogeology on river flow behaviour and describe the proportion of the catchment covered by deposits of high, moderate or low productivity and whether this is predominantly via fracture or intergranular flow (see Table 2). Such classifications have previously been used to enable correlations between catchment hydrogeology and measures of baseflow (Bloomfield et al., 2009).
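The superposition of the two layers can be illustrated with a small sketch. It assumes, for simplicity, that both maps have been rasterised onto the same equal-area cells within a catchment, with `None` marking cells where superficial deposits are absent; the class labels and function names are hypothetical and do not reproduce the Appendix D classes.

```python
from collections import Counter


def uppermost_layer(superficial, bedrock):
    """Take the superficial class where present, otherwise the bedrock class."""
    return [s if s is not None else b for s, b in zip(superficial, bedrock)]


def class_percentages(classes):
    """Percentage of cells falling in each hydrogeological class."""
    counts = Counter(classes)
    return {cls: 100.0 * n / len(classes) for cls, n in counts.items()}


# Hypothetical labels for four cells of one catchment.
superficial = ["intergranular_low", None, None, "intergranular_high"]
bedrock = ["fracture_high", "fracture_high", "no_groundwater", "fracture_low"]
combined = uppermost_layer(superficial, bedrock)
```

Here the bedrock class is retained only in the two cells without superficial cover, so each of the four resulting classes occupies 25% of the example catchment.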
Users should be aware that the aquifer productivity dataset is heuristic, based on hydrogeological inferences drawn from mapped lithologies rather than on statistical analysis of borehole yields. It can be used for comparison between catchments at the regional to national scales. It should not be used at the sub-catchment scale, where more refined hydrogeological information would be required to understand groundwater-surface water interactions. The hydrogeological attributes provided in CAMELS-GB will differ from those available on the NRFA website as CAMELS-GB uses the latest geological data.

Several attributes are provided in CAMELS-GB describing the gauging station type (i.e. the type of weir, structure or measurement device used to measure flows) as listed on the NRFA, the period of flow data available, gauging station discharge uncertainty and channel characteristics such as bankfull (see Table 2). The catchment attributes for discharge uncertainty are described in more detail below.

Discharge uncertainty estimates for CAMELS-GB were calculated from a large data set of rating curves and stage-discharge measurements using a generalized framework designed to estimate place-specific discharge uncertainties, outlined in Coxon et al. (2015). This framework estimates discharge uncertainties using a nonparametric locally weighted regression (LOWESS). Subsets of the stage-discharge data contained within a moving window are used to calculate the mean and variance at every stage point, which then define the LOWESS fitted rating curve and discharge uncertainty, respectively. Stage and discharge gauging uncertainties are incorporated into the framework by randomly sampling from estimated measurement error distributions to fit multiple LOWESS curves and then combining the multiple fitted LOWESS curves and variances in a Gaussian Mixture Model. Time-varying discharge uncertainties are accounted for by an automatic procedure where differences in historical rating curves are used to separate the stage-discharge rating data into subsets for which discharge uncertainty is estimated separately. The framework has been shown to provide robust discharge uncertainty estimates for 500 gauging stations across England and Wales (see Coxon et al., 2015 for more details).
For CAMELS-GB we extended the application of the framework to Scottish gauging stations to provide discharge uncertainty estimates across Great Britain. Discharge uncertainty estimates for CAMELS-GB catchments are provided for several flow percentiles (Q95, Q75, Q50, Q25, Q5 and Q1, derived from the flow timeseries provided in CAMELS-GB described in Section 5.2) for the most recent rating curve to allow users to evaluate discharge uncertainty across the flow range. The upper and lower bounds of the discharge uncertainty prediction interval are provided as a percentage of the flow percentile for each catchment and flow percentile where available. In total, discharge uncertainty estimates are available for 503 (75%) CAMELS-GB gauges. As the method is data-based, the rating curve and its uncertainty interval cannot be computed for gauging stations where there are fewer than 20 stage-discharge measurements, or for flows above (below) the highest (lowest) stage-discharge measurement. This means that for some (or all) flow percentiles (particularly Q95 and Q1) there may be no discharge uncertainty estimate, as indicated by 'NaN'. There are 45 stations where stage-discharge data were available but discharge uncertainty estimates are not provided, either because the resulting uncertainty bounds were deemed not to accurately reflect the discharge uncertainty at that gauging station or because there was no sensible relationship between stage and discharge.
Users are advised that the CAMELS-GB discharge uncertainty estimates (1) are dependent on the types of error included in and underlying assumptions of the discharge uncertainty estimation method (see Kiang et al., 2018 for a comparison of seven discharge uncertainty estimation methods) and (2) may not be applicable to the whole flow timeseries (as they cover the most recent rating curve) or for stations where flow is measured directly (i.e. at ultrasonic or electromagnetic stations).
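The sketch below illustrates how exceedance-based flow percentiles relate to the availability rules stated above. It does not implement the LOWESS framework of Coxon et al. (2015). The function names are hypothetical, Qx is taken as the flow equalled or exceeded x% of the time, and comparing the flow percentile with the range of gauged discharges (rather than stages) is a simplifying assumption.

```python
import math


def exceedance_flow(flows, exceedance_pct):
    """Flow equalled or exceeded exceedance_pct % of the time (e.g. Q95).

    Uses linear interpolation between order statistics of a gap-free series.
    """
    ordered = sorted(flows)
    rank = (1.0 - exceedance_pct / 100.0) * (len(ordered) - 1)
    lower = int(math.floor(rank))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def uncertainty_can_be_estimated(flow_value, gauged_discharges, min_gaugings=20):
    """Apply the stated rules: enough gaugings, and flow within gauged range."""
    if len(gauged_discharges) < min_gaugings:
        return False
    return min(gauged_discharges) <= flow_value <= max(gauged_discharges)


flows = [float(v) for v in range(1, 102)]        # synthetic flows 1..101
gaugings = [float(v) for v in range(10, 91, 4)]  # 21 synthetic gaugings, 10..90
```

With this synthetic record, Q50 lies within the gauged range, whereas Q95 (a low flow) falls below the lowest gauging and would be reported as 'NaN'.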

Providing information on the impact of humans in each catchment is a vital part of CAMELS-GB. To account for the degree of human intervention in each catchment, we compiled data on reservoirs, abstraction and discharge returns provided by national agencies. We focused on providing quantitative data on human impacts in CAMELS-GB; however, it is important to note that additional datasets are available that qualitatively characterise human impacts in GB, including the Factors Affecting Runoff (FAR) codes available from the National River Flow Archive.

Benchmark Catchments
The UK Benchmark Network consists of 146 gauging stations that have been identified by the NRFA as suitable for the identification and interpretation of long-term hydrological variability and change against several criteria including length of record, quality of flow data, known impacts within the catchment and expert consultation (for a full description see Harrigan et al., 2018). Consequently, these gauging stations can be treated as relatively 'near-natural', indicating that the influence of humans on the flow regimes of these catchments is modest. It is important to note that some impacts were tolerated where they were deemed to have a modest overall influence on flows and known to be stable over time. This was to ensure coverage in regions such as the heavily impacted south and east of GB. These data are available for all the CAMELS-GB catchments, with an attribute indicating for each catchment whether it is part of the UK Benchmark Network or not.

Abstraction and Discharges
The abstraction data consist of monthly abstraction data from January 1999 to December 2014 that are reported by abstraction licence holders to the Environment Agency. These data are the actual abstraction returns and represent the total volume of water removed by the licence holder for each month over the time period. A mean daily abstraction rate for all English catchments is provided in CAMELS-GB for groundwater and surface water sources. The monthly returns for each abstraction licence in the database were averaged to provide a mean monthly abstraction from 1999 to 2014. All abstraction licences that fell within each catchment boundary (using the catchment masks outlined in section 4) were then summed for surface water and groundwater abstractions respectively and converted into mm day⁻¹ using catchment area. The mean daily abstraction rate is provided alongside attributes describing the use of the abstracted water (agriculture, amenities, environmental, industrial, energy or water supply). The discharge data consist of daily discharges into water courses from water companies and other discharge permit holders reported to the Environment Agency from 1st January 2005 to 31st December 2015. To calculate a mean daily discharge rate for each catchment, the daily discharge data for each discharge record were averaged, and then all discharge records that fell within the catchment boundary were summed and converted into mm day⁻¹ using catchment area.
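The abstraction calculation described above can be sketched as follows. The reporting units and the month-to-day conversion are not stated in the text, so this example assumes monthly volumes in m³ and an average month length of 365.25/12 days; the function name and the dictionary of licences are hypothetical.

```python
AVERAGE_DAYS_PER_MONTH = 365.25 / 12.0  # assumed conversion, not from the paper


def mean_daily_abstraction_mm(licence_monthly_m3, area_km2):
    """Mean daily abstraction over a catchment, in mm/day.

    licence_monthly_m3: dict {licence id: list of monthly volumes in m3}
    for licences already selected as lying within the catchment mask.
    """
    # Average the monthly returns of each licence, then sum across licences.
    mean_monthly_total = sum(
        sum(volumes) / len(volumes) for volumes in licence_monthly_m3.values()
    )
    m3_per_day = mean_monthly_total / AVERAGE_DAYS_PER_MONTH
    depth_m_per_day = m3_per_day / (area_km2 * 1.0e6)
    return depth_m_per_day * 1000.0
```

Surface water and groundwater licences would be passed to the function separately to obtain the two attributes.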
There are several important caveats associated with these data. Firstly, these data are only available for England. Consequently, there are many catchments where no data are available (identified by 'NaN'), and only a proportion of the abstractions may have been accounted for in catchments which lie on the border of England/Wales or England/Scotland. Furthermore, not all licence types/holders are required to submit records to the Environment Agency, therefore this is not the full picture of human intervention within each catchment. Secondly, the abstractions and discharges data cover different time periods. Thirdly, the topographical catchment mask was used to define which abstraction returns were included in each catchment. Groundwater abstractions that lie within the topographical catchment may not have a direct impact on the catchment streamflow and instead may impact a neighbouring catchment that shares the same aquifer. Conversely, groundwater abstractions that lie outside the catchment could have an impact on the catchment streamflow. Fourthly, there is large inter-annual and intra-annual variation in the abstraction and discharges data, and their impacts will be different across the flow regime. Consequently, it is important that the mean abstraction totals are used as a guide to the degree of human intervention in each catchment rather than as absolute totals of the abstraction for any given month. Finally, although 'abstractions' represent water removed from surface water or groundwater sources, some of this water will be returned to catchment storages. The discharge data provided account only for treated water from sewage treatment works and do not provide information on other water returns that may be fed back into catchment storages. As such, the mean totals for abstractions and discharges used here are a very broad guide that point to the possible influence of abstractions but do not quantify the net influence of these impacts on the actual flow regime. Other (less widely available) metrics have been applied in the UK which use modelling approaches to assess the net impact of abstractions/discharges across the whole flow regime (for example the Low Flows Enterprise methodology; see also Hannaford et al., 2013).

Reservoirs
Reservoir attributes are derived from an open source UK reservoir inventory (Durant and Counsell, 2018) supplemented with information from SEPA's publicly available controlled reservoirs register. The UK reservoir inventory includes reservoirs above 1,600 megalitre (ML) capacity, covering approximately 90% of the total reservoir storage in the UK. This dataset was collected from the Environment Agency through a Freedom of Information request, the UK Lakes Portal (CEH) and subsequent internet searches. It includes information on the location of the reservoir, its capacity, its use and the year the reservoir was built. To check the accuracy of this dataset, we cross-referenced the reservoirs in the UK reservoir inventory with reservoirs in the Global Reservoir and Dam (GRanD v1.3) database (Lehner et al., 2011). While the reservoirs and their capacity data were largely consistent for England and Wales, many Scottish reservoirs contained in the GRanD database were either not present in the UK reservoir inventory or had very different reported storage capacities. This is likely because the storage capacities of Scottish reservoirs in the UK reservoir inventory are estimated (see Hughes et al., 2004) rather than actual storage capacities. Consequently, for reservoirs in Scotland, we used information from SEPA's publicly available controlled reservoirs register (http://map.sepa.org.uk/reservoirsfloodmap/Map.htm, last access: 11 December 2019) including the reservoir name, location and storage capacity, and then supplemented this information with the year the reservoir was built and the reservoir use by cross-referencing data from the UK reservoir inventory (users should be aware that reservoir use and the year the reservoir was built were not available for every reservoir).
For CAMELS-GB several reservoir attributes are derived for each catchment by determining the reservoirs that lie within the catchment mask from the reservoir locations and then calculating (1) the number of reservoirs in each catchment, (2) their combined capacity, (3) the fraction of that capacity that is used for hydroelectricity, navigation, drainage, water supply, flood storage and environmental purposes, and (4) the year when the first and last reservoir in the catchment was built.
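The four steps above can be sketched as a simple aggregation. The example assumes that the reservoirs lying within the catchment mask have already been selected, and the record fields (`capacity_ml`, `use`, `year_built`) and function name are hypothetical; reservoirs with an unknown build year (`None`) are simply ignored when finding the first and last year.

```python
def reservoir_attributes(reservoirs):
    """Summarise reservoirs located within one catchment.

    reservoirs: list of dicts with keys 'capacity_ml', 'use', 'year_built'
    (year_built may be None where it is not available).
    """
    total_capacity = sum(r["capacity_ml"] for r in reservoirs)
    capacity_by_use = {}
    for r in reservoirs:
        capacity_by_use[r["use"]] = capacity_by_use.get(r["use"], 0.0) + r["capacity_ml"]
    fraction_by_use = (
        {use: cap / total_capacity for use, cap in capacity_by_use.items()}
        if total_capacity > 0 else {}
    )
    years = [r["year_built"] for r in reservoirs if r["year_built"] is not None]
    return {
        "num_reservoirs": len(reservoirs),
        "total_capacity_ml": total_capacity,
        "fraction_by_use": fraction_by_use,
        "first_year": min(years) if years else None,
        "last_year": max(years) if years else None,
    }


example_reservoirs = [
    {"capacity_ml": 3000.0, "use": "water supply", "year_built": 1905},
    {"capacity_ml": 1000.0, "use": "hydroelectricity", "year_built": 1962},
    {"capacity_ml": 4000.0, "use": "water supply", "year_built": None},
]
```

For the three example reservoirs, seven eighths of the combined 8,000 ML capacity is attributed to water supply.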

Regional Variability in Catchment Characteristics
Figure 2 highlights some of the key catchment variables, and in this section we discuss their regional variability (according to the regions in Figure 2a). Spatial maps of all catchment attributes can be found in the supplementary information (Figures S4-S11).
There are distinct regional differences in climate across GB (Figure 2b). Precipitation is typically higher in the west and north of GB, corresponding with the areas of high elevation and prevailing winds from the west that bring significant rainfall. The wettest areas of the UK are in mountainous regions, with a maximum of 9.6 mm day⁻¹ (annual average of 3500 mm year⁻¹) in the north-west. Snow fractions are generally very low across Great Britain (median snow fraction of 0.01) except for catchments in the Cairngorm mountains in north-east Scotland, where the fraction of precipitation falling as snow can reach 0.17 (see supplementary information, Figure S54e). Precipitation is lowest in the south and east of GB with a minimum of 1.5 mm day⁻¹ in the east. In contrast, potential evapotranspiration (PET) is much less variable across GB, with mean daily totals ranging from 1 to 1.5 mm day⁻¹. PET is highest in the south (where temperatures are highest) and lowest in the north. Mean flow varies from 0.09 to 10 mm day⁻¹ and is typically higher in the north and west, reflecting the regional variability in precipitation and PET. This is also reflected in Figure 2c, where catchments in the north and west of GB tend to be wetter with higher runoff coefficients and catchments in the south and east are much drier with lower runoff coefficients. Figure 2c also shows that annual precipitation totals exceed annual PET totals; the aridity index is below 1 for all catchments, reflecting the temperate and humid climate of GB. It is important to note that these estimates are dependent on the underlying data. For example, there can be significant variability in the calculation of PET, depending on the methods and assumptions used (e.g. Tanguy et al., 2018), and here we have used a PET estimate where canopy interception is not accounted for. Interception is an important component of the water cycle in GB, which experiences a large amount of low to moderate rainfall intensities (Blyth et al., 2019); thus, using the CHESS PETI estimate instead would increase the aridity index above one in some locations.
There is also regional variability in baseflow index (the ratio of mean daily baseflow to daily discharge), which is typically higher in the south and east of GB and lower in the north-west. Some of these differences can be attributed to regional aquifers that have high/moderate productivity, which are more prevalent in the south-east, east and north-east (see Figure 2b).
From Figure 2c, it is notable that runoff deficits significantly exceed total potential evapotranspiration for many of the CAMELS-GB catchments in the south-east; this could be due to water loss to regional aquifers, catchment areas not mapping onto the contributing area and/or the choice of PET used (see above). There are also seven catchments where the runoff exceeds total rainfall; this could be due to water gains from regional aquifers, catchment areas not mapping onto the contributing area, inter-basin transfers, uncertainties in the rainfall and/or under-estimation of rainfall. Many of the widely used hydrological models and analysis techniques will not be able to reproduce catchment water balances which are outside the water and energy limitations shown in Figure 2c, unless the models or analysis techniques are explicitly adapted to consider the sources of uncertainty, potential unmeasured groundwater flow pathways and/or human influences that we have noted. We encourage users of the data to consider whether the assumptions of their methods are consistent with the uncertainties we have documented.
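Users wishing to screen catchments against the water and energy limits discussed above could apply a check along the following lines. This is a minimal sketch using long-term mean precipitation, flow and PET (all in mm day⁻¹); the function and key names are hypothetical, and the strict inequalities ignore the data uncertainties noted in the text.

```python
def water_balance_flags(p_mean, q_mean, pet_mean):
    """Flag catchments lying outside simple water and energy limits.

    Water limit: runoff should not exceed precipitation.
    Energy limit: the runoff deficit (P - Q) should not exceed PET.
    """
    return {
        "runoff_coefficient": q_mean / p_mean,
        "aridity_index": pet_mean / p_mean,
        "runoff_exceeds_rainfall": q_mean > p_mean,
        "deficit_exceeds_pet": (p_mean - q_mean) > pet_mean,
    }
```

For instance, a catchment with mean precipitation of 2.0, mean flow of 0.4 and mean PET of 1.4 mm day⁻¹ has a runoff deficit of 1.6 mm day⁻¹, which exceeds PET and would therefore be flagged against the energy limit.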
Land cover and human modifications can also impact river flows. Crops and grassland tend to be the dominant land cover for GB catchments, with crops typically the dominant land cover for catchments in the east and grassland for catchments in the west (Figure 2d). There is also a higher percentage of catchments in the east which are dominated by urban land cover. Large reservoir capacity is concentrated in the more mountainous northern and western regions of GB, particularly in Western Scotland (Figure 2e).

Data Availability
The CAMELS-GB dataset (Coxon et al., 2020) detailed in this paper is freely available via the UK Centre for Ecology & Hydrology Environmental Information Data Centre (https://doi.org/10.5285/8344e4f3-d2ea-44f5-8afa-86d2987543a9). The data contain catchment masks, catchment time series and catchment attributes as described above. A full description of the data format is provided in the supporting documentation available on the Environmental Information Data Centre.

Conclusions
This study introduces the first large-sample, open-source catchment dataset for Great Britain, CAMELS-GB (Catchment Attributes and MEteorology for Large-sample Studies), consisting of hydro-meteorological catchment timeseries, catchment attributes and catchment boundaries for 671 catchments. A comprehensive set of catchment attributes is quantified, describing a range of catchment characteristics including topography, climate, hydrology, land cover, soils and hydrogeology. Importantly, we also derive attributes describing the level of human influence in each catchment and the first set of national discharge uncertainty estimates that quantify discharge uncertainty across the flow range.
The dataset provides new opportunities to explore how different catchment characteristics control river flow behaviour, develop common frameworks for model evaluation and benchmarking at regional-national scales and analyse hydrologic variability across the UK. To ensure the reproducibility of the dataset, many of the codes and datasets are made available to users. While a wealth of data is provided in CAMELS-GB, there are many opportunities to expand the dataset that were outside the scope of this study. Currently there are no plans to regularly update CAMELS-GB; however, future work will concentrate on 1) expanding the dataset to include higher resolution data (such as hourly rainfall, e.g. Lewis et al., 2018, and flow timeseries) and datasets for the analysis of trends (such as changes in land cover over time), 2) improving the comparability of CAMELS-GB with other CAMELS datasets by using common, global hydrometeorological and geophysical datasets to derive catchment timeseries and attributes, and 3) refining the characterisation of uncertainties in catchment attributes and forcing (particularly for rainfall data). We are also striving to increase the consistency among the CAMELS datasets (in terms of time series, catchment attributes, naming conventions and data format, see Addor et al., 2019), and to create a dataset that is globally consistent. We anticipate that this will happen as part of a second phase, which will build upon the current first phase that is focussed on the release of national products, such as CAMELS-GB.
We estimated the saturated hydraulic conductivity and porosity (also referred to as maximum water content, saturated water content or satiated water content) using two pedo-transfer functions.
The first was the widely applied set of regressions based on sand and clay fractions first proposed by Cosby et al. (1984), in which saturated hydraulic conductivity is expressed in cm hour⁻¹ and porosity in percent (m³ m⁻³). The predictor variables are the sand and clay fractions.
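The equations themselves are not reproduced in this text, so the following is only a sketch of what a Cosby-type regression looks like in practice. It assumes the multivariate coefficients as they are commonly quoted from Cosby et al. (1984), with conductivity originally in inches per hour and converted here to cm per hour; both the coefficients and the conversion should be checked against the original reference before use. The function names are hypothetical.

```python
INCH_TO_CM = 2.54  # assumed unit conversion from inches/hour to cm/hour


def cosby_ksat_cm_per_hour(sand_pct, clay_pct):
    """Saturated hydraulic conductivity (cm/hour) from percent sand and clay.

    Coefficients are the commonly quoted Cosby et al. (1984) multivariate
    regression values (an assumption; they are not given in this text).
    """
    log10_ksat_inch_per_hour = -0.60 + 0.0126 * sand_pct - 0.0064 * clay_pct
    return INCH_TO_CM * 10 ** log10_ksat_inch_per_hour


def cosby_porosity_pct(sand_pct, clay_pct):
    """Porosity (percent) from percent sand and clay, same assumed source."""
    return 50.5 - 0.142 * sand_pct - 0.037 * clay_pct


# Hypothetical loam-like soil: 50 % sand, 20 % clay.
ksat = cosby_ksat_cm_per_hour(50.0, 20.0)
porosity = cosby_porosity_pct(50.0, 20.0)
```

In this form conductivity increases with sand content and decreases with clay content, while porosity decreases with both predictors.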
The second was the HYPRES continuous pedotransfer functions using silt and clay fractions, bulk density and organic matter content (Wösten et al., 1999; Wösten, 2000).

Appendix D Hydrogeological classes
For CAMELS-GB, we combined the BGS Hydrogeology map and superficial deposits layer. The table below provides a summary of the different classes in each dataset and how these were amalgamated to form the nine classes used in CAMELS-GB.

Tables

Table 1 (excerpt) Hydrological signature attributes. Thresholds for high/low flow frequency and duration were obtained from Clausen and Biggs (2000) and Westerberg and McMillan (2015).

runoff_ratio: runoff ratio, calculated as the ratio of mean daily discharge to mean daily precipitation. Unit: -
stream_elas: streamflow precipitation elasticity (sensitivity of streamflow to changes in precipitation at the annual timescale, using the mean daily discharge as reference). See equation 7 of the source reference, with the last element being P̄/Q̄ not Q̄/P̄. Unit: -
slope_fdc: slope of the flow duration curve (between the log-transformed 33rd and 66th streamflow percentiles) (Yadav et al., 2007). There can be NAs in this metric when over a third of the flow time series are zeros (see zero_q_freq).
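The runoff_ratio and slope_fdc definitions can be illustrated with a short sketch. It assumes complete daily series with no missing values, uses non-exceedance percentiles with linear interpolation, and adopts one possible sign convention for the slope; none of these choices is confirmed by the text, and the function names are hypothetical.

```python
import math


def percentile(values, p):
    """Non-exceedance percentile (p in 0-100) with linear interpolation."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def runoff_ratio(discharge, precipitation):
    """Ratio of mean daily discharge to mean daily precipitation."""
    mean_q = sum(discharge) / len(discharge)
    mean_p = sum(precipitation) / len(precipitation)
    return mean_q / mean_p


def slope_fdc(discharge):
    """Slope of the flow duration curve between the log-transformed
    33rd and 66th streamflow percentiles.

    Returns NaN when the 33rd percentile is zero, which happens when
    over a third of the record is zero flow.
    """
    q33 = percentile(discharge, 33)
    q66 = percentile(discharge, 66)
    if q33 <= 0:
        return float("nan")
    return (math.log(q66) - math.log(q33)) / (0.66 - 0.33)


# Hypothetical record: flows of 1 to 101 (arbitrary units).
example_slope = slope_fdc(list(range(1, 102)))
```

The NaN branch mirrors the note on NAs: once more than a third of the daily flows are zero, the lower percentile is zero and its logarithm is undefined.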