CAMELS-AUS: hydrometeorological time series and landscape attributes for 222 catchments in Australia

Abstract. This paper presents the Australian edition of the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) series of datasets. CAMELS-AUS (Australia) comprises data for 222 unregulated catchments, combining hydrometeorological time series (streamflow and 18 climatic variables) with 134 attributes related to geology, soil, topography, land cover, anthropogenic influence and hydroclimatology. The CAMELS-AUS catchments have been monitored for decades (more than 85 % have streamflow records longer than 40 years) and are relatively free of large-scale changes, such as significant changes in land use. Rating curve uncertainty estimates are provided for most (75 %) of the catchments, and multiple atmospheric datasets are included, offering insights into forcing uncertainty. This dataset allows users globally to freely access catchment data drawn from Australia’s unique hydroclimatology, particularly notable for its large interannual variability. Combined with arid catchment data from the CAMELS datasets for the USA and Chile, CAMELS-AUS constitutes an unprecedented resource for the study of arid-zone hydrology. CAMELS-AUS is freely downloadable


Introduction
For some time, the ideals of "comparative hydrology" and "large-sample hydrology" have been advanced as complementary and necessary components of hydrology (e.g. Falkenmark and Chapman, 1989; Andréassian et al., 2006; Gupta et al., 2014). Alongside traditional hydrological studies, which may focus on a single catchment or possibly compare results among several catchments within a region, large-sample studies aim to establish the generality of results and to test paradigms applicable on regional to global scales (e.g. McMahon et al., 1992; Peel et al., 2004; Kuentz et al., 2017; Ghiggi et al., 2019; Mathevet et al., 2020). Large samples of catchments are also insightful for certain tasks, such as prediction in ungauged basins (e.g. Pool et al., 2019; Kratzert et al., 2019b) or training and evaluation of machine learning algorithms (e.g. Kratzert et al., 2018; Shen, 2018; Kratzert et al., 2019a). Thus, large-sample studies are a growing component of recent hydrological research (see review by Addor et al., 2019).
However, issues of data availability and commensurability, which are endemic to environmental sciences, are exacerbated for large-sample hydrology. Large samples may cross jurisdictions or data providers or require harmonisation across different data formats or nomenclatures (e.g. hydrometric-data quality codes and flags) and are more likely to suffer from spatial gaps due to different data sharing policies of water agencies (Viglione et al., 2010; Addor et al., 2019). Thus, the importance of FAIR data (findable, accessible, interoperable and reusable; see Wilkinson et al., 2016 and the Open Data Charter, 2015) in hydrology is amplified in large-sample hydrology, and there is a clear need for open publication of datasets wherever possible to allow equal access. Such policies also encourage hydrologists to work across boundaries, an important ideal since the spatial distribution of hydrologists globally reflects neither the spread of interesting hydrological environs nor the pressing need for hydrological insights to inform policy.
Responding to these needs, multiple recent projects have publicly released large-sample hydrological datasets (e.g. Arsenault et al., 2016; Do et al., 2018; Lin et al., 2019; Linke et al., 2019; Olarinoye et al., 2020). Here we contribute to one such ongoing project: the Catchment Attributes and Meteorology for Large-sample Studies, or CAMELS, project. Originally launched for the United States (Newman et al., 2015; Addor et al., 2017), CAMELS datasets now exist for Chile (Alvarez-Garreton et al., 2018), Great Britain (Coxon et al., 2020) and Brazil (Chagas et al., 2020). The defining feature of CAMELS datasets is that they complement data on streamflow (which are often publicly available) with other data types: (i) pre-processed climatic data for each catchment, such as would be required to run a hydrological model, and (ii) catchment attributes which characterise various aspects of the catchment without the need for field visitation (impractical for large samples). They also support download of the entire dataset in contrast to agency websites, which may only support one-at-a-time download (if at all). Lastly, whereas government agencies reserve the right to retrospectively re-process their streamflow data (e.g. due to rating curve changes), CAMELS datasets enable repeatability because a given CAMELS release effectively "freezes" the data, creating a consistent version that is available indefinitely via a persistent digital object identifier (DOI).
The present dataset focusses on the continent of Australia, including the southern state of Tasmania but excluding other Australian territories. Australia is the world's sixth-largest country (approximately 7.7 × 10⁶ km²) and is comparable in size to the conterminous USA or Europe, but the hydrologically active parts of the country tend to be limited to coastal regions, with the interior being semi-arid or arid (Fig. 1; see also Knoben et al., 2018). Thus, dense gauging of streamflow covers only a small proportion of the total area, with the remaining areas providing few gauged locations. While sparsely gauged, the dry parts of Australia provide interesting arid-zone catchment examples, many of which are included in CAMELS-AUS, the Australian edition of the dataset. In addition to arid regions, Australia includes northern areas with tropical climate and southern areas with temperate climate.
This paper is structured as follows. In Sect. 2 we describe the rationale for the dataset, including considerations of why Australian hydroclimate is interesting and relevant to hydrologists globally, and factors shaping the dataset, including local data availability. Section 3 provides a technical description of the dataset and forms the bulk of the paper. Sections 4 and 5 explain CAMELS-AUS data availability and conclude the paper, respectively.

Rationale
This section lays out the motivations underpinning the release of this dataset for Australia. It also outlines why CAMELS-AUS takes its present form, including two chief aspects: catchment selection and inclusion of local versus global datasets.

Motivation: Australian hydroclimate and its place in the study of arid-zone hydrology and hydrology under climatic change
Every region on earth is unique and has characteristics of interest for hydrological study. Within Australia and for CAMELS-AUS, three characteristics are noted here. Firstly, Australia contains many arid landscapes, and considerable advances in arid-zone hydrology have been made there (e.g. Western et al., 2020). CAMELS-AUS contains more than 20 arid-zone rivers (depending on definition, but see Fig. 1), so the publication of the dataset opens the study of these rivers to a global pool of scientists. Together with the arid-zone rivers included for the USA and Chile (Addor et al., 2017; Alvarez-Garreton et al., 2018), the CAMELS datasets provide a significant sample for the study of arid-zone hydrology.
Secondly, Australian catchments tend to have lower rainfall-to-runoff ratios, linked to higher evaporative demand. As shown in Fig. 2d, the median rainfall-runoff ratio among Australian catchments is approximately 0.25, compared to approximately 0.4 for the rest of the world. Australian catchments are often water-limited (at least on a seasonal basis), providing different modelling challenges to energy-limited catchments from higher latitudes.
Finally, a notable characteristic of Australian hydroclimatology is its tendency for multi-year spells of climatic anomalies of larger magnitude compared to most other regions of the world (Peel et al., 2005), due partly to the strong influence of climate teleconnections such as the El Niño-Southern Oscillation (ENSO; e.g. Peel et al., 2002; Verdon-Kidd and Kiem, 2009). Recent severe droughts have affected south-eastern Australia, including the 13-year Millennium Drought (Van Dijk et al., 2014), which provided the opportunity for knowledge sharing with other drought-prone regions (Aghakouchak et al., 2014) and supplied many case studies of hydrological model failure (i.e. the high bias and low model performance in differential split sample testing reported by e.g. Saft et al., 2016), which are under ongoing investigation (e.g. Fowler et al., 2020b). In the context of providing credible runoff projections, case studies of long droughts are the only means by which hydrologists can test hypotheses regarding how catchments respond physically to the onset of drier conditions, including aspects of long "memory" (e.g. Fowler et al., 2020b) and potential to shift behaviour, possibly in a quasi-permanent fashion (e.g. Peterson et al., 2021). Thus, it is hoped that the public release of datasets such as CAMELS-AUS may hasten scientific progress towards more defensible and robust hydrological models.

Context: hydrometeorological monitoring in Australia
Systematic climatic measurement in Australia extends back to the late 1800s (e.g. Ashcroft et al., 2014), with widespread streamflow gauging of headwater catchments commencing from the 1950s and '60s. Meteorological monitoring is the responsibility of a federal Bureau of Meteorology (BOM), but streamflow monitoring falls to the states and territories of Australia rather than the federal government (Skinner and Langford, 2013). Thus, Australian streamflow data have historically been dispersed between its six states and two territories (Fig. 1), and while quality control is relatively well established, methods and formats (e.g. quality codes and flags) are not consistent between states and territories. Since the 2000s this situation has partially been rectified after federal legislation required the BOM to collate data from the states under new "water information" powers (Vertessy, 2013).

Catchment choice: the Hydrologic Reference Stations dataset
Under its new responsibilities the BOM initiated several national hydrological projects, one of which is called the Hydrologic Reference Stations project (Turner et al., 2012). This project selected a large set of gauging stations, each on unregulated streams, to serve as a "platform to investigate long-term trends in water resource availability" (Turner et al., 2012, p. 1555). The project has a website for provision of streamflow data to the public (http://www.bom.gov.au/water/hrs/, last access: 29 July 2021). We adopted the Hydrologic Reference Stations as the basis for CAMELS-AUS for three reasons:
- The selection criteria used by the BOM, including record length, lack of regulation and stationarity of anthropogenic influence (see Sect. 3.2), are consistent with the aim of the CAMELS project to provide high-quality scientific data.
- Considerable effort has already been expended by the BOM to standardise and quality-check the streamflow data, which was only possible via contacts with state agencies that are not necessarily available to academic authors (for an example, see BOM, 2020). It is logical to take advantage of this prior effort.
It is noted that this choice is not intended to limit future inclusion of a wider range of stations and catchments. We envisage that the Hydrologic Reference Stations may provide the nucleus for future versions of the CAMELS-AUS dataset, while the current selection provides a sensible and pragmatic starting point. The Hydrologic Reference Station dataset itself may be subject to future expansion, which would inform future CAMELS-AUS versions. Furthermore, whereas the Hydrologic Reference Stations project, by definition, sought catchments which are minimally disturbed (or at least having stationarity of anthropogenic influence), future versions could be more inclusive so as to cater for studies examining diverse anthropogenic influences including changes over time, an approach already taken by CAMELS-GB (Great Britain; Coxon et al., 2020) and CAMELS-BR (Brazil; Chagas et al., 2020). In summary, the current form of CAMELS-AUS should not be interpreted as setting a norm for future versions (or other datasets).

Local versus global datasets
A key choice in developing CAMELS-AUS was whether to use local or global datasets (or both) when extracting hydrometeorology time series and catchment attributes. On the one hand, global datasets are important to facilitate intercontinental comparisons. On the other hand, when local datasets are available, they are generally the highest-quality information that exists for a given region (e.g. Acharya et al., 2019).
With the advent of large-sample hydrology, it is now possible to conduct near-global studies using very large samples of catchments (e.g. over 2000 in Mathevet et al., 2020), and future studies might compose such large samples by combining continental-scale datasets like the various CAMELS. However, the lack of standardised approaches and sources between national large-sample datasets remains a key limitation of large-sample studies (Addor et al., 2019).
The approach followed by the CAMELS datasets so far is to use the best possible data available for each country, so national datasets have been prioritised over global datasets. In some cases, global datasets have been employed, for instance the Global Lithological Map (Hartmann and Moosdorf, 2012) in CAMELS datasets for the USA and Chile or the Multi-Source Weighted-Ensemble Precipitation (Beck et al., 2017) in CAMELS-CL. But overall, the best national data products were selected for each country, leveraging the knowledge of CAMELS creators. This enables global users, who may not be familiar with these national products, to benefit from this local knowledge. It also gives direct access to the best available data to users whose study focusses on catchments from a single country (see e.g. intercomparisons in Acharya et al., 2019). In keeping with this approach, priority was given to national data products in producing CAMELS-AUS.
In parallel, efforts are ongoing to increase the consistency among the CAMELS datasets (in terms of data products used to derive the time series and catchment attributes and also naming conventions and data format; see Addor et al., 2019) in order to create a dataset that is globally consistent. This is part of a second phase, which will build upon the current phase, which is focussed on the release of national products, such as CAMELS-AUS. To contribute to this effort, we have supplied the CAMELS-AUS catchment boundaries and gauge locations. Because of these ongoing efforts, our expectation is that the data introduced here, derived from Australian sources, will in time be complemented by data derived from global datasets.

CAMELS-AUS dataset technical description
The previous section outlined key decisions made for CAMELS-AUS; i.e. it is based on the Hydrologic Reference Stations, and its data are derived from Australian rather than global sources. This section provides more detail and presents each aspect of the dataset in turn. Work not undertaken by the present authors (e.g. earlier efforts by the BOM for the Hydrologic Reference Stations project) is clearly marked. In many cases, sub-sections end with an "Included in dataset" section to clearly outline items in the online repository related to the sub-section text.
Before presenting the details, we note that the online repository of the dataset (Fowler et al., 2020a) includes the following:
- a file containing the overall attribute table, which holds all non-time-series data (see Tables 1, 3 and 4);
- 27 time series files, each containing data for all catchments for a given hydroclimatic variable (see Table 2); and
- extra files such as shapefiles and readme files as noted below.

Catchment selection rules
Given the decision (Sect. 2.2 above) to base the CAMELS-AUS dataset on the BOM's Hydrologic Reference Stations, this sub-section summarises the process of catchment selection undertaken earlier by the BOM, as described in Turner et al. (2012).
- Initial selection: 246 potential stations were initially selected based on the three criteria of (i) record length (minimum of 1975 onwards), (ii) availability of data including historic rating curve information and (iii) lack of regulation by large dams.
- Invitation for stakeholders to suggest additional stations: BOM consulted with 70 stakeholders from federal, state and territory agencies and water authorities, who were given the opportunity to add new stations to the list. This enlarged the list to 362 stations.
- Targeted fact-finding: to elicit information about each candidate station and catchment, the relevant agencies were asked a series of questions about the catchments in their jurisdiction relating to both past and present practices. Topics included diversions, irrigation structures, upstream point source discharge, land clearing, forestry, urbanisation, fire and farm dams.
- Final selection: the final selection process considered all the above information. A good coverage of Australia's various hydroclimatic regions was desired, although this is inherently limited by the coverage of the gauging network. Where possible, only stations with < 5 % missing data and < 10 % change in forest cover were selected.
The above process provided the first version of the Hydrologic Reference Stations, with a total of 221 catchments.
A subsequent update in 2015, which included a detailed review and update of streamflow data up to 2014 (BOM, 2020), resolved to retain all existing stations and add one more (ID 215207).Thus, the final number of stations is 222 (Fig. 1).

Included in dataset.
The following variables are provided in the CAMELS-AUS attribute table (see Table 1): station ID, station name (including river name and station name), drainage division and river region (out of 13 drainage divisions and 218 river regions across Australia). Unfortunately, information is not available about which catchments were included or excluded under the above rules.

Catchment boundaries
For all but 10 of the catchments, catchment boundaries were derived via flow path analysis (using Esri's Arc Hydro) of topographic data undertaken by the authors. The input data were (i) the post-processed and hydrologically enforced DEM of Gallant et al. (2012), which is derived from the 1 arcsec (approximately 30 m) grid Shuttle Radar Topography Mission (SRTM) dataset, and (ii) the location of the streamflow gauges as provided by the BOM. The Arc Hydro analysis determines the apparent position of streams from the DEM data, and it was found that the published locations rarely fall precisely on these digital streamlines. The mismatch is unsurprising given that location data may be decades old, and significant figures may have been truncated with the passage of data between databases (or never reported in the first place). Also, the position of the digital streamline may or may not match reality, particularly in flat landscapes. To derive catchment areas, the BOM-published gauge locations were shifted to the nearest streamline with the expected catchment area. This movement was generally less than 200 m.
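The gauge-shifting step can be illustrated with a small sketch. It is not the Arc Hydro workflow used for the dataset: the function name `snap_gauge`, the 500 m search radius and the 10 % area tolerance are all hypothetical, and the stream cells are made-up values in a projected coordinate system (metres).

```python
import math


def snap_gauge(gauge_xy, expected_area_km2, stream_cells,
               max_shift_m=500.0, area_tolerance=0.1):
    """Return the nearest stream cell whose upstream area matches expectations.

    stream_cells: iterable of (x_m, y_m, upstream_area_km2) tuples.
    max_shift_m and area_tolerance are illustrative thresholds only.
    Returns None when no cell satisfies both conditions.
    """
    best, best_dist = None, math.inf
    for x, y, area in stream_cells:
        dist = math.hypot(x - gauge_xy[0], y - gauge_xy[1])
        rel_err = abs(area - expected_area_km2) / expected_area_km2
        if dist <= max_shift_m and rel_err <= area_tolerance and dist < best_dist:
            best, best_dist = (x, y, area), dist
    return best


# Made-up digital stream cells around a gauge published at (0, 0).
cells = [
    (100.0, 0.0, 3.0),    # closest cell, but on a small tributary
    (0.0, 150.0, 98.0),   # main stem, area close to the expected 100 km^2
    (0.0, 400.0, 101.0),  # also plausible, but further away
]
snapped = snap_gauge((0.0, 0.0), 100.0, cells)
print(snapped)  # (0.0, 150.0, 98.0)
```

Checking the upstream area as well as the distance is what stops the gauge from being attached to a nearby tributary rather than the intended stream.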
As noted, this method was used for most catchments, with the following exceptions:
- For the six largest catchments (A0030501, A0020101, G8140040, G9030250, 424002 and 424201A), this process was not undertaken due to excessive computational requirements. For context, the largest catchment is approximately the size of the United Kingdom (see Fig. 1).
- For a further four catchments (A2390519, A2390523, 307473 and 606185), the Arc Hydro process resulted in a catchment boundary that was inconsistent with the boundaries displayed on the Hydrologic Reference Station website. Although degraded for fast mapping, the website boundaries show the approximate position of the boundary as agreed with stakeholders and agencies who have local knowledge. Therefore, in cases of obvious mismatch, the Arc Hydro-derived boundaries were assumed to be in error. Despite the "blockiness" of the website boundaries, they were considered to be a better option for these four catchments.
For these 10 catchments a protocol was developed to read the website's .json file to extract the boundary vertices. The website boundaries were then adopted. Note that more detail on the above considerations, including a selection of figures, is given in the dataset within the readme file README_CAMELS_AUS_Boundaries.pdf.
https://doi.org/10.5194/essd-13-3847-2021 Earth Syst. Sci. Data, 13, 3847-3867, 2021
Included in dataset. The main inclusions are a point shapefile of adopted gauge locations and a polygon shapefile of adopted catchment areas. Further inclusions are a point shapefile of BOM-published gauge locations, a polygon shapefile of website-mapped boundaries, and a readme file explaining the above logic in more detail and with figures. As listed in Table 1, the CAMELS-AUS attribute table lists the coordinates of the catchment outlet and centroid, along with notes which expand on the issues listed above, on a catchment-by-catchment basis.
For p_seasonality see Eq. (14) in Woods (2009).
in the dataset. The upstream (i.e. entirely contained) catchments are clearly marked in the CAMELS-AUS attribute table (see Table 1). Catchments containing nested catchments are also marked.
Before moving on from considerations of spatial data, it is noted that (i) CAMELS-AUS does not come with a spatial layer for the river network, (ii) users may find the 15 arcsec Hydrosheds River Network (http://www.hydrosheds.org/downloads, last access: 30 July 2021) or the BOM Geofabric v2 SH network (http://www.bom.gov.au/water/geofabric/download.shtml, last access: 30 July 2021) useful, and (iii) these are not included in CAMELS-AUS because of licensing concerns (for Hydrosheds) and file size concerns (for the Geofabric).
Included in dataset. The following variables are provided in the CAMELS-AUS attribute table (see Table 1): catchment area, map zone and three indicators related to nestedness (NestedStatus, NextStationDS, NumNestedWithin).

Streamflow data and uncertainty
Streamflow time series data are provided by the BOM in two variants: non-gap-filled and gap-filled. The gap-filled variant is filled using the daily rainfall-runoff model GR4J (Perrin et al., 2003), but the BOM has not published further methodological details about calibration method, validation procedures or the specifics of the interpolation method. In addition to the streamflow data, the BOM also provides quality codes and flags. As mentioned in Sect. 2.1, the quality codes and flags of each state of Australia are different, but the BOM has harmonised these to a common set (http://www.bom.gov.au/water/hrs/qc_doc.shtml, last access: 30 July 2021). For CAMELS-AUS, these data are supplied as follows. Firstly, summary statistics about period of record (start date, end date and proportion of missing data) are provided in the attribute table, as listed in Table 1. Regarding time series data (Table 2), each of the above three data types (gap-filled, non-gap-filled, and quality codes and flags) is provided within CAMELS-AUS exactly as supplied by the BOM, except that they are presented as a single file across all catchments. In addition, since the units of the streamflow files are ML d⁻¹, whereas modelling studies typically use mm d⁻¹, CAMELS-AUS provides an additional streamflow time series file in mm d⁻¹.
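For readers who want to reproduce the unit conversion, a minimal sketch follows. It relies only on the identity that 1 ML equals 1000 m³, which is also the volume of 1 mm of water over 1 km². The function name and the example numbers are illustrative and are not taken from the dataset.

```python
def ml_per_day_to_mm_per_day(flow_ml_per_day: float, area_km2: float) -> float:
    """Convert a daily flow volume (ML/d) to a catchment-average depth (mm/d).

    1 ML = 1000 m^3, and 1 mm of depth over 1 km^2 is also 1000 m^3,
    so the conversion reduces to dividing by the catchment area in km^2.
    """
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return flow_ml_per_day / area_km2


# Hypothetical example: 250 ML/d from a 500 km^2 catchment.
depth_mm = ml_per_day_to_mm_per_day(250.0, 500.0)
print(depth_mm)  # 0.5
```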
Figure 3 shows that CAMELS-AUS stations are typically long-term gauges, with the shortest record being 29 years. All but 17 gauges commence by 1975 (in line with the selection rules in Sect. 3.1), and all but 22 of the records contain data up until the cut-off date for this dataset, which is 31 December 2014. Thus, records longer than 40 years are typical (Fig. 3b). Figure 3a considers both the record extent and missing data to determine the overall data availability for different overlapping periods. The data availability for the periods starting in 1965 and 1970 is lower than for the others, as expected given the remarks about record length. An increase in missing data post-1990 means that the data availability curves decrease slightly for the most recent period (dark blue).
Information about streamflow uncertainty is provided with CAMELS-AUS (Table 1) from an earlier study by McMahon and Peel (2019). McMahon and Peel (2019) examined available rating curve data for 166 of the 222 stations, developed rating curves based on Chebyshev polynomials, and estimated uncertainties using an approach which considered regression error and uncertainty in water level. The original authors post-processed their data to provide the following statistics (Table 3) for CAMELS-AUS: (i) number of separate rating curves considered for a given station (median value across all stations was 3); (ii) number of days considered across all curves (median value was ∼ 14 700 or ∼ 40 years); (iii) low, medium and high flow rates in mm d⁻¹ (flow rates exceeded 90 %, 50 % and 10 % of the time over days considered by the curves); and (iv) 95 % confidence intervals around low, medium and high flow rates, expressed in percentage terms. However, for some stations considered by McMahon and Peel (2019) the above data are not supplied in full for the following reasons: (a) the percentile flow is zero (cease to flow), leading to undefined relative uncertainty estimates due to the need to divide by zero, or (b) the percentile flow is outside the rated range, in which case neither upper nor lower bounds are reported for that flow. In a small number of cases the uncertainty bound numbers are very high, and these cases are generally associated with near-cease-to-flow conditions. For example, the highest value of Q_uncert_Q10_upper (refer to Table 3 for naming conventions) occurs for catchment 919309A, for which Q10 is 0.000023 mm d⁻¹, but the upper bound is 0.05 mm d⁻¹, which is >2000 times higher. Thus, Q_uncert_Q10_upper for this catchment is 201 400 %.
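The behaviour of the percentage bounds near cease-to-flow can be illustrated with a short sketch. The formula below, the bound expressed as a percentage departure from the flow estimate, is an assumption about how the attribute is defined and should be checked against Table 3. Because the text quotes rounded values (0.000023 and 0.05 mm d⁻¹), the sketch recovers the order of magnitude of the reported 201 400 % rather than the exact figure.

```python
import math


def relative_upper_bound_percent(flow: float, upper: float) -> float:
    """Upper confidence bound as a percentage above the flow estimate.

    Assumed definition: 100 * (upper - flow) / flow.
    Returns NaN for a zero (cease-to-flow) percentile, where the
    relative uncertainty is undefined because of division by zero.
    """
    if flow == 0:
        return math.nan
    return 100.0 * (upper - flow) / flow


# Rounded values quoted in the text for catchment 919309A (mm/d).
q10, q10_upper = 0.000023, 0.05
pct = relative_upper_bound_percent(q10, q10_upper)
print(round(pct))  # a value on the order of 2 x 10^5 %

# A cease-to-flow percentile yields an undefined relative bound.
print(relative_upper_bound_percent(0.0, 0.05))  # nan
```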
Included in dataset. The dataset includes three streamflow time series files, as explained above and listed in Table 2; one time series file for streamflow quality codes and flags; and the following attributes in the CAMELS-AUS attribute table: three attributes related to record extent and availability (start date, end date, prop_missing_data; see
incorporating topography as a covariate. They both output grids at a resolution of 0.05° × 0.05° (approximately 5 km). However, the datasets differ in the variables they provide: AWAP provides precipitation, temperature, vapour pressure and radiation, all of which SILO also provides in addition to vapour pressure deficit and, importantly for modelling studies, various formulations of potential evapotranspiration (PET). They also differ in spatial interpolation method: the SILO method forces an exact match to measured values, whereas AWAP does not (Tozer et al., 2012). Both AWAP and SILO are commonly used in Australia. Rather than select one dataset over another, CAMELS-AUS includes both datasets and leaves the choice to users. When possible, users are encouraged to compare the datasets to obtain insights into interpolation uncertainty for the forcing data. For all AWAP and SILO variables, time series for each catchment were compiled by the CAMELS-AUS project by calculating the catchment spatial average separately for each day. The full available period was extracted, which for most variables is 1900-2018 (SILO) and 1911-2017 (AWAP). Exceptions to these record extents are noted in the text below.
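One simple way to act on the suggestion to compare the two products is to restrict both series to their common days and summarise the differences. The sketch below is illustrative only: the function name, the date-keyed dictionaries and the values are made up, and reading the actual CAMELS-AUS files is left out.

```python
from datetime import date


def compare_on_common_days(series_a: dict, series_b: dict):
    """Return (n_common_days, mean_difference, mean_absolute_difference).

    Both inputs map a date to a daily value; differences are a minus b,
    computed only on days present in both series.
    """
    common = sorted(set(series_a) & set(series_b))
    if not common:
        raise ValueError("no overlapping days")
    diffs = [series_a[d] - series_b[d] for d in common]
    n = len(diffs)
    return n, sum(diffs) / n, sum(abs(x) for x in diffs) / n


# Made-up catchment-average precipitation (mm/d) from two products.
silo = {date(2000, 1, 1): 0.0, date(2000, 1, 2): 12.0, date(2000, 1, 3): 3.0}
awap = {date(2000, 1, 2): 10.0, date(2000, 1, 3): 4.0, date(2000, 1, 4): 0.0}

n_days, bias, mad = compare_on_common_days(silo, awap)
print(n_days, bias, mad)  # 2 0.5 1.5
```

Restricting to common days matters here because the two products cover different periods (1900-2018 for SILO and 1911-2017 for AWAP for most variables).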

Limitations arising from conventions for definition of daily time steps
Variables such as precipitation and streamflow are continuous, and formatting into a daily time step requires arbitrary conventions to split continuous time into 24 h periods. For example, the BOM convention is that precipitation is split at 09:00 (all times given in local time) each day, and a daily value refers to the precipitation that occurred over the preceding 24 h. Thus, if the BOM reports 18 mm precipitation for 14 March, this means that 18 mm was recorded between 09:00 13 March and 09:00 14 March. For streamflow, the conventions may vary depending on state or territory, but in collating the HRS data the BOM claims that conventions have been standardised to 09:00 to 09:00 (i.e. the same as precipitation). However, an audit of HRS data conducted by Jian et al. (2017) investigated this standardisation. They report that data from the states of Victoria, New South Wales, Queensland and the Australian Capital Territory (which together account for 168 of 222 stations) were consistent with the 09:00-to-09:00 claim. In contrast, they report that Western Australia (16 stations) data appear to be subject to a 01:00 split (i.e. 8 h earlier than expected), and South Australia and Northern Territory data (25 stations) appear to be subject to a 23:30 split (i.e. 9.5 h earlier than expected). Modellers should be mindful of these points when designing studies and interpreting results since modelling results may be sensitive (Reynolds et al., 2018; Jian et al., 2017) to the day definitions for both precipitation and discharge (and, if relevant, the degree to which they are offset from one another). Regarding PET, the key variables (e.g. temperature) are aligned directly with the day they are reported. This creates a time offset between PET and precipitation. In the experience of the CAMELS-AUS authors, this offset will typically make little difference to the results of, for example, a rainfall-runoff modelling study since PET typically influences streamflow via seasonal, not daily, dynamics in most CAMELS-AUS catchments. In the interest of providing CAMELS-AUS data subject to minimal manipulation, we do not apply a time shift to PET (or any other data), but users may wish to manually shift PET earlier by 1 d to minimise the time offset between precipitation and streamflow.
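The 09:00 convention for precipitation can be made concrete with a small sketch that returns the local-time window covered by a reported daily value. The function name is hypothetical, and the year in the example is arbitrary because the text gives only the day and month.

```python
from datetime import date, datetime, time, timedelta


def bom_precip_window(reported_day: date, split: time = time(9, 0)):
    """Return the (start, end) local-time window for a reported daily value.

    Under the convention described above, a value reported for a given day
    covers the 24 h that end at the split time on that day.
    """
    end = datetime.combine(reported_day, split)
    return end - timedelta(hours=24), end


# Example from the text: a value reported for 14 March (year chosen arbitrarily).
start, end = bom_precip_window(date(2021, 3, 14))
print(start, end)  # 2021-03-13 09:00:00 2021-03-14 09:00:00
```

Note that 15 of the 24 h in this window fall on the calendar day before the reported date, which is the source of the offset from variables aligned with the calendar day.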
A further consideration is that, due to Australia's large size, the CAMELS-AUS catchments occupy three different time zones. The majority are in a single zone (UTC + 10:00) covering Queensland, New South Wales, the Australian Capital Territory, Victoria and Tasmania. However, South Australia and the Northern Territory are in a separate zone (UTC + 09:30), while Western Australia uses UTC + 08:00. In addition, daylight savings time is used in South Australia, New South Wales, the Australian Capital Territory, Victoria and Tasmania. During the daylight savings period (typically October to April) 1 h needs to be added to the UTC times stated above. Given this multiplicity of combinations, measurements taken on either side of a state border that are marked with the same timestamp (e.g. 09:00) may, in reality, have been taken at different times.
Unfortunately, these limitations (related to time zones and day definitions) are inherent to the observations, and this then carries across into derivative products such as gridded climate data. In principle, if data were measured continuously it would be possible to redefine the day definitions and thus harmonise across time zones and data products, but unfortunately most observations are only taken once per day rather than continuously. Thus, there is little choice but to accept the use of these data despite these limitations.
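The border effect described above can be checked with fixed UTC offsets. In the sketch below the zone labels ("eastern", "central", "western") and the function name are hypothetical, the offsets are those quoted in the text, and daylight saving is passed in as a flag rather than derived from calendar rules.

```python
from datetime import datetime, timedelta, timezone

# Standard-time UTC offsets (hours) as quoted in the text; labels are illustrative.
STANDARD_OFFSET_HOURS = {"eastern": 10.0, "central": 9.5, "western": 8.0}


def to_utc(local_naive: datetime, zone: str, daylight_saving: bool = False) -> datetime:
    """Convert a naive local timestamp to UTC using a fixed offset.

    One extra hour is added to the offset when daylight saving applies.
    """
    offset = STANDARD_OFFSET_HOURS[zone] + (1.0 if daylight_saving else 0.0)
    tz = timezone(timedelta(hours=offset))
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)


# The same 09:00 stamp on a January day, either side of a state border.
nine_am = datetime(2014, 1, 15, 9, 0)
qld = to_utc(nine_am, "eastern", daylight_saving=False)  # Queensland: no daylight saving
nsw = to_utc(nine_am, "eastern", daylight_saving=True)   # New South Wales: daylight saving

gap_hours = (qld - nsw).total_seconds() / 3600
print(gap_hours)  # 1.0
```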

Precipitation
AWAP and SILO precipitation are provided in the files precipitation_awap.csv and precipitation_silo.csv, respectively (Table 2). Users interested in a comparison of AWAP and SILO precipitation are referred to Tozer et al. (2012), who note that the two products vary due to differences in interpolation methods, as noted above. They also assess the impact of adopting these gridded products on rainfall-runoff modelling outcomes, which may be of interest to CAMELS-AUS users.
One further rainfall-related time series file is precipitation_var_awap.csv, which provides, for each day, the spatial variance due to differences between grid cell values within a given catchment. This analysis was conducted using the tool AWAPer (Peterson et al., 2020), and the outputs can be used to understand how representative areal averages are across a given catchment and how this varies with time.
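As an illustration of what a daily spatial mean and spatial variance represent, the sketch below computes both across the grid cells of one catchment. It is not the AWAPer implementation: it treats all cells as equally weighted and uses the population variance, both of which are assumptions, and the grid values are made up.

```python
from statistics import fmean, pvariance


def daily_spatial_stats(cells_by_day):
    """For each day, return (mean, variance) across a catchment's grid cells.

    cells_by_day: one list of grid-cell values per day.
    Cells are equally weighted and the population variance is used.
    """
    return [(fmean(cells), pvariance(cells)) for cells in cells_by_day]


# Made-up precipitation (mm/d) for four grid cells over two days.
grid_values = [
    [0.0, 0.0, 0.0, 0.0],  # dry day: no spatial variability
    [2.0, 4.0, 6.0, 8.0],  # wet day with a spatial gradient
]
stats = daily_spatial_stats(grid_values)
print(stats)  # [(0.0, 0.0), (5.0, 5.0)]
```

A large variance relative to the mean on a given day suggests that the areal average is a poor summary of rainfall across the catchment for that day.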

Evaporative demand
As noted, evaporation and evapotranspiration variables are provided by SILO only (Table 2). SILO provides PET estimates for the FAO56 short-crop (Food and Agriculture Organization of the United Nations, 1998) and ASCE tall-crop (ASCE, 2000) methodologies, in addition to three evapotranspiration formulations from Morton (1983), namely point potential, areal wet environment potential and areal actual. Three additional evaporation products are also provided, namely Morton (1983) shallow-lake evaporation, interpolated Class A pan evaporation (which only covers the measured period, 1970 onwards) and synthetic Class A pan evaporation extended to the full SILO period using the method of Rayner (2005). See Table 2 for adopted file names.

Other time series
AWAP time series are provided for a further four variables: daily maximum temperature, daily minimum temperature, vapour pressure (1950 onwards) and solar radiation (1990 onwards). Solar radiation AWAP data have numerous gaps, which have been filled by the average Julian day value: for example, if 5 March is missing, we adopt the average value over all non-missing instances of 5 March. SILO time series are provided for the following variables: daily maximum temperature, daily minimum temperature, vapour pressure, vapour pressure deficit, solar radiation, mean sea level pressure (1957 onwards), relative humidity at time of maximum temperature and relative humidity at time of minimum temperature. See Table 2 for adopted file names.
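The gap-filling rule described for solar radiation can be sketched as follows. This is an illustrative reading of the rule rather than the code used to build the dataset; the function name is hypothetical, and grouping by calendar month and day (so that 5 March always matches 5 March, including in leap years) is an assumption.

```python
import numpy as np
import pandas as pd


def fill_with_calendar_day_mean(series):
    """Fill missing values with the mean of all non-missing values
    recorded on the same calendar month and day in other years."""
    keys = [series.index.month, series.index.day]
    # transform("mean") skips NaN, so only observed values enter each mean.
    calendar_day_mean = series.groupby(keys).transform("mean")
    return series.fillna(calendar_day_mean)


# Three instances of 5 March, one of which is missing.
dates = pd.to_datetime(["1990-03-05", "1991-03-05", "1992-03-05", "1990-03-06"])
radiation = pd.Series([10.0, np.nan, 14.0, 7.0], index=dates)
print(fill_with_calendar_day_mean(radiation))
```

In this example the missing value for 5 March 1991 is replaced by 12.0, the mean of the 1990 and 1992 values for the same date.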

Catchment attributes
The following sub-sections, along with Tables 3 and 4, summarise the set of CAMELS-AUS catchment attributes. Spatial distributions of selected attributes are mapped in Fig. 4.
We note that the CAMELS-AUS dataset owes much to the earlier work of Stein et al. (2011), whose National Environmental Stream Attributes project calculated a broad variety of catchment attributes spatially across Australia, 74 of which are included in the CAMELS-AUS dataset. Stein et al. (2011) calculated these for the upstream area of each stream segment in Australia based on a 250k scale stream and catchment dataset (the BOM Geospatial Fabric v2.1; http://www.bom.gov.au/water/geofabric/, last access: 30 July 2021), and the contribution of the CAMELS-AUS project for the 74 indices is limited to (i) spatially matching each outlet to the appropriate segment (of which there are 1.4 million to choose from) and (ii) sorting through the attributes to identify those relevant to CAMELS-AUS (e.g. not all Stein et al., 2011, attributes relate to the upstream catchment area; others relate to the local area immediately around the stream segment and are thus irrelevant as CAMELS-AUS attributes in nearly all cases).

Climatic indices and streamflow signatures
A total of 11 climatic indices are provided, as listed in Table 3, calculated using the same code used in the original CAMELS (Addor et al., 2017). The code requires input time series of precipitation, temperature and PET, and for this purpose AWAP was used where available (precipitation, temperature), and for PET, SILO Morton Areal Wet Environment PET was used (this combination of inputs is consistent with past modelling studies such as Fowler et al., 2016, 2018, 2020b). Likewise, 13 streamflow signature indices are provided, as listed in Table 3, also calculated using code from Addor et al. (2017). Together, the climatic and streamflow indices cover a wide range of statistics commonly used to characterise hydroclimate in modelling and regionalisation studies, and their common formulation with Addor et al. (2017) aids intercontinental comparison.

Geology and soils
Geology data are taken from Stein et al. (2011), who in turn derived these data from the 1 : 1 000 000 scale Surface Geology of Australia. In Table 4 this dataset is cited for brevity as Geoscience Australia (2008), but here we acknowledge the detailed state-by-state work of Liu et al. (2006), Raymond et al. (2007a, b, c), Stewart et al. (2008) and Whitaker et al. (2007, 2008). For each catchment the proportion taken up by each of the seven geological types is provided as separate attributes. Additionally, we follow Alvarez-Garreton et al. (2018) in defining separate categorical attributes for the primary and secondary geological units (see Fig. 4j for a map of the primary types) with their respective areas defined as separate numerical attributes.
Soil data are taken from a variety of sources. The soil depth attribute (SolumThickness) is based on the Atlas of Australian Soils (Isbell, 2002), which divides Australia into soil "map units", each with associated "principal profile forms" (ppfs) in order of dominance. In turn, the dataset provides estimates (McKenzie et al., 2000) of the distribution of solum thicknesses (as 5th, 50th and 95th percentiles) associated with each ppf. The CAMELS-AUS SolumThickness is defined as a spatial average across the map units that occur in the catchment, where the depth assumed for a given map unit is the median value for its dominant ppf. Soil saturated hydraulic conductivity (ksat) and water holding capacity (solpawhc) are taken from Stein et al. (2011), who in turn derived these data from Soil Hydrologic Properties of Australia (Western and McKenzie, 2004).
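The SolumThickness definition can be read as an area-weighted mean, which the following minimal sketch illustrates. The function name, the input layout and the choice of area weighting as the form of "spatial average" are assumptions made for illustration; the numbers are invented.

```python
def catchment_solum_thickness(map_units):
    """Area-weighted mean solum thickness for one catchment.

    map_units: iterable of (area_km2, median_thickness_m) pairs, one per
    soil map unit intersecting the catchment, where the thickness is the
    median (50th percentile) for the unit's dominant ppf.
    """
    units = list(map_units)
    total_area = sum(area for area, _ in units)
    if total_area <= 0:
        raise ValueError("total map unit area must be positive")
    return sum(area * depth for area, depth in units) / total_area


# Hypothetical catchment covered by two map units.
print(catchment_solum_thickness([(30.0, 1.0), (10.0, 0.6)]))
```

Here the larger map unit dominates, giving a catchment value of about 0.9 m.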

Topography and geometry
Maximum elevation and average elevation are each taken from Stein et al. (2011), but because the gauging stations themselves are not features in the Stein et al. dataset, we calculate the elevation at the outlet separately. Catchment slope is calculated as the spatial average of the slope product of Gallant et al. (2012), which is itself based on the 1 s SRTM DEM. Stein et al. (2011) provide a variety of attributes related to the geometry of the catchment and/or stream network. Each of these is based on the geometry of the streams and catchments defined in the BOM's Geospatial Fabric v2.1 (http://www.bom.gov.au/water/geofabric/download.shtml, last access: 30 July 2021), which itself is based on the 9 s (approximately 270 m) DEM of Hutchinson et al. (2008). The attributes are (i) maximum flow path length upstream from the outlet (upsdist); (ii) stream density; (iii) Strahler (1957) stream order at outlet; (iv) elongation ratio; (v) relief, here defined as the ratio of the mean and maximum elevations above the outlet; and (vi) relief ratio, here defined as elevation range divided by flow path distance.
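The two relief attributes defined above can be written out as a short sketch. The function names, argument names and units are assumptions, as is the interpretation that the elevation range runs from the outlet to the highest point of the catchment.

```python
def relief(outlet_elev_m, mean_elev_m, max_elev_m):
    """Ratio of mean to maximum elevation, both measured above the outlet."""
    max_above_outlet = max_elev_m - outlet_elev_m
    if max_above_outlet <= 0:
        raise ValueError("maximum elevation must exceed outlet elevation")
    return (mean_elev_m - outlet_elev_m) / max_above_outlet


def relief_ratio(outlet_elev_m, max_elev_m, flow_path_m):
    """Elevation range divided by flow path distance (same length units)."""
    if flow_path_m <= 0:
        raise ValueError("flow path distance must be positive")
    return (max_elev_m - outlet_elev_m) / flow_path_m


# Hypothetical catchment: outlet at 200 m, mean 500 m, summit 1200 m,
# longest upstream flow path 40 km.
print(relief(200.0, 500.0, 1200.0))
print(relief_ratio(200.0, 1200.0, 40_000.0))
```

Both quantities are dimensionless provided that elevations and flow path distance share the same length unit.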
Further attributes are defined based on the multi-resolution valley bottom flatness (MRVBF) index of Gallant et al. (2012). As the name indicates, the index relates to the shape of the landscape and the degree of deposited sediment. As explained in Table 4, the index values contrast erosional (MRVBF = 0) locations with depositional (MRVBF > 0) locations ranging from "small hillside deposits" (MRVBF = 1) through to "extensive depositional basins" (MRVBF = 9). A total of 10 separate attributes are defined based on each integer value (0, 1, ..., 9) that MRVBF can take, indicating the proportion of the catchment in the given class. Lastly, using an earlier MRVBF version, Stein et al. (2011) analysed how common it is for a stream to pass through erosional landscapes (MRVBF = 0) and defined this as an additional attribute, "confinement".
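The 10 class-proportion attributes can be illustrated with a minimal sketch that counts grid cells per MRVBF class. The function name is hypothetical, and treating all cells as having equal area is an assumption of this sketch.

```python
import numpy as np


def mrvbf_class_proportions(mrvbf_cells):
    """Fraction of catchment cells in each MRVBF class 0 to 9.

    Returns a length-10 array whose entries sum to 1. Cells are assumed
    to have equal area.
    """
    cells = np.asarray(mrvbf_cells, dtype=int).ravel()
    if cells.size == 0:
        raise ValueError("no cells supplied")
    if cells.min() < 0 or cells.max() > 9:
        raise ValueError("MRVBF classes must lie between 0 and 9")
    counts = np.bincount(cells, minlength=10)
    return counts / cells.size


# Hypothetical catchment of eight cells, half of them erosional (class 0).
proportions = mrvbf_class_proportions([0, 0, 0, 1, 1, 9, 4, 0])
print(proportions)
```

In this example half of the catchment is erosional, and the remaining cells are spread over classes 1, 4 and 9.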

Land cover and vegetation
Land cover and vegetation attributes are primarily based on the Dynamic Land Cover Dataset (DLCD), v2 of Lymburner et al. (2015). Across Australia, the DLCD maps 22 land cover classes using MODIS satellite data over rolling 2-year windows, providing 13 separate time slices (January 2002-December 2003, January 2003-December 2004, ..., January 2014-December 2015). The CAMELS-AUS dataset incorporates these data in three ways:
1. A separate attribute is defined for each land cover class, where the attribute value indicates the temporal average proportion of the catchment taken up by the class over the 13 time slices.
2. Since "proportion forested" is an oft-used catchment attribute, a separate attribute is defined as the sum of the four DLCD classes which mention trees ("trees - closed", "trees - open", "trees - scattered" and "trees - sparse").
3. The time series data themselves are provided in full for each catchment, in a separate spreadsheet Land-cover_timeseries.xlsx.
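The first two uses of the DLCD data listed above can be sketched with a small table of class proportions per time slice. The column labels, the function name and the "other" class are placeholders chosen for illustration and may not match the labels used in the land cover spreadsheet; only two time slices are shown rather than 13.

```python
import pandas as pd

# Labels of the four tree classes; the exact spelling is an assumption.
TREE_CLASSES = ["trees - closed", "trees - open",
                "trees - scattered", "trees - sparse"]


def land_cover_attributes(slices):
    """Temporal mean proportion per class and the forested proportion.

    slices: DataFrame with one row per time slice and one column per
    land cover class, holding the proportion of the catchment in that class.
    """
    class_means = slices.mean(axis=0)
    # Tree classes absent from the catchment contribute zero.
    forested = class_means.reindex(TREE_CLASSES, fill_value=0.0).sum()
    return class_means, forested


example = pd.DataFrame(
    {"trees - closed": [0.2, 0.4],
     "trees - open": [0.3, 0.1],
     "other": [0.5, 0.5]},  # placeholder for all non-tree classes
    index=["2002-2003", "2003-2004"],
)
means, forested = land_cover_attributes(example)
print(means)
print(forested)
```

Averaging before summing gives the same forested proportion as summing the tree classes in each slice and then averaging, because both operations are linear.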
The DLCD dataset is complemented by data from Stein et al. (2011), in turn sourced from the National Vegetation Information System (NVIS; Department of Environment and Water Resources, 2008). Stein et al. (2011) report the proportion of the catchment occupied by NVIS "major vegetation sub-groups" (categories are grasses, forests, shrubs, woodlands and bare). This has considerable overlap with the DLCD; it is included because the NVIS also estimates the proportion of these vegetation types that existed in the catchment's "natural" state (pre-1750; note this is pre-European but not pre-Indigenous settlement). For each of the five categories, the NVIS provides natural pre-1750 ("_n") and "extant" (meaning current, "_e") statistics.

Anthropogenic influences
Anthropogenic influences are relevant to CAMELS-AUS because some catchments are minimally disturbed (e.g. pre-European vegetation cover, few roads), while others, although unregulated, are nonetheless significantly changed from their natural state (e.g. due to agricultural land use, small private (farm) dams, small towns and/or paved roads). Data on anthropogenic influences are taken from Stein et al. (2011) and based on earlier work with the same lead author (Stein et al., 2002). The earlier study aimed to identify the "wild" rivers of Australia by quantifying human impacts on two broad categories: the flow regime (sub-categories: impoundments, flow diversions and levee banks) and the catchment (sub-categories: infrastructure, settlements, extractive industries and land use). Following the same method, Stein et al. (2011) provide a unitless index varying between zero and one to quantify human effects in each of these categories and sub-categories, all of which are in CAMELS-AUS.
In addition to the Stein et al. (2002) indices, one further attribute from the Stein et al. (2011) dataset is included in CAMELS-AUS: the length of river upstream before encountering a dam. Although most of the current catchments lack large dams (and thus this will be the same as upsdist; see Sect. 3.7.2), it is possible that future releases may include catchments that are marginally regulated, and the index might be relevant in these cases.

Other catchment attributes
This final category contains indices that do not easily fit in one category or that fit into more than one. The attributes quantifying human population are included here as they are relevant to both the land cover category and the anthropogenic influences but fit neatly into neither. These population attributes, taken from Stein et al. (2011), are based on aggregation of census population to 9 s grid squares and quantify the spatial average, the maximum grid value present in the catchment, and the proportion of grid squares exceeding 1 and 10 people km⁻². A further inclusion is the erosivity, which is primarily a climatic attribute but is often used by studies associated with the soil category. The erosivity is taken from Stein et al. (2011) and in turn from the National Land and Water Resources Audit (National Land and Water Resources Audit, 2001).
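The four population statistics described above can be illustrated with a minimal sketch operating on the gridded population density values within one catchment. The function name and dictionary keys are hypothetical and do not correspond to attribute names in the dataset; equal-area grid squares are assumed.

```python
import numpy as np


def population_attributes(density_per_km2):
    """Summary statistics of gridded population density in a catchment.

    density_per_km2: population density (people per square kilometre)
    for each grid square in the catchment.
    """
    density = np.asarray(density_per_km2, dtype=float).ravel()
    if density.size == 0:
        raise ValueError("no grid squares supplied")
    return {
        "mean": density.mean(),
        "max": density.max(),
        "prop_gt_1": (density > 1).mean(),    # share of squares above 1
        "prop_gt_10": (density > 10).mean(),  # share of squares above 10
    }


# Hypothetical catchment of four grid squares.
print(population_attributes([0.0, 0.5, 2.0, 20.0]))
```

For this example the mean density is 5.625 people km⁻², half of the grid squares exceed 1 person km⁻² and a quarter exceed 10 people km⁻².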
Finally, there are two further sub-categories of attributes: growth indices of plants and net primary productivity statistics. The growth indices of plants, compiled by Stein et al. (2011) and calculated using the Australian National University's ANUCLIM programme (Xu and Hutchinson, 2011), quantify the suitability of growing conditions (and the seasonality thereof) for three types of plants: megatherm (plants living in relatively high temperatures year-round), mesotherm (plants living in seasonally high temperatures) and microtherm (plants living in low temperatures). Net primary productivity (NPP) statistics are provided from Stein et al. (2011) and based on Raupach et al. (2002). NPP is defined by Raupach et al. (2002) as "plant photosynthesis less plant respiration ... the carbon or biomass yield of the landscape" and "the most important driver of the coupled balances of water, C, N and P". Although Raupach et al. (2002) quantified both baseline (pre-agricultural) and current NPP, only the baseline figures were processed by Stein et al. (2011). The attributes include the annual average NPP in addition to averages for each calendar month separately.

Data availability
The CAMELS-AUS dataset is freely available for download from the Pangaea online repository at https://doi.org/10.1594/PANGAEA.921850 (Fowler et al., 2020a). The dataset can only be downloaded via Pangaea's "view dataset as html" option, not "download dataset as tab-delimited text". The dataset (along with datasets on which it is based) is subject to a Creative Commons BY (attribution) licence agreement (https://creativecommons.org/licenses/, last access: 31 July 2021).

Conclusions
This paper introduces a new freely available dataset for Australia, CAMELS-AUS. It is the fifth CAMELS dataset worldwide and the first large-sample hydrology dataset for Australia to include data on climatic forcing, catchment attributes and gauging uncertainty. CAMELS-AUS provides time series data (streamflow and 18 climatic variables) and a broad set of 134 attributes for 222 unregulated catchments from across Australia. Given the unique hydroclimate of Australia, with high hydroclimatic variability and many case studies of multi-year drought, it is hoped that the release of this dataset will accelerate progress in such fields as arid-zone hydrology and the study of hydrology under a changing climate.
Author contributions. KJAF and NA conceived the dataset with the support of MCP. KJAF, NA and MCP designed the dataset. KJAF, CC and SCA analysed and compiled the hydrometeorological time series and catchment attribute data. MCP analysed earlier work (McMahon and Peel, 2019) to provide the uncertainty estimates included in the dataset. KJAF wrote the initial draft of the manuscript, and all co-authors edited and amended it to provide the final paper.
Competing interests. The authors declare that they have no conflict of interest.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements. The authors acknowledge the Bureau of Meteorology, Australia, for their support and permission to undertake this project on the Hydrologic Reference Stations. The authors acknowledge the excellent work of Janet Stein and co-authors (Stein et al., 2011), to whom this dataset owes more than half the listed catchment attributes. Keirnan Fowler acknowledges funding from the Australian Research Council (grant no. LP170100598) and the Bureau of Meteorology (grant no. TP705654) during the period of preparation of this paper. Keirnan Fowler appreciates the support of Gemma Coxon (University of Bristol), who encouraged his foray into large-sample hydrology. Nans Addor acknowledges support from the Swiss National Science Foundation (fellowship no. P400P2_180791). The authors acknowledge Jan Seibert and the two anonymous reviewers, whose feedback significantly improved the manuscript.
Financial support. This research has been supported by the Australian Research Council (grant no. LP170100598); the Bureau of Meteorology, Australia (grant no. TP705654); and the Swiss National Science Foundation (fellowship no. P400P2_180791).

Figure 1. Location of the 222 CAMELS-AUS flow gauging stations and catchments, along with mean annual precipitation (from Jones et al., 2009) and Australian states and territories.

Figure 2. Mean annual values of hydrological variables for the global set of 699 catchments presented by Peel et al. (2010; n = 123 for Australia, n = 576 for the rest of the world). Boxplots show the 5th, 25th, 50th, 75th and 95th percentiles. Potential evapotranspiration in this dataset is a reference crop estimate using a method similar to Hargreaves' method, as outlined in Adam et al. (2006).

Figure 3. Plot after Coxon et al. (2020) showing (a) number of stations with percentage of available streamflow data for different periods and (b) length of the flow time series for each gauge.

Table 1. Basic catchment information provided in the attribute table of CAMELS-AUS.
Attribute: Description
lat_outlet, long_outlet: Latitude and longitude at outlet. Note that in most cases this will be slightly different to the BOM-published value because most outlets needed to be moved onto a digital streamline in order to facilitate flow path analysis.
lat_centroid, long_centroid: Latitude and longitude at centroid of the catchment.
map_zone: Map zone used to calculate catchment area (function of longitude).
catchment_area: Area of upstream catchment in square kilometres.
state_outlet: Indicates which state or territory of Australia the outlet is within.
state-alt: If the catchment crosses a state or territory boundary, the alternative state or territory is listed here; otherwise "n/a", meaning not applicable.
daystart: Time (UTC) for midnight local standard time (for state_outlet). This is the day start time for Tmax and Tmin (see Sect. 3.5.2).
daystart_P: Time (UTC) for 09:00 local standard time (for state_outlet); 09:00 is when once-per-day precipitation measurements are reported (see Sect. 3.5.2).
daystart_Q: Time (UTC) for streamflow day start time, assuming local standard time for state_outlet. This varies by state or territory (Sect. 3.5.2).
nested_status: "Not nested" indicates the catchment is not contained within any other. "Level1" means it is contained within another, except in cases where it is contained in another "Level1" catchment, in which case it is marked "Level2". There are no "Level3" catchments in the present dataset.

Table 2. Hydrometeorological time series data supplied with CAMELS-AUS. All time steps are daily. All non-streamflow data were processed as part of the CAMELS-AUS project to extract catchment averages from Australia-wide Australian Water Availability Project (AWAP) and Scientific Information for Land Owners (SILO) grids.
Scientific Information for Land Owners (SILO) project, Government of Queensland (Jeffrey et al., 2001; http://www.longpaddock.qld.gov.au, last access: 30 July 2021); SILO provides 0.05° grids.
To calculate catchment areas, the catchment boundaries were first projected into the appropriate local coordinate system under the Map Grid of Australia (MGA). Due to Australia's size, the MGA defines different coordinate systems based on longitude. Using the catchment centroid, each catchment was placed within a zone, and this zone was used to calculate area using the standard tool within Esri's ArcMap. Inspection of catchment boundaries revealed that some of the catchments are "nested" (i.e. entirely contained) within others, for example, when two gauges lie on the same stream (one downstream of the other) and both have been included

Table 3. Flow uncertainty information, climatic indices and streamflow signatures provided in the attribute table of CAMELS-AUS.

Table 4. Catchment attributes included in the attributes table of CAMELS-AUS (apart from climatic and hydrologic indices).

It is common practice in large-sample hydrology studies to derive climate time series inputs by processing gridded data rather than directly using gauged point information (as is still common in industry). The first Australia-wide gridded climate product was the Scientific Information for Land Owners (SILO) project of the government of the State of Queensland (Jeffrey et al., 2001). Later, the BOM developed a separate set of climate grids under the Australian Water Availability Project (AWAP; Jones et al., 2009). SILO and AWAP are similar: they are both interpolated products based purely on the BOM's climate monitoring sites and (where relevant)