Introducing GloRiSe – a global database on river sediment composition

Rivers transport dissolved and solid loads from terrestrial realms to the oceans and between inland reservoirs, representing major mass fluxes on Earth’s surface. The composition of river water and sediment provides clues to a plethora of Earth and environmental processes, including weathering, erosion, nutrient and carbon cycling, environmental pollution, reservoir exchange, and tectonic cycles. While there are documented, publicly available databases for riverine dissolved and suspended nutrients, there is no openly accessible, geo-referenced database for riverine suspended sediment composition. Here, we present a globally representative set of 2828 suspended and bed sediment compositional measurements from 1683 locations around the globe. This database, named Global River Sediments (GloRiSe) version 1.1, includes major, minor and trace elements, along with mineralogical data, and provides time series for some sites. Each observation is complemented by metadata describing geographic location, sampling date and time, sample treatment, and measurement details, which allows for grouping and selection of observations, as well as for interoperability with external data sources, and improves interpretability. Information on references and unit conversion makes the database comprehensible. Notably, the close to globe-spanning extent of this compilation allows the derivation of data-driven, spatially resolved global-scale conclusions about the role of rivers and processes related to them within the Earth system.

As transporters of solutes and solids, rivers play a major role in terrestrial weathering and erosion and thereby in the global carbon cycle, mediating atmospheric greenhouse gas concentrations and thus the climatic stability of the Earth system on geologic time scales (Berner, 2003; Caves Rugenstein et al., 2019; Isson et al., 2020). Riverine suspended sediment fluxes of organic carbon (Berner, 1982; Hilton and West, 2020), phosphorus (Berner, 1999; Froelich et al., 1982) and biogenic silica (Conley, 2002) are considered dominant terms in their global budgets, and a similar importance was proposed for the riverine particulate fluxes of calcium (Gislason et al., 2006), strontium (Jones et al., 2012) and inorganic carbon (Middelburg et al., 2020). Moreover, significant amounts of divalent cations weakly bound to negatively charged (clay) mineral surfaces are transported downstream with the suspended particles, which complicates estimates of weathering rates (Cerling et al., 1989; Tipper et al., 2021).
Superimposed on the lithological composition of the catchment, the interplay of terrestrial weathering and erosion defines the composition of water and sediment input to rivers and the magnitude of the corresponding fluxes. Therefore, dependencies of weathering and erosion can be studied by comparing river composition with environmental variables (Gaillardet et al., 1999; Hartmann et al., 2014b; Romero-Mujalli et al., 2019). Extensive datasets describing riverine hydrochemistry (GLORICH, Hartmann et al., 2014a; nitrogen and phosphorus, McDowell et al., 2020a) and environmental variables (e.g. HydroBasins, Linke et al., 2019) have recently been established. Databases on water and sediment discharge have also been developed (GEMS-GLORI, Meybeck and Ragu, 1997; Milliman and Farnsworth, 2011; Land2Sea, Peucker-Ehrenbrink, 2009). In contrast, established major and trace element budgets of river suspended sediment (Martin and Meybeck, 1979; Savenko, 2007; Viers et al., 2009) are not easily (if at all) accessible, comprehensible or fully traceable, leaving much of their potential unexplored. To our knowledge, no current database summarizes the mineralogical and petrographic composition of global river sediments.
To fill this gap, we compiled major, minor and trace element, mineralogical and petrographic data from 108 published articles, most of them peer-reviewed (104), covering 2828 suspended and bed sediment samples taken at 1683 different locations worldwide between 1874 and 2016. Complementary metadata provide a spatio-temporal context and facilitate the traceability of each data point, grouping and selection operations within the database, and its interoperability with other data sources. The database is named Global River Sediments (GloRiSe).
In practice, these data may be used, for example, for (spatial) statistical modelling and model testing, to complement local or regional datasets, to explore and compare time series at different locations, to screen the potential for field studies, to assess anthropogenic pollution, or to characterize the material continuously transported to the global ocean and freshwater reservoirs. Notably, the globe-spanning extent of the database allows the derivation of data-driven global-scale conclusions about the role of rivers and processes related to them within the Earth system. In this article, we explain the derivation, harmonization and structure of the database, comment on its extent and limitations, and complement this with an example application: the calculation of the major element composition of the annual riverine suspended sediment export to the ocean.

Data collection
Data were collected during a literature survey, conducted between March and September 2020, of studies with local to global extent. The survey targeted suspended sediment samples, which are most representative for two potential fields of application: (a) the integrated chemical weathering history of the catchment (von Eynatten, 2004; Guo et al., 2018; He et al., 2020; Nesbitt and Young, 1982) and (b) the characterization of terrestrial riverine material transport and export (Ludwig et al., 1996; Milliman and Farnsworth, 2011). Riverbed sediments were primarily considered where no information on suspended sediment samples was found (especially in Africa) and were marked specifically. If available, grain size information was included as well (sieve and/or filter pore size, average grain size, sand/silt/clay percentage). The fine fraction, i.e. clay- and silt-sized particles, of present and past riverine deposits is often considered a compositional tracer for riverine suspended load (Fedo et al., 1995; Garzanti et al., 2015; Guo et al., 2018; Nesbitt and Young, 1982). However, hydrodynamic sorting during deposition affects sediment composition (Bouchez et al., 2011; von Eynatten et al., 2012; Galy et al., 2007; Garzanti et al., 2010, 2011; Horowitz and Elrick, 1987; Lupker et al., 2011). Dissolution, precipitation, particle break-down and resuspension along the river further alter the chemical composition of the sediment (Cole et al., 2007; Ensign and Doyle, 2006; Lupker et al., 2012; Nakato, 1990; Négrel and Grosbois, 1999; Papanicolaou et al., 2008). These findings need to be considered when selecting GloRiSe data for a specific application. Because included samples should at least resemble the complete inorganic composition of suspended matter, studies in which samples were decarbonated before analysis were excluded (e.g., Bayon et al., 2015; Liu et al., 2007, 2012). Other sample treatments, such as oxidation, fusion or digestion, were noted.
Previous compilations served as a starting point of our survey (Viers et al., 2009) or were incorporated (Martin and Meybeck, 1979; Meybeck and Ragu, 1997), depending on their content. Studies were selected that report either inorganic major and minor element composition (Si, Al, Ca, Mg, K, Na, Fe, Mn, Ti, P, S, C, loss on ignition; together 2412 observations) and/or a (semi-)quantitative mineralogical phase analysis (876 observations). Organic C, P, N and trace elements were only added when they were reported along with the major element or mineralogical data, implying that there is ample room to further expand GloRiSe using published data. Nevertheless, 1906 observations of selected trace elements and 700 observations of organic C were collected. Organic P and N are highly under-represented. When instantaneous water discharge and/or suspended sediment concentration and/or solution properties (T, pH, alkalinity, Si(OH)4, DIC, DOC, HCO3⁻, SO4²⁻, Cl⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺, calcite saturation) were reported in the same study, these were also added. Duplicates with entries of other databases were not checked. Units were converted accordingly (Appendix A). Furthermore, studies were only included if geographic coordinates were given or could be assigned using © Google Earth 2020 together with the maps and/or site names provided. For spatial averages (21 samples), coordinates in the center of the location distribution were chosen (range < 1 to ~ 5° latitude/longitude). Country and region of the measurement were also noted, following details given in the specific study. The most precise available information on sampling date is given for each observation; its resolution ranges from years to minutes but is the day or month in most cases. Seasonal (26) and annual (143) averages were also included if the original measurements were not accessible, which is especially the case for older publications (before ~ 2000).

Database structure
The structure of the database follows that of the complementary GLORICH database (Hartmann et al., 2014a), listing variables in columns and identifiers, metadata and observations in rows. It consists of six separate tables, in which samples are linked via a unique Sample_ID, which in turn can be related to a Location_ID that is identical for all observations from exactly the same site. Locations situated within the same main basin were assigned the same Basin_ID, which allows observations to be grouped by catchment without further processing in geographic information systems (GIS). A main basin comprises everything that drains into the same river that is tributary only to the ocean (or to a lake, for endorheic drainage). The Rep_ID distinguishes samples that are representative of the export to the ocean, i.e. sampled in the lower course of the main stem river but upstream of significant marine influence, from upstream and tributary observations or endorheic drainages. Marine influence was assessed using tidal maps (Matthews, 2014), coastal landforms (© Google Earth 2020) and, where available, information on salinity gradients or the assignment of freshwater endmembers. The Rep_ID also allows observations during storm or flood events to be distinguished, although this assignment relied on information provided in the source article.
All information necessary to identify and retrace a sample, such as the IDs assigned by us, the ID used in the original study, the sampling date and the references, is stored within a separate table. Information on each variable is given in each file, and technical documentation and an explanatory script can be downloaded together with the database.
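The linkage between tables described above can be illustrated with a minimal Python sketch. The identifier names Sample_ID, Location_ID, Basin_ID and Rep_ID follow the text; the table contents, the field name `SiO2_wt` and the Rep_ID coding used here (1 = representative of the export to the ocean) are hypothetical and are not taken from the database documentation.

```python
from collections import defaultdict

# Hypothetical ID table: one row per sample (values invented for illustration).
sample_ids = [
    {"Sample_ID": "S1", "Location_ID": "L1", "Basin_ID": "B1", "Rep_ID": 1},
    {"Sample_ID": "S2", "Location_ID": "L1", "Basin_ID": "B1", "Rep_ID": 1},
    {"Sample_ID": "S3", "Location_ID": "L2", "Basin_ID": "B1", "Rep_ID": 0},
    {"Sample_ID": "S4", "Location_ID": "L3", "Basin_ID": "B2", "Rep_ID": 1},
]

# Hypothetical composition table, linked to the ID table via Sample_ID.
major_elements = {
    "S1": {"SiO2_wt": 60.0},
    "S2": {"SiO2_wt": 62.0},
    "S3": {"SiO2_wt": 70.0},
    "S4": {"SiO2_wt": 55.0},
}


def group_by_basin(ids, values, variable, rep_id=1):
    """Collect one variable per Basin_ID for samples with the given Rep_ID."""
    grouped = defaultdict(list)
    for row in ids:
        obs = values.get(row["Sample_ID"])
        # Skip samples of another Rep_ID or without the requested variable.
        if row["Rep_ID"] == rep_id and obs is not None and variable in obs:
            grouped[row["Basin_ID"]].append(obs[variable])
    return dict(grouped)


by_basin = group_by_basin(sample_ids, major_elements, "SiO2_wt")
print(by_basin)
```

Because every table shares the Sample_ID key, the same pattern would extend to the other tables (for example mineralogy or trace elements) without any GIS processing.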

Extent and limitations
The database covers a representative set of downstream observations from all over the globe except for Antarctica, but leaves larger gaps in upstream and endorheic areas (Figure 1). Further gaps are identified in parts of North and Central Canada, Western USA, Central America, Western South America, Brazil, Northwestern Africa and Oceania (Figure 1). Although a few observations from South West Greenland and Central East Greenland are included, they are unlikely to be representative of Greenland sediment discharge as a whole. Upstream measurements are also not available for many parts of the world. In contrast to the globally representative spatial coverage, temporal coverage is very low, i.e. few time series are included. This is mostly because very few time series and concentration-discharge relationships are available in the literature. The number of available variables differs between samples, depending on the original purpose of data acquisition.
Consequently, depending on the user's aim, the number of suitable observations may decrease drastically: the more variables are needed, the fewer data points are available. This problem is aggravated if the database is to be combined with other (point) data sources, because the number of intersecting locations might be small or require a decrease in spatial resolution, e.g. by rounding coordinates or integrating over geographic shape layers. The database obviously does not (yet) cover the overwhelming amount of data on river sediment composition in the literature, but it is intended to be expanded further. We therefore ask users not only to provide feedback but also to contribute to the extension of the database by sending us suitable (also unpublished) data with proper metadata.

Possibilities and perspectives
To demonstrate some important possibilities and perspectives of the database, we provide an example MATLAB script, with detailed comments, for the calculation of the average major element composition of the global riverine suspended sediment export, including the combination with external data sources (Milliman and Farnsworth, 2011). In this example, we exclude bedload samples coarser than very fine sand (grain sizes > 125 µm or unknown) and split samples according to their observation type into single measurements and spatial averages (I), seasonal averages (II) and annual averages (III). Rep_ID and Basin_ID are used to select samples from river main stems that represent the basin's sediment discharge to the ocean. For each basin in subsets (I) and (II), mean and median major element concentrations are calculated, representing annual averages for each set. All subsets are then recombined, and an annual mean and median are calculated for each basin. These data are then merged with annual average sediment discharge values based on the river names (Milliman and Farnsworth, 2011) to calculate basin-wise fluxes of each major element oxide. We then calculate the global mean and median concentrations of each oxide from the basin-wise mean and median, respectively, as well as the sediment-flux-weighted (sfw) mean. These estimates are based on samples that can be considered spatially representative and cover ~ 35 % of the total global sediment flux, assuming a total of 19.1 Pg/a (Beusen et al., 2005; Milliman and Farnsworth, 2011).
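The final averaging step of this workflow can be sketched in a few lines of Python. This is not the MATLAB script distributed with the database; the basin names, concentrations and sediment fluxes below are invented, and the function name `global_averages` is hypothetical. The sketch only shows how an unweighted mean, a median and a sediment-flux-weighted mean differ once basin-wise annual averages and sediment discharges are available.

```python
from statistics import mean, median

# Hypothetical basin-wise annual mean SiO2 (wt%) and sediment discharge (Mt/a).
basins = {
    "A": {"SiO2": 60.0, "flux": 1000.0},
    "B": {"SiO2": 70.0, "flux": 100.0},
    "C": {"SiO2": 50.0, "flux": 900.0},
}


def global_averages(data, oxide):
    """Return (mean, median, sediment-flux-weighted mean) across basins."""
    conc = [b[oxide] for b in data.values()]
    flux = [b["flux"] for b in data.values()]
    # Each basin contributes in proportion to its sediment discharge.
    sfw = sum(c * f for c, f in zip(conc, flux)) / sum(flux)
    return mean(conc), median(conc), sfw


avg, med, sfw = global_averages(basins, "SiO2")
print(avg, med, sfw)
```

In this toy case the small, SiO2-rich basin B pulls the unweighted mean upwards but has little influence on the weighted mean, which is the reason for reporting both.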
Temporal variability of element concentrations in individual rivers is accompanied by an even larger temporal variability of sediment fluxes (Clark et al., 2017; Cohen et al., 2014; Eberl, 2004; Eiriksdottir et al., 2008; van der Perk and Vilches, 2020; Rousseau et al., 2019) and thus imposes a major uncertainty on the basin-wise averages, for which often only very few measurements are available (down to 1). To quantify this uncertainty, five to seven sites (depending on the element) were selected within the GloRiSe database for which time series spanning at least 10 months are available (along the Amazon, Orinoco, Rhine, Loire, Rhône and Kuji). We then calculated the sfw mean major element concentrations and determined the maximum difference and the sample standard deviation relative to this sfw mean concentration (SDsfw,t) from each time series. The mean of the extreme values ranges from 0.3 to 29.3 wt%, while the mean SDsfw,t of all series ranges from 0.1 to 8.6 wt% (Table 1). We propagated the mean SDsfw,t through the standard errors of the global averages under the worst-case assumption that only one measurement, instead of a time series, is available for each location. The error estimate is therefore regarded as an upper limit, also because many of the rivers available as time series are comparatively small and carbonate-rich (Rhine, Loire, Rhône), potentially implying a shorter response time to excursions of the flow regime. Furthermore, our error estimate inherently assumes that event-scale variability is within the month-scale variability, which should be the subject of future research. We ignore measurement errors, assuming them to be much smaller than seasonal variation. Uncertainties of sediment fluxes in the weighting procedure are also neglected, because we consider them to be of minor importance compared to inter-basin differences, which span several orders of magnitude (Cohen et al., 2014; Milliman and Farnsworth, 2011).
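As an illustration of the time-series statistics described above, the following Python sketch computes a sediment-flux-weighted mean for a single site together with the maximum difference and the sample standard deviation of the individual measurements relative to that weighted mean. The exact formula used for SDsfw,t is not given in the text, so the n - 1 denominator and the function names are assumptions, and the numbers are invented.

```python
from math import sqrt


def sfw_mean(conc, flux):
    """Sediment-flux-weighted mean concentration of one time series."""
    return sum(c * f for c, f in zip(conc, flux)) / sum(flux)


def spread_about_sfw_mean(conc, flux):
    """Return (sfw mean, maximum difference, sample SD about the sfw mean).

    Assumption: deviations are taken from the sfw mean (not the arithmetic
    mean) and the sum of squares is divided by n - 1.
    """
    m = sfw_mean(conc, flux)
    max_diff = max(abs(c - m) for c in conc)
    sd = sqrt(sum((c - m) ** 2 for c in conc) / (len(conc) - 1))
    return m, max_diff, sd


# Hypothetical monthly CaO concentrations (wt%) and sediment fluxes (kg/s).
cao = [10.0, 12.0, 14.0]
flux = [1.0, 1.0, 2.0]
m, max_diff, sd = spread_about_sfw_mean(cao, flux)
print(m, max_diff, sd)
```

Under the worst-case assumption in the text, a spread of this kind would be assigned to every basin represented by a single measurement.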
The median differs from the mean by -1.4 to 4.9 wt% and should be preferred for all oxides except for SiO2 and MnO because of their log-normal distribution. We applied this to our global sfw mean, which may be more representative of the global sediment export. Results for major and minor elements sum to 75-85 %; the rest is accounted for by organic matter, degassing during sample preparation (e.g. CO2 and H2O degassing upon fusion) and uncertainty. To estimate the fraction of major cations sorbed to negatively charged mineral surfaces instead of being truly incorporated into the solid phases, we reproduced and utilized a published linear relationship between clay-mineral controlled molar Al/Si ratios and cation exchange capacity (CEC), along with estimates of the average major element composition of this sorbed pool (Tipper et al., 2021). With a global average molar Al/Si ratio of 0.373, we arrive at an average CEC of 31.78 meq/100 g, implying that 0.72 wt% Ca, 0.11 wt% Mg, 0.02 wt% Na and 0.03 wt% K of our sfw mean estimates are derived from the sorbed pool. Our global mean estimates of SiO2, MgO and Na2O appear to be higher than previous estimates from literature, while Al2O3 and Fe2O3 mean values are lower (Table 1). [...] (2007) is not retraceable, so we cannot evaluate possible common data sources. High sample standard deviations exceeding the actual concentration for some alkali and alkaline earth elements (Table 1) [...] Compared to the continental crust (Rudnick and Gao, 2013), Na2O, K2O and CaO are depleted in riverine suspended sediments, while Al2O3, Fe2O3 and TiO2 are enriched. This can be explained by terrestrial low-temperature weathering, which transforms pristine mineral phases (e.g. feldspar, mafic minerals and calcite) into dissolved load and secondary phases (Putnis et al., 2014; Ruiz-Agudo et al., 2016).
Well-soluble elements (Na, K, Mg and Ca) will preferably be transported as dissolved load, leaving rather insoluble elements (Al, Fe, Ti) enriched in the weathering product (Gaillardet et al., 1999; Garzanti et al., 2014a; Middelburg et al., 1988; Nesbitt and Young, 1982; Stroncik and Schmincke, 2001). Consequently, the unweathered source rock is relatively rich in mobile elements and also more reactive (Brantley et al., 2008; Lasaga, 1984). To quantify the relative contribution of weathered and unweathered material to the composition of fine-grained sediment samples, reflecting weathering intensity within the sediment's source area, different chemical weathering indices have been developed (Fedo et al., 1995; Gaillardet et al., 1999; Garzanti et al., 2013, 2014b; Harnois, 1988; Nesbitt and Young, 1982; Parker, 1970). Most of them are based on the relative concentrations of mobile to quasi-insoluble elements (Fedo et al., 1995; Garzanti et al., 2014b; Harnois, 1988; Nesbitt and Young, 1982; Parker, 1970), each involving different elements and related pitfalls. Other indices rely on the distribution of elements between solution and particles (Gaillardet et al., 1999; Garzanti et al., 2013). For the current application, where we lack sufficient information on dissolved concentrations and do not expect a strong diagenetic imprint, but need to account for carbonate- and phosphate-related CaO, we use the chemical index of alteration CIX (Garzanti et al., 2014a), calculated from molar oxide concentrations:
$$\mathrm{CIX} = 100 \times \frac{\mathrm{Al_2O_3}}{\mathrm{Al_2O_3} + \mathrm{Na_2O} + \mathrm{K_2O}} \qquad (1)$$
We neglect the contributions of sorbed Na2O and K2O, because their magnitude is small compared to solid concentrations (Table 1). High CIX values imply a large contribution of weathered material, while a low CIX, similar to that of parent rocks, implies a substantial contribution of unweathered material.
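For readers who want to compute the index from the weight-percent oxide data in the database, a minimal Python sketch follows. It assumes the molar definition CIX = 100 × Al2O3 / (Al2O3 + Na2O + K2O) of Garzanti et al. (2014a); the rounded molar masses, the function name `cix` and the sample composition are illustrative choices and not part of the database.

```python
# Molar masses in g/mol (rounded standard values).
MOLAR_MASS = {"Al2O3": 101.96, "Na2O": 61.98, "K2O": 94.20}


def cix(wt_percent):
    """Chemical weathering index CIX from oxide concentrations in wt%.

    Weight percentages are first converted to molar proportions; CaO is
    deliberately left out, so carbonate- and phosphate-bound Ca does not
    affect the index.
    """
    mol = {ox: wt_percent[ox] / MOLAR_MASS[ox] for ox in MOLAR_MASS}
    return 100.0 * mol["Al2O3"] / (mol["Al2O3"] + mol["Na2O"] + mol["K2O"])


# Hypothetical suspended sediment sample (wt%).
sample = {"Al2O3": 15.0, "Na2O": 1.0, "K2O": 2.5}
print(round(cix(sample), 1))
```

Because the index is a ratio of molar amounts, it is unaffected by whether the oxide concentrations were normalized to 100 % beforehand.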
As riverine suspended sediment composition represents a mixture of eroded pristine material (parent rocks) and weathering products with river-internal processing, our lower global average CIX values imply that the exported material is less weathered and hence more reactive than anticipated. Note that the higher values from literature are not sediment-flux weighted. This observation is statistically significant with respect to the propagated error of the CIX (± 0.1 wt%) and is explainable by the rivers included in each dataset: cold-climate rivers exhibit lower chemical weathering rates in their catchment (Hartmann et al., 2014b), and mountainous rivers are characterized by steeper terrains, higher erosion rates and less soil formation, hence less chemical weathering (Milliman and Syvitski, 1992; Patton et al., 2018). The CIX calculated from the compilation of Savenko (2007) is closer to the range of our estimates, which may be explained by the inclusion of more arctic Russian rivers compared to the other literature estimates, as deduced from the literature cited therein. However, the CIX does not include SiO2, which is up to 9 wt% lower in Savenko (2007) than in our study. These explanations are consistent with the marked decrease in our sfw mean CIX compared to the mean and median estimates, because the former is largely influenced by a few large rivers draining areas of high chemical weathering intensities: the Amazon, Ganga-Brahmaputra, Changjiang, Congo, Irrawaddy, Orinoco, Magdalena, Mekong and Godavari together already deliver ~ 20 % of the global sediment flux, and South East Asian drainages contribute as much as 60 % of the global sediment budget (Milliman and Farnsworth, 2011).

Table 1. Global average major element composition of riverine suspended matter discharged to the ocean, in weight % relative to total suspended matter and normalized to 100 % excluding organic matter and oxides often lost during preparation (e.g. CO2 and H2O). The mean, median and sediment-flux-weighted average of this study are compared to previous estimates from literature. T denotes the total concentration (ferric + ferrous Fe or organic + inorganic P). CIX is a chemical alteration index calculated on a molar basis.

[Table 1 body not recovered; surviving column headers: SiO2, Al2O3, Fe2O3T]

[...] Müller et al., 2021) and https://github.com/GerritMuller/GloRiSe.git along with the technical documentation. MATLAB scripts of the presented calculations with all required datasets are available as a supplement to this article. We suggest downloading GloRiSe 1.0 from the Zenodo server, because it provides a DOI and a stable version. The GitHub repository serves development purposes and will be updated regularly. Large updates involving changes in the database structure will be noted by the integer number, while smaller updates (addition of samples) will be noted by the first decimal digit. For reproducibility, we strongly encourage users to state the GloRiSe version used. The MATLAB code used to produce the example presented in Section 5, and through which Figure 1 can be reproduced, is available with detailed explanatory comments on the same page.

Conclusions
We introduce GloRiSe, a quasi-global database on river sediment composition including major, minor and trace elements along with nutrients and mineralogical data, placed in a spatio-temporal context by their metadata. These metadata also allow the user to trace back data sources, methods of preparation, measurement details and unit conversion, making the database comprehensible. The dataset is intended to enable global-scale investigation of geochemical fluxes related to erosion, weathering and sediment transport, to serve as a basis for statistical modelling and model validation, to screen promising basins for investigations, or to complement other datasets. With the database, we provide a MATLAB example script for the calculation of the sediment-flux-weighted mean major element composition of riverine suspended sediment exported each year to the coastal oceans. We complement these estimates with an error analysis that accounts for the variability between basins and the uncertainty induced by limited knowledge about the relationship between sediment flux and its chemical composition.

Appendix
Units of solid concentrations were converted to weight % of sediment expressed as oxide, using molar masses and ratios in the following equation (2): [...] C denotes concentrations in mol %, M the molar mass (g/mol) and m the mass percentage relative to the bulk sediment. If concentrations were given relative to the solution volume (g/L or mol/L), they were only included if they were convertible to weight % of sediment, i.e., when the total suspended sediment concentration (TSS) was given: [...] Solution concentrations are given in µmol/L; values reported in other units were converted to µmol/L using molar masses, analogous to equation (2). Original units were noted for each entry, and methods of measurement and sample treatment before measurement are specified as stated in the reference, to make the data and the conversion comprehensible and reproducible.
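The two kinds of conversion described in this appendix can be sketched in Python as follows. These are generic stoichiometric relations and not a reproduction of the authors' equations; the function names, the small lookup table and the example numbers are hypothetical, and the molar masses are rounded standard values.

```python
# element: (oxide name, oxide molar mass g/mol, element molar mass g/mol,
#           number of element atoms per oxide formula unit)
OXIDES = {
    "Si": ("SiO2", 60.08, 28.086, 1),
    "Al": ("Al2O3", 101.96, 26.982, 2),
    "Ca": ("CaO", 56.08, 40.078, 1),
}


def element_to_oxide_wt(element, wt_percent):
    """Convert an element concentration (wt%) to its oxide equivalent (wt%)."""
    name, m_oxide, m_element, n_atoms = OXIDES[element]
    return name, wt_percent * m_oxide / (n_atoms * m_element)


def volume_to_sediment_wt(conc_g_per_l, tss_g_per_l):
    """Convert particulate mass per litre of water to wt% of sediment.

    Only possible when the total suspended sediment concentration (TSS,
    in g/L) was reported together with the measurement.
    """
    if tss_g_per_l <= 0:
        raise ValueError("TSS must be positive")
    return 100.0 * conc_g_per_l / tss_g_per_l


# Hypothetical example: 8.0 wt% Al and 0.05 g/L particulate Ca at 0.5 g/L TSS.
print(element_to_oxide_wt("Al", 8.0))
ca_wt = volume_to_sediment_wt(0.05, 0.5)
print(element_to_oxide_wt("Ca", ca_wt))
```

Concentrations reported in mol/L would first be multiplied by the molar mass of the element to obtain g/L before the TSS normalization is applied.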

Competing interests
The authors declare that they have no conflict of interest.

Disclaimer
Despite quality control we cannot guarantee that no errors occurred during the initial data acquisition and publication or during transcription into the database.