Global distribution of wastewater treatment plants and their released effluents into rivers and streams

The main objective of wastewater treatment plants (WWTPs) is to remove contaminants such as pathogens, nutrients, and organic and other pollutants from wastewaters using physical, biological and/or chemical processes prior to discharge into receiving waterbodies. However, since WWTPs cannot remove all contaminants, they inevitably represent concentrated point sources of residual contaminant loads into surface waters. To understand the severity and extent of the impact of wastewater discharges from such facilities into rivers and lakes, as well as to identify opportunities for improved management, detailed information about WWTPs is required, including (1) their explicit geospatial locations to identify the waterbodies affected; and (2) individual plant characteristics such as population served, flow rate of effluents, and level of treatment of processed wastewaters. These characteristics are especially important for contaminant fate models that are designed to assess the distribution of substances that are not typically included in environmental monitoring programs, such as contaminants of emerging concern. Although there are several regional datasets that provide information on WWTP locations and characteristics, data are still lacking at a global scale, especially in developing countries. Here we introduce HydroWASTE, a location-explicit global database of 58,502 WWTPs and their characteristics. This database was developed by combining national and regional datasets with auxiliary information to derive or complete missing WWTP characteristics, including the number of people served. A high-resolution river network with streamflow estimates was used to georeference WWTP outfall locations and calculate each plant's dilution factor (i.e., the ratio of the natural discharge of the receiving waterbody to the WWTP effluent discharge). The utility of this information was demonstrated in an assessment of the distribution of wastewaters at a global scale.
Results show that 1.2 million kilometers of the global river network receive wastewater input from upstream WWTPs, of which more than 90,000 km are downstream of WWTPs that offer only primary treatment. Wastewater ratios originating from WWTPs exceed 10% in over 72,000 km of rivers, mostly in areas of high population densities in Europe, USA, China, India, and South Africa. In addition, 2,533 plants show a dilution factor of less than 10, which represents a common threshold for environmental concern.

https://doi.org/10.5194/essd-2021-214. Earth System Science Data Discussions (Open Access). Preprint. Discussion started: 4 August 2021. © Author(s) 2021. CC BY 4.0 License.


Introduction
In all inhabited regions of the world, the water quality of rivers, lakes and ultimately the ocean depends on how wastewaters produced by human activities in upstream areas, especially those that are densely populated, are processed and disposed of.
Globally produced domestic and municipal wastewater is estimated to amount to 360 km³ year⁻¹, of which 41 km³ year⁻¹ (11.4%) is treated in wastewater treatment plants (WWTPs) and then re-used, 149 km³ year⁻¹ (41.4%) is treated in WWTPs and then discharged, and 170 km³ year⁻¹ (47.2%) is not treated in WWTPs but released directly to the environment (Jones et al., 2021). According to recent assessments, approximately 3.1 billion people worldwide had access to sewage systems connected to WWTPs in 2017 (WHO & UNICEF, 2017).
Although the overall goal of WWTPs is to reduce the load of pollutants that reach downstream waterbodies, most WWTPs are only designed to remove organic matter and macro pollutants. Thus, one of the biggest issues related to global wastewater treatment is the efficiency of removal of specific contaminants, particularly those related to new products or chemicals that are released without appropriate regulatory oversight and with uncertain or unknown effects on the environment and human health (WHO & UN Habitat, 2018). These "emerging contaminants" (e.g., pharmaceutically active compounds, microplastics, ingredients in household and personal care products) are not commonly monitored and most WWTPs are not designed to remove them either fully or partially before releasing effluents to nearby waterbodies. As a result, wastewaters are being collected from municipal sources, transported to a location where they may or may not be treated, and then released into the environment, thereby causing the WWTP to serve as a concentrated point source of contamination of receiving waterbodies (Daughton and Ternes, 1999; Musolff et al., 2008). Once the contamination enters the river network it continues to flow downstream, potentially accumulating with other contaminants from multiple sources along the way, to sometimes deleterious effects.
Studies have demonstrated that the fraction of wastewater is directly proportional to effects on biodiversity and ecosystems in rivers downstream of effluent discharge (Neale et al., 2017; Bunzel et al., 2013). Therefore, the dilution factor (i.e., the ratio between the natural discharge of the receiving waterbody and the WWTP effluent discharge) is one of the major determinants of ecological risks originating from WWTPs (Link et al., 2017). Dilution factors have been used to predict potential exposure to down-the-drain chemicals from population density (Keller et al., 2014), which at a regional level can help prevent negative effects and determine hotspots of contamination. However, to identify which particular WWTPs should be targeted for the implementation of more stringent treatment standards and/or be upgraded through the deployment of advanced treatment technologies, it is necessary to first determine where treated effluents are being discharged in order to pinpoint which individual waterbodies downstream are potentially affected by their wastewaters. For example, Rice and [...] can also account for large-scale drivers that might not be captured by small-scale models. One of the main challenges for global water quality modelling is the lack of spatial consistency in datasets for model inputs, especially in regions where data are insufficient for a detailed assessment (Tang et al., 2019; Kroeze et al., 2016). Due to the limited information on global wastewater, all published global water quality models until now (e.g., Global NEWS, WorldQual, GlowPa, IMAGE-GNM) quantify the load of wastewater into the river system using population density and national sanitation statistics as proxies (e.g., Font et al., 2019; Strokal et al., 2019; Mayorga et al., 2010; Van Drecht et al., 2009; Williams et al., 2012; Beusen et al., 2015; Hofstra et al., 2013). More specifically, calculations are based on the fractions of population connected to sewage systems per country.
To address this important shortcoming, the objective of the present study is to develop a novel global database of WWTPs as a means for estimating the distribution of wastewaters in the global river network at high spatial resolution. The database, termed HydroWASTE, includes the explicit geospatial locations of WWTPs, their linkages with the global river and lake network, as well as their main characteristics.

Development of HydroWASTE
To create HydroWASTE, three main steps were undertaken, as shown in Fig. 1: (1) the combination of national and regional datasets, including the correction of errors using the WWTP point locations and attributes available; (2) the georeferencing of WWTPs to a global river network, in order to connect the facilities to their receiving waterbodies; and (3) the estimation of missing attributes for each WWTP, including population served, wastewater discharge and level of treatment, using geospatial methods and auxiliary datasets such as modeled river discharge estimates, gridded global population numbers, gross national income per capita, and country-level statistics on sanitation.
The design of HydroWASTE was tailored for its potential application in water quality modelling. The main attributes that are typically required to simulate the wastewater component in water quality models include (Grill et al., 2018): (1) the WWTP's location (point coordinates); (2) the estimated effluent outfall location (linkage between WWTP and river network); (3) the number of people served by the WWTP; (4) the amount of wastewater discharged; and (5) the level of treatment offered by the WWTP (primary, secondary, or advanced). The WWTP location is a necessary requirement for any spatially explicit assessment that is based on point sources of effluents discharged through WWTPs. Beyond knowing the actual location of the plant, it is also important to provide the approximate effluent outfall location into the local river network, which can differ substantially from the WWTP location. The number of people served by WWTPs is required to estimate contaminant loads that reach the facility, while the wastewater discharge and the corresponding level of treatment provide the basis for calculating the contaminant loads that are discharged by the facility into receiving waterbodies. If no data concerning the population served are available, wastewater discharge can be used in lieu of this, provided that a reasonable conversion factor between the two can be estimated (see Section 2.1.4 below). Some of these attributes can be directly compiled from [...]

After intensive literature and online searches, several national (or multi-national/regional in the case of Europe) WWTP datasets were identified that provide the geographic location of WWTPs, as well as a varying list of additional attribute information such as population served, amounts of effluents discharged, and level of treatment (Table 1).
In cases of multiple datasets being available for the same country, such as in the case of the USA or for individual European countries, the most comprehensive or most consistent dataset was chosen rather than merging all available data in order to avoid issues of duplicate records. In most cases, datasets were retrieved from pertinent government agencies through publicly accessible website platforms or personal communication. The quality, completeness, and consistency of the datasets strongly vary among the different sources and nations. For all countries where no national data repositories were available, WWTP point locations (without further attribute information) were added from the open-source web platform of Open Street Map (OSM; https://www.openstreetmap.org/).
The selected datasets listed in Table 1 use different attribute nomenclatures and reporting units. For example, in the European dataset, the population size is reported in 'population equivalents'; that is, it assumes one person produces 54 grams of dissolved organic pollutants, expressed as biological oxygen demand (BOD) per 24 hours. Therefore, it accounts not only for permanent residents of the surrounding area, but also for ambient populations, i.e. for differences between daytime and night-time populations, including tourists (Nakada et al., 2017). The term 'population served', as used in most national datasets, generally refers to the population physically connected to the particular WWTP, thus paying fees for the service (Daughton, 2012).
Filtering was necessary for some datasets that include additional records not regarding WWTPs, especially for the most comprehensive datasets for the USA and Europe. These datasets include records of decentralized wastewater treatment systems, stormwater facilities, and other wastewater collection systems that are not connected to a WWTP. Some datasets include records with geographic coordinates outside the expected national or regional boundaries, which were assumed to be errors and removed from HydroWASTE. More details about each dataset can be found in the Supplementary Information, section S1.

Auxiliary datasets

a) River network attributes
To assign the estimated effluent outfall location of each WWTP, various raster and vector layers representing the river network and catchment boundaries were obtained from the global HydroSHEDS database (Lehner et al., 2008), which was derived from digital elevation data provided by NASA's Shuttle Radar Topography Mission (SRTM) at 90 m (3 arc-second) resolution. For our study, we used a standardized derivative of this database, termed HydroATLAS (Linke et al., 2019), that offers sub-basin delineations at 12 hierarchical levels of increasingly finer subdivisions. We applied the smallest sub-basin breakdown of level 12, which provides 1,034,083 sub-basins globally with an average area of 130.6 km² (std. dev. 146.9 km²). HydroATLAS also offers a preprocessed river network, including discharge information, that was extracted at 500 m (15 arc-second) grid cell resolution and represents all rivers and streams where the long-term (i.e., 1971-2000) average discharge exceeds 100 L s⁻¹ or the upstream catchment area exceeds 10 km², or both. Natural river discharge estimates were provided by the global hydrological model WaterGAP (Müller Schmied et al., 2014) version 2.2 as of 2014, which were downscaled from their original resolution of 0.5° grid cells to the HydroSHEDS resolution of 500 m using geostatistical techniques (Lehner and Grill, 2013).

b) Country-level wastewater statistics
To infer missing attributes in the WWTP records, global datasets with information on wastewater at a country-level were used. Treated wastewater discharge at the country-level was provided by Jones et al. (2021) [...] to allow for spatially consistent calculations. WorldPop was produced using a combination of census, geospatial, and remotely-sensed data in a spatial modelling framework (Tatem, 2017).

d) Gross national income (GNI) per capita
The World Bank divides economies into four income groups (i.e., low, lower-middle, upper-middle, and high) based on Gross National Income (GNI) per capita (in U.S. dollars), calculated using the World Bank Atlas method (World Bank, 2019). This indicator refers not only to the economy, but also correlates with other non-monetary measures of quality of life. Here, the GNI of 2019 was used to classify countries based on their capacity to deploy different levels of wastewater treatment.

Georeferencing WWTP outfall locations to the global river network
A requirement for any spatially explicit water quality assessment that includes WWTPs is to know the approximate location at which each plant's effluents are discharged into a waterbody; i.e. typically a river, a lake, or the ocean. In reality, the location of the effluent discharge into the environment may be distinct from the WWTP's actual location, influenced by several local factors not easily obtainable and applicable at a global scale, such as environmental policies, political and social conventions, ecosystem characteristics, land use, and local conditions such as the presence of interfering pipelines and canals. As such, the reported WWTP locations used in this study are not warranted to represent their actual outfall locations, nor to intersect with the natural river network. In addition, due to inherent quality limitations of the global HydroATLAS river network, which was derived from a digital elevation model, and the applied spatial resolution of 500 m, the river location does not always correspond to reality, especially for small streams.
Given these uncertainties, we developed a rule-based procedure within a Geographic Information System (GIS) to estimate a representative point of connection between each WWTP and the river network (referred to herein as the estimated outfall location) using the following ruleset: (1) the outfall location should be within a predefined radius from the given WWTP point location; (2) only locations with average natural stream flows exceeding 100 L s⁻¹ or with an upstream catchment area exceeding 10 km² are considered as possible outfall locations to avoid allocation to very small streams; (3) if multiple options are available, priority should be given to larger rivers (under the assumption that effluents are generally directed towards larger rivers to increase dilution); and (4) the location should be within the same sub-basin as the WWTP itself to avoid mis-allocation to close rivers across a watershed divide. By design, this ruleset assigns the outfall location to be downstream of the WWTP location (towards larger rivers), yet within a maximum radius, and this downstream allocation will generally reduce cases where effluents are (possibly erroneously) assigned to very small streams, which could cause excessive estimates of wastewater concentrations in follow-up water quality assessments. We thus consider the described procedure to deliver a best-guess association within the given river network with an intended bias to deliver conservative results in terms of environmental risk studies. It is also important to note that the estimated outfall locations should not be interpreted as true and precise geographic locations.
The predefined radius wherein the estimated outfall location can be assigned to a river was set at 10 km. This choice was based on a statistical determination process using a subset of WWTPs and remote sensing imagery for manual verification (see Supplementary Information, section S2.3). If the closest location of connection to a river is further than 10 km, then the estimated outfall of the WWTP was georeferenced to that location, independent of distance, provided that all other rules still apply. In cases where the WWTP location is close to the sub-basin outlet, limiting the estimated outfall location to less than 10 km away from the WWTP location, the outfall location was additionally moved one grid cell (~500 m) further downstream; that is, into the next sub-basin and thus to a larger river, while keeping it close to the original WWTP location and in the same overarching basin (Fig. 2).
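The ruleset above can be read as a filter-then-rank step over candidate river cells. The following is a minimal sketch of that reading, not the GIS procedure actually used: the `RiverCell` record, its field names, and the use of natural discharge as the measure of river size are illustrative assumptions, and the additional one-cell shift near sub-basin outlets is not covered.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RiverCell:
    """Hypothetical candidate river cell near a WWTP (field names are assumptions)."""
    cell_id: int
    distance_km: float        # distance from the WWTP point location
    discharge_l_s: float      # long-term average natural discharge
    upstream_area_km2: float  # upstream catchment area
    sub_basin_id: int


def pick_outfall(cells: Sequence[RiverCell], wwtp_sub_basin: int,
                 radius_km: float = 10.0) -> Optional[RiverCell]:
    # Rules (2) and (4): keep cells in the WWTP's sub-basin that are not very small streams.
    eligible = [c for c in cells
                if c.sub_basin_id == wwtp_sub_basin
                and (c.discharge_l_s > 100.0 or c.upstream_area_km2 > 10.0)]
    if not eligible:
        return None
    # Rules (1) and (3): within the radius, prefer the largest river.
    in_radius = [c for c in eligible if c.distance_km <= radius_km]
    if in_radius:
        return max(in_radius, key=lambda c: c.discharge_l_s)
    # Fallback: no eligible cell within the radius, so take the closest one.
    return min(eligible, key=lambda c: c.distance_km)
```

In this sketch a nearby but very small stream is skipped in favour of a larger river inside the radius, which mirrors the intended bias towards larger receiving waters.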

Estimation of missing attributes
As a prerequisite for many applications, such as the development of a global contaminant fate model, the characteristics of WWTPs should be consistent throughout the database. Based on previous studies of contaminant fate in rivers (Grill et al., 2018; Strokal et al., 2019), the three most important attributes required to produce realistic contaminant load estimates are: (1) the number of people served; (2) total wastewater discharged by the plant; and (3) the level of treatment (i.e., primary, secondary, or advanced).
The availability of these three attributes in the original source data is highly variable between countries (Table 1). For instance, while data for the USA, New Zealand, Brazil, and China provide information on all three attributes, all other regions lack at least one of them, including Europe, India, Canada and Mexico with two attributes and large parts of Africa, South America, Asia, and Australia only offering the WWTP location. For all incomplete data records, we thus inferred the missing attributes based on auxiliary information related to wastewater, such as reported country-level statistics on water use, sanitation, and economy, as well as population distributions. Table 2 provides an overview of the extent of missing data and the auxiliary data that were used to fill the gaps. Processing steps are explained in more detail below. Note that the order in which the missing data were estimated is predetermined: we first completed the records of population served as the results then informed the estimation of wastewater discharge and level of treatment.

a) Population served
For WWTP records that did not include information on the population served by the plant, we estimated this attribute using up to three different approaches (A1, A2 and A3; see Supplementary Information, section S3 for more information), depending on data availability, based on the following assumptions: (A1) the population served is directly related to the wastewater discharge of the WWTP; (A2) the population served should reside within relatively close proximity to the WWTP; and (A3) the treatment capacity of the WWTP cannot overload the receiving river's capacity for dilution. The latter assumption is based on the fact that governments typically regulate WWTP effluents to remain within specified dilution limits to mitigate adverse effects of pollution on aquatic ecosystems downstream (Link et al., 2017; Munz et al., 2017; Neale et al., 2017). Once the different population values were estimated, the minimum value was selected to represent the limit of the WWTP's capacity in terms of population served. We chose the minimum to avoid excessive estimates of population served in subsequent water quality assessments.
For the first approach (A1) we estimated the number of people served, $P_{est}$, using the ratio between the plant's wastewater discharge, $W_{rep}$ (as reported in the WWTP national dataset), and country-level statistics of treated wastewater per capita, $U$ (as reported by Jones et al., 2021):

$$P_{est} = \frac{W_{rep}}{U} \qquad (1)$$

We tested the validity of the relationship described by Eq. (1) using countries with complete data availability (see Supplementary Information, section S3.1 for details), which confirmed a strong overall correlation (R² = 0.80; n = 28,497). If the total treated wastewater for a certain country was recorded as 0 in the reference dataset, $U$ was substituted by the average treated wastewater per capita for the countries in the same economic group based on their GNI (World Bank, 2019).
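As an illustration of approach A1, the sketch below divides a reported plant discharge by a per-capita treated wastewater value and falls back to an income-group average when the country value is zero. The function name, argument names, and units (m³ per day) are assumptions made for this example; only the ratio itself and the fallback rule come from the text.

```python
from typing import Optional


def population_from_discharge(w_rep_m3_day: float,
                              u_m3_per_capita_day: float,
                              group_mean_u: Optional[float] = None) -> float:
    """Approach A1: people served = reported discharge / per-capita treated wastewater."""
    u = u_m3_per_capita_day
    if u == 0:
        # Country reports zero treated wastewater: substitute the mean of its
        # income group, which the caller must supply.
        if group_mean_u is None or group_mean_u <= 0:
            raise ValueError("a positive income-group mean is required when U is 0")
        u = group_mean_u
    return w_rep_m3_day / u
```

For instance, a plant reporting 10,000 m³ per day in a country with 0.25 m³ of treated wastewater per person per day would be assigned 40,000 people under these assumed units.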
For the second approach (A2) the method to estimate the maximum population served depended again on whether the WWTP record contained information on wastewater discharge or not. If no wastewater discharge attribute was included, the maximum population served was estimated as the total population surrounding the WWTP within a radius of 11 km, using WorldPop population counts. This radius size was determined based on the outcome of a sensitivity analysis (see Supplementary Information, section S3.2). In the geospatial analysis, we ensured that each person in a region was served by only one plant, thereby avoiding double counting. In contrast, if a wastewater discharge attribute was available, the total population surrounding each WWTP was computed within a radius of variable size, based on the initial value of population served as calculated using approach A1. All WWTP records were grouped into four size categories of population served: <50,000 people; 50,000-100,000 people; 100,000-500,000 people; and ≥500,000 people. The radius assigned for each group was 5, 10, 20, and 30 km, respectively. This radius assignment was based on tests using the national dataset of India (see Supplementary Information, section S3.3).
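The radius choice in approach A2 amounts to a small lookup. The sketch below encodes it under one assumption that the text leaves open: values falling exactly on a category boundary (for example 100,000) are placed in the larger category. The function name is hypothetical.

```python
from typing import Optional


def search_radius_km(population_a1: Optional[float]) -> float:
    """Radius within which surrounding population is summed for approach A2."""
    if population_a1 is None:
        # No wastewater discharge reported, hence no A1 estimate: fixed radius.
        return 11.0
    if population_a1 < 50_000:
        return 5.0
    if population_a1 < 100_000:
        return 10.0
    if population_a1 < 500_000:
        return 20.0
    return 30.0
```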
For the third approach (A3), we used the dilution factor, $DF$, as defined by Eq. (2) to determine the limit of the WWTP's wastewater discharge, $W$, into the receiving river's natural discharge, $Q$, at the estimated outfall location (see section 2.1.3 above). $Q$ is provided by the HydroATLAS dataset (see section 2.1.2 above).

$$DF = \frac{Q + W}{W} \qquad (2)$$

The minimum DF recommended by the European Medicines Agency for environmental risk assessments of medicinal products for human use is 10 (EMA, 2006). However, this can sometimes differ in reality. Rice and Westerhoff (2017) found a wastewater ratio higher than 50% for over 900 streams receiving wastewater in the USA; i.e., representing a DF equal to or lower than 3. For the development of HydroWASTE, we therefore applied a minimum DF of 5, i.e. WWTPs can be assigned maximum populations that would lead to effluent loads exceeding the EMA recommendation, yet within the range of ratios that are observed in reality. For WWTPs that have estimated outfall locations within 50 km of the ocean or a large lake (defined as those with a surface area larger than 500 km² in the global HydroLAKES dataset; Messager et al., 2016), we assume that environmental regulations are less restrictive since there is a large waterbody nearby that could greatly dilute the effluent. For this reason, A3 is not applied for these WWTPs. The maximum population served, $P_{max}$, that the river could support was then calculated by solving Eq. (2) for $W$ (using $DF_{min} = 5$) and inserting it into Eq. (1), resulting in:

$$P_{max} = \frac{Q}{U \, (DF_{min} - 1)} \qquad (3)$$

In cases where the wastewater discharge is not reported (Table 2), only approaches A2 and A3 were used, which causes a higher level of uncertainty in these cases.
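A minimal sketch of approach A3 follows. It assumes the dilution factor has the form DF = (Q + W) / W, which is consistent with the example above in which a 50% wastewater ratio corresponds to a DF of 3; under that assumption the largest admissible effluent is W = Q / (DF_min - 1). Function and argument names, and the requirement that Q and U share the same volume and time units, are assumptions of this example.

```python
from typing import Optional


def max_population_by_dilution(q_natural: float,
                               u_per_capita: float,
                               df_min: float = 5.0,
                               near_large_waterbody: bool = False) -> Optional[float]:
    """Approach A3: population whose effluent would just reach the minimum DF.

    Assumes DF = (Q + W) / W, so the largest admissible effluent is
    W_max = Q / (DF_min - 1). Returns None where A3 is not applied.
    """
    if near_large_waterbody:
        # Outfall close to the ocean or a large lake: no dilution cap is imposed.
        return None
    if df_min <= 1 or u_per_capita <= 0:
        raise ValueError("df_min must exceed 1 and u_per_capita must be positive")
    w_max = q_natural / (df_min - 1.0)
    return w_max / u_per_capita
```

With a natural discharge of 4,000 units per day and 0.25 units per person per day, the cap is 4,000 people; the same plant near the coast receives no cap from this approach.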
Finally, the minimum value among approaches A1, A2, or A3 was selected as the WWTP's estimate of population served. A correction was applied if the sum of the estimated population served by WWTPs in a country, $P_{tot}$, exceeded the total national population connected to sewers, $P_{stat}$, as reported by the JMP-WASH database. In this case, the estimated population served by each WWTP was multiplied by a reduction factor ($F$) to ensure that the total population served per country would not surpass national statistics:

$$F = \frac{P_{stat}}{P_{tot}} \qquad (4)$$

This correction was not applied for any country that reported population served in its national WWTP dataset.
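The final selection and the national correction can be sketched in a few lines. This is an illustrative reading in which the reduction factor is taken as the ratio of the national sewer-connected population to the summed plant estimates; the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def select_population(estimates: Sequence[Optional[float]]) -> float:
    """Take the minimum of whichever of the A1, A2, A3 estimates are available."""
    available = [e for e in estimates if e is not None]
    if not available:
        raise ValueError("at least one estimate is required")
    return min(available)


def apply_national_cap(populations: Sequence[float], p_stat: float) -> List[float]:
    """Scale plant estimates down so their sum does not exceed the national total."""
    p_tot = sum(populations)
    if p_tot == 0 or p_tot <= p_stat:
        return list(populations)  # no correction needed
    factor = p_stat / p_tot       # reduction factor, always below 1 here
    return [p * factor for p in populations]
```

Scaling every plant by the same factor preserves the relative sizes of plants within a country while matching the national total.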

b) Wastewater discharge
We estimated wastewater discharge for all WWTP records that did not report on this attribute. Since a WWTP's wastewater discharge is directly related to the population served, Eq. (1) was modified to estimate the wastewater discharge ($W_{est}$) from the reported or estimated population served ($P$) of the WWTP record:

$$W_{est} = P \cdot U \qquad (5)$$

c) Level of treatment
The level of treatment of each WWTP was estimated based on the GNI per capita per annum categorization as defined by the World Bank for all countries, generally reflecting the observation that high-income countries have a higher probability of advanced wastewater treatment than low-income countries. The applied relationships between income, population served, and level of treatment were determined based on national datasets that reported the level of treatment (see Supplementary Information, section S3.4 for details). As a result, for countries in the high-income group (USD 12,536 or more per annum), if the population served by the WWTP exceeds 3,000 (i.e., in predominantly urban settings), the level of treatment was set as advanced; otherwise, secondary treatment was assumed. For middle-income countries (USD 1,036 to USD 12,535 per annum), the level of treatment was set as secondary. We did not find any WWTP regional datasets for countries from the low-income group (USD 1,035 or less per annum). We assumed that the level of treatment is the most basic, i.e. primary, in these countries, which may lead to some underestimations of their actual treatment potential.
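The assignment rule described above is a simple decision table. The sketch below encodes the stated thresholds directly; the function name and the string labels are choices made for this example.

```python
def treatment_level(gni_per_capita_usd: float, population_served: float) -> str:
    """Assign a treatment level from income group and plant size."""
    if gni_per_capita_usd >= 12_536:      # high-income group
        return "advanced" if population_served > 3_000 else "secondary"
    if gni_per_capita_usd >= 1_036:       # lower- and upper-middle-income groups
        return "secondary"
    return "primary"                      # low-income group
```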

Application of HydroWASTE to estimate dilution factors and wastewater ratios in global rivers
The dilution factor was calculated for all WWTP records in HydroWASTE using Eq. (2) and the natural river discharge (Q; as reported in the HydroATLAS database) at the estimated outfall location. For WWTPs where the outfall location coincides with a lake from the HydroLAKES dataset (Messager et al., 2016), DF was calculated based on the natural discharge at the outflow of the lake to the river network. Since there is no meaningful value for direct discharge into the ocean or a large lake (i.e., lakes with a surface area larger than 500 km²), the DF for WWTPs where the estimated outfall location is within 10 km of the ocean or a large lake is assumed to be infinite.
Finally, the distribution of wastewaters in the global river network was assessed by calculating the ratio of wastewater to natural discharge in every river reach. For this, the wastewater quantities discharged from all WWTPs were routed and accumulated downstream, from the estimated effluent outfall locations to the ocean, and divided by the long-term natural discharge as provided for all river reaches in the HydroATLAS database (see 2.1.2 above). The WWTPs reported as "Closed", "Decommissioned" or "Non-Operational" were included in this analysis for their potential as a source of residues in river sediments from former discharge (Thiebault et al., 2021). This process was performed using the river routing model HydroROUT (Lehner and Grill, 2013).
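The routing and accumulation step can be illustrated on a toy network. The sketch below is not HydroROUT; it is a minimal stand-in that assumes an acyclic network stored as a mapping from each reach to its downstream neighbour (`None` at the ocean), with hypothetical reach identifiers and consistent discharge units throughout.

```python
from typing import Dict, Optional


def wastewater_ratios(next_down: Dict[int, Optional[int]],
                      natural_q: Dict[int, float],
                      effluent_at: Dict[int, float]) -> Dict[int, float]:
    """Accumulate WWTP effluent downstream and divide by natural discharge per reach."""
    accumulated = {reach: 0.0 for reach in next_down}
    for start, w in effluent_at.items():
        reach: Optional[int] = start
        # Walk from the outfall reach to the ocean, adding this plant's effluent.
        while reach is not None:
            accumulated[reach] += w
            reach = next_down[reach]
    # Reaches without positive natural discharge are skipped to avoid division by zero.
    return {r: accumulated[r] / natural_q[r] for r in next_down if natural_q[r] > 0}
```

Walking downstream once per plant is easy to follow but slow on large networks; a single pass in upstream-to-downstream order would give the same totals more efficiently.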

HydroWASTE: a global WWTP database
HydroWASTE contains a total of 58,502 WWTPs, each including a reported or estimated attribute of population served, wastewater discharge, and level of treatment. Of these, 58,278 records were successfully georeferenced to the global river network of HydroATLAS. The remaining 224 WWTPs were not linked to the river network as they were located on small islands or in small coastal basins and are thus assumed to discharge directly to the ocean. The average distance between the WWTP location in the source data and its estimated effluent outfall location is 6.5 ± 3.1 km with a maximum distance of 21.8 km. Figure 3 presents the spatial distribution of WWTPs in HydroWASTE. Europe and the USA show the highest densities of WWTPs, whereas China and India have somewhat lower densities but much larger facilities (i.e., more population served, see Table 3). Figure 3 also shows the comprehensiveness of the reported attributes of each regional dataset and an evaluation of HydroWASTE's population served against the JMP-WASH database (WHO & UNICEF, 2017). Since we limited our estimated values of population served so that they did not surpass the country-level records, most countries that show a large error correspond to underestimations. Exceptions occur in many European countries; here, population served was calculated from reported values of 'population equivalents', which includes not only permanent residents but also ambient population and, thus, can exceed the reported national population values in the JMP-WASH database. Table 3 provides an overview of the 20 countries with the largest numbers of population served by WWTPs in HydroWASTE.
These countries contribute around 83% of the total global treated wastewater (Jones et al., 2021). Table 3 [...]

In terms of missing attribute information that was not reported but was instead complemented using statistical methods, we assigned 39% of the total population served and 33% of the total wastewater discharge in HydroWASTE through statistical estimates (Table 4).
In order to evaluate the robustness of the methods applied to estimate population served and wastewater discharge for records with missing information, we used a subset of 28,497 WWTPs in HydroWASTE that have reported values of both attributes (see Supplementary Information, section S3.1 and Table S1 for details on these data). We applied the same methods as for the completion of missing attributes to additionally create an estimated value of both reported attributes in this WWTP subset. [...] (Table 5). The 'primary' treatment level could not be validated as this treatment level was predicted only for low-income countries, yet no reported data were available for this income category to compare against.

Global dilution factors
The dilution factors (DFs) were calculated for every WWTP record using Eq. (2), except for WWTPs that were assumed to discharge into large lakes or the ocean (n = 10,445) for which we assigned an infinite DF (see section 2.2 for more details).
The median calculated DF among the WWTPs in HydroWASTE is 570, and 2,533 (5.4%) of all plants showed a DF value below 10, i.e. lower than the recommended threshold for environmental regulations (EMA, 2006). Figure 5 shows the cumulative frequency distribution of DFs calculated from HydroWASTE.
As part of the methods to estimate missing attributes, Eq. (3) required the setting of a minimum DF (see section 2.1.4 above) to estimate the upper limit of population served. We set this DF value to be 5 and applied it to a total of 479 WWTPs, which represent 19% of all plants with DFs below 10.

Wastewater distribution in global rivers
To demonstrate the global utility of the HydroWASTE database, we here present a first application in which we used both the locations of WWTP outfalls and their associated attributes to route the discharged effluents along the global river network and calculate the ratio of wastewater in any river reach downstream of a WWTP in the database. The global assessment shows that more than 1.2 million kilometers of rivers are located downstream of WWTPs and thus contain some amount of WWTP effluents (Table 6). Of all rivers containing wastewater, about one third (398,000 km) exceed a wastewater ratio of 1%. Over 72,000 km of impacted rivers surpass a wastewater ratio of 10% (i.e., corresponding to a dilution factor of 11), thus reaching or exceeding the recommended limit used in environmental regulations (EMA, 2006). Although 26% (19,000 km) of these highly impacted rivers are located within close vicinity of WWTPs (i.e., within the average distance of 8.5 km measured between the estimated WWTP outfall location and the first river confluence thereafter) and may thus represent very local conditions and/or be affected by uncertainties in the WWTP locations, the remaining 74% (53,000 km) are further downstream from WWTPs, indicating persistent risks of high potential wastewater contamination. Of the 15 countries with the highest total length of rivers containing any amount of wastewater, China, Mexico, India, and South Africa have more than 10% of their impacted rivers exceeding the 10% wastewater ratio (Table 6). Finally, our study highlights several large river basins, including the Hai (China), the Mississippi (USA), and the Orange (South Africa), with particularly long sections of impacted rivers with wastewater ratios exceeding 10% (Table 7).
A total of 149,000 km of river stretches with a wastewater ratio exceeding 1%, and 31,000 km with a ratio exceeding 10%, are located along rivers that are currently considered to be free-flowing, i.e., rivers not substantially impacted by human activities that could alter their connectivity and ecosystem services. Furthermore, we estimate that 17% of rivers that contain more than 10% of wastewater discharge are flowing through protected areas, defined as IUCN categories I-VI (UNEP-WCMC & IUCN, 2021). These results show that wastewater ratios could be used as an additional and complementary metric of water quality to be integrated in refined assessments of anthropogenic impacts on river health and ecological status.
Finally, we assessed the number of potentially affected people along highly impacted rivers (i.e., rivers that carry at least 10% of wastewater). Following Richter et al. (2010), we assume that people living within 10 km of a river potentially depend on river services, such as water provision or groundwater recharge, or are exposed to risks related to river flows, such as flooding. With this definition, and using population information provided in the HydroATLAS database (Linke et al., 2019), we estimate that 874 million people live within 10 km of rivers with wastewater ratios exceeding 10%. As these people potentially use river waters for various purposes (e.g., drinking, cleaning, fishing, recreation), they are at elevated risk of being affected by water quality issues, including during floods.
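The routing of effluents described in this section can be illustrated with a small Python sketch on a toy river network. It accumulates effluent downstream without any removal or decay and then sums the river length above a given wastewater ratio. The network, the discharge values, the function names, and the definition of the ratio as accumulated effluent divided by natural discharge are all assumptions made for illustration; the definition used in the study may differ.

```python
from collections import defaultdict

# Hypothetical toy network: each reach points to its downstream neighbour
# (None at the outlet) and carries a natural discharge and a length.
reaches = {
    "A": {"downstream": "C", "natural_m3s": 1.0, "length_km": 10.0},
    "B": {"downstream": "C", "natural_m3s": 4.0, "length_km": 20.0},
    "C": {"downstream": "D", "natural_m3s": 16.0, "length_km": 30.0},
    "D": {"downstream": None, "natural_m3s": 200.0, "length_km": 40.0},
}

# Hypothetical WWTP effluents (m3/s), keyed by the reach holding the outfall.
effluent_m3s = {"A": 0.25, "C": 0.75}


def route_effluent(reaches, effluent_by_reach):
    """Add each plant's effluent to its own reach and every reach downstream."""
    accumulated = defaultdict(float)
    for start, discharge in effluent_by_reach.items():
        reach, visited = start, set()
        while reach is not None:
            if reach in visited:
                raise ValueError("river network contains a cycle")
            visited.add(reach)
            accumulated[reach] += discharge
            reach = reaches[reach]["downstream"]
    return dict(accumulated)


def wastewater_ratios(reaches, accumulated):
    """Assumed ratio: accumulated effluent divided by natural discharge."""
    return {
        reach: discharge / reaches[reach]["natural_m3s"]
        for reach, discharge in accumulated.items()
    }


def impacted_length_km(reaches, ratios, threshold=0.0):
    """Total length of reaches whose wastewater ratio exceeds the threshold."""
    return sum(
        reaches[reach]["length_km"]
        for reach, ratio in ratios.items()
        if ratio > threshold
    )


accumulated = route_effluent(reaches, effluent_m3s)
ratios = wastewater_ratios(reaches, accumulated)
length_any = impacted_length_km(reaches, ratios)
length_over_1pct = impacted_length_km(reaches, ratios, threshold=0.01)
length_over_10pct = impacted_length_km(reaches, ratios, threshold=0.10)
```

Reach B has no upstream plant and therefore does not appear among the impacted reaches, while the ratio falls from 25% in reach A to 0.5% in reach D as natural discharge increases downstream.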

Discussion and conclusion
Detailed water quality assessments require spatially explicit information on how, where, and how much wastewater is entering the river system. Here, we developed a global geospatial wastewater treatment plant database, HydroWASTE, involving the compilation of national and regional datasets, the georeferencing of all records to a river network, and the estimation of attributes not originally reported by the source datasets. HydroWASTE can be used for numerous applications ranging from environmental to human health risk assessments. It is the first database at the global scale that includes this level of detail and comprehensiveness regarding geospatial WWTP locations, estimated effluent outfall locations, and associated attributes, such as population served, wastewater discharge, and level of treatment. In a first application, these characteristics allowed for the assessment of the distribution of wastewaters in the global river network.
Since WWTPs are important sources of contaminants into receiving waters, spatial information on wastewater discharge along with the key attributes are critical inputs to water quality modelling. The most recent global assessments did not have access to this level of detail, relying on country-level statistics to account for these sources. The correct location of effluent discharge as a point source is rarely available, and if it is, it often does not connect with the river network integrated in the model. In this study we followed a conservative approach to topographically connect the point sources (WWTPs) with the river network.
That is, instead of just connecting the WWTP to the nearest river reach, we introduced a tolerance of, on average, 6.5 km to allocate the outfall location further downstream, therefore connecting the WWTP to a river with larger expected discharge.
This intentional bias reduces the likelihood of incorrectly predicting low dilution factors and high contamination risks on small streams; however, this approach can also cause an underestimation of the true extent of affected rivers. Nonetheless, we consider this conservative approach to be particularly important given the uncertainties in the river network quality and the reported locations of WWTPs.
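One possible way to shift an outfall downstream within a distance tolerance is sketched below in Python. This is not the procedure used to build HydroWASTE, which is documented in the Supplementary Information; the toy network, the function name `allocate_outfall`, and the rule of walking downstream until the next step would exceed the tolerance are hypothetical choices made only to illustrate the idea.

```python
# Hypothetical chain of reaches ordered from headwater to outlet.
reaches = {
    "r1": {"downstream": "r2", "length_km": 2.0, "natural_m3s": 0.5},
    "r2": {"downstream": "r3", "length_km": 3.0, "natural_m3s": 2.0},
    "r3": {"downstream": None, "length_km": 4.0, "natural_m3s": 10.0},
}


def allocate_outfall(reaches, nearest_reach, tolerance_km):
    """Walk downstream from the nearest reach while staying within tolerance.

    Returns the reach assigned as outfall and the distance travelled to reach
    its upstream end. Assumes the network contains no cycles.
    """
    current = nearest_reach
    travelled = 0.0
    while True:
        downstream = reaches[current]["downstream"]
        if downstream is None:
            break  # outlet reached, nothing further downstream
        step = reaches[current]["length_km"]
        if travelled + step > tolerance_km:
            break  # moving on would exceed the tolerance
        travelled += step
        current = downstream
    return current, travelled


outfall_wide, dist_wide = allocate_outfall(reaches, "r1", tolerance_km=6.5)
outfall_narrow, dist_narrow = allocate_outfall(reaches, "r1", tolerance_km=4.0)
outfall_zero, dist_zero = allocate_outfall(reaches, "r1", tolerance_km=0.0)
```

With a tolerance of zero the plant stays on the nearest reach, whereas larger tolerances move it to reaches with higher natural discharge and therefore higher dilution factors, which is the direction of the bias discussed above.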
Even though our assessment does not consider any removal of contaminants caused by treatment, decay, abstractions or river regulation, we argue that our results of wastewater ratios can serve as a first-order proxy to highlight areas of potential risk, as persistent contaminants might not decay, could possibly (bio-)accumulate, or could be transported downstream all the way to the ocean.
As for dilution factors, we used long-term average natural river discharge for the calculations; we would thus expect higher concentrations of wastewater under low-flow conditions. We acknowledge that a high wastewater ratio in rivers will have different implications in different regions, since treatment levels vary between countries and between individual WWTPs. In addition, this preliminary analysis does not account for any removal or decay processes along the river network. Nonetheless, our approach facilitates the identification of hotspots along rivers where wastewater ratios in river flows would be greatest and represent a risk to local ecosystem or human health. This information could be used to guide regional or even field studies to monitor or assess in more detail the actual local water quality.

Uncertainties
The uncertainties involved in this study mostly derive from the source datasets, which makes it difficult to trace their origins and calculate their effects on the final assessment. Some of the detectable inconsistencies relate to the reported attributes. For example, the coordinates do not always depict the precise location of the plant, but instead can refer to the location of the effluent outfall or an approximate location (note that each dataset is described in more detail in the Supplementary Information, section S1). To quantify this type of uncertainty, we verified the given locations for a reference subset of WWTPs, which demonstrated the overall robustness of the applied approaches (see Supplementary Information, section S2.2).
HydroWASTE has extensive coverage of most European countries, the USA, India, and Canada, which represent the vast majority of WWTPs in the world (Table 4), and their records are based on information (location and most attributes) reported by their respective national datasets. For many of the remaining countries, especially those where the WWTP locations are sourced from the OpenStreetMap (OSM) web platform, the total population served tends to be underestimated in HydroWASTE as compared to country-level statistics, reflecting the incompleteness of WWTP records. A comparison between OSM and the available national datasets (see Supplementary Information, section S4.1) showed OSM to cover only 37% of the total number of reported facilities. In terms of estimating missing WWTP attributes, OSM-estimated wastewater discharge was compared to reported values from the South African national dataset, showing acceptable general agreement, with 86% of the estimates falling within one order of magnitude of reported values (see Supplementary Information, section S4.2). Overall, the lower-quality OSM-derived records constitute only 9% of the HydroWASTE database (representing 27% of population served and 19% of wastewater discharged).
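The agreement metric mentioned above, i.e., the share of estimates within one order of magnitude of reported values, can be computed as in the following Python sketch. The function name and the sample values are hypothetical and are not taken from the South African dataset; "within one order of magnitude" is interpreted here as an absolute base-10 log ratio of at most 1, which is an assumption.

```python
import math


def share_within_one_order(estimated, reported):
    """Share of pairs whose estimate lies within a factor of 10 of the report.

    Pairs with non-positive values are skipped because the log ratio is
    undefined for them.
    """
    pairs = [(e, r) for e, r in zip(estimated, reported) if e > 0 and r > 0]
    if not pairs:
        raise ValueError("no valid estimate/report pairs")
    hits = sum(abs(math.log10(e / r)) <= 1.0 for e, r in pairs)
    return hits / len(pairs)


# Hypothetical wastewater discharge values (any consistent unit).
estimated = [120.0, 15.0, 3000.0, 0.5]
reported = [100.0, 200.0, 2500.0, 40.0]

agreement = share_within_one_order(estimated, reported)
```

In this toy example, two of the four estimates differ from the reported value by more than a factor of 10, so the agreement is 50%.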
As another source of uncertainty, the European WWTP dataset reports the population number as "population equivalent", which refers not only to residents but also to workers, tourists and service providers; that is, not only the country's permanent population with access to wastewater treatment, but the total ambient population using the sanitation services provided by the WWTPs. It can be argued that reporting in terms of "population equivalent" is more adequate when accounting for the amount and content of wastewater discharge (Daughton, 2012; Nakada et al., 2017); however, since some WWTPs also include industrial sources of wastewater, the number of people served can be overestimated (O'Brien et al., 2014).
To indicate different levels of reliability for each attribute, including the WWTP location, several quality indicators were assigned to each record in HydroWASTE to help inform users about uncertainties inherent in the data. The quality indicators for population served, wastewater discharge, and level of treatment depend on whether the attribute is reported or estimated, and on the method used if estimated. The quality indicator for the WWTP location is based on a manual accuracy assessment performed using a global subset of the HydroWASTE database (see Supplementary Information, sections S2.1 and S2.2 for more details).
Despite these shortcomings, we believe that the 58,502 WWTPs in HydroWASTE and their effluent discharge into the environment provide a robust first-order representation of the majority of global domestic wastewaters.

Towards better representation of municipal wastewater discharge in the global river system
The robust and consistent global database presented here is designed to be used by water resource managers, policy makers, researchers, and public institutions to develop strategies to control, regulate or mitigate the impacts of anthropogenic chemicals. It can be used to link population to individual WWTPs and trace the pathways of specific substances from households through certain treatment levels into the river network. In addition, HydroWASTE can be used to identify WWTPs for which an upgrade in technology would deliver the biggest improvement of downstream water quality. Alternatively, where necessary, the resulting predictions could identify where local regulations should be established to limit the release of problematic pollutants. And, finally, it is conceivable that this approach could be used to predict the potential impacts that might occur with the development and anticipated widespread use of pharmaceuticals and household products, amongst other potential sources of contamination. Many applications of our novel database relate specifically to the Sustainable Development Goal (SDG) 6 ("Ensure access to water and sanitation for all"), as it helps to provide reliable estimates of the distribution of wastewater to inform decision making that ultimately aims at achieving universal access to clean water globally.
In our efforts to obtain national datasets on WWTPs and their characteristics, we found that many countries (especially lower-income ones) do not provide openly accessible information on these facilities in a consistent and comprehensive format. Given the many implications that WWTPs have on human and environmental health, either in their role to improve water quality through removing contaminants or as a potential point source of untreated substances, we strongly recommend that governments and international organizations produce and make publicly available the data that are required to support water quality assessments from local to global scales. In the interim, HydroWASTE can serve as a starting point for large-scale water quality analyses, or as an initial framework to be expanded.

Supplement
The supplement related to this article is available online at:

Author Contribution
HEM compiled all datasets, estimated missing attributes, and performed the analyses. HEM, BL and JN developed the study and drafted the paper. GG, JL, AL and RS contributed to the inclusion of national and regional datasets and their validation. All authors contributed to and approved the paper.

Competing interests
The authors declare that they have no conflict of interest.