Diversity II water quality parameters from ENVISAT (2002–2012): a new global information source for lakes

The use of ground sampled water quality information for global studies is limited due to practical and financial constraints. Remote sensing is a valuable means to overcome such limitations and to provide synoptic views of ambient water quality at appropriate spatio-temporal scales. In past years several large data processing efforts were initiated to provide corresponding data sources. The Diversity II water quality dataset consists of several monthly, yearly and 9-year averaged water quality parameters for 340 lakes worldwide and is based on data from the full ENVISAT MERIS operation period (2002–2012). Existing retrieval methods and datasets were selected after an extensive algorithm intercomparison exercise. Chlorophyll-a, total suspended matter, turbidity, coloured dissolved organic matter, lake surface water temperature, cyanobacteria and floating vegetation maps, as well as several auxiliary data layers, provide a generically specified database that can be used for assessing a variety of locally relevant ecosystem properties and environmental problems. For validation and accuracy assessment, we provide matchup comparisons for 24 lakes and a group of reservoirs representing a wide range of bio-optical conditions. Matchup comparisons for chlorophyll-a concentrations indicate mean absolute errors and bias in the order of median concentrations for individual lakes, while total suspended matter and turbidity retrieval achieve significantly better performance metrics across several lake-specific datasets. We demonstrate the use of the products by illustrating and discussing remotely sensed evidence of lake-specific processes and prominent regime shifts documented in the literature. The Diversity II data are available from https://doi.pangaea.de/10.1594/PANGAEA.871462, and Python scripts for their analysis and visualization are provided at https://github.com/odermatt/diversity/.


Introduction
Freshwater ecosystems have undergone more dramatic changes than any other type of ecosystem (Sectretariat of the Convention on Biological Diversity, 2010).Lakes contain about 87 % of all surface freshwater (Gleick, 1996).The major threats that affect lakes and reservoirs are water-level changes, toxic pollution, salinization, eutrophication, acidification, sediment pollution and invasion of exotic species.Several upstream anthropogenic activities are related to these threats, such as agriculture, forestry, grazing, mining, irrigation, urbanization and dams, hydraulic engineering and industrial development.All these pressures are interconnected and act concurrently to reduce water quality and contribute to the deterioration of the ecosystem, including habitat loss and reduced biodiversity.
The conditions of inland waters vary over a wide range of spatial and temporal scales, leading to important logistic and economic difficulties in monitoring them on a regular basis.Some countries have national or regional lake monitoring programmes, which are primarily based on ground surveys.However, ground surveys often fail to sample on appropriate spatial and temporal scales.Other countries do not have monitoring programmes due to a lack of funds.The use of satellite remote sensing is a potentially cost-effective and efficient way to supplement the conventional in situ point sampling surveys.Remotely sensed products for water availability and quality are complementary to in situ data in terms of spatial and temporal coverage.They provide synoptic views of spatial distribution unachievable by other means and are ideally suited to covering the broad range of space scales and timescales associated with inland water applications.However, remote sensing is limited in terms of parameter coverage and depth resolution.
The Medium Resolution Imaging Spectrometer (MERIS) was operated by the European Space Agency (ESA) in 2002-2012 and demonstrated unparalleled capabilities for water quality remote sensing.Extensive reviews of popular retrieval methods revealed a wide range of different algorithms, but the usage of MERIS data prevailed (Matthews, 2011;Odermatt et al., 2012).The first globally representative lake water quality dataset from remote sensing provided a snapshot of chlorophyll-a (chl-a) concentrations in 80 000 lakes worldwide based on MERIS full-resolution (FR) data acquired in 2011 (Sayers et al., 2015).However, this dataset was compiled with an algorithm optimized for ocean colour remote sensing, whose suitability for inland waters was disavowed on several occasions (see e.g.Mobley et al., 2004;Morel and Prieur, 1977).Therefore we carried out an intercomparison of well-known and publicly available algorithms for the retrieval of chl-a and other water quality parameters in optically complex waters with a heterogeneous reference dataset for more than 40 lakes (Odermatt et al., 2015a).The Diversity II water quality dataset is produced with the most suitable retrieval methods identified through these investigations.
In addition to the optimized methodology, the Diversity II water quality dataset excels previous work by covering the full MERIS operation period with monthly, yearly and 9-year product aggregates, and several additional water quality parameters.Hence it provides a generically specified database that can be used for assessing a variety of locally relevant ecosystem properties and environmental problems.Several case studies are available that demonstrate such assessments with lake-specific foci (Odermatt et al., 2015b), but the larger part of the dataset is yet to be exploited.

Geographical scope
We selected 340 lakes for processing (Fig. 1) based on their biodiversity relevance, size, auxiliary and reference data availability, geographic distribution and particular user requests.Sixty-six of those lakes are at least 50 km 2 large and located within Ramsar Wetlands or listed as LakeNet Biodiversity Priority sites (www.worldlakes.org).The data table (https://doi.pangaea.de/10.1594/PANGAEA.871462?format=html#download) allows for their identification.The dataset includes 250 of the world's 350 largest lakes by extent, whereas size implies regional relevance and favours the feasibility of remote sensing retrievals in general and of lake surface water temperature (LSWT) in particular (Politi et al., 2016).Various contributors provided in situ water quality measurements for 42 lakes, which are used as reference sites for quality assessment (Odermatt et al., 2015a).The largest reservoirs in South America and individual sites in Asia and Australia are included in order to improve the global representativeness.Fifty additional lakes are included due to specific stakeholder requests.
In principle, the Diversity II water quality dataset could be extended to a much larger number of lakes.Size is the most important restriction in this regard, with a contiguous open water surface of roughly 1 km by 1 km being the theoretical minimum, but with certain complications occurring even for larger water bodies.The total number of suitable lakes worldwide is expected to be between the 80 000 demonstrated by Sayers et al. (2015), and, neglecting shape properties, the 350 000 lakes larger than 1 km 2 identified by Verpoorter et al. (2014).

ENVISAT MERIS L1B FSG imagery
MERIS was operated in 2002-2012 on-board the near-polar orbiting ENVISAT satellite by the European Space Agency (ESA; Rast et al., 1999).It measured reflected solar radiance in 15 narrow spectral bands across visible and nearinfrared (NIR) wavelengths.In FR mode, its push-broom charge-coupled device (CCD) arrays sampled the 1150 km wide swath at approximately 260 by 290 m ground resolution in across-track and along-track directions, respectively.MERIS had a nominal revisit time of 2-3 days at the Equator and less at higher latitudes, but FR data were not systematically acquired in the early years until 2005, and in later years they varied slightly due to mission operations, and therefore the availability of usable data varies regionally and temporally (Fig. 1).
We refer to three widely, but not consistently, used satellite image processing levels, in which Level 1 (L1) consists of top-of-atmosphere (TOA) signals, Level 2 (L2) includes derived geophysical quantities and Level 3 (L3) represents spatio-temporally aggregated data.Approximately 300 000 MERIS L1 images were used as input for the production of the Diversity II water quality dataset.The data represent calibrated TOA radiance, also referred to as at-sensor radiance.It emerged from the 2014 bulk reprocessing using MERIS Instrument Processing Facility version 6.Its geoorthorectification was improved using the Accurate MERIS Ortho-Rectified Geolocation Operational Software (AMOR-GOS; Bourg and Etanchaud, 2007); thus, the data are referred to as MERIS L1B Full-Swath Geo-corrected (FSG), and have an absolute and relative pixel location accuracy of 77 and 52 m, respectively.

AATSR ARC-Lake LSWT products
The Diversity II water quality dataset includes LSWT products that were readily provided by the ESA ARC-Lake project as version 3 production in rasters of 0.05 • cell size.The LSWT retrieval was performed with an optimal estimation approach (MacCallum andMerchant, 2013, 2012).Lake-specific prior surface temperatures were generated using an iterative scheme that is initiated with the monthly MODIS land and sea surface temperature climatologies.The ARC-Lake processor then uses valid satellite observations, simulations with the FLake model (Mironov, 2008) and data interpolating empirical orthogonal function (DINEOF; Alvera-Azcárate et al., 2005) techniques to iteratively create from this spatially and inter-annually invariant initial guess a field of spatially resolved temperature fields.
Several product types using different processing techniques and spatial and temporal aggregations are available (MacCallum and Merchant, 2014).We selected the DINEOF reconstructed, day-and night-time acquired monthly products in 0.05 • spatial resolution, whose filenames are ALIDXXXX_PLREC9D_TS012SR.nc and ALIDXXXX_PLREC9N_TS012SR.nc, respectively, with XXXX being a four-digit lake ID.They are available for 298 out of the 340 lakes considered, and empty LSWT product layers are contained in the remaining 42 lakes.Users requiring actual measurements of LSWT are recommended to download them from ARC-Lake or a comparable database directly.

Auxiliary data
Each lake's perimeter was defined in a shapefile that resulted from vectorized outlines of the Synthetic Aperture Radar Water Bodies (SAR-WB) map created by Santoro and Wegmüller (2014).These perimeters represent the maximum extent of water available from ENVISAT-ASAR acquisitions between 2002 and 2012, and each polygon's area and circumference were added to an attribute table.The polygons are intersected with the Global Lakes and Wetlands Database (GLWD; Lehner and Döll, 2004) Level 1 dataset, and ambiguities were manually resolved.The merged tabulated attributes are available in a metadata list (https://doi.pangaea.de/10.1594/PANGAEA.871462?format=html#download).Alternative lake names were added to the list at every opportunity, but are neither exhaustive nor tracked.
Lake water surface level data (Crétaux et al., 2011) provided by the Laboratory of Studies on Spatial Geophysics and Oceanography (LEGOS) through their Hydroweb portal (http://www.legos.obs-mip.fr/soa/hydrologie/hydroweb/)were originally distributed with the Diversity II database.This data come as 1-D discrete time samples, as opposed to the 2-D temporal aggregated water quality maps.Furthermore, they are based on independent developments and upwww.earth-syst-sci-data.net/10/1527/2018/ Earth Syst.Sci.Data, 10, 1527Data, 10, -1549Data, 10, , 2018 dates that would require continuous mirroring.Due to these differences, we refrained from adding them to the Pangaea Diversity II repository.

Data processing methods
The bulk production of temporally aggregated water quality parameters from L1B and auxiliary data requires a combination of several methods in an unsupervised processing chain.
For this purpose we implemented the CaLimnos v1 processing chain (Fig. 2) for deployment on ESA's Earth observation data processing cluster Calvalus (Fomferra et al., 2012).It is composed of several processors for the ESA BEAM Toolbox (Fomferra and Brockmann, 2005), which has recently evolved into the Sentinel Application Platform (SNAP).The same input and auxiliary data and pre-and post-processing modules were also used to create 10-day aggregates for the investigation of phenological cycles in Lake Balaton (Palmer et al., 2015), and corresponding CaLimnos v1 L2 intermediate outputs were used for assessing the spatio-temporal variability of chl-a in Lake Geneva (Kiefer et al., 2015).

Pre-processing
The identification of pure water pixels is an essential preprocessing step, because even sub-pixel signal contributions from land surfaces can strongly affect the retrieval procedures, especially when using band arithmetic algorithms that do not check for input signal compliance at runtime.The Idepix algorithm is an open-source SNAP processor and performs such identification for clouds, cloud shadows, cloud buffers, land, snow/ice, sun glint and mixed pixels (Danne, 2016) based on bottom-of-Rayleigh reflectance (BRR; Santer et al., 1999).BRR is subject to a partial correction of atmospheric effects, representing reflectance at the hypothetical boundary between an infinitesimally small aerosol layer and gaseous air layers above.It is the preferred signal when background reflectance for the estimation of aerosol optical thickness is highly uncertain, and therefore BRR intermediate products are also used for the identification of shallow water areas and as input for the Maximum Peak Height processor (Matthews et al., 2012) according to Fig. 2. Idepix uses the Shuttle Radar Topography Mission (SRTM) Water Body Dataset (SWBD; Slater et al., 2006) as a static a priori land-water mask, which is a snapshot of global water surface extent between 56 • S and 60 • N in February 2000.It applies several arithmetic expressions, a spectral unmixing algorithm for mixed pixel identification, and two back-propagation neural networks (NNs) for cloud identification to MERIS FSG L1B and BRR input data (Kirches et al., 2013).Output is a pixel identification flag layer which is much better suited for water constituent retrieval than the original L1B product flags (Ruescas et al., 2014).However, usage with inland waters is subject to two particular challenges.First, Idepix' sea ice identification uses cli-matological auxiliary data that are not available for lakes, and therefore lake ice identification is less accurate.Second, ephemeral water surfaces that may extend far beyond the SRTM observed extent are always clipped to the latter.
Bottom visibility is a critical and unmastered error source for water quality retrieval, because most algorithms that provide concentrations of water constituents do not account for benthic reflectance contributions in these so-called optically shallow waters.In fact, a pre-condition for them is optically deep water (i.e.no bottom reflection).Sandy or vegetated substrates cause surface-leaving signals that can closely resemble increased suspended sediment and phytoplankton concentrations in the water column, respectively, and thus distort retrievals.Only very few algorithms actually deal with the detection of optically deep water, and none of them applies to inland waters.Based on recommendations for clear coastal waters (Cannizzaro and Carder, 2006) and our own investigations, we defined a band ratio that evaluates the relative elevation of oligotrophic lakes' 555 nm water-leaving reflectance peak, but using BRR in three MERIS bands as input due to the lack of robust automated atmospheric correction algorithms for such conditions (Eq.1; Odermatt et al., 2015a).
Due to the ambiguity of certain substrates' shallow water reflectance and deep-water reflectances, this optical signature alone is prone to false positive identifications.It becomes much more robust when applied to temporally aggregated ratio_490 due to the relative persistence of benthic features as opposed to the dynamically changing water composition.Summer half-year mean averages were selected after evaluation of several statistical aggregation methods.Corresponding aggregates are composed of all cloud-free MERIS observations in May to October 2008 for the Northern Hemisphere, and November 2008 to April 2009 for the Southern Hemisphere.Lakes with constantly high turbidity, such as Lake Balaton, still trigger false positives.Therefore, each ratio_490 aggregate map was verified with high-resolution satellite imagery and bathymetry information.Considerable shallow water areas in about 30 oligotrophic to mesotrophic lakes were masked using a threshold of 0.65, which, in the case of the Beaver Island Archipelago in Lake Michigan, masks areas that are between 5 and 10 m deep (Fig. 3).Pixels removed in such a manner are indicated in a separate product layer (shallow, Fig. 3c).In the Lake Michigan example, there are some patterns in the chl_fub product that still resemble bathymetry features, but their concentration levels are within variations for deep-water areas apart from a few individual pixels.Especially in more turbid lakes, lower thresholds are applied to prevent false positives according to the corresponding column in the lakes list (https://doi.pangaea.de/10.1594/PANGAEA.871462?format=html#download).

Water quality retrieval
Chl-a retrieval in optically complex waters is straightforward when using the secondary reflectance peak at red and nearinfrared (NIR) wavelengths (e.g.Gitelson, 1992;Gons, 1999;Gower et al., 1999).However, using MERIS observations this peak is only accessible in moderately productive or turbid waters, while clear and humic waters call for different approaches (Odermatt et al., 2012).Moore et al. (2014) de-veloped an optical water type classification (OWT) framework, which supports the distinction of these different water types, and which is available as a SNAP plugin (Peters, 2016).It assigns water-leaving reflectance spectra to seven end members, which were identified through cluster analysis of in situ measurements.Classes 1-3 represent clear or absorbing waters, classes 4-5 represent high phytoplankton and classes 6-7 represent high suspended mineral con-tents (Fig. 4).The OWT algorithm depends on the accurate correction of atmospheric effects (Eleveld et al., 2017), which was assessed by classifying 42 matchup pairs of in situ reflectance measurements in 10 diverse lakes and MERIS water-leaving reflectance from various atmospheric corrections.Water-leaving reflectance obtained with the Coast-Colour NN algorithm (description below) achieved the best agreement, in which half the matchup pairs were assigned to the same OWT, and adjacent classes were assigned in 14 cases (Odermatt et al., 2015a).In five out of the remaining seven cases, the optically quite similar classes 1 and 3 are confused.This mismatch due to differences between in situ measured and satellite observed reflectance is quite significant.However, when considering only the separation between classes 1-3 and 4-7, and thus the feasibility of chla retrieval based on the secondary reflectance peak, the approach becomes very robust, with only 2 out of the 42 pairs being confused.The OWT maps therefore provide a rough but robust indicator for chl-a algorithm selection.Most lakes are relatively clearly dominated by either OWT 1-3 or 4-7, which makes the chl-a product selection straightforward.In rare cases like Lake Turkana (Fig. 5), such a selection can only be made if either the lower or upper end of the dynamic range is considered more relevant.Otherwise, it is recommended to either split the lake perimeter or merge the chl-a products, e.g. by weighting them with turbidity levels.
For the Diversity II production the Maximum Peak Height algorithm (MPH; Matthews et al., 2012) was developed further and implemented in a SNAP operator (Block, 2016) because it outperformed other red-NIR reflectance peak algorithms in the algorithm intercomparison study (Matthews and Odermatt, 2015;Odermatt et al., 2015a).It uses BRR in MERIS bands 6-10 and 14 for the retrieval of the red-NIR reflectance peak height and position, which allow for the identification of cyanobacteria-and eukaryote-dominated pixels, water surface covering by cyanobacteria scum or floating vegetation, and chl-a quantification.Dedicated empirically calibrated equations are used for the retrieval of chl-a concentrations in eukaryote-and cyanobacteria-dominated waters.Technically, the algorithm is designed to cover the range of 0-1000 mg m −3 chl-a.However, retrieval accuracy is significantly better for lakes that are predominantly OWT 4-7, namely eutrophic to hypertrophic waters.
The FUB algorithm (Schroeder et al., 2007), named after the Free University of Berlin, is a bundle of dedicated NN algorithms for chl-a, total suspended matter (TSM) and coloured dissolved organic matter (CDOM) retrieval from MERIS L1B data, and a fourth NN that computes AOT at four wavelengths (440, 550, 670, 870 nm) and water-leaving reflectance in all bands up to 708 nm, except at 680 nm.The algorithms are trained with radiative transfer simulations using the Matrix Operator Model (MOMO; Fell and Fischer, 2001) covering chl-a, TSM and CDOM concentration ranges of 0.05-50 mg m −3 , 0.05-50 g m −3 and 0.005-1 m −1 , respectively, and using MERIS bands 1-7, 9, 10 and 12-14  Nine-year aggregated OWT in Lake Turkana (a; owt_cc_dominant_class_mode), which features a very prominent gradient in turbidity (b; turbidity_cc_mean).Maximum turbidity and predominantly OWT 7 are observed in the north, where its main tributary, the Omo River, provides about 90 % of the lake's inflow (Beadle, 1981).In contrast, the terminal basin in the south corresponds to OWT 1 and 3, which are second and third lowest in turbidity according to the end members in Fig. 4. as input.The training ranges are a severe limitation for global usage, but specific retrieval quality flags indicate for each of the four NN algorithms whether the input or output exceeds the training range.However, for oligotrophic to eutrophic and in particular humic lakes, which are commonly identified as OWT 1-3, the FUB algorithm's chl-a output outperformed all other candidates in the intercomparison study (Odermatt et al., 2015a).Note that FUB uses shorter wavelengths that reach deeper into the water column than MPH, which means that the two chl-a products represent different depths and may not converge at intermediate concentrations (ca.10-30 mg m −3 ), where both algorithms produce valid results.
For the retrieval of TSM via particulate backscattering at 443 nm (bb_spm_443 in Fig. 2) and turbidity, as well as water-leaving reflectance input for the OWT classification, we used the CoastColour NN algorithm.Its architecture is based on the approach described in Doerffer and Schiller (2007), with two dedicated NN systems performing atmospheric correction and inherent optical property retrieval (Doerffer, 2011;Ruescas et al., 2014).In contrast to earlier NN algorithms, the CoastColour NN was trained with significantly larger concentration ranges, namely 0.03-1000 g m −3 TSM and 0.03-500 mg m −3 chl.It was extensively validated with the CoastColour Round Robin dataset (Nechad et al., 2015; available in Pangaea) and lake in situ measurements (Odermatt et al., 2015a).

Post-processing and auxiliary data
The aggregation of L2 to L3 products (Fig. 2) facilitates temporal binning and collocation in a common coordinate grid with the WGS 84 (EPSG 7030) coordinate system.Monthly aggregates are created using the input, output and aggregation methods listed in Table 1, and the same aggregation methods are used to create yearly and 9-year aggregates from monthly and yearly aggregates, respectively, which ensures that all months inputting aggregate periods are weighted equally even if the numbers of L2 available in these periods may differ strongly.Aggregation of biophysical parameters is done using the mean of all valid nearest-neighbour input pixels for each output pixel, while the OWT L3 output layer consists of the most frequently observed class value across all available input layers, using the lower class in the rare case of a draw.Monthly, yearly and 9-year aggregates for each lake are saved in individual GeoTIFF files, and compressed in 13 ZIP files representing 11 annual archives for the monthly aggregates, one archive for the yearly aggregate, and the 9-year aggregate.These 13 ZIP archives are zipped again to make each lake available for download in a single file of up to 18.9 GB in size (Caspian Sea).
To extract product statistics and for visualization of the products, a Python package is available at https://github.com/odermatt/diversity.The scripts included in the package allow for creation of spatial and temporal plots such as shown in Sect. 5 (Figs. 5,(12)(13)(14)(15)(16)(17)(18).They also feature the use of blacklists, e.g. to exclude all products with scarce lake extent coverage from further analyses.

Performance metrics and reference data
The estimation of remotely sensed water quality product accuracy is complicated by the relevance, consistency and availability of reference measurements, especially across different lakes and countries.Authoritative in situ reference measurements are obtained from lab-analysed samples or probe records; they apply well-known standards and are checked for consistency.But such measurements are generally not optimized for comparability with remote sensing data.They represent volumes in the order of litres, either from discrete samples or along depth profiles, in contrast to the pixel-sized satellite sampling (Fig. 6), whose vertical sensitivity depends on the derivative of the round-trip attenuation in each pixel (Zaneveld et al., 2005).Furthermore, our 10-year satellite mission with at least 3 days revisit time yields only an expectancy value of 20 matchup data pairs with monthly in situ measurements in a location with 50 % cloud probability.Against this background, Seegers et al. (2018) recently suggested a set of performance metrics that consists of error and decision metrics, as well as spatial and temporal maps and decision graphics.We provide these metrics, namely the mean average error (MAE) and bias of log-transformed parameters as defined in Seegers et al. (2018), and corresponding graphics for chl, TSM and turbidity matchup data pairs with datasets from 24 lakes and a group of reservoirs in 10 countries on 4 continents.Correlation coefficients are obtained from ordinary least square linear regression, neglecting inevitable errors of in situ measured reference datasets (McArdle, 1988).The remote sensing estimates used in these comparisons are means and standard deviations for 3 × 3 pixel kernels at sampling coordinates from L2 products that were acquired on the same day.Various reservoirs (Spain) It should be noted that the accuracy estimates given hereafter refer to L2 products, but for the L3 products distributed with our database, randomly distributed errors are, depending on acquisition frequency (Fig. 1), reduced through averaging.All reference datasets are from continuous monitoring programmes according to the enclosed acknowledgements, the only exception being the 2001-2005 dataset collected in Spanish reservoirs (Ruiz-Verdú et al., 2008;Simis et al., 2007), which was sampled in reservoirs Almendra, Cuerda del Pozo, Iznajar, Rosarito and Talarn.Due to the small number of samples per reservoir, and for their methodological consistency, the data from Spain are used as if they represented a single water body.The data of the Albufera lagoon are excluded because the reference locations were too far inshore and did not produce meaningful data pairs.We generally use records averaged across the top 5 m of the water column where data represent vertical gradients, e.g. from fluorometers or turbidimeters.In the case of Lake Champlain, chl measurements are taken with a vertically integrating hose sampler across twice the measured Secchi depth.The comparability of this approach with remotely sensed concentrations is expected to decrease with increasing Secchi depth, and therefore we removed all matchups sampled across more than 10 m depth (i.e. 5 m Secchi depth).In contrast to the algorithm selection study in Odermatt et al. (2015a), we did not consider any datasets that yielded less than 15 matchups with any of the algorithms, because the statistics resulting from these data are increasingly erratic.Note also that we did not obtain an interpretable number of CDOM measurements, and therefore this parameter remains unvalidated and should be used with caution.

Chl matchup comparisons
Figure 7 shows the largest chl matchup datasets of each country, whereas FUB or MPH are selected according to the lakes' dominant water type in the 9-year averaged products.Eight out of nine lakes feature correlation coefficients between R = 0.40 and 0.83.Larger concentration ranges result usually in higher R; accordingly, eutrophic lakes have generally higher R than oligotrophic and mesotrophic lakes.For most of the latter, i.e. lakes dominated by OWT 1-3 (Champlain, Geneva, Ontario, Paijanne), FUB matchups result in MAE that are close to median concentrations, and lower absolute bias (Table 3), while the comparison for Lake Vanern shows significantly higher concentrations from remote sensing than from in situ measurements.Concerning eutrophic and hypertrophic lakes, MAE and absolute bias are likewise clearly lower than median concentrations for Lake Balaton and the Spanish reservoirs, while a larger bias occurs with the data of Lake Kasumigaura.These successful examples in Fig. 7 are contrasted by Lake Peipsi, whose high CDOM levels and cyanobacteria-rich plankton community are known to create challenging conditions (Kutser et al., 2016).A large number of the matchup samples (black dots) even come with reflectance shapes that do not sufficiently match any of the OWT, which is not observed for any of the other 23 datasets.
Concerning the performance metrics for all 24 datasets, the MPH algorithm (Table 2) achieves R > 0.4 for 17 of them, including all datasets with OWT 4-7 apart from Lake Peipus (see above) and Lake Kallavesi (R = 0.39).But it does not meet R > 0.4 for 5 out of the 16 datasets that are predominantly OWT 1-3, which confirms earlier findings that the algorithm's sensitivity decreases below about 10 mg m −3 (Odermatt et al., 2012).In contrast, the FUB algorithm removes all pixels that exceed its training range (see Sect. 4.2); i.e. it does not allow any matchup for max.chl > 50 mg m −3 but for Lake Peipus (chl = 52.4mg m −3 ).
Earth Syst.Sci.Data, 10, 1527Data, 10, -1549Data, 10, , 2018 www.earth-syst-sci-data.net/10/1527/2018/This is reflected by the number of matchups for FUB, n in Table 3, which is strongly reduced for the most productive waters (Hartbeespoort, Kasumigaura, Klipvoor, Malaren, Peipus, Spanish reservoirs).When considering only datasets with OWT 1-3, FUB achieves R > 0.4 in 10 out of 15 cases, the remaining 5 being predominantly lakes in Finland (Haukivesi, Pielinen, Pyhajarvi, Vanajavesi), where a combination of high CDOM levels and a somewhat smaller dy-namic range than in Lake Peipus complicates retrievals (Attila et al., 2018).As shown already for the examples in Fig. 7, the MAE for lake-specific chl concentrations is within 30 % of the lake median concentration for about three-quarters of the datasets, and the absolute bias is similar to or significantly smaller than the MAE.Finally, we segment the validation matchups by OWT (Fig. 8), whereas FUB matchups are given for OWT 1-3, and MPH matchups for OWT 4-7.The overall comparisons per algorithm (Fig. 8, top row) show again that the concentration range available in the FUB products is limited to < 40 mg m −3 , but they reflect a large part of in situ measurement variations, especially for the 1-10 mg m −3 range.Contrariwise, the few MPH estimates for OWT 4-7 that fall into this range hardly align with in situ measurements.Altogether, MPH still achieves a relatively high correlation due to a better match in the 10-1000 mg m −3 range.The interpretation of individual classes is somewhat more complicated, because OWT 1, 4 and 7 consist of far fewer samples than OWT 2, 3 and 6, while OWT 5 is missing entirely.This reflects on the one hand the relative scarcity of the extremely clear, productive and turbid OWT 1, 5, and 7, respectively.For example, Lake Balaton and Lake Huron make up about 80 % of the samples in OWT 1, and Lake Balaton alone makes up about 90 % of the OWT 7 data.On the other hand, it is a consequence of the CoastColour atmospheric correction, which facilitates a very robust discrimination of OWT 1-3 versus OWT 4-7 but tends to underestimate the occurrence of OWT 4 and 5 in favour of OWT 6.The correlations are around R = 0.5 to 0.7 for large matchup counts, but lower where n < 100.But on the whole, we can again conclude that all MAE and biases are in the order of the subsample's median concentration or lower, with a tendency of MPH to overestimate the reference samples.

TSM and turbidity matchup comparisons
Most monitoring programs provide either for TSM, turbidity or only Secchi depth measurements.TSM and turbidity are, with caveats, directly correlated, while their relationship with Secchi depth is more complicated (Neukermans et al., 2012;Pfannkuche and Schmidt, 2003).Therefore we consider here only lakes where TSM or turbidity is available, which is the case for 3 and 10 lakes, respectively, whereas TSM prevails for the most turbid lakes, and turbidity is of units FNU apart from Lake Geneva (FTU), Lake Michigan and Lake Zug (NTU).In terms of remotely sensed estimates, it should be noticed that TSM (i.e.tsm_cc_mean) is estimated from particulate scattering across all visible wavelengths, while turbidity (i.e.turbidity_cc_mean) is estimated only from scattering at 865 nm according to ISO 7027-1 (2016), which means that the latter represents a significantly light smaller penetration depth.The scatter plots in Fig. 9 depict all available datasets but the one for Lake Pyhajarvi (Finland, R = 0.3).The correlations depend even more on the value range than for chl.A surprisingly high correlation but a significant bias and Earth Syst.Sci.Data, 10, 1527Data, 10, -1549Data, 10, , 2018 www.earth-syst-sci-data.net/10/1527/2018/ MAE are observed for turbidity in Lake Michigan, which might be related to the measurement standard NTU, which does not exceed a maximum of 1.0.The performance for the next least turbid lakes (Paijanne, Pielinen, Pyhajarvi, Geneva, Zug; R = 0.1 to 0.5, max.turbidity 3 to 6 FNU/NTU) is significantly worse than for the more turbid cases with R = 0.6 to 0.7.We therefore conclude that the lake-specific turbidity estimates tend to be of somewhat better accuracy than the chl estimates, apart from variations below 3 FNU, where the scattering signal at 865 nm decreases towards the noise level, in particular in high CDOM boreal lakes.At least for clear water lakes (e.g.Geneva, Zug), the TSM estimates (i.e.tsm_cc_mean) are a promising alternative to the turbidity products due to higher signal levels at visible wavelengths.Unfortunately, TSM reference data are only available for Lake Champlain and two much more turbid lakes, and the results for these three individual datasets are too inconsistent for generalized interpretation.When merging all three TSM and seven consistent turbidity datasets (Fig. 10), the correlations are just below R = 0.8, which is again due to the larger dynamic range, and MAE are only about half the median concentrations.This corresponds to earlier findings in a validation study review (Odermatt et al., 2012), where TSM retrieval is in general subject to smaller uncertainties than chl retrieval.Apart from these performance metrics, it should be noted that there is a cluster of remotely sensed turbidity estimates in the order of 0.3 FNU for Finnish lakes, which are up to 2 FNU according to reference measurements.This cluster is a different view of the sensitivity loss described above.

Limitations
Most remotely sensed water quality parameters are provided, by definition, in per volume concentration units (chl, TSM, CDOM), which are obtained through generalized relationships between these concentration units and absorption and scattering intensities.This approach has limitations that are similar to those for phytoplankton pigment measurements using fluorometer probes (Huot and Babin, 2010).Both are subject to the superimposition of congeneric signals, such as fluorescence by CDOM or other algal pigments in the case of fluorometer probes, and CDOM and detritus absorption in the case of remotely sensed pigment absorption.In addition, uncertainties in the relationship between concentration units and optical quantities affect the quantification, namely fluorescence quantum yield or weight-specific absorption and scattering.Therefore Huot and Babin (2010) call for repeated calibrations of fluorometers to quantify pigment concentrations.When comparing fluorometer measurements across a large number of different lakes, calibration aspects become a major source of uncertainty.Therefore, it was suggested to remove inconsistencies in absolute magnitude across fluorometer profiles by normalizing each profile to a 0-100 index range (Leach et al., 2017).Likewise, the optically derived constituent concentrations provided in the Diversity II water quality data come without lake-specific calibration.The comparison of absolute chl, TSM and CDOM levels across different lakes should thus be avoided.Instead, we recommend either applying spatial or temporal normalization during data analyses or calibrating the data with lake-specific gains and offsets (e.g.Kiefer et al., 2015).Note that parameters that express solely optical (turbidity, cyanobacteria/floating matter) or thermal (LSWT) parameters are not subject to this limitation.
The second major limitation is related to erroneous pixel identification for melting ice coverage and ephemeral lakes.Areas of melting lake ice are optically very similar to water, and the false retention of a lake ice covered pixel by the Idepix algorithm can lead to highly irregular constituent estimates.Indicators for such cases are the seasonal timing, sharp linear features in water constituent products, indicating lake ice borders or cracks, and very low values in mean LSWT.Auxiliary data from the NOAA/NSDIC Global Lake Earth Syst.Sci.Data, 10, 1527Data, 10, -1549Data, 10, , 2018 www.earth-syst-sci-data.net/10/1527/2018/ and River Ice Phenology Database could also help with the identification.A procedure to fix monthly products that are affected by melting lake ice artefacts using the BEAM L3 binner is described in Odermatt et al. (2015b).Ephemeral (also intermittent or seasonal) lakes as well as other lakes temporally consistent spectroradiometric properties inapplicable.Therefore ratio_490 thresholds for such lakes were set to 0.0 to disable shallow water identification.Furthermore, many of these lakes are subject to very high salinity levels and other typical habitat properties that favour extraordinary types of water constituents and accordingly biooptical properties, which can significantly affect the validity of the water quality products.In both cases, melting lake ice and ephemeral lakes, and both the spatial extent of the products and their spatial gradients can significantly deviate from usual conditions.Particular care is thus needed when a lake's product layer extent is significantly smaller than expected.The minimum percentage of water pixels identified in a product relative to the lake's maximum number of water pixels can thus be defined in the blacklisting procedure available in the Diversity II Python library.We recommend setting a minimum of 50 % and checking product integrity visually where appropriate.Finally, note that the relative abundances in the float-ing_vegetation and floating_cyanobacteria layers include only pixels that passed the foregoing Idepix masking.This means that especially very densely covered water pixels are previously identified as land pixels and not counted, resulting in an overall underestimation of abundances.

Example data usage
The Diversity II water quality datasets were used for a preliminary global assessment, from which we describe a few example results hereafter.In addition, we performed several lake-specific assessments, most prominently for indicating fish assemblages and status assessments in Lake Vänern (Sweden; Sandström et al., 2016).Other use cases are described as biodiversity stories and made available from www.diversity2.info.A summary of three selected examples verifies how the remotely sensed parameters respond to spatio-temporally evident or relatively well-documented biophysical events.

Global assessments
In order to obtain a robust database for global analyses, we first created blacklists that list all 2003-2011 monthly tur-bidity_cc_mean products that come with a spatial coverage of less than 50 % or the maximum depicted area in the period.In this manner, we eliminate a lot of products that were acquired under suboptimal conditions, namely cloud or ice coverage.Gaps in MERIS FR data acquisition (Fig. 1) are also accounted for in such a manner.By counting the remaining products, we obtain an overview of the percentage of available products per lake (Fig. 11a).Note that high-altitude lakes are often only ice-free between June and September; values as low as 30 % can thus mean almost complete coverage.For a rough trend estimation, we still exclude all lakes with less than 50 % of the 108 months in 2003-2011 covered, and we normalize the annual trend with the 9-year median concentration in order to account for turbidity magnitude differences (Fig. 11c).Note that such normalization can also account for lake-specific water quality parameter underestimation or overestimation, e.g.where phytoplankton pigment to absorption ratios differ.The overview allows the quick identification of regime shifts and extreme events.Concerning the former, we observe a dramatic decrease in turbidity in Lake Enriquillo (Dominican Republic), which apparently accompanied a debated doubling of its surface area during the observation period (Méndez-Tejeda et al., 2016).As a case for the latter, we see a high positive trend of 0.2 FNU / year in Lake Sinakharin (Thailand), which is usually below 1 FNU, but went through several individual months with up to 6 FNU since summer 2009.Finally, using the same product availability constraint, but no normalization, we mapped cyanobacteria abundance (Fig. 11c).

Lake Biwa
Lake Biwa is a monomictic lake north-east of Kyoto (Japan), up to 104 m deep and with a surface area of 658 km 2 among the smaller lakes in the dataset.The primary productivity of the lake is relatively low, but subject to strong spatial gradients that are related to the distribution of residential and industrial areas, which are concentrated on the south-eastern shore and responsible for increased riverine nitrogen input (Ohte et al., 2010) that increases near-shore phytoplankton growth (Fig. 12).In December 2007 investigations with an AUV (autonomous underwater vehicle) revealed more than 2000 dead organisms on the lake's bottom, mostly endemic Isaza gobi fish and lake prawns.Low dissolved oxygen concentrations of less than 1.0 mg L −1 near the lake bottom in November were identified as the main cause of an increased exposure of aquatic organisms to heavy metals and the dieoff (Itai et al., 2012;Kawanabe et al., 2012).Oxygen supply depends on wintertime vertical mixing, which, aside from wind stress, depends on the vertical density gradients and thus thermal stratification.In situ temperature profiles from Earth Syst.Sci.Data, 10, 1527Data, 10, -1549Data, 10, , 2018 www.earth-syst-sci-data.net/10/1527/2018/  the Lake Biwa Environmental Research Institute's regular limnological survey programme suggest that vertical mixing remained very weak in the winter of 2006/07 (Kawanabe et al., 2012).Even though relating surface to bottom temperatures is not without caveats, significantly higher LSWT with spatially averaged 8.2 • C is observed in March 2007 than in the other years (Fig. 13), suggesting that minimal annual LSWT could be a valuable proxy for vertical mixing in temperate lakes.

Lake Nicaragua
Lake Nicaragua/Cocibolca is the largest lake in central America, with a surface area of 7851 km 2 .It is polymictic, with a maximum and average depth of only 26 and 15 m, respectively.It is subject to prevalent ecological issues such as untreated urban wastewater discharge and immissions from agriculture (soil erosion, fertilizer and pesticide immissions) and aquacultures that introduce non-resident Tilapia species and possibly novel diseases.Moreover, the planned construction of the Nicaragua Canal connecting the Caribbean Sea to the Pacific Ocean would bring about a significant shift in the lake's ecological status, most directly through the excavation of a 27.6 m deep, 520 m wide and 286 km long waterway across the centre of the shallow lake, which will strongly affect light availability within the water (Meyer and Huete-Pérez, 2014).In spite of the limited availability of MERIS FR data over Latin America (Fig. 1), it can contribute to estimating baseline conditions prior to the intervention.Generally maximum and minimum turbidity occur around August and February, respectively (Fig. 14a), within a range between 2 and 20 FNU.Outliers such as in October 2005 and May 2007 can occur when only a small area of the lake is sampled.However, the 2011 turbidity peak in October is related to an extraordinary shift from cyanobacteria to eucaryotic algae (Fig. 15), which comes with significantly lower chl concentrations throughout the year from both the MPH (Fig. 14a) and the FUB algorithm (not shown), but also a second productivity peak.Even though data continuity and in situ measurements are required for further interpretation, the available data suggest that the lake was in a relatively unstable state at the end of the observation period.

Lake Victoria
Lake Victoria is the second largest freshwater lake in the world and is situated in a shallow depression between the Great Rift Valley and the western Albertine Rift, with a shoreline shared by Kenya, Uganda and Tanzania.It is up to 83 m deep, eutrophic and light-limited (Hecky et al., 2010), and its thermocline is usually at around 30-40 m, with complete mixing occurring once a year (MacIntyre et al., 2014;Payne, 1986).About 80 % of the water input to Lake Victoria is from direct rainfall (Swenson and Wahr, 2009), and atmospheric deposition is the most important phosphorous source in pelagic areas of the main basin (Tamatamah et al., 2005).In contrast, the Nyanza Gulf (also known as Winam or Kavirondo Gulf), the lake's most distinctive morphological feature in the north-east, receives about 10 % of the lake's terrestrial inflow, and it was concluded from ground measurements between March 2005 and March 2006 that the Nyanza Gulf even received phosphorous input from the main basin, in contrast to the paradigm that the gulf is a major contributor to the lake's increasing nutrient enrichment (Gikuma-Njuru et al., 2013).As a matter of fact, MERIS observations confirm that the most productive areas are located in the very east of the gulf throughout 2005 (Fig. 17) and for the first half of 2006 (not shown).During this period, the chl levels in the lake's centre in July 2005 appear extraordinarily high.This observation can be verified through a comparison with the number of available observations (2-7) and the abundance of immersed cyanobacteria (0.4-1) in this area and month.This means that most parts of this cyanobacteria bloom were identified in several observations.The Nyanza Gulf was also subject to intensive growth of water hyacinths (Eichhorna crassipes) in response to El Niño precipitation anomalies in 1998 (Albright et al., 2004) and 2007 (Fusilli et al., 2013).Figure 18 displays the 2007 proliferation event according to Diversity II data.The hyacinths appear in floating_cyanobacteria_mean rather than the expected floating_vegetation_mean, presumably due to persistent cyanobacteria dominance in the gulf.Given that Idepix is likely to mask completely overgrown pixels as land (see Sect. 4.1), the remaining water pixels counted for the abundance consist partly of immersed cyanobacteria, and partly of floating eucaryotes.Despite these limitations and the fact that monthly aggregates lack the spatial details of individual observations, the extent and course of the proliferation match well with the MODIS observations presented by Fusilli et al. (2013).
6 Data availability MERIS FR data are the primary input data source for the creation of the Diversity II dataset, and are available from the European Space Agency.LSWT is provided by the ARC Lake project.Auxillary data were collected in the LakeNet and GLWD databases.We use maximum (SAR-WB) and instantaneous (SWBD) lake outlines from ASAR and SRTM data, respectively.In situ data for validation are provided by various public authorities (see Acknowledgements).All input data are available free of charge.

Conclusions
The Diversity II dataset is the first globally representative, temporally resolved and methodologically consistent information source for inland water quality dynamics from satellite Earth observations.It includes monthly, yearly and 9year temporally aggregated geophysical maps of various water quality parameters, which provide unprecedented possibilities for exploitation at global and local scale.Global analyses are yet to be carried out, with caution as to the limitations mentioned hereafter.At local scale, several case studies demonstrated how the data could effectively contribute to traditional investigations of lake-specific processes and events.The Diversity II product user handbook (Odermatt et al., 2015b) helps to improve such interpretation of remotely sensed data by providing background knowledge of acquisition and retrieval methods, and Python scripts are available that facilitate standard information extraction and visualization from the individual GeoTIFF files.
The methods used for producing the Diversity II dataset are widely used and validated by several independent users beyond the end of the ENVISAT era.We complemented these studies with a relatively extensive performance assess- ment, which also refers to the complexity of comparisons between in situ measured and remotely sensed water quality parameters.For the performance indicators given here, R, MAE, bias and graphics, we repeatedly used the median concentration as a reference.We assume that the median represents typically a background concentration, which is several times lower than e.g. the magnitude of seasonal variations.It must also be remembered that the performance assessment refers to L2 observations, while our monthly product's temporal consistency is significantly enhanced where several observations are available in a month.
It is a major asset of Earth observation that productions from L1 observations can be repeated as improved methods become available, so there is no doubt that the methods in use for Diversity II will be improved and overhauled in the future.Several lessons were learned for such repeated productions, some of them involving challenges for future research.For example, the binary identification of optically shallow waters and floating lake ice has received relatively little attention in recent years, but under certain circumstances they have much larger effects on the final product accuracy than retrieval algorithms.Our method for the identification of clouds, land and mixed pixels is more advanced, but it remains critical for all partly cloudy situations.Therefore, the development of such methods should receive much more attention, relative to the number of retrieval algorithms that were developed in recent years.The latest generation of retrieval algorithms will be based on further advanced water types (Spyrakos et al., 2018) and ensemble approaches that account for the selection of multiple algorithms' estimates (as e.g.left to the user with chl_mph_mean/chl_fub_mean).Finally, new approaches are needed for the consolidation of such increasingly large geospatial datasets, and for the extraction of relevant information, which is still mostly based on lake-specific knowledge.

Figure 1 .
Figure 1.Global density maps for the bulk reprocessed MERIS FR dataset in the years 2002-2012, and distribution of the 340 lakes available in the Diversity II water quality dataset (bottom right).

Figure 2 .
Figure 2. The CaLimnos v1 processing chain for inland waters.Colouration indicates algorithms and downstream processes (white), input and auxiliary data (dark grey), intermediate products (light grey) and output products (blue).

Figure 3 .
Figure 3. Shallow water flagging for the Beaver Island Archipelago in the north of Lake Michigan.(a) Sentinel-2A true colour image, 8 May 2017.(b) Ratio_490 from MERIS data, acquired in May-October 2008.(c) Shallow water mask for ratio_490 with a threshold of 0.65 on top of the chl layer for October 2011 as contained in product layer shallow.Bathymetry data provided by NOAA-NCEI.

Figure 4 .
Figure 4. OWT end-member water-leaving reflectance spectra and a OWT 2 retrieval example for an in situ and MERIS CCL2 reflectance pair from Lake Zurich, 15 August 2007.Respective class membership scores are indicated in the legend.

Figure 5 .
Figure5.Nine-year aggregated OWT in Lake Turkana (a; owt_cc_dominant_class_mode), which features a very prominent gradient in turbidity (b; turbidity_cc_mean).Maximum turbidity and predominantly OWT 7 are observed in the north, where its main tributary, the Omo River, provides about 90 % of the lake's inflow(Beadle, 1981).In contrast, the terminal basin in the south corresponds to OWT 1 and 3, which are second and third lowest in turbidity according to the end members in Fig.4.

Figure 6 .
Figure 6.Comparison of how typical chl-a distribution patterns (a) are resolved with moderate-resolution satellite sensors (b) and in situ measurements (c).

Figure 7 .
Figure 7. Matchup comparisons between remotely sensed chl-a and monitoring measurements.Colours represent the dominant OWT in the nine matchup pixels according to Fig. 2 (black: OWT classification not possible), and error bars indicate their standard deviation.

]Figure 8 .
Figure 8. Global chl-a matchup scatter plots and for the FUB (left) and MPH (right) algorithms, for all matching OWT (top row) and specifically per OWT (rows 2-4).Note that none of the matchups was identified as OWT 5 (extremely productive water).Marker shapes represent MPH cyanobacteria flagging.

Figure 10 .
Figure 10.Turbidity (a) and TSM (b) matchup scatter plots and statistics across different lakes.Note that only turbidity data in units FNU were merged.

Figure 11 .
Figure 11.(a) Percentage of the 108 monthly turbidity_cc_mean products with at least 50 % coverage of the maximum lake area.(b) Turbidity trend, relative to median concentrations, for lakes with at least 50 % product availability.(c) Mean cyanobacteria dominance as the average of all spatially averaged monthly immersed_cyanobacteria_mean values for the same lakes.

Figure 12 .
Figure 12.Chl in Lake Biwa, May 2007; L3 aggregate of four cloud-free images and one partly cloudy image (chl_fub_mean).

Figure 13 .
Figure13.Lake surface water temperature (LSWT) in March in Lake Biwa (lswt_n_mean).The March LSWT marks the annual minimum for every year contained in the dataset.

Figure 14 .
Figure 14.Time series of spatially averaged turbidity (a) and chl (b) in Lake Nicaragua, with data gaps especially during the rainy season (June-October).

Figure 15 .
Figure 15.Annual relative cyanobacteria abundance in Lake Nicaragua, 2003-2011 (immersed_cyanobacteria_mean).Note that the number and distribution of valid observations across the years are quite unequal (Fig.16).

Figure 16 .
Figure 16.Number of observations for the annual cyanobacteria abundance maps in Fig. 15 (num_obs).

Figure 18 .
Figure 18.Peak of the 2007 water hyacinth growth in the Nyanza Gulf, Lake Victoria (floating_cyanobacteria_mean).

Table 1 .
Input , output and aggregation specifications for the monthly products.

Table 2 .
Statistical metrics and ancillary information for chl-a matchup data and the MPH algorithm.n/Min./Med./Max.describe the in situ measurement counts and ranges represented by the matchups.OWT are indicated as the mode of the 9-year averaged products where available, as well as for the available matchup data for MPH (in brackets).

Table 3 .
Statistical metrics and ancillary information for chl-a matchup data and the FUB algorithm.n/Min./Med./Max.represent the in situ measurement counts and ranges represented by the matchups for all n > 15.OWT are indicated as the mode of the 9-year averaged products where available, as well as for the available matchup data for FUB (in brackets).