Rates and timing of chlorophyll-a increases and related environmental variables in global temperate and cold-temperate lakes

Lakes are key ecosystems within the global biogeosphere. However, the environmental controls on the biological productivity of lakes, including surface temperature, ice phenology, nutrient loads and mixing regime, are increasingly altered by climate warming and land-use changes. To better characterize global trends in lake productivity, we assembled a dataset on chlorophyll-a concentrations, as well as associated water quality parameters and surface solar radiation, for temperate and cold-temperate lakes experiencing seasonal ice cover. We developed a method to identify periods of rapid net increase of in situ chlorophyll-a concentrations from time series data and applied it to data collected between 1964 and 2019 across 343 lakes located north of 40° N. The data show that spring chlorophyll-a increase periods have been occurring earlier in the year, potentially extending the growing season and increasing the annual productivity of northern lakes. The dataset on chlorophyll-a increase rates and timing can be used to analyze trends and patterns in lake productivity across the Northern Hemisphere or at smaller, regional scales. We illustrate some trends extracted from the dataset and encourage other researchers to use the open dataset for their own research questions.


Introduction
Lakes play an important role in the biogeochemical cycling of many elements (Battin et al., 2008; Cole et al., 2007; O'Connell et al., 2020; Rousseaux and Gregg, 2013; Schindler, 1971). Of the more than 100 million lakes documented on Earth (Verpoorter et al., 2014), evidence indicates that the majority are shallow, with enough light and nutrients available to make them highly productive ecosystems (Downing et al., 2006; Wetzel, 2001). Lakes therefore represent active sites for the storage, transport, and transformation of carbon, nutrients (e.g., nitrogen, phosphorus, silicon, iron), and contaminants (e.g., mercury) along the freshwater continuum (Lauerwald et al., 2019; Tranvik et al., 2009). They are also sensitive to the effects of climate change (Williamson et al., 2009; Rouse et al., 1997).
There are multiple environmental controls on lake primary productivity, including water temperature, ice phenology, nutrient concentrations, circulation, mixing regime, and solar radiation (Lewis, 2011; Zohary et al., 2009). Stressors such as climate change and nutrient pollution can significantly alter these controls, and with them the ecosystem structure and biogeochemical functioning of lakes (Jeppesen et al., 2020; Markelov et al., 2019). Changes affecting northern lakes include warmer water temperatures, enhanced stratification and hypoxia, nutrient enrichment, light attenuation by chromophoric organic matter, and increases in the relative abundance of toxic cyanobacteria in the phytoplankton community (Deng et al., 2018; Huisman and Hulot, 2005; Jeppesen et al., 2003; Creed et al., 2018). For example, Lake Superior has seen an increase in primary production during the last century, together with increasing surface water temperatures and longer seasonal stratification and ice-free periods (O'Beirne et al., 2017). Other lakes are similarly experiencing increases in productivity. According to Lewis (2011), the current mean primary production of lakes is 260 g C m⁻² yr⁻¹, which is 162 % higher than earlier estimates under historical baseline conditions.
Globally, phytoplankton (i.e., algae) are the main primary producers in lakes and generally form the foundation of lentic food webs (Carpenter et al., 2016). Periods of high lake productivity coincide with a rapid increase in phytoplankton biomass. In extreme cases, algal blooms can reach hundreds to thousands of cells per millilitre (Henderson-Sellers and Markland, 1987). The decomposition of the large quantities of organic matter produced during these bloom events causes hypoxic conditions to expand within the lake (Watson et al., 2016). In harmful algal blooms, certain algal species also release hepatotoxic and neurotoxic compounds (Codd et al., 2005).
Thus, identifying trends in the timing and intensity of seasonal algal growth and linking them to changes in environmental stressors can help to predict the future of lake productivity and to assess the risk of undesirable algal blooms.
Because it is challenging to measure algal abundance and growth directly, chlorophyll-a is often used as a proxy for algal biomass and as an indicator of the associated primary production in lakes (Huot et al., 2007). Although other proxies have been developed (Lyngsgaard et al., 2017), chlorophyll-a is the most common metric to characterize trends in algal biomass within and across lakes, especially in historical water quality records. Tett (1987) proposes a chlorophyll-a threshold of 100 µg L⁻¹ to define "exceptional blooms"; Jonsson et al. (2009) use a threshold of 5 µg L⁻¹ to identify a bloom; and Binding et al. (2021) flag an algal bloom when chlorophyll-a concentrations retrieved from satellite observations exceed 10 µg L⁻¹. Such threshold values, however, do not take into account the baseline (i.e., no-bloom) chlorophyll-a concentration specific to a given lake or the lake's trophic status (Germán et al., 2017). Furthermore, focusing on harmful and nuisance algal blooms alone may mask the impact that a changing climate or other stressors may have on a lake's overall biological productivity.
Intra-annual fluctuations in lake chlorophyll-a concentrations result from the interactions of multiple variables and processes, including grazing by zooplankton, competition between algal species with different growth strategies and chlorophyll-a contents, and changes in temperature, light, and nutrient availability (Lyngsgaard et al., 2017; Sommer et al., 1986). In dimictic lakes, for example, there are usually two peaks in algal biomass, and hence also in chlorophyll-a concentrations, in the spring and fall, with a smaller biomass stock of slower-growing species during the summer and an even smaller stock of algae (in terms of both biovolume and chlorophyll-a) under the ice cover in the winter (Hampton et al., 2017).
The spring increase in algal biomass generally consists of fast-growing algal species that take advantage of the increases in temperature and light following ice-off, as well as of the inorganic nutrients generated by mineralization under the ice over the winter. The shift from spring to summer algal communities often coincides with zooplankton grazing rates exceeding the spring algal growth rates, which brings down the total algal biomass. During the summer, these high grazing rates favour algal species that are less edible to grazers but tend to grow more slowly. Lake overturn in the fall initiates the transition from the summer predominance of slow-growing species to fast-growing phytoplankton species, causing a second peak in algal biomass (Sommer et al., 1986).
A common approach for comparing chlorophyll-a trends across multiple lakes is to consider the maximum or mean annual chlorophyll-a concentrations. For example, Ho et al. (2019) applied the Mann-Kendall trend test to time series of annual maximum chlorophyll-a concentrations, while Shuvo et al. (2021) used a random forest regression approach to assess the relative importance of climatic versus non-climatic controls on mean chlorophyll-a concentrations. Both studies analyzed chlorophyll-a concentrations derived from satellite observations rather than measured in situ. In addition, these approaches did not specifically identify the periods of the year when chlorophyll-a concentrations experienced rapid changes.
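For readers unfamiliar with the Mann-Kendall test mentioned above, the following minimal Python sketch computes the test statistic S, its normal-approximation Z score, and a two-sided p-value for a time-ordered series (for example, annual maximum chlorophyll-a). It is a textbook version of the test, not the implementation used by Ho et al. (2019); the function name is hypothetical, and for simplicity the variance of S is not corrected for tied values.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test (sketch, no tie correction).

    Returns (S, Z, p): the S statistic, its normal-approximation Z score,
    and the two-sided p-value for a series ordered in time.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S = number of increasing pairs minus number of decreasing pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis of no trend (no ties assumed)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction of 1 towards zero
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return s, z, p
```

A positive S (and Z) indicates an increasing trend; a lake-specific series of, e.g., annual maxima would simply be passed in chronological order.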
Alternatively, the rate of increase in chlorophyll-a concentration can be used to constrain the timing of rapid increases in algal biomass usually associated with periods of high primary productivity. In this study, we refer to these as "periods of chlorophyll-a increase" (PCIs). The weeks leading up to a PCI are crucial to create the necessary conditions that enable algal growth (Lewis et al., 2018). Thus, to analyze trends in lake net primary productivity, one should consider environmental variables, such as surface water temperature, solar radiation, and nutrient concentrations, both during and preceding the annual PCIs.
Although the rate of chlorophyll-a concentration increase has been used to detect algal blooms within individual water bodies, e.g., in the San Roque reservoir (Germán et al., 2017), it has rarely been used across large temporal (i.e., more than a few years) and spatial (i.e., regional and up) scales. Here, we present a method for calculating net rates of chlorophyll-a increase (RCI). The timing of PCIs and values of the corresponding RCIs were derived from in situ chlorophyll-a concentrations obtained for 343 lakes located at latitudes above 40° N. The entire dataset covers the period of 1964-2019 and further contains data on coincident environmental control variables, including surface solar radiation. To illustrate the potential applications of the resulting dataset, we present some temporal trends of the chlorophyll-a rates and their relationships with environmental variables. The dataset is made available as an open resource that other researchers are encouraged to use in their own work.
Data acquisition, compilation, and quality control
Lake data selection
In situ chlorophyll-a concentrations and other lake physicochemical data were extracted from open-source international, national, and regional databases (see Table A1 for a summary of all databases used). The data include surface water temperature, Secchi depth, and pH as well as the concentrations of particulate organic carbon (POC), total phosphorus (TP), soluble reactive phosphorus (SRP), total Kjeldahl nitrogen (TKN), and dissolved organic carbon (DOC).
To enable readers to compare the methods used by different lake monitoring agencies and researchers to collect and process in situ samples, we provide the links to the raw data sources and metadata files in the appendix (Tables A1-A3). When selecting data, we remained as consistent as possible by implementing the following steps (more details can be found in the "initial formatting" folder in the associated GitHub repository, https://github.com/hfadams/pci/tree/main/code/initial_formatting, last access: 7 August 2022).
We only included measurements taken at ≤ 3 m water depth. When the sampling depth was not provided, we assumed the sample was taken from within the top 0.5-3 m of the lake, given that this is the usual standard sampling protocol (Dorset Environmental Science Centre, 2010; United States Environmental Protection Agency, 2012).
We selected lakes from mid to high latitudes (≥ 40° N). Lakes at these latitudes typically experience seasonal ice cover and thermal stratification during the summer, in contrast to low-latitude lakes, which are typically meromictic or polymictic (Woolway and Merchant, 2019).
We omitted all variable values below the corresponding analytical detection limit. Data from different sources were individually reformatted to yield consistent (standard) units and headings. Where needed, reported values were averaged to yield daily mean values before being combined into a single CSV file. When multiple chlorophyll-a data types were available (as, for example, in the Laurentian Great Lakes data series), we selected the uncorrected data, because most reported lake chlorophyll-a concentrations have not been corrected for phaeophytin pigments. If no coordinates were provided, we assigned those of the lake centroid in QGIS. Fifteen lakes had unknown locations and were removed from the final dataset. We further restricted ourselves to lakes that, in most years, were sampled at least six times per year, which was considered the minimum sampling frequency to reliably detect the yearly PCIs. Lake names were standardized by expanding abbreviations and removing unnecessary capitalization and special characters.
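The selection steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code in the project repository: the record field names (`lake`, `date`, `depth_m`, `chl`, `detection_limit`) and the function name are hypothetical, samples with unknown depth are kept (following the 0.5-3 m assumption), and "in most years" is interpreted as more than half of a lake's sampled years having at least six sampling days.

```python
from collections import defaultdict
from statistics import mean

MAX_DEPTH_M = 3.0          # only samples taken at <= 3 m depth are kept
MIN_SAMPLES_PER_YEAR = 6   # minimum sampling days per year


def select_daily_chl(records):
    """Filter raw chlorophyll-a records and return {lake: {date: daily mean}}.

    `records` is an iterable of dicts with hypothetical keys 'lake',
    'date' (datetime.date), 'depth_m' (float or None), 'chl' (ug/L) and
    'detection_limit' (float or None).
    """
    by_day = defaultdict(list)
    for r in records:
        depth = r.get("depth_m")
        # Unknown depth: assume the standard 0.5-3 m sampling layer (keep it)
        if depth is not None and depth > MAX_DEPTH_M:
            continue
        dl = r.get("detection_limit")
        if dl is not None and r["chl"] < dl:
            continue  # omit values below the analytical detection limit
        by_day[(r["lake"], r["date"])].append(r["chl"])

    # Average multiple samples from the same lake and day
    daily = defaultdict(dict)
    for (lake, day), vals in by_day.items():
        daily[lake][day] = mean(vals)

    # Keep lakes sampled on >= 6 days in more than half of their years
    selected = {}
    for lake, series in daily.items():
        per_year = defaultdict(int)
        for day in series:
            per_year[day.year] += 1
        good = sum(1 for n in per_year.values() if n >= MIN_SAMPLES_PER_YEAR)
        if good / len(per_year) > 0.5:
            selected[lake] = dict(sorted(series.items()))
    return selected
```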
With the above selection criteria, the final dataset contained 52116 potential PCIs for 343 lakes at ≥ 40° N, covering the period 1964-2019. The lake sampling locations in the PCI dataset are shown in Fig. 1.

Surface solar radiation data
Open-source in situ surface solar radiation (SSR) data for the period 1950-2020 were collected from stations paired with the selected lakes (see Table A2 for data sources). Each lake was paired with the closest SSR station using the nearest neighbour function in QGIS, with a maximum search radius of 3 degrees (Schwarz et al., 2018; Fig. 1). For each lake, the dataset provided here reports the geodesic distance to its paired SSR station and the difference in elevation between the two.
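The station pairing was done in QGIS; the sketch below shows the same nearest-neighbour idea in Python. It assumes a spherical Earth (haversine formula, radius 6371 km) rather than an ellipsoidal geodesic, and it interprets the 3-degree limit as a great-circle central angle. The dictionary keys and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean spherical radius (approximation)


def central_angle_deg(lat1, lon1, lat2, lon2):
    """Great-circle central angle between two points in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


def pair_nearest_station(lake, stations, max_deg=3.0):
    """Return (station_id, distance_km, elevation_difference_m) for the
    closest station within max_deg, or None if no station qualifies.

    `lake` and each station are dicts with hypothetical keys 'lat', 'lon'
    and 'elev' (stations also carry an 'id').
    """
    best = None
    for st in stations:
        ang = central_angle_deg(lake["lat"], lake["lon"], st["lat"], st["lon"])
        if ang <= max_deg and (best is None or ang < best[0]):
            best = (ang, st)
    if best is None:
        return None
    ang, st = best
    dist_km = math.radians(ang) * EARTH_RADIUS_KM  # arc length on the sphere
    return st["id"], dist_km, st["elev"] - lake["elev"]
```

At 60° N, for instance, a station 1° of longitude away is roughly 56 km from the lake, so the longitudinal reach of a fixed angular radius shrinks with latitude.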
The temporal resolution of the SSR data varied from minutes to months. Hence, where needed, the SSR data were resampled to yield monthly mean values. For the Experimental Lakes Area (ELA) in Ontario, Canada, the data were converted from photosynthetically active radiation (PAR) to SSR, using 550 nm as the representative (mean) wavelength of the PAR range (400-700 nm).
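The PAR conversion can be illustrated by converting a photon flux to an energy flux with a single representative wavelength of 550 nm, using an energy of N_A h c / λ per mole of photons. The sketch assumes the ELA PAR values are reported in µmol photons m⁻² s⁻¹; the text does not state whether any additional factor was applied to scale PAR to broadband SSR, so none is included here. The function name is hypothetical.

```python
# Exact SI values of the physical constants
PLANCK_J_S = 6.62607015e-34
LIGHT_M_S = 2.99792458e8
AVOGADRO_PER_MOL = 6.02214076e23


def par_umol_to_w_m2(par_umol_m2_s, wavelength_nm=550.0):
    """Convert a photon flux (umol photons m-2 s-1) to an energy flux (W m-2),
    treating all photons as having one representative wavelength."""
    joules_per_photon = PLANCK_J_S * LIGHT_M_S / (wavelength_nm * 1e-9)
    joules_per_umol = joules_per_photon * AVOGADRO_PER_MOL * 1e-6
    return par_umol_m2_s * joules_per_umol
```

At 550 nm, 1 µmol photons m⁻² s⁻¹ corresponds to about 0.2175 W m⁻², so 1000 µmol photons m⁻² s⁻¹ gives roughly 217.5 W m⁻².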

Lake characteristics
For each lake, we calculated the trophic status index (TSI) from the mean chlorophyll-a concentration over the sampling period and used it to assign the lake to the corresponding trophic state category according to Carlson and Simpson (1996). Each lake's surface area, mean depth, and volume were taken from the HydroLAKES shapefile (ver. 1.0; Messager et al., 2016). Lake elevation was extracted from a digital elevation model (DEM) (Danielson and Gesch, 2010), and each lake was assigned to its corresponding climate zone using HydroATLAS data (ver. 1.0; Linke et al., 2019). The metadata for these variables are published as part of the data publication (Adams et al., 2021), and a summary table of associated lake data is provided in the appendix (Table A4).
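As an illustration of the TSI step, the sketch below uses Carlson's chlorophyll-a index, TSI = 9.81 ln(Chl) + 30.6 (Chl in µg L⁻¹), as listed in the NALMS trophic state equations cited in the dataset metadata, together with category boundaries at TSI 40, 50, and 70. The handling of values falling exactly on a boundary is an assumption, and the function names are hypothetical.

```python
import math


def tsi_chl(chl_ug_l):
    """Carlson trophic state index from mean chlorophyll-a (ug/L)."""
    if chl_ug_l <= 0:
        raise ValueError("chlorophyll-a must be positive")
    return 9.81 * math.log(chl_ug_l) + 30.6


def trophic_state(tsi):
    """Map a TSI value to a trophic category (boundaries 40, 50, 70;
    a value on a boundary goes to the higher category, an assumption)."""
    if tsi < 40:
        return "oligotrophic"
    if tsi < 50:
        return "mesotrophic"
    if tsi < 70:
        return "eutrophic"
    return "hypereutrophic"
```

For example, a lake with a mean chlorophyll-a concentration of 5 µg L⁻¹ has a TSI of about 46.4 and is classed as mesotrophic.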

Detecting seasonal periods of chlorophyll-a increase
Periods of chlorophyll-a increase (PCIs) were identified based on the normalized net rate of change in chlorophyll-a concentration (NRCC) at each lake sampling point throughout the year. To locate the start and end of a PCI, we smoothed the annual chlorophyll-a time series using a Savitzky-Golay filter (SciPy.signal savgol_filter) and flagged local maxima in the smoothed data (SciPy.signal find_peaks) using functions from the open-source SciPy ecosystem. The procedure is illustrated in Fig. 2. The NRCC at any given time during the year was calculated by computing the first derivative of the smoothed chlorophyll-a concentration with respect to time and then dividing the derivative by the corresponding smoothed chlorophyll-a concentration. For each lake and each year, the start of the first PCI was defined as the day the NRCC surpassed 0.4 d⁻¹. This threshold rate was selected following a series of sensitivity tests (details provided in the supplementary information). A threshold NRCC value was considered preferable to a threshold RCI value, because it accounts for variations among lakes and among years in the baseline chlorophyll-a concentrations during the non-growing season.
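In symbols, with C̃(t) denoting the smoothed chlorophyll-a concentration (our notation), the NRCC and the threshold-based PCI start can be written as below. The central-difference form is one possible discretization for daily data with spacing Δt, and t₀ denotes the start of the year or the end of the previous PCI. For purely exponential growth the NRCC equals the growth rate r, so the 0.4 d⁻¹ threshold corresponds to a doubling time of about 1.7 d.

```latex
\mathrm{NRCC}(t) = \frac{1}{\tilde{C}(t)}\,\frac{\mathrm{d}\tilde{C}(t)}{\mathrm{d}t}
  \approx \frac{\tilde{C}_{i+1} - \tilde{C}_{i-1}}{2\,\Delta t\,\tilde{C}_{i}},
\qquad
t_{\mathrm{start}} = \min\left\{\, t \in [t_0, t_{\mathrm{peak}}) : \mathrm{NRCC}(t) > 0.4\ \mathrm{d}^{-1} \right\}

\tilde{C}(t) = \tilde{C}_0\, e^{r t} \;\Rightarrow\; \mathrm{NRCC}(t) = r,
\qquad
t_{2} = \frac{\ln 2}{0.4\ \mathrm{d}^{-1}} \approx 1.73\ \mathrm{d}
```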
The PCI ended on the day the peak chlorophyll-a concentration was reached, that is, just before the NRCC turned negative. If the NRCC did not reach the 0.4 d⁻¹ threshold before a given peak, the PCI began when the NRCC first became positive. The second (fall) PCI was identified in the same way, following the end of the first (spring) PCI. If the smoothed annual chlorophyll-a series yielded only one peak, only one PCI was identified for that year, which was then labelled a "single PCI" year. Years with more than two chlorophyll-a peaks or with no peaks were not included in the PCI dataset.
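The detection rules above can be sketched with the two SciPy functions named in the text. This is a minimal illustration, not the published code: it assumes the raw samples have already been interpolated onto a uniform daily grid, the smoothing window, polynomial order, and peak prominence defaults are hypothetical, and the RCI is computed here as the mean net increase of smoothed chlorophyll-a over the PCI, which is an assumed definition.

```python
import numpy as np
from scipy.signal import find_peaks, savgol_filter


def detect_pcis(day, chl, window_length=31, polyorder=3,
                nrcc_threshold=0.4, min_prominence=0.5):
    """Sketch of PCI detection for one lake-year.

    `day` must be a uniform daily grid and `chl` is in ug/L. The defaults
    for window_length, polyorder and min_prominence are hypothetical.
    """
    day = np.asarray(day, dtype=float)
    smoothed = savgol_filter(np.asarray(chl, dtype=float), window_length, polyorder)
    # NRCC = (dC/dt) / C; C is floored to avoid dividing by values near zero
    nrcc = np.gradient(smoothed, day) / np.maximum(smoothed, 1e-6)
    peaks, _ = find_peaks(smoothed, prominence=min_prominence)
    if len(peaks) == 0 or len(peaks) > 2:
        return []  # years with no peaks or more than two peaks are dropped
    labels = ["single"] if len(peaks) == 1 else ["spring", "fall"]

    pcis, search_from = [], 0
    for label, peak in zip(labels, peaks):
        window = nrcc[search_from:peak]
        above = np.flatnonzero(window > nrcc_threshold)
        if above.size:
            start, rule = search_from + above[0], "threshold"
        else:
            # Fallback: the PCI starts where the NRCC first becomes positive
            positive = np.flatnonzero(window > 0)
            start = search_from + positive[0] if positive.size else peak
            rule = "fallback"
        duration = day[peak] - day[start]
        # Assumed RCI definition: mean net increase over the PCI (ug/L per day)
        rci = (smoothed[peak] - smoothed[start]) / duration if duration > 0 else float("nan")
        pcis.append({"type": label, "start_day": day[start],
                     "end_day": day[peak], "rule": rule, "rci": rci})
        search_from = peak + 1  # the next PCI is searched after this peak
    return pcis
```

Requiring a minimum prominence in `find_peaks` is an added safeguard (not described in the text) against tiny local maxima caused by noise in the smoothed series. Because years with more than two peaks are discarded, this parameter directly affects how many years are retained.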
Depending on data availability, the pre-PCI period was defined as the one- or two-week period immediately preceding the PCI start day. For each pre-PCI, the mean surface water temperature, SSR, and TP concentration were compiled. These served as simple indicators of how favourable in-lake conditions were for initiating algal growth (Lyngsgaard et al., 2017). An example of a year with a spring and fall PCI is shown in Fig. 3. Note that we use the label "fall" to indicate the second yearly PCI, although in some cases the fall PCI was initiated before the fall equinox.
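A pre-PCI mean for one variable (for example, TP) can be sketched as below. The text does not state how the one- or two-week window was chosen; this sketch assumes the one-week window is tried first and widened to two weeks only when it contains no samples, since a two-week window always includes every sample of the one-week window. The function name and the (day of year, value) input format are hypothetical.

```python
from statistics import mean


def pre_pci_mean(samples, pci_start_day, windows=(7, 14)):
    """Mean of (day, value) samples in [start - w, start) for the first
    window length w (in days) that contains data.

    Returns (mean_value, window_days), or (None, None) if no window has data.
    """
    for w in windows:
        vals = [v for d, v in samples if pci_start_day - w <= d < pci_start_day]
        if vals:
            return mean(vals), w
    return None, None
```

Recording the window length alongside the mean keeps it clear which pre-PCI values are based on one week and which on two.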
Once the PCI and pre-PCI durations were determined, the mean values of the variables listed in Table 1 were calculated for each lake and for each year for which data were available. In the dataset, each row represents a single PCI and includes its timing and duration, the RCI value, and the mean values of all other relevant lake variables, including SSR, averaged over the PCI and pre-PCI. Along with the variables in Table 1, we included the total number of samples collected each year and the mean time between samples, so that users can, if desired, filter the dataset for a higher sampling frequency than used here. The supplementary information of the dataset also identifies the organization responsible for monitoring each lake.
[…] where the first derivative is divided by the smoothed chlorophyll-a concentration and is plotted using the right axis. The PCI begins when the NRCC surpasses a threshold of 0.4 d⁻¹, as shown in the first (spring) PCI, and ends when the NRCC turns negative, which is when the peak chlorophyll-a concentration is reached. When a peak is detected but the NRCC does not surpass the threshold of 0.4 d⁻¹, the PCI begins when the NRCC surpasses 0 d⁻¹, as shown in the second (fall) PCI. The PCI and pre-PCI (two weeks leading up to the PCI) are shown in dark and light grey shading, respectively.

Dataset characteristics
Most lakes in the dataset are located between 50 and 60° N. The majority of available open data come from organizations in the United Kingdom, Sweden, Canada, and the United States. The years with available data are unevenly distributed: the majority of PCIs fall in the period 2005-2019 (Fig. 4a), likely due to a combination of increased lake monitoring efforts and a recent push towards greater accessibility of publicly funded data (Hallegraeff et al., 2021; Roche et al., 2020). Most sampling intervals are in the range of 25 to 30 d, with additional peaks at 7 and 14 d (Fig. 4b). Thus, with a few exceptions, the PCIs included in the dataset occurred in lakes sampled at a monthly frequency or better.
The distribution of trophic states of the PCIs recorded in the dataset is 1.6 % oligotrophic, 18.6 % mesotrophic, 75.2 % eutrophic, and 4.6 % hypereutrophic. Single PCIs dominate in oligotrophic lakes, where they make up 96.1 % of all PCIs (Fig. 4c). This may reflect the severe nutrient limitation in oligotrophic lakes, which prevents the occurrence of a second annual PCI (Rigosi et al., 2014). Oligotrophic lakes also tend to dominate at latitudes ≥ 55° N (Fig. 4d), where lower water temperatures and lower cumulative solar radiation may further limit algal growth (Lewis, 2011). The PCI durations range from 3 to 275 d, with a median of 68 d (Fig. 5a). Fall PCIs tend to be shorter than spring and single PCIs, with single PCIs exhibiting the most variable start and end days (Fig. 5b).
Table 1. Summary of variables in the PCI dataset. Associated lake data (e.g., lake depth, surface area, volume, climate zone) are available in the Appendix (Table A4).

Environmental conditions during PCIs
Rates of chlorophyll-a increase during the PCIs exhibit lognormal distributions (Fig. 6a). The mean rate is lowest in the single PCI category and highest in the fall PCIs. Mean surface water temperature has a distinct bimodal spring-fall distribution (Fig. 6b). For the single PCIs, the corresponding mean temperatures are evenly distributed across the annual range, which reflects the large spread in the timing of the single PCIs (Fig. 5b). Total P concentrations are lowest during the spring PCIs (Fig. 6c), consistent with a greater control of P limitation on algal growth during spring compared to summer and fall (Kirillin et al., 2012). Secchi depth during the PCIs ranges from 0.01 to 15.4 m, with fall PCIs experiencing the lowest mean Secchi depth (Fig. 6d), as turbidity generally increases after the spring bloom.
Figure 4. Distributions of (a) year of occurrence, (b) mean time between samples, (c) lake trophic status index, and (d) lake latitude for each PCI in the dataset. Data are grouped by "double PCI" or "single PCI" year. The data are skewed toward more recent years and higher latitudes. Lakes in the oligotrophic category (TSI < 40) have a higher proportion of single PCIs. These "rain-cloud plots" show the same data visualized in three different ways for each group: frequency distribution, boxplot with quartiles (outliers represented as points), and a jitter plot of data points (Allen et al., 2021). Note that the amplitude of the frequency distribution is not proportional between categories.
Figure 6. Distributions of selected water quality variables during PCIs: (a) log rate of chlorophyll-a increase, (b) mean surface water temperature, (c) log mean total phosphorus (TP), and (d) mean Secchi depth. The mean rate of chlorophyll-a increase is lowest in the single PCI category and highest in the fall PCIs. For the single PCIs, temperature is evenly distributed across the annual range, as they occur throughout the ice-free season. Total phosphorus concentrations are lowest during the spring PCIs, which likely reflects a greater control of P limitation on algal growth during spring compared to summer and fall. Each PCI category has a similar range in Secchi depth, between 0 and 5 m. Rain-cloud plots show the frequency distribution, boxplot with quartiles (outliers represented as points), and a jitter plot of data points for each group.

Dataset: examples of trends
The PCI delineation and the estimation of RCI can, in principle, be applied to any lake for which time series chlorophyll-a concentration data are available. By creating a dataset comprising many lakes and covering multi-year time periods, it becomes possible to extract global trends in lake chlorophyll-a. Here, we provide a few illustrative examples of how the dataset can be interrogated, setting the stage for its use and extension by other researchers.

Chlorophyll-a rates: trophic status, latitude and climate zone
When grouped by trophic status, mean and median rates of chlorophyll-a increase (RCIs) show the expected increase from oligotrophic to hypereutrophic lakes (Fig. 7a).
The rates in the different trophic categories, however, cover large and overlapping ranges. When grouped according to latitude, lakes between 40 and 50° N exhibit the widest range in RCIs (Fig. 7b), in part due to the high proportion of lakes in this latitude range. The highest-latitude lakes (60-70° N) tend to have the lowest RCIs, which may reflect the cooler temperatures experienced (Lewis, 2011). The lakes are spread across three climate zones: cold and mesic; cool, temperate, and dry; and warm, temperate, and mesic (Fig. 7c). There is considerable overlap in RCI across the climate zones, with no systematic differences in the mean and median RCI values between the zones.
While variations in chlorophyll-a rates of increase (RCIs) are often assumed to reflect comparable differences in algal biomass growth rates, it is important to note that the chlorophyll-a to biomass ratio varies within and among lakes. In particular, chlorophyll-a to biomass ratios are known to be sensitive to variations in solar radiation, temperature, algal species, and cell size (Baumert and Petzoldt, 2008; Inomura et al., 2019; Geider, 1987; Álvarez et al., 2017). The summer ratio of chlorophyll-a to biomass (the latter typically expressed as particulate organic carbon concentration) generally increases with increasing latitude, because algae are adapted to harvest the more variable daylight conditions, including longer summer photoperiods, at higher latitudes (Behrenfeld et al., 2016; Taylor et al., 1997). In addition, cooler temperatures at higher latitudes may result in higher chlorophyll-a to biomass ratios because of lower growth rates, at least when the algae are nutrient replete (Behrenfeld et al., 2016). Thus, the use of a relative rate (NRCC) as the threshold value for defining a PCI and as a metric reported in the dataset facilitates comparisons between lakes of different trophic status or standing stock of chlorophyll-a.
Figure 7. […] Lakes of a higher trophic status have a higher mean RCI, while lakes at higher latitudes have lower RCIs (with considerable overlap between all categories). Grouping by climate zone shows minimal effect on RCI. The number of lakes represented by each violin is shown in grey text on the panels. Climate zones are as follows: 7 = cold and mesic; 8 = cool, temperate, and dry; 10 = warm, temperate, and mesic. White circles indicate the mean value for each violin.

Chlorophyll-a rates: temperature and climate warming
The start and end days of the spring and single PCIs show temporal trends towards occurrence earlier in the year (Fig. 8a). Earlier springtime algal activity could be linked to global warming, which is expected to result in earlier ice break-up and in surface water temperatures becoming favourable for algal growth earlier in the year (Markelov et al., 2019). The start and end days of the spring PCIs show a positive correlation with increasing temperature (Fig. 8b). By contrast, weak or even negative correlations are seen for the fall PCIs. Thus, all other conditions unchanged, a warmer climate would see earlier spring blooms but few temporal shifts, and possibly even a slight delay, for the fall PCIs. For the spring and single PCIs, the duration shows a maximum around 10 °C. Therefore, moderate temperatures near or slightly above 10 °C should, on average, produce the longest-lasting algal growth events. The same trend is not seen for the fall PCIs, possibly because they occur when water temperatures are already above 10 °C.
Figure 9. Mean PCI surface solar radiation (SSR) grouped by PCI type (single, spring, or fall). White circles show the mean value for each violin. The mean SSR during spring PCIs is lower than that of single and fall PCIs, which have similar distributions.
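The text does not specify the trend estimator behind Fig. 8a. As one robust option, the Theil-Sen slope (median of all pairwise slopes) of PCI start day versus year can be computed as below; a negative slope means the PCI occurs earlier from year to year. This is a stand-in technique for illustration only, and the function name is hypothetical.

```python
from statistics import median


def theil_sen_slope(x, y):
    """Theil-Sen estimator: median of the slopes between all pairs of points.

    Robust to a minority of outlying values (e.g., one anomalous year).
    """
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x))
              for j in range(i + 1, len(x))
              if x[j] != x[i]]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    return median(slopes)
```

Applied to one lake's spring PCI start days (day of year) against sampling year, the result is in days per year; a single anomalous year changes only a few of the pairwise slopes and therefore has little effect on their median.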

Surface solar radiation during PCIs: seasonal distributions and distances to lakes
The mean SSR during spring PCIs in the dataset is approximately 100 W m⁻² (Fig. 9), lower than the mean SSR values of single and fall PCIs, which are both close to 175 W m⁻². This difference between spring and fall PCIs is expected, given the longer daylight hours and more intense sunlight experienced in summer and fall compared to early spring. The similarity in mean SSR between single and fall PCIs may be related to the observation that, at higher latitudes (> 55° N), single PCIs occur more commonly than double PCIs (Fig. 4d). Higher-latitude lakes tend to bloom only once during the summer months, taking advantage of the period of the year with the highest SSR (Behrenfeld et al., 2016; Lewis, 2011). In support of this, Fig. 5b and c show that single PCIs tend to occur between late spring and early fall. At lower latitudes (40-45° N), on the other hand, double PCIs are more common than single PCIs, likely due to the higher temperatures and longer periods of sufficient daylight experienced during the spring and fall "shoulder seasons" at these latitudes.
Despite the central importance of sunlight for photosynthesis, in situ SSR time series are rarely measured systematically as part of lake monitoring programs (Sterner et al., 1997). Although gridded reanalysis datasets that include solar radiation parameters exist, their comparability with in situ SSR measurements remains questionable (Wohland et al., 2020). In gathering open-source data, we compiled in situ SSR measurements from locations as close as possible to the lakes with chlorophyll-a data. Nonetheless, many of the SSR values in our dataset were collected at considerable distances from the corresponding lakes (up to ∼300 km, Fig. 10). For our dataset, only ∼10 % of the locations where SSR was measured are less than 20 km away from the corresponding lakes, while ∼40 % are 20-50 km away, ∼43 % are 50-100 km away, and ∼7 % are more than 100 km away.
Hence, in a significant number of cases, the actual mean SSR during a PCI may differ from the in situ mean SSR reported here due to differences in cloud cover and levels of atmospheric aerosols (among other factors) (Alpert and Kishcha, 2008). Users are therefore advised to consider this limitation when making use of the SSR values in our dataset. Overall, we recognize a need for SSR data to be more systematically measured and reported as part of lake-monitoring programs, in particular for oligotrophic lakes.

Conclusions
We present a novel way to delineate annual periods of chlorophyll-a increase (PCIs) in lakes that presumably overlap with periods of algal growth. We apply this approach to derive the chlorophyll-a rates of increase (RCIs) during the PCIs of 343 lakes from temperate and cold-temperate regions in the Northern Hemisphere, covering the period 1964-2019. The derived RCIs are assembled in an open-source dataset, together with additional information on the lakes, including water quality, trophic state, and surface solar radiation. Note that the dataset can be paired with other databases, such as HydroLAKES (https://www.hydrosheds.org/products/hydrolakes, last access: July 2022, Messager et al., 2016), HydroATLAS (https://www.hydrosheds.org/hydroatlas, last access: July 2022, Linke et al., 2019), and GLCP (Meyer et al., 2020), to access additional lake and/or watershed attributes. Our dataset is designed to support comparative analyses of the controls on lake chlorophyll-a dynamics and, by extension, on algal dynamics within and between lakes. We present several examples of such analyses and hope they will encourage others to use the dataset in their own research and to further expand its geographical reach and information content.
Table A1. Summary of sources and licensing for the chlorophyll-a data. Direct links to the datasets are provided where possible, and lake names can be searched within the database. Note that not all lakes in these databases met the requirements to be retained in the PCI dataset.
Variable | Unit | Description
[…] (Messager et al., 2016)
lake_long | Decimal degrees | Lake longitude, collected from original data files and HydroLAKES data (Messager et al., 2016)
tsi | Range from 0-100 | Calculated from mean chlorophyll-a concentration across all years the lake was sampled, based on guidelines from the North American Lake Management Society (https://www.nalms.org/secchidipin/monitoring-methods/trophic-state-equations/)
trophic_status | Oligotrophic, mesotrophic, eutrophic, hypereutrophic | Assigned using lake trophic status index
climate_zone | Integer | Climate zone of each lake, assigned using the HydroATLAS database (Linke et al., 2019)
lake_elev | m above sea level | Elevation of the lake, extracted from the Global Multi-resolution Terrain Elevation Data (GMTED2010) model (Danielson and Gesch, 2010) (https://www.usgs.gov/core-science-systems/eros/coastal-changes-and-impacts/gmted2010?qt-science_support_page_related_con=0#qt-science_support_page_related_con)
lake_area | km² | Total lake surface area, extracted from the HydroLAKES database (Messager et al., 2016)
lake_volume | km³ | Total lake volume, extracted from the HydroLAKES database (Messager et al., 2016)
mean_lake_depth | m | Mean lake depth, extracted from the HydroLAKES database (Messager et al., 2016)
start_sampling | Year | Year when lake sampling started
Author contributions. All authors took part in the development of the study. SS, BDP, and PVC conceptualized the study, while HA and JY developed the methods and carried out the data collection and data post-processing. HA wrote the original manuscript with contributions from JY, BDP, SS, HKP, and PVC. All authors reviewed and edited the final paper.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.