A synthetic satellite dataset of the spatio-temporal distributions of Emiliania huxleyi blooms and their impacts on Arctic and sub-Arctic marine environments (1998–2016)

A 19-year (1998–2016) continuous dataset is presented of the distributions and activity of the coccolithophore Emiliania huxleyi, i.e. the release of CaCO3 in water and the decrease of uptake of dissolved CO2 by Emiliania huxleyi cells (e.g. Kondrik et al., 2018a), in Arctic and sub-Arctic seas. The dataset is based on optical remote-sensing data (mostly OC CCI data) with assimilation of different relevant in situ observations, preprocessed with the authors' own algorithms. Alongside bloom locations, we provide both detailed information on E. huxleyi impacts on carbon balance and the sub-datasets of quantified coccolith concentrations, particulate inorganic carbon content and CO2 partial pressure in water driven by coccolithophores. All data are presented on a regular 4×4 km grid at a temporal resolution of 8 days. The paper describes the theoretical and methodological basis for all processing and modelling steps. The data are available on Zenodo: https://doi.org/10.5281/zenodo.1402033.


Introduction
Among the topics related to ongoing climate change are alterations in both the biodiversity of marine environments and the carbon balance of the atmosphere-ocean system (Rost et al., 2008). In some specific cases both processes are interrelated, being driven by one and the same agent(s). Along with other marine inhabitants, coccolithophores are such agents. Specifically, we look at the algal species Emiliania huxleyi, a unicellular planktonic organism that is the most widespread coccolithophore in the world's oceans. Being simultaneously a calcifying and photosynthetic primary producer of, respectively, inorganic and organic carbon, Emiliania huxleyi, in the course of its life cycle, enhances both the concentration of calcite and the carbon dioxide partial pressure in ocean surface water. At least within Emiliania huxleyi bloom areas, both processes are capable of changing the carbon balance and hence of affecting both CO2 fluxes between the atmosphere and the surface ocean and the aquatic biogeochemistry. Being a spatially huge phenomenon invariably occurring in both hemispheres, and steadily propagating poleward (Winter et al., 2014) due to CO2 accumulation in the atmosphere and ensuing climate warming (Johannessen, 2008), Emiliania huxleyi blooms are believed to be highly relevant to understanding the comprehensive nature of the changes unfolding on our planet.
Historically, the initial build-up of knowledge on coccolithophores in general, and Emiliania huxleyi specifically, was broadly based on in situ approaches pursued in the course of both shipborne and laboratory activities. Extensive data were obtained on Emiliania huxleyi cell morphometry, internal structure, intracellular dark reactions and photoreactions, factors controlling and affecting the cell growth, as well as intrinsic optical properties, such as total sunlight and spectral absorption, scattering and backscattering (Balch et al., 1996a). In addition, regression relationships were established between Emiliania huxleyi-driven changes in both inherent hydro-optical parameters and CO2 partial pressure in surface water within the bloom area (Holligan et al., 1993).
Until recently, only a few satellite studies were performed and published on the typical locations of Emiliania huxleyi blooms and associated concentrations of particulate inorganic carbon in the surface ocean within the bloom area (e.g. Gordon et al., 2001; Balch et al., 2016).
Prior to the publication by Kondrik et al. (2018a), to the best of our knowledge, only a couple of studies (Shutler et al., 2010, 2013) were undertaken either to retrieve from space-borne data both the total content of inorganic carbon produced by an Emiliania huxleyi bloom (PIC) and the increase in CO2 partial pressure (ΔpCO2) in surface water within the bloom area, or to reveal intraannual and interannual variations over long time periods in the location and intensity of Emiliania huxleyi blooms. No concatenated time series data of a nearly 20-year duration are available to date on the associated quantifications of bloom surface, bloom intensity or pCO2 for all Emiliania huxleyi blooms occurring within extensive latitudinal belts and encompassing waters of different oceans, i.e. marine tracts significantly distanced longitudinally.
Meanwhile, the above-specified information is an indispensable step towards a further pan-global inventory of the effects produced by E. huxleyi blooms on both marine chemistry and ecology, and on CO2 exchange fluxes between the atmosphere and ocean, as such fluxes condition the status of the world's oceans as a sink of CO2.
In addition to the studies cited above, it is also worth mentioning a few sources of multi-year satellite data on coccolithophore blooms that may be useful for potential users who wish to broaden their multifaceted databases in their studies.
The NASA OCEANCOLOR portal https://oceancolor.gsfc.nasa.gov/atbd/pic/ (last access: 10 January 2019) offers extensive data on particulate inorganic carbon retrieved from MODIS with the Balch et al. (2005) methodology. Downloadable from https://oceandata.sci.gsfc.nasa.gov/ (last access: 10 January 2019), these data have a spatial resolution of 4 km and a temporal resolution of 1 day, and cover the time period starting from 2000.
Also, a 40-year time series of coccolithophore blooms derived from AVHRR observation data, reported by Loveday and Smyth (2018), is available at https://doi.org/10.1594/PANGAEA.892175. Produced with coccolithophore bloom area masks specially developed from remote-sensing reflectance spectra, these data are monthly, worldwide and available at a spatial resolution of 0.1° (∼10 km). Although these data do not encompass any additional parameters such as particulate inorganic carbon or CO2 partial pressure in surface water within the bloom, they can be valuable due to an exceptionally long observation period.
Unlike the publications mentioned above, the present paper reports on extensive concatenated original datasets generated for subpolar and polar seas of the Northern Hemisphere, viz. the North, Labrador (with adjacent North Atlantic open waters), Norwegian, Barents, Greenland and Bering seas. Based on the employed space-borne ocean colour information, the obtained datasets are processed into a nearly two-decade (1998–2016) time series for each of the target seas and marine areas. They encapsulate information about PIC and pCO2 values in surface water within the bloom area together with intraannual and interannual variations in the location and intensity of Emiliania huxleyi blooms over a variety of seas and across a nearly 20-year time period.
Conjoined with a wealth of presently available supplementary data from satellite and shipborne missions on the environmental conditions under which target Emiliania huxleyi blooms emerged and developed, the synthetic dataset we are reporting on opens the way to detailed analyses of forward and feedback mechanisms governing the temporal and spatial dynamics of this phenomenon. Further utilisation of the results of such analyses in regional and global climatic models promises to predict future directions of development of the phenomenon in question (Rost et al., 2008).

Methodology and dataset content
Using available satellite OC CCI (Ocean Colour Climate Change Initiative) and SeaWiFS data in the visible part of the spectrum, the following products were generated to achieve the goals specified in the previous section: (1) Emiliania huxleyi bloom extent, (2) concentration of coccoliths within the bloom, (3) total content of particulate inorganic carbon (PIC) produced by the bloom, and (4) an increase in CO2 partial pressure in marine surface waters due to the blooming phenomenon.

Bloom area quantification
Quantification of Emiliania huxleyi bloom areas was performed in two stages. Firstly, RGB (red, green, blue) images were generated based on the weighted remote-sensing reflectance, Rrs, which is the upwelling spectral radiance just above the water-air interface normalised to the downwelling spectral irradiance at the same level (Bukata et al., 1995). Rrs values in the channels centred at 670, 555 and 443 nm were employed. Analysis of the space-borne radiometric data collected by Kondrik et al. (2017a, b) from the five target seas yielded statistically robust specific ranges of Rrs(λ) highlighting Emiliania huxleyi blooms as turquoise areas; the areas of blooms of other (non-calcifying) algae appeared in the images as green. Areas with scarce non-calcifying algae abundance showed up as blue or dark blue. The land mask was overlaid so that land areas were coloured light yellow.
In the second stage of quantification of the Emiliania huxleyi bloom extent, an additional criterion was imposed on the revealed turquoise areas: Rrs values should be maximal at 490 and/or 510 nm, while the values in the individual channels need to be in excess of 0.001 (412 nm), 0.008 (443 nm), 0.01 (490 nm), 0.008 (510 nm), 0.008 (555 nm) and ∼0 (670 nm). Such a selection provided the highest accuracy of bloom delineation. With the known pixel size, the bloom area can be confidently quantified. An example of Emiliania huxleyi bloom extent masking is shown in Fig. 1.
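The second-stage spectral criterion can be sketched as a per-pixel test. This is a minimal illustration, not the authors' code: the function names, the dictionary layout of a pixel spectrum and the two sample spectra are hypothetical, the "∼0" threshold at 670 nm is interpreted here as "strictly positive", and the first-stage RGB screening is not reproduced.

```python
# Minimal sketch of the second-stage criterion (all names are hypothetical).
BANDS = (412, 443, 490, 510, 555, 670)  # channel centres, nm
# Lower Rrs limits quoted in the text; "~0" at 670 nm is assumed to mean > 0.
MIN_RRS = {412: 0.001, 443: 0.008, 490: 0.010, 510: 0.008, 555: 0.008, 670: 0.0}
PIXEL_AREA_KM2 = 4.0 * 4.0  # one 4 x 4 km grid cell


def is_bloom_pixel(rrs):
    """Return True if a pixel spectrum {band_nm: Rrs} passes the criterion."""
    if any(rrs.get(band) is None for band in BANDS):
        return False  # incomplete spectrum, cannot be classified
    peak_band = max(BANDS, key=lambda band: rrs[band])
    if peak_band not in (490, 510):
        return False  # spectral maximum must sit at 490 and/or 510 nm
    return all(rrs[band] > MIN_RRS[band] for band in BANDS)


def bloom_area_km2(pixels):
    """Bloom area as the number of flagged pixels times the pixel area."""
    return PIXEL_AREA_KM2 * sum(is_bloom_pixel(p) for p in pixels)


# Invented example spectra: one bloom-like, one clear-water-like.
bloom_like = {412: 0.012, 443: 0.015, 490: 0.020, 510: 0.018, 555: 0.012, 670: 0.002}
clear_like = {412: 0.010, 443: 0.008, 490: 0.005, 510: 0.003, 555: 0.002, 670: 0.0002}
area = bloom_area_km2([bloom_like, clear_like, bloom_like])
```

With these invented spectra, two of the three pixels pass the test, so the sketch returns an area of two grid cells.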

Determination of the coccolith concentration
Determination of the coccolith concentration within the bloom was performed with the BOREALI algorithm (Bio-Optical REtrieval ALgorIthm, Korosov et al., 2009), based on the Levenberg-Marquardt (L-M) finite difference technique (Press et al., 1992). The L-M technique solves the inverse problem, i.e. in our case it allows us to retrieve the concentrations of water constituents from the spectral subsurface remote-sensing reflectance, Rrsw(λ), which is the upwelling spectral radiance just beneath the water-air interface normalised to the downwelling spectral irradiance at the same level (Jerome et al., 1996). A hydro-optical model accommodating the spectral specific absorption and backscattering coefficients of Emiliania huxleyi cells and coccoliths, as well as of pure water, non-calcifying algae and dissolved organic matter, was developed and employed to run BOREALI (Kondrik et al., 2017a).
The results of validation of coccolith concentration retrievals with BOREALI were assessed through the following statistical measures: coefficient of correlation, r; linear regression equation, f(x); coefficient of determination, R²; root mean square error, RMSE; systematic error, BIAS; and mean absolute error, MAE. BIAS and MAE were then also normalised to the absolute values of coccolith concentrations determined by using each model: r = 0.88, f(x) = 0.6159x + 6.9197, R² = 0.77, RMSE = 3.55 × 10⁹ coccoliths m⁻³, BIAS = 25.30 % and MAE = 32.30 %.
In addition, Emiliania huxleyi bloom areas ascertained by both the RGB and Rrs approaches were further checked using the results of coccolith concentration retrievals. This was done through the application of a threshold. A threshold of 90 × 10⁹ coccoliths m⁻³ was chosen because, firstly, it ensures the best correspondence between the bloom surfaces determined by our radiometric and BOREALI algorithms. Secondly, this threshold is very close to the average value of coccolith concentrations in developed Emiliania huxleyi blooms reported from the world's oceans (for references, see Balch et al., 1996b, 2005). The numerical assessments of bloom surfaces delineated and quantified by the above independent methods converged precisely.
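To illustrate the idea of the inversion, the following sketch retrieves a single concentration from a synthetic reflectance spectrum with a damped (Levenberg-Marquardt-type) iteration and a finite-difference Jacobian. It is not BOREALI: the one-constituent forward model, all optical coefficients, the proportionality factor and the function names are hypothetical placeholders chosen only to make the example run, whereas the actual hydro-optical model involves several constituents.

```python
# Toy one-parameter inversion; every coefficient below is an illustrative
# placeholder, not a value taken from the hydro-optical model of the paper.
A_WATER = (0.0046, 0.0070, 0.0150, 0.0325, 0.0596, 0.4390)   # absorption, m^-1
BB_WATER = (0.0033, 0.0024, 0.0016, 0.0013, 0.0009, 0.0004)  # backscattering, m^-1
A_STAR = 1.0e-6    # hypothetical specific absorption per concentration unit
BB_STAR = 2.0e-4   # hypothetical specific backscattering per concentration unit
F_FACTOR = 0.1     # hypothetical proportionality factor


def model_rrsw(conc):
    """Forward model: reflectance proportional to bb / (a + bb) in each band."""
    spectrum = []
    for a_w, bb_w in zip(A_WATER, BB_WATER):
        a = a_w + A_STAR * conc
        bb = bb_w + BB_STAR * conc
        spectrum.append(F_FACTOR * bb / (a + bb))
    return spectrum


def retrieve_concentration(observed, start=10.0, max_iter=100, tol=1e-9):
    """Damped Gauss-Newton (L-M) fit of one concentration to a spectrum."""
    def residuals(c):
        return [m - o for m, o in zip(model_rrsw(c), observed)]

    def cost(res):
        return sum(r * r for r in res)

    conc, damping = start, 1e-3
    res = residuals(conc)
    for _ in range(max_iter):
        h = 1e-6 * max(abs(conc), 1.0)             # finite-difference increment
        shifted = residuals(conc + h)
        jac = [(s - r) / h for s, r in zip(shifted, res)]
        jtj = sum(j * j for j in jac)
        jtr = sum(j * r for j, r in zip(jac, res))
        if jtj == 0.0:
            break
        step = -jtr / (jtj * (1.0 + damping))      # Marquardt-scaled damping
        trial = residuals(conc + step)
        if cost(trial) < cost(res):                # accept step, relax damping
            conc, res = conc + step, trial
            damping /= 10.0
            if abs(step) < tol:
                break
        else:                                      # reject step, damp harder
            damping *= 10.0
    return conc


true_conc = 50.0                                   # arbitrary concentration units
retrieved = retrieve_concentration(model_rrsw(true_conc))
```

Because the synthetic spectrum is noise-free, the retrieved value should coincide with the prescribed one to within numerical precision; real retrievals are subject to the errors quantified in the validation statistics.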
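The statistical measures listed above can be computed as in the following sketch. The function name and the sample numbers are hypothetical, x is assumed to be the in situ value and f(x) the retrieved one, and the normalisation of BIAS and MAE by the mean in situ concentration is an assumption made for illustration only.

```python
import math


def validation_stats(in_situ, retrieved):
    """Return r, regression slope/intercept, R2, RMSE, BIAS and MAE."""
    n = len(in_situ)
    mean_x = sum(in_situ) / n
    mean_y = sum(retrieved) / n
    sxx = sum((x - mean_x) ** 2 for x in in_situ)
    syy = sum((y - mean_y) ** 2 for y in retrieved)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(in_situ, retrieved))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx                      # f(x) = slope * x + intercept
    intercept = mean_y - slope * mean_x
    errors = [y - x for x, y in zip(in_situ, retrieved)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n                 # systematic error
    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    return {
        "r": r, "slope": slope, "intercept": intercept, "R2": r * r,
        "RMSE": rmse, "BIAS": bias, "MAE": mae,
        # Assumed normalisation: percentage of the mean in situ value.
        "BIAS_pct": 100.0 * bias / mean_x,
        "MAE_pct": 100.0 * mae / mean_x,
    }


# Invented numbers purely to exercise the function.
stats = validation_stats([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```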

Coccolith content, particulate inorganic carbon and CO2 partial pressure increment determination
Determination of the coccolith content (CC) was performed through establishing the mixed-layer depth (MLD) within the bloom area. The climatology of Montegut et al. (2004) was applied. The identified areas of Emiliania huxleyi blooms with retrieved concentrations of coccoliths were overlaid with the respective climatological MLD fields, and for each pixel, the value of MLD was further used to calculate CC.
Further, CC values were used to quantify the total content of particulate inorganic carbon (PIC). This was done for each 8-day time period (corresponding to the temporal resolution of the space-borne radiometric data employed) by multiplying the carbon mass per coccolith, m, by CC and then summing the products over all pixels of the respective bloom extent. The value of m was equal to 0.2 pg (Balch et al., 2005). The moment at which the PIC assessment could be ideally performed in each bloom corresponded to the situation when two conditions were fulfilled: (a) the bloom attained its largest surface and (b) the spectral curvature of remote-sensing reflectance, Rrs(λ), exhibited a maximum at about 490 nm, as the location of the Rrs maximum at about 490 nm is an indication that the bloom is prevalently composed of coccoliths (Kondrik et al., 2017a).
Remote determination of the Emiliania huxleyi-driven pCO2 increment (ΔpCO2) consisted of establishing a relationship between the Emiliania huxleyi-driven changes in pCO2, that is, ΔpCO2, in bloom pixels and the respective values of Rrs(490). This relationship (Kondrik et al., 2018a), characterised by a coefficient of determination r² = 0.54, p < 0.001 and RMSE = 23.4 µatm, was used to quantify the spatial variations of ΔpCO2 in the target seas; ΔpCO2 was then recalculated for the water temperatures (retrieved from space-borne data) that actually occurred during the respective Emiliania huxleyi bloom events (Copin-Montegut, 1988).
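The arithmetic behind the PIC estimate can be illustrated as follows. This is a sketch under stated assumptions: the per-pixel coccolith content is taken here as concentration × MLD × pixel area (i.e. a concentration assumed uniform over the mixed layer), and the function names and the two sample pixels are hypothetical.

```python
CARBON_PER_COCCOLITH_G = 0.2e-12   # m = 0.2 pg of carbon per coccolith
PIXEL_AREA_M2 = 4000.0 * 4000.0    # one 4 x 4 km grid cell
GRAMS_PER_TONNE = 1.0e6


def coccolith_content(concentration_m3, mld_m):
    """Coccoliths in the water column of one pixel (assumed uniform over MLD)."""
    return concentration_m3 * mld_m * PIXEL_AREA_M2


def bloom_pic_tonnes(concentrations_m3, mld_m):
    """Sum m * CC over all bloom pixels and convert grams to tonnes."""
    total_g = sum(
        CARBON_PER_COCCOLITH_G * coccolith_content(c, d)
        for c, d in zip(concentrations_m3, mld_m)
    )
    return total_g / GRAMS_PER_TONNE


# Two invented bloom pixels: 100e9 and 150e9 coccoliths m^-3, MLD 20 and 30 m.
pic = bloom_pic_tonnes([100e9, 150e9], [20.0, 30.0])
```

For the first invented pixel, CC = 100 × 10⁹ × 20 × 1.6 × 10⁷ = 3.2 × 10¹⁹ coccoliths, which corresponds to 6.4 t of carbon; the second pixel adds 14.4 t, giving about 20.8 t in total.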

Additional technical workflow
During satellite data processing, several procedures were performed.
1. Satellite images were reprojected. Given the high-latitude location of the target seas, it was relevant to use an equal-area polar projection. Therefore, the NASA EASE-Grid was employed. The EASE-Grid coordinate system is based on WGS-84 (World Geodetic System 1984).
3. Missing pixels masked as ragged clouds were filled. In the case of ragged clouds, some pixels of RGB images are not informative. A special algorithm for filling such gaps included averaging of Rrs(λ) values from neighbouring pixels and from the temporally preceding and following images of the same pixel. The use of this algorithm in each of the cloud-masked images of the areas studied over 19 years and included in the OC CCI product helped to increase the analysed area, sometimes to a significant extent. Calculated from 1998 to 2016 as arithmetic means for the Barents, Bering, North, Norwegian and Greenland seas, the increase attained for each 8-day-averaged image amounted to factors of, respectively, ∼107, 370, 31, 15 and 13. Thus, images were obtained with significantly larger cloud-free areas, ensuring a more accurate estimation of the borders of bloom areas and their displacement, as well as of bloom areas per se. Examples of product visualisation (for the North Sea) are shown in Fig. 2.
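The gap-filling step for ragged clouds can be sketched as below. The sketch assumes a 3 × 3 spatial neighbourhood and equal weights for spatial and temporal neighbours; neither detail is specified in the text, and the function name and the sample grids are hypothetical. Missing values are represented by None.

```python
def fill_gaps(prev_img, img, next_img):
    """Fill None pixels with the mean of valid spatial and temporal neighbours."""
    rows, cols = len(img), len(img[0])
    filled = [row[:] for row in img]
    for i in range(rows):
        for j in range(cols):
            if img[i][j] is not None:
                continue
            candidates = []
            # Spatial neighbours in the same 8-day image (assumed 3 x 3 window).
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                        candidates.append(img[ni][nj])
            # Same pixel in the preceding and following images.
            candidates.append(prev_img[i][j])
            candidates.append(next_img[i][j])
            valid = [v for v in candidates if v is not None]
            if valid:
                filled[i][j] = sum(valid) / len(valid)
    return filled


# Invented 2 x 2 example for one band: the top-right pixel is cloud-masked.
current = [[1.0, None], [3.0, 5.0]]
previous = [[None, 2.0], [None, None]]
following = [[None, None], [None, None]]
result = fill_gaps(previous, current, following)
```

In this invented case the missing pixel receives the mean of its three valid spatial neighbours and the value from the preceding image, (1 + 3 + 5 + 2) / 4 = 2.75. A pixel with no valid neighbour in space or time stays missing.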

Data sources
Data on Rrs in six channels (centred at 412, 443, 490, 510, 555 and 670 nm) are from the OC CCI product (Ocean Colour Climate Change Initiative dataset, version 3.0, European Space Agency, available online at http://www.esa-oceancolour-cci.org/, last access: 10 January 2019).
For the bio-optical retrieval algorithm validation, we employed the PANGAEA database (https://www.pangaea.de/, last access: 10 January 2019) of the concentration of coccoliths within the target coccolithophore blooms in the North Atlantic, including the North and Norwegian seas (Charalampopoulou et al., 2008, 2011).
The bio-optical in situ database spanning 1997 to 2012 (16 years) was employed for ocean-colour satellite applications because it has global coverage (Valente et al., 2016). The data were acquired from several sources: MOBY (Marine Optical BuoY), BOUSSOLE (BOUée pour l'acquiSition d'une Série Optique à Long termE), AERONET-OC (AErosol RObotic NETwork-Ocean Color), SeaBASS (SeaWiFS Bio-Optical Archive and Storage System), NOMAD (NASA bio-Optical Marine Algorithm Dataset), MERMAID (MERIS Match-up In situ Database), AMT (Atlantic Meridional Transect), ICES (International Council for the Exploration of the Sea), HOT (Hawaii Ocean Time-series), and GeP&CO (Geochemistry, Phytoplankton, and Color of the Ocean). This database comprises a large number of variables, including the spectral remote-sensing reflectance, Rrs, and chlorophyll a concentration.
The GLobal Ocean Data Analysis Project (GLODAP) database (Key et al., 2015; Olsen et al., 2016), http://cdiac.ornl.gov/oceans/GLODAPv2/ (last access: 10 January 2019), was employed for pairing in situ NO3 values at those points for which in situ pCO2 values were available. In the cases when the desired NO3 matching values were unavailable in the GLODAP database, the respective data were taken from the World Ocean Atlas 2013 (WOA13, NOAA, Garcia et al., 2013; https://www.nodc.noaa.gov/OC5/woa13/, last access: 10 January 2019).
The SOCAT v4 database (The Surface Ocean CO2 Atlas; Bakker et al., 2016; http://www.socat.info/access.html, last access: 10 January 2019) comprises more than 6 million pCO2 measurements performed at latitudes north of 40° N. The data we employed from the SOCAT v4 database met the following requirements: (1) measurements were conducted during 1998–2016 and within the top 10 m layer (if there were data from several depths, the measurements from the shallowest depth were used); (2) pCO2 data had both corresponding seawater salinity data and valid Rrs spectra; (3) a daily mean pCO2 value was employed where several in situ measurements were available; (4) pCO2 measurements were conducted no less than 8 km offshore (to avoid the impact of the adjacency effect on Rrs satellite data); (5) pCO2 measurements were within the location and timing of Emiliania huxleyi blooming; and (6) data used from the SOCAT v4 database overlapped the data from either the GLODAP database or the WOA13 climatology database (depending on which one was used for comparison).
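Selection criteria (1) to (5) can be expressed as a simple filter followed by daily averaging, as sketched below. The record layout, the field names and the grouping of measurements by day and grid pixel are hypothetical conventions introduced for illustration; they do not reflect the SOCAT file format, and criterion (6), the overlap with GLODAP or WOA13, is not reproduced.

```python
from collections import defaultdict
from datetime import date


def select_pco2(records):
    """Apply criteria (1)-(5) and return daily mean pCO2 per (day, pixel)."""
    groups = defaultdict(list)
    for rec in records:
        if not 1998 <= rec["day"].year <= 2016:   # (1) time window
            continue
        if rec["depth_m"] > 10.0:                 # (1) top 10 m layer
            continue
        if rec["salinity"] is None or not rec["rrs_valid"]:   # (2)
            continue
        if rec["offshore_km"] < 8.0:              # (4) adjacency effect
            continue
        if not rec["in_bloom"]:                   # (5) bloom place and time
            continue
        groups[(rec["day"], rec["pixel"])].append(rec)
    daily = {}
    for key, recs in groups.items():
        shallowest = min(r["depth_m"] for r in recs)          # (1) shallowest
        values = [r["pco2"] for r in recs if r["depth_m"] == shallowest]
        daily[key] = sum(values) / len(values)                # (3) daily mean
    return daily


def make_record(day, depth_m, pco2, offshore_km=20.0):
    """Build one invented record with otherwise valid ancillary fields."""
    return {"day": day, "pixel": (10, 12), "depth_m": depth_m, "pco2": pco2,
            "salinity": 35.0, "rrs_valid": True,
            "offshore_km": offshore_km, "in_bloom": True}


sample_day = date(2014, 7, 20)
sample = [
    make_record(sample_day, 5.0, 380.0),
    make_record(sample_day, 5.0, 400.0),
    make_record(sample_day, 9.0, 500.0),                  # deeper: not averaged
    make_record(sample_day, 5.0, 450.0, offshore_km=3.0),  # too close to shore
    make_record(date(1997, 7, 20), 5.0, 420.0),           # outside 1998-2016
]
selected = select_pco2(sample)
```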

Data spatio-temporal domain
The published dataset covers a time period of 19 years, from 1998 to 2016, with a time resolution of 8 days (a total of 874 time periods), and a spatial domain with a total area of 11 056 800 km² at a resolution of 4 × 4 km, divided into four regions described in Table 1 and shown in Fig. 3.
All data are represented in the Lambert azimuthal equal area projection with the parameters corresponding to the widespread NSIDC EASE-Grid North (EPSG:3973) coordinate system.
The selection of four regions in this work was made for several reasons. They include all seas where coccolithophore blooms usually occur in subpolar and polar regions of the Northern Hemisphere (North, Norwegian, Greenland, Barents, Bering and Labrador seas). The exclusion from our dataset of blooms occurring in the northern parts of the Atlantic Ocean (see, e.g. Holligan et al., 1993) was dictated by some technical restrictions: the hydro-optical model employed for obtaining coccolith concentration values was based prevalently on the data from high-latitude areas and thus should first be validated for geographically different marine environments such as open parts of the Atlantic Ocean.
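The quoted counts can be cross-checked with a few lines of arithmetic. The sketch assumes that the 8-day periods restart on 1 January of every year, with a shorter final period, which is a common convention for 8-day ocean-colour composites but is not stated explicitly here; the function name is hypothetical.

```python
from datetime import date, timedelta


def period_starts(year):
    """Start dates of the 8-day periods of one year (assumed to restart on 1 January)."""
    starts = []
    day = date(year, 1, 1)
    while day.year == year:
        starts.append(day)
        day += timedelta(days=8)
    return starts


# 46 periods per year over 1998-2016 (19 years).
total_periods = sum(len(period_starts(year)) for year in range(1998, 2017))

# Number of 4 x 4 km cells in the 11 056 800 km2 domain.
total_pixels = 11_056_800 // (4 * 4)
```

Under this assumption each year holds 46 periods, so 19 years give 19 × 46 = 874 periods, in agreement with the total quoted above, and the spatial domain corresponds to 691 050 grid cells. The dates 26 June and 20 July 2014, quoted for the Greenland Sea bloom in Fig. 5, both coincide with period start dates under the same convention.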

Dataset overview
The 19-year data cover four blooming regions differing in nature. This allows us to evaluate the bloom-related processes on different scales and time intervals in order to reveal both interannual dynamics and seasonal variations of parameters relevant to the bloom phenomenon. Emiliania huxleyi blooms in the Arctic and sub-Arctic seas are characterised by significant instability: the difference in intensity of blooming in different years can be 10-fold. Figure 4 and Table 2 collectively illustrate for the above four marine regions the temporal dynamics in bloom intensity (i.e. blooming area). For example, in the Bering Sea (region 4), the most extensive blooms were observed exclusively from 1998 to 2001, but later on, their intensity decreased drastically. In region 1, mainly in the Barents, Norwegian and North seas, the blooming activity over the years we are reporting on was very irregular, with a peak in 2016.
With the collected data, it is possible to highlight the patterns of development of the regularly occurring blooms. They can be characterised by the beginnings and ends of blooming periods, and the overall dynamics of coccolith concentration during the blooms. Such patterns can be established based on the published dataset. Figure 5 shows an example of bloom development in the Greenland Sea (region 2) in the period 26 June–13 August 2014. However, these periods are generally unstable, which is clearly seen in Fig. 6.
Technically, each dataset contains four sub-datasets: bloom status, coccolith concentration, particulate inorganic carbon content and CO2 partial pressure in water driven by coccolithophores. The last three categories contain the directly calculated parameter values. The first sub-dataset contains information on the quality and content of the data. This information is organised as a set of flags attributed to data on reliable observations of the presence or absence of blooming, or inaccurate data (usually due to clouds), as well as data on the coast. Figure 7 provides both an example of a status matrix and the matrix containing coccolith concentration values.

Data availability
The dataset is available on Zenodo (Kondrik et al., 2018b; https://doi.org/10.5281/zenodo.1402033). Data granules are divided into directories by region and year, and each child directory contains files with 8-day data on the bloom status, coccolith concentration, PIC and pCO2. Data are stored in NetCDF4 format with GDAL support, which allows immediate use of the data with any NetCDF-based or GIS software. Tips on how to read the data and on QGIS styles for fast visualisation are also provided.

Conclusions
We have composed a detailed 19-year dataset of Emiliania huxleyi blooms in the Arctic and sub-Arctic seas, including information about their influence on the carbon cycle in the ocean. These data are based mostly on satellite remote-sensing observations, but also on available shipborne measurements and results of processing with the authors' own algorithms. We hope that the publication of these data, on the one hand, will promote further studies aimed at elucidating the driving mechanisms of Emiliania huxleyi blooms and their forcing factors and, on the other hand, will facilitate an understanding of the patterns of distribution of this phenomenon and its impact on the ocean and the atmosphere.
Author contributions.DP is responsible for the theoretical background and methodology development.DK also contributed to theoretical background research and is responsible for the development and programming of data processing algorithms.EK conceived the dataset structure and contributed to programming of data processing algorithms, data analysis and visualisations.All authors equally contributed to the writing of the manuscript and data quality control.

Figure 1. Example of the bloom masking algorithm performance. (a) Source of the OC CCI RGB imagery for the North Sea (9 June 2016, with land mask); (b) calculated bloom mask (white pixels stand for bloom detected, black pixels are areas void of bloom).

Figure 2. Example of dataset products (the North Sea, 9 June 2016). (a) Source OC CCI RGB imagery with the bloom mask contoured in red, (b) coccolith concentration (10⁹ m⁻³), (c) content of particulate inorganic carbon (tonnes) and (d) an increase in CO2 partial pressure in water (µatm).

Figure 3. Dataset of target spatial regions.Regions are shown as coloured boxes, and the colour bar indicates the number of bloom observations in each pixel over the time period 1998-2016.

Figure 4. Total number of identified pixels with Emiliania huxleyi for each blooming season in the period 1998-2016 within the four regions specified in Fig. 3.

Figure 5. Bloom development in the Greenland Sea (region 2) in June-August 2014. The peak falls on 20 July.

Figure 6. Bloom intensity in the Greenland Sea (region 2) on 20 July in different years. Its instability is obvious.