Interactive comment on “A synthetic satellite dataset of E. huxleyi spatio-temporal distributions and their impacts on Arctic and Subarctic marine environments (1998–2016)” by Dmitry Kondrik

A 19-year (1998-2016) continuous dataset of the distributions and activity of the coccolithophore Emiliania huxleyi in Arctic and Subarctic seas is presented; here, activity means the release of CaCO3 in water and the decrease of uptake of dissolved CO2 by Emiliania huxleyi cells (e.g. Kondrik et al., 2018a). The dataset is based on optical remote sensing data (mostly OC CCI data) with assimilation of different relevant in-situ observations, preprocessed with authorial algorithms. Alongside bloom locations, we also provide both detailed information on E. huxleyi impacts on the carbon balance within the bloom area and the subdatasets of quantified coccolith concentrations, particulate inorganic carbon content and CO2 partial pressure in water driven by coccolithophores. All data are presented on a regular 4x4 km grid at a temporal resolution of 8 days. The paper describes the theoretical and methodological basis for all processing and modeling steps. The data are available on Zenodo: https://doi.org/10.5281/zenodo.1402033.

Below are our answers.

b) For all target seas we collected published reports from in situ/shipborne/laboratory studies explicitly indicating that the coccolithophore blooms were produced by E. huxleyi (see the attached list of references), with two exceptions: the Norwegian and Iceland seas, where Coccolithus pelagicus is part of the coccolithophore community along with E. huxleyi. However, in situ determinations showed that in the overwhelming majority of cases the concentrations of Coccolithus pelagicus cells were marginal (see e.g. Dylmer et al., 2015). This is why we prefer to keep "E. huxleyi" rather than "coccolithophores". A large number of papers on calcifying algal blooms in our target seas identify the bloom-producing species as E. huxleyi.
Pg. 1, Ln. 7: By "activity" we meant the release of CaCO3 in water and a decrease of uptake of dissolved CO2 by E. huxleyi cells (e.g. Kondrik et al., 2018). In the revised version of the paper we will specify the actual meaning of the employed word "activity".
Pg. 1, Ln. 16: It appeared to us that the issue of climate change-driven consequences is presently commonplace and does not require any further specification. Indeed, the consequences are multifaceted, with numerous forward and feedback interactions, and relate to many spheres of knowledge. So we chose to extend this phrase a little and to provide the sentence with a reference that reasonably covers the main dimensions of this phenomenon.
Pg. 1, Ln. 25: You are right, and we will add the reference "Winter et al., 2014".
Pg. 2, Ln. 6: We agree that this phrase is rather awkward and we will reword it as follows: "solely satellite remote sensing approach is..."
Pg. 2, Ln. 21: The following change will be made: the North, Labrador (with adjacent North Atlantic open waters), Norwegian, Barents, Greenland and Bering seas.
Pg. 4, Lns 30-32 + Figure 2c: The total content of PIC, Mpic, was determined for each 8-day time period through multiplication of the carbon mass per coccolith, m, the coccolith concentration, Ccc, the MLD and the bloom area, S. The value of m was set to 0.2 pg. While most historical reports support this estimate, this conversion might lead to either (i) some underestimation of PIC, since it neglects rare, relatively large, suspended calcite particles (PIC content per coccolith is ∼0.26 pg according to Balch et al. (1991) and 0.5-0.6 pg according to Holligan et al. (1983)), or (ii) some overestimation, as there are in situ data indicating that many coccoliths in E. huxleyi blooms are either fragmented due to wave action (Holligan et al., 1993b) or simply smaller (PIC content of 0.13 pg) (Fernandez et al., 1993; Fritz, 1999). Thus, on balance, the selected value of m is in all probability a reasonably good estimate that is supported by the historical literature (Balch et al., 2005). The respective details are provided in section 2. Accordingly, the numbers in Figure 2c are indeed in tons, as they reflect the content of PIC in a pixel-size column whose vertical extent equals the MLD ascribed to each pixel within the bloom area. The respective methodology is described in detail in Kondrik et al., 2017 and will be given in the text.
Again, we express our gratitude to the referee for his very valuable comments.
Publications explicitly indicating the coccolithophore species forming blooms in the target seas:
Barents Sea (Olson & Strom, 2002)
Bering Sea (Sukhanova and Flint, 1998)
North Sea (Holligan et al., 1993b; Buitenhuis et al., 1996)
Norwegian Sea (Baumann et al., 2000)
Labrador Sea (Okada & McIntyre, 1979)
North Atlantic (Holligan et al., 1993a)

Kondrik and collaborators present a 19-year satellite time series of Emiliania huxleyi bloom area, calcite content, and associated increase in in-water pCO2 in four selected areas of the high-latitude northern hemisphere. The dataset is only partly unique, in the sense that a 19-year global remote sensing dataset of E. huxleyi bloom extent, coccolith concentration, and PIC content can also be easily obtained elsewhere. Therefore uniqueness only applies to pCO2. This dataset could be useful, but I request a few substantial modifications that I believe are necessary to improve understanding and quality of the dataset: (1) some flaws in the dataset (pointed out below, 1a and 1b) will need to be fixed, (2) error estimates for remotely sensed quantities must be provided, and (3) in its present form, the study/data is not correctly positioned within the state-of-the-art literature and other available datasets.
(1a) It appears from Fig. 4 that the E. huxleyi bloom dataset includes false positives, a problem that is particularly evident in the Bering Sea (1998-2001) where the authors have detected blooms initiating in winter and lasting about 10 months as previously reported from ocean colour remote sensing data (Iida et al., 2002). However, ship-borne measurements have identified resuspended diatom frustules as the cause of these bright waters in winter-spring instead of E. huxleyi blooms (Broerse et al., 2003). This invalidates the authorial E. huxleyi bloom detection algorithm and all derived products in the Bering Sea from late fall to spring. I further fail to see how the algorithms used by the authors (Kondrik et al., 2017; Kondrik et al., 2018) to detect E. huxleyi blooms present an advance to NASA's standard method of E. huxleyi bloom classification (Brown and Yoder, 1994), and many other subsequent bloom detection methods (Iglesias-Rodriguez et al., 2002; Iida et al., 2002; Iida et al., 2012; Moore et al., 2012).
(1b) The remote sensing algorithm for pCO2 estimation is a simple linear regression between observations of Delta_pCO2 and remote sensing reflectance Rrs in a blue waveband. This relationship is strictly empirical and does not appear to have theoretical grounds; I believe the user should be aware of this. Not surprisingly, there is an enormous spread along this regression line such that for a given reflectance value the estimated Delta_pCO2 has a confidence interval with a width of 50 ppm and even wider for denser blooms. Furthermore, the residuals of the regression are clearly unevenly distributed, with a strong tendency to underestimate Delta_pCO2 at higher reflectances. This relationship should be explicitly stated, which is not presently the case, including all relevant regression statistics, and especially a figure showing the observations and the fitted line so that the user can better grasp the errors of the algorithm.
(2) Whereas the statistics of the validation of the retrieved coccolith concentration are given in section 2.2, the accompanying figure is missing. No uncertainty assessment is given for pCO2 (see previous comment).
(3) A 19-year global remote sensing dataset of PIC concentration merging all ocean colour satellite missions can be obtained here: http://www.globcolour.info/ in temporal resolutions ranging from daily to monthly, spatial resolution ranging from 4 km to 100 km, and various geographical projections. From PIC concentration, coccolith concentration can be derived using a fixed mass per coccolith (as you do too), and PIC content can also be easily derived by combining with a climatology for mixed layer depth available here: http://www.ifremer.fr/cerweb/deboyer/mld/Surface_Mixed_Layer_Depth.php. I therefore suggest you remove all statements of uniqueness of your PIC dataset (e.g., page 2, lines 24-26). The statements on page 2, lines 11-16, "Prior to the publication of Kondrik et al. (2018), no attempts have been undertaken to retrieve from space... No concatenated time series data are available to date on the associated bloom intensity..." are thus simply incorrect. I also suggest you appropriately reference the work of Shutler et al. (2013) entitled "Coccolithophore surface distributions in the North Atlantic and their modulation of the air-sea flux of CO2 from 10 years of satellite Earth observation data", which is very similar to your work on remote sensing of pCO2 in Ehux blooms, but is mentioned nowhere. Page 2, lines 8-10: "Until recently, only few satellite studies were published on the typical locations of E. huxleyi blooms and associated concentrations of PIC in surface waters within the bloom area". It appears to me you missed a vast body of literature: (Balch et al., 1991; Balch et al., 1996; Gordon et al., 2001; Smyth et al., 2004; Signorini and McClain, 2009; Moore et al., 2012; Hopkins et al., 2015; Balch et al., 2016; Neukermans et al., 2018) etc.
Further comments:
Title: add "blooms" after "E. huxleyi"
Abstract: delete "detailed information on E. huxleyi impacts within the bloom area on marine environments", as this suggests that you are detailing ecological impacts
P1, L16: "Ongoing climate change is a background of numerous emerging hot topics." is a rather meaningless opening sentence.
P1, L25: Rivero-Calle is not the right ref-

With all respect for the reviewer, we cannot agree with the opinion that if any dataset(s) including the parameter(s) listed in our paper already exist(s), then our dataset cannot be qualified as unique. The uniqueness of our dataset resides in the fact that (A) it combines a spatially and temporally collocated set of parameters (not solely e.g. (D) designed specifically for the user convenience. Thus, importantly, the user does not need to compose such a comprehensive database but can use the already collected and user-friendly organized data source. Incidentally, this is explicitly corroborated by the reviewer himself/herself: even a spaceborne database on coccolith concentration per se is not available and needs to be retrieved from satellite datasets of PIC.
Summing up: Given that our ready-made, E. huxleyi-focused database is as yet unparalleled in terms of its combined areal and temporal coverage (6 seas in 3 oceans and 19 years, respectively) and the number of concatenated variables/parameters, we insist that, to date, it is veritably unique.
Other critical remarks relating to the issue of our database are commented on below.
2. Regarding the presence or absence of E. huxleyi blooms in the Bering Sea.
We considered this issue in detail in our work (Kondrik et al., 2017a), and it would obviously be improper to give here the respective entire excerpt from the above paper.
In brief:
A. Broerse et al. (2003) recognized that the area in which they took water samples was on the very edge of the "bright patch". They write: "From the 7 February 2001 satellite image (Fig. 1(5)), it is not clear whether the sampling transect actually reached the edge of bright water patch". It is also worth pointing out that, along with the diatom frustules, Broerse et al. also found coccoliths in their samples.

B. The ability of this alga to grow under very low levels of downwelling PAR irradiance is documented by Okada and McIntyre (1979): their year-round shipborne measurements in the Labrador Sea at a latitude (e.g. Station 'Bravo', 56.5° N) similar to that of the turquoise area in the Bering Sea showed that E. huxleyi cells indeed grew over a very long period that included not only summer but also winter.
C. The appearance of turquoise areas in pelagic marine waters is a very strong argument in favor of attributing them to E. huxleyi blooms, as no other aquatic organisms possess optical properties that would render their communities truly turquoise when observed from above. As Shutler et al. (2010) point out, this is a unique characteristic among phytoplankton species. Optically, diatom frustules are not identical to coccoliths, so they would not produce the same remote sensing reflectance spectrum as coccoliths do.
An additional, albeit unnecessary, argument: the phenomenon of huge E. huxleyi blooms with extraordinarily high concentrations of coccoliths lasted only a few years and has never recurred since 2001, whereas diatom blooms and the associated release of frustules are an annual event in the Bering Sea.
D. Finally, (although this argument is certainly optional, it only makes us additionally confident of our interpretation and robustness of our E. huxleyi bloom identification algorithm) we revealed the driving mechanism of the phenomenon of E. huxleyi blooms of exceptional intensity during 1998-2001, but this is the subject of a new paper, and we can't disclose it before its publication (expected in 2019).
In light of the above, the reviewer's assertion that our algorithm is invalidated because of the "false positives" in the Bering Sea could not be accepted.
3. Regarding the contested adequacy of our retrieval algorithms.
3a. On the advantage of our coccolith concentration retrieval algorithm.

We are not going to discuss here the advantages and disadvantages of E. huxleyi bloom detection algorithms suggested by other workers: this deserves a separate paper. Iida et al. (2002) have done it in detail with respect to e.g. the Brown and Yoder (1994) algorithm and pointed to some problems with it. Incidentally, Brown and Yoder themselves acknowledged certain limitations of a world-wide application of their algorithm. Moore et al. (2012) commented on the feasibility of the algorithms developed by the other teams that the reviewer specified in his/her list of references.
The advantages of our algorithm were discussed in Kondrik et al. (2017a), and we hope that the reviewer does not expect us to dwell upon them. They can be summarized as follows: our algorithm
(i) was developed on the basis of nearly 20 years of merged and skillfully harmonized OC CCI data provided by the SeaWiFS, MODIS, MERIS, and VIIRS sensors; a comparative analysis of the OC CCI and GlobColour products, as well as the products from MEaSUREs, was conducted to justify the preference for the OC CCI data;
(ii) is based on extensive statistical analysis of satellite spectrometric [Rrs(lambda)] data from six marine environments specifically at high northern latitudes in the North Atlantic, Arctic and North Pacific Oceans;
(iii) employs several criteria in conjunction, viz.: (a) location of maxima in Rrs spectra at the wavelengths typical of E. huxleyi blooms; (b) ranges of Rrs absolute values at six wavelengths obtained from dedicated, large statistical sets of spaceborne data from the six seas; (c) consistency with the results of independent application of the BOREALI hydro-optical algorithm (Korosov et al., 2009; Kondrik et al., 2017a), which, by retrieving inter alia the concentrations of both coccoliths and chlorophyll a, makes it possible to obtain the spatial distribution of the E. huxleyi bloom. This triple checking assured a higher reliability of the algorithm.
3b. Delta pCO2 retrieval algorithm
Again, we believe that it would be improper to reproduce here the entire respective excerpt from the paper on pCO2 published in a refereed journal (Kondrik et al., 2018a). In a nutshell: (i) the accuracy of the delta pCO2 retrieval is characterized by the following statistical parameters: r2 = 0.54, p ≪ 0.001, and RMSE = 23.4 µatm; (ii) the ensemble of blue data points in fig. 1 (Kondrik et al., 2018a) that looks like an "enormous spread" was obtained using climatological data and was added solely to increase the statistical significance of the regression dependence established using only the in situ data that we could find for our study regions (red dots, 187 in number). Most of these points are within the declared error of 23.4 µatm; the indicated red points do not have the problem of Delta_pCO2 overestimation indicated by the reviewer. It is also necessary to emphasize that the "confidence interval" the reviewer refers to is in fact the "prediction limit", while the "confidence limit" has a much smaller variation (about 10 µatm). Also, it is important to be aware that the variation is given in µatm (units of partial pressure), not in ppm as the reviewer writes.
(iii) all corrections for water temperature were duly conducted using the concurrently collected radiometric and IR satellite data.
(iv) the developed delta pCO2 regression dependence has a truly physical basis. Indeed, the increment of pCO2 in surface water within the E. huxleyi bloom is intimately related to the intracellular production of CO2 through the reaction of calcification and the associated generation of coccoliths. The latter are very efficient reflectors of sunlight entering the water (because they do not absorb light but only reflect it). Therefore, the greater the amount of CO2 released through calcification, the more intense the optical signal coming out of the bloom area, especially at the wavelength of the Rrs maximum, which is the parameter in our algorithm that is related to delta pCO2. Incidentally, returning to point 2C in our argumentation above, this is an important difference between coccoliths and diatom frustules, as the latter are not solely reflectors but also absorbers.
4. The graphical illustration of the validation of the coccolith concentration retrievals is available in our easily accessible papers published elsewhere; we doubt that the inclusion of those illustrations would be justified.

5.
We acknowledge the reviewer's critical remarks in C3-C4. All necessary changes are entered, and the respective references [e.g. Shutler et al. (2010, 2013); Winter et al., 2014] are added to the reference list. We certainly appreciate the list of references provided by the reviewer although, actually, we were aware of nearly all listed publications. The reason why they were not used is explained in point 1 of our answers. As to the works by Shutler et al. (2010, 2013), this is indeed our flaw. We are earnestly grateful to the reviewer for this valuable critical remark.

Introduction
Among the topics related to the ongoing climate change, there are alterations of both biodiversity in marine environments and the carbon balance in the atmosphere-ocean system (Rost et al., 2008). In some specific cases both processes are interrelated, being spurred by one and the same agent(s). Along with other marine inhabitants, coccolithophores are such entities, and more specifically, the algal species named Emiliania huxleyi, a unicellular planktonic organism that is the most widespread coccolithophore in the world's oceans. Being simultaneously a calcifying and photosynthetic primary producer of, respectively, inorganic and organic carbon, Emiliania huxleyi, in the course of its life cycle, enhances both the concentration of calcite and the carbon dioxide partial pressure in ocean surface water. At least within Emiliania huxleyi bloom areas, both processes are capable of changing the carbon balance, and hence affect both CO2 fluxes between the atmosphere and surface ocean and the aquatic biogeochemistry. Being a spatially huge phenomenon invariably occurring in both hemispheres, and steadily propagating in the poleward direction (Winter et al., 2014) due to CO2 accumulation in the atmosphere (Rivero-Calle et al., 2015) and ensuing climate warming (Johannessen, 2008), Emiliania huxleyi blooms are believed to be highly relevant to understanding the comprehensive nature of the changes unfolding on our planet.
Historically, the initial building up of knowledge on coccolithophores in general and Emiliania huxleyi specifically was broadly based on in situ approaches effected in the course of both shipborne and laboratory activities. Extensive data were obtained on Emiliania huxleyi cell morphometry, internal structure, intracellular dark and photoreactions, factors controlling/affecting the cell growth, as well as intrinsic optical properties, such as total and spectral absorption of sunlight and scattering/backscattering (Balch et al., 1996a). In addition, regression relationships were established between Emiliania huxleyi-driven changes in both inherent hydro-optical parameters and CO2 partial pressure in surface water within the bloom area (Holligan et al., 1993).
However, as this phenomenon extends over marine areas in excess of hundreds of thousands of square kilometres (Kondrik et al., 2018a), and is spatially and temporally highly dynamic, solely a satellite remote sensing approach is able to meet the challenge of studying it. Until recently, only a few satellite studies were performed and published on the typical locations of Emiliania huxleyi blooms and associated concentrations of particulate inorganic carbon in the surface ocean within the bloom area (e.g. Gordon et al., 2001; Balch et al., 2016).
Prior to the publication by Kondrik et al. (2018a), to the best of our knowledge, only a couple of studies (Shutler et al., 2010; have been undertaken to either retrieve from spaceborne data both the total content of inorganic carbon produced by an Emiliania huxleyi bloom (PIC) and increase in CO2 partial pressure (ΔpCO2)

Conjoined with a wealth of presently available supplementary data from satellite and shipborne missions on the environmental conditions under which target Emiliania huxleyi blooms emerged and developed, the synthetic dataset we are reporting herein opens the way to detailed analysis of forward and feedback mechanisms governing the temporal and spatial dynamics of this phenomenon. Further utilization of the results of such analysis in regional and global climatic models promises to enable prediction of future directions of development of the phenomenon in question (Rost et al., 2008).

Methodology and dataset content
Based on the available satellite OC CCI (Ocean Colour Climate Change Initiative) and SeaWiFS data in the visible part of the spectrum, the following products have been generated to achieve the goals specified in the previous section:
1. Emiliania huxleyi bloom extent;
2. Concentration of coccoliths within the bloom;
3. Total content of particulate inorganic carbon (PIC) produced by the bloom;
4. Increase in CO2 partial pressure in marine surface waters due to the blooming phenomenon.

Bloom area quantification
Quantification of Emiliania huxleyi bloom areas was performed in two stages. Firstly, RGB (red-green-blue) images were generated based on the weighted remote sensing reflectance, Rrs, which is the upwelling spectral radiance just above the water-air interface normalized to the downwelling spectral irradiance at the same level (Bukata et al., 1995). Rrs values in the channels centered at 670, 555, and 443 nm were employed. Analysis of the spaceborne radiometric data collected by Kondrik et al. (2017a, b) from the 5 target seas yielded statistically robust specific ranges of Rrs(λ) highlighting Emiliania huxleyi blooms as turquoise areas; the areas of blooms of other (noncalcifying) algae appeared in the images as green. Areas with scarce noncalcifying algae showed up as blue or dark blue. The land mask was overlaid so that land areas were coloured light yellow.
In the second stage of quantification of Emiliania huxleyi bloom extent, an additional criterion was imposed on the revealed turquoise areas: Rrs values should be maximal at 490 nm and/or 510 nm, while at other wavelengths they need to be in excess of 0.001 (412 nm), 0.008 (443 nm), 0.01 (490 nm), 0.008 (510 nm), 0.008 (555 nm), and ~0 (670 nm). Such a selection provided the highest accuracy of bloom delineation. With the known pixel size, the bloom area can be confidently quantified. An example of Emiliania huxleyi bloom extent masking is shown in Figure 1.
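The second-stage spectral test can be sketched as a per-pixel check. This is only an illustrative sketch, not the authorial implementation: the function names, the dictionary layout, and the reading of "~0" at 670 nm as "strictly positive" are assumptions, and the 16 km² pixel area simply follows from the 4x4 km grid.

```python
# Lower Rrs bounds quoted in the text, keyed by wavelength in nm.
# Treating "~0" at 670 nm as "strictly positive" is an assumption.
RRS_MIN = {412: 0.001, 443: 0.008, 490: 0.01, 510: 0.008, 555: 0.008, 670: 0.0}


def is_bloom_pixel(rrs):
    """Return True if a spectrum (dict: wavelength in nm -> Rrs) passes the test.

    The spectrum must peak at 490 or 510 nm and exceed every lower bound.
    """
    peak = max(rrs, key=rrs.get)
    if peak not in (490, 510):
        return False
    return all(rrs[w] > RRS_MIN[w] for w in RRS_MIN)


def bloom_area_km2(spectra, pixel_area_km2=16.0):
    """Bloom area as the number of flagged pixels times the pixel area."""
    return sum(is_bloom_pixel(s) for s in spectra) * pixel_area_km2
```

A pixel that fails either condition is simply left out of the bloom mask; the RGB pre-selection of turquoise areas described above is not reproduced here.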

Determination of the coccolith concentration
Determination of the coccolith concentration within the bloom was performed with the BOREALI algorithm (Bio-Optical REtrieval ALgorIthm, Korosov et al., 2009), based on the Levenberg-Marquardt (L-M) finite difference technique (Press et al., 1992). The L-M technique solves the inverse problem, i.e. in our case it allows retrieval of the concentrations of water constituents from the spectral subsurface remote-sensing reflectance, Rrsw(λ), which is the upwelling spectral radiance just beneath the water-air interface normalized to the downwelling spectral irradiance at the same level (Jerome et al., 1996). A hydro-optical model accommodating spectral specific absorption and backscattering coefficients of Emiliania huxleyi cells and coccoliths as well as pure water per se, non-calcifying algae and dissolved organic matter was developed and employed to run BOREALI (Kondrik et al., 2017a).
In addition, the Emiliania huxleyi bloom areas ascertained by both the RGB and Rrs approaches were further checked using the results of coccolith concentration retrievals. This was done through the application of a threshold. A threshold of 90 × 10⁹ coccoliths m⁻³ was chosen because, firstly, it assures the best correspondence between the bloom surfaces determined by our radiometric and BOREALI algorithms. Secondly, this threshold is very close to the average value of coccolith concentrations in developed Emiliania huxleyi blooms reported from the world's oceans (for references, see Balch et al., 1996b; Balch et al., 2005). The numerical assessments of bloom surfaces delineated/quantified in the above independent ways converged precisely.

Coccolith content, particulate inorganic carbon and CO2 partial pressure increment determination
Determination of the coccolith content (CC) was performed through establishing the mixed layer depth (MLD) within the bloom area. The climatology of Montegut et al. (2004) was applied. The identified areas of Emiliania huxleyi blooms with retrieved concentrations of coccoliths were overlaid with the respective climatological MLD fields, and for each pixel the value of MLD was further used for calculating CC. Further, CC values were used to quantify the total content of particulate inorganic carbon (PIC). This was done for each 8-day time period (corresponding to the temporal resolution of the spaceborne radiometric data employed) by multiplying the carbon mass per coccolith, m, by CC and then summing the products over all pixels of the respective bloom extent. The value of m was set to 0.2 pg (Balch et al., 2005). The moment at which the PIC assessment could be ideally performed in each bloom corresponded to the situation when two conditions were fulfilled: (a) the bloom attained its largest surface, and (b) the spectral curvature of remote sensing reflectance, Rrs(λ), exhibited a maximum at about 490 nm, as the location of the Rrs maximum at about 490 nm is an indication that the bloom is prevalently composed of coccoliths (Kondrik et al., 2017a).
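The arithmetic of the PIC calculation can be illustrated with a short sketch. It is not the authorial code: the function names are hypothetical, the 20 m mixed layer depth in the usage note is an arbitrary illustrative value, and the pixel area is assumed to be exactly 4 km x 4 km. Only m = 0.2 pg and the per-pixel summation are taken from the text.

```python
PG_TO_TONNES = 1e-18             # 1 pg = 1e-12 g = 1e-18 t
M_PER_COCCOLITH_PG = 0.2         # carbon mass per coccolith (Balch et al., 2005)
PIXEL_AREA_M2 = 4000.0 * 4000.0  # assumed: one 4 x 4 km grid cell


def pixel_pic_tonnes(conc_per_m3, mld_m,
                     pixel_area_m2=PIXEL_AREA_M2, m_pg=M_PER_COCCOLITH_PG):
    """PIC (tonnes of carbon) in the water column under one pixel.

    Coccolith content of the column = concentration * MLD * pixel area;
    multiplying by the carbon mass per coccolith gives PIC.
    """
    coccoliths = conc_per_m3 * mld_m * pixel_area_m2
    return coccoliths * m_pg * PG_TO_TONNES


def bloom_pic_tonnes(pixels):
    """Total PIC of a bloom: sum over (concentration, MLD) pairs, one per pixel."""
    return sum(pixel_pic_tonnes(conc, mld) for conc, mld in pixels)
```

For example, a single pixel at the threshold concentration of 90 × 10⁹ coccoliths m⁻³ with an assumed 20 m mixed layer holds about 5.76 t of carbon, which is consistent with per-pixel values being expressed in tons.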
Remote determinations of the Emiliania huxleyi-driven pCO2 increment (ΔpCO2) consisted in establishing a relationship between Emiliania huxleyi-driven changes in pCO2, that is, ΔpCO2, in bloom pixels, and the respective values of Rrs(490). Such a relationship (Kondrik et al., 2018a), with the following statistical characteristics: coefficient of determination r² = 0.54, p ≪ 0.001, and RMSE = 23.4 μatm, was used to quantify the spatial variations of ΔpCO2 in the target seas, followed by recalculating ΔpCO2 for the water temperatures (retrieved from spaceborne data) that actually occurred during the respective Emiliania huxleyi bloom events (Copin-Montegut, 1988).

Additional technical workflow
In the process of satellite data processing, several preliminary procedures were performed.
1. Reprojection of satellite images. Given the high latitudinal location of the target seas, it was relevant to use an equal-area polar projection. Therefore, the NASA 'Ease-Grid' was employed. 'Ease-Grid' is based on the WGS-84 (World Geodetic System 1984) coordinate system.

(comparable with cloud-produced signals), which may have led to possible mistakes in the masking algorithm. The problem was overcome via manual processing of lower-level data, i.e. directly from the SeaWiFS level 2 product (http://oceancolor.gsfc.nasa.gov/cgi/browse.pl?sen=am) for the period of 1998-2001 in all studied areas. As a result, in the RGB images the areas masked as clouds in OC CCI images proved to exhibit large bloom areas with the brightness of signals typical of Emiliania huxleyi. This approach was legitimate, as OC CCI data obtained by different sensors have been brought to the SeaWiFS standard channels, and the entire data time series (1998-2016) was radiometrically uniform.

Correction of Automatic
3. Filling Missing Pixels Masked as Ragged Clouds. In the case of ragged clouds, some pixels of RGB images are not informative. A special algorithm for filling such gaps included averaging of Rrs(λ) values from neighboring pixels and from the temporally preceding and following images of the same pixel. The use of this algorithm in each of the cloud-masked images of the areas studied over 19 years and included in the OC CCI product helped increase the analysed area, sometimes to a significant extent. Calculated from 1998 to 2016 as arithmetic means for the Barents, Bering, North, Norwegian and Greenland seas, the quantitative estimates of such an increase attained for each 8-day-averaged image reached, respectively, ~107, 370, 31, 15, and 13 times. Thus, images were obtained with significantly larger cloud-free areas, assuring a more accurate estimation of the borders of bloom areas and their displacement, as well as of the bloom areas per se.
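The gap-filling idea can be illustrated as follows. This is a minimal sketch under stated assumptions, not the authorial algorithm: the text does not specify the neighbourhood size or the weighting, so an unweighted mean over the 8 adjacent pixels plus the same pixel in the preceding and following 8-day images is assumed, and missing values are represented by None in plain nested lists.

```python
def fill_gaps(prev_img, cur_img, next_img):
    """Fill missing (None) Rrs values in cur_img for one wavelength.

    Each gap is replaced by the unweighted mean of the valid values among
    its 8 spatial neighbours in cur_img and the same pixel in the previous
    and next images. Gaps with no valid donor stay None.
    """
    rows, cols = len(cur_img), len(cur_img[0])
    filled = [row[:] for row in cur_img]
    for r in range(rows):
        for c in range(cols):
            if cur_img[r][c] is not None:
                continue
            # Temporal donors: same pixel, adjacent time steps.
            candidates = [prev_img[r][c], next_img[r][c]]
            # Spatial donors: adjacent pixels taken from the original image.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        candidates.append(cur_img[rr][cc])
            valid = [v for v in candidates if v is not None]
            if valid:
                filled[r][c] = sum(valid) / len(valid)
    return filled
```

Donor values are always read from the original image rather than from already filled pixels, so the result does not depend on the order in which gaps are visited.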
Examples of product visualizations (for the North Sea) are shown in Figure 2.
For the bio-optical retrieval algorithm validation, we employed the PANGAEA database (www.pangaea.de) of the concentration of coccoliths within the target coccolithophore blooms in the North Atlantic including the North and Norwegian Seas (Charalampopoulou et al., 2008, 2011). The bio-optical in situ database spanning between 1997 and 2012 (16 years) was employed for ocean-colour satellite applications as having a global coverage (Valente et al., 2016). The data were acquired from several sources: Data on mixed layer depth (MLD) were derived from the Montegut climatology (Montegut et al. 2004 (Jakobsson et al. 2012).

The Global Ocean Data Analysis Project (GLODAP) database (Key et al., 2015; Olsen et al., 2016; http://cdiac.ornl.gov/oceans/GLODAPv2/) was employed for pairing in situ NO3 values at those points for which in situ pCO2 values were available. Where the desired matching NO3 values were unavailable in the GLODAP database, the respective data were taken from the World Ocean Atlas 2013 (WOA13, NOAA, Garcia et al., 2014; https://www.nodc.noaa.gov/OC5/woa13/).
The SOCAT v4 database (The Surface Ocean CO2 Atlas, Bakker et al., 2016; http://www.socat.info/access.html) comprises more than 6 million pCO2 measurements performed at latitudes north of 40°N. The data we employed from the SOCAT v4 database met the following requirements: (1) measurements were conducted during 1998-2016 and within the top 10 m layer (if data were available from several depths, the measurements from the shallowest depth were used); (2) pCO2 data had both corresponding seawater salinity data and valid Rrs spectra; (3) a daily mean pCO2 value was used where there were several in situ measurements; (4) pCO2 measurements were conducted at least 8 km offshore (to avoid the adjacency effect on satellite Rrs data); (5) pCO2 measurements fell within the location and timing of E. huxleyi blooming; and (6) data used from the SOCAT v4 database overlapped the data from either the GLODAP database or the WOA13 climatology database (depending upon which one was used for comparison).
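The selection requirements listed above can be expressed as a simple filter. The sketch below is illustrative only: the record keys (`date`, `depth_m`, `salinity`, `rrs_valid`, `offshore_km`, `in_bloom`, `cell`, `pco2_uatm`) are hypothetical and do not correspond to SOCAT column names, the daily mean is taken per hypothetical grid cell, and the shallowest-depth choice in (1) and the overlap check in (6) are assumed to have been done beforehand.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def select_pco2(records):
    """Apply criteria (1), (2), (4) and (5), then the daily averaging of (3).

    Each record is a dict with hypothetical keys; see the lead-in text.
    Returns {(cell, date): daily mean pCO2}.
    """
    by_day = defaultdict(list)
    for rec in records:
        keep = (
            1998 <= rec["date"].year <= 2016   # (1) study period
            and rec["depth_m"] <= 10.0         # (1) top 10 m layer
            and rec["salinity"] is not None    # (2) matching salinity
            and rec["rrs_valid"]               # (2) valid Rrs spectrum
            and rec["offshore_km"] >= 8.0      # (4) at least 8 km offshore
            and rec["in_bloom"]                # (5) inside a bloom
        )
        if keep:
            by_day[(rec["cell"], rec["date"])].append(rec["pco2_uatm"])
    # (3) one daily mean per cell and day
    return {key: mean(values) for key, values in by_day.items()}
```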

Data spatio-temporal domain
The published dataset covers a period of 19 years, from 1998 to 2016, at a temporal resolution of 8 days (874 time periods in total), and a spatial domain with a total area of 1,1,05,6,800 km² at a resolution of 4x4 km, divided into 4 regions described in Table 1 and shown in Figure 3.
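The figure of 874 time periods is consistent with 46 eight-day periods per year over 19 years. The sketch below reproduces that count under the assumption, not stated in the text, that the periods follow the common ocean-colour convention of restarting on 1 January each year, with a shortened last period; the function name `eight_day_periods` is illustrative.

```python
from datetime import date, timedelta


def eight_day_periods(first_year=1998, last_year=2016):
    """Return (start, end) dates of 8-day periods, restarting each 1 January.

    The yearly restart and the truncated last period are assumptions based
    on the usual ocean-colour binning convention.
    """
    periods = []
    for year in range(first_year, last_year + 1):
        start = date(year, 1, 1)
        while start.year == year:
            # The last period of the year is cut at 31 December.
            end = min(start + timedelta(days=7), date(year, 12, 31))
            periods.append((start, end))
            start += timedelta(days=8)
    return periods
```

With the default arguments the list has 19 x 46 = 874 entries, matching the number of time periods given above.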
All data are represented in the Lambert Azimuthal Equal Area projection with the parameters corresponding to the widespread NSIDC EASE-Grid North (EPSG: 3973) coordinate system.
The 4 regions were selected for several reasons. They include all seas where coccolithophore blooms usually occur in subpolar and polar regions of the Northern Hemisphere (North, Norwegian, Greenland, Barents, Bering and Labrador seas). Blooms occurring in the northern parts of the Atlantic Ocean (see, e.g. Holligan et al. 1993) were excluded from our dataset because of technical restrictions: the hydro-optical model employed for obtaining coccolith concentration values was based predominantly on data from high-latitude areas, and thus should first be validated for geographically different marine environments such as the open parts of the Atlantic Ocean.
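For readers unfamiliar with the projection named above, the following sketch gives the north-polar Lambert azimuthal equal-area forward formulas on a sphere. It is a simplification: EPSG:3973 is defined on the WGS-84 ellipsoid, whereas the code assumes a sphere of radius 6,371,228 m, so the coordinates differ slightly from those in the data files. The function name and the central meridian of 0° are likewise illustrative assumptions.

```python
import math

# Assumed spherical Earth radius in metres; EPSG:3973 itself uses WGS-84.
EARTH_RADIUS_M = 6_371_228.0


def laea_north(lat_deg, lon_deg, lon0_deg=0.0, radius=EARTH_RADIUS_M):
    """Spherical north-polar Lambert azimuthal equal-area projection.

    Returns (x, y) in metres with the North Pole at the origin.
    """
    colat = math.radians(90.0 - lat_deg)
    # Distance from the pole on the map; this choice preserves area.
    rho = 2.0 * radius * math.sin(colat / 2.0)
    dlon = math.radians(lon_deg - lon0_deg)
    return rho * math.sin(dlon), -rho * math.cos(dlon)
```

Because the projection preserves area, a disc of radius rho on the map has the same area as the spherical cap it represents, which is what allows bloom areas to be obtained by counting grid cells of equal size.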

Dataset overview
The 19-year dataset covers 4 blooming regions that differ in nature. This makes it possible to evaluate the bloom-related processes at different scales and time intervals in order to reveal both the interannual dynamics and the seasonal variations of parameters relevant to the bloom phenomenon. E. huxleyi blooms in the Arctic and Subarctic seas are characterized by significant instability: bloom intensity can differ between years by a factor of tens. Figure 4 and Table 2 collectively illustrate the temporal dynamics in bloom intensity (i.e. blooming area) for the above four marine regions. For example, in the Bering Sea (region 4), the most extensive blooms were observed exclusively from 1998 to 2001, after which their intensity decreased drastically. In region 1, mainly in the Barents, Norwegian and North seas, the blooming activity over the reported years was very irregular, with a peak in 2016. With the data collected, it is possible to highlight the patterns of development of the regularly occurring blooms. They can be characterized by the beginning and end of the blooming periods and by the overall dynamics of coccolith concentration during the blooms. Such patterns can be established based on the published dataset. Figure 5 shows an example of bloom development in the Greenland Sea (region 2) in the period June 26 - August 13, 2014. However, these periods are generally unstable, as is clearly seen in Figure 6, which displays the blooming area configuration in July for different years for the same area.
Technically, each dataset contains 4 subdatasets: bloom status, coccolith concentration, particulate inorganic carbon content and CO2 partial pressure in water driven by coccolithophores. The last three contain the directly calculated parameter values. The first subdataset contains information about the quality and content of the data. This information is organised as a set of flags marking reliable observations of bloom presence or absence, inaccurate data (usually due to clouds) and coastal land. Figure 7 provides an example of both a status matrix and the matrix containing coccolith concentration values.
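As an illustration of how the status flags and the value subdatasets can be combined, the sketch below derives the blooming area and the mean coccolith concentration for one 8-day image. The numeric flag codes and the function name are hypothetical; the actual codes should be taken from the reading tips supplied with the dataset. The only values taken from the text are the 4x4 km cell size and the use of blooming area as the measure of bloom intensity.

```python
CELL_AREA_KM2 = 4 * 4  # one 4 x 4 km grid cell

# Hypothetical flag codes, for illustration only.
NO_BLOOM, BLOOM, UNRELIABLE, LAND = 0, 1, 2, 3


def bloom_summary(status, concentration):
    """Return (bloom area in km2, mean concentration over bloom pixels).

    status and concentration are 2-D lists of equal shape for one image.
    The mean is None when the image contains no bloom pixel.
    """
    bloom_values = [
        conc
        for status_row, conc_row in zip(status, concentration)
        for flag, conc in zip(status_row, conc_row)
        if flag == BLOOM
    ]
    area_km2 = len(bloom_values) * CELL_AREA_KM2
    mean_conc = sum(bloom_values) / len(bloom_values) if bloom_values else None
    return area_km2, mean_conc
```

Counting cells is a valid area estimate here because the grid is defined in an equal-area projection, so every cell covers the same surface.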

Data availability
The dataset is available on Zenodo (Kondrik et al. 2018b; https://doi.org/10.5281/zenodo.1402033). Data granules are divided into directories by region and year; each child directory contains files with 8-day data on the bloom status, coccolith concentration, PIC and ΔpCO2. Data are stored in NetCDF4 format with GDAL support, which allows the data to be used immediately with any NetCDF-based or GIS software. Tips on how to read the data and QGIS styles for fast visualization are also provided.