Two databases derived from BGC-Argo float measurements for marine biogeochemical and bio-optical applications

Since 2012, an array of 105 Biogeochemical-Argo (BGC-Argo) floats has been deployed across the world's oceans to help fill the observational gaps that limit the characterization of open-ocean environments. Profiles of biogeochemical (chlorophyll and dissolved organic matter) and optical (single-wavelength particulate optical backscattering, downward irradiance at three wavelengths, and photosynthetically available radiation) variables are collected in the upper 1000 m every 1 to 10 days. The database of 9837 vertical profiles collected up to January 2016 is presented and its spatial and temporal coverage is discussed. Each variable is quality controlled with specifically developed procedures and its time series is quality-assessed to identify issues related to biofouling and/or instrument drift. A second database of 5748 profile-derived products within the first optical depth (i.e., the layer of interest for satellite remote sensing) is also presented and its spatiotemporal distribution discussed. This database, devoted to field and remote ocean color applications, includes diffuse attenuation coefficients for downward irradiance at three narrow wavebands and one broad waveband (photosynthetically available radiation), calibrated chlorophyll and fluorescent dissolved organic matter concentrations, and single-wavelength particulate optical backscattering. To demonstrate the applicability of these databases, data within the first optical depth are compared with previously established bio-optical models and used to validate remotely derived bio-optical products. The quality-controlled databases are publicly available from the SEANOE (SEA scieNtific Open data Edition) publisher at https://doi.org/10.17882/49388 and https://doi.org/10.17882/47142 for vertical profiles and products within the first optical depth, respectively.


Introduction
In the early 2000s, the international oceanographic community raised concerns about the large uncertainties still limiting the estimation of key biogeochemical processes in the ocean that contribute to controlling the Earth's climate (e.g., primary production and carbon export). The spatial and temporal under-sampling of most of the world's oceans was considered the main cause of this limitation (Munk, 2000; Hall et al., 2010). The same community thus proposed the implementation of autonomous platforms, such as the Biogeochemical-Argo profiling floats (hereafter BGC-Argo floats), as one solution to fill this observational gap (Johnson et al., 2009; Claustre et al., 2010a). Unlike sampling from vessels, BGC-Argo floats operate with high temporal and spatial coverage, including remote areas and periods when ship-based sampling is difficult. BGC-Argo can therefore help the scientific community to accumulate observations on biogeochemical properties from the surface to the interior of the ocean in a new and systematic way (Claustre et al., 2010a; Biogeochemical-Argo Planning Group, 2016; Johnson and Claustre, 2016). This, together with several other recent efforts to compile global biologically or biogeochemically relevant datasets (Peloquin et al., 2013; Sauzède et al., 2015; Bakker et al., 2016; Mouw et al., 2016; Valente et al., 2016), may provide new insights on marine ecological and biogeochemical processes and help better understand if oceans and their properties have changed and/or are changing over the decades.
In 2012, an array of BGC-Argo floats started to be deployed in several oceanic areas encompassing a wide range of biogeochemical and trophic conditions, from subpolar to tropical and from eutrophic systems to oligotrophic mid-ocean gyres (Organelli et al., 2016a). This array of floats was devoted to the acquisition of profiles of key biogeochemical quantities via their optical properties (i.e., chlorophyll a and colored dissolved organic matter, CDOM) and of hydrological variables (i.e., temperature and salinity). In addition, the array provided measurements of the underwater light field (i.e., irradiance) and of the inherent optical properties (i.e., particulate optical beam attenuation and backscattering coefficients) of the oceans. All these measurements, and derived quantities, are useful for both biogeochemical and bio-optical studies, to address the variability in biological processes (e.g., phytoplankton phenology and primary production; Lacour et al., 2015) and linkages with physical drivers (Boss et al., 2008; Boss and Behrenfeld, 2010; Lacour et al., 2017; Mignot et al., 2017; Stanev et al., 2017), to estimate particulate organic carbon concentrations and export (e.g., Bishop et al., 2002; Dall'Olmo and Mork, 2014), and to support satellite missions through validation of bio-optical products retrieved from ocean color remote sensing (e.g., chlorophyll concentration; Claustre et al., 2010b; IOCCG, 2011, 2015; Gerbi et al., 2016; Haëntjens et al., 2017) or by identification of those regions with bio-optical behaviors departing from mean-statistical trends (i.e., bio-optical anomalies; Organelli et al., 2017).
The study reported here presents a quality-controlled database of biogeochemical and bio-optical vertical profiles acquired by more than 100 BGC-Argo floats equipped with a homogeneous and interoperable instrument configuration. The "Biogeochemical and OPtical Argo Database - profiles" (BOPAD-prof; Barbieux et al., 2017a) includes 0-1000 m measurements of calibrated fluorometric chlorophyll a (Chl, mg m$^{-3}$) and fluorescent dissolved organic matter (FDOM, ppb of quinine sulfate) concentrations, the particulate optical backscattering coefficient at 700 nm ($b_{bp}(700)$, m$^{-1}$), downward irradiance $E_d(\lambda)$ at three wavelengths (380, 412, and 490 nm; µW cm$^{-2}$ nm$^{-1}$), and the spectrally integrated photosynthetically available radiation (PAR, µmol quanta m$^{-2}$ s$^{-1}$). Temperature ($T$, °C) and salinity ($S$, PSU) provide the hydrographic context for the optical observations. The geographic and temporal distribution of each parameter is described and discussed. A second database, the "Biogeochemical and OPtical Argo Database - surface" (BOPAD-surf), is specifically devoted to field and remote ocean color applications (Organelli et al., 2016b). It is focused on observations and derived products within the first optical depth $Z_{pd}$ (also known as the penetration depth, i.e., the layer of interest for satellite remote sensing; Gordon and McCluney, 1975; units of m) and includes Chl, FDOM, and $b_{bp}(700)$ quantities derived from the quality-controlled vertical profiles, in addition to the diffuse attenuation coefficients for downward irradiance ($K_d(\lambda)$, m$^{-1}$) and PAR ($K_d(\mathrm{PAR})$, m$^{-1}$). Data presented in BOPAD-surf are compared with existing bio-optical models and used in conjunction with products derived from satellite platforms in order to show applicability for validating ocean color bio-optical products. Finally, sources of uncertainties are presented, and errors discussed, for variables in BOPAD-prof and profile-derived products within BOPAD-surf.

Biogeochemical-Argo floats: instruments, sampling strategy, and data
The PROVOR CTS4 profiling float used in this study is one of the latest models of autonomous platforms developed by NKE Marine Electronics Inc. (France). Designed in the context of the Remotely-Sensed Biogeochemical Cycles in the Ocean (remOcean) and Novel Argo Ocean Observing System (NAOS) projects, this profiling float has also been adopted by several international collaborators and research programs. A full technical description of the platform and instrument arrangements can be found in Leymarie et al. (2013) and Organelli et al. (2016a). All PROVOR CTS4 profiling floats were programmed to acquire 0-1000 m vertical profiles every 1 to 10 days while operating (every 4 ± 2 days on average; see Appendix A), depending on the mission and scientific objectives. Upward profiles commenced from the 1000 m parking depth in time for surfacing at local noon. Data acquisition was nominally at 0.20 m resolution between surface and 10 m, 1 m resolution between 10 and 250 m, and generally 10 m resolution between 250 and 1000 m (except on some occasions when it was 1 m).
An array of 105 BGC-Argo floats acquired more than 10 000 vertical profiles of bio-optical and biogeochemical variables over a broad range of oceanic environments and trophic conditions between October 2012 and January 2016. A WET Labs ECO (Environmental Characterization Optics) sensor installed on each BGC-Argo float provided 0-1000 m vertical profiles of chlorophyll (fluorometer with excitation and emission at 470 and 695 nm) and dissolved organic matter (fluorometer with excitation and emission at 370 and 460 nm) fluorescence and of the volume scattering coefficient ($\beta(\theta, \lambda)$) measured at an angle $\theta$ of 124° and a wavelength $\lambda$ of 700 nm (Sullivan et al., 2013; Schmechtig et al., 2016). The multispectral ocean color radiometer OCR-504 (Satlantic Inc.) provided vertical profiles of PAR and downward irradiance $E_d(\lambda)$ at three wavelengths (380, 412, and 490 nm) in the upper 250 m. Electronic counts of each measured variable were converted into geophysical quantities using the calibration factors and standard practices provided by the manufacturers (Satlantic, 2013; WET Labs, 2016). According to the standard procedures for Argo data management (Wong et al., 2015), each profile was quality-controlled by applying methods specifically developed for each parameter (see Sect. 2.2). Because sensor performance might degrade over the float lifetime, each float was evaluated for possible corruption by biofouling or instrument drift (see Sect. 2.3). A total of 9837 BGC-Argo stations, each one corresponding to an upward profile, composed the database BOPAD-prof presented in this study (Fig. 1). To discuss the geographic and temporal representativeness of the database, the 9837 quality-controlled stations were grouped into 25 geographic areas. BGC-Argo float names, number of profiles, lifetime, and additional details are presented in Appendix A.

Quality control of vertical profiles
Vertical profiles of Chl concentration were quality-controlled following procedures and recommendations in Schmechtig et al. (2014). Profiles were (1) adjusted for nonzero deep values; (2) corrected by removing negative spikes lower than twice the 10th quantiles of the residual signal, calculated as the difference between the profile values and a median filter (five-point window), while positive spikes were retained; (3) screened to remove measured values outside the specific range reported in the manufacturer's technical specifications (WET Labs, 2016); and (4) corrected for non-photochemical quenching (NPQ; Kiefer, 1973) by extrapolation of the fluorescence at the bottom of the mixed layer to the surface following Xing et al. (2012) and Schmechtig et al. (2014). No interpolation of missing data was performed at any step. Profile-by-profile analysis-visualization, number of invalid points, and related origin are available on http://seasiderendezvous.eu. Profiles collected in areas such as the Black Sea and subtropical regions were further corrected for the contribution of fluorescence originating from non-algal matter following procedures described in Xing et al. (2017). The correction was applied when Chl and FDOM concentrations were positively correlated below the depth at which Chl was supposed to be zero (see that study for equations and quantitative metrics). The magnitude of this correction within the mixed layer varied between 3 and 50 % (for details see Table 2 in Xing et al., 2017, for the same database). Finally, according to the recommendations by Roesler et al. (2017) on the overestimation by standard WET Labs fluorometers, remaining Chl values were divided by a factor of 2 to correct for the global bias in factory calibration.
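The negative-spike removal in step (2) can be illustrated with a minimal Python sketch. It assumes that "the 10th quantiles of the residual signal" means the 10th percentile of the residuals and that the five-point median window simply shrinks at the profile edges; both choices, like the function names, are illustrative assumptions and not necessarily the implementation of Schmechtig et al. (2014).

```python
from statistics import median, quantiles


def running_median(values, window=5):
    """Centered running median; the window shrinks at the profile edges (assumption)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def remove_negative_spikes(values, window=5):
    """Replace negative spikes with None; positive spikes are kept, nothing is interpolated."""
    baseline = running_median(values, window)
    residuals = [v - b for v, b in zip(values, baseline)]
    # 10th percentile of the residuals (first of the nine decile cut points).
    p10 = quantiles(residuals, n=10, method="inclusive")[0]
    threshold = 2 * p10  # normally negative, since p10 sits in the lower tail
    return [None if r < threshold else v for v, r in zip(values, residuals)]


# Hypothetical Chl profile (mg m^-3) with one negative and one positive spike.
chl = [0.50, 0.52, 0.51, 0.10, 0.53, 0.55, 0.54, 0.90, 0.52, 0.50, 0.49, 0.48]
cleaned = remove_negative_spikes(chl)
```

In this hypothetical profile only the 0.10 value is flagged, while the 0.90 positive spike is retained, mirroring the asymmetric treatment described in the text.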
The FDOM vertical profiles were quality-controlled according to the following procedures: (1) measured values outside the specific range reported in the manufacturer's technical specifications were removed (WET Labs, 2016), without interpolation of missing data; (2) negative and positive spikes outside the 25th and 75th quantiles of the raw profile were removed, and subsequently any measurement with an absolute residual value > 4, calculated as the difference between the profile and a mean filter, was removed. Finally, according to the assumption that deep CDOM concentrations are conservative in a given water body (Nelson et al., 2010), and assuming that the BGC-Argo floats included in this database spent their lifetime mainly within the same deep water mass, an offset was applied to each FDOM profile to align the median value between 950 and 1000 m with that of the first profile and thus correct for possible sensor drift.
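The deep-offset alignment can be sketched as follows. This is a minimal illustration that assumes each profile is available as parallel lists of depths (m) and FDOM values (ppb of quinine sulfate) in chronological order; the function names and the data layout are hypothetical.

```python
from statistics import median


def deep_median(depths, values, top=950.0, bottom=1000.0):
    """Median of the values measured between `top` and `bottom` (m)."""
    return median(v for z, v in zip(depths, values) if top <= z <= bottom)


def align_to_first_profile(profiles):
    """Shift every profile so its 950-1000 m median matches that of the first profile.

    `profiles` is a chronological list of (depths, values) pairs.
    """
    reference = deep_median(*profiles[0])
    aligned = []
    for depths, values in profiles:
        offset = reference - deep_median(depths, values)
        aligned.append((depths, [v + offset for v in values]))
    return aligned


# Hypothetical example: the second profile reads 0.5 ppb too high at depth.
depths = [10.0, 500.0, 950.0, 975.0, 1000.0]
first = (depths, [1.0, 2.0, 2.5, 2.5, 2.5])
later = (depths, [1.5, 2.5, 3.0, 3.0, 3.0])
aligned = align_to_first_profile([first, later])
```

A constant offset is applied to the whole profile, so the vertical shape of each profile is preserved; only the drift-affected baseline changes.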
Following previously described procedures, vertical profiles of the angular scattering coefficient $\beta(124°, 700)$ were (1) converted into the particulate angular scattering coefficient by removing the contribution of pure seawater, which depends on water temperature and salinity (Zhang et al., 2009); (2) converted to the particulate optical backscattering coefficient at 700 nm ($b_{bp}(700)$) using a $\chi$ factor equal to 1.076 (Sullivan et al., 2013); (3) screened to remove measured values outside the specific range reported in the manufacturer's technical specifications (WET Labs, 2016); and (4) corrected by removing negative spikes lower than twice the 10th quantiles of the residual signal, calculated as the difference between the profile and a median filter (Briggs et al., 2011). No interpolation of removed or missing data was performed. Positive spikes were retained.
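Steps (1) and (2) can be written compactly. The relation below is the standard conversion from a single-angle scattering measurement to a backscattering coefficient; the symbols $\beta_p$ (particulate angular scattering) and $\beta_{sw}$ (pure-seawater contribution) are notation introduced here for illustration, and the computation of $\beta_{sw}$ from temperature and salinity (Zhang et al., 2009) is not reproduced.

```latex
\begin{align}
\beta_p(124^\circ, 700) &= \beta(124^\circ, 700) - \beta_{sw}(124^\circ, 700; T, S) \\
b_{bp}(700) &= 2\pi \, \chi \, \beta_p(124^\circ, 700), \qquad \chi = 1.076
\end{align}
```

As a worked example with a hypothetical value, $\beta_p = 1.0 \times 10^{-4}$ m$^{-1}$ sr$^{-1}$ would give $b_{bp}(700) = 2\pi \times 1.076 \times 1.0 \times 10^{-4} \approx 6.76 \times 10^{-4}$ m$^{-1}$.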
Vertical profiles of PAR and $E_d(\lambda)$ were quality controlled following the procedures detailed in Organelli et al. (2016a). This protocol accepts as good those measurements acquired under both clear and cloudy sky conditions, provided that conditions remain stable during the cast (Organelli et al., 2016a). A first step of the quality control consisted of identifying and discarding each profile acquired under very unstable sky and sea conditions (see quantitative metrics in Organelli et al., 2016a). The remaining profiles were quality controlled to identify and remove (1) nonzero dark measurements at depth, (2) the effects of sporadic atmospheric clouds, and (3) wave focusing (Zaneveld et al., 2001) in the upper part of the profile. Because $E_d(\lambda)$ and PAR measurements are collected up to a few centimeters from the sea surface, quality-controlled vertical profiles were completed by values just below it ($E_d(0^-)$). The $E_d(0^-)$ values were calculated by extrapolation within the first optical depth ($Z_{pd}$) using a second-degree polynomial fit (Organelli et al., 2016a), with $Z_{pd}$ calculated as $Z_{eu}/4.6$ (Morel, 1988). The euphotic depth, $Z_{eu}$, is the depth at which PAR is reduced to 1 % of its value just below the surface and was calculated from measured PAR profiles. To achieve $E_d(0^-)$ calculations, initial values of $Z_{eu}$ and $Z_{pd}$ were first estimated from the shallowest PAR measurement and subsequently from that corresponding to $0^-$. For radiometric data prior to the application of the above-mentioned quality-control procedures, the reader is referred to the archive at https://doi.org/10.17882/42182 (Argo, 2000).
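The definitions of the euphotic depth and the first optical depth given above can be illustrated with a short Python sketch. The interpolation in ln(PAR) between the two samples bracketing the 1 % level is an assumption made here for illustration (the text does not specify how the crossing depth is located), and the function names are hypothetical. Note that 4.6 is approximately ln(100), so under a constant attenuation coefficient the first optical depth is the depth at which irradiance falls to 1/e of its subsurface value.

```python
import math


def euphotic_depth(depths, par, par_surface):
    """Depth (m) at which PAR falls to 1 % of `par_surface`, or None if never reached.

    `depths` must increase downward; `par_surface` is PAR just below the surface.
    """
    target = 0.01 * par_surface
    for i in range(1, len(depths)):
        if 0 < par[i] <= target < par[i - 1]:
            # Interpolate in ln(PAR), since light decays roughly exponentially with depth.
            frac = ((math.log(par[i - 1]) - math.log(target))
                    / (math.log(par[i - 1]) - math.log(par[i])))
            return depths[i - 1] + frac * (depths[i] - depths[i - 1])
    return None


def first_optical_depth(z_eu):
    """First optical depth Z_pd = Z_eu / 4.6 (Morel, 1988)."""
    return z_eu / 4.6


# Hypothetical PAR profile decaying with a constant attenuation of 0.05 m^-1.
depths = [float(z) for z in range(0, 151)]
par = [1000.0 * math.exp(-0.05 * z) for z in depths]
z_eu = euphotic_depth(depths, par, par_surface=1000.0)
z_pd = first_optical_depth(z_eu)
```

For this idealized profile the euphotic depth is ln(100)/0.05, about 92 m, and the first optical depth is about 20 m, which falls within the ranges reported for the database.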

Testing for biofouling and instrument drift
A set of four tests was specifically developed to identify potential biofouling and instrument drift. To achieve a reliable evaluation for each of the 105 BGC-Argo floats, each variable was examined both individually and in conjunction with the others, which is greatly aided by redundancy among derived quantities. A combination of raw profiles and quality-controlled products was needed for the analysis. Ancillary data such as measurements acquired in drift mode at 1000 m (i.e., between two consecutive ascent profiles) were also included in the analysis and they can be publicly accessed at http://www.oao.obs-vlfr.fr/maps/en/. Test 1 was conducted on raw time series of salinity, Chl, FDOM, $b_{bp}(700)$, and $E_d(\lambda)$, i.e., before the application of the quality-control procedures described in Sect. 2.2. Test 1 aimed to identify sharp gradients in measured variables over the entire profile (i.e., sudden decrease or increase of Chl and FDOM concentrations or increase in $b_{bp}(700)$ values) not attributable to any biological or hydrological cause (e.g., particle aggregates or nepheloid layer of particles). Tests 2 and 3 were conducted on raw measurements collected by each profiler when in drift mode. Test 2 analyzed time series of the sensors' dark measurements for Chl and $E_d(\lambda)$ at the 1000 m parking depth. Test 3 consisted of the analysis of the relationship between raw FDOM and salinity at the 1000 m parking depth over time. Assuming that deep CDOM concentrations are conservative in the same water body (Nelson et al., 2010), variations in deep FDOM values for a given salinity are likely due to changes in sensor performance (Fig. 2). Test 4 was based on the comparison between irradiance values just above the sea surface ($E_d(0^+)$) and those modeled by Gregg and Carder (1990) for clear cloudless sky, as described by Organelli et al. (2016a).
The performance of this test, which assesses the accuracy of measured irradiance values, strongly depends on the value extrapolated to the sea surface (i.e., $E_d(0^-)$). $E_d(0^+)$ values at 380, 412, and 490 nm were obtained by dividing $E_d(0^-)$, derived from quality-controlled profiles as described in Sect. 2.2, by the factor for transmission across the sea-air interface (Austin, 1974). When the results of the tests above indicated possible measurement issues (i.e., 1710 profiles spread across 70 floats), the time series of each affected variable was interrupted and only previously collected profiles were retained (i.e., 9837 stations in BOPAD-prof).

[Figure 2 caption fragment: "... FDOM values around 2.5 ppb of quinine sulfate represent measurements collected during the first 2 years of the float lifetime and which have not been discarded. Colors indicate density of measurements for a given salinity vs. the FDOM value (red > blue)."]

Bio-optical products within the first optical depth
BOPAD-surf was compiled using 5748 stations with quality-controlled $Z_{eu}$ and $Z_{pd}$ values (Fig. 1). The procedure of Organelli et al. (2016a) reduced the number of PAR profiles that can be exploited for deriving optical quantities within the first optical depth (see Sect. 2.2 for computation) by about 40 % (e.g., because of atmospheric clouds) with respect to BOPAD-prof. To compute vertical diffuse attenuation coefficients for downward irradiance ($K_d(\lambda)$) and PAR ($K_d(\mathrm{PAR})$) within $Z_{pd}$, each radiometric profile was binned in 1 m intervals. $K_d(\lambda)$ and $K_d(\mathrm{PAR})$ values were then derived from a linear fit, after removal of outliers, between the natural logarithm of the radiometric quantity and depth (in units of pressure) following Mueller et al. (2003). $K_d(\lambda)$ and $K_d(\mathrm{PAR})$ values obtained from linear fits based on fewer than three points or with a determination coefficient ($r^2$) lower than 0.90 were discarded. Values of Chl, FDOM, and $b_{bp}(700)$ were also derived, within the first optical depth, from quality-controlled vertical profiles. Before computation, FDOM quality-controlled profiles were smoothed by applying first a median filter (five-point window) and then an average filter (seven-point window). A median filter (five-point window) was applied to quality-controlled $b_{bp}(700)$ profiles to identify and subsequently remove positive spikes. Finally, Chl, FDOM, and $b_{bp}(700)$ profiles were binned in 1 m intervals, and the average within $Z_{pd}$ was computed.
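The derivation of a diffuse attenuation coefficient from a log-linear fit can be sketched in Python as below. This is a minimal illustration under stated assumptions: the input is taken to be already binned and restricted to the first optical depth, the outlier-removal step is omitted, ordinary least squares is used for the fit, and the function name is hypothetical.

```python
import math


def diffuse_attenuation(depths, irradiance, min_points=3, min_r2=0.90):
    """Return K_d (m^-1) from a linear fit of ln(irradiance) against depth.

    Returns None when the fit uses fewer than `min_points` points or has
    a determination coefficient below `min_r2`.
    """
    pairs = [(z, math.log(e)) for z, e in zip(depths, irradiance) if e > 0]
    n = len(pairs)
    if n < min_points:
        return None
    mean_z = sum(z for z, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    szz = sum((z - mean_z) ** 2 for z, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in pairs)
    if szz == 0 or syy == 0:
        return None  # degenerate fit: no spread in depth or in ln(irradiance)
    slope = szy / szz
    r2 = szy ** 2 / (szz * syy)
    if r2 < min_r2:
        return None
    # Irradiance decreases with depth, so K_d is the negative of the slope.
    return -slope


# Hypothetical E_d(490) values in 1 m bins with a true K_d of 0.08 m^-1.
depths = [float(z) for z in range(1, 11)]
ed490 = [50.0 * math.exp(-0.08 * z) for z in depths]
kd490 = diffuse_attenuation(depths, ed490)
```

The two rejection criteria in the function correspond to the thresholds quoted in the text (at least three points and $r^2$ of at least 0.90).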

Comparison with satellite data
To demonstrate the applicability of these BGC-Argo databases, satellite-derived diffuse attenuation coefficients of downward irradiance at 490 nm ($K_d(490)_{sat}$) obtained by the GlobColour project (ACRI-ST, 2017) were downloaded from the web portal http://seasiderendezvous.fr/matchup.php and compared to the in situ BGC-Argo counterparts. $K_d(490)_{sat}$ values were obtained, for the period October 2012 to January 2016, from daily Level 3 Chl merged products using the empirical algorithm by Morel et al. (2007a). Chl products were merged using MODIS-Aqua and VIIRS Level 3 products (NASA reprocessing R2014.0); see fully detailed merging procedures in ACRI-ST (2017). As statistics of the match-up analysis, the RMSE (m$^{-1}$) and the median percentage difference (MPD) were calculated according to Organelli et al. (2016c).
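The two match-up statistics can be sketched as follows. The exact definitions are those of Organelli et al. (2016c) and are not reproduced in this text; the forms below (root mean square of the satellite minus in situ differences, and the median of the absolute differences relative to the in situ value, in percent) are common conventions assumed here for illustration, and the function names are hypothetical.

```python
import math
from statistics import median


def rmse(satellite, in_situ):
    """Root mean square error between paired satellite and in situ values."""
    squared = [(s - f) ** 2 for s, f in zip(satellite, in_situ)]
    return math.sqrt(sum(squared) / len(squared))


def median_percentage_difference(satellite, in_situ):
    """Median of |satellite - in situ| / in situ, in percent (assumed definition)."""
    return median(100.0 * abs(s - f) / f for s, f in zip(satellite, in_situ))


# Hypothetical K_d(490) match-ups (m^-1).
kd_sat = [0.10, 0.20, 0.30]
kd_float = [0.10, 0.25, 0.20]
error = rmse(kd_sat, kd_float)
mpd = median_percentage_difference(kd_sat, kd_float)
```

Using a median rather than a mean for the relative difference limits the influence of a few poor match-ups, which may be why a median-based statistic is reported alongside the RMSE.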

Quality-controlled vertical profiles
In this section, specific examples of quality control are presented for each examined variable to provide context for the database. In the case of Chl profiles, three examples extracted from floats operating in different trophic and optical environments are presented (North Atlantic subpolar gyre, Black Sea, and South Atlantic subtropical gyre; Fig. 3). It is recalled here that all quality-controlled Chl values are divided by 2 as recommended by Roesler et al. (2017). The raw North Atlantic profile (Fig. 3a) exhibits strong non-photochemical quenching at the surface and positive spikes at depth. After the quality control, NPQ is corrected and the positive spikes that are likely related to biological information are retained (Fig. 3a). The Black Sea vertical chlorophyll profile (Fig. 3b) is mainly characterized by a monotonic Chl increase to depth, where the concentration is expected to be null. As Proctor and Roesler (2010) and Xing et al. (2017) stated, the observed Chl increase at depth is due to very high CDOM (which is a consequence of the anoxic conditions prevailing at depth in the Black Sea) and non-algal matter concentrations that can affect the chlorophyll fluorescence signal. After correcting the profile according to Xing et al. (2017), Chl concentrations below 100 m are zero. The profile from the South Atlantic subtropical gyre mostly exhibits a nonzero dark offset, which is removed in quality control (Fig. 3c).
Raw FDOM vertical profiles are generally noisy and spiky, especially in the upper water column (Fig. 4). After quality control, large spikes are identified and removed, and the profile is aligned to match the 950-1000 m median value of the first profile acquired by the float (Fig. 4). Depending on the application, further processing of FDOM profiles such as smoothing and filtering is recommended before use (see, for example, Sect. 2.4).
In the case of $b_{bp}(700)$ vertical profiles, the examples in Fig. 5 represent quality-controlled profiles with and without positive spikes (see Sect. 2.4). Although positive spikes likely indicate the occurrence of large aggregates and are essential to monitoring carbon fluxes towards the deep ocean (Briggs et al., 2011), they can introduce some noise when export of particulate organic carbon due to small particles (Dall'Olmo and Mork, 2014), the physiological status of the algal community (Barbieux et al., 2017b), or the bio-optical behavior of the world's oceans is analyzed. Both versions of $b_{bp}(700)$ profiles are archived in BOPAD-prof.
All the quality-controlled profiles of $E_d(\lambda)$ and PAR included in the database presented correspond to Type 1 (i.e., best quality) in Organelli et al. (2016a). The examples in Fig. 6 represent $E_d(412)$ profiles collected in eastern Mediterranean Sea waters under different sky conditions. The profile in Fig. 6a was acquired under nearly clear sky conditions. In this case, the quality-control procedure only identifies and removes dark values at depth (not shown) and those corresponding to wave focusing (Zaneveld et al., 2001) at the surface. The profile in Fig. 6b is instead characterized by nonzero dark values in deep waters (not shown) and sporadic atmospheric clouds, with the major cloud perturbing data acquisition for at least 2 min. The ensemble of tests of the applied quality-control procedure (Organelli et al., 2016a) detects the various perturbations (Fig. 6b). Further examples, including for $E_d(490)$ and PAR, can be found in Organelli et al. (2016a).

BOPAD-prof: spatiotemporal distribution of the biogeochemical and optical Argo database of vertical profiles
Deployment of BGC-Argo floats has been mainly focused, within limitations of project-driven resources, on some of the important carbon-export regions of the Atlantic Ocean (Alkire et al., 2012), on areas with dynamic trophic regimes (e.g., Mediterranean Sea; D'Ortenzio and Ribera d'Alcalà, 2009), and on oligotrophic mid-ocean gyres, in all cases in regions with depths greater than 1000 m (except on a very few occasions). As a result, the 9837 BGC-Argo stations of vertical profiles within BOPAD-prof cover a wide range of trophic conditions and represent the first step to set up a publicly available and interoperable database for biogeochemical and bio-optical studies. Hereafter, we present the spatial and temporal coverage of quality-controlled vertical profiles for each biogeochemical and bio-optical variable between the world's hemispheres and among regions. The spatiotemporal distribution of temperature profiles, which are representative of the entire raw database, is also shown. The latitudinal and monthly distributions of the quality-controlled profiles show similar patterns among the eight variables (Fig. 7), which indicates that the quality-control procedures do not bias the sampling spatially or temporally. However, the total number of profiles for a given latitude and month of the year is different among variables. It is generally the highest for Chl (Fig. 7c, d) and $b_{bp}(700)$ (Fig. 7g, h). Because of the strict quality control by Organelli et al. (2016a) that removes radiometric profiles acquired under very unstable meteorological conditions, the total number of $E_d(\lambda)$ and PAR profiles is generally the lowest (Fig. 7i-p).

[Figure caption fragment: "... (700)) collected in the South Atlantic subtropical (float WMO 6901439) and North Atlantic subpolar (float WMO 6901516) gyres, respectively. Open cyan circles indicate positive spikes (Briggs et al., 2011)."]
In the Northern Hemisphere, the database covers a broader latitudinal range than in the Southern Hemisphere. Data range from the Equator to the Arctic Ocean, and late spring to midsummer are the most represented periods. The number of profiles is substantially lower between January and April. This occurs especially for radiometric quantities (Fig. 7) as a consequence of the decreasing stability of the water column associated with deteriorated sky and sea conditions (D'Ortenzio et al., 2005; Lacour et al., 2015). This high contribution of the Northern Hemisphere to the database is due to the first projects piloting the deployment of BGC-Argo floats that were mainly focused on the North Atlantic subpolar gyre (i.e., 48-65° N; remOcean project) and the Mediterranean Sea (i.e., 31-44° N; NAOS project). Latitudes higher than 67° N are included thanks to a 3-year operating float collecting all variables except FDOM (Fig. 7e). Latitudes between 0 and 30° N (i.e., subtropical gyres and surrounding zones) are also represented owing to measurements acquired by 10 BGC-Argo floats (Fig. 7). Note, however, that the number of FDOM profiles at these latitudes is lower than for the other variables as a consequence of sensor failure on some floats and absence in those floats deployed in the framework of the UK Bio-Argo and E-AIMS projects (in which the FDOM sensor was replaced by a sensor measuring particulate backscattering coefficient at 532 nm, $b_{bp}(532)$). The Northern Hemisphere is also represented by data collected in two marginal seas (Fig. 1): the Black and Red seas. Similar to subtropical gyres and surrounding areas, the number of FDOM profiles in the Black Sea is lower than for other variables because half of the floats deployed in this area measured $b_{bp}(532)$ instead of FDOM (see Sect. 7 for $b_{bp}(532)$ data availability).
The Southern Hemisphere is primarily represented by data collected at latitudes between 38 and 56° S (Fig. 7) in the Atlantic and Indian sectors of the Southern Ocean (Fig. 1). In contrast to the Northern Hemisphere, no floats have been deployed or reached latitudes higher than 60° S (Fig. 7). Measurements of each variable are also acquired by seven floats in southern subtropical gyres (around 16-25° S) both in the Atlantic and Pacific oceans and by two floats in the region close to New Caledonia in the South Pacific (Fig. 1). The temporal coverage of data collected in the Southern Hemisphere remains uniform from January to September for each variable, but then increases from October to December (Fig. 7). This reflects a switch to adaptive sampling to better resolve the phytoplankton bloom. Similar to the Northern Hemisphere, the number of radiometric profiles tends, however, to slightly decrease during the autumn and the austral winter (from June to August) as a consequence of the worsening meteorological conditions and deepening mixed layer depths (Dong et al., 2008).
The 25 selected regions (grouped into nine major areas) contribute, in terms of number of profiles, in different proportions to the database (Fig. 8). This is a consequence of the different number of floats deployed in each area together with a modulated profiling frequency (from every day to every 10 days). The North Atlantic Ocean dominates BOPAD-prof, as a consequence of the intensive sampling characterizing the subpolar gyre area in multiple programs. Vertical profiles acquired in the Southern Ocean and the western Mediterranean Sea each represent on average 18 % of the database. The eastern Mediterranean Sea accounts for about 14 %, while the South Atlantic subtropical gyre and surrounding areas contribute 6.3 % on average. The South Pacific Ocean represents only 3 to 5 % of the vertical profiles within BOPAD-prof, while polar and marginal seas individually represent a proportion < 3 % of each collected variable. Large areas such as the North Pacific and Indian oceans account for 0 % as no deployments occurred in those regions.

BOPAD-surf: properties of the bio-optical database within the first optical depth and joint use with remote sensing of ocean color
Because of the unique in situ spatial and temporal coverage, the international community of optical oceanographers (Claustre et al., 2010b; IOCCG, 2011, 2015; Biogeochemical-Argo Planning Group, 2016) has recently recognized measurements collected by BGC-Argo floats as a fruitful resource of data for bio-optical applications, such as the identification of regions with optical properties departing from mean statistical relationships as well as the validation of ocean color reflectances (Gerbi et al., 2016) and bio-optical products (IOCCG, 2015; Haëntjens et al., 2017). In this context, BOPAD-surf has been compiled with 5748 stations of biogeochemical (i.e., Chl and FDOM) and bio-optical (i.e., $K_d(\lambda)$, $K_d(\mathrm{PAR})$, and $b_{bp}(700)$) variables within the first optical depth (i.e., the layer of interest for ocean color) as derived from previously quality-controlled vertical profiles. The characteristics of this database are described hereafter.
All the 5748 BGC-Argo stations correspond to quality-controlled measurements of euphotic and first optical depths and represent about 60 % of the database of quality-controlled vertical profiles. Ranges and averages (and associated standard deviations) of $Z_{eu}$ and $Z_{pd}$ and of the other variables are reported in Table 1. In agreement with previous observations (Morel and Maritorena, 2001; Lee et al., 2007; Morel et al., 2007a, b; Soppa et al., 2013; Organelli et al., 2014), values of $Z_{eu}$ and $Z_{pd}$ vary mostly in the ranges of 10.5-180.2 and 2.3-39.2 m, respectively, with the deepest values characterizing the Atlantic and South Pacific ocean gyres (Fig. 9a, b). The shallowest $Z_{eu}$ and $Z_{pd}$ layers are instead characteristic of the North Atlantic subpolar gyre in spring, the western Mediterranean Sea, and the Black Sea (Fig. 9a, b). The observed ranges of Chl, FDOM, $b_{bp}(700)$, $K_d(\lambda)$, and $K_d(\mathrm{PAR})$ values derived from BGC-Argo measurements (Table 1) are also in good agreement with previous observations (Morel and Maritorena, 2001; Morel et al., 2007a, b; Cetinić et al., 2012; Dall'Olmo et al., 2012; Peloquin et al., 2013; Sauzède et al., 2015; Valente et al., 2016). As examples of their spatial distribution across the explored regions, $K_d(412)$ and $K_d(\mathrm{PAR})$ are shown in Fig. 9c and d, respectively. The reader is referred to the work by Organelli et al. (2017) for regional variability in $K_d(380)$ and $K_d(490)$ coefficients. As a consequence of the variable-specific quality-control procedures, each variable within BOPAD-surf is represented with different proportions in the 25 regions (Table 2). Of the 5748 stations with quality-controlled $Z_{eu}$, 83-90 % contain Chl, FDOM, and $b_{bp}(700)$ measurements; 62-72 % contain $K_d(\lambda)$ values within $Z_{pd}$; and > 90 % contain $K_d(\mathrm{PAR})$.
The Labrador Sea region contains the highest fraction of profiles of each variable (13.81-17.08 %), while the Iceland Basin and the Irminger Sea contribute on average 7.6-7.8 % of the profiles in the database (Table 2). In the Mediterranean Sea, the northwestern, southwestern, and Ionian basins each contribute between 5.5 and 9.7 % of the profiles, while the Levantine and Tyrrhenian seas each contribute about 4 % on average (Table 2). In the Southern Hemisphere, the eastern Atlantic and the Indian sectors of the Southern Ocean each contribute about 6-10 % of the entire database, while the relative contribution of the western part of the Atlantic sector is < 4.45 % (Table 2). Subtropical gyres of both hemispheres contribute from 1.47 to 4.43 % according to the variable (Table 2). Marginal seas (i.e., Black and Red seas) and transition zones among various trophic regimes represent less than 3 % of the whole database within the first optical depth (Table 2). The North Pacific and Indian oceans are not represented in the database (0 %).
The ability of BOPAD-surf to support in situ and remote bio-optical applications is demonstrated by two examples of possible use. As a first exercise, previously established bio-optical relationships (Morel et al., 2007a) are evaluated against the BGC-Argo database. It is important to identify the regions with bio-optical behaviors deviating from the average trend because a bio-optical anomaly could lead to uncertainties in retrieving bio-optical and biogeochemical quantities from satellite ocean color observations. The relationship of K d (PAR) as a function of K d (490) for the BGC-Argo database is in good agreement with those established by Morel et al. (2007a). Slight deviations appear, however, at the lowest K d (PAR) and K d (490) values and mainly correspond to samples collected in the subtropical gyres and eastern Mediterranean Sea (Fig. 10). In a second exercise, remotely derived K d (490) products are compared to K d (490) coefficients obtained from BGC-Argo floats (Fig. 11). While the two products agree approximately at moderate values (K d (490) ∼ 0.1 m −1 ), estimates from BGC-Argo floats are considerably lower on average, especially at high and very low water clarity. This result strongly warrants further investigation. Thanks to the unprecedented spatial and temporal distribution provided by these autonomous platforms, as well as to the understanding of the associated uncertainty, ocean color algorithm and product validation can routinely be performed in several regions. Errors and possible causes of failure (e.g., influence of Raman scattering; Westberry et al., 2013) could thus be assessed and/or solved, and algorithms may be refined to improve the quality of retrievals.
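As an illustration of the kind of product validation described above, the following minimal sketch computes a few common matchup statistics between float-derived and remotely derived K d (490) values. It is not the procedure used for Fig. 11: the function name matchup_stats, the choice of log10-space statistics, and the numbers in the usage example are assumptions made here for illustration only.

```python
import math
import statistics


def matchup_stats(float_kd, sat_kd):
    """Compare paired Kd(490) values (m^-1) in log10 space.

    float_kd: in situ values (e.g., from profiling floats)
    sat_kd:   remotely derived values matched to the same stations
    The choice of statistics is an assumption, not the paper's method.
    """
    if len(float_kd) != len(sat_kd) or len(float_kd) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if min(float_kd) <= 0 or min(sat_kd) <= 0:
        raise ValueError("Kd values must be strictly positive")

    # Positive differences mean the remote product exceeds the float value.
    log_diff = [math.log10(s) - math.log10(f)
                for f, s in zip(float_kd, sat_kd)]
    ratios = [s / f for f, s in zip(float_kd, sat_kd)]

    return {
        "n": len(log_diff),
        "bias_log10": statistics.mean(log_diff),
        "rmsd_log10": math.sqrt(statistics.mean([d * d for d in log_diff])),
        "median_ratio": statistics.median(ratios),
    }


if __name__ == "__main__":
    # Hypothetical matchups, not values taken from the databases.
    in_situ = [0.02, 0.10, 0.20]
    remote = [0.04, 0.10, 0.40]
    print(matchup_stats(in_situ, remote))
```

Working in log space is a design choice suited to quantities such as K d (490) that span more than one order of magnitude; linear-space statistics could be substituted if preferred.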
BOPAD-surf does not, however, represent the only effort in compiling an extensive database tailored for in situ and remote bio-optical applications. BOPAD-surf and the compilation published by Valente et al. (2016; hereafter VL2016) may be complementary and thus benefit the wider community of oceanographers, provided that their complementarities and differences are highlighted. Though BOPAD-surf's temporal coverage is shorter than that of VL2016, it extends bio-optical measurements through 2013-2015. It includes regions such as the North Atlantic subpolar gyre, the Southern Ocean, and the Red Sea that are not archived in VL2016. Conversely, VL2016 offers data from the North Pacific and Indian oceans where no PROVOR CTS4 profiling floats have been deployed (Fig. 1). BOPAD-surf complements VL2016 by also providing a balanced acquisition of variables during wintertime and harsh periods. Considering variables and differences in acquisition and processing, only the diffuse attenuation coefficients for downward irradiance at 412 and 490 nm (i.e., K d (412) and K d (490)) are directly comparable between the two databases. VL2016 offers a 25-band resolution of these coefficients in the visible range, while BOPAD-surf extends such a measurement to a single wavelength in the UV region (i.e., K d (380)) and includes attenuation coefficients for one broad waveband (i.e., K d (PAR)). Similarly, BOPAD-surf provides measurements of the particulate optical backscattering at 700 nm, a band not included in VL2016 (27 bands between 405 and 683 nm).

The main differences between the two databases appear for the variables relating to Chl and colored dissolved or detrital material. Because of calibration challenges for deriving accurate Chl concentrations from in vivo fluorescence measurements (see Sect. 6), VL2016 is only compiled with Chl concentrations obtained from high-performance liquid chromatography (HPLC) and/or spectrophotometric or fluorometric measurements on algal pigment extracts. FDOM is a different parameter from a dg (λ) in Valente et al. (2016). While a dg (λ) relies on the light absorption properties of the whole pool of colored dissolved and/or particulate organic material, FDOM only measures the fluorescence emitted by a fraction of this matter. Depending on the excitation and emission wavelengths of the sensor, FDOM can be a proxy of concentrations of freshly produced material or more aged humic substances (Nelson and Gauglitz, 2016). However, in some regions, FDOM can be significantly correlated to a dg (λ) and thus be retrievable from ocean color remote sensing (e.g., Matsuoka et al., 2017). FDOM data included in BOPAD-surf also represent a useful resource to improve the understanding of the optical behavior of the oceans. Finally, no measurements of remote sensing reflectance are archived within BOPAD-surf, but successors of the PROVOR CTS4 profiling floats used in BOPAD-surf are planned to be deployed in order to collect multispectral downward irradiance and upwelling radiance measurements.

Data uncertainty
In this section, the uncertainty associated with each quality-controlled variable within BOPAD-prof, and with the derived products contained in BOPAD-surf, is characterized. No error propagation or budgets are presented here. When using fluorescence measurements as a proxy of Chl concentration, the uncertainty may propagate from the conversion of electronic counts into geophysical quantities, through the application of quality-control procedures for the influence of the NPQ and/or other environmental variables (e.g., non-algal matter), to calibration corrections. The sensor sensitivity of 0.007 mg m −3 (i.e., one digital count) is critical at the surface of most oligotrophic environments or for deep low Chl values, where it may be twice as high as the signal (Fig. 12a). Correction for the NPQ may also introduce uncertainties depending on the procedure and assumptions on which the method relies. However, a comparison between the method by Xing et al. (2012) used here, which is based on the calculation of the mixed layer depth, and an alternative correction developed by Sackmann et al. (2008), which is based on the use of particulate optical backscattering, showed similar performances for BGC-Argo Chl measurements (X. Xing, unpublished data). As discussed by Xing et al. (2017), the correction of Chl profiles for non-algal matter disturbance by using alternative procedures with respect to the one applied here may also introduce errors, which vary regionally and are the highest in the Black Sea area (∼ 0.1 mg m −3 ), while the lowest are observed in the subpolar North Atlantic Ocean and Mediterranean Sea (∼ 0.007 and 0.004 mg m −3, respectively). A main challenge in quality-assessing fluorescence Chl measurements lies in the assumption of what is measured and what is actually phytoplankton biomass.
The fluorescence-to-chlorophyll ratio depends on changes in nutrient availability, growth phase, photophysiology, and taxonomic composition of algal communities (Cullen, 1982). This implies that calibration factors may change regionally and seasonally. Indeed, standard fluorometer corrections rely on the comparison with contemporaneous HPLC-determined chlorophyll concentrations, which are the most accurate estimates for phytoplankton pigments. However, given the BGC-Argo particularity of sampling autonomously, over long periods and across different regions, any HPLC-based calibration performed at the time of the deployment may become invalid during the float's voyage. Haëntjens et al. (2017) recommend the use of radiometric data available on floats together with models (Xing et al., 2011) to systematically verify the calibration of the Chl fluorometer, and applied corrections, over time. In this study, no spatiotemporal variability in the fluorescence-to-chlorophyll ratio has been taken into account to correct BGC-Argo Chl measurements, and no radiometry-based corrections have been used to avoid redundancy among variables and derived quantities (i.e., K d (490)). Only the correction for the instrument-induced bias recommended by Roesler et al. (2017) has been applied, though it might be insufficient and thus under-correct Chl values measured at high latitudes and especially in the Southern Ocean. Chl profiles prior to the application of any quality-control procedures used here, including NPQ and the calibration factor recommended by Roesler et al. (2017), are also archived in BOPAD-prof so that alternative chains of protocols can be applied at the user's discretion.
FDOM measurements within BOPAD-prof appeared very noisy even after quality control and spike detection (see Sect. 3). However, using the profile in Fig. 4b as a specific example, the impact of the sensor sensitivity (0.28 ppb of quinine sulfate, ∼ one digital count) on the measured values may be critical for surface measurements (Fig. 12b). Low FDOM values at the surface may result from the attenuation, by other optically significant substances, of the light fluoresced by the dissolved material (Downing et al., 2012) and/or from fluorescence quenching caused by increasing temperature (Baker et al., 2005). No specific methods for BGC-Argo float measurements are currently available to correct for thermal fluorescence quenching, and published procedures (Watras et al., 2011; Downing et al., 2012; Ryder et al., 2012) have not been implemented here, as they can be applied at the user's discretion.
Uncertainties related to the particulate optical backscattering, as acquired by WET Labs ECO sensors or instruments with similar or identical technical and geometrical characteristics, have been discussed in previously published studies (Dall'Olmo et al., 2009; Briggs et al., 2011; Sullivan et al., 2013). Experimental errors may arise from multiple sources such as conversion and calibration coefficients (e.g., scaling factor and dark counts), instrument age, and sensor responsiveness to environmental factors such as temperature and light (Sullivan et al., 2013). The impact of the sensor sensitivity (2.2 × 10 −6 m −1 ) on the measured values is low (Fig. 12c). The combined uncertainty is generally less than 10 % (Sullivan et al., 2013), but it may increase up to about 30 % in most oligotrophic environments (Dall'Olmo et al., 2009). In particular, a recent analysis that includes the same BGC-Argo floats used in this study suggests that more consistent b bp (700) measurements would be achieved by taking into account a bias equal to 3.5 × 10 −5 m −1 due to changes in dark counts from the time of the sensor's purchase to that of deployment. Disagreement between different sensor models measuring b bp (700) in the same areas may yield a bias of up to 30 %.
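To make the role of sensor sensitivity more concrete, the short sketch below expresses the one-digital-count resolutions quoted above for Chl, FDOM, and b bp (700) as a percentage of a measured value. The function name relative_resolution, the dictionary keys, and the example signal levels are hypothetical and chosen only for illustration; they are not taken from the databases.

```python
# One-digital-count resolutions quoted in the text.
RESOLUTION = {
    "chl_mg_m3": 0.007,    # chlorophyll fluorescence, mg m^-3
    "fdom_ppb": 0.28,      # FDOM, ppb of quinine sulfate
    "bbp700_m1": 2.2e-6,   # particulate backscattering at 700 nm, m^-1
}


def relative_resolution(value, resolution):
    """Return the sensor resolution as a percentage of a measured value."""
    if value <= 0:
        raise ValueError("measured value must be strictly positive")
    return 100.0 * resolution / value


if __name__ == "__main__":
    # Hypothetical signal levels used only to illustrate the calculation.
    examples = {
        "chl_mg_m3": 0.0035,   # resolution is twice the signal here
        "fdom_ppb": 1.4,
        "bbp700_m1": 2.2e-4,
    }
    for name, value in examples.items():
        pct = relative_resolution(value, RESOLUTION[name])
        print(f"{name}: one count is {pct:.1f} % of the signal")
```

With a signal equal to half the Chl resolution, one digital count corresponds to 200 % of the value, which is the situation described for the clearest surface waters and for deep low-Chl layers.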
Experimental uncertainties in radiometric profiles may arise from instrument tilt with respect to the vertical (maximum of ±10 %; E. Leymarie, unpublished data) and sensor calibration (2-4 %; Hooker et al., 2002). The shading of the float's antenna and conductivity-temperature-depth (CTD) sensor head is negligible for the E d (λ) sensor, except over a few degrees of the sun's azimuth (direct shading; E. Leymarie, unpublished data). The study by Briggs et al. (2017) on radiometers implemented on the PROVOR CTS4 BGC-Argo floats also evidences the dependency of sensor dark counts on ambient temperature. The uncertainty in factory dark measurements is the lowest near 20 °C (< 0.01 µW cm −2 nm −1 for E d (λ); < 1.4 µmol quanta m −2 s −1 for PAR). The highest errors occur when the radiometer operates near 0 °C, as the uncertainty grows up to about 0.06 µW cm −2 nm −1 for E d (490) and 2.6 µmol quanta m −2 s −1 for PAR. Similarly, higher uncertainties are also observed when radiometric measurements are acquired around 30 °C (∼ 0.03 µW cm −2 nm −1 for E d (412) and E d (490); Briggs et al., 2017) rather than near 20 °C. It is important to note, however, that dark offsets generally affect profiles at depth as the irradiance drops to 0, whilst their impact is less than 1 % for the highest values at the top of the ocean (Organelli et al., 2016a).
In BOPAD-surf, a standard error is associated with each value of the diffuse attenuation coefficient for downward irradiance and PAR (K d (λ) and K d (PAR)), as derived from the linear fit on log quantities within the first optical depth Z pd (see Sect. 2.4). Errors can have an impact of up to 33 % on the measured coefficients, although the median value for the entire database is less than 5 % regardless of the waveband, with the minimum found for K d (380) (i.e., 3.4 %). Because Chl, FDOM, and b bp (700) represent the mean value of the profile within Z pd , the standard deviations are archived in BOPAD-surf. The median value of the coefficient of variation (CV%; calculated as 100 × (SD-to-mean ratio)), for the entire database, is low for all three variables and around 5 % in the case of FDOM and b bp (700). The variability in Chl concentration is close to 0 % as a consequence of the application of the method by Xing et al. (2012), which corrects the NPQ by extrapolating the Chl value at the bottom of the mixed layer to the surface. More importantly, such a low variability in the observed variables suggests that they were homogeneously distributed within the first optical depth as derived from PAR measurements and that Z pd was similar to or shallower than the mixed layer depth.

Data availability

BOPAD-prof (Barbieux et al., 2017a) and BOPAD-surf (Organelli et al., 2016b) are publicly available from the SEANOE (SEA scieNtific Open data Edition) publisher at https://doi.org/10.17882/49388 and https://doi.org/10.17882/47142, respectively. BOPAD-surf version 2 is the one used and described in this study. Float name, cycle and profile numbers, date, latitude, and longitude are reported in both databases. In BOPAD-prof, vertical profiles of Chl before quality control and b bp (700) with removal of positive spikes (see Sect. 2.4) are also included. BOPAD-surf includes standard errors of K d (λ) and K d (PAR) as derived from a linear fit (see Sect. 2.4) and standard deviations of averaged Chl, FDOM, and b bp (700) values within the first optical depth. BGC-Argo raw data used in this study are publicly available online (Argo, 2000) and distributed as netCDF files. Vertical profiles of b bp (532) collected in the frame of the UK Bio-Argo (nine floats) and E-AIMS (five floats) projects can be downloaded at https://doi.org/10.17882/42182 (Argo, 2000). Files included in BOPAD-prof and BOPAD-surf can be read in table format by using standard functions of most common programming languages.
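As an illustration of the uncertainty metrics archived in BOPAD-surf, the sketch below derives a diffuse attenuation coefficient and its standard error from an ordinary least-squares fit of log-transformed irradiance against depth, and computes CV% as 100 × SD/mean. It is a minimal example under stated assumptions, not the processing chain of Sect. 2.4: the function names fit_kd and cv_percent are hypothetical, the input is assumed to be already restricted to the first optical depth, and the use of the sample standard deviation (n − 1) is a choice made here.

```python
import math
import statistics


def fit_kd(depth_m, irradiance):
    """Fit ln(E) = ln(E0) - Kd * z by ordinary least squares.

    depth_m:    depths (m, positive downward), assumed to lie within Zpd
    irradiance: downward irradiance or PAR at those depths (any unit, > 0)
    Returns (Kd in m^-1, standard error of Kd in m^-1).
    """
    n = len(depth_m)
    if n != len(irradiance) or n < 3:
        raise ValueError("need at least three paired depth/irradiance values")

    log_e = [math.log(e) for e in irradiance]
    z_mean = sum(depth_m) / n
    y_mean = sum(log_e) / n
    sxx = sum((z - z_mean) ** 2 for z in depth_m)
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(depth_m, log_e))

    slope = sxy / sxx
    intercept = y_mean - slope * z_mean
    # Residual sum of squares, then the standard error of the slope.
    sse = sum((y - (intercept + slope * z)) ** 2
              for z, y in zip(depth_m, log_e))
    se_slope = math.sqrt(sse / (n - 2) / sxx)

    return -slope, se_slope


def cv_percent(values):
    """Coefficient of variation: 100 * (sample SD / mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Synthetic, noise-free profile with Kd = 0.05 m^-1 (illustrative only).
    depths = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
    e_d = [100.0 * math.exp(-0.05 * z) for z in depths]
    kd, se = fit_kd(depths, e_d)
    print(f"Kd = {kd:.4f} m^-1, standard error = {se:.2e} m^-1")
    print(f"CV% = {cv_percent([0.9, 1.0, 1.1]):.1f}")
```

The relative error on a coefficient can then be expressed as 100 × (standard error / K d ), which is one way to interpret the percentages quoted in the data uncertainty section.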

Conclusions and recommendations for use
The first measurements of biogeochemical and bio-optical variables collected by the PROVOR CTS4 generation of autonomous BGC-Argo floats have been quality-controlled and synthesized into a single database of nearly 10 000 vertical profiles (BOPAD-prof), collected in just 3 years, regardless of meteorological conditions, in several oceanic areas with depths greater than 1000 m. Profile-derived bio-optical variables within the first optical depth have also been condensed into a database dedicated to supporting field and remote bio-optical applications (BOPAD-surf). Spatial and temporal coverages have been presented. Possible uncertainties for each variable have been provided.
The two databases presented here can be directly exploited for several applications, from biogeochemistry and primary production estimation and modeling, to the analysis of the physical forcing on biology together with the assessment of any seasonal and sub-seasonal dependence, and to the evaluation of the ocean's bio-optical variability. For specific examples based on the same PROVOR CTS4 profiling floats included in this study, the reader is referred to the work by Dall'Olmo and Mork (2014) for estimation and analysis of particulate organic carbon concentrations and fluxes; Lacour et al. (2017), Mignot et al. (2017), and Stanev et al. (2017) for observing the impact of physical drivers on biology; and Organelli et al. (2017) and Barbieux et al. (2017b) for analysis of the variability in diffuse attenuation coefficients for downward irradiance and particulate optical backscattering-to-chlorophyll ratios across different oceanic areas, respectively. It is worth noting that the latter two studies have been pursued by exclusively exploiting BOPAD-surf and BOPAD-prof. The new and systematic way BGC-Argo floats collect data, and their potential to dramatically increase oceanic observations in a restricted time, also supplement and complement published carbon cycle and optically relevant pan-oceanic data compilations (Peloquin et al., 2013; Sauzède et al., 2015; Bakker et al., 2016; Mouw et al., 2016; Valente et al., 2016).
BOPAD-surf has also proved to directly support ocean color algorithm and product validation. Online platforms (i.e., http://seasiderendezvous.eu) are already available to support nearly real-time ocean color applications and interactive management of BGC-Argo profiles. We remind readers that, according to the specific use intended for these data, further processing may be needed. Additional corrections, e.g., dark counts and temperature dependence for radiometric or FDOM measurements, might be required at the user's discretion. Additional or regional adjustments to the calibration factor for chlorophyll fluorescence might also be needed. The quality-control procedures applied here remove only major, known sensor issues.
Finally, these two databases are a first step to provide users with an unprecedented quantity of autonomous in situ measurements processed with common internationally accepted procedures. However, due to the characteristics of the Biogeochemical-Argo network (Biogeochemical-Argo Planning Group, 2016) and its youthfulness, both databases are likely to evolve as new regions are explored, improved vertical and temporal frequency is achieved, and more advanced quality-control procedures are developed. Therefore, it is expected that BOPAD-prof and BOPAD-surf could be amended and/or enriched in the future with new quality-controlled profiles and products, or they might be merged with other already-operating configurations of autonomous profiling floats and sensors (e.g., Johnson et al., 2017). The way the two databases have been built makes them potentially fully interoperable with future compilations.

Table A1. Region, basin, abbreviation, and a list of the BGC-Argo floats for the 25 geographic regions included in the Biogeochemical-Argo database. Lifetime and average profile interval for each float are shown. Research project and principal investigator are also reported for each float. Note that the total number of floats is > 105 because some floats moved across two or more basins during their lifetime. An average profile interval of > 10 days indicates a temporary loss of communication between the server and the profiling float, which resulted in periods of inactivity. Dates in the table are given as dd/mm/yy.