A global compilation of in situ data is useful to evaluate the
quality of ocean-colour satellite data records. Here we describe the data
compiled for the validation of the ocean-colour products from the ESA Ocean
Colour Climate Change Initiative (OC-CCI). The data were acquired from
several sources (including, inter alia, MOBY, BOUSSOLE, AERONET-OC, SeaBASS, NOMAD,
MERMAID, AMT, ICES, HOT and GeP&CO) and span the period from 1997 to 2018.
Observations of the following variables were compiled: spectral
remote-sensing reflectances, concentrations of chlorophyll
Currently, there are several sets of in situ bio-optical data worldwide that are suitable for the validation of ocean-colour satellite data. Whereas some are managed by the data producers, others are held in international repositories with contributions from multiple scientists. Many are subject to strict quality control and are built specifically for ocean-colour validation. Using only one of these datasets would limit the amount of data available for validation exercises. It is, therefore, vital to acquire and merge all these datasets into a single unified dataset to maximize the number of matchups available for validation and their distribution in time and space and, consequently, to reduce uncertainties in the validation exercise. However, merging several datasets can be a complicated task. First, all datasets must be acquired and harmonized into a single standard format. Second, during the merging, duplicates between datasets have to be identified and removed. Third, the metadata should be propagated throughout the process and made available in the final merged product. Ideally, the compiled dataset would be made available as a simple text table to facilitate access and manipulation. This work presents such a unification of multiple datasets. It was carried out for the validation of the ocean-colour products from the ESA Ocean Colour Climate Change Initiative (OC-CCI), but with the intent to serve the broader user community as well.
A merged dataset is not without drawbacks: it is likely to be large and therefore not always easy to manipulate; because the merging is done on pre-existing, processed databases, it is not possible to have full control over the whole processing chain; and the dataset is a compilation of observations collected by several investigators using different instruments, sampling methods and protocols, which may have been further modified by the processing routines of the repositories or archives. To minimize these potential drawbacks, we have, for the most part, incorporated only datasets that have emerged from the long-term efforts of the ocean-colour and biological oceanography communities to provide scientists with high-quality in situ data, and we implemented additional quality checks to enhance confidence in the quality of the merged product. Nevertheless, we recognize that data from the diverse sources may be affected by different and unpredictable uncertainties arising from the variety of field and laboratory instruments, methods and data reduction schemes used.
In Sect. 2 the methodologies used to harmonize and integrate all data, as well as a description of individual datasets acquired, are provided. In Sect. 3 the geographic distribution and other characteristics of the final merged dataset are shown. Section 4 provides an overview of the data.
The compiled global set of bio-optical in situ data described in this work
has an emphasis, though not exclusive, on open-ocean data. It comprises the
following variables: remote-sensing reflectance (rrs), chlorophyll
This is the second version of the compilation of global bio-optical in situ data described by Valente et al. (2016). A track-change file of the manuscript of the first version can be found in the Supplement. The new version contains more data and has greater temporal and spatial coverage. The increases in the number of observations are mainly for chla, rrs and aph. Compared with Valente et al. (2016), the number of chla and aph observations has doubled, with better spatial coverage, especially in the Southern and Arctic oceans. The number of rrs observations also increased, but their spatial coverage improved less, because most of the new observations came from fixed locations.
The present second version is a compilation of data from sources used in the first version (MOBY, BOUSSOLE, AERONET-OC, SeaBASS, NOMAD, MERMAID, AMT, ICES, HOT and GeP&CO) plus data from additional sources (AWI, ARCSSPP, BARENTSSEA, BATS, BIOCHEM, BODC, CALCOFI, CCELTER, CIMT, COASTCOLOUR, ESTOC, IMOS, MAREDAT, PALMER, SEADATANET, TPSS and TARA). The main differences from the first version are (1) some of the data sources used in the first version were updated (MOBY, AERONET, SeaBASS and HOT), (2) new data sources were added, (3) a new variable was compiled (total suspended matter), (4) the format of the database was modified and (5) two new flags were added.
Concerning the change in format, in Valente et al. (2016) the compilation was provided as a single two-dimensional table. Given its increased size (136 250 rows and 1286 columns, compared with 80 524 rows and 267 columns previously), the table has now been split into three smaller tables that relate to each other via a unique key identifying each row. One additional table is also provided to help with data manipulation. Despite this change, the compilation should still be viewed conceptually as a single table, and it is described as such here. Two flags were added in the present version: flag_time and flag_chl_method. The first was needed because, for three of the data sources used in this version (ESTOC, MAREDAT and TPSS), the time of sampling (hour of the day) was not available. The time for these observations was set to 12:00:00 UTC, and the observations were flagged with “1” in the column flag_time. The second flag was needed because, for two data sources (ARCSSPP and SEADATANET), it was uncertain whether the compiled chlorophyll concentrations had been measured by fluorometric, spectrophotometric or high-performance liquid chromatography (HPLC) methods. The compiled chlorophyll observations from these two data sources were flagged with “1” in the column flag_chl_method and were marked as chla_fluor.
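To make the row-key structure concrete, the following minimal Python sketch rebuilds the conceptual single table from keyed sub-tables and applies the flag_time convention described above. The key column name `row_id`, the column names `date` and `time`, and the toy values are assumptions for illustration only; the distributed tables define their own names.

```python
# Sketch: rebuild the conceptual single table from keyed sub-tables and
# apply the flag_time convention. "row_id" is a hypothetical key name.

def join_tables(*tables, key="row_id"):
    """Merge rows sharing the same key across tables (lists of dicts)."""
    merged = {}
    for table in tables:
        for row in table:
            # Later tables add their columns to the row with the same key.
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


def apply_default_time(row):
    """Set a missing hour of day to 12:00:00 UTC and raise flag_time."""
    if row.get("time") is None:
        row["time"] = "12:00:00"
        row["flag_time"] = 1
    else:
        row.setdefault("flag_time", 0)
    return row


# Usage with two toy sub-tables (values are illustrative only).
metadata = [
    {"row_id": 1, "date": "2005-06-01", "time": None},
    {"row_id": 2, "date": "2006-07-15", "time": "10:30:00"},
]
chlorophyll = [{"row_id": 1, "chla_hplc": 0.21}]
rows = [apply_default_time(r) for r in join_tables(metadata, chlorophyll)]
```

Rows with flag_time equal to 1 can then be excluded, or treated with wider temporal matchup windows, in validation exercises that are sensitive to the time of sampling.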
Remote-sensing reflectance (rrs) is a primary ocean-colour product defined
as rrs
Chlorophyll
With regard to the inherent optical properties (aph, adg, bbp), if not
already calculated and provided in the contributed datasets, they were
computed from related variables that were available: particle absorption
(ap), detrital absorption (ad), coloured dissolved organic matter (CDOM)
absorption (ag) and total backscattering (bb). The following equations were
used: adg
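The following minimal sketch illustrates these conversions at a single wavelength, assuming the standard additive decomposition of absorption and backscattering (aph = ap - ad, adg = ad + ag, bbp = bb - bbw). The pure-seawater backscattering bbw is an input the caller must supply, and the function name is hypothetical.

```python
def derive_iops(ap=None, ad=None, ag=None, bb=None, bbw=None):
    """Derive aph, adg and bbp (m^-1) from related variables at one wavelength.

    Assumes the standard additive decomposition:
        aph = ap - ad    (particulate minus detrital absorption)
        adg = ad + ag    (detrital plus CDOM absorption)
        bbp = bb - bbw   (total minus pure-seawater backscattering)
    bbw must be supplied by the caller. A result is None when any of its
    inputs is missing.
    """
    aph = ap - ad if ap is not None and ad is not None else None
    adg = ad + ag if ad is not None and ag is not None else None
    bbp = bb - bbw if bb is not None and bbw is not None else None
    return {"aph": aph, "adg": adg, "bbp": bbp}


iops = derive_iops(ap=0.05, ad=0.01, ag=0.02, bb=0.004, bbw=0.0015)
partial = derive_iops(ap=0.05, ad=0.01)  # no ag, bb or bbw available
```

Returning None rather than raising mirrors the text: a variable is derived only when its related inputs are available.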
The merged dataset was compiled from 27 sets of in situ data, which were
obtained individually either from archives that incorporate data from
multiple contributors (SeaBASS, NOMAD, MERMAID, ICES, ARCSSPP, BIOCHEM,
BODC, COASTCOLOUR, MAREDAT, SEADATANET) or from particular contributors,
measurement programmes or projects (MOBY, BOUSSOLE, AERONET-OC, HOT,
GeP&CO, AMT, AWI, BARENTSSEA, BATS, CALCOFI, CCELTER, CIMT, ESTOC, IMOS,
PALMER, TPSS, TARA) and were subsequently homogenized and merged. Data
contributors are listed in Table 2. There were
methodological differences between datasets. Therefore, after acquisition,
and prior to any merging, each set of data was preprocessed for quality
control and converted to a common format. During this process, data were
discarded if they had (1) unrealistic or missing date and geographic
coordinate fields, (2) poor quality (e.g. original flags) or method of
observation that did not meet the criteria for the dataset (e.g. in situ
fluorescence for chlorophyll concentration), and (3) spuriously high or low
data. For the last item, the following limits were imposed: [0.001–100] mg m
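The three discard criteria can be sketched as a single predicate over a record. This is an illustrative Python sketch, not the processing code used for the compilation: the field names (`datetime`, `lat`, `lon`, `chl`, `method`, `bad_flag`) and the method label are hypothetical, and only the chlorophyll range of 0.001 to 100 mg m^-3 is taken from the text.

```python
from datetime import datetime

CHL_LIMITS = (0.001, 100.0)  # mg m^-3, range stated in the text


def passes_preprocessing(rec):
    """Return True if a record survives the basic preprocessing checks.

    rec is a dict with hypothetical keys "datetime", "lat", "lon", "chl",
    "method" and "bad_flag"; real sources use their own field names.
    """
    when, lat, lon = rec.get("datetime"), rec.get("lat"), rec.get("lon")
    # (1) missing or unrealistic date and geographic coordinates
    if not isinstance(when, datetime) or lat is None or lon is None:
        return False
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False
    # (2) poor quality in the original flags, or an excluded method
    if rec.get("bad_flag") or rec.get("method") == "in_situ_fluorescence":
        return False
    # (3) spuriously high or low chlorophyll values
    chl = rec.get("chl")
    if chl is not None and not CHL_LIMITS[0] <= chl <= CHL_LIMITS[1]:
        return False
    return True


t = datetime(2005, 6, 1, 10, 0)
records = [
    {"datetime": t, "lat": 43.37, "lon": 7.9, "chl": 0.3},
    {"datetime": None, "lat": 43.37, "lon": 7.9, "chl": 0.3},
    {"datetime": t, "lat": 95.0, "lon": 7.9, "chl": 0.3},
    {"datetime": t, "lat": 43.37, "lon": 7.9, "chl": 250.0},
    {"datetime": t, "lat": 43.37, "lon": 7.9, "chl": 0.3,
     "method": "in_situ_fluorescence"},
]
kept = [r for r in records if passes_preprocessing(r)]
```

Only the first toy record survives; each of the others fails exactly one of the three criteria.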
The standard variables, nomenclatures and units in the final table.
Original sets of data and data contributors in the final table.
Once a set of data was homogenized, its data were integrated into a single table. This final merging focused on the removal of duplicates between the sets of data. Although some duplicates are known (e.g. MOBY, BOUSSOLE, AERONET-OC and NOMAD data are found in SeaBASS and MERMAID), others are unknown (e.g. how many GeP&CO, ICES, AMT and HOT data are within NOMAD, SeaBASS and MERMAID). Therefore, duplicates were identified using the metadata (dataset and subdataset) when possible and, as an additional precaution, using temporal–spatial matches. For temporal–spatial matches, several thresholds were used, but typically 5 min and 200 m were sufficient to identify most duplicated data, reflecting the small differences in time, latitude and longitude between the different sets of data. Larger thresholds were used in some cases as a precaution. This was the case when searching for NOMAD data in other datasets, because NOMAD includes a few cases where radiometric and pigment data were merged with large spatial–temporal thresholds (Werdell and Bailey, 2005). A large temporal threshold was also used when integrating observations from the three data sources without time information (ESTOC, MAREDAT and TPSS). For all data, when duplicates were found, data from the NOMAD dataset were selected first, followed by data from individual projects or contributors (MOBY, BOUSSOLE, AERONET-OC, AMT, HOT, GeP&CO, AWI, BARENTSSEA, BATS, CALCOFI, CCELTER, CIMT, ESTOC, IMOS, PALMER, TPSS and TARA) and finally by data from the remaining datasets (SeaBASS, MERMAID, ICES, ARCSSPP, BIOCHEM, BODC, COASTCOLOUR, MAREDAT and SEADATANET). This procedure was chosen to preserve the NOMAD dataset as a whole, since it is widely used in ocean-colour validation. Note that, with this procedure, data from individual projects or contributors may be listed under NOMAD (e.g. some PALMER data are found in NOMAD with metadata string nomad_palmer_lter).
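The temporal–spatial part of the duplicate search, combined with the source priority order, can be sketched as follows. This minimal Python sketch assumes great-circle (haversine) distances on a spherical Earth and a greedy pass over records sorted by priority. It ignores the metadata-based matching and the per-source threshold choices described above, and the record fields (`source`, `time`, `lat`, `lon`) are hypothetical.

```python
import math
from datetime import datetime, timedelta

# Priority groups from the text: 0 = NOMAD, 1 = individual projects or
# contributors, 2 = remaining archives. Lower rank wins.
GROUP_1 = ["MOBY", "BOUSSOLE", "AERONET-OC", "AMT", "HOT", "GeP&CO", "AWI",
           "BARENTSSEA", "BATS", "CALCOFI", "CCELTER", "CIMT", "ESTOC",
           "IMOS", "PALMER", "TPSS", "TARA"]
GROUP_2 = ["SeaBASS", "MERMAID", "ICES", "ARCSSPP", "BIOCHEM", "BODC",
           "COASTCOLOUR", "MAREDAT", "SEADATANET"]
PRIORITY = {"NOMAD": 0, **{s: 1 for s in GROUP_1}, **{s: 2 for s in GROUP_2}}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def is_duplicate(a, b, max_dt=timedelta(minutes=5), max_dist_m=200.0):
    """True if two records fall within both the time and distance thresholds."""
    return (abs(a["time"] - b["time"]) <= max_dt
            and haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_dist_m)


def deduplicate(records, **thresholds):
    """Keep one record per duplicate group, preferring higher-priority sources."""
    # Stable sort: within a priority group, the input order is preserved.
    ordered = sorted(records, key=lambda r: PRIORITY.get(r["source"], 99))
    kept = []
    for rec in ordered:
        if not any(is_duplicate(rec, k, **thresholds) for k in kept):
            kept.append(rec)
    return kept


t0 = datetime(2003, 5, 10, 21, 0)
records = [
    {"source": "SeaBASS", "time": t0 + timedelta(minutes=2), "lat": 20.8005, "lon": -157.2},
    {"source": "NOMAD", "time": t0, "lat": 20.8, "lon": -157.2},
    {"source": "HOT", "time": t0 + timedelta(hours=3), "lat": 22.75, "lon": -158.0},
]
kept = deduplicate(records)  # the SeaBASS copy of the NOMAD station is dropped
```

Larger thresholds, such as those used when searching for NOMAD data, can be passed as keyword arguments, e.g. `deduplicate(records, max_dt=timedelta(hours=4), max_dist_m=300000.0)`. The pairwise comparison is quadratic in the number of records, which is acceptable for a sketch but would need temporal or spatial indexing for a table of this size.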
After NOMAD, priority was generally given to data from individual projects or contributors. However, because of an incremental approach, in which only new data are added to previous versions of the compilation, some data from individual projects or contributors added at later stages (BATS, CALCOFI, CIMT, PALMER and TPSS) may be found under other data sources. This occurs mainly for BATS and CALCOFI, whose earlier chlorophyll data are in SeaBASS with metadata strings seabass_bats* and seabass_cal*, and for CIMT, some of whose data are under COASTCOLOUR. Once all data from a given source were free of duplicates, they were merged consecutively, variable by variable, into the final table. During this process, we also searched for rows (stations) separated from each other by less than 5 min in time and less than 200 m in horizontal distance; when such rows were found, their observations were merged into a single row. The compiled merged data were compared with the original sets to verify that no errors occurred during the merging. As a final step, a water column (station) depth was recorded for each observation, taken as the closest water column depth from the ETOPO1 global relief model (National Geophysical Data Center ETOPO1; Amante and Eakins, 2009). For observations where the closest grid point was above sea level (e.g. data collected very near the coast), the depth was set to zero.
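The station-depth rule can be sketched as a nearest-node lookup on a regular elevation grid. The sketch assumes the grid has already been loaded into a nested list of elevations (ETOPO1 uses a 1 arc-minute spacing, with negative values below sea level); the function name and the toy 3 by 3 grid are hypothetical.

```python
def station_depth(lat, lon, elev, lat0, lon0, step):
    """Depth (m, positive downward) at the grid node nearest to (lat, lon).

    elev[i][j] is the elevation in metres (negative below sea level) at
    latitude lat0 + i*step and longitude lon0 + j*step. Nodes above sea
    level yield zero, as described in the text. Longitude wrap-around at
    the antimeridian is not handled in this sketch.
    """
    i = round((lat - lat0) / step)
    j = round((lon - lon0) / step)
    # Clamp to the grid so points just outside the subset still resolve.
    i = min(max(i, 0), len(elev) - 1)
    j = min(max(j, 0), len(elev[0]) - 1)
    return max(0.0, -float(elev[i][j]))


STEP = 1.0 / 60.0  # 1 arc-minute, the ETOPO1 grid spacing
toy_elev = [
    [-2400, -2380, -2350],
    [-2410, -2400, -2390],
    [-50, 12, 30],
]
offshore = station_depth(43.3 + STEP, 7.85 + STEP, toy_elev, 43.3, 7.85, STEP)
coastal = station_depth(43.3 + 2 * STEP, 7.85 + STEP, toy_elev, 43.3, 7.85, STEP)
```

The coastal point falls on a node 12 m above sea level, so its recorded depth is zero.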
Data processing thus included two major steps: preprocessing and merging. The first step was applied to each contributing dataset individually and aimed to identify problems and convert the data of interest to a standard format. The second step dealt with the integration of data into a single file, including the elimination of duplicated data between the individual sets of data. The next subsections provide a brief overview of each original set of data.
MOBY is a fixed mooring system operated by the National Oceanic and Atmospheric Administration (NOAA) that has provided a continuous time series of water-leaving radiance and surface irradiance in the visible region of the spectrum since 1997. The site is located a few kilometres west of the Hawaiian island of Lanai, where the water depth is about 1200 m. Since its deployment, MOBY measurements have been the primary basis for the on-orbit vicarious calibrations of the SeaWiFS and MODIS ocean-colour sensors. A full description of the MOBY system and processing is provided in Clark et al. (2003). Data are freely available for scientific use at the MOBY Gold directory. The products of interest are the Scientific Time Series files, which contain MOBY data averaged over sensor-specific wavelengths and particular hours of the day (around 20:00–23:00 UTC). For this work, the satellite band-average products for SeaWiFS, MODIS AQUA, MERIS, VIIRS and the Ocean and Land Colour Instrument (OLCI) were compiled from the R2017 Reprocessing. The in-band-average subproduct was used and, to maintain the highest quality, only data determined from the upper two arms (Lw1) and flagged as good quality were acquired. Data from the MOBY203 deployment were discarded due to the absence of surface irradiance data. The compiled variable was the remote-sensing reflectance, rrs, which was computed from the original water-leaving radiance (Lw) and surface irradiance (Es). The water-leaving radiances were corrected for the bidirectional nature of the light field (Morel and Gentili, 1996; Morel et al., 2002) using the same lookup table and method as those used in the SeaWiFS Data Analysis System (SeaDAS) processing code.
The MOBY data were reprocessed in 2017 (MOBY R2017 Reprocessing) to include various improvements in the calibration of the instrument and post-processing, which include (1) a new method to extrapolate the upwelling radiance attenuation coefficient to the surface, (2) an increase in arm depth by 0.234 m and (3) a single pixel shift in the data for the red spectrograph collected at a bin factor of 384. Only the last two changes were included in the present compilation. As mentioned before, the MOBY data compiled in this work are sensor-specific. Therefore, care is needed to use the correct MOBY data when validating a particular sensor. MOBY data are stored in the final merged table at their original wavelengths; however, these wavelengths can differ from what is sometimes expected to be the central wavelength of a given band and sensor. Irrespective of the wavelength at which MOBY data are stored in the final table, for validation of bands 1–6 of SeaWiFS, MOBY data stored in the final merged table at 412, 443, 490, 510, 555 and 670 nm, respectively, should be used. For validation of bands 1–6 of MODIS AQUA, MOBY data stored in the final merged table at 416, 442, 489, 530, 547 and 665 nm, respectively, should be used. For validation of bands 1–7 of MERIS, MOBY data stored in the final merged table at 410.5, 440.4, 487.8, 507.7, 557.6, 617.5 and 662.4 nm, respectively, are the appropriate data. For validation of bands 2–8 of OLCI, MOBY data stored in the final merged table at 412.0676, 443.1898, 490.7176, 510.6403, 560.5796, 620.626 and 665.3737 nm, respectively, are the appropriate data. Finally, for validation of bands 1–5 of VIIRS, MOBY data stored in the final merged table at 412.9, 444.5, 481.2, 556.3 and 674.6 nm, respectively, are the appropriate data.
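Because the stored wavelengths differ from nominal band centres, a lookup table helps avoid selecting the wrong MOBY column. The mapping below transcribes the band-to-wavelength correspondence given above; the sensor keys and the `rrs_<wavelength>` column-naming pattern in the hypothetical helper `moby_column` are assumptions and should be checked against the actual column headers of the table.

```python
# Wavelengths (nm) at which sensor-specific MOBY rrs are stored in the
# final table, keyed by sensor band number as listed in the text.
MOBY_STORED_WAVELENGTHS = {
    "SeaWiFS": dict(zip(range(1, 7), [412, 443, 490, 510, 555, 670])),
    "MODIS-AQUA": dict(zip(range(1, 7), [416, 442, 489, 530, 547, 665])),
    "MERIS": dict(zip(range(1, 8),
                      [410.5, 440.4, 487.8, 507.7, 557.6, 617.5, 662.4])),
    "OLCI": dict(zip(range(2, 9),
                     [412.0676, 443.1898, 490.7176, 510.6403, 560.5796,
                      620.626, 665.3737])),
    "VIIRS": dict(zip(range(1, 6), [412.9, 444.5, 481.2, 556.3, 674.6])),
}


def moby_column(sensor, band, prefix="rrs"):
    """Hypothetical column name holding MOBY rrs for a given sensor band."""
    return f"{prefix}_{MOBY_STORED_WAVELENGTHS[sensor][band]}"
```

For example, `moby_column("SeaWiFS", 6)` gives `"rrs_670"`, and the OLCI entries start at band 2, matching the band ranges stated in the text.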
The BOUSSOLE project started in 2001 with the objective of establishing a time
series of bio-optical properties in oceanic waters to support the
calibration and validation of ocean-colour satellite sensors (Antoine et
al., 2006). The project consists of a monthly cruise programme and a permanent
optical mooring (Antoine et al., 2008). The mooring collects radiometry and
inherent optical properties (IOPs) in continuous mode every 15 min at two depths (4 and 9 m nominally). The monthly cruises are devoted to the mooring
servicing, to the collection of vertical profiles of radiometry and IOPs,
and to water sampling at 11 depths from the surface down to 200 m, for
subsequent analyses including phytoplankton pigments, particulate
absorption, CDOM absorption and suspended particulate matter load. The
BOUSSOLE mooring is in the western Mediterranean Sea at a water depth of
2400 m. All pigment (2001–2012) and radiometric (2003–2012) data were
provided by the principal investigator. The compiled variables were rrs
and chla_hplc. Observations of the diffuse attenuation
coefficient (kd) were not included in the present compilation, as they
were under internal quality revision at the time of data acquisition.
Remote-sensing reflectance was computed from the original
fully normalized water-leaving radiance (nLw_ex),
which is the normalized water-leaving radiance (nLw, previously
described), with a correction for the bidirectional nature of the light
field (Morel and Gentili, 1996; Morel et al., 2002). The solar irradiance
(Fo) was computed from two available variables in the original set of
data: the normalized water-leaving radiance (nLw) and the remote-sensing
reflectance (rrs), using the equation Fo
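Assuming the usual definition rrs = nLw/Fo, the two-step BOUSSOLE computation (recover Fo per band from nLw and rrs, then divide nLw_ex by Fo) can be sketched as below. The function name and the example values are hypothetical.

```python
def boussole_rrs(nlw, rrs_orig, nlw_ex):
    """Per-band rrs (sr^-1) from the fully normalized radiance nLw_ex.

    Assumes rrs = nLw / Fo, so Fo is first recovered per band as
    nLw / rrs from the original variables, then rrs = nLw_ex / Fo.
    Inputs are equal-length sequences over the same bands.
    """
    out = []
    for lw, r, lw_ex in zip(nlw, rrs_orig, nlw_ex):
        fo = lw / r  # mean extraterrestrial solar irradiance (units of nLw * sr)
        out.append(lw_ex / fo)
    return out


rrs_ex = boussole_rrs([1.0, 0.5], [0.005, 0.0028], [1.02, 0.49])
```

Algebraically the result equals rrs × nLw_ex/nLw, so Fo never needs to be stored; computing it explicitly simply mirrors the two steps described in the text.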
AERONET-OC is a component of AERONET, including sites where sun photometers
operate with a modified measurement protocol leading to the determination of
the fully normalized water-leaving radiance (Zibordi et al., 2006, 2009). As a result of collaboration between the Joint Research
Centre (JRC) and NASA, this component has been specifically developed for
the validation of ocean-colour radiometric products. The strength of
AERONET-OC is “the production of standardized measurements that are
performed at different sites with identical measuring systems and protocols,
calibrated using a single reference source and method, and processed with
the same codes” (Zibordi et al., 2006, 2009). All high-quality data (Level-2) were acquired from the project website for 11
sites: Abu_Al_Bukhoosh (
Relative to the previous compilation, the AERONET-OC data from the Lucinda site now include a calibration correction applied by NASA to instrument SN-520. All radiometric data from this instrument provided by NASA before October 2018 were underestimated by approximately a factor of 2 because instrument gains were incorrectly applied during processing.
SeaBASS is one of the largest archives of in situ marine bio-optical data
(Werdell et al., 2003). It is maintained by NASA's Ocean Biology
Processing Group (OBPG) and includes measurements of optical properties,
phytoplankton pigment concentrations, and other related oceanographic and
atmospheric data. The SeaBASS database consists of in situ data from
multiple contributors, collected using a variety of measurement instruments
with consistent, community-vetted protocols from several marine platforms
such as fixed buoys, handheld radiometers and profiling instruments.
Quality control of the received data includes a rigorous series of protocols
that range from file format verification to inspection of the geophysical
data values (Werdell et al., 2003). Radiometric data were acquired
through the Validation search tool, which provided in situ data with
matchups for particular ocean-colour sensors (Bailey and Werdell, 2006). The
search query was set to apply minimal flag conditions to the satellite data,
to retrieve a larger number of matchups and, therefore, of in situ data.
Regarding phytoplankton pigment data, the majority were acquired through the
Pigment search tool, which provided pigment data directly from the archives.
As stated on the SeaBASS
website, the Pigment search tool was originally designed to return only
in vitro fluorometric measurements, which is consistent with our approach,
but over time chlorophyll
NOMAD is a publicly available dataset compiled by the NASA OBPG at the Goddard Space Flight Center. It is a high-quality global dataset of coincident radiometric and phytoplankton pigment observations for use in ocean-colour algorithm development and satellite-data product-validation activities (Werdell and Bailey, 2005). The source of the bio-optical data is the SeaBASS archive; therefore, many dependencies exist between these two datasets, which were addressed during the merging. The current version (version 2.0 ALPHA, 2008) includes data from 1991 to 2007 and an additional set of observations of inherent optical properties. The current version was used in this work, but with an additional set of columns of remote-sensing reflectance corrected for bidirectional effects (Morel and Gentili, 1996; Morel et al., 2002). This additional set of columns was provided directly by the NOMAD creators. The compiled variables were rrs, chla_hplc, chla_fluor, aph, adg, bbp and kd. Conversion was necessary only for aph, adg and bbp and followed the procedures described in Sect. 2.1. For the calculation of bbp, a smoothing fit was first applied to the variable bb to remove noise. A portion of the NOMAD data were optically weighted (for methods see Werdell and Bailey, 2005). These data are not consistent with the protocols chosen in this work, but these observations were retained since NOMAD is a widely used dataset in ocean-colour validation.
MERMAID provides in situ bio-optical data matched with concurrent and
comparable MERIS Level 2 satellite ocean-colour products (Barker, 2013a, b). The MERMAID in situ database consists of data from multiple
contributors, measured using a variety of instruments and protocols from
several marine platforms such as fixed buoys, handheld radiometers and
profiling instruments. Comprehensive quality control and protocols are used
by MERMAID to integrate all the data into a common and comparable format
(Barker, 2013a, b). Access to MERMAID data is limited to the
MERIS Validation Team, the MERIS Quality Working Group and to the in situ
data contributors. For this work, access has been granted to the MERMAID
database through a signed service level agreement. The MERMAID data
include subsets of several datasets used in this compilation (MOBY,
AERONET-OC, BOUSSOLE, NOMAD). These observations were removed from the
MERMAID dataset to avoid duplication (as discussed in Sect. 2.1). The
compiled variables were rrs, chla_hplc,
chla_fluor, aph, adg, bbp, kd and
tsm. Remote-sensing reflectance was calculated by dividing the
original fully normalized water-leaving reflectance (Rw_ex), which is the water-leaving reflectance (
The HOT programme has provided repeated comprehensive observations of the
hydrography, chemistry and biology of the water column at a station located
100 km north of Oahu, Hawaii, since October 1988 (Karl and Michaels, 1996).
This site is representative of the North Pacific subtropical gyre. Cruises
are made approximately once a month to the deep-water station ALOHA (A
Long-Term Oligotrophic Habitat Assessment; 22
GeP&CO is part of the French PROOF programme and aims to describe and
understand the variability of phytoplankton populations, as well as to assess their
consequences for the geochemistry of the oceans (Dandonneau and Niang, 2007).
It is based on the quarterly voyages of the merchant ship Contship London
from France to New Caledonia in the Pacific. A scientific observer sailed on
each trip and carried out surface-water sampling, filtration, various
measurements and checks several times each day. The experiment
started in October 1999 and finished in July 2002. Pigment data were
extracted from the project website. Additional pigment data obtained during
the OISO-4 cruise in the southern Indian Ocean on board R/V
AMT is a multidisciplinary programme, which undertakes biological, chemical
and physical oceanographic research during an annual voyage between the UK
and destinations in the South Atlantic (Robinson et al., 2006). The
programme was established in 1995 and since then has completed 28 research
cruises. Pigment data between 1997 (AMT5) and 2005 (AMT17) were provided by
the British Oceanographic Data Centre (BODC) following a specific request
for discrete observations of chlorophyll
ICES is a network of more than 4000 scientists from almost 300 institutes, with 1600 scientists participating in activities annually. The ICES Data Centre manages a number of large dataset collections related to the marine environment covering the northeastern Atlantic, Baltic Sea, Greenland Sea and Norwegian Sea. The majority of data originate from national institutes that are part of the ICES network of member countries. Data were provided (on 28 April 2014) from the ICES database on the marine environment (Copenhagen, Denmark) following a specific request. The ICES data were made available under the ICES data policy, and if there is any conflict between this and the policy adopted by the users, then the ICES policy applies. The compiled variables were chla_hplc and chla_fluor.
The ARCSSPP database is a synthesis of observations from 1954 to 2006 in the Arctic Ocean and northern seas (Matrai et al., 2013). The observations
were acquired from data repositories and publications or were provided by individual
investigators. The database includes quality-controlled observations of
productivity and chlorophyll
In this work, the AWI data source refers to the group of observations that
were provided to the OC-CCI project by Astrid Bracher. These are bio-optical
observations collected during several cruises in the Atlantic and Pacific
oceans. All data were available through the PANGAEA repository. Observations
of concentration of chlorophyll
BATS is a long-term study by the Bermuda Institute of Ocean Sciences based
on regular cruises in the western Atlantic Ocean (Sargasso Sea) since 1988.
The cruises at the BATS site (
The BARENTSSEA data source refers to a group of observations that were
provided to the OC-CCI project by Knut Yngve Børsheim. This collection was
developed using data from the archives of the Institute of Marine Research
(Norway). It comprises observations of temperature, salinity and
chlorophyll
BioChem is an archive of marine biological and chemical data maintained by
Fisheries and Oceans Canada (DFO, 2018; Devine et al., 2014). The available
observations are from department research initiatives and collected in areas
of Canadian interest. Available parameters include pH, nutrients,
chlorophyll, dissolved oxygen and other plankton data (species and biomass).
Chlorophyll measurements from in vitro fluorometric methods were extracted
(from
BODC is the designated marine science data centre for the United Kingdom.
The data used in this work derive from a specific request for discrete
observations of chlorophyll
CalCOFI is a partnership of the California Department of Fish and Wildlife,
National Oceanic and Atmospheric Administration Fisheries Service, and
Scripps Institution of Oceanography. CalCOFI has conducted quarterly cruises
off southern and central California since 1949. Data collected in the upper
500 m include temperature, salinity, oxygen, nutrients, chlorophyll,
primary productivity, plankton biodiversity and biomass. For this work,
only observations of chlorophyll
CCELTER investigates the California Current coastal pelagic ecosystem, with
a focus on long-term forcing. The CCELTER data include primary and derived
measurements from both Process and CalCOFI-augmented cruises, as well as other time series. CCELTER data include variables from the physical environment,
biogeochemistry and biological populations/communities. For this work
chlorophyll observations measured from discrete bottle samples from CCELTER
Process cruises determined by extraction and bench fluorometry
(
CIMT was a non-operational programme in which marine scientists from different
disciplines and institutions combined their efforts on observations directed
towards understanding the central California upwelling system. The CIMT
archived data include coastal ocean observations from satellites, shipboard
data, moorings and large marine animal movements. For this work, pigment
data from discrete bottle samples taken during CIMT monthly cruises were
used. Data were acquired from the project website
(
COASTCOLOUR datasets were designed to evaluate the performance of ocean-colour satellite algorithms in the retrieval of water quality parameters in
coastal waters (Nechad et al., 2015a). Three types of COASTCOLOUR datasets
are available: (1) a matchup dataset where in situ bio-optical observations
are available simultaneously with a cloud-free MERIS product, (2) an in situ
reflectance dataset where an in situ reflectance is available simultaneously
with an in situ measurement of chlorophyll
ESTOC is an open-ocean monitoring site located in the eastern North Atlantic
subtropical gyre. ESTOC was initiated in 1991 with particle flux
measurements and in 1994 began standard observations of the water column,
in addition to the deployment of a current meter mooring. The core
parameters measured at ESTOC include salinity, temperature, current speed,
nutrients, chlorophyll, inorganic carbon, particulate organic carbon and
nitrogen, and sinking particle flux (Neuer et al., 2007). For this work
measurements of chlorophyll
IMOS is a national collaborative research infrastructure supported by the
Australian Government. Since 2006, IMOS has operated a wide range of observing
equipment throughout the coastal and open ocean around Australia, making all
data openly available to the scientific community and other stakeholders
and users. In this work, the IMOS dataset refers only to a data collection
entitled IMOS National Reference Station (NRS) – Phytoplankton HPLC Pigment
Composition Analysis, which was acquired from the Australian Ocean Data
Network Portal (
Relative spectral frequency of remote-sensing reflectance in the final table, using 10 nm wide class intervals, defined as the ratio of the number of observations at a particular waveband to the total number of observations at all wavebands, expressed as a percentage. Data at a total of 611 unique wavelengths, between 404.7 and 1022.1 nm, were compiled.
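The statistic in the caption can be sketched as follows: observation counts are summed into 10 nm classes, and each class total is expressed as a percentage of all observations. Anchoring the classes at multiples of 10 nm (e.g. 410–420 nm) is an assumption of this sketch; the caption does not state the class boundaries, and the function name is hypothetical.

```python
from collections import Counter


def relative_spectral_frequency(wavelengths, counts, width=10.0):
    """Percentage of rrs observations in each class of 'width' nm.

    wavelengths[i] is a waveband (nm) and counts[i] the number of
    observations at that waveband. Classes are assumed to start at
    multiples of 'width' (e.g. 410-420 nm).
    """
    per_class = Counter()
    for wl, n in zip(wavelengths, counts):
        per_class[(wl // width) * width] += n
    total = sum(per_class.values())
    return {lower: 100.0 * n / total for lower, n in sorted(per_class.items())}


freq = relative_spectral_frequency([412, 413, 443, 490], [30, 10, 40, 20])
```

Here 412 and 413 nm fall into the same 410 nm class, which therefore holds 40 % of the toy observations.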
The distribution of
The MAREDAT database is a global assemblage of pigments measured by HPLC
(Peloquin et al., 2013a) from the combination of 136 independent field datasets,
solicited from investigators and databases. The database provides high-quality measurements of taxonomic pigments including chlorophyll
Temporal distribution of chlorophyll
Ranges of remote-sensing reflectance band ratios (412 : 443 and 490 : 555) for all data. The points from the NOMAD dataset are shown in blue for reference. To maximize the number of ratios per dataset, a search window of up to 12 nm was used when the four wavelengths (412, 443, 490 and 555 nm) were not simultaneously available. The effect of different search windows on the ratio distribution was negligible.
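The nearest-wavelength search behind these band ratios can be sketched as below: for each nominal wavelength, the closest available waveband within ±12 nm is used, and a ratio is reported only when both of its bands are found. Tie-breaking, the handling of missing bands and the function names are assumptions of this sketch.

```python
def nearest_band(spectrum, target, window=12.0):
    """rrs at the wavelength closest to target within +/- window nm, else None.

    spectrum maps wavelength (nm) to rrs (sr^-1).
    """
    candidates = [wl for wl in spectrum if abs(wl - target) <= window]
    if not candidates:
        return None
    return spectrum[min(candidates, key=lambda wl: abs(wl - target))]


def safe_ratio(num, den):
    """num / den, or None when either value is missing or den is zero."""
    if num is None or den is None or den == 0:
        return None
    return num / den


def band_ratios(spectrum, window=12.0):
    """Return the (412:443, 490:555) rrs band ratios for one spectrum."""
    r = {t: nearest_band(spectrum, t, window) for t in (412, 443, 490, 555)}
    return safe_ratio(r[412], r[443]), safe_ratio(r[490], r[555])


meris_like = {412.5: 0.008, 442.5: 0.007, 490.0: 0.006, 560.0: 0.002}
blue_green = band_ratios(meris_like)  # 560 nm stands in for 555 nm
no_green = band_ratios({412.0: 0.01, 443.0: 0.01, 490.0: 0.01})
```

In the second spectrum no waveband lies within 12 nm of 555 nm, so only the 412 : 443 ratio is returned.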
Global distribution of remote-sensing reflectance per dataset in the final table. The data sources are identified with different colours. Points show locations where at least one observation is available. Crosses show sites from which time series data of remote-sensing reflectance are available.
PALMER is a monitoring station located on the western Antarctic Peninsula. The
Palmer station investigates the marine ecology of the Southern Ocean with
a focus on the pelagic marine ecosystem, including sea ice habitats, regional
oceanography and nesting sites of seabird predators. The PALMER data include
measurements of meteorological, oceanographic, sea ice, predators, nutrients
and biogeochemistry, pigments, primary production, zooplankton and microbe
parameters. This work used the measurements of chlorophyll analysed by HPLC
and fluorometry taken at the Palmer station
(
Comparison of coincident observations of chlorophyll
Number of observations per chlorophyll
SeaDataNet is a Pan-European infrastructure for ocean and marine data
management. It aims to develop a standardized system for managing large and
diverse datasets collected during oceanographic cruises and by automatic
observation systems. For this work, discrete chlorophyll
Global distribution of chlorophyll
Global distribution of chlorophyll
In this work, the TPSS data source refers to a group of observations provided to this compilation by Trevor Platt and Shubha Sathyendranath. It is a collection of bio-optical in situ data collected during cruises predominantly in the northwestern Atlantic but also in the Indian Ocean, South Pacific and central Atlantic (see Sathyendranath et al., 2009, for additional details regarding the cruises). It comprises measurements of phytoplankton pigments and algal pigment absorption coefficients. Because the time of day was unavailable, it was set to 12:00:00 UTC, and these observations were flagged with “1” in the flag_time column. The compiled variables were chla_hplc, chla_fluor and aph.
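The handling of missing times described above can be expressed as a small Python sketch. The function name `timestamp_with_flag` is hypothetical. Only the 12:00:00 UTC default and the value “1” in flag_time come from the text. The value 0 for observations with a recorded time and the assumption that recorded times are in UTC are illustrative choices.

```python
from datetime import date, datetime, time, timezone
from typing import Optional, Tuple

NOON_UTC = time(12, 0, 0, tzinfo=timezone.utc)


def timestamp_with_flag(obs_date: date,
                        obs_time: Optional[time] = None) -> Tuple[datetime, int]:
    """Return (timestamp, flag_time) for one observation.

    A missing time of day is replaced by 12:00:00 UTC and flagged with 1.
    Otherwise the recorded time is kept (assumed to be UTC if no timezone
    is attached) and flag_time is 0 (assumed value for unflagged rows).
    """
    if obs_time is None:
        return datetime.combine(obs_date, NOON_UTC), 1
    if obs_time.tzinfo is None:
        obs_time = obs_time.replace(tzinfo=timezone.utc)
    return datetime.combine(obs_date, obs_time), 0
```

Keeping the flag alongside the filled-in time lets users exclude such observations from matchup analyses that depend on a tight time window.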
The chlorophyll
The Tara expeditions consist of several cruises around the world, some with
durations of several years, designed to study and understand the
distribution of planktonic organisms in the world ocean. The discrete
observations of remote-sensing reflectance and chlorophyll
In this work several sets of bio-optical in situ data were acquired,
homogenized and merged into a single table. The table comprises in situ
observations between 1997 and 2018, with a global distribution, and includes
the following variables: remote-sensing reflectance (rrs), chlorophyll
Observations of remote-sensing reflectance are available at 611 unique
wavelengths (i.e. columns), between 404.7 and 1022.1 nm (Fig. 1). In
total there are 59 781 observations (i.e. rows) with remote-sensing
reflectance in the table. These observations are partitioned among the
contributing datasets as follows: AERONET-OC (31 574), BOUSSOLE
(17 364), MOBY (5466), NOMAD (3326), MERMAID (885), SeaBASS (698), AWI
(54), COASTCOLOUR (307) and TARA (107). Data from AERONET-OC, BOUSSOLE and
MOBY correspond to continuous time series, hence their higher numbers of
observations. Data distribution at 44
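As a quick consistency check, the per-dataset counts can be summed and converted into shares. The following Python sketch (with a hypothetical helper `dataset_shares`) confirms that they add up to the 59 781 rows reported and shows that the three continuous time series (AERONET-OC, BOUSSOLE and MOBY) account for about 91 % of them. The same check applies to the aph, adg, bbp and kd counts given in the text.

```python
def dataset_shares(counts):
    """Total count and percentage contributed by each dataset (1 decimal)."""
    total = sum(counts.values())
    return total, {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Rrs observations (rows) per contributing dataset, as listed in the text.
RRS_COUNTS = {
    "AERONET-OC": 31574, "BOUSSOLE": 17364, "MOBY": 5466, "NOMAD": 3326,
    "MERMAID": 885, "SeaBASS": 698, "AWI": 54, "COASTCOLOUR": 307, "TARA": 107,
}

total, shares = dataset_shares(RRS_COUNTS)
# Combined share of the three continuous time series.
time_series_share = 100.0 * sum(
    RRS_COUNTS[k] for k in ("AERONET-OC", "BOUSSOLE", "MOBY")) / total
print(total, shares["AERONET-OC"], round(time_series_share, 1))
# prints: 59781 52.8 91.0
```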
A remote-sensing reflectance maximum band ratio (as defined in
text) ([443,490,510]
The distribution of
For chlorophyll
Coincident observations of chlorophyll
The distribution of absorption coefficients band ratios:
adg(443)
Summary of median values for aph, adg and bbp at 44
Global distribution of observations of inherent optical properties (algal pigment absorption coefficient aph, detrital plus CDOM absorption coefficient adg, and particle backscattering coefficient bbp) in the final table.
Global distribution of diffuse attenuation coefficient for
downward irradiance (kd) and total suspended matter (tsm) per dataset in the final table. The tsm and kd points from MERMAID overlap
each other in the western Black Sea (
Examples of bio-optical relationships in the final merged table:
The inherent optical properties (aph, adg and bbp) are available
at 550 unique wavelengths between 300 and 850 nm. There are a total of 3293,
1654 and 792 observations for aph, adg and bbp, respectively.
For aph, the observations are distributed among NOMAD
(1190), TPSS (966), COASTCOLOUR (593), AWI (458), SeaBASS (14) and MERMAID
(72). For adg, the contributions are as follows: NOMAD (1079),
COASTCOLOUR (531), SeaBASS (11) and MERMAID (33). The bbp observations
come from NOMAD (371), COASTCOLOUR (154), SeaBASS (32) and MERMAID (235).
The data distribution of aph, adg and bbp at 44
Finally, for the diffuse attenuation coefficient for downward irradiance
(kd) there are 25 unique wavelengths between 405 and 709 nm, with a
total of 2454 observations from NOMAD (2266), SeaBASS (118) and MERMAID
(70). Data distribution of kd at 44
Although most of the stations with concurrent variables are from the NOMAD
dataset, for completeness, an examination of bio-optical relationships is
provided (Fig. 16). The relation between aph at 443 nm and chlorophyll
Information about the data availability can be found in Appendix B.
In this work, a compilation of bio-optical in situ data is presented,
resulting from the acquisition, homogenization and integration of several
sets of data obtained from different sources. The compiled data have
global coverage and span the period from 1997 to 2018. The original data were
changed only minimally, beyond the changes arising from conversion to a
standard format and from quality control. In situ measurements of the following
variables were compiled: remote-sensing reflectance, chlorophyll
The final set of data consists of a substantial number of in situ
observations, available in a simple text table and processed so that they
can be used directly for the evaluation of satellite-derived ocean-colour
data. The major advantages of this compilation are that it merges six
commonly used data sources in ocean-colour validation (MOBY, BOUSSOLE,
AERONET-OC, SeaBASS, NOMAD and MERMAID), four data sources developed for
ocean-colour applications (AWI, COASTCOLOUR, TPSS and TARA) and 17
additional sets of chlorophyll
A former version of this article was published on 3 June 2016 and is available at
The compiled data are available at
Example of how the compiled data look. The result of querying the compilation for the chlorophyll data from subdataset seabass_car81 is shown.
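A query like the one in this example can be sketched in Python with the standard `csv` module. The sketch assumes a delimited text table with a header row. The column name `subdataset`, the delimiter and the missing-value codes are hypothetical, while `chla_hplc` and `chla_fluor` are variable names used in this compilation.

```python
import csv
from typing import Dict, Iterator, Sequence, TextIO


def query_chla(table: TextIO, subdataset: str,
               variables: Sequence[str] = ("chla_hplc", "chla_fluor"),
               delimiter: str = ",") -> Iterator[Dict[str, str]]:
    """Yield rows from `subdataset` that have at least one chlorophyll value.

    Assumptions: the table has a header row, a column named 'subdataset'
    (hypothetical name) and missing values written as '' or 'NaN'.
    """
    missing = {"", "nan"}
    for row in csv.DictReader(table, delimiter=delimiter):
        if row.get("subdataset") != subdataset:
            continue
        # Keep the row only if any requested chlorophyll column is filled.
        if any((row.get(v) or "").strip().lower() not in missing for v in variables):
            yield row
```

With a local copy of the table (the file name here is hypothetical), `with open("compilation.csv", newline="") as f: rows = list(query_chla(f, "seabass_car81"))` collects the matching rows; pass `delimiter="\t"` for a tab-separated file.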
The supplement related to this article is available online at:
AV compiled the database, carried out the integration and quality checking, and drafted the manuscript. The first six authors are part of the ESA OC-CCI team and contributed to the design of the compilation and to the quality checking, and also contributed data. The remaining authors are listed alphabetically and are data contributors (see their respective dataset in Table 2) or individuals responsible for the development of a particular dataset (e.g. JW for NOMAD and KB for MERMAID). All data contributors (listed in Table 2) were contacted for authorization of data publishing and offered co-authorship. In the case of the ICES dataset the permission for publishing was given by the ICES team. All the authors have critically reviewed the manuscript. MW and TM passed away before submission. We regard their approval of this work as implicit.
The authors declare that they have no conflict of interest.
This paper is a contribution to the ESA OC-CCI project. This work is also a
contribution to project PEst-OE/MAR/UI0199/2014. We would like to acknowledge the
efforts of the teams responsible for collecting the data in the field and
of the teams responsible for processing and storing the data in archives,
without which this work would not have been possible. We thank Tamoghna Acharyya
and Robert Brewin at Plymouth Marine Laboratory for their initial
contribution to this work. We thank NOAA (USA) for making the
MOBY data available and Yong Sung Kim for help with questions about the MOBY data.
BOUSSOLE is supported and funded by the European Space Agency (ESA), the
Centre National d'Etudes Spatiales (CNES), the Centre National de la
Recherche Scientifique (CNRS), the Institut National des Sciences de
l'Univers (INSU), the Sorbonne Université (SU) and the Institut de la
Mer de Villefranche (IMEV). We thank ACRI-ST, ARGANS and ESA for access to
the MERMAID database (
This research has been supported by the ESA Climate Change Initiative – Ocean Colour project (ref: AO-1/6207/09/I-LG).
This paper was edited by David Carlson and reviewed by two anonymous referees.