Daily to annual net primary production in the North Sea determined using autonomous underwater gliders and satellite Earth observation

Shelf-seas play a key role in both the global carbon cycle and coastal marine ecosystems through the drawn-down and fixing of carbon, as measured through phytoplankton net primary production (NPP). Measuring NPP in situ, and extrapolating this to the local, regional and global scale presents challenges however because of limitations with the techniques utilised (e.g. radiocarbon isotopes), data sparsity and the inherent biogeochemical heterogeneity of coastal and open-shelf waters. Here, we introduce a powerful new technique based on the synergistic use of in situ glider profiles and satellite Earth Observation 5 measurements which can be implemented in a real-time or delayed mode system. We apply this system to a fleet of gliders successively deployed over a 19-month time-frame in the North Sea, generating an unprecedented fine scale time-series of NPP in the region (Loveday and Smyth, 2020). At the large-scale, this time-series gives close agreement with existing satellitebased estimates of NPP for the region and previous in situ estimates. What has not been elucidated before is the high-frequency, small-scale, depth-resolved variability associated with bloom phenology, mesoscale phenomena and mixed layer dynamics. 10


Introduction
Our understanding of the global ocean has been transformed over the past two decades by the advent of autonomous observations from gliders and floats (Chai et al., 2020;Roemmich et al., 2019;Mignot et al., 2014;Smith et al., 2011). Such platforms have shown the capability to probe the marine environment at increasingly fine temporal and spatial resolution at local, regional https://doi.org/10.5194/essd-2021-311 glider payloads or platform types. We apply this method to an 19-month autonomous glider field campaign in the North Sea, a critical shelf-sea for fisheries with other multiple environmental stressors including eutrophication (Ferreira et al., 2011), deoxygenation (Queste et al., 2016), shipping (Barry et al., 2006) and pollution (Salomons et al., 2012). For the first time 55 we uncover the considerable temporal and spatial variability in NPP, driven by seasonal succession, fronts (Miller, 2009) and topographical features, as well as capturing two winter seasons which are crucial in conditioning the system for the following spring and summer periods.

Ingested glider data
As part of the Alternative Framework to Assess Marine Ecosystem Functioning in Shelf Seas (AlterEco) project, a sustained 60 presence of autonomous underwater gliders in the North Sea was maintained between November 2017 and May 2019. The programme aimed to keep at least two gliders in the field at all times, to provide measurement redundancy and assist with data validation. All gliders had a basic instrumentation package consisting of Conductivity Temperature Depth (CTD) in order to determine vertical profiles of temperature and salinity, and a Seabird Scientific ECO-puck for fluorescence and back-scattering measurement. The data set presented here is confined to only those gliders with ECO-pucks configured for chlorophyll fluores- AlterEco glider missions are grouped in seven deployments 1 , outlined in table 1. The glider data for these deployments is 70 available from the British Oceanographic Data Centre (BODC) at https://www.bodc.ac.uk/data/bodc_database/gliders/. The data is supplied in the Everyone's Gliding Observatories (EGO) format 2 , which aggregates all profiles from a single glider mission into one NetCDF file.
3 Method 3.1 Overview of the NPP processor 75 The NPP processor comprises a set of Python-based routines that manage the ingestion, quality control, correction, and preand post-processing of autonomous underwater glider profiles, as well as interfaces with external routines to calculate spectral PAR (Gregg and Carder, 1990) and NPP itself (Morel, 1991), which are implemented in the C programming language. Figure 2 shows a detailed flow diagram for the various processing stages. The processor supports multiple approaches to NPP calculation, depending on the availability of glider-based optical sensor data. 80 1 doi; 10.5285/b57d215e-065f-7f81-e053-6c86abc01a82 and 10.5285/86429662-97b8-74fa-e053-6c86abc0a97c 2 http://dx.doi.org/10.25607/OBP-768) from a remote FTP repository on a user determined schedule. All ingested source files are stored in an initial deployment directory, and catalogued in a centralised SQLite database. This non-destructive approach supports the continual updating of the glider record from a remote catalogue in the NRT case, while preventing replication. The database monitors, records and manages all subsequent stages of the processor.
NPP calculations are performed on a profile-by-profile basis. Glider data typically consists of both downward (dive) and 95 upward (climb) components in a single file, which in our processing framework represents two profiles. If a pre-existing profile designation is provided, as is usually the case in EGO data, this is used to split the source data into profiles. If no designation is provided, the ingested data is split into single files according to the turning points in the smoothed depth record. Smoothing is performed using a 5 th order Savitzky-Golay filter, with a nominal window of 151 points. Individual profiles are then stored in the staging directory for future use.

Constructing Earth Observation trajectories
Due to trade offs necessary to achieve the multiple mission priorities of the AlterEco programme, not all gliders were able to accommodate PAR sensors. Here, when required, SEO-based PAR data is used in lieu of in situ measurements. This increases both the flexibility and utility of the method for the operational oceanography community, allowing it to be applied to glider data where only in situ chlorophyll-a fluorescence is available. In addition, SEO and reanalysis data is used to provide information 105 on the prevailing atmospheric and marine conditions during each glider mission. A list of SEO and reanalysis data sources, and the variables that are extracted and/or derived can be found in table 2 and in figure 2.
When a glider mission is updated (e.g. a new profile is added in NRT mode), the processor calculates the new temporal and spatial extents of the mission. Using these extents, the processor gathers the required SEO and reanalysis data from the specified source, concatenates the retrieved catalogue in time and trims the spatial coverage to produce a 'data cube' that matches the 110 glider mission extents. The spatial trimming is performed remotely, on the server side, if the data service in use allows this capability, reducing data transfer costs and time. In NRT mode, new data is added to extend the cube as required, without the need to download the entire catalogue once again (e.g. via the concatenation of new time slices to the existing local record). This operation is performed for all variables, for all gliders, irrespective of whether they have the relevant in situ measurement, allowing for the continual validation of the use of SEO and reanalysis data as a substitute.

115
Once the data cube has been constructed, the average time and location of each profile are extracted and concatenated into a one-dimensional time series of the glider trajectory. Bi-linear interpolation is then used to retrieve the corresponding SEO and reanalysis data from the relevant data cube, resulting in an SEO-trajectory file for each variable, with a value for each profile. During construction, the cube is both spatially and temporally 'padded' to eliminate 'edge effects' associated with interpolation.  (Frouin et al., 1989) (see table 2). Instantaneous values of broadband PAR, corresponding to the glider measurement times, are derived from the average values as follows. The light distribution is modelled as a sine curve between dawn (T = 0) and dusk (T = π). The amplitude of this curve is determined such that the 125 integrated value below it matches that of the daily average. The instantaneous value is then extracted by interpolating the curve at the glider measurement time, and converted to W m −2 .

Pre-processing
The pre-processing step consolidates the glider and SEO-based data on a profile-by-profile basis, performs quality control procedures and selects the relevant variables for NPP calculations depending on availability (figure 2). Sporadic missing values 130 are common in in situ data. Where possible, linear interpolation is used to fill these gaps in the positional, depth and pressure data. If interpolation is not possible, the profile is discarded and no further processing takes place.
Following this, and where not provided directly, conservative temperature and absolute salinity are calculated from the glider CTD record using the TEOS-10 Python GSW toolbox 3 . Mixed layer depth (MLD) is then calculated from the density gradient according to Holte and Talley (2009). If the MLD calculation fails, the MLD from the previous profile is used. The MLD is 135 prevented from being shallower than 5 m, as depths shallower that this are typically poorly sampled by a glider. Once the physical variables are processed, the PAR and [Chl-a] profiles are assessed, along with the back-scattering data, if present. PAR data delivered in raw counts is corrected to W m −2 using the calibration coefficients specific to the sensor. The in situ [Chl-a] data is similarly treated. In the latter case, [Chl-a] data is discarded where the calibrated value exceeds 1e 5 mg m −3 .
In the case of DM processing, post mission calibration factors can also optionally be applied, though none were applied in this

Determining the PAR profile
PAR sensors do not always acquire at the same sampling rate as the glider CTD sensor. Consequently, where it is available in situ PAR data for a given profile is interpolated onto the glider depth record prior to further processing. The decision point for the use of glider or SEO-based broadband PAR is made as follows;

145
-Where a profile falls during the day-time and glider E d is available, this is used by default (though the use of SEO-PAR can be forced, to permit validation). K dP AR is calculated from the linear regression of the logged PAR values with depth.
The regression is weighted by the square-root of the magnitude of the logged PAR values, emphasising the effect of the surface layers. E d at depth is then projected to the surface using the K dP AR value, giving sub-surface broadband PAR  (2015). Fresnel reflectance is calculated using the wind speed, relative humidity and mean sea level pressure derived from the SEO trajectory files.
-Where glider PAR is not available SEO-based surface broadband PAR (E + o ) is substituted. E − o is calculated as described in the previous step. SEO K dP AR , calculated from SEO K d490 Saulquin et al. (2013), is then used to project broadband PAR into the subsurface across the glider depth record.

155
-Where SEO K d490 is not available, K dP AR is calculated according to equation 1 (Saulquin et al. (2013)), where PAR(Z eu ) is assumed to be 1% of PAR(0), and the euphotic depth (Z eu ) is determined from the maximum in situ [Chl-a] above the MLD, according to equation 2 Lee et al. (2007).
The PAR record is labelled as bad, and is not processed, in the case of i) night-time profiles, ii) substantial gaps in the [Chl-a] data, or iii) where the glider is within 5 m of the bathymetry depth, as interpolated from the GEBCO 15 arc-second gridded product 4 . The latter criteria prevents the glider from deriving NPP estimates from [Chl-a] readings that may have been gathered at depths where particle re-suspension in likely to make them unreliable. Euphotic depth (Z eu : defined as the 1% of the surface light depth) is subsequently calculated for all good profiles, and is selectively used in the correction of the [Chl-a] reconstructed from the glider profiles by typically ∼50 W m -2 . In addition, the SEO-based values do not take account of the instantaneous cloud conditions and, correspondingly, the standard deviation in the surface PAR time series is lower. However, the SEO-based interpolation method gives an accurate facsimile of the daily PAR cycle, with a meanĒ + o that falls within 7% of the in situ broadband value. This error value that is comparable with the 5% "in air" performance of the in situ PAR sensor itself 5 .

Using the optical backscatter profile
Where available, optical backscattering measurements (b bp ) may be used to correct the surface chlorophyll fluorescence profile 175 for near-surface quenching (Hemsley et al., 2015). The backscattering data is initially passed through a 7-point running minimum filter to remove spikes (Thomalla et al., 2018). Negative values are removed and the scattering profile is subsequently interpolated onto the glider depth record.

Quality control and quenching correction of the chlorophyll fluorescence profile
The glider chlorophyll data is initially screened for erroneous values, with measurements of [Chl-a] < 0.0 mg m −3 discarded.

180
To account for bio-fouling, the record is also discarded where there is a consistent "step change" of > 5 mg m −3 in [Chl-a] at depths below both the MLD and Z eu , as compared with the initial deployment value. As with the treatment of the PAR data, the [Chl-a] is then interpolated onto the glider depth record on a profile-by-profile basis. Where the interpolation fails due to lack of data, the entire profile is discarded and no NPP calculation is performed.
Fluorescence quenching in phytoplankton is caused by a variety of physiological acclimation mechanisms in order to avoid 185 photo-damage under excessive irradiance (Kiefer, 1973). This effect typically manifests as a depression of the fluorescence signal in the surface waters during daylight, and particularly around solar noon when the downwelling irradiance is at a maximum (Xing et al., 2012;Biermann et al., 2015). Multiple approaches to quenching correction have been proposed, e.g. The Xing et al. (2012) method clearly outperforms the other methods tested, and is used to process all the gliders deployed during the AlterEco programme. Its strong performance is ascribed to its ability to appropriately capture the regional seasonal 210 interplay between the MLD and euphotic depth in the shelf seas. As shown in figure 5, the MLD sits above the euphotic depth during spring. This allows for the establishment of a deep chlorophyll maxima (DCM) (see figure 6), which is particularly important for NPP in this region (Fernand et al., 2013). In this case, quenching corrections using euphotic depth as a maximum depth limit (e.g. (Biermann et al., 2015)) over-correct as they tend to encapsulate the DCM in the quenching correction process, extrapolating erroneously high [Chl-a] to the surface.

Calculating and scaling the spectral irradiance profile
Once the pre-processing stages have been completed for all available glider profiles (figure 2) spectral E d profiles are calculated for each using the solar irradiance model described by Gregg and Carder (1990). To account for local meteorological conditions, the model runs using the [O 3 ], cloud cover, wind speed, relative humidity and total column water vapour parameters for each profile, stored in the relevant trajectory file (Table 2). These spectral E d values calculated from the model are scaled such that 220 their integrated value between 400 nm and 700 nm matches the corresponding E + o measurements provided by the glider or SEO data sources (see section 3.4.1). This scaling correction accounts for instantaneous sky conditions associated with each profile. model suitable for Case-1 waters (Carr, 1986), with the non-water component of light attenuation ascribed to [Chl-a] only (Morel and Maritorena, 2001).

230
The processor retains the ability to implement this method, as detailed extensively in Hemsley et al. (2015) and represented in Figure 2 by the "Case 1" decision box. However, the optically complex waters of the shelf-seas are rich in sediment, and do not conform to the Case-1 paradigm. Implementation of a spectral irradiance model more suitable to the region requires the consistent deployment of in situ PAR and back-scattering sensors that is not available across the AlterEco programme.
Consequently, no PAR-based scaling of the [Chl-a] profiles is performed for this data set. This caveat is further discussed in 235 section 5.4.

Calculating NPP
Net primary production is calculated from the corrected [Chl-a] and depth profiles and spectral downwelling PAR for each profile using the Morel (1991) model, as presented in Hemsley et al. (2015 and shown in equation 3. The model calculates NPP through a triple integral across day length (L), depth (D 1 = 0, D 2 = Z eu ) and wavelength (λ 1 = 400 nm, λ 2 = 700 nm).

240
The absorption cross section per unit of chlorophyll (a * [m −1 ]) and net growth rate (φ µ [mol(carbon) mol(quanta) −1 ]) are parameterised as in (Morel, 1991). Figure 7 allows a comparison of using SEO-based PAR in the calculation of spectral E d and, subsequently NPP, in contrast to using in situ PAR. SEO-based PAR is shown to function as a suitable proxy in this method, remaining highly correlated with its in situ counterpart, with mean values that are within 2% of the target estimate.

250
When combined, the NPP times series derived from the AlterEco glider deployments spans a 19-month period, as shown in figure 8. Two peaks are captured in April/May 2018 and April 2019, corresponding with the timing of the regional spring bloom, and with the latter event significantly more intense.

Comparison with historical measurements in the North Sea
Two distinct advantages of gliders are that they sample flexibly, in terms of horizontal space and depth, and they can gather data at high frequency. As there are no pre-existing measurements of NPP in the North Sea with comparable frequency, here we compare our results with available estimates of annual mean productivity. Table 4

Value and utility
Primary production is highly variable on short temporal and spatial scales. The impact of the mesoscale variability associated with fronts (Olita et al., 2017;Taylor and Ferrari, 2011) and eddies (Hansen et al., 2010;Hu et al., 2014) can be extensive. High frequency changes in tidal phase (Zhao et al., 2019), sky conditions and the local wave field (Reed et al., 2011) can also exert a 315 strong influence. To monitor the impact of these processes in highly productive shelf seas, it is desirable to continually sample key regions using technologies that support adaptive sampling strategies. Autonomous underwater vehicles (AUVs), such as gliders, offer one such approach to this problem, offering persistent monitoring of shelf sea biogeochemistry (Chai et al., 2020;Liblik et al., 2016) and informing regional model assimilation strategies (Skákala et al., 2021).
This data set presents the first intra-annual, glider-based in situ NPP time series for the North Sea, that is able to address 320 questions pertaining to biophysical interactions on a high-frequency basis. From figure 8 it is clear that the NPP signal is modulated at multiple frequencies within individual deployments, and substantial spatial heterogeneity exists between codeployments (e.g 455 (Orca) and 454 (Cabot)). Further analysis of this data set should give insight into the physical processes that contribute to this variability. When deployed with multiple mission goals in mind, glider payload space typically comes at a premium. Most notable in this case, is the effect on the deployment of PAR sensors, which are present on less than 50% of missions. However, adaptation of previous methodology to accommodate SEO based PAR estimates has been shown to be feasible. Combining SEO surface data with AUV profiles also presents interesting options for reconstructing subsurface fields. Machine learning methods have demonstrated the feasibility of combining SEO surface fields with in situ profile to render a three dimensional picture of ocean biogeochemical properties (Sauzède et al., 2015(Sauzède et al., , 2016. The data set presented here would be well suited for application of 330 such methods to evaluate and further extend coverage of NPP data in the global ocean.
The processing method developed here allows for glider-based NPP to be calculated in a much broader array of cases.
While in DM it can replicate the approach of Hemsley et al. (2015), the inclusion of differing quenching algorithms promotes application to different regions and/or different sensor loads (e.g. those without back-scattering). The flexible inclusion of SEO data in lieu of in situ PAR measurements expands this utility even further, allowing NPP calculations from gliders with a more 335 limited array of sensors without substantial loss of accuracy ( Figure 7). Finally, the ability to support NRT ingestion of glider data allows for NPP calculation in an operational setting.

Limitations
PAR, when spectrally decomposed, can be used to provide a calibration of the [Chl-a] fluorometer (Hemsley et al., 2015).
Although the fluorometer calibration may be accurate at the start of an individual mission, calibration using nearby discrete

340
[Chl-a] samples at launch and retrieval of the glider may lead to a false sense of security, particularly in areas of high heterogeneity, such as experienced during this study. Hemsley et al. (2015) showed that within mission variability in the correction factor is possible due to changes in phytoplankton community structure. However, the model previously proposed is suitable for case-1 waters only, and does not account for absorption and scattering by CDOM and sediment, respectively, and so no dynamic calibration is applied here. The strong agreement between AlterEco glider NPP measurements and both satellite and historical 345 in situ estimates (see section 5.2) underlines the validity of the data set, and future work will consider the incorporation of a model to cater for more complex waters, where glider payload allows.
While the quenching correction method of Xing et al. (2012) proved most appropriate in this case, this result should not be a considered a general solution. This rationale underpins the decision to incorporate multiple methods to correct near surface fluorescence, however, the eventual method chosen is limited by the sensors deployed, most notably the availability of 350 backscattering data (Figure 1). The availability of back-scattering data allows for the a wider selection of correction methodologies in both DM (Thomalla et al., 2018;Hemsley et al., 2015) and NRT  processing. In addition, its inclusion is essential to constructing a complex water model, as discussed earlier in this section.
For long duration missions (i.e. more than a few days) bio-fouling of sensors mounted on gliders can affect data quality.
Unlike Argo floats, which typically park at depths well below the euphotic zone (∼1 km), for 10 days, gliders spend a greater 355 portion of their time in the photic zone, allowing the build-up of a bacterial substrate and then algal colonisation. Despite many strategies to mitigate bio-fouling (copper covered sensors, bio-wipers), it is impossible to completely eradicate it currently, and even predicting its onset is problematic. Anecdotally on moorings situated in the western English Channel (Smyth et al., 2010a(Smyth et al., , https://doi.org/10.5194/essd-2021 Open Access

Conclusions
This paper discusses the generation of an 19-month, near-continuous glider-based data set of net primary production in the North Sea; the first of its kind for the region. The methodology used to derive this time-series is discussed in detail, with specific focus on the approaches taken to account for fluorescence quenching and the use of SEO-based PAR data in lieu 365 of in situ sensors. While, in this case, pre-processed glider data from the AlterEco programme serves as a starting point, consideration is also given to adaptation of the method for NRT and operational use. Although limitations in the approach used are discussed, especially in regard to the feasibility of dynamic calibration and effects of biofouling, the results show strong agreement with previous studies as well satellite derived estimates and the results of biogeochemical model simulations. They present a unique, depth-resolved picture of the high-frequency variability and spatial heterogeneity present in the production in 370 region and highlight the advantages of using autonomous systems to persistently monitor the shelf-seas, especially in tandem with remote sensing based approaches. The newly developed processing approach also has implications for the development of a PP indicator (e.g. through the Marine Strategic Framework Directive food web descriptor), overcoming some of the temporal and spatial sampling limitations that have historically undermined its inclusion in assessments, relegating its listing to candidate only.

7 Code and data availability
The data is made available via the British Oceanographic Data Centre (BODC), via doi:10/fm39. For full details, please see Loveday and Smyth (2020). Access to the code for the primary productivity processor is available, via GitLab, on request to the lead author.
Author contributions. Ben Loveday and Tim Smyth developed the methodological approach and led writing of the manuscript. Ben Loveday 380 built the processing system and generated the resulting data set. The remaining authors are responsible for the glider deployments, the provision of supporting data sets and calibration information, and for providing methodological input in the manuscript.
Competing interests. The authors declare that they have no competing interests.