A uniform pCO2 climatology combining open and coastal oceans

In this study, we present the first combined open- and coastal-ocean pCO2 mapped monthly climatology (Landschützer et al., 2020b, https://doi.org/10.25921/qb25-f418, oceans/MPI-ULB-SOM_FFN_clim.html, 8 April 2020) constructed from observations collected between 1998 and 2015 extracted from the Surface Ocean CO2 Atlas (SOCAT) database. We combine two neural network-based pCO2 products, one from the open ocean and the other from the coastal ocean, and investigate their consistency along their common overlap areas. While the difference between open- and coastal-ocean estimates along the overlap area increases with latitude, it remains close to 0 µatm globally. Stronger discrepancies, however, exist on the regional level, resulting in differences that exceed 10 % of the climatological mean pCO2, or an order of magnitude larger than the uncertainty from state-of-the-art measurements. This also illustrates the potential of such an analysis to highlight where we lack a good representation of the aquatic continuum and where future research should be dedicated. A regional analysis further shows that the seasonal carbon dynamics at the coast–open interface are well represented in our climatology. While our combined product is only a first step towards a true representation of both the open-ocean and the coastal-ocean air–sea CO2 flux in marine carbon budgets, we show it is a feasible task, and the present data product already constitutes a valuable tool to investigate and quantify the dynamics of the air–sea CO2 exchange consistently for oceanic regions regardless of their distance to the coast.


Introduction
Since the beginning of the industrial revolution, human activities such as fossil fuel combustion, cement production and land use change have emitted a large quantity of carbon dioxide (CO2) into the atmosphere, disturbing the global carbon cycle and inducing global climate change (Friedlingstein et al., 2019). The ocean plays a fundamental role in understanding the fate of anthropogenic carbon dioxide since it acts as a CO2 sink and removes roughly 25 % of the anthropogenic CO2 emitted into the atmosphere every year (Friedlingstein et al., 2019). However, uncertainties are still associated with this estimate, especially in highly heterogeneous and/or poorly monitored regions such as the Arctic Ocean, the southeastern Pacific and the coastal ocean (Regnier et al., 2013; Laruelle et al., 2014). Reducing the uncertainty of current marine CO2 sink estimates is, however, essential to improve our understanding of the underlying processes controlling the contemporary and future distribution of anthropogenic CO2 between atmosphere, land and ocean.
While current oceanic CO2 sink estimates largely rely on the output from hindcast simulations of global biogeochemistry models (Sarmiento et al., 2010; Le Quéré et al., 2018) and atmospheric as well as oceanic inverse models (Mikaloff Fletcher et al., 2006; Gruber et al., 2009; Wanninkhof et al., 2013), several observation-based estimates built on surface ocean CO2 measurements have emerged in the past years (Rödenbeck et al., 2015; Zscheischler et al., 2017; Laruelle et al., 2017). These estimates are, in part, the result of the community effort that led to the establishment of two large and still-growing collections of surface ocean CO2 measurements, namely the LDEO database (Takahashi et al., 2018) and the Surface Ocean CO2 Atlas (SOCAT) database (Pfeil et al., 2013; Sabine et al., 2013; Bakker et al., 2014, 2016). The oceanic uptake of CO2 is directly proportional to the partial pressure difference of CO2 (ΔpCO2) between the oceanic surface water and the atmosphere. Therefore, the increase in available observations from roughly 6 million in the first release of the SOCAT database (SOCATv1.5) in 2011 (Pfeil et al., 2013) to a total of more than 23 million observations gathered in version 6 (SOCATv6) resulted in increasingly detailed and accurate observation-based studies investigating the ocean carbon sink (Rödenbeck et al., 2015). While earlier work such as that of Takahashi et al. (2009) focused on the long-term mean CO2 uptake and its spatial and seasonal variations, the sustained increase in data density now allows investigating temporal variations on longer timescales (Rödenbeck et al., 2014; Majkut et al., 2014; Landschützer et al., 2014; Rödenbeck et al., 2015; Jones et al., 2015; Landschützer et al., 2016), suggesting a variable ocean CO2 sink on interannual to decadal timescales (Rödenbeck et al., 2015; Landschützer et al., 2015). These estimates, however, suffer from two main sources of uncertainty. The first is related to the kinematic transfer of CO2 across the air-sea interface (Wanninkhof and Trinanes, 2017; Roobaert et al., 2018), and a second, less well quantified, source is related to the interpolation of sparse surface ocean partial pressure of CO2 data (e.g., Rödenbeck et al., 2015; Landschützer et al., 2014).
Similar to the open ocean, coastal regions (defined here following the broad SOCAT boundary definition of 400 km distance from shore used in Laruelle et al., 2017) are also recognized as a CO2 sink for the atmosphere (e.g., Laruelle et al., 2014) but have long been constrained using scarce data of uneven spatial and temporal distribution (Thomas et al., 2004; Borges et al., 2005; Cai et al., 2006; Chen and Borges, 2009; Laruelle et al., 2010; Cai, 2011; Chen et al., 2013; Dai et al., 2013). Therefore, because of the strong physical and biogeochemical heterogeneity of the coastal ocean, a proper representation of the spatiotemporal patterns in CO2 fluxes could only be achieved in the best-monitored regions of the world (Laruelle et al., 2014). More recently, the application of neural network-based interpolation methods similar to those applied for the open ocean resulted in the first continuous global pCO2 climatology for the coastal ocean, which improved the estimation of the coastal carbon sink and its spatial variability (Roobaert et al., 2019). It is also only very recently that studies have performed a global-scale analysis of the seasonal variability of the air-water CO2 exchange (Roobaert et al., 2019).
As an additional challenge, many different boundaries have been used to delineate the frontier between coastal- and open-ocean waters in the past (Walsh, 1988; Borges et al., 2005; Liu et al., 2010; Laruelle et al., 2010, 2013). The choice of a specific delineation nevertheless has important implications for the quantification of the coastal CO2 sink as well as the adjacent open-ocean sink and their temporal trends (Laruelle et al., 2014). Including the contribution of the coastal ocean in observation-based air-sea CO2 exchange estimates, i.e., the aim of this study, is important not only to improve the quantification of the present-day global ocean sink, which has so far been based on open-ocean data only, but also to properly analyze the trends and spatiotemporal variabilities of all ocean waters in a consistent manner. Several recent studies have indeed suggested that, as a whole, the intensity of the CO2 sink per unit area could be stronger in coastal regions than in the open ocean (Borges et al., 2005; Cai, 2011; Laruelle et al., 2010, 2014), whereas Roobaert et al. (2019) suggest that adjacent open and coastal regions behave similarly.
This distinct behavior of the coastal ocean, with possibly a stronger present-day uptake and a fast-increasing air-sea pCO2 gradient on decadal timescales, is not only relevant for today's quantification of the ocean sink but also for constraining the anthropogenic perturbation of the marine CO2 sink. So far, the latter has only been estimated by assuming similar changes in open-ocean and coastal-sea CO2 flux densities since pre-industrial times, while other studies have proposed larger anthropogenic perturbations for the shallow parts of the ocean by mostly relying on conceptual modeling approaches (e.g., Bauer et al., 2013). The need for a unified coastal-open-ocean pCO2 climatology is further reinforced by the recent upward revision of the pre-industrial global ocean CO2 outgassing fueled by the river carbon loop (Kwon et al., 2014; Resplandy et al., 2018). As a significant fraction of this CO2 outgassing derived from terrestrial carbon inputs likely takes place near the coast or across the coastal-open-ocean transition, it is important to establish a global ocean pCO2 climatology that can be used as a benchmark for increasingly refined models reconstructing the historical evolution of the marine carbon sink.
As a first step towards this goal, we combine two state-of-the-art sea surface observational pCO2 products for the open ocean and the coastal regions to create a common global pCO2 climatology that covers the entirety of the global ocean to better represent the spatiotemporal patterns in the overall marine carbon sink. The combined data product is the first continuous coastal-open-ocean pCO2 climatology constructed with a near-uniformly treated dataset. It also includes the Arctic Ocean, which was not considered in previous open-ocean global analyses and was only partly included in the coastal pCO2 climatology of Laruelle et al. (2017). In spite of its relatively limited surface area and a significant proportion of seasonal sea ice coverage, which prevents most of the gas exchange (Lovely et al., 2015), the Arctic Ocean with its extensive continental shelves is a major contributor to the global coastal CO2 sink (Yasunaka et al., 2016), displaying some of the most intense air-water CO2 exchange rates per unit area (Roobaert et al., 2019). The incorporation of these high-latitude regions is thus essential to avoid a bias when analyzing the role of the coastal zone in the global ocean CO2 sink.
Here, using the new global ocean pCO2 climatology as well as the individual coastal-ocean and open-ocean data products, we investigate how well the coastal-open-ocean continuum is reconstructed through statistical error analysis. In particular, our goal is to address the following research questions: (1) to what extent do reconstructed pCO2 estimates from both products agree with one another in regions where they overlap and (2) to what extent are any mismatches related to data sparsity, both for the temporal pCO2 mean and the seasonal climatology?

Open-ocean and coastal-ocean datasets
Our analysis is based on two recently published sea surface pCO2 data products. The first one, updated from Landschützer et al. (2016), covers broadly the open ocean at a distance of 1° off the coast, and the second dataset, by Laruelle et al. (2017), covers the coastal domain plus the adjacent open ocean up to 400 km away from the shoreline for a total surface area of 70 × 10⁶ km². Both datasets are based on the same neural network interpolation method, i.e., the SOM-FFN (Self-Organizing Map – Feed-Forward Neural Network) method. While the individual datasets (from here onward "NN_open" for the open-ocean dataset and "NN_coast" for the coastal-ocean dataset) have been extensively described and validated in their individual publications (Laruelle et al., 2017), we present here a short summary of each product including their most recent updates and the procedure used to merge both datasets.
The SOM-FFN method consists of a two-step interpolation approach. First, a marine region (i.e., either open ocean or coastal ocean) is divided into biogeochemical provinces based on similarities within selected environmental CO2 driver data. These provinces are illustrated in Landschützer et al. (2014) and Laruelle et al. (2017). Second, the nonlinear relationship between a second set of driver data and available sea surface pCO2 data from the gridded SOCAT database is established and can then be used to fill gaps where no observations exist (see Landschützer et al., 2013). The gridded SOCAT data consist of measurements that received a quality flag of D and lower, indicating a measurement uncertainty within 5 µatm. Both open- and coastal-ocean applications rely on satellite and reanalysis data, but different sets of environmental driver variables are used. For the open-ocean analysis, sea surface temperature, salinity, mixed layer depth, chlorophyll a and atmospheric CO2 are used as proxy variables.
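The two-step logic described above (first partition into provinces, then fit a driver-to-pCO2 relationship within each province and use it to fill gaps) can be illustrated with a deliberately simplified sketch. This is not the SOM-FFN implementation: a nearest-centroid assignment stands in for the self-organizing map, an ordinary least-squares fit stands in for the feed-forward network, and all function names, centroids and data are hypothetical.

```python
import numpy as np

def assign_provinces(drivers, centroids):
    """Step 1 stand-in: label each sample with the index of its nearest centroid."""
    # drivers: (n_samples, n_features); centroids: (n_provinces, n_features)
    dist2 = ((drivers[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist2.argmin(axis=1)

def fit_and_fill(drivers, pco2, provinces, n_provinces):
    """Step 2 stand-in: fit a linear model per province, then predict every sample."""
    design = np.column_stack([drivers, np.ones(len(drivers))])  # intercept column
    observed = ~np.isnan(pco2)
    filled = np.full(len(drivers), np.nan)
    for p in range(n_provinces):
        in_province = provinces == p
        train = in_province & observed
        if train.sum() < design.shape[1]:
            continue  # not enough observations to constrain this province
        coef, *_ = np.linalg.lstsq(design[train], pco2[train], rcond=None)
        filled[in_province] = design[in_province] @ coef
    return filled

# Synthetic demonstration (all values hypothetical).
rng = np.random.default_rng(0)
drivers = rng.normal(size=(200, 2))
centroids = np.array([[-1.0, 0.0], [1.0, 0.0]])
truth = 380.0 + 10.0 * drivers[:, 0] - 5.0 * drivers[:, 1]
pco2_obs = truth.copy()
pco2_obs[::3] = np.nan  # pretend one third of the samples are unobserved
provinces = assign_provinces(drivers, centroids)
pco2_filled = fit_and_fill(drivers, pco2_obs, provinces, n_provinces=2)
```

Because the synthetic target is exactly linear in the drivers, the stand-in recovers it in the unobserved samples; real pCO2 relationships are nonlinear, which is why the method uses a neural network for the second step.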
While leaving NN_coast unchanged from its original publication, we here provide two updates to NN_open compared to its previous publications (see Landschützer et al., 2013, 2014). Firstly, we replaced the mixed layer depth proxy of NN_open from de Boyer Montegut et al. (2004) with the MIMOC product (Schmidtko et al., 2013), as it allows us to expand our analysis region, creating a maximum overlap area between NN_open and NN_coast. We tested the impact of this change and found that SOCAT observations are reconstructed bias free with a root mean squared error of less than 20 µatm, similar to Landschützer et al. (2016). Secondly, for completeness, we also include the Arctic Ocean in NN_open, allowing the comparison between products to be extended to the high latitudes. In order to achieve this, the Arctic Ocean was assigned its own stand-alone oceanic biome in the SOM procedure (see Landschützer et al., 2013). Previous global-scale studies avoided the Arctic Ocean (Landschützer et al., 2014); however, more recent studies by Yasunaka et al. (2016) illustrate that the increase in measurements makes a reconstruction feasible. Because of the uniqueness of its seawater properties, we find that assigning the Arctic Ocean a stand-alone biome, which does not vary in time, provides the best reconstruction. This way, the Arctic pCO2 is only determined by Arctic Ocean measurements (starting at 79° N in the Atlantic Ocean), while Arctic Ocean measurements do not influence other biomes. Hence, the remainder of the global ocean remains unchanged by this addition, and the pCO2 product is thus considered the same as the one presented in Landschützer et al. (2016).
NN_open and NN_coast are both available at the same monthly temporal resolution but are applied at different spatial resolutions. While NN_open uses a 1° × 1° resolution, the coastal pCO2 data product is constructed at a higher 0.25° × 0.25° resolution to better capture the spatial heterogeneity of the coastal zone. Thus, in order to combine and compare the products at the same spatial resolution, we divided each 1° × 1° grid cell of the open ocean into 16 equal 0.25° × 0.25° bins. NN_coast combines observations from 1998 through 2015 using SOCATv4, whereas NN_open uses SOCATv5 data from 1982 through 2016. In this study, we constructed a climatological mean for the common period covered by both products (1998-2015). Despite the different versions of the SOCAT database used to generate the two pCO2 products (SOCATv4 vs. SOCATv5), we expect little influence on our results, since most of the new data introduced into SOCATv5 compared to SOCATv4 were added in the later years and, in particular, 2016, which is excluded from our analysis. Figure 1 illustrates the temporal mean of all available pCO2 observations extracted from the SOCATv5 dataset for the 1998-2015 period. Figure 2 shows the climatological mean pCO2 for both NN_open and NN_coast (Laruelle et al., 2017). The data products rely on sea masks that lead to a common overlap area at the coastal-open-ocean transition of roughly 42 × 10⁶ km², reflecting the lack of a commonly recognized definition of the boundary between both environments. While the landward limit of NN_open is located 1° offshore (and therefore varies in km depending on the geographical position), NN_coast extends from the coastline to either 400 km offshore or the 1000 m isobath, whichever is encountered first. The bathymetry used follows the SOCAT coastal definition (Pfeil et al., 2013) and excludes estuaries and inner water bodies. This overlap area is the subject of our error analysis described below.
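The two preprocessing steps described above, splitting each 1° × 1° cell into 16 quarter-degree bins and averaging monthly fields over the common 1998-2015 period, can be sketched as follows. This is a minimal illustration with hypothetical function names and toy arrays; it assumes the monthly series starts in January and is ordered as (time, latitude, longitude).

```python
import numpy as np

def refine_grid(field, factor=4):
    """Split each coarse cell into factor x factor sub-cells carrying the same value."""
    return np.repeat(np.repeat(field, factor, axis=-2), factor, axis=-1)

def monthly_climatology(monthly, first_year, start, end):
    """Average each calendar month over the years start..end (inclusive)."""
    # monthly: (n_years * 12, nlat, nlon), beginning in January of first_year
    n_years = monthly.shape[0] // 12
    by_year = monthly.reshape(n_years, 12, *monthly.shape[1:])
    selected = by_year[start - first_year:end - first_year + 1]
    return np.nanmean(selected, axis=0)  # shape (12, nlat, nlon)

# Toy 1-degree field of 2 x 3 cells refined to quarter degree.
coarse = np.arange(6.0).reshape(2, 3)
fine = refine_grid(coarse)

# Toy monthly series for 1982-2016 whose value equals the calendar year.
years = np.repeat(np.arange(1982, 2017), 12).astype(float)
monthly = years[:, None, None] * np.ones((1, 2, 3))
clim = monthly_climatology(monthly, first_year=1982, start=1998, end=2015)
```

Repeating values rather than interpolating keeps the refined field identical to the coarse one, which is why the number of independent open-ocean values does not increase with the refinement.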

Merging algorithm
The combination of the two data products takes place in three steps, which are illustrated in Fig. 3. In a first step, we divide the globe into a raster of coarse 30° × 30° boxes starting at 90° N and 180° W. The large box size ensures that, even in remote regions, observations from both open ocean and coastal ocean are represented in the overlap area. We then investigate the overlap area for each raster box individually. In a second step, within each 30° × 30° box, the pixels that are only covered by either NN_open or NN_coast are assigned their respective pCO2 value. In a third step, all pixels where open-ocean and coastal-ocean pCO2 products overlap, that is, all 0.25° × 0.25° pixels with co-located pCO2 values in the open-ocean and coastal-ocean datasets, are identified. To assign a pCO2 value in this overlap area, we weight the open and coastal pCO2 estimates by their standard error relative to the SOCATv5 open and SOCATv5 coastal-ocean datasets, respectively. We calculate the standard error at the scale of each 30° × 30° raster, as in these larger-scale regions enough observations are available to provide an error statistic. To implement this scheme, we first calculate the standard error on each 30° × 30° box as
$$\mathrm{SE}_i = \frac{\mathrm{RMSE}_i}{\sqrt{N_i}},$$
where RMSE is the root mean square error of the open and coastal datasets with respect to the SOCATv5 gridded observations, N is the number of gridded data from SOCATv5 available in a given 30° × 30° raster box, and the subscript i refers to either NN_open or NN_coast, respectively. Since we have simply divided the open ocean from a 1° × 1° grid into 16 equal 0.25° × 0.25° bins, we use an effective number of N_eff = N/16 for the open ocean.
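As a rough illustration of the error weighting described in this section, the sketch below takes the standard error as RMSE divided by the square root of the effective number of gridded observations. It then assumes, without confirmation from the text, that the total error is the sum of the two standard errors and that each product is weighted by one minus its share of that total. The function names and all numbers are hypothetical.

```python
import numpy as np

def standard_error(rmse, n, cells_per_value=1):
    """SE = RMSE / sqrt(N_eff), with N_eff = N / cells_per_value."""
    return rmse / np.sqrt(n / cells_per_value)

def merge_overlap(pco2_open, pco2_coast, se_open, se_coast):
    """Blend two estimates, down-weighting the product with the larger error.

    Assumed form: weight_i = 1 - SE_i / (SE_open + SE_coast).
    """
    se_total = se_open + se_coast
    w_open = 1.0 - se_open / se_total
    w_coast = 1.0 - se_coast / se_total
    return w_open * pco2_open + w_coast * pco2_coast

# Hypothetical raster box: open-ocean counts are divided by 16 because each
# 1-degree value was copied into 16 quarter-degree bins.
se_open = standard_error(rmse=20.0, n=1600, cells_per_value=16)
se_coast = standard_error(rmse=30.0, n=100)
merged = merge_overlap(360.0, 380.0, se_open, se_coast)
```

With two products the assumed weights sum to one, so the merged value always lies between the two input estimates and sits closer to the product with the smaller standard error.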
We do not account for autocorrelation in our calculations since we are only interested in the difference between the standard errors and assume autocorrelation lengths of similar magnitude between the SOCATv5 gridded datasets located in the coastal-ocean and open-ocean domains, respectively. Next, we calculate the total error for each 30° × 30° raster region r. We then scale, for each grid cell in the overlap area, the weight given to the open-ocean and coastal-ocean local pCO2 values by the standard error of each raster region.
Substantial differences exist between the mean difference and standard deviations of NN_open and NN_coast and the respective measurements from the SOCAT database within each 30° × 30° raster. Figure 4 illustrates these differences. While both NN_open and NN_coast have a near-zero bias for the mean difference, some rasters show differences exceeding 15 µatm. While more variability appears in NN_coast, this can largely be explained by the overall smaller number of gridded measurements. The larger number of gridded measurements in NN_open results from the division of the 1° × 1° cells into 16 quarter-degree boxes. Therefore, we reduce the number of effective degrees of freedom for the open ocean.
The long-term mean pCO2 field at 0.25° resolution for NN_open and NN_coast is shown in Fig. 5. In most oceanic regions, the transition from open to coastal ocean occurs without steep gradients, particularly in the subtropics (∼ 20-50° N) of the Northern Hemisphere. However, exceptions exist in the tropics, such as the Peruvian upwelling system, the Namibian-Angolan coast in the South Atlantic, and off Somalia and the Arabian Peninsula. Moreover, abrupt spatial gradients in pCO2 have been observed in large river plumes such as that of the Amazon (Ibanhez et al., 2015) or on continental shelves influenced by large rivers.
The identification of such gradients, however, results only from a first-order visual inspection of the two products. In what follows, we perform a quantitative analysis of the merging procedure and of the resulting pCO2 fields in the overlap area. Figure 6 reports the absolute pCO2 difference in % between NN_coast and NN_open along the common overlap area relative to the mean partial pressure of the merged climatology. Figure 6 shows a clear latitudinal pattern with the lowest difference in the low and subtropical latitudes and the largest differences in the high latitudes, especially in the Northern Hemisphere. We find, in particular, that discrepancies are large in the newly added Arctic Ocean, but also in other seasonally ice-covered areas that have been previously described in NN_open and NN_coast publications (e.g., the Labrador Sea). One significant contributor to this difference might be that NN_coast uses information about sea ice in reconstructing the surface ocean pCO2. Acknowledging this discrepancy in seasonally ice-covered regions, we further focus our error analysis and product comparison on ice-free areas, based on the sea-ice product of Rayner et al. (2003). There are some exceptions to this general latitudinal trend consistent with our first qualitative inspection, such as along the Pacific coastline of South America, the African coast in the South Atlantic and the Arabian Sea, i.e., the regions with steep gradients already identified above. Furthermore, a gradient of decreasing pCO2 from the coast to the open ocean has been reported over the continental shelves of the eastern US and Brazil (Laruelle et al., 2015; Arruda et al., 2015) and may exist in other regions as a consequence of the influence of rivers oversaturated in CO2 combined with a limited estuarine filter (Laruelle et al., 2015). It is thus possible that the pCO2 predicted by the coastal SOM-FFN is slightly skewed towards higher values in some regions because of the presence of overall higher pCO2 observations in the calibration data pool. While there is no clear basin-wide bias structure, systematic differences can be found regionally, such as in the southeastern Pacific Ocean and the Southern Ocean (south of 35° S). Overall, the largest relative differences are located in the overlap areas of the Arctic Ocean.
Figure 6. pCO2 mismatch between NN_coast and NN_open in the overlap area relative to the mean CO2 partial pressure of the merged product. Blue colors indicate a mismatch below 5 %, whereas yellow and red colors indicate a mismatch of more than 5 %.
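The relative mismatch mapped in Fig. 6, i.e., the absolute difference between the coastal and open-ocean estimates expressed in % of the merged pCO2, can be computed along the following lines. This is a minimal sketch with a hypothetical function name and invented values; cells outside the overlap area are assumed to be marked as NaN in at least one product, so they propagate as NaN.

```python
import numpy as np

def relative_mismatch(pco2_coast, pco2_open, pco2_merged):
    """Absolute coastal-minus-open difference in % of the merged pCO2."""
    return 100.0 * np.abs(pco2_coast - pco2_open) / pco2_merged

# Three hypothetical cells: only the first one lies in the overlap area.
coast = np.array([380.0, np.nan, 400.0])
open_ocean = np.array([370.0, 360.0, np.nan])
merged = np.array([375.0, 360.0, 400.0])

mismatch = relative_mismatch(coast, open_ocean, merged)
above_5_percent = mismatch > 5.0  # NaN cells compare as False
```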
In spite of clear regional discrepancies, the mean difference, that is to say the bias, between the two estimates in the overlap area remains close to 0 µatm when integrated globally (Table 1), whether or not the comparison is limited to the locations where observations exist (Table 1, columns 1-3). Furthermore, the mismatch between the two products is in the range of the mismatch between the individual products and the available observations in SOCATv5. This result is a consequence of the neural network-based interpolation applied here at the global scale. In particular, the SOM-FFN is designed to minimize the mean squared error between available observations and the network output over the entire domain of application.
The global RMSE between NN_open and NN_coast as well as the SOCAT observations within the overlap area is in the range of previously reported global values by Landschützer et al. (2016) and Laruelle et al. (2017). In general, the spread between open-ocean and continental-coastal pCO2 varies more than the spread between coastal estimates and SOCAT or between open estimates and SOCAT, possibly indicating that the SOM-FFN method has difficulties generalizing the pCO2 in the coastal-open-ocean continuum.
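The bias and RMSE statistics discussed above can be computed for any pair of fields, optionally restricted to cells where observations exist. The sketch below is illustrative only; the function name, the mask and the values are hypothetical, and cells outside the overlap area are assumed to be NaN in at least one field.

```python
import numpy as np

def bias_and_rmse(a, b, mask=None):
    """Mean difference (bias) and root mean square error of a - b over valid cells."""
    diff = a - b
    valid = ~np.isnan(diff)
    if mask is not None:
        valid = valid & np.asarray(mask, dtype=bool)
    d = diff[valid]
    return d.mean(), np.sqrt((d ** 2).mean())

# Hypothetical overlap cells (µatm); the third cell is missing in one product.
nn_coast = np.array([380.0, 390.0, np.nan, 400.0])
nn_open = np.array([370.0, 400.0, 360.0, 400.0])
has_obs = np.array([True, False, True, True])

bias_all, rmse_all = bias_and_rmse(nn_coast, nn_open)
bias_obs, rmse_obs = bias_and_rmse(nn_coast, nn_open, mask=has_obs)
```

In this toy case the bias over all cells is zero although individual cells differ by 10 µatm, which mirrors how a near-zero global bias can coexist with sizeable regional mismatches.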
Regional analyses of pCO2 field
A more detailed analysis is performed on the overlap of several regions selected to encompass a wide variety of conditions. These regions, indicated in Fig. 5, include three areas characterized by strong upwelling and offshore transport (Peruvian upwelling system, Canary upwelling system, US west coast) but contrasting data coverage, two data-rich regions (Sea of Japan, US east coast) of which one comprises a marginal sea (Sea of Japan), one region where seasonal data are scarce (west coast of Australia), and a region characterized by strong river outflow (Amazon river plume).
In order to further investigate the role of existing observations in upwelling regions, we first focus on the Canary upwelling system and the Peruvian upwelling system. These two regions are part of the eastern boundary upwelling systems and subject to many ecosystem stressors, such as ocean acidification or deoxygenation (Gruber, 2011). Therefore, monitoring the full aquatic continuum is essential in these regions. Both are characterized by strong upwelling and significant offshore transport of carbon-rich water from depth (see, e.g., Lovecchio et al., 2018; Franco et al., 2018), resulting in elevated pCO2 levels exceeding atmospheric levels at the sea surface. Such values are consistent with observations in the Canary upwelling system (Fig. 7) extracted from either the open-ocean SOCAT dataset (Bakker et al., 2016, Fig. 7b) or the coastal SOCAT dataset (Bakker et al., 2016, Fig. 7c) and, consequently, the merged pCO2 product (Fig. 7a). Furthermore, the Canary upwelling system is well covered by both open-ocean and coastal-ocean observations. As a consequence, despite a few areas with larger differences, the overall mismatch between NN_coast and NN_open (Fig. 7d) is in the range of their relative mismatch towards the observations (see Fig. 7e-f) and generally within 10 µatm.
In contrast to the Canary upwelling system, the Peruvian upwelling system shows a steep pCO2 gradient between the offshore and nearshore regions (Fig. 8a), particularly just south of the Equator. A closer inspection of the available observations (Fig. 8b and c) reveals that, particularly in the nearshore domain at the Equator, several of the few available observations of the sea surface pCO2 indicate low partial pressures, resulting in a low reconstructed coastal pCO2, as already identified by Laruelle et al. (2017). The mismatch that results from the upscaling of the low pCO2 data in the coastal domain is further reflected in the difference between the coastal- and open-ocean pCO2 fields in the overlap area (Fig. 8d). The mismatch between NN_open and NN_coast exceeds 30 µatm and is larger than the difference between the individual products and the observations (Fig. 8e-f), suggesting that the disagreement between NN_open and NN_coast in the overlap area stems from their data treatment. The few existing coastal observations of low pCO2 are extrapolated in space, spreading a potential mismatch over a larger area. Likewise, the nearshore domain in NN_open is influenced by the high CO2 partial pressures offshore. This data sparsity and spatial heterogeneity is a further challenge for model evaluation (Franco et al., 2018).
No steep pCO2 gradient can be identified along the west coast of Australia in the merged product (Fig. 9). The highest CO2 partial pressures are found nearshore along the Leeuwin Current (Smith et al., 1991), and the lowest observed pCO2 can be found along the West Australian Current. The area is spatially covered both in the open- and coastal-ocean SOCAT datasets (Fig. 9b and c), and therefore the overall difference towards observed values remains among the smallest of all investigated regions. This is remarkable given the lack of seasonal observations, which will be discussed in the subsequent section. NN_open and NN_coast agree with each other spatially within 15 µatm (Fig. 9d), which is in the range of the mismatch between the individual products and the respective SOCAT observations (Fig. 9e-f). Both products tend to overestimate the low pCO2 towards the south of the domain. This is reflected in the positive mismatch towards the SOCAT observations (Fig. 9e and f) in the common overlap area, where the difference between the neural network estimates and the raw data exceeds 15 µatm for both products.
Observations in the Sea of Japan and adjacent Pacific Ocean suggest large variability in the pCO2, with the lowest observed values just north of the Korean Peninsula and the highest observed pCO2 in the Yellow Sea (Fig. 10b-c). Furthermore, low pCO2 is also observed south of the island of Hokkaido. These large spatial variations in the pCO2 are also visible in the merged pCO2 product (Fig. 10a). A notable exception is the Korea Strait, where observations suggest a lower pCO2 than reconstructed. The strong variability in the observed pCO2 reflects the complex carbon dynamics in the Sea of Japan (Chen et al., 1995; Park et al., 2006), which is also reflected in the larger mismatch between products and towards the SOCAT observations (Fig. 10d-f). The disagreement may indicate that the global-scale NN_open and NN_coast products are not particularly skilled in representing the strong regional dynamics of marginal seas. A better agreement between the neural network reconstructions and observations is found in the Pacific Ocean east of the Japanese islands, where the merged estimate also reveals a better agreement between NN_open and NN_coast (Fig. 10d) and low biases in the range of 5 µatm towards SOCAT observations (Fig. 10e and f).
Some of the best monitored regions spanning both coastal and nearshore open ocean can be found along the US coast (Fennel et al., 2008; Signorini et al., 2013; Laruelle et al., 2015; Fennel et al., 2019). Indeed, all 1° × 1° open-ocean and almost all 0.25° × 0.25° coastal pixels are filled with raw observations off the eastern US coastline. While the mean of all observed pCO2 values from SOCAT (Fig. 11b and c) suggests substantial regional variability, the merged estimate (Fig. 11a) is, as a result of the neural network interpolation algorithm, substantially smoother. In particular, the lower latitudes (25-35° N, Fig. 11e …
Similarly well monitored to the US east coast is the US west coast upwelling system, not least because its variability is tightly linked to El Niño-Southern Oscillation (ENSO) (see, e.g., Lynn and Bograd, 2002; Frischknecht et al., 2015). Here, we find an overall good agreement between NN_coast and NN_open. The agreement in the overlap area of the merged product (Fig. 12d) is among the best reported globally. Interestingly, nearshore, the merged estimate (Fig. 12a) reveals a lower mean pCO2 than suggested from both the open-ocean and coastal-ocean SOCAT datasets (Fig. 12b and c). The small error compared to the SOCAT observations suggests that this is not the result of the two products being in disagreement but might relate to changes in upwelling as a result of interannual variability linked to ENSO events that are not well captured by the merged product.
Finally, we investigate the spatial structure of the reconstructed pCO2 from a region typically dominated by the freshwater outflow of a large river mouth, i.e., the Amazon outflow in the tropical Atlantic Ocean (Fig. 13). Studies linking circulation with the local CO2 dynamics are sparse (Ibanhez et al., 2015; Lefevre et al., 2013). Very few observations exist, particularly in the nearshore region (Fig. 13b-c). Nevertheless, studies suggest that the Amazon river outflow becomes a significant CO2 sink when it mixes with ocean waters (Lefevre et al., 2010). The strong variance in observed pCO2 provides a challenge for any algorithm to reconstruct the full pCO2 field in such a region. Nevertheless, both coastal and oceanic data products are in good agreement (Fig. 13d) with the exception of the area under direct influence of Amazon river outflow. This difference potentially stems from NN_open being unable to associate the pCO2 variability observed in this area with the strong salinity gradients, which are better represented in the coastal-ocean pCO2 product. Both products show differences of similar magnitude when compared to the SOCAT observations (Fig. 13e-f) and similar error structures, as both products overestimate the pCO2 in the northern and underestimate the pCO2 in the southern sections of the overlap area.
While global errors between the data products and observations remain low (see Table 1), Figs. 7-13 show that, at the regional scale, larger differences emerge. We therefore extend our standard error statistics to the selected regions, as presented in Table 2. Overall, we find at the regional level that the inter-product mismatch, represented by the bias, is substantially larger than in the global analysis but does not exceed ∼8 µatm, with one prominent exception: the Peruvian upwelling system, where the mismatch reaches 14.8 µatm. Here, the substantial disagreement between the two products results from the underestimation of the coastal observations in the overlap domain by the coastal-ocean pCO 2 product already shown by Laruelle et al. (2017).
We find that the bias between NN open and NN coast in the overlap area is larger where they are not co-located with observations (Table 2). The error spread between NN open and

Table 2. Mean error analysis (bias and RMSE) within the overlap area between NN open and NN coast and the observations from the SOCATv5 dataset for seven oceanic regions. The comparison is performed for the total overlap area, the area fraction where no observations exist and the area covered by observations. The biases and RMSE between pCO 2 products and SOCATv5 datasets are also reported for the open ocean and coastal ocean.

Table 2). Exceptions include the US east coast and the west coast of Australia, possibly linked to the larger mismatch of the individual products towards the respective SOCAT observations at these locations. Results from both products in the Amazon outflow region, in the US east coast for NN coast and in the west coast of Australia for NN open , show a larger bias towards the SOCAT observations than the respective inter-model bias, illustrating that both methods generalize well. This further suggests that the estimates are locally constrained by information outside the investigated domain, which is possible considering the spatial distributions of the biogeochemical provinces generated by the SOM.
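As an illustration of the kind of statistics summarized in Table 2, the following minimal sketch computes a bias (mean difference) and RMSE between two gridded estimates, once for all overlap cells and once each for cells with and without co-located observations. The function name `bias_and_rmse`, the variable names and the toy values are hypothetical, and the sketch uses unweighted means, which is a simplifying assumption rather than the procedure used in this study.

```python
import numpy as np

def bias_and_rmse(a, b, mask=None):
    """Return (bias, RMSE) of a - b over cells where both values are finite.

    An optional boolean mask restricts the comparison to a subset of cells.
    Means are unweighted (no grid-cell area weighting), a simplification.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    valid = np.isfinite(a) & np.isfinite(b)
    if mask is not None:
        valid &= np.asarray(mask, dtype=bool)
    if not valid.any():
        return float("nan"), float("nan")
    diff = a[valid] - b[valid]
    return float(diff.mean()), float(np.sqrt((diff ** 2).mean()))

# Hypothetical toy values (in µatm) for six overlap cells; not data from the study.
nn_open = np.array([370.0, 365.0, 380.0, 390.0, 360.0, 375.0])
nn_coast = np.array([368.0, 366.0, 372.0, 380.0, 361.0, 377.0])
has_obs = np.array([True, True, False, False, True, True])

total = bias_and_rmse(nn_open, nn_coast)               # whole overlap area
with_obs = bias_and_rmse(nn_open, nn_coast, has_obs)   # cells covered by observations
without_obs = bias_and_rmse(nn_open, nn_coast, ~has_obs)  # cells without observations
```

In this toy case the two estimates diverge more in the cells without observations, which mirrors the qualitative behaviour described above but carries no quantitative meaning.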

Seasonality
A further analysis in the selected regions aims to investigate the seasonal differences in pCO 2 between the original data products, the merged product and observations (Fig. 14). In particular, we investigate the extent to which the mean biases reported above can be explained by seasonal differences in pCO 2 among the different products. To this end, we average all months from 1998 through 2015 to create a seasonal climatology from our pCO 2 products, without correction to a nominal reference year. We repeat this procedure for the SOCAT datasets, likewise without any corrections but being aware that this could lead to a sampling bias in the observed climatology. This approach is justified because we lack knowledge about the short-term variability in the observed carbon cycle, and it is thus unclear how such a correction would improve the representation of the observed pCO 2 field.
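The averaging step described above can be sketched as follows, assuming a monthly series that starts in January and covers whole years (here 18 years, as for 1998 through 2015). The function name `monthly_climatology` and the synthetic input are hypothetical; missing months are marked as NaN and simply skipped, with no correction to a reference year.

```python
import numpy as np

def monthly_climatology(series, n_months=12):
    """Average each calendar month over all years, ignoring missing values.

    series: array whose first axis is time (monthly, starting in January,
    whole years only); any further axes (e.g. lat, lon) are preserved.
    """
    series = np.asarray(series, dtype=float)
    if series.shape[0] % n_months != 0:
        raise ValueError("time axis must cover whole years")
    n_years = series.shape[0] // n_months
    stacked = series.reshape(n_years, n_months, *series.shape[1:])
    count = np.isfinite(stacked).sum(axis=0)   # valid years per calendar month
    total = np.nansum(stacked, axis=0)         # sum over valid years only
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

# Synthetic example: an idealized seasonal cycle plus a 1 µatm per year offset.
base = 360.0 + np.arange(12, dtype=float)
series = np.concatenate([base + year for year in range(18)])
clim = monthly_climatology(series)

# Same series with the very first January missing.
gappy = series.copy()
gappy[0] = np.nan
clim_gappy = monthly_climatology(gappy)
```

Because no reference-year correction is applied, a month that is sampled only in later years is biased towards higher values in this synthetic example, which is the kind of sampling bias mentioned above for the SOCAT climatology.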
In spite of the lack of seasonal sampling bias corrections, our analysis displays, for most regions, a close correspondence within a few microatmospheres (µatm) between open-ocean and coastal-ocean pCO 2 data from SOCAT within the overlap area (blue and yellow bars in Fig. 14), with deviations mostly arising in the Peruvian upwelling system and the Amazon outflow regions, where monthly differences can exceed 10 µatm. The good correspondence is expected to some degree because both datasets share a large fraction of the data. The analysis shows that the seasonality of the neural-network-based NN open and NN coast products satisfactorily reproduces the seasonal fluctuations obtained directly from the raw data, highlighting that the reconstructed seasonal cycle is well constrained by the existing observations. Monthly deviations between the products largely stay within 10 µatm. An exception is the Sea of Japan in boreal winter, where NN open overestimates the surface ocean pCO 2 values recorded in the SOCAT data. All but three of the selected regions have full seasonal data coverage. The three regions without full coverage are the west coast of Australia, the Amazon outflow region and the Peruvian upwelling system. Despite the lack of seasonal observations along the west coast of Australia, both products agree well with regard to the seasonal cycle, and differences between the products stay within 8-10 µatm. By contrast, the otherwise good agreement between coastal-ocean and open-ocean estimates breaks down in the boreal summer in the Amazon outflow region, despite the lack of strong seasonality in the tropical latitudes. The largest mismatch between data products and observations exists along the Peruvian upwelling system, where monthly differences between open-ocean and coastal-ocean estimates exceed 40 µatm. Both estimates, however, show similar seasonal variability.
The seasonal analysis further reveals that, of all investigated regions, the Peruvian upwelling system shows the largest monthly differences between open-ocean and coastal-ocean SOCAT observations, with, for example, mean differences in March exceeding 30 µatm between the open-ocean and coastal-ocean SOCAT datasets. Furthermore, the largest observed partial pressures in NN open appear in August, where no data are available in the coastal-ocean SOCAT dataset, highlighting that NN open draws information from observations further away from shore during this month.

Conclusions
In this analysis, we combined two recently published sea surface pCO 2 products, covering the open-ocean and the coastal-ocean domain. While the spatial coverage of NN open includes all surface waters located further than 1° off the coast, the spatial coverage of NN coast includes surface waters up to 400 km off the coast, leading to an overlap domain of roughly 300 km close to the Equator and increasing in extent towards the poles around the land surface. The common overlap area was used to compare both reconstructed pCO 2 estimates at regional to global scale and to assess whether the observed agreement or disagreement is linked to data availability.
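The stated overlap width can be checked with a short back-of-the-envelope calculation. The sketch below assumes a spherical Earth with about 111.2 km per degree of latitude and measures the 1° offset of NN open in the east-west direction, so that its length shrinks with the cosine of latitude; both the constant and this purely zonal treatment are simplifying assumptions for illustration, not the masking procedure used to build the products.

```python
import math

KM_PER_DEG = 111.2      # assumed length of 1 degree on a spherical Earth (km)
COAST_LIMIT_KM = 400.0  # offshore limit of the coastal product, from the text

def zonal_overlap_km(lat_deg):
    """Approximate east-west overlap width (km) at a given latitude.

    The open-ocean product starts 1 degree of longitude offshore, which
    corresponds to KM_PER_DEG * cos(latitude) kilometres.
    """
    open_start_km = KM_PER_DEG * math.cos(math.radians(lat_deg))
    return COAST_LIMIT_KM - open_start_km

# Overlap width at a few latitudes (degrees north or south).
widths = {lat: zonal_overlap_km(lat) for lat in (0, 30, 60)}
```

At the Equator this gives about 289 km, consistent with the "roughly 300 km" quoted above, and the value grows towards the poles as the length of 1° of longitude decreases.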
Our results show that, for most of the global ocean and particularly the subtropical latitudes in the Northern Hemisphere, NN open and NN coast agree well within the overlap domain. However, stronger differences exist in other parts of the world, particularly in the Peruvian upwelling system, the Arctic and Antarctic, the African coastline in the South Atlantic and the Arabian Sea, where fewer observations exist. Additionally, we find larger discrepancies in the marginal Sea of Japan. In other regions without complete seasonal data coverage such as the west coast of Australia, however, both products compare well. We therefore conclude that the lack of data coverage and the biogeochemical complexity triggered by upwelling, river influx or seasonal ice coverage both contribute to the mismatch. Additionally, methodological differences between NN open and NN coast , such as differences in predictor data, result in local differences, for example, in ice-covered regions, where NN coast relies on sea ice as a predictor, or in shallow, stratified waters, where mixed layer depth serves as an important proxy in NN open . Closer inspection reveals that for most of the overlap regions, the difference between the open-ocean and coastal-ocean estimates falls within the range of the differences between NN open and NN coast and the respective SOCAT datasets from which they were created. Therefore, the combined pCO 2 climatology is not only a step forward in including the full oceanic domain with all its complexity into carbon budget analyses, but it also helps to identify areas where additional continuous observations are critically needed to close current knowledge gaps.
Another way forward to further reduce the bias between the coastal- and open-ocean estimates would be to reconsider the cut-off definition between the two domains. Data-sparse and often strongly variable regions such as the Peruvian upwelling system are very sensitive to the data selected to generate the pCO 2 fields. The overlap analysis proposed here, particularly the percent mismatch and RMSE analysis, further serves as a benchmark on how well we understand the coastal-to-open ocean continuum and its spatial variability and where we still lack essential measurements to close the gap between existing estimates, for example, the Peruvian upwelling system or the seasonally ice-covered high-latitude regions, in particular the Arctic Ocean. A next step should include the reduction of the mismatch between coastal- and open-ocean estimates in order to combine the two. This is an essential step towards an observation-driven global carbon budget. Closing such a gap, however, requires close collaboration between open-ocean and coastal-ocean carbon cycle scientists, which should be considered of high importance in the future.
Finally, we introduced a new concept where we can locally evaluate the upscaling of existing measurements based on a common overlap region. In this study, we focused on mean differences and seasonal climatologies at regional and global scales. We find an encouraging agreement between seasonal cycles, which gives us confidence that the existing products might be suitable for studying lower-frequency signals such as trends and interannual variability. Understanding how differences in trends and interannual variability between the coastal and open oceans emerge and how they are linked to data availability should be a next step. Such an analysis is essential to gain confidence in observational constraints and to find ways to further improve them in order to close the global carbon budget based on observations and provide data products for model benchmarking. Our approach can also be used to compare other overlapping datasets at a time when advanced interpolation techniques are yielding more and more oceanic data products with different spatial extensions and boundaries. Our study is therefore an important step towards a truly representative global ocean observation-based CO 2 product that includes all ocean domains.
Author contributions. PL designed the study and wrote the manuscript together with PR, GGL and AR. PL developed the open-ocean pCO 2 product and GGL developed the coastal-ocean pCO 2 product.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. Peter Landschützer is supported by the Max Planck Society for the Advancement of Science. The research leading to these results has received funding from the European Community's Horizon 2020 project under grant agreement no. 821003 (4C). Goulven G. Laruelle is a research associate of the F.R.S-FNRS at the Université Libre de Bruxelles. Pierre Regnier received funding from the VERIFY project from the European Union Horizon 2020 research and innovation program under grant agreement no. 776810. This study benefited from discussions with Katharina Six from the Max Planck Institute for Meteorology. The Surface Ocean CO 2 Atlas (SOCAT) is an international effort, supported by the International Ocean Carbon Coordination Project (IOCCP), the Surface Ocean Lower Atmosphere Study (SOLAS), and the Integrated Marine Biogeochemistry and Ecosystem Research program (IMBER), to deliver a uniformly quality-controlled surface ocean CO 2 database. The many researchers and funding agencies responsible for the collection of data and quality control are thanked for their contributions to SOCAT.
Financial support. This research has been supported by the European Commission projects 4C (grant no. 821003) and VERIFY (grant no. 776810).
Review statement. This paper was edited by Jens Klump and reviewed by Rik Wanninkhof and one anonymous referee.