Articles | Volume 16, issue 2
Data description paper
06 Feb 2024
Data description paper |  | 06 Feb 2024

A regional pCO2 climatology of the Baltic Sea from in situ pCO2 observations and a model-based extrapolation approach

Henry C. Bittig, Erik Jacobs, Thomas Neumann, and Gregor Rehder

Ocean surface pCO2 estimates are of great interest for the calculation of air–sea CO2 fluxes, oceanic uptake of anthropogenic CO2, and eventually the Global Carbon Budget. They are accessible from direct observations, which are discrete in space and time and thus always sparse, or from biogeochemical models, which only approximate reality. Here, a combined method for the extrapolation of pCO2 observations is presented that uses (1) model-based patterns of variability from an empirical orthogonal function (EOF) analysis of variability with (2) observational data to constrain EOF pattern amplitudes in (3) an ensemble approach, which locally adjusts the spatial scale of the mapping to the density of the observations. Thus, data-constrained, gap- and discontinuity-free mapped fields including local error estimates are obtained without the need for or dependence on ancillary data (e.g. satellite sea surface temperature maps). This extrapolation approach is generic in that it can be applied to any oceanic or coastal region covered by a suitable model and observations. It is used here to establish a regional pCO2 climatology of the Baltic Sea (Bittig et al.2023:, largely based on ICOS-DE ship of opportunity (SOOP) Finnmaid surface pCO2 observations between Lübeck-Travemünde (Germany) and Helsinki (Finland). The climatology can serve as improved input for atmosphere–ocean CO2 flux estimation in this coastal environment.

1 Introduction

The ocean plays a major role in the global carbon cycle and has a controlling function on the atmospheric CO2 content on longer timescales (DeVries2022). Since the rise of atmospheric CO2 concentrations since the beginning of the pre-industrial period (year 1750), the ocean has taken up ∼25 % of the CO2 released from human activities (Friedlingstein et al.2022), with the annual uptake mainly related to the increase in the air–sea pCO2 imbalance. The role of coastal and continental shelf waters is more complex. Apart from atmospheric CO2 levels, changes in nutrient loads and organic matter supply from land, changes in weathering in the drainage basin, and even changes in the functioning and composition of biological key players on various levels can lead to changes in the inorganic carbon system and thus the source–sink function of coastal seas (e.g. Laruelle et al.2018; Müller et al.2016; Carstensen and Duarte2019; Kuliński et al.2022). Moreover, coastal seas provide an important conduit of land-derived carbon into the open ocean's interior (e.g. Thomas et al.2004).

For the Baltic Sea, several attempts have been made to quantify the net CO2 air–sea balance in the form of a pCO2 climatology as well as to derive trends in surface water pCO2, with partly contradictory results (e.g. Omstedt et al.2009; Parard et al.2016, 2017; Becker et al.2021; Neumann et al.2022; Wesslander et al.2010). Most of the approaches either used the output from biogeochemical models or tried to create pCO2 fields from mapped proxy data and observational data using extrapolation approaches. Seasonal mapping of pCO2 is particularly challenging for the Baltic Sea due to its high regional and temporal variability, a salinity gradient affecting e.g. CO2 equilibria, and a large seasonal amplitude caused by high net productivity in spring and summer and entrainment of waters enriched in remineralization products due to mixed-layer deepening in autumn and winter (Schneider and Müller2018).

Climatologies of pCO2 on a global or ocean-wide scale are an important tool for the quantification of the oceanic CO2 sink in the framework of e.g. the Global Carbon Budget (Landschützer et al.2013; Friedlingstein et al.2022). Tailored regional analysis can help to gain insight into changes in the source–sink behaviour of distinct regions, as recently demonstrated for the northern European shelf, including the Baltic Sea (Becker et al.2021).

A robust climatology and trend for the Baltic Sea has, apart from refining the estimate of the net CO2 flux in relation to the Global Carbon Budget, several implications. As the entire Baltic Sea area belongs to the territorial waters or exclusive economic zones (EEZs) of one of the pan-Baltic nations, the air–sea CO2 fluxes might be of importance for current and future carbon management and accounting schemes in the framework of national emission reduction targets (Luisetti et al.2020). A pCO2 climatology could also serve as a baseline for potential negative emissions applications, including blue carbon or coastal alkalinity enhancement (GESAMP2019). Knowledge of monthly pCO2 fields and their variability might also help to identify and quantify the impact of perturbations and extreme events, e.g. heat waves (Humborg et al.2019).

In this work, we build a foundation for such applications. We first present a novel extrapolation approach, followed by construction of a Baltic Sea pCO2 climatology. The extrapolation approach is then evaluated and put into context with existing mapping methods. Notable features of the seasonal pCO2 climatology are discussed and special attention given to the regional long-term pCO2 trend before we conclude our work.

2 Methods

2.1 Extrapolation approach

For mapping from scarce observational data to spatially filled maps of the Baltic Sea, we use an ensemble of truncated empirical orthogonal function (EOF) reconstructions. For a more detailed description than the brief summary below, please consult Appendix A.

EOF decomposition and singular value decomposition (SVD) have been used widely in atmospheric and ocean science (e.g. Lorenz1956; Weare et al.1976; Weare and Newell1977; Hannachi et al.2007). They can be used to efficiently reduce the dimensionality of the original dataset (Lorenz1956; Davis1976; Preisendorfer1988; Monahan et al.2009; Jolliffe and Cadima2016). Here, we use the EOF decomposition of a model dataset X of pCO2 in the Baltic Sea to obtain spatial EOF patterns ei of pCO2 variability.

In a second step, observational pCO2 data Y are used in conjunction with these spatial patterns of variability in a truncated EOF reconstruction to constrain the EOF spatial patterns' amplitudes (Kaplan et al.2000; Preisendorfer1988) and thus to extrapolate from scarce observational data to the full domain (Sect. A3). This step represents an optimization of a cost function Q (Eq. A19), which in our case also takes the observational data uncertainty into account to avoid overfitting (Sect. A4). For a given EOF reconstruction with l spatial modes, we thus obtain both an extrapolated field as well as an estimate of the mapping uncertainty σP, representing the uncertainty in the extrapolation with the given data constraints. In addition, we derive a representational error estimate σP for the variance not resolved by the given EOF reconstruction due to truncation and use of just l modes (Sect. A4).

However, the choice of how many EOF modes to use for a given truncated EOF reconstruction, or at which level to truncate, is an arbitrary one often inspired by a certain threshold of the total variance explained (e.g. Kaplan et al.2000).

In a third step, we therefore use an ensemble approach to circumvent this problem: instead of a single EOF reconstruction with a fixed number of modes l, we use a series of EOF reconstructions from just one mode up to a maximum number of lmax modes (Sect. A5). This series is then combined by a weighted ensemble at each location with the individual EOF reconstruction's mapping uncertainties as weights (Eqs. A25 and A26; Sect. A6).

Note that the weights are spatially resolved and depend on the available data constraints, i.e. the ensemble weights provide for locality, which includes adaptation of the mapping's spatial scales to the data constraint density, thus providing for a more robust extrapolation and more realistic uncertainty estimates than with a fixed number l of EOF modes.

We thus obtain an ensemble mean pCO2 value xreconstr, the average number of modes l used in the ensemble of reconstructions at a given location, and an uncertainty estimate σreconstr (Eq. A28), which consists of the sample variance about the ensemble mean value, the ensemble-averaged mapping uncertainty σP (from each truncated EOF reconstruction with l from 1lmax), and the ensemble-averaged representational error σP (from each EOF reconstruction) (Sect. A6). All these quantities are obtained without gaps on the full spatial domain of the original model dataset X.

Finally, due to the high temporal dynamics of our Baltic Sea environment, we use an expansion of the EOF reconstruction approach to reconstruct not only the data value, but both the data value and a (short-term) linear trend, in order to temporally collate (temporally extended) observations into a time-coherent, synoptic picture (Sect. A7Elken et al.2019).

Figure 1Approach to build a regional pCO2 climatology of the Baltic Sea. (1) pCO2 data from an ecosystem model are mathematically analysed for dominant patterns of variability in a truncated EOF analysis. (2) Patterns of variability and (3) pCO2 observations of a given month are used to reconstruct Baltic Sea pCO2 distributions X, which are combined through an ensemble approach into (4) a monthly climatology of surface pCO2, including a pCO2 error estimate, and the (short-term) pCO2 trend, including the pCO2 trend error estimate. For illustration, time series are given for three sample locations for model pCO2 (green), pCO2 observations (black), and climatological pCO2 (red).

2.2 Baltic Sea pCO2 climatology

Given limitations of modelled data, we aimed to produce an observation-based pCO2 climatology. For this, we used (a) the above extrapolation approach with spatial patterns based on ecosystem model data and (b) observations of surface pCO2 from the Surface Ocean CO2 Atlas (SOCAT) to produce a monthly climatology of pCO2 as well as of the linear, short-term pCO2 trend to temporally collate pCO2 observations (Sect. A7) within a month (see Fig. 1 for a visualization of the approach).

2.2.1 Spatial patterns of variability

Spatial patterns of variability ei for our extrapolation approach are based on model data from a Baltic Sea setup of the Ecological ReGional Ocean Model (ERGOM version 1.2). This version of ERGOM includes a simple carbon cycle as described in Kuznetsov and Neumann (2013) with amendments to allow for non-Redfield stoichiometry as outlined in Neumann et al. (2022). In principle, any carbon-containing biogeochemical ocean model of choice can be used as a basis for the patterns of variability ei. Nonetheless, a better carbon representation in the model likely provides better-suited ei patterns of variability.

The model data variability characteristics are illustrated in the Appendix (Fig. A1). Less than 1 % of the locations show a temporal decorrelation scale ≤7 d, so that we chose a weekly aggregation of the model data for variability pattern extraction. For the climatology itself, we chose a more typical monthly resolution.

ERGOM has been shown to adequately mirror observations of the large-scale nitrate, phosphate, oxygen, and carbon distribution (Neumann et al.2022; Eilola et al.2011; Neumann et al.2015). However, shortcomings still exist in the exact magnitude and timing of phytoplankton carbon uptake and release throughout the seasons (Neumann et al.2022; Fig. 4).

From an ERGOM version 1.2 model run from 1948 to 2020, we used the last 20 years of modelled surface pCO2 from 2001 to 2020 averaged into 1044 weekly means. The model run has a horizontal resolution of 3 nautical miles, which yields a dataset X with 12 010 grid points m for the Baltic Sea area south of the Skaggerak (i.e. HELCOM subbasins 2 to 17, HELCOM Secretariat2017) and 1044 time steps n.

From these model data X, patterns of variability ei were extracted by EOF analysis (see Appendix A for details). The analysis of our model data retained 224 EOF modes with a minimal cross-validation error of 22.6 µatm against the model data and a total explained variance of 98.6 %. The spatial patterns ei have the same 3 nautical mile resolution as the model data.

Figure 2(a) Map of the Baltic Sea with the number of individual months with pCO2 observations available in SOCATv2022 between June 2003 and December 2021 (maximum 223 months). (b) Cruise tracks of BONUS INTEGRAL cruises on RV Aranda in February and March 2019 and RV Elisabeth Mann Borgese in May and June 2019.

2.2.2 Surface pCO2 observations

Surface pCO2 measurements in the Baltic Sea were obtained from SOCAT version 2022 (Pfeil et al.2013; Bakker et al.2016; Bakker et al.2022), which collects surface pCO2 data from underway observations of ships of opportunity (SOOPs) or research vessels. All observations are based on CO2 measurements in air equilibrated with sea surface waters (Körtzinger et al.1996; Pierrot et al.2009) with a typical accuracy of 2–5 µatm (in some cases ≤10µatm), which is indicated by the respective quality flags A–E (Pierrot et al.2009; Bakker et al.2016).

Here, Baltic Sea pCO2 data for the period June 2003 to December 2021 were used. They were from the ICOS-DE SOOP Finnpartner/Finnmaid line (Schneider et al.2006; Gülzow et al.2011) between Lübeck-Travemünde and Helsinki, which covers the southern and central Baltic Sea, or the ICOS-SE SOOP Tavastland line between Lübeck-Travemünde and Oulu or Kemi, which additionally covers the northern basins starting from 2019 (Fig. 2a). Additional surface pCO2 data originated from RV Aranda cruise ARA04/2019 in February and March 2019 and RV Elisabeth Mann Borgese cruise EMB214 (Rehder et al.2021) in May and June 2019 (Fig. 2b), which were carried out as part of the BONUS INTEGRAL project. Data processing and quality control followed the SOCAT guidelines. In total, data coverage in the southern and central Baltic Sea is high, with locally up to 189 out of 223 months covered (i.e. June 2003 to December 2021). Data in the northern Baltic Sea, however, are only available since February 2019, with observations during a total of 15 out of 35 months.

2.2.3 Monthly climatology construction

For every month t of the 189 months with observations, the observations were centred temporally on the 15th of each month as t (see Sect. A7), so that the extrapolation approach provides a field X with a spatial distribution of

  • the reconstructed pCO2 (xt,reconstr, Eq. A27) at t,

  • the pCO2 error estimate (σt,reconstr, Eq. A28) at t,

  • the (short-term) pCO2 trend at t,

  • an error estimate of the pCO2 trend at t, and

  • the average number of patterns ei used in the reconstruction (l, Eq. A25)

for each month with observations (e.g. Fig. 3 for Xs of May 2013 as an illustration).

Figure 3Monthly fields X of surface pCO2 reconstruction for May 2013. (a) Reconstructed pCO2 xt,reconstr with observation locations given as black dots. (b) pCO2 error estimate σt,reconstr with small values close to observations. (c) pCO2 trend and (d) its error estimate. (e) Observed (blue) and reconstructed pCO2 (red) against latitude as well as (f) their difference as the median (solid line) and interquartile range (pale shading). (g) Average number of patterns l of the reconstruction: above-average l indicates the use of high-order, smaller-scale patterns ei, notably close to data constraints from observations, whereas below-average l indicates areas where low-order, larger-scale patterns ei dominate the reconstruction ensemble (see Eq. A26). Compare Fig. A2 for a reconstruction without a temporal trend (Sect. A7).

The construction of a monthly pCO2 climatology for the entire Baltic Sea has its bottleneck in the temporal coverage of the northern basins, i.e. the Bothnian Sea, the Quark, and the Bothnian Bay. While the central and southern Baltic Sea has approximately 85 % of all the months covered since June 2003, observations in the northern basins only start in February 2019, with 43 % monthly coverage until the end of 2021. While the extrapolation approach provides fully filled maps of pCO2, including the northern basins, also for months pre-February 2019, they are associated with larger uncertainties, i.e. limited explanatory power, in the northern basins (e.g. Fig. 3b). To not create a wrong impression of seasonality or interannual variability, especially if the mapping uncertainty fields σt,reconstr are not respected by a casual user, we chose to provide a mean seasonal climatology Y. Here, the weighting of the climatology (Eq. 1; see below) ensures the preference of reconstructions with proper observation constraints (i.e. February 2019 and onwards for the northern basins) over reconstructions without them (i.e. pre-February 2019).

For a given location m, χm denotes the (reconstructed) time series of X (with size 1×189) and ym the time series of the mean monthly climatology Y (with size 1×12), respectively. The monthly means ym were then obtained by


where wm is the time series of inverse variance weights (Eq. 2) with size 1×189 at the given location m, Om a time series operator or matrix of size 12×189 that selects the monthly mean ym corresponding to the month of χm, ttref the time difference (in decimal years) between each of the 189 monthly X and a reference time, tref, and gm an extra degree of freedom to allow for a linear long-term trend in X for each location m (e.g. surface pCO2). As a reference time, the middle of 2013 was chosen, which closely corresponds to the mean date of all the observations.

Equation (1) represents a system of 189 linear equations with 13 degrees of freedom to calculate 12 weighted mean values ym and a long-term trend gm. Through the inverse variance weights (Eq. 2), monthly maps χm with better-constrained data, i.e. smaller variance of reconstruction σm,reconstr2 at the given location m (and for the given month), obtain preference in the weighted mean (Eq. 1).

When done for each point m, one obtains the mean seasonal climatology Y (with size m×12), centred on 2013, as well as a map of its mean linear (long-term) trend G (with size m×1) for each of the five fields X discussed.

2.2.4 Dataset description

The dataset (Bittig et al.2023) consists of two plain-text files. One contains the mean seasonal climatology Y and the other the mean linear (long-term) trend G. Based on the 3 nautical mile resolution of the original model data, both feature 12 010 grid points m for HELCOM sub-basins 2 to 17. For the months January through to December, the climatology file gives pCO2 (µatm), a pCO2 error estimate (1σ) (µatm), a (short-term) pCO2 trend for the 15th of each month (µatm d−1), a (short-term) pCO2 trend error estimate (1σ) (µatm d−1), and the average number of patterns (l) (see Figs. A3 and A4). The long-term trend file gives both the long-term linear trend G (µatm yr−1) (Fig. 7) and its error estimate (1σ) (µatm yr−1).

3 Results

3.1 Extrapolation approach

The mapping approach gives fully filled fields on the entire spatial domain from scattered observational data. The mapped pCO2 is in a similar range and frequency distribution to observations with improvements in terms of distribution and correlation over the original model pCO2 (Fig. 4a). The mapped fields are smooth and without spatial gaps or discontinuities. The pCO2 error estimate is reduced in the spatial vicinity of observations. In contrast, pCO2 error is markedly increased both near shore as well as in sub-basins not covered by observations, e.g. the northern basins or the Gulf of Riga for a SOOP Finnmaid-based reconstruction (e.g. Fig. 3b). Still, pCO2 values remain within reasonable margins even in those areas (e.g. Fig. 3a). We cannot observe a tendency of the mapping approach to give extreme values or outliers in the absence of observations (compare Fig. 4a). Rather, with few or no data constraints, variability pattern ei amplitudes in the ensemble tend towards zero, i.e. towards zero deviation, which is equivalent to the spatially resolved temporal mean pCO2 of the model (Appendix A).

Addition of a linear temporal trend to collate observations to a common time t helps to reduce the mismatch between temporally spread observations and a full-domain mapping of a dynamic coastal system (compare e.g. Figs. 3e, f and A2c, d, respectively).

However, for about half the grid points of the 189 monthly mappings, the magnitude of the short-term temporal pCO2 trend (e.g. Fig. 3c) at a given location m is insignificant, i.e. within its trend error estimate (e.g. Fig. 3d). At the same time, a considerable portion of the significant pCO2 trends is found at grid points m in the direct footprint of observations where the trend error estimates are small in the first place. This demonstrates that a monthly temporal resolution is sufficient for construction of a Baltic Sea pCO2 climatology with our approach.

3.2 Quality of interpolation and extrapolation

To assess the quality of the obtained fields, we consider (a) the mapped result against the original observations and (b) a comparison of mappings from concurrent subsets of observations.

For the first aspect, we consider the residual pCO2 against SOCAT observations for all 189 monthly mapped fields from the climatology construction (e.g. Fig. 3e, f). We use typical metrics for interpolation methods (correlation, bias, standard deviation). As the monthly fields X were constructed from those observations, this assesses primarily to which degree the data constraints were taken into account for the mapping, e.g. the balance between point-to-point reproduction and smoothing with the given patterns of variability ei.

Figure 4Histograms for (a) pCO2 observations (black) as described in Sect. 2.2.2, pCO2 reconstructed at the observation times and locations from the 189 monthly mappings (red), and model pCO2 extracted at the observation times and locations (green). Reconstructed and observed pCO2 data are highly correlated (R2), and they show a similar distribution. Correlation between modelled and observed pCO2 is weaker, and their distributions show noticeable differences. (b) The pCO2 difference between reconstructed (red) or modelled (green) pCO2 and observed pCO2. There is no bias in the reconstruction, while differences between model and observations show a much wider range than the reconstructed pCO2. (c) The pCO2 error estimates σreconstr of the reconstruction at the observation times and locations. (d) The ratio between the pCO2 difference and pCO2 error estimate, where a ratio ≤1 means that the observed pCO2 is within σreconstr of the mapped pCO2.


The comparison shows highly correlated pCO2 data (R2=0.95), a similar pCO2 distribution (Fig. 4a), and an unbiased mapping with a standard deviation of 26 µatm (Fig. 4b). This includes both misfits of the mapped fields as well as variations within observations of a given grid point location (compare the spread between observations and mapping in Fig. 3e). The error estimate is of a similar magnitude (Fig. 4c) and is well collocated. That is, residuals are within the error estimate in 68 % of the cases (Fig. 4d).

Figure 5Subsetting evaluations for May 2019 observations using 1× 1 subsets (top row; a–d), 2× 2 subsets (middle row; e–h), and 3× 3 subsets (bottom row; i–l), respectively, with maps (a, e, i) illustrating one example subsetting grid where observations (blue) in white boxes are used for pCO2 reconstruction and observations in grey boxes for evaluation. Histograms are for (b, f, j) the pCO2 difference between reconstructed and observed pCO2 within the evaluation boxes. (c, g, k) The pCO2 error estimate σreconstr of the reconstruction at the observation times and locations within the evaluation boxes. (d, h, l) The ratio between the pCO2 difference and pCO2 error estimate within the evaluation boxes, where a ratio ≤1 means that the observed pCO2 is within σreconstr of the mapped pCO2. The colours represent eight different 1× 1, 2× 2, and 3× 3 subsetting grids only offset spatially in latitude and longitude.

The second aspect, how data constraints in one area transfer to accurate predictions in another part of the domain, is more difficult to assess objectively. It requires the observational data to be split into a subset that is used for mapping and a subset that is deemed independent and used for evaluation. How this subsetting is done, however, directly influences the outcome of the evaluation. That is, while such evaluations may seem instructive and generalizable, they are highly dependent on design and thus carry some degree of arbitrariness.

To illustrate this, we consider a series of monthly mappings for May 2019, the month with the highest density of observations in our 189 monthly mappings. We use chessboard-like grids with alternating white and black boxes for subsetting, where observations in the white boxes are used for mapping and observations in the black boxes are used for evaluation. Grid box sizes vary from 1× 1, 2× 2, to 3× 3 subsetting grids (see Fig. 5a, e, and i for example grids). To check for dependence on a specific data distribution, the 1× 1, 2× 2, and 3× 3 subsetting grids were shifted spatially in latitude and longitude so that every grid size had eight different realizations, each with different sets of data in the mapping and evaluation subsets, respectively (Fig. 5).

For 1× 1 subsetting, the eight realizations show a relatively coherent picture: differences ΔpCO2 between mapped and observed pCO2 (Fig. 5b) are slightly larger (by 5–12 µatm) than for interpolated data (Fig. 4b) and are mostly unbiased. pCO2 error estimates are elevated (Fig. 5c vs. Fig. 4c), while the ratio of differences and error estimates, i.e. the collocation of bias and error estimate, shows a similar distribution for both extrapolation (Fig. 5d) and interpolation (Fig. 4d). With increasing spatial scales of the 2× 2 and 3× 3 subsetting, pCO2 differences and error estimates become larger, as would be expected (Fig. 5f, j and g, k). Encouragingly, however, the ratio of differences and error estimates preserves a similar shape (Fig. 5h, l). That is, the mapping approach seems to provide suitable mapping error estimates.

More concerning, however, are increasing differences between the eight realizations with increasing spatial scales. Especially for the 3× 3 subsetting, some realizations show a pCO2 bias, which additionally can be different between realizations (blue vs. orange realizations, Fig. 5j), as well as noticeable differences in pCO2 error estimates and thus mapped pCO2 (e.g. green realization, Fig. 5k). This illustrates a strong dependence on the specific choice of subsetting, i.e. which data are included in mapping and which data are excluded for validation.

We conclude that, with a given choice of subsetting, one implicitly chooses which statistical values one wants to get out. While our evaluation could be described as being well designed, it turns out to be starting-point-dependent. If a month with less data coverage were chosen, differences between realizations (i.e. which data end up in the training or validation subset) would be even larger. Therefore, based on the above illustration, any subsetting-based evaluation needs to be taken with a sufficient amount of caution, as the subsetting design will imprint itself on the outcome.

Figure 6Smoothed histograms for the five output fields of the climatology Y: (a) reconstructed pCO2 xt,reconstr, (b) pCO2 error estimate σt,reconstr, (c) pCO2 short-term trend, (d) error estimate in the pCO2 short-term trend, and (e) the average number of patterns l used in the reconstruction. The colour code indicates the respective month. pCO2 follows a typical seasonal cycle, while the pCO2 error estimate and the number of patterns show no apparent seasonal variations. pCO2 trends show higher magnitudes during spring (March and April) and late summer (August and September), which is mirrored by enhanced pCO2 trend error estimates in these months.


3.3 Baltic Sea pCO2 climatology

3.3.1pCO2 distribution

The mean seasonal pCO2 distribution xreconstr matches previously published seasonal cycles of surface pCO2 in the western and central Baltic Sea (Schneider and Müller2018): a strong spring bloom pCO2 drawdown occurs between March and May to a low around 200 µatm in May, some slight increase in June, and a second summer bloom low in July, followed by relaxation towards atmospheric levels in September and October and supersaturated levels during late autumn and winter, peaking around 500 µatm in December (Fig. 6a; Figs. A3 and A4, first column).

Regionally, the western basins lead the mean seasonal cycle, while the central Gotland Basin and the Gulf of Finland trail behind. The productive season is even shorter in the northern basins, with the major pCO2 drawdown in the Bothnian Sea occurring in June (Figs. A3 and A4, first column). Similarly, the seasonal amplitude is less pronounced in the western or northern basins than in the Gotland Basin and is most intense in the Gulf of Finland. For the northern basins, this is the first fully seasonal climatology from pCO2 observations.

The pCO2 error estimate σreconstr shows a minimum of 12 µatm. Close to observations, the pCO2 error estimate of the mean seasonal climatology is of the order of 20 µatm and as such close to the value of the individual mappings (compare Fig. 6b and Fig. 4c). Further away, σreconstr increases up to a level around 90 µatm (95th percentile), with a spatial preference for high σreconstr in the northern basins as well as in the Gulf of Riga, the Oder Bight, and other sheltered, near-shore areas (Figs. A3 and A4, second column).

3.3.2pCO2 short-term trends

In general, the short-term pCO2 trends follow the mean seasonal pCO2 cycle with on average negative trends, i.e. decreasing pCO2, during winter and spring. Positive trends prevail during late summer and autumn (Fig. 6c), with the strongest trend magnitudes occurring in the northern Baltic proper and the Gulf of Finland (Figs. A3 and A4, third column).

However, in 8 out of the 12 months, the majority of estimated trends are insignificant. That is, more than 70 % are smaller than their error estimate, indicating that inclusion of a short-term trend in the mapping may not be required for these months. Only in March and April as well as in August and September are the majority of pCO2 trends significant (despite elevated pCO2 trend error estimates occurring at the same time in some areas; see Fig. 6d). That is, significant short-term pCO2 trend magnitudes fall together with the strong springtime pCO2 drawdown as well as with autumnal mixed-layer deepening and entrainment of high-pCO2 waters (Fig. 6; more in Jacobs et al.2021).

Like for the pCO2 distribution, trend error estimates are higher in the northern basins as well as the Gulf of Riga, Oder Bight, and other sheltered, near-shore areas (Figs. A3 and A4, fourth column).

3.3.3 Number of patterns or mapping scales

The average number of patterns l per month can be seen as a qualitative indicator to show preference for reconstructions with larger or smaller spatial scales, respectively, in the weighted mean (Eq. A25) of the reconstruction.

For the climatology Y, l shows a maximum around 105 patterns (Fig. 6e). This level is at about 45 % of the maximum number of patterns, i.e. indicating only a slight preference for reconstructions with larger-than-average scales in the weighted mean. However, a notable fraction with a higher number of patterns l and a peak around 140 patterns exists (Fig. 6e). These are located in the vicinity of the observations, where the preference in the weighted mean is for reconstructions with stronger small-scale features (see e.g. Fig. 3g).

There is no seasonal imprint on the average number of patterns l, which is in contrast to the pCO2 and pCO2 short-term trend distributions (Fig. 6).

3.4 Long-term pCO2 trend

The mean climatology Y is constructed from the 189 monthly mappings X and a linear long-term trend G (see Eq. 1 and Sect. 2.2.3). For pCO2, the analysis shows a significant long-term pCO2 increase of +1.4±0.5µatm yr−1 (1σ) in the southern Baltic Sea from the Great Belt to the Bornholm Basin and southern parts of the Gotland Basin (Fig. 7). A similar increase (+1.5±0.7µatm yr−1) is obtained in selected regions of the northern Baltic proper, the Åland Sea, and the Gulf of Finland. A positive trend in pCO2 is also visible in coastal areas along the Finnish coast in the northern basins as well as in the Gulf of Riga. However, multi-year observations in those areas are unavailable to support a long-term pCO2 trend (see Sect. 2.2.2; Fig. 2). In all the other areas, the data available do not indicate a significant long-term trend, i.e. one that is outside the 68.3 % (1σ) confidence bound. This includes in particular the western and eastern Gotland Basin but also the northern basins (Fig. 7), where the observational time series is likely too short to allow for the detection of a significant long-term trend (Fig. 2).

Figure 7Long-term trend G of surface pCO2 from the monthly mappings and the climatology construction (Eq. 1). Areas where the long-term trend is not significant are cross-hatched. In the southern Baltic Sea, a significant increase of +1.4µatm yr−1 is obtained from the observations. Selected parts of the northern Baltic proper, the Åland Sea, and the Gulf of Finland give a similar increase of +1.5 µatm yr−1, while the western and eastern Gotland Basin show no significant long-term trend.

4 Discussion

4.1 Extrapolation approach

The extrapolation approach uses an EOF analysis of the data covariance Q as a foundation. When working with an EOF reconstruction, one needs to bear in mind the purely mathematical nature of the EOF modes' ei construction. They are built by (1) maximizing the (partial) variance they explain and by (2) being orthogonal to all previous EOF modes (Preisendorfer1988; Monahan et al.2009; Dommenget2015). They may therefore indicate some causal connection within the original data X, which requires confirmation by other means. They may, however, just as well group some portion of (partial) variance in one part of the domain with some portion of (partial) variance in another part of the domain just to mathematically maximize the amount of (partial) variance of the given EOF mode ei. This can give rise to apparent EOF-mode “teleconnections”, where one part of the data or spatial domain is seemingly tightly coupled to another part of the data, e.g. for the second EOF mode ei=2, which always has a dipole pattern. Often, there is a temptation to attribute such couplings or “teleconnections” to a driving mechanism. However, given its origin, patterns of EOF modes should be interpreted with great caution (and only with ancillary, supporting data) and are best seen to be of a mathematical rather than mechanistic nature (Monahan et al.2009; Dommenget2015). We therefore did not try to assign physical mechanisms or drivers to specific EOF patterns ei in our work.

The use of EOF modes as a basis for extrapolation ensures that the extrapolated map covers the full spatial domain, that it is gap-free, and that it is discontinuity-free.

A key aspect of truncated EOF reconstructions is the number l of significant modes included. Our cross-validation approach provided a relatively large number of lmax=224 modes. The relatively large number of modes is a prerequisite for a small representational error P, i.e. for the covariance not resolved by the collection of truncated EOF modes, which gives a lower bound on the uncertainty of the mapping σreconstr (Eq. A28). In our case, the average σP when using all 224 patterns ei amounts to 14.7 µatm. If only the first 50 patterns ei were included, the average representational error would be as high as 29.1 µatm.

A reconstruction with a small number of modes provides for a more uniform, large-scale homogeneous field, where gaps in the data are filled by the large-scale picture. However, such reconstructions may lack the flexibility to reproduce real features of the observations, e.g. through too strong smoothing. Conversely, a large number of modes provides for a fine, small-scale field with high flexibility. However, features in some areas without nearby observations may be badly constrained with the risk of “ghost” signals.

By using an ensemble of reconstructions that cover the entire range of 1…lmax modes and by using the mapping variance σP2 as weights (Eqs. A25 and A26), we find an optimal trade-off in our mapping approach, where both aspects are balanced. In this way, the scales used in the mapping are adjusted spatially according to the distribution of observations as constraints.

The cost function of the mapping approach minimizes the residual between observations and mapped data (Eq. A19). In this way, a synoptic pCO2 mapping is obtained. If observations, however, are temporally extended compared to the system's temporal dynamics, the pCO2 mapping can be distorted and cannot represent a synoptic picture, depending on the spatial and temporal pattern by which observations are obtained (Elken et al.2019).

The introduction of a temporal trend at each location (Sect. A7) as an additional degree of freedom reduces such distortion caused by sampling. It both improves the fit to the data (compare Fig. A2c, d vs. Fig. 3e, f) and provides extra information.

  • For a given mapping, a strong or weak trend indicates where temporal dynamics are high or low and informs on where the frequency of observations should be enhanced or not, respectively.

  • For a time series of mappings, information on both the value and trend allows for a more accurate interpolation by a cubic Hermite spline (Appendix B) compared to a standard, linear interpolation of the values alone (e.g. Fig. 8).

Figure 8Difference between monthly point-by-point interpolation (green) or interpolation with a trend (orange) in a dynamic coastal system. Example data are from the northern Baltic proper (approximately 5845 N, 02100 E), with pCO2 observations X shown in black. They are the basis for monthly mapped fields X of surface pCO2 (blue) as well as for a linear (short-term) pCO2 trend (red). Compared to a linear interpolation between monthly pCO2 values (green), interpolation including the trend information (orange) gives a better reproduction of the observed dynamics, particularly in spring and autumn. A cubic Hermite spline was used for the calculation (Appendix B).


For our mapping of surface pCO2 in monthly intervals, we conclude that the inclusion of a temporal trend in the mapping is required, particularly for spring and autumn. For months without strong trends, we do not see adverse effects in the mapped pCO2 compared to an extrapolation without inclusion of a temporal trend.

The dynamics of the pCO2 field are prescribed by the model data covariance matrix Q (Fig. A1a). The closer the model and real-world variability match, the easier it is for the mapping to reflect reality and to provide a most realistic picture. Nonetheless, also for a model dataset X that may lack some real-world features or their magnitude, the observational constraints provide an improvement in the mapped field compared to the original model (Fig. 4).

The mapping error estimates are elevated where observations are scarce or dynamics are high to start with. For our pCO2 mapping, the model covariance Q is the origin of a pronounced coast–basin difference in the pCO2 variance caused by both physical and biological drivers in the model. This prescribed data covariance distribution is retained in the mapping error estimates σreconstr. For example, they are enlarged in dynamic river plumes, in the sheltered, shallow Gulf of Riga, or in other near-shore areas, unless there are observations to constrain dynamics and thus reduce σreconstr. In regions with few observations, σreconstr approaches the prescribed data variance from Q (Fig. A1a).

We do not observe a seasonality in the number of patterns l of the mapping (Fig. 6e). At the same time, the pCO2 value varies strongly between seasons. In addition, there is a slight preference for observations to occur in spring, summer, and autumn over winter months. However, there is no discernible difference in the spatial coverage of pCO2 observations between seasons (data not shown). The aseasonality of l can be explained by the “large-scale” mapping error σP, used for the ensemble weights (Eq. A26) to only depend on the distribution of samples Ht and their respective observational error estimates S. Therefore, the extrapolation approach does not discriminate or distinguish by (pCO2) value, and there is no seasonal imprint in the number of patterns.

Figure 9Mean monthly pCO2 distribution for 1 month per season (left to right) for different climatologies: Parard et al. (2016) (a–d), Becker et al. (2021) (e–h), and this work (i–l). The first two climatologies are proxy-based, where artefacts of the proxy directly translate to pCO2, e.g. for wintertime satellite retrievals (a, d) or the subtle 5× 4 gridding in Becker et al. (2021) (e–h). Our approach uses observed pCO2 data directly for mapping, which gives fewer artefacts but is only applicable when pCO2 observations are available.

4.2 Comparison with other pCO2 mapping approaches

SOCAT observations have been used previously to build surface pCO2 climatologies of the Baltic Sea (see Fig. 9 for examples). They rely on observation-driven extrapolation approaches, which can be grouped into different categories: (a) one where sparse data points are extrapolated based on some statistic metric of influence, e.g. a spatial decorrelation length or timescale, and (b) one where parametrizations with proxy data in combination with filled fields of those proxies are used for extrapolation (Rödenbeck et al.2015).

Our mapping approach belongs to the first category. For a given month or time window, we use the available pCO2 data directly and one to one without an added transformation to inform the extrapolated pCO2 map. As a (spatial) metric of influence, the variability patterns are used. This gives a direct link between observations of a given month or time window and the mapped field. Conversely, if there are no pCO2 observations in a given time window, this directly translates to a (time) gap in the mapped fields for any approach of the first category.

Most previous pCO2 mappings in the Baltic Sea belong to the second category (e.g. Parard et al.2016; Becker et al.2021). In a first step, they use all available pCO2 data together with collocated proxies (e.g. location, sea surface temperature, chlorophyll-a concentration, water depth, distance to the coast, season) to establish a relationship or parametrization between proxies and pCO2. In a second step, distributions of those proxy parameters are used to establish a pCO2 distribution in time and space. Here, pCO2 data of a given month do not directly inform the extrapolated pCO2 map but only indirectly through the established relationship using all the data. Conversely, if there are no pCO2 observations in a given time window but proxy data are available, they can be used to estimate a pCO2 map nonetheless. That is, (temporal) gap filling is possible with the two-step procedure of approaches of the second category.

As a drawback, mapped pCO2 fields of the second category rely on the validity of the proxy relationship and, equally important, on good proxy input data. In some cases, e.g. satellite-based sea surface temperature or chlorophyll-a concentration, proxies may be (seasonally) prone to artefacts, e.g. due to stronger cloud cover in winter months. Such artefacts then directly translate into mapped pCO2 fields and may create unrealistic patterns or out-of-range values (e.g. Fig. 9a, d; Parard et al.2016). A mapping approach of the first category such as the one presented here, which uses the observations directly, is not affected by external-source data quality (Fig. 9i–l).

Wesslander et al. (2010)Wesslander et al. (2010)Laruelle et al. (2018)Schneider and Müller (2018)Becker et al. (2021)Becker et al. (2021)

Table 1Summary of long-term pCO2 trends for different areas and time periods in the Baltic Sea as obtained from different studies.

Download Print Version | Download XLSX

4.3 Long-term pCO2 trend

The long-term trend of surface pCO2 in Baltic Sea surface waters has been studied before (Table 1): Wesslander et al. (2010) describe an increase in monthly pCO2 (approximately +20µatm yr−1) for the 1993–1998 period in the eastern Gotland Basin, which seems mostly driven by less intense summer pCO2 minima (from approximately 50 µatm in 1993 to approximately 250 µatm in 1998). For the longer period 1993–2009, they do not detect a significant trend. While Wesslander et al. (2010) use pCO2 calculated from pH and alkalinity, all further studies are based on the same surface pCO2 observations archived in SOCAT (Bakker et al.2016), albeit using different methods and time extents.

Laruelle et al. (2018) found an increase of +2.9±2.4µatm yr−1 (1σ) for wintertime pCO2 from 1995 to 2011 in the southern and central Baltic Sea. At the same time, they see a similar spatial gradient to our work (Fig. 7) with a stronger increase in the south-west and a zero or even negative trend in the northern Gotland Basin.

Schneider and Müller (2018) describe an increase of +4.6 to +6.1µatm yr−1 (1σ between 0.6 and 1.5 µatm yr−1) for different areas for the 2008–2015 period. However, their analysis did not deseasonalize the observations prior to linear trend analysis. During their analysed period, there is an increase in observations during autumn and winter (with typically high pCO2) and a decrease in early spring observations (with typically low pCO2), while late-spring and summer observations have a similar data coverage over the years. Their relatively high pCO2 trend estimate therefore seems to be partially caused by a seasonal shift in data availability that was left unaccounted for in the analysis.

Becker et al. (2021) derived trends of +2.05±0.12µatm yr−1 for the southern Baltic Sea (south of 56 N) and +1.84±0.21µatm yr−1 for the central Baltic Sea (between 56 and 61 N) for the period 1995–2016. However, they observe a stronger trend in earlier years (starting in the 1990s) than for later periods (starting in the mid-2000s).

Our analysis covers the more recent period 2003–2021 and gives significant trends of +1.4±0.5µatm yr−1 (1σ) in the southern Baltic Sea (south of 56.5 N) and +1.5±0.7µatm yr−1 in the Åland Sea, parts of the northern Baltic proper, as well as the Gulf of Finland (between 59 and 61 N). The central Baltic Sea and the northern basins themselves show no significant trend. The positive trend (+1.2±0.6µatm yr−1) in coastal waters of the northern basins along the Finnish coast as well as in the Gulf of Riga (Fig. 7) currently has no support from direct observational constraints in those areas (Fig. 2). In contrast, it likely originates from high similarity to other coastal waters (e.g. along the southern Baltic shore) in the model data variability analysis and thus in the variability patterns ei used for mapping. As such, the pCO2 increase may be realistic but lacks field data support.

Together with the literature, our results seem to indicate a reduction in overall surface pCO2 increase in the Baltic Sea during the past decades or even its complete absence like in the central Baltic Sea (Fig. 7) in recent periods. While atmospheric pCO2 levels are still rising (approximately +2.4µatm yr−1 in the past decade; Lan et al.2022), this has implications for the CO2 balance between the atmosphere and the Baltic Sea. This supports previous claims of coastal seas becoming a less intense source of CO2 or eventually turning into (an increasing) CO2 sink (Laruelle et al.2018).

6 Conclusions

In this work, we developed an extrapolation approach that combines two worlds: models, specifically the distribution and connectivity that exist in model data variability; and observations in that they provide constraints of the real-world picture.

The most notable features of the approach are that it does not tend to give extreme, out-of-range values even with few data constraints and that it provides local error estimates, which reflect both underlying variability, e.g. coast–basin gradients, and observational data constraints. We consider of particular merit the fact that the extrapolation scheme adapts its spatial scales to the number of observations in a certain area, leading to a sound representation of less uncertainty where more data are available.

Used together with high-quality surface pCO2 data from SOCAT, we established a climatology that covers the entire Baltic Sea. Given the present data scarcity in the northern basins of the Baltic Sea, a mean pCO2 climatology is what is reliably achievable today. It can serve as a seasonal baseline of regional pCO2 evolution that enables assessment of interannual variations, e.g. with respect to timing or amplitude. With sustained and/or enhanced observations in such data-poor regions, a fully monthly resolved pCO2 dataset will become realistic in the future. To this end, SOOP Tavastland and SOOP Silja Serenade (crossing between Stockholm and Helsinki) provide a positive outlook for uncertainty reduction. Both SOOP lines were recently adopted by ICOS.

Finally, our extrapolation approach as well as the method to establish a climatology are not limited to pCO2 or the Baltic Sea. Instead, they are transferable to other areas and parameters where the present work can serve as a template.

Appendix A: Extrapolation approach

A1 Notation and mathematical background

To represent a spatial dataset at a given time t with m spatial points, we use the data vector xt with size m×1. For a spatial time series of n times, we use the data matrix X with size m×n, where the n columns correspond to n data vectors xt. Data in xt and X are the deviation from the space-dependent temporal mean xm. The empirical data covariance matrix QXX with size m×m is given by

(A1) Q XX = 1 n - 1 XX T .

The singular value decomposition (SVD) or empirical orthogonal function (EOF) decomposition of the data matrix X decomposes X into a series of orthonormal factors or principal components according to

(A2) X = E Σ A T ,

where E is a m×m orthonormal matrix, Σ a m×n rectangular diagonal matrix, and A a n×n orthonormal matrix. With the above convention on m and n, the column vectors ei (with size m×1) in E represent spatial patterns, while the column vectors ai (with size n×1) in A represent amplitude time series. They are also called left-singular vectors and right-singular vectors, respectively. The diagonal elements of Σ are the so-called singular values si of X. Equation (A2) can be written in vector form as

(A3) x t = E Σ a t = E α t ,

where αt=Σat is the so-called dimensional amplitude at a given time t.

Based on a SVD of X (Eq. A2), the m×m matrix B=XXT can be expressed as

(A4) B = XX T = E Σ A T A Σ T E T = E Σ Σ T E T = E Λ ̃ E T ,

where Λ̃ is a diagonal matrix whose diagonal elements are λ̃i=si2. Equation (A4) states the eigendecomposition of B. Since matrix B is proportional to the covariance matrix Q (Eq. A1), we can state the eigendecomposition of the data covariance matrix Q as

(A5) Q = E Λ E T ,

with corresponding eigenvalue problem

(A6) QE = Λ E


(A7) Λ = 1 n - 1 Λ ̃ .

The spatial patterns ei with size m×1 are eigenvectors of the data covariance matrix, and the diagonal elements λi of Λ the corresponding eigenvalues λi, which give the amount of variance associated with each ei. From the set of eigenvectors ei and their corresponding amount of variance λi, the spatial distribution xt at a given time t can be reconstructed by determination of a suitable amplitude vector at (Eq. A3) (see textbooks of linear algebra or statistical data analysis, e.g. Dommenget2015).

A2 Truncation of eigenvalue modes

For practical purposes, reconstruction often uses only the first l leading eigenvectors instead of all m eigenvectors ei (1lm). That is, E, Λ, and A are split into a matrix E, Λ, and A of size m×l, l×l, and n×l, respectively, which contain the first l leading eigenvector modes, and a matrix E, Λ, and A, which contain the remaining ml eigenvector modes. Equations (A2), (A3), and (A5) thus become

(A8)X=EΣAT+EΣAT,(A9)xt=EΣat+EΣat, and(A10)Q=EΛET+EΛET=P+P.

The leading eigenvector modes ei cover most of the data variance and often represent more “large-scale” spatial patterns, while the modes higher than l contain only a small amount of the data variance and can be seen as “small-scale” spatial patterns (Lorenz1956; Kaplan et al.2000; Dommenget2015). Correspondingly, the first and second terms in Eq. (A10) give the large-scale and small-scale variances P and P, respectively, whereas in Eq. (A8) and its vector form Eq. (A9) they give the large-scale and small-scale contributions, respectively, to the spatial data vector xt.

This split can also be interpreted as decomposition into a “signal” part and a “noise” part. Equations (A8)–(A10) with only the l leading modes (i.e. with only the first term) are called a truncated EOF reconstruction in which dimensionality is reduced from m to l modes. Higher-order modes (in E and Λ, i.e. the second term) are assumed to be dominated by noise and error and are discarded in a truncated eigenvector or EOF reconstruction (Lorenz1956; Kaplan et al.2000).

A3 Extrapolation from observational data

To represent a set of k observations at a given time t, we use the observational data vector yt and the observational error σt, both with size k×1. The transfer operator Ht is then a “sampling” or observation function that depends on the spatial configuration of the observation points and samples from the data vector xt of size m×1 so that the result x^t,

(A11) x ^ t = H t x t ,

has the same size k×1 as the observations. To be comparable with x^t, the observational data yt are expressed as a deviation from the space-dependent temporal mean xm too (Elken et al.2019).

The eigenvalue reconstruction in truncated form (Eq. A9) then becomes

(A12) H t x t = H t E Σ a t .

Minimization of a suitable cost function Q, e.g. for least-squares optimization as in Elken et al. (2019),

(A13)Q(αt)=HtEαt-ytTHtEαt-yt,with (A14)αt=Σat,

yields a system of l linear equations with l unknowns (Elken et al.2019):

(A15)Dαt=h,with (A16)D=ETHtTHtE and(A17)h=ETHtTyt,

which provides the observation-based eigenvector amplitudes αt or at (Eq. A14), respectively.

With Eq. (A9) in truncated form and Eq. (A14), the data vector xt can be obtained, i.e. an extrapolation from k×1 observational data points yt to the entire spatial domain of m×1 data points xt with the help of the truncated eigenvalue decomposition of m×l eigenvectors and l eigenvalues ei and λi, or E and Λ, respectively.

A4 Error considerations

Both “observational error” and “representational error” impact the determination of the eigenvector amplitudes at. Observational error σt includes both instrumental and sampling error and can be used to limit the impact of observational data constraints yk (Kaplan et al.2000). Representational error is the error made by truncation, i.e. by using only the l largest, leading eigenvectors and neglecting the remainder of the spectrum, which can be derived from the “small-scale” covariance P contribution (Kaplan et al.2000).

The error covariance matrix is the sum of the observational data error covariance S and a term that accounts for the covariance created in the truncated modes E not resolved by the analysis:

(A18) R = S + H t E Λ E T H t T = S + H t P H t T ,

where S is a diagonal matrix with the observations' variance σt2 on the diagonal and Ht is the aforementioned sampling operator (Kaplan et al.2000).

By addition of constraints on the cost function Q (a) to limit the estimated eigenvector amplitudes based on the amount of variance they explain in the original eigenvalue decomposition and (b) that at determination is limited by the uncertainty of observations and from truncation (see above), Eq. (A13) is modified to (Kaplan et al.2000)

(A19) Q ( α t ) = H t E α t - y t T R - 1 H t E α t - y t + α t T Λ - 1 α t ,

and the solution Eqs. (A15)–(A17) thus become (Kaplan et al.2000)

(A20)Dαt=h,with (A21)D=ETHtTR-1HtE+Λ-1 and(A22)h=ETHtTR-1yt.

On the other hand, the large-scale portion of the data covariance Q (Eq. A10) can be obtained from this solution according to Kaplan et al. (2000):

(A23) P = ED - 1 E T .

P can be seen as the mapping uncertainty of the extrapolation.

Note that the calculation of P only requires the sampling operator Ht, i.e. where samples are present, as well as an observational error estimate S, but not the actual observations or amplitudes at these locations. In the absence of any observation, Ht contains only zeros and Eqs. (A21) and (A23) then simplify to D=Λ-1 and P=EΛET (compare Eq. A10), respectively. Addition of observational constraints thus reduces the mapping uncertainty P.

For our purposes, we assume that off-diagonal elements in the covariance P are negligible, i.e. that the truncated eigenvector modes E are sufficiently small-scale and that their variances Λ are small enough that they do not show a noticeable covariance contribution (Eq. A18). We therefore approximate P by a diagonal matrix whose diagonal elements are obtained from rearrangement of Eq. (A10):

(A24) diag ( P ) = diag ( Q ) - diag ( E Λ E T ) ,

where Q is accessible from the data X (Eq. A1) and the second term from the truncated eigenvalue decomposition.

A5 Number of significant modes

A critical aspect before any extrapolation from observations is how many modes ei to use, or – in other words – where to truncate reconstruction, i.e. how to distinguish between modes representing desired variability and modes representing “noise”. Often, an arbitrary threshold of the total variance covered by the l leading modes is chosen (e.g. Kaplan et al.2000).

Here, we apply the DINEOF (Data Interpolating Empirical Orthogonal Functions) variant of SVD or EOF decomposition of the dataset X to find the lmax leading eigenvectors ei. The number lmax of significant modes is determined by a cross-validation procedure from the data X (Beckers and Rixen2003), and we use this lmax as the upper bound for l in our ensemble approach (see below).

A6 Ensemble approach for robustness and locality

Depending on the spatial distribution or clustering of observations yt, not all amplitudes of the lmax spatial eigenvectors ei may be well constrained. In some configurations, e.g. with few or clustered observations, where Ht selects only a few of the m spatial data points, a reconstruction with fewer modes than lmax may give a solution with a smaller mapping uncertainty P than the truncated reconstruction with all lmax modes ei.

Figure A1Model data X characteristics: (a) pCO2 variability shown by the square root of the model data covariance matrix Q's diagonal elements, i.e. the standard deviation of the pCO2 data X at each location. (b) pCO2 variability shown by the temporal decorrelation timescale, i.e. the time (d) at which the lagged autocorrelation drops below a threshold of 0.63 correlation. Less than 1 % of the locations have a temporal decorrelation scale of ≤7 d, while about 25 % are within ≤30 d. To adequately reflect the pCO2 data dynamics in the variability pattern extraction (Sect. 2.2.1), a weekly aggregation of the model data X was therefore chosen compared to e.g. daily or monthly model output.


Figure A2Same as Fig. 3 but without temporally collating the observations to the middle of the month (details in Sect. A7). The general picture of (a) reconstructed pCO2 xt,reconstr, (b) pCO2 error estimate σt,reconstr, and (e) average number of patterns l of the reconstruction is comparable to the temporally coherent reconstruction (Fig. 3a, b, g, respectively), but (c) mapped pCO2 (red) gives only one uniform value throughout the month, and (d) differences to observations are therefore increased (compare Fig. 3e, f).

Figure A3Monthly climatology Y of the surface pCO2, pCO2 error estimate, pCO2 trend, pCO2 trend error estimate, and average number of EOF patterns in the ensemble (left to right) for the months January–June (top to bottom).

Figure A4Same as Fig. A3 but for July–December.

We therefore use an ensemble approach over the truncated reconstructions, where we vary l from just 1 mode to lmax modes:

(A25) X = l = 1 l max w l X l l = 1 l max w l ,

where the weights wl (with size m×1) are based on the diagonal elements of P, i.e. the variance σP,t2 or mapping uncertainty of the extrapolation:

(A26) w l = 1 σ P , t 2 m σ P , t 2 .

As the mapped variance diag(P) varies in magnitude with the number of eigenvectors included, the weights wl for a given number of modes l are normalized by the total variance over all m data points, which gives a normalized spatial weight vector wl (Eq. A26). For a given spatial data point m, preference is thus given to reconstructions where it is among the better-constrained ones for the given number of modes l.

For 𝓧l, each using only the first l (lmax) eigenvector modes (Eq. A25), we insert the following.

  • The reconstructed data vector xt (from Eq. A9 in truncated form) to yield as X the ensemble mean xt

  • The term xt-xt2 to yield as X the biased weighted sample variance about the ensemble mean σt,mean2

  • The mapping variance σP,t2 (i.e. the diagonal of P; Eq. A23) to yield as X the remaining mean “large-scale” variance σP,t2

  • The approximation of the truncated variance σP,t2 (i.e. the diagonal of P; Eq. A24) to yield as X the mean “small-scale” variance σP,t2

  • The number of modes l itself to yield as X the average number of modes l used in the reconstruction as a spatial vector of size m×1 each, i.e. on the full spatial domain

The reconstructed data vector xt,reconstr and the total variance of reconstruction σt,reconstr2 are given by


A7 Temporal coherence

So far, we have considered the observations yt and the reconstructed data vector xt to originate and be valid for a given time t, respectively. However, in practice, observations y will spread over a certain time extent Δt, e.g. days for a single SOOP transect to weeks for a basin-covering cruise. If this time extent (e.g. duration of a cruise) is comparable to the timescale of the system's dynamics (e.g. spring surface warming or bloom onset), distortions in the reconstructed fields xt may occur (Elken et al.2019) as different “states” of the system could be sampled at different space–time locations.

To collate temporally extended observations into a common synoptic reconstruction without artefacts, Elken et al. (2019) added a linear temporal (short-term) trend for each eigenvector mode to the optimization, thus collating observations made at different observation times t to a common reference time instance t.

To this end,

  • the time difference Δt between observation and reference times is introduced:

    (A29) Δ t = t - t .
  • The sampling operator Ht is extended to not only sample from the eigenvector ei the value χ corresponding to the observation, but also the value times the time difference, χΔt, corresponding to a (short-term) time gradient. HtE thus becomes a matrix of size k×2l.

  • The l eigenvalues are replicated with a constant scaling factor, so that Λ becomes a 2l×2l diagonal matrix. That is, both the value (first l eigenvalues) and its time difference (second l eigenvalues times the scaling factor) follow the same eigenvector ei order and importance, with the scaling factor to determine the balance between both. Based on preliminary tests, we used an empirical scaling between the spatial value pattern and temporal trend pattern of 9×10-4, equal to a ratio of approximately 1 µatm per 33 d.

  • The eigenvalue amplitude vector at is doubled in size to 2l×1, where the first l values continue to represent the eigenvectors' ei amplitudes, while the second l values represent their temporal derivative.

Thus, the system of linear equations (Eqs. A20A22) becomes a system of 2l equations with 2l unknowns, and the reconstruction (Eq. A9) provides both an extrapolation to the entire spatial domain of m×1 data locations at reference time t as well as a temporal trend of change in the m×1 data locations at reference time t (Elken et al.2019).

Appendix B: Seasonal pCO2 from value and linear trend climatology by cubic Hermite spline calculation

With a pCO2 value x and linear pCO2 trend d given for each month of the climatology, the seasonal evolution of pCO2 at a given location m can be calculated as a cubic Hermite spline according to

(B1) x t = h 00 ( γ ) x k + h 10 ( γ ) d k ( t k + 1 - t k ) + h 01 ( γ ) x k + 1 + h 11 ( γ ) d k + 1 ( t k + 1 - t k ) ,


(B2)h00(γ)=2γ3-3γ2+1,(B3)h10(γ)=γ3-2γ2+γ,(B4)h01(γ)=-2γ3+3γ2,(B5)h11(γ)=γ3-γ2, and(B6)γ=t-tktk+1-tk.

Here, tk and tk+1 are the times of the climatological month before and after the time t of interest, and γ gives the normalized time. h(γ) are Hermite basis functions and xk and xk+1 as well as dk and dk+1 are the associated monthly pCO2 value and pCO2 trend, respectively, which determine the seasonal pCO2 xt at time t.

Author contributions

HCB and GR conceived the study. The method was developed by HCB with important input by EJ and GR. TN performed the model simulations and HCB the analysis. HCB led the manuscript writing with contributions by all the co-authors.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.


Efforts by Anna Willstrand-Wranne (SMHI) and Tobias Steinhoff (GEOMAR/NORCE) around SOOP Tavastland observations are highly acknowledged. The Surface Ocean CO2 Atlas (SOCAT) is an international effort endorsed by the International Ocean Carbon Coordination Project (IOCCP), the Surface Ocean Lower Atmosphere Study (SOLAS), and the Integrated Marine Biosphere Research (IMBeR) programme to deliver a uniformly quality-controlled surface ocean CO2 database. The many researchers and funding agencies responsible for the collection of data and quality control are thanked for their contributions to SOCAT. The authors want to thank two anonymous reviewers and the handling editor for their valuable and constructive comments which improved the manuscript.

Financial support

This work was funded by the projects C-SCOPE (grant no. 03F0877D) and BONUS INTEGRAL (grant no. 03F0773A), which received funding from BONUS (Art. 185), funded jointly by the EU, the German Federal Ministry of Education and Research, the Swedish Research Council Formas, the Academy of Finland, the Polish National Centre for Research and Development, and the Estonian Research Council. Computational power was provided by the North German Supercomputing Alliance (HLRN). Measurements on SOOP Finnmaid were temporarily (2009–2011) funded by the German Federal Ministry of Education and Research in the framework of BONUS projects Baltic-C (grant no. 03F0486A), Baltic Gas (grant no. 03F0488B), and, since 2012, ICOS-D (grant nos. 01LK1101F and 01LK1224D). Additional support came from the JERICO-S3 project funded by the European Commission's H2020 Framework Programme under grant agreement no. 871153.

The publication of this article was funded by the Open Access Fund of the Leibniz Association.

Review statement

This paper was edited by Giuseppe M. R. Manzella and reviewed by two anonymous referees.


Bakker, D. C. E., Pfeil, B., Landa, C. S., Metzl, N., O'Brien, K. M., Olsen, A., Smith, K., Cosca, C., Harasawa, S., Jones, S. D., Nakaoka, S., Nojiri, Y., Schuster, U., Steinhoff, T., Sweeney, C., Takahashi, T., Tilbrook, B., Wada, C., Wanninkhof, R., Alin, S. R., Balestrini, C. F., Barbero, L., Bates, N. R., Bianchi, A. A., Bonou, F., Boutin, J., Bozec, Y., Burger, E. F., Cai, W.-J., Castle, R. D., Chen, L., Chierici, M., Currie, K., Evans, W., Featherstone, C., Feely, R. A., Fransson, A., Goyet, C., Greenwood, N., Gregor, L., Hankin, S., Hardman-Mountford, N. J., Harlay, J., Hauck, J., Hoppema, M., Humphreys, M. P., Hunt, C. W., Huss, B., Ibánhez, J. S. P., Johannessen, T., Keeling, R., Kitidis, V., Körtzinger, A., Kozyr, A., Krasakopoulou, E., Kuwata, A., Landschützer, P., Lauvset, S. K., Lefèvre, N., Lo Monaco, C., Manke, A., Mathis, J. T., Merlivat, L., Millero, F. J., Monteiro, P. M. S., Munro, D. R., Murata, A., Newberger, T., Omar, A. M., Ono, T., Paterson, K., Pearce, D., Pierrot, D., Robbins, L. L., Saito, S., Salisbury, J., Schlitzer, R., Schneider, B., Schweitzer, R., Sieger, R., Skjelvan, I., Sullivan, K. F., Sutherland, S. C., Sutton, A. J., Tadokoro, K., Telszewski, M., Tuma, M., van Heuven, S. M. A. C., Vandemark, D., Ward, B., Watson, A. J., and Xu, S.: A multi-decade record of high-quality fCO2 data in version 3 of the Surface Ocean CO2 Atlas (SOCAT), Earth Syst. Sci. Data, 8, 383–413,, 2016. a, b, c, d

Bakker, D. C. E., Alin, S. R., Becker, M., Bittig, H. C., Castaño-Primo, R., Feely, R. A., Gkritzalis, T., Kadono, K., Kozyr, A., Lauvset, S. K., Metzl, N., Munro, D. R., Nakaoka, S., Nojiri, Y., O'Brien, K. M., Olsen, A., Pfeil, B., Pierrot, D., Steinhoff, T., Sullivan, K. F., Sutton, A. J., Sweeney, C., Tilbrook, B., Wada, C., Wanninkhof, R., Willstrand Wranne, A., Akl, J., Apelthun, L. B., Bates, N., Beatty, C. M., Burger, E. F., Cai, W.-J., Cosca, C. E., Corredor, J. E., Cronin, M., Cross, J. N., De Carlo, E. H., DeGrandpre, M. D., Emerson, S. R., Enright, M. P., Enyo, K., Evans, W., Frangoulis, C., Fransson, A., García-Ibáñez, M. I., Gehrung, M., Giannoudi, L., Glockzin, M., Hales, B., Howden, S. D., Hunt, C. W., Ibánhez, J. S. P., Jones, S. D., Kamb, L., Körtzinger, A., Landa, C. S., Landschützer, P., Lefèvre, N., Lo Monaco, C., Macovei, V. A., Maenner Jones, S., Meinig, C., Millero, F. J., Monacci, N. M., Mordy, C., Morell, J. M., Murata, A., Musielewicz, S., Neill, C., Newberger, T., Nomura, D., Ohman, M., Ono, T., Passmore, A., Petersen, W., Petihakis, G., Perivoliotis, L., Plueddemann, A. J., Rehder, G., Reynaud, T., Rodriguez, C., Ross, A. C., Rutgersson, A., Sabine, C. L., Salisbury, J. E., Schlitzer, R., Send, U., Skjelvan, I., Stamataki, N., Sutherland, S. C., Sweeney, C., Tadokoro, K., Tanhua, T., Telszewski, M., Trull, T., Vandemark, D., van Ooijen, E., Voynova, Y. G., Wang, H., Weller, R. A., Whitehead, C., and Wilson, D.: Surface Ocean CO2 Atlas Database Version 2022 (SOCATv2022) (NCEI Accession 0253659), NOAA National Centers for Environmental Information [data set],, 2022. a, b

Becker, M., Olsen, A., Landschützer, P., Omar, A., Rehder, G., Rödenbeck, C., and Skjelvan, I.: The northern European shelf as an increasing net sink for CO2, Biogeosciences, 18, 1127–1147,, 2021. a, b, c, d, e, f, g, h

Beckers, J. M. and Rixen, M.: EOF Calculations and Data Filling from Incomplete Oceanographic Datasets, J. Atmos. Ocean. Tech., 20, 1839–1856,<1839:ECADFF>2.0.CO;2, 2003. a

Bittig, H. C., Jacobs, E., Neumann, T., and Rehder, G.: A regional pCO2 climatology of the Baltic Sea, PANGAEA [data set],, 2023. a, b, c

Carstensen, J. and Duarte, C. M.: Drivers of pH Variability in Coastal Ecosystems, Environ. Sci. Technol., 53, 4020–4029,, 2019. a

Davis, R. E.: Predictability of Sea Surface Temperature and Sea Level Pressure Anomalies over the North Pacific Ocean, J. Phys. Oceanogr., 6, 249–266,<0249:POSSTA>2.0.CO;2, 1976. a

DeVries, T.: The Ocean Carbon Cycle, Annu. Rev. Environ. Resour., 47, 317–341,, 2022. a

Dommenget, D.: An Introduction to Statistical Analysis in Climate Research, (last access: 31 January 2024), 2015. a, b, c, d

Eilola, K., Gustafsson, B. G., Kuznetsov, I., Meier, H. E. M., Neumann, T., and Savchuk, O. P.: Evaluation of biogeochemical cycles in an ensemble of three state-of-the-art numerical models of the Baltic Sea, J. Marine Syst., 88, 267–284,, 2011. a

Elken, J., Zujev, M., She, J., and Lagemaa, P.: Reconstruction of Large-Scale Sea Surface Temperature and Salinity Fields Using Sub-Regional EOF Patterns From Models, Front. Earth Sci., 7, 232,, 2019. a, b, c, d, e, f, g, h

Friedlingstein, P., O'Sullivan, M., Jones, M. W., Andrew, R. M., Gregor, L., Hauck, J., Le Quéré, C., Luijkx, I. T., Olsen, A., Peters, G. P., Peters, W., Pongratz, J., Schwingshackl, C., Sitch, S., Canadell, J. G., Ciais, P., Jackson, R. B., Alin, S. R., Alkama, R., Arneth, A., Arora, V. K., Bates, N. R., Becker, M., Bellouin, N., Bittig, H. C., Bopp, L., Chevallier, F., Chini, L. P., Cronin, M., Evans, W., Falk, S., Feely, R. A., Gasser, T., Gehlen, M., Gkritzalis, T., Gloege, L., Grassi, G., Gruber, N., Gürses, Ö., Harris, I., Hefner, M., Houghton, R. A., Hurtt, G. C., Iida, Y., Ilyina, T., Jain, A. K., Jersild, A., Kadono, K., Kato, E., Kennedy, D., Klein Goldewijk, K., Knauer, J., Korsbakken, J. I., Landschützer, P., Lefèvre, N., Lindsay, K., Liu, J., Liu, Z., Marland, G., Mayot, N., McGrath, M. J., Metzl, N., Monacci, N. M., Munro, D. R., Nakaoka, S.-I., Niwa, Y., O'Brien, K., Ono, T., Palmer, P. I., Pan, N., Pierrot, D., Pocock, K., Poulter, B., Resplandy, L., Robertson, E., Rödenbeck, C., Rodriguez, C., Rosan, T. M., Schwinger, J., Séférian, R., Shutler, J. D., Skjelvan, I., Steinhoff, T., Sun, Q., Sutton, A. J., Sweeney, C., Takao, S., Tanhua, T., Tans, P. P., Tian, X., Tian, H., Tilbrook, B., Tsujino, H., Tubiello, F., van der Werf, G. R., Walker, A. P., Wanninkhof, R., Whitehead, C., Willstrand Wranne, A., Wright, R., Yuan, W., Yue, C., Yue, X., Zaehle, S., Zeng, J., and Zheng, B.: Global Carbon Budget 2022, Earth Syst. Sci. Data, 14, 4811–4900,, 2022. a, b

GESAMP: High Level Review of a Wide Range of Proposed Marine Geoengineering Techniques, Rep. Stud. GESAMP No. 98, (IMO/FAO/UNESCO-IOC/UNIDO/WMO/IAEA/UN/UN Environment/UNDP/ISA Joint Group of Experts on the Scientific Aspects of Marine Environmental Protection), (last access: 31 January 2024), 2019. a

Gülzow, W., Rehder, G., Schneider, B., Schneider v. Deimling, J., and Sadkowiak, B.: A new method for continuous measurement of methane and carbon dioxide in surface waters using off-axis integrated cavity output spectroscopy (ICOS): An example from the Baltic Sea, Limnol. Oceanogr.-Methods, 9, 176–184,, 2011. a

Hannachi, A., Jolliffe, I. T., and Stephenson, D. B.: Empirical orthogonal functions and related techniques in atmospheric science: A review, Int. J. Climatol., 27, 1119–1152,, 2007. a

HELCOM Secretariat: HELCOM subbasins 2017 (level 2), (last access: 14 May 2019), 2017. a

Humborg, C., Geibel, M. C., Sun, X., McCrackin, M., Morth, C.-M., Stranne, C., Jakobsson, M., Gustafsson, B., Sokolov, A., Norkko, A., and Norkko, J.: High Emissions of Carbon Dioxide and Methane From the Coastal Baltic Sea at the End of a Summer Heat Wave, Front. Mar. Sci., 6, 493,, 2019. a

Jacobs, E., Bittig, H. C., Gräwe, U., Graves, C. A., Glockzin, M., Müller, J. D., Schneider, B., and Rehder, G.: Upwelling-induced trace gas dynamics in the Baltic Sea inferred from 8 years of autonomous measurements on a ship of opportunity, Biogeosciences, 18, 2679–2709,, 2021. a

Jolliffe, I. T. and Cadima, J.: Principal component analysis: a review and recent developments, Phil. Trans. R. Soc. A, 374, 20150202,, 2016. a

Kaplan, A., Kushnir, Y., and Cane, M. A.: Reduced Space Optimal Interpolation of Historical Marine Sea Level Pressure: 1854–1992, J. Climate, 13, 2987–3002,<2987:RSOIOH>2.0.CO;2, 2000. a, b, c, d, e, f, g, h, i, j, k

Kuliński, K., Rehder, G., Asmala, E., Bartosova, A., Carstensen, J., Gustafsson, B., Hall, P. O. J., Humborg, C., Jilbert, T., Jürgens, K., Meier, H. E. M., Müller-Karulis, B., Naumann, M., Olesen, J. E., Savchuk, O., Schramm, A., Slomp, C. P., Sofiev, M., Sobek, A., Szymczycha, B., and Undeman, E.: Biogeochemical functioning of the Baltic Sea, Earth Syst. Dynam., 13, 633–685,, 2022. a

Kuznetsov, I. and Neumann, T.: Simulation of carbon dynamics in the Baltic Sea with a 3D model, J. Marine Syst., 111–112, 167–174,, 2013. a

Körtzinger, A., Thomas, H., Schneider, B., Gronau, N., Mintrop, L., and Duinker, J. C.: At-sea intercomparison of two newly designed underway pCO2 systems – Encouraging results, Mar. Chem., 52, 133–145,, 1996. a

Lan, X., Tans, P., and Thoning, K. W.: Trends in globally-averaged CO2 determined from NOAA Global Monitoring Laboratory measurements. Version 2022-11, (last access: 15 November 2022), 2022. a

Landschützer, P., Gruber, N., Bakker, D. C. E., Schuster, U., Nakaoka, S., Payne, M. R., Sasse, T. P., and Zeng, J.: A neural network-based estimate of the seasonal to inter-annual variability of the Atlantic Ocean carbon sink, Biogeosciences, 10, 7793–7815,, 2013. a

Laruelle, G. G., Cai, W.-J., Hu, X., Gruber, N., Mackenzie, F. T., and Regnier, P.: Continental shelves as a variable but increasing global sink for atmospheric carbon dioxide, Nat. Commun., 9, 454,, 2018. a, b, c, d

Lorenz, E. N.: Empirical orthogonal functions and statistical weather prediction, Scientific Report No. 1, Statistical Forecasting Project, Massachusetts Institute of Technology, Department of Meteorology, Cambridge, MT, USA, (last access: 31 January 2024), oCLC: 2293210, 1956. a, b, c, d

Luisetti, T., Ferrini, S., Grilli, G., Jickells, T. D., Kennedy, H., Kröger, S., Lorenzoni, I., Milligan, B., van der Molen, J., Parker, R., Pryce, T., Turner, R. K., and Tyllianakis, E.: Climate action requires new accounting guidance and governance frameworks to manage carbon in shelf seas, Nat. Commun., 11, 4599,, 2020. a

Monahan, A. H., Fyfe, J. C., Ambaum, M. H. P., Stephenson, D. B., and North, G. R.: Empirical Orthogonal Functions: The Medium is the Message, J. Climate, 22, 6501–6514,, 2009. a, b, c

Müller, J. D., Schneider, B., and Rehder, G.: Long-term alkalinity trends in the Baltic Sea and their implications for CO2-induced acidification, Limnol. Oceanogr., 61, 1984–2002,, 2016. a

Neumann, T.: ERGOM 1.2 model hindcast 1948–2020, IOW [data set], (last access: 31 January 2024), 2021. a

Neumann, T., Siegel, H., and Gerth, M.: A new radiation model for Baltic Sea ecosystem modelling, J. Marine Syst., 152, 83–91,, 2015. a

Neumann, T., Radtke, H., Cahill, B., Schmidt, M., and Rehder, G.: Non-Redfieldian carbon model for the Baltic Sea (ERGOM version 1.2) – implementation and budget estimates, Geosci. Model Dev., 15, 8473–8540,, 2022. a, b, c, d

Omstedt, A., Gustafsson, E., and Wesslander, K.: Modelling the uptake and release of carbon dioxide in the Baltic Sea surface water, Cont. Shelf Res., 29, 870–885,, 2009. a

Parard, G., Charantonis, A. A., and Rutgersson, A.: Using satellite data to estimate partial pressure of CO2 in the Baltic Sea, J. Geophys. Res.-Biogeo., 121, 1002–1015,, 2016. a, b, c, d

Parard, G., Rutgersson, A., Raj Parampil, S., and Charantonis, A. A.: The potential of using remote sensing data to estimate air–sea CO2 exchange in the Baltic Sea, Earth Syst. Dynam., 8, 1093–1106,, 2017. a

Pfeil, B., Olsen, A., Bakker, D. C. E., Hankin, S., Koyuk, H., Kozyr, A., Malczyk, J., Manke, A., Metzl, N., Sabine, C. L., Akl, J., Alin, S. R., Bates, N., Bellerby, R. G. J., Borges, A., Boutin, J., Brown, P. J., Cai, W.-J., Chavez, F. P., Chen, A., Cosca, C., Fassbender, A. J., Feely, R. A., González-Dávila, M., Goyet, C., Hales, B., Hardman-Mountford, N., Heinze, C., Hood, M., Hoppema, M., Hunt, C. W., Hydes, D., Ishii, M., Johannessen, T., Jones, S. D., Key, R. M., Körtzinger, A., Landschützer, P., Lauvset, S. K., Lefèvre, N., Lenton, A., Lourantou, A., Merlivat, L., Midorikawa, T., Mintrop, L., Miyazaki, C., Murata, A., Nakadate, A., Nakano, Y., Nakaoka, S., Nojiri, Y., Omar, A. M., Padin, X. A., Park, G.-H., Paterson, K., Perez, F. F., Pierrot, D., Poisson, A., Ríos, A. F., Santana-Casiano, J. M., Salisbury, J., Sarma, V. V. S. S., Schlitzer, R., Schneider, B., Schuster, U., Sieger, R., Skjelvan, I., Steinhoff, T., Suzuki, T., Takahashi, T., Tedesco, K., Telszewski, M., Thomas, H., Tilbrook, B., Tjiputra, J., Vandemark, D., Veness, T., Wanninkhof, R., Watson, A. J., Weiss, R., Wong, C. S., and Yoshikawa-Inoue, H.: A uniform, quality controlled Surface Ocean CO2 Atlas (SOCAT), Earth Syst. Sci. Data, 5, 125–143,, 2013. a

Pierrot, D., Neill, C., Sullivan, K., Castle, R., Wanninkhof, R., Lüger, H., Johannessen, T., Olsen, A., Feely, R. A., and Cosca, C. E.: Recommendations for autonomous underway pCO2 measuring systems and data-reduction routines, Deep-Sea Res. Pt. II, 56, 512–522,, 2009. a, b

Preisendorfer, R. W.: Principal component analysis in meteorology and oceanography, no. 17 in Developments in atmospheric science, Elsevier, Amsterdam, 1988. a, b, c

Rehder, G., Velasco-Sobeck, S., Kreuzer, L., Gutierrez Loza, L., Koziorowska, K., Makuch, P., Jacobs, E., Otto, S., Bittig, H., Lainela, S., Honkanen, M., and Kolbe, M.: Continuous nutrient, trace gas and CO2 measurements during EMB214, May–June 2019, PANGAEA [data set],, 2021. a, b

Rödenbeck, C., Bakker, D. C. E., Gruber, N., Iida, Y., Jacobson, A. R., Jones, S., Landschützer, P., Metzl, N., Nakaoka, S., Olsen, A., Park, G.-H., Peylin, P., Rodgers, K. B., Sasse, T. P., Schuster, U., Shutler, J. D., Valsala, V., Wanninkhof, R., and Zeng, J.: Data-based estimates of the ocean carbon sink variability – first results of the Surface Ocean pCO2 Mapping intercomparison (SOCOM), Biogeosciences, 12, 7251–7278,, 2015.  a

Schneider, B. and Müller, J. D.: Biogeochemical Transformations in the Baltic Sea: Observations Through Carbon Dioxide Glasses, Springer Oceanography, Springer International Publishing,, 2018. a, b, c, d

Schneider, B., Kaitala, S., and Maunula, P.: Identification and quantification of plankton bloom events in the Baltic Sea by continuous pCO2 and chlorophyll a measurements on a cargo ship, J. Marine Syst., 59, 238–248,, 2006. a

Thomas, H., Bozec, Y., Elkalay, K., and de Baar, H. J. W.: Enhanced Open Ocean Storage of CO2 from Shelf Sea Pumping, Science, 304, 1005–1008,, 2004. a

Weare, B. C. and Newell, R. E.: Empirical orthogonal analysis of atlantic ocean surface temperatures, Q. J. Roy. Meteor. Soc., 103, 467–478,, 1977. a

Weare, B. C., Navato, A. R., and Newell, R. E.: Empirical Orthogonal Analysis of Pacific Sea Surface Temperatures, J. Phys. Oceanogr., 6, 671–678,<0671:EOAOPS>2.0.CO;2, 1976. a

Wesslander, K., Omstedt, A., and Schneider, B.: Inter-annual and seasonal variations in the air–sea CO2 balance in the central Baltic Sea and the Kattegat, Cont. Shelf Res., 30, 1511–1521,, 2010. a, b, c, d, e

Short summary
We present a pCO2 climatology of the Baltic Sea using a new approach to extrapolate from individual observations to the entire Baltic Sea. The extrapolation approach uses (a) a model to inform on how data at one location are connected to data at other locations, together with (b) very accurate pCO2 observations from 2003 to 2021 as the base data. The climatology can be used e.g. to assess uptake and release of CO2 or to identify extreme events.
Final-revised paper