Ocean surface

The ocean plays a major role in the global carbon cycle and has a controlling function on the atmospheric CO

For the Baltic Sea, several attempts have been made to quantify the net CO

Climatologies of

A robust climatology and trend for the Baltic Sea has, apart from refining the estimate of the net CO

In this work, we build a foundation for such applications. We first present a novel extrapolation approach, followed by construction of a Baltic Sea

For mapping from scarce observational data to spatially filled maps of the Baltic Sea, we use an ensemble of truncated empirical orthogonal function (EOF) reconstructions. For a more detailed description than the brief summary below, please consult Appendix

EOF decomposition and singular value decomposition (SVD) have been used widely in atmospheric and ocean science

In a second step, observational

However, the choice of how many EOF modes to use for a given truncated EOF reconstruction, or at which level to truncate, is an arbitrary one often inspired by a certain threshold of the total variance explained

In a third step, we therefore use an ensemble approach to circumvent this problem: instead of a single EOF reconstruction with a fixed number of modes

Note that the weights are spatially resolved and depend on the available data constraints. The ensemble weights thus provide locality: the mapping's spatial scales adapt to the density of data constraints, which gives a more robust extrapolation and more realistic uncertainty estimates than with a fixed number

We thus obtain an ensemble mean

Finally, due to the high temporal dynamics of our Baltic Sea environment, we use an expansion of the EOF reconstruction approach to reconstruct not only the data value, but both the data value and a (short-term) linear trend, in order to temporally collate (temporally extended) observations into a time-coherent, synoptic picture
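As a minimal sketch of this idea, the following Python example fits both a value and a linear trend for each EOF amplitude, so that observations taken at different times are collated to a common reference time. It is an illustration under stated assumptions, not the formulation used in this work: the function name `fit_value_and_trend`, the two synthetic patterns, and the plain unweighted least-squares fit are all hypothetical, and error covariances and additional constraints on the cost function are omitted.

```python
import numpy as np

def fit_value_and_trend(eofs, obs_idx, obs_dt, obs_values):
    """Fit EOF amplitudes for a value and a linear trend at the reference time.

    Each observation i is modelled as sum_k eofs[p_i, k] * (a_k + b_k * dt_i),
    where dt_i is the time offset of the observation from the reference time.
    """
    sampled = eofs[obs_idx]                                    # (n_obs, n_modes)
    design = np.hstack([sampled, obs_dt[:, None] * sampled])   # value and trend columns
    coeffs, *_ = np.linalg.lstsq(design, obs_values, rcond=None)
    n_modes = eofs.shape[1]
    value_field = eofs @ coeffs[:n_modes]    # field at the reference time
    trend_field = eofs @ coeffs[n_modes:]    # change per unit time
    return value_field, trend_field

# Hypothetical basis: two orthonormalized spatial patterns on a 1-D domain
n_space = 40
x = np.linspace(0.0, 1.0, n_space)
raw = np.stack([np.ones(n_space), np.sin(np.pi * x)], axis=1)
eofs, _ = np.linalg.qr(raw)

true_value = eofs @ np.array([2.0, -1.0])
true_trend = eofs @ np.array([0.1, 0.3])

# Observations scattered in space and in time around the reference time
obs_idx = np.array([1, 5, 9, 14, 20, 26, 31, 38])
obs_dt = np.array([-10.0, -6.0, -3.0, 0.0, 2.0, 5.0, 8.0, 12.0])
obs_values = true_value[obs_idx] + obs_dt * true_trend[obs_idx]

value_field, trend_field = fit_value_and_trend(eofs, obs_idx, obs_dt, obs_values)
```

With noise-free synthetic observations, the fit recovers both fields. With real data, the trend amplitudes are only as well constrained as the temporal spread of the observations allows.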

Approach to build a regional

Given limitations of modelled data, we aimed to produce an observation-based

Spatial patterns of variability

The model data variability characteristics are illustrated in the Appendix (Fig.

ERGOM has been shown to adequately mirror observations of the large-scale nitrate, phosphate, oxygen, and carbon distribution

From an ERGOM version 1.2 model run from 1948 to 2020, we used the last 20 years of modelled surface

From these model data

Surface

Here, Baltic Sea

For every month

the reconstructed

the

the (short-term)

an error estimate of the

the average number of patterns

Monthly fields

The construction of a monthly

For a given location

Equation (

When done for each point

The dataset

The mapping approach gives fully filled fields on the entire spatial domain from scattered observational data. The mapped

Addition of a linear temporal trend to collate observations to a common time

However, for about half the grid points of the 189 monthly mappings, the magnitude of the short-term temporal

To assess the quality of the obtained fields, we consider (a) the mapped result against the original observations and (b) a comparison of mappings from concurrent subsets of observations.

For the first aspect, we consider the residual

Histograms for

The comparison shows highly correlated

Subsetting evaluations for May 2019 observations using 1

The second aspect, how data constraints in one area transfer to accurate predictions in another part of the domain, is more difficult to assess objectively. It requires the observational data to be split into a subset that is used for mapping and a subset that is deemed independent and used for evaluation. How this subsetting is done, however, directly influences the outcome of the evaluation. That is, while such evaluations may seem instructive and generalizable, they are highly dependent on design and thus carry some degree of arbitrariness.

To illustrate this, we consider a series of monthly mappings for May 2019, the month with the highest density of observations in our 189 monthly mappings. We use chessboard-like grids with alternating white and black boxes for subsetting, where observations in the white boxes are used for mapping and observations in the black boxes are used for evaluation. Grid box sizes vary from 1

For 1

More concerning, however, are increasing differences between the eight realizations with increasing spatial scales. Especially for the 3

We conclude that the choice of subsetting implicitly determines which statistical values the evaluation returns. While our evaluation could be described as well designed, it turns out to be starting-point-dependent. If a month with less data coverage were chosen, differences between realizations (i.e. which data end up in the training or validation subset) would be even larger. Therefore, based on the above illustration, any subsetting-based evaluation should be interpreted with caution, as the subsetting design will imprint itself on the outcome.
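The chessboard subsetting described above can be sketched in a few lines of Python. This is an illustrative example only: the function name `chessboard_split`, the `(lon, lat, value)` tuple layout, the origin parameters `lon0` and `lat0`, and the three sample points are assumptions made for the sketch and are not taken from the analysis.

```python
import math

def chessboard_split(observations, box_size_deg, lon0=0.0, lat0=0.0):
    """Split (lon, lat, value) observations by chessboard box colour.

    Observations in "white" boxes go to the mapping subset, those in
    "black" boxes to the evaluation subset. lon0 and lat0 set the grid
    origin (the starting point of the chessboard).
    """
    mapping, evaluation = [], []
    for lon, lat, value in observations:
        col = math.floor((lon - lon0) / box_size_deg)
        row = math.floor((lat - lat0) / box_size_deg)
        if (col + row) % 2 == 0:
            mapping.append((lon, lat, value))      # white box
        else:
            evaluation.append((lon, lat, value))   # black box
    return mapping, evaluation

# Hypothetical observations (values are placeholders without units)
observations = [(18.2, 57.3, 1.0), (19.4, 57.6, 2.0), (19.7, 58.1, 3.0)]

# Same box size, two different grid origins
map_a, eval_a = chessboard_split(observations, box_size_deg=1.0)
map_b, eval_b = chessboard_split(observations, box_size_deg=1.0, lon0=0.5)
```

In this toy case, shifting the grid origin by half a box moves the first two observations to the opposite subset, which is the starting-point dependence discussed above.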

Smoothed histograms for the five output fields of the climatology

The mean seasonal

Regionally, the western basins lead the mean seasonal cycle, while the central Gotland Basin and the Gulf of Finland trail behind. The productive season is even shorter in the northern basins, with the major

The

In general, the short-term

However, in 8 out of the 12 months, the majority of estimated trends are insignificant. That is, more than 70 % are smaller than their error estimate, indicating that inclusion of a short-term trend in the mapping may not be required for these months. Only in March and April as well as in August and September are the majority of

Like for the

The average number of patterns

For the climatology

There is no seasonal imprint on the average number of patterns

The mean climatology

Long-term trend

The extrapolation approach uses an EOF analysis of the data covariance

The use of EOF modes as a basis for extrapolation ensures that the extrapolated map covers the full spatial domain, that it is gap-free, and that it is discontinuity-free.
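To illustrate why an EOF basis yields a gap-free map, the following Python sketch derives spatial patterns from synthetic "model" fields and fits their amplitudes to a handful of scattered observations. The synthetic data, the function names `eof_basis` and `reconstruct`, and the unweighted least-squares fit are assumptions for illustration; the cost function of the mapping approach itself is not reproduced here.

```python
import numpy as np

def eof_basis(model_fields, n_modes):
    """Leading EOF patterns from a (time, space) array of model fields."""
    mean = model_fields.mean(axis=0)
    anomalies = model_fields - mean
    # SVD of the anomaly matrix; rows of vt are the spatial patterns
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    return mean, vt[:n_modes].T            # shapes (space,), (space, n_modes)

def reconstruct(mean, eofs, obs_idx, obs_values):
    """Fit EOF amplitudes to observations and return the full field."""
    design = eofs[obs_idx]                 # EOFs sampled at observed points
    target = obs_values - mean[obs_idx]
    amplitudes, *_ = np.linalg.lstsq(design, target, rcond=None)
    return mean + eofs @ amplitudes        # defined at every grid point

rng = np.random.default_rng(0)
n_time, n_space = 240, 50
x = np.linspace(0.0, 1.0, n_space)

# Synthetic "model" variability: two spatial patterns plus small noise
patterns = np.stack([np.sin(np.pi * x), np.cos(2 * np.pi * x)])
model = rng.normal(size=(n_time, 2)) @ patterns
model += 0.01 * rng.normal(size=(n_time, n_space))

mean, eofs = eof_basis(model, n_modes=2)

# A "true" field observed at only five of the fifty grid points
truth = 1.5 * patterns[0] - 0.7 * patterns[1]
obs_idx = np.array([3, 11, 20, 34, 47])
field = reconstruct(mean, eofs, obs_idx, truth[obs_idx])
max_abs_error = float(np.max(np.abs(field - truth)))
```

Because the reconstructed field is a linear combination of patterns defined on the whole grid, it has a value everywhere, regardless of where the observations lie.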

A key aspect of truncated EOF reconstructions is the number

A reconstruction with a small number of modes yields a more uniform, large-scale field, in which gaps in the data are filled by the large-scale picture. However, such reconstructions may lack the flexibility to reproduce real features of the observations, e.g. because of excessive smoothing. Conversely, a large number of modes yields a fine, small-scale field with high flexibility. However, features in areas without nearby observations may then be poorly constrained, with the risk of “ghost” signals.
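The trade-off can be made concrete with a small Python sketch that reconstructs the same sparse observations with an increasing number of modes. The synthetic patterns and observation locations are hypothetical, and the ensemble members are combined with equal weights purely as a simplification; the spatially resolved weights of the mapping approach are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_space, n_patterns = 300, 60, 6
x = np.linspace(0.0, 1.0, n_space)

# Synthetic "model" variability with decreasing variance per pattern
patterns = np.stack([np.sin((k + 1) * np.pi * x) for k in range(n_patterns)])
scales = 1.0 / (1.0 + np.arange(n_patterns))
model = (rng.normal(size=(n_time, n_patterns)) * scales) @ patterns

mean = model.mean(axis=0)
_, _, vt = np.linalg.svd(model - mean, full_matrices=False)

# A "true" field, observed only in the left part of the domain
truth = (rng.normal(size=n_patterns) * scales) @ patterns
obs_idx = np.array([2, 6, 10, 14, 18, 22, 26])

members, residuals = [], []
for n_modes in range(1, 6):
    eofs = vt[:n_modes].T
    target = truth[obs_idx] - mean[obs_idx]
    amps, *_ = np.linalg.lstsq(eofs[obs_idx], target, rcond=None)
    field = mean + eofs @ amps
    members.append(field)
    # Root-mean-square misfit at the observed points
    residuals.append(float(np.sqrt(np.mean((field[obs_idx] - truth[obs_idx]) ** 2))))

members = np.array(members)
ensemble_mean = members.mean(axis=0)     # equal weights: a simplification
ensemble_spread = members.std(axis=0)    # disagreement between truncation levels
```

The misfit at the observed points cannot increase as modes are added, since each fit uses a superset of the previous patterns. The spread between members indicates where the choice of truncation level matters most.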

By using an ensemble of reconstructions that cover the entire range of

The cost function of the mapping approach minimizes the residual between observations and mapped data (Eq.

The introduction of a temporal trend at each location (Sect.

For a given mapping, a strong trend indicates where temporal dynamics are high and thus where observations should be made more frequently, whereas a weak trend indicates where dynamics are low and the existing observation frequency may suffice.

For a time series of mappings, information on both the value and trend allows for a more accurate interpolation by a cubic Hermite spline (Appendix
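As an illustration of why the trend information helps, the following Python sketch compares piecewise linear interpolation, which uses only the values, with cubic Hermite interpolation, which uses both values and trends at the nodes. The function names and the toy time series are assumptions made for this example; the Hermite basis polynomials are the standard ones.

```python
def find_interval(t, times):
    """Index i such that times[i] <= t <= times[i + 1]."""
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            return i
    raise ValueError("t lies outside the range of times")

def linear_interpolate(t, times, values):
    """Point-by-point interpolation using values only."""
    i = find_interval(t, times)
    s = (t - times[i]) / (times[i + 1] - times[i])
    return (1 - s) * values[i] + s * values[i + 1]

def hermite_interpolate(t, times, values, trends):
    """Cubic Hermite interpolation using values and trends at the nodes."""
    i = find_interval(t, times)
    h = times[i + 1] - times[i]
    s = (t - times[i]) / h
    # Standard cubic Hermite basis polynomials
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return (h00 * values[i] + h10 * h * trends[i]
            + h01 * values[i + 1] + h11 * h * trends[i + 1])

# Toy series sampled from f(t) = t**3, with its derivative as the "trend"
times = [0.0, 1.0, 2.0]
values = [0.0, 1.0, 8.0]
trends = [0.0, 3.0, 12.0]

linear_mid = linear_interpolate(0.5, times, values)
hermite_mid = hermite_interpolate(0.5, times, values, trends)
true_mid = 0.5 ** 3
```

For this toy series, the Hermite interpolant reproduces the underlying cubic between the nodes, whereas linear interpolation overestimates the mid-interval value (0.5 instead of 0.125).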

Difference between monthly point-by-point interpolation (green) or interpolation with a trend (orange) in a dynamic coastal system. Example data are from the northern Baltic proper (approximately 58

For our mapping of surface

The dynamics of the

The mapping error estimates are elevated where observations are scarce or dynamics are high to start with. For our

We do not observe a seasonality in the number of patterns

Mean monthly

SOCAT observations have been used previously to build surface

Our mapping approach belongs to the first category. For a given month or time window, we use the available

Most previous

As a drawback, mapped

Summary of long-term

The long-term trend of surface

Our analysis covers the more recent period 2003–2021 and gives significant trends of

Together with the literature, our results seem to indicate a reduction in overall surface

Surface

In this work, we developed an extrapolation approach that combines two worlds: models, which contribute the distribution and connectivity contained in the variability of model data, and observations, which constrain the result to the real-world picture.

The most notable features of the approach are that it does not tend to give extreme, out-of-range values even with few data constraints and that it provides local error estimates, which reflect both the underlying variability, e.g. coast–basin gradients, and the observational data constraints. Of particular merit is that the extrapolation scheme adapts its spatial scales to the number of observations in a given area, so that uncertainty is appropriately lower where more data are available.

Used together with high-quality surface

Finally, our extrapolation approach as well as the method to establish a climatology are not limited to

To represent a spatial dataset at a given time

Based on an SVD of

For practical purposes, reconstruction often uses only the first

This split can also be interpreted as decomposition into a “signal” part and a “noise” part. Equations (

To represent a set of

The eigenvalue reconstruction in truncated form (Eq.

With Eq. (

Both “observational error” and “representational error” impact the determination of the eigenvector amplitudes

The error covariance matrix

By addition of constraints on the cost function

Note that the calculation of

For our purposes, we assume that

A critical aspect before any extrapolation from observations is how many modes

Here, we apply the DINEOF (Data Interpolating Empirical Orthogonal Functions) variant of SVD or EOF decomposition of the dataset

Depending on the spatial distribution or clustering of observations

Model data

Same as Fig.


Monthly climatology

Same as Fig.

We therefore use an ensemble approach over the truncated reconstructions, where we vary

For

The reconstructed data vector

The term

The mapping variance

The approximation of the truncated variance

The number of modes

So far, we have considered the observations

To collate temporally extended observations into a common synoptic reconstruction without artefacts,

To this end,

the time difference

The sampling operator

The

The eigenvalue amplitude vector

With a

HCB and GR conceived the study. The method was developed by HCB with important input by EJ and GR. TN performed the model simulations and HCB the analysis. HCB led the manuscript writing with contributions by all the co-authors.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Efforts by Anna Willstrand-Wranne (SMHI) and Tobias Steinhoff (GEOMAR/NORCE) around SOOP

This work was funded by the projects C-SCOPE (grant no. 03F0877D) and BONUS INTEGRAL (grant no. 03F0773A), which received funding from BONUS (Art. 185), funded jointly by the EU, the German Federal Ministry of Education and Research, the Swedish Research Council Formas, the Academy of Finland, the Polish National Centre for Research and Development, and the Estonian Research Council. Computational power was provided by the North German Supercomputing Alliance (HLRN). Measurements on SOOP

This paper was edited by Giuseppe M. R. Manzella and reviewed by two anonymous referees.