OceanTACO: A Multi-Sensor Global Ocean Sea Surface State Dataset

Lehmann, Nils; Aybar, Cesar; Shah, Ando; Passaro, Marcello; Bamber, Jonathan L.; Zhu, Xiao Xiang

doi:10.5194/essd-2026-232

10 Apr 2026


Status: this preprint is currently under review for the journal ESSD.

Abstract. We present OceanTACO, a harmonised global collection of sea surface state datasets designed to support reproducible Earth system research. The collection integrates satellite altimetry, sea surface temperature, salinity, surface winds, reanalysis fields, and Argo in situ observations within a unified cloud-optimised specification based on Transparent Access to Cloud-optimised datasets (TACO). It includes Level-3 observations, Level-4 gap-filled products, and reanalysis outputs while preserving native spatial and temporal resolution. The core dataset spans 29 March 2023 to 1 August 2025, covering the Surface Water and Ocean Topography (SWOT) mission, with an extended record from 1 January 2015 until 29 March 2023 for non-SWOT sources.

Datasets are harmonised through standardised metadata, spatial referencing, and temporal indexing, enabling consistent spatiotemporal queries across sensors and processing levels. A uniform internal structure reduces product-specific preprocessing and allows the same data-access routines to be applied across regions, sensors, and studies. This supports Earth system analysis workflows such as validation against in situ observations, comparisons between observations and mapped products, observation system experiments, and multivariate sensor analyses.

Example applications demonstrate cross-product collocation with Argo, analysis of sea surface height variability during extreme events, and relationships between surface variables relevant for data-driven reconstruction. OceanTACO improves accessibility to coordinated multi-source analyses while preserving data provenance and native observation characteristics, and can be extended with new missions without restructuring the dataset. The core and extended datasets are available at https://doi.org/10.57967/hf/8171 (Lehmann and Aybar, 2026a) and https://doi.org/10.57967/hf/8172 (Lehmann and Aybar, 2026b), respectively.

Received: 27 Mar 2026 – Discussion started: 10 Apr 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 25 Jul 2026)

CC1: 'Comment on essd-2026-232', Giuseppe M.R. Manzella, 01 Jun 2026

The paper presents a "data system" composed of data from various sources and therefore subject to potentially different processing methods and/or algorithms. Highlighting these possibilities and providing information on the challenges encountered when using data from various sources is a plus.
The data system can certainly be useful for many purposes, and therefore the paper, while not original, is valuable.
Some information should be added for further useful references. Any data set contains noise and spurious data. Have these been eliminated from the data residing in the various repositories? Was a check performed within the "OceanTACO" system?
A weakness of the article is the usage examples, which are rather superficial and do not provide sufficient detail, as in 3.3 and 3.5. In the case of Observation System Experiments and Mission Impact Studies, how much of an impact might the definition of very large regions (perhaps unable to highlight subregional phenomena) have on, for example, impact studies? Even more vague is the paragraph regarding machine learning, which could also be considered data mining. There are many machine learning models, including supervised, unsupervised, semi-supervised, and reinforcement learning. Some details on what has been used by the cited authors could be very useful.

Citation: https://doi.org/10.5194/essd-2026-232-CC1
RC1: 'Comment on essd-2026-232', Anonymous Referee #1, 15 Jun 2026

The manuscript describes a collection of ocean surface fields from observations, one ocean reanalysis, and one atmospheric reanalysis, which either already existed on regular grids or were brought to regular grids close to their original resolution. Apart from the simple interpolation/aggregation, the product compresses the data by using an int16 representation together with a standard compression routine. As regridding and lossy compression (Reichelt et al., 2026) are widely available standard procedures, the main merit of the product is the provision of a unique STAC catalog that unifies all products. This lowers the barrier to data use, particularly for disciplines unfamiliar with ocean and atmosphere data.
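The int16 representation mentioned in this comment can be illustrated with a minimal sketch of generic linear packing. This is not the encoder used by OceanTACO: the function names, the reserved fill value, the sample values, and the assumed valid range of [-2, 2] m are all hypothetical and chosen only for illustration.

```python
# Illustrative only: a generic linear int16 packing, not OceanTACO's actual encoder.
INT16_MAX = 32767  # -32768 is left free, e.g. for a fill value (assumption)


def pack_int16(values, vmin, vmax):
    """Map floats in [vmin, vmax] onto integer codes in [-32767, 32767]."""
    scale = (vmax - vmin) / (2 * INT16_MAX)
    offset = (vmax + vmin) / 2
    codes = [round((v - offset) / scale) for v in values]
    return codes, scale, offset


def unpack_int16(codes, scale, offset):
    """Invert pack_int16 up to the rounding error."""
    return [c * scale + offset for c in codes]


# Hypothetical sea level anomaly samples in metres, assumed valid range [-2, 2] m.
sla = [-1.234567, -0.000321, 0.051234, 0.873456, 1.999999]
codes, scale, offset = pack_int16(sla, vmin=-2.0, vmax=2.0)
restored = unpack_int16(codes, scale, offset)
max_error = max(abs(a - b) for a, b in zip(sla, restored))
print(f"step = {scale:.2e} m, max round-trip error = {max_error:.2e} m")
```

Under this scheme the round-trip error is bounded by half the quantisation step, so for a fixed value range the absolute error is the same everywhere, regardless of how small the local signal is.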
The manuscript could be better organized. The most important information about the temporal and spatial resolution of the source data and of the final product is not available for all products. It is also hard to find because it is scattered across several places. Also, information about depth for GLORYS-12 and Argo is missing, suggesting that some data include vertical information.
Details:
What is the reason to limit the range to 2015-2025? Most of the data are available for decades before that. Altimeter and GLORYS-12 data are available from 1993 and Argo from the early 2000s. I think the short range limits the applications because the collection will be most useful for statistical analyses.
Section 2.1 Information about temporal and spatial resolution should be provided along with the data information.

L 150-151 It would be useful to know the temporal and spatial resolution in the OceanTACO product for the other data as well.
L158-159 Are the original time and location stamps preserved, or is the data interpolated in any way?
L 163-164 Why are products listed that do not fall within the OceanTACO period?
L189-190 Are the complete profiles provided or only surface data? If only near-surface data, at which depth?
L221 Not up to 4? For instance, if the region covers a corner where 4 regions meet.
2.3.2/2.3.3 Strange way of numbering the sections. 2.3.2 seems to contain 2.3.3; otherwise it is an empty section. Consequently, 2.3.3 should be named 2.3.2.1.
Equation 2: My understanding is that the along-track data has a resolution of about 7 km, such that the number of SLA observations per cell is about one. Do these metrics make sense with such a low number of samples, or is the input data of much higher resolution?
Compression: Is it possible to compare your compression with other methods discussed in (https://doi.org/10.5194/egusphere-2026-60) and maybe evaluate by ClimateBenchPress?
Fig.4 What is shown is obviously a rather trivial consequence of how many significant digits int16 can represent. At least add a word about that.

The scatter plot is not useful since only fairly large errors will be visible there. I would be interested in a relative error normalized by the std of the data. For regions with low variability of a few cm, the error may reach 1% or so.
L324-325 Maybe mention other eddy-rich regions as well, e.g. the Gulf Stream or the Southern Ocean.
L 332-336 It is not clear what the advantage is here. It seems much harder to compute along-track spectra from the gridded fields, since it is no longer easy to find the points belonging to one track. It also seems that the analyses of along-track SSH and SWOT are performed over different points. I guess the whole regions shown in the figure were considered. That would explain the different PSD levels at the longer wavelengths, where the difference in resolution should not be an issue. This example seems to be more a demonstration of the pitfalls: if analysis becomes too easy, people will no longer care about important details.
Table A1 It would be good to add the temporal resolution and for GLORYS-12 that this is only surface data.
Table B1 It would be useful to report the error as a noise-to-signal ratio to give a better idea of whether the resulting product is still useful, or for which applications it may still be usable. For instance, the salinity signal is often less than 0.1. The RMSE value suggests that in some regions the error reaches 1%, leaving only 2 significant digits. For SSS, better use g/kg, if this is what is meant.
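The noise-to-signal measure requested in the comments on Fig. 4 and Table B1 could be computed along the following lines. This is a minimal sketch on synthetic data: the salinity values, the 0.05 standard deviation, and the 0.001 quantisation step are assumptions made for illustration and are not taken from the manuscript or the dataset.

```python
import math
import random
import statistics


def noise_to_signal(original, reconstructed):
    """Return RMSE(original, reconstructed) divided by the std of the original."""
    n = len(original)
    rmse = math.sqrt(sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / n)
    return rmse / statistics.pstdev(original)


# Synthetic stand-in for a low-variability salinity record (values are made up).
random.seed(0)
sss = [35.0 + random.gauss(0.0, 0.05) for _ in range(1000)]

# Assumed quantisation step; the real step depends on the product's scale factor.
step = 0.001
sss_quantised = [round(v / step) * step for v in sss]

ratio = noise_to_signal(sss, sss_quantised)
print(f"noise-to-signal ratio = {ratio:.4f}")
```

Evaluating this ratio per region rather than globally would show where a fixed absolute compression error becomes a sizeable fraction of the local variability.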

Citation: https://doi.org/10.5194/essd-2026-232-RC1

Data sets

OceanTACO Core Dataset Nils Lehmann https://doi.org/10.57967/hf/8171

Model code and software

Data Generation Code Nils Lehmann and Cesar Aybar https://github.com/nilsleh/oceanTACO

Interactive computing environment

ReadTheDocs Documentation Page Nils Lehmann https://oceantaco.readthedocs.io/en/latest/index.html

Viewed

Total article views: 574 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
397	149	28	574	24	25

Views and downloads (calculated since 10 Apr 2026)

Month	HTML	PDF	XML	Total
Apr 2026	247	98	19	364
May 2026	136	41	4	181
Jun 2026	14	10	5	29

Viewed (geographical distribution)

Total article views: 571 (including HTML, PDF, and XML) Thereof 571 with geography defined and 0 with unknown origin.

Latest update: 25 Jun 2026

Short summary

We created a global ocean dataset that brings together satellite measurements, model outputs, and observations into one consistent system. We did this to reduce the time and effort needed to combine different data sources, to improve reproducibility, and enable new analyses. The result makes it easier to study ocean changes, compare methods, and support better understanding of climate processes and extreme events.

