OceanTACO: A Multi-Sensor Global Ocean Sea Surface State Dataset
Abstract. We present OceanTACO, a harmonised global collection of sea surface state datasets designed to support reproducible Earth system research. The collection integrates satellite altimetry, sea surface temperature, salinity, surface winds, reanalysis fields, and Argo in situ observations within a unified cloud-optimised specification based on Transparent Access to Cloud-optimised datasets (TACO). It includes Level-3 observations, Level-4 gap-filled products, and reanalysis outputs while preserving native spatial and temporal resolution. The core dataset spans 29 March 2023 to 1 August 2025, covering the Surface Water and Ocean Topography (SWOT) mission, with an extended record from 1 January 2015 until 29 March 2023 for non-SWOT sources.
Datasets are harmonised through standardised metadata, spatial referencing, and temporal indexing, enabling consistent spatiotemporal queries across sensors and processing levels. A uniform internal structure reduces product-specific preprocessing and allows the same data-access routines to be applied across regions, sensors, and studies. This supports Earth system analysis workflows such as validation against in situ observations, comparisons between observations and mapped products, observation system experiments, and multivariate sensor analyses.
Example applications demonstrate cross-product collocation with Argo, analysis of sea surface height variability during extreme events, and relationships between surface variables relevant for data-driven reconstruction. OceanTACO improves accessibility to coordinated multi-source analyses while preserving data provenance and native observation characteristics, and can be extended with new missions without restructuring the dataset. The core and extended datasets are available at https://doi.org/10.57967/hf/8171 (Lehmann and Aybar, 2026a) and https://doi.org/10.57967/hf/8172 (Lehmann and Aybar, 2026b), respectively.
The paper presents a "data system" assembled from various sources, whose data are therefore subject to potentially different processing methods and/or algorithms. Highlighting these potential differences and documenting the challenges encountered when using data from various sources is a plus.
The data system can certainly be useful for many purposes, and therefore the paper, while not original, is valuable.
Some further information should be added for readers' reference. Any dataset contains noise and spurious data. Have these been removed from the data held in the various source repositories? Was any quality check performed within the "OceanTACO" system itself?
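To illustrate the kind of internal check this question has in mind, here is a minimal Python sketch. It is not a description of what OceanTACO does. The function name `flag_spurious`, the threshold of 3.5, and the sample values are assumptions chosen for illustration. It flags values whose modified z-score, based on the median absolute deviation (MAD), is large.

```python
from statistics import median


def flag_spurious(values, threshold=3.5):
    """Return a list of booleans, True where a value is a suspected outlier.

    Uses the modified z-score built from the median absolute deviation (MAD).
    The threshold of 3.5 is a commonly quoted default, assumed here.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # MAD is zero when about half or more of the values equal the median;
        # the score is undefined in that case, so nothing is flagged.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical sea surface temperature samples (deg C) with one spurious spike.
sst = [18.2, 18.4, 18.1, 18.3, 45.0, 18.5, 18.2]
flags = flag_spurious(sst)
print(flags)
```

A robust statistic is used in this sketch because a single spike would inflate a plain mean and standard deviation and could mask itself. A statement in the paper on whether any comparable screening was applied, or whether the quality flags of the source products were simply carried over, would answer the question.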
A weakness of the article is its usage examples, which are rather superficial and do not provide sufficient detail, as in Sections 3.3 and 3.5. In the case of Observation System Experiments and Mission Impact Studies, how strongly might the definition of very large regions (which are perhaps unable to resolve subregional phenomena) affect, for example, the impact studies? The paragraph on machine learning, which could also be considered data mining, is even vaguer. There are many machine learning paradigms, including supervised, unsupervised, semi-supervised, and reinforcement learning. Some details on which of these the cited authors used would be very useful.