the Creative Commons Attribution 4.0 License.
A global eddy-collocated temperature and salinity profile dataset (v1.0): integrating multiplatform in situ observations with satellite-detected mesoscale eddies
Abstract. Mesoscale eddies are a fundamental component of ocean circulation and play a crucial role in shaping the three-dimensional distribution of ocean temperature and salinity. However, observational constraints have long limited systematic, global-scale quantification of eddy-induced thermohaline variability. Here, we present a global eddy-collocated historical temperature and salinity profile dataset spanning 29 years (1993–2021), constructed by integrating in situ hydrographic profile observations with satellite-derived mesoscale eddy tracking products. The dataset contains 2.35 million quality-controlled temperature-salinity profiles, each collocated with the nearest mesoscale eddy on the sampling day that may have influenced the observed water column. The profiles provide broad global coverage, with most 2°×2° grid boxes containing more than 150 observations, enabling statistically robust analyses from regional to global scales. Validation against well-documented regional eddy signatures shows that the dataset consistently reproduces well-established eddy-induced temperature and salinity anomaly structures across diverse ocean regions. Example applications demonstrate the dataset’s capability to investigate the spatial heterogeneity and vertical extent of eddy-induced thermohaline anomalies, eddy impacts on mixed-layer depth and stratification, eddy contributions to subsurface extreme temperature events, and eddy-driven heat and material transports. This dataset provides a comprehensive observational foundation for advancing quantitative assessments of mesoscale eddy impacts on regional to global ocean physical environment, heat budgets, and climate change.
Status: final response (author comments only)
- RC1: 'Comment on essd-2026-89', Anonymous Referee #1, 09 Apr 2026
- RC2: 'Comment on essd-2026-89', Anonymous Referee #2, 15 Apr 2026
Below are my comments on the eddy-collocated temperature and salinity profile dataset by He et al.
This manuscript presents a global eddy-collocated T–S profile archive that links 2.35 million quality-controlled WOD profiles to satellite-detected mesoscale eddies over the altimetry era. The scale is impressive: the profiles span 1993–2021, offer broad spatial coverage, and are placed in a standardized radius-normalized eddy coordinate system. The manuscript is generally well structured and the motivation is clear. Overall, I think this manuscript is worth publishing, but several major concerns must be addressed before publication.
Major Comments
1. Independent dataset cross-validation: One of the major limitations of the manuscript is the complete absence of any reference to, or comparison with, two datasets that are directly analogous in scope and purpose to the one proposed here.
First, Ioannou et al. (2024) presents a global TOEddies atlas explicitly integrated with nearly 3 million collocated Argo float profiles spanning 2000–2023, with the express aim of providing "a novel examination of eddy-induced subsurface variability and the role of mesoscale eddies in the transport of global ocean heat and biogeochemical properties", language that is nearly identical to the framing of the present manuscript. The accompanying open-access dataset (Laxenaire et al., 2024; SEANOE, https://www.seanoe.org/data/00917/102877/) is publicly available. Second, Simoes-Sousa et al. (2026, Earth System Science Data, https://doi.org/10.5194/essd-18-1089-2026) also presents another global ocean profile and altimetry-derived eddy colocation product published in the same journal. Neither work is cited anywhere in the manuscript.
These omissions are not peripheral. Any reader of an ESSD paper will immediately ask: how does this dataset differ from the TOEddies + Argo product, and from Simoes-Sousa et al. (2026)? Which is preferable for which applications? What is the added value of the proposed dataset? The authors should directly and substantively engage with these prior works. At a minimum, this requires (a) citing and describing both datasets in the Introduction, (b) articulating clearly what the proposed product offers that the others do not, and (c) providing a quantitative cross-dataset comparison in at least one or two representative regions. Without this discussion, the novelty claim of the manuscript is substantially undermined.
2. Validation is largely qualitative and insufficient: The current validation strategy relies primarily on reproducing mean eddy structures that are already established in the literature (Figs. 4, 7, and 8) and on spatial percentage maps (Figs. 9–12). While demonstrating consistency with prior results is a useful first step, it does not by itself persuasively establish the accuracy, uncertainty, or false-match rate of the proposed dataset. I would recommend two or three of the following additions:
(1) Add quantitative metrics beyond visual pattern comparison. In selected representative regions, the authors should directly compare anomaly amplitudes, core depths, radial structure widths, and sign-consistency rates between their product and independently published composites. Particular attention should be paid to energetic regions such as the Gulf Stream, Kuroshio Extension, ACC, and the Agulhas leakage region, where eddies cluster densely and the nearest-eddy assumption is most susceptible to failure (see Major Comment 6 below).
(2) Add time-series validation. Showing that the seasonal and interannual variability of eddy thermohaline anomalies recovered from the dataset is consistent with independent observations in selected regions would substantially strengthen confidence in the product.
(3) Carry out cross-product validation against the TOEddies + Argo dataset (Ioannou et al., 2024; Laxenaire et al., 2024) and/or the Simoes-Sousa et al. (2026) product. Differences in anomaly amplitude, radial structure, and regional data density would reveal where the methodological choices (eddy product, colocation rule, QC criteria) lead to materially different results, which is itself valuable scientific information.
3. The Agulhas retroflection and the Cape Basin are among the hotspots of global eddy activity (Schubert et al., 2021) and have been studied in detail using eddy-profile colocation methods by Laxenaire et al. (2019, 2020). It is precisely in this region that the limitations of the chosen eddy product and colocation methodology are most acute (see Major Comment 6 below), so the region deserves dedicated attention. It would be valuable for the authors to discuss it in more detail (e.g., Figs. 7–8), for example by referring to or comparing with the individual-eddy reconstructions of Laxenaire et al. (2019, 2020).
4. The quality control processes of the original WOD profile: In Sections 2.2 and 2.3, the dataset is derived using WOD quality flags.
First, the ambiguity in the flag specification must be resolved. In Line 134, the authors state that ‘we extracted temperature and salinity profiles with quality control flags marked as ‘0’ (accepted) during the period of satellite’. WOD, however, provides several types of quality control flags (Salinity_WODflag, Salinity_WODprofileflag, Depth_WODflag, etc.), and the different flag types perform differently in identifying outliers (an example can be found in Fig. 9a of Tan et al., 2025). The manuscript does not specify which flags were applied or how they were combined. This must be clarified.
Second, the WOD quality-control flags may be too weak to identify outliers. This issue has been discussed in detail by Tan et al. (2023, Figs. 14–15), Tan et al. (2025, Fig. 14), and Good et al. (2023, Fig. 2). These studies suggest that, even after quality control based only on WOD flags, the temperature and salinity data may still contain a substantial number of non-negligible outliers, especially in observations from the pre-Argo era. Visual evidence of residual outliers appears to be present in the authors' manuscript: the vertical sections in Figs. 8a, 8e, and 8f display suspicious spikes or 'dirty points' at approximately 700 m, 1000 m, and 500 m depth, respectively, and Fig. 10c shows an anomalously large spike in the orange line near 35°N that is absent from the corresponding blue line at the same latitude. Such outliers can strongly bias mean and standard deviation estimates (as shown in Fig. 8 of Tan et al. (2025) and Supplementary Fig. S1 of Zhang et al. (2024)) and propagate errors through vertical interpolation into the final eddy anomaly composites. As a further example, when I tried to validate the authors' netCDF dataset by calculating the salinity standard deviation in each grid box at selected standard depth levels, I found many ‘suspicious spikes’ or ‘discontinuous blood-red spots’ in the open seas (see my attachment), very likely because of the QC performance. Moreover, if the authors calculate standard deviation maps for Fig. 7 and Fig. 8, these ‘discontinuous blood-red spots’ will likely appear there as well.
In any case, quality control is not negligible: it is one of the largest sources of uncertainty in T (OHC) and S estimates (Boyer et al., 2016), especially when resolving mesoscale processes (Tan et al., 2022 states ‘Although the large-scale pattern is similar on a global-basin scale, its meso-micro scale features are visibly different’; Yuan et al., 2026 states that ‘An investigation indicates that the WOD local climatological range in its QC check, which is constructed by all historical data, mainly represents the historical ocean conditions and thus removes more positive but realistic positive temperature anomalies in the eddy-rich regions (Boundary Currents and Antarctic Circumpolar Currents regions) than CODC’). Therefore, the impact of QC on the final dataset should be considered carefully.
One possible solution is for the authors to use the CODC (CAS-Oceanographic Data Center, Global Ocean Science Database) quality-controlled and bias-corrected in situ ocean temperature and salinity observation dataset (http://www.ocean.iap.ac.cn/ftp/cheng/CODCv2_Insitu_T_S_database/), either as the primary data source or as a cross-check. The main data source of CODC is also the WOD, but its quality control flags are designed to remove as many outliers as possible (more than the WOD QC) while minimizing the chance of mistakenly flagging good data (for details, see Zhang et al., 2024, and Tan et al., 2025).
In addition, the authors should remove profiles on the Argo grey list (WOD does not remove them in its quality control scheme), which may contain significant salinity drift. I did not find any statement in the manuscript on whether grey-listed profiles were removed. If they have not been removed yet, please remove them.
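The grid-box standard deviation check described in this comment can be sketched as follows. This is a minimal illustration only, not the referee's actual procedure: the function names, the 2° box size, the minimum sample count, and the "5 times the median" screening rule are all assumptions chosen for the example, and the sample values are synthetic.

```python
import math
from collections import defaultdict
from statistics import median, pstdev


def box_index(lat, lon, size=2.0):
    """Return the (row, col) index of the size-degree box containing a point."""
    return (math.floor((lat + 90.0) / size), math.floor((lon % 360.0) / size))


def gridded_std(samples, size=2.0, min_count=3):
    """Population std per box for (lat, lon, value) samples at one depth level.

    Boxes with fewer than min_count values are skipped.
    """
    boxes = defaultdict(list)
    for lat, lon, value in samples:
        boxes[box_index(lat, lon, size)].append(value)
    return {b: pstdev(v) for b, v in boxes.items() if len(v) >= min_count}


def suspicious_boxes(std_by_box, factor=5.0):
    """Boxes whose std exceeds factor times the median std (assumed rule)."""
    if not std_by_box:
        return []
    reference = median(std_by_box.values())
    return sorted(b for b, s in std_by_box.items() if s > factor * reference)


# Synthetic salinity values: three quiet boxes and one box with an outlier.
samples = [
    (10.5, 150.5, 34.50), (10.7, 150.9, 34.52), (11.1, 151.3, 34.51),
    (20.2, 160.1, 35.00), (20.4, 160.5, 35.03), (21.0, 161.0, 35.01),
    (30.1, 170.2, 34.80), (30.3, 170.6, 34.82), (31.5, 171.9, 34.81),
    (-40.5, 20.5, 34.40), (-40.2, 20.9, 34.42), (-41.0, 21.5, 38.90),
]
flagged = suspicious_boxes(gridded_std(samples))
```

In this synthetic case only the box containing the 38.90 value is flagged. On real data the threshold would need tuning by region, since eddy-rich areas have legitimately larger variance.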
5. Systematic instrument biases: Figure 5c shows that XBT, bottle (OSD), and APB data are included in the dataset, although as a small fraction of the total. Each of these instrument types is known to have systematic instrumental biases in the WOD archive: XBT depth and temperature errors have been documented and corrected in many studies (e.g., Cheng et al., 2014); bottle–CTD temperature inconsistencies have been quantified by Gouretski et al. (2022); and APB temperature biases, which are especially relevant for Southern Ocean coverage, have been characterised by Gouretski et al. (2024). These biases are not negligible when the goal is to quantify mesoscale eddy thermohaline anomalies at the level of tenths of a degree Celsius or hundredths of a practical salinity unit.
The authors should either apply the published bias corrections for each instrument type, or demonstrate quantitatively that including these instruments does not materially affect the eddy anomaly estimates in the relevant regions and time periods. For example, the CODC quality-controlled and bias-corrected in situ ocean temperature and salinity dataset mentioned in my previous point also includes temperature profiles that had been bias-corrected. It would be valuable if the authors could consider using bias-corrected data in the proposed dataset, or at least assess whether excluding XBT, OSD, and APB data would have a disproportionate effect on certain regions or time periods.
6. The choice of eddy detection product and the nearest-eddy colocation assumption: The manuscript uses the META3.2 product for eddy detection and assigns each profile to the nearest eddy on its sampling day. Both choices carry limitations that deserve explicit discussion:
(1) META3.2 versus more sophisticated eddy atlases. Laxenaire et al. (2018) developed the TOEddies algorithm and validated it systematically against an independent dataset of upper-ocean eddies identified from surface drifters. Their results show that TOEddies correctly identifies approximately 10-15% more validated eddies than META, with lower polarity-mismatch rates, particularly for structures in the 25–60 km radius range. Critically, META3.2 does not detect eddy merging and splitting events. Laxenaire et al. (2018) and Ioannou et al. (2024) demonstrate that these events are abundant, concentrated in the most energetic regions, such as the Cape Basin, western boundary currents, and the ACC, and affect approximately 3% of all detected eddies. When a splitting event occurs while a profile is sampled, META will assign the profile to one fragment while the hydrographic anomaly may be centred in another. The manuscript neither acknowledges this source of contamination nor discusses the choice of META over alternatives.
(2) Subsurface-intensified eddies. Laxenaire et al. (2019) demonstrate through a Lagrangian reconstruction of a single Agulhas ring that eddies can transition from surface-intensified to subsurface-intensified structures as they propagate, retaining large density and temperature anomalies at depth (200–1200 m) while their sea-surface height signature diminishes. Laxenaire et al. (2020) confirm statistically that the majority of Agulhas rings in the South Atlantic are subsurface-intensified. For these eddies, the eddy centre and radius inferred from altimetry will not accurately reflect the location of the thermohaline anomaly core, meaning that radius-normalised distances assigned to the colocated profiles are systematically in error. This limitation may apply not only in the Agulhas region but also to any subsurface-intensified eddy family.
(3) The nearest-eddy assumption in energetic regions. In regions where eddies cluster densely, including the Gulf Stream, Kuroshio Extension, ACC, and Cape Basin, many profiles will be located near multiple eddies simultaneously, and the nearest eddy in terms of centre-to-centre distance may not be the one whose dynamics dominate the observed hydrography. The manuscript acknowledges this issue briefly (lines 155–161) but does not quantify its magnitude. A useful diagnostic would be to compute, as a function of region, the fraction of profiles for which the nearest eddy is also the eddy within whose boundary (outer contour) the profile falls, information that is directly available from the META3.2 eddy contour data.
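The diagnostic suggested in point (3) could be computed along the following lines. This is a hedged sketch, not code from the manuscript or from META3.2: the dictionary layout with `"center"` and `"contour"` keys is hypothetical, the point-in-polygon test treats latitude and longitude as planar coordinates (so it ignores the dateline and spherical geometry), and the eddies and profile positions are invented for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def inside(lat, lon, contour):
    """Ray-casting point-in-polygon test; contour is a list of (lat, lon)."""
    hit = False
    n = len(contour)
    for i in range(n):
        lat1, lon1 = contour[i]
        lat2, lon2 = contour[(i + 1) % n]
        if (lat1 > lat) != (lat2 > lat):
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                hit = not hit
    return hit


def nearest_is_containing_fraction(profiles, eddies):
    """Fraction of profiles lying inside the contour of their nearest eddy."""
    if not profiles:
        return float("nan")
    matches = 0
    for lat, lon in profiles:
        nearest = min(eddies, key=lambda e: haversine_km(lat, lon, *e["center"]))
        if inside(lat, lon, nearest["contour"]):
            matches += 1
    return matches / len(profiles)


# Hypothetical eddies: a small one (A) next to a large elongated one (B).
eddies = [
    {"center": (0.0, 0.0),
     "contour": [(-0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.5, -0.5)]},
    {"center": (0.0, 3.0),
     "contour": [(-1.0, 1.0), (-1.0, 5.0), (1.0, 5.0), (1.0, 1.0)]},
]
# The second profile is closest to A's centre but lies inside B's contour.
profiles = [(0.0, 0.2), (0.0, 1.2), (0.0, 3.5), (0.2, 4.0)]
fraction = nearest_is_containing_fraction(profiles, eddies)
```

Evaluating this fraction per region (for example per 2°×2° box) would show where the centre-distance rule and the contour-membership rule disagree most often.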
Minor Comments
1. Dataset temporal coverage: The abstract and methods emphasize ‘1993–2021’, whereas the conclusions state ‘1993–2022’. This must be reconciled throughout.
2. Table 1 should include units for each variable.
3. Section 2.3: How did the author handle the large vertical gap below 200m? Are there any criteria for these intervals below 200m? Maybe the author can also refer to the methods used in Gouretski 2018 (h = 20 + 0.24⋅z, where z is the mean distance between the two adjacent levels in meters).
4. Figure 5a and Figure 5b: add the upper triangle to the colorbar to indicate ‘more than’ 400 profiles.
5. Figure 7a: The grey central band visible in the South China Sea cyclonic eddy panel of Fig. 7a is unexplained. Please add the explanation.
6. Figure 11 and Section 4.2: How did the authors define the MLD? Please give the details of the definition, for example the density threshold, temperature threshold, or hybrid algorithm used (and the reference depth).
7. For Figures 9–12, it would be great to overlay or provide sample count maps/contour lines, because visual interpretation of low-data regions is otherwise difficult.
8. Laxenaire et al. (2020) show that composite methods systematically underestimate peak heat content anomalies at the eddy centre relative to individual reconstructions, because eddies of different sizes and intensities are pooled in a common normalised coordinate system. The manuscript should acknowledge this limitation and note that the mean anomaly fields it provides do not capture the full range of eddy variability.
9. Some typos should be taken care of. For example:
Line 90: “form the WOD” should be “from the WOD.”
Line 193: “the choose of profiles” should be “the choice of profiles.”
Line 251: “random sampled” should be “randomly sampled.”
Line 453: “to what extend” should be “to what extent.”
Line 471: “Zendo” should be “Zenodo” in the data availability section
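The vertical-gap criterion mentioned in Minor Comment 3 (h = 20 + 0.24⋅z) can be illustrated with a short sketch. One assumption is made loudly here: z is taken as the mean depth of the two adjacent levels, which is only one reading of the comment's wording and should be checked against Gouretski (2018). The function names and the example depths are hypothetical.

```python
def max_allowed_gap(z_mean, a=20.0, b=0.24):
    """Largest accepted spacing (m) between adjacent levels: h = a + b * z.

    ASSUMPTION: z_mean is the mean depth (m) of the two adjacent levels.
    """
    return a + b * z_mean


def oversized_gaps(depths):
    """Return (upper, lower) depth pairs whose spacing exceeds the limit.

    depths must be sorted in increasing order, in metres.
    """
    flagged = []
    for upper, lower in zip(depths, depths[1:]):
        z_mean = 0.5 * (upper + lower)
        if lower - upper > max_allowed_gap(z_mean):
            flagged.append((upper, lower))
    return flagged


# Hypothetical profile with sparse sampling at several depths.
depths = [0.0, 10.0, 50.0, 200.0, 250.0, 600.0, 700.0]
gaps = oversized_gaps(depths)
```

A profile with flagged pairs could then either be rejected or left uninterpolated across those intervals; which of the two is appropriate is a choice for the dataset authors.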
References
Boyer, T., and Coauthors, 2016: Sensitivity of global upper-ocean heat content estimates to mapping methods, XBT bias corrections, and baseline climatologies. J. Climate, 29, 4817–4842.
Cheng, L., J. Zhu, R. Cowley, T. P. Boyer, and S. Wijffels, 2014: Time, probe type and temperature variable bias corrections on historical expendable bathythermograph observations. J. Atmos. Oceanic Technol., 31, 1793–1825.
Good, S., and Coauthors, 2023: Benchmarking of automatic quality control checks for ocean temperature profiles and recommendations for optimal sets. Front. Mar. Sci., 9, 1075510.
Gouretski, V., 2018: World Ocean Circulation Experiment–Argo global hydrographic climatology. Ocean Sci., 14, 1127–1146.
Gouretski, V., Cheng, L., and Boyer, T., 2022: On the consistency of the bottle and CTD profile data. J. Atmos. Oceanic Technol., 39, 1869–1887.
Gouretski, V., Roquet, F., and Cheng, L., 2024: Measurement biases in ocean temperature profiles from marine mammal dataloggers. J. Atmos. Oceanic Technol., 41, 629–645.
Ioannou, A., Guez, L., Laxenaire, R., and Speich, S., 2024: Global assessment of mesoscale eddies with TOEddies: comparison between multiple datasets and colocation with in situ measurements. Remote Sensing, 16, 4336.
Laxenaire, R., Speich, S., Blanke, B., Chaigneau, A., Pegliasco, C., and Stegner, A., 2018: Anticyclonic eddies connecting the western boundaries of Indian and Atlantic Oceans. J. Geophys. Res. Oceans, 123, 7651–7677.
Laxenaire, R., Speich, S., and Stegner, A., 2019: Evolution of the thermohaline structure of one Agulhas Ring reconstructed from satellite altimetry and Argo floats. J. Geophys. Res. Oceans, 124, 8969–9003.
Laxenaire, R., Speich, S., and Stegner, A., 2020: Agulhas ring heat content and transport in the South Atlantic estimated by combining satellite altimetry and Argo profiling floats data. J. Geophys. Res. Oceans, 125, e2019JC015511.
Laxenaire, R., Guez, L., Chaigneau, A., Isic, M., Ioannou, A., and Speich, S., 2024: TOEddies Global Mesoscale Eddy Atlas Colocated with Argo Float Profiles. SEANOE. https://www.seanoe.org/data/00917/102877/
Pegliasco, C., Delepoulle, A., Mason, E., Morrow, R., Faugère, Y., and Dibarboure, G., 2022: META3.1exp: a new global mesoscale eddy trajectory atlas derived from altimetry. Earth Syst. Sci. Data, 14, 1087–1107.
Schubert, R., and Coauthors, 2021: Evidence of eddy-related deep-ocean current variability in the northeast Atlantic Ocean induced by the Gulf Stream and the Brazilian Current / North Brazil Current systems. Ocean Sci., 17, 1–25.
Simoes-Sousa, I., and Coauthors, 2026: Integrating global ocean profiles and altimetry-derived eddies. Earth Syst. Sci. Data, 18, 1089–1101.
Tan, Z., Zhang, B., Wu, X., Dong, M., and Cheng, L., 2022: Quality control for ocean observations: from present to future. Sci. China Earth Sci., 65, 215–233.
Tan, Z., and Coauthors, 2023: A new automatic quality control system for ocean profile observations and impact on ocean warming estimate. Deep-Sea Res. I, 194, 103961.
Tan, Z., and Coauthors, 2025: CODC-S: a quality-controlled global ocean salinity profiles dataset. Sci. Data, 12, 917.
Zhang, B., and Coauthors, 2024: CODC-v1: a quality-controlled and bias-corrected ocean temperature profile database from 1940–2023. Sci. Data, 11, 666.
Yuan, H., Cheng, L., Pan, Y., Zhang, B., Meyssignac, B., Trenberth, K., Zhu, Y., Song, X., Zheng, H., Bao, S. and Du, J., 2026. Six-fold reduction in ocean heat content estimate uncertainty since 1960. (preprint at https://doi.org/10.21203/rs.3.rs-9248956/v1)
- RC3: 'Comment on essd-2026-89', Anonymous Referee #3, 16 Apr 2026
This dataset provides in situ temperature and salinity observations in the new coordinates of mesoscale eddies. The concept of the eddy coordinate itself is not significantly novel, but I think this global product is quite useful not only for physical oceanographers but also for biological and chemical oceanographers. I basically agree with accepting this manuscript, but I would like to add minor comments that could help to further improve its usefulness.
The present dataset is directly applicable if the eddies are assumed to move with frozen structures. However, it would not be straightforward if temporal variations of the structure were considered. Even with the same eddy ID, the structure of the eddy would vary temporally, partly due to the vorticity dissipation, especially for eddies that last more than a few years. Therefore, I think it would be useful to include an index like “eddy age” (the period after its generation), which can be extracted from the original eddy database based on satellite altimetry data.
Another index I believe would be useful is “vertical displacement.” Some eddies behave like waves, so that the vertical displacement of the thermocline propagates westward rather than the eddy holding a water mass. When a specific water mass moves with the eddy, the eddy's movement preserves the temperature and salinity anomaly. When only the vertical displacement propagates, the strength of the temperature and salinity anomaly depends on the background structure. Therefore, I think it would be useful to estimate the vertical displacement using the background structure, in addition to the anomalies.
Including those indexes would significantly increase the advantage of this database.
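One common way to turn an anomaly into a vertical displacement, given here only as an illustrative assumption and not as the referee's or the authors' method, is to divide the temperature anomaly by the background vertical gradient. With z positive upward, \bar{T} the background (climatological) profile, T' the anomaly, and \zeta the upward displacement of an isotherm (all symbols introduced for this sketch):

```latex
\zeta(z) \approx -\,\frac{T'(z)}{\partial \bar{T}/\partial z},
\qquad
T'(z) = T(z) - \bar{T}(z)
```

This linearization assumes purely adiabatic heave and breaks down where the background gradient is close to zero, for example within the mixed layer.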
Citation: https://doi.org/10.5194/essd-2026-89-RC3
Data sets
Codes and data for "A global eddy-collocated temperature and salinity profile dataset (v1.0): integrating multiplatform in situ observations with satellite-detected mesoscale eddies" Qingyou He https://doi.org/10.5281/zenodo.18590979
In this work, He and coauthors present a dataset that collocates temperature and salinity data with satellite-derived eddy information. The paper requires major changes to improve its clarity and refine its scope. I have three major concerns, described below, as well as a number of minor suggestions.
The novelty of this contribution is not clearly communicated, as the processing of the input data (WOD and AVISO) appears to be relatively basic (essentially interpolation). Beyond easier access to existing resources, the value added by the authors should be made explicit.
Discussion on the effect of trends (e.g., salinity and temperature, eddy, other datasets used) is not provided adequately (global warming mentioned once in sect. 2.2), but may be very relevant. Please expand the discussion and provide context also for the possible influence of observational changes through time.
The level of detail and support in several passages is insufficient; for example, the discussion of temperature extremes is missing a description of the methodology (and several key parameters, such as the baseline used to define such extremes), and the application section is sparse and limited mostly to citations of works by the paper's authors. The authors should clarify that the dataset is not gridded (even though a 2°×2° grid is mentioned in various places).
Minor suggestions:
36 why heat budgets?
52 why freshwater?
62 add reference
86 this reads like a straw man argument
89 collocation may work at the surface, but how deep would this hold?
90 text in brackets is puzzling
103 fix typo in flowchart
128 I am confused by the reference on Captain Cook; please clearly list which data sources (Argo, moorings,...) are used in this dataset. Profiles are only possible with Argo, right?
134 while eddies are daily, are the WOD data provided at the same resolution? Does this mean that profiles taken within a period shorter than a day are aggregated?
137 the QC applied by WOD should be outlined
143 well then the grid is not uniform in general, only by layer. How do you define these layers, anyway?
149 what is the maximum distance allowed?
163 explain in the text how d, D, and R are defined
167 I don't understand this. If profiles are extracted only in the vicinity of eddies, wouldn't this induce a bias in the selection, which would not be random?
173 this point is confusing. Over a 30-year period, trends may be relevant. How are these accounted for in your method? And aren't eddy signatures somehow visible in areas where eddies are more persistent? Also, how good is this product closer to the coasts?
186 "ambientes"?
Fig 4a typo
Table 1 is missing units
Fig. 5 I don't understand if/how (a) and (b) should differ. Can "glider" and "others" also provide profiles? It would be useful to report data density by depth
245 repetition, rephrase
250 I am not following your reasoning. Wouldn't eddies trap Lagrangian sensors?
Fig. 6 why is the resolution different between left and right maps? What are Gamma and P?
204 please explain how interpolation is made
271 this is not clear
296 this should be shown with your dataset, as it should be possible
344 no references in these and close sections, only self-citation?
352 as in 296, this remains speculative and should be verified
386 was this defined already?
408 include references and key details, e.g. the climatology used
416 daily or coarser climatology?
Fig. 12 is the spatial resolution still 2 deg in these plots?
436 this section seems too sketched and unsupported
472 a single 10 GB file is not ideal; why not split it by basin, for example? The file also lacks metadata (reference, data sources) and units. Why are "contour" and "char" set to 20?
475 the specific link should be provided