the Creative Commons Attribution 4.0 License.
Global open-ocean daily turbulent heat flux dataset (1992–2020) from SSM/I via deep learning
Abstract. Air–sea turbulent heat fluxes – latent heat flux (LHF) and sensible heat flux (SHF) – are fundamental to the Earth’s energy and moisture budgets and to ocean–atmosphere coupling. Global flux estimates via bulk aerodynamic algorithms depend on sea surface temperature (SST), surface wind speed (SSW), near-surface air temperature (Ta), and specific humidity (Qa), but orbital sampling and cloud contamination leave gaps in satellite inputs that propagate uncertainty to Ta/Qa and hence to LHF/SHF. Here we present DeepFlux, a global daily 1° × 1° heat-flux dataset for 29 years (January 1992–December 2020). The dataset is produced with a concise completion-then-retrieval workflow: Special Sensor Microwave/Imager (SSM/I) variables (SSW, cloud liquid water, total column water vapor, and rain rate) are first gap-filled using the AI-based Generalized Data Completion Model (GDCM) to yield spatiotemporally continuous inputs; these – together with Optimum Interpolation SST (OISST) – are then used to retrieve Ta and Qa via the AI-based Matrices-Points Fusion Network (MPFNet). LHF and SHF are then computed using a bulk algorithm. Validation against in-situ buoy observations shows that the dataset closely matches the buoy measurements, with RMSEs of 0.53 °C (Ta), 0.70 g kg⁻¹ (Qa), 5.53 W m⁻² (SHF), and 25.28 W m⁻² (LHF). Comparisons with widely used flux products indicate differences among products, reflecting variability in flux estimates from different sources. DeepFlux provides an open, consistent, observation-constrained view of near-surface meteorology and air–sea heat exchange for climate diagnostics, model evaluation, and process studies. DeepFlux v1.0 is openly available under CC BY 4.0 at [repository] (DOI: http://dx.doi.org/10.12157/IOCAS.20250823.001). The dataset can also be downloaded without registration at https://zenodo.org/records/17160579.
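The abstract states only that LHF and SHF are computed "using a bulk algorithm" and does not say which one. As a rough illustration of what a bulk aerodynamic calculation involves, the sketch below uses the textbook bulk formulas with constant transfer coefficients. The constants (`RHO_AIR`, `C_E`, `C_H`, the 0.98 salinity factor, the fixed surface pressure) and the function names are assumptions made for this example, not values or code from DeepFlux; operational algorithms typically use stability-dependent coefficients.

```python
import math

# Illustrative constants (assumed, not taken from the DeepFlux paper).
RHO_AIR = 1.2      # air density, kg m^-3
L_V = 2.5e6        # latent heat of vaporization, J kg^-1
C_P = 1004.0       # specific heat of air, J kg^-1 K^-1
C_E = 1.2e-3       # constant moisture transfer coefficient (assumption)
C_H = 1.2e-3       # constant heat transfer coefficient (assumption)


def saturation_specific_humidity(sst_c, pressure_hpa=1013.25):
    """Saturation specific humidity (kg/kg) over sea water at temperature sst_c (deg C)."""
    # Bolton-type approximation for saturation vapor pressure in hPa.
    e_sat = 6.112 * math.exp(17.67 * sst_c / (sst_c + 243.5))
    e_sat *= 0.98  # common reduction for salinity
    return 0.622 * e_sat / (pressure_hpa - 0.378 * e_sat)


def bulk_fluxes(sst_c, ta_c, qa_gkg, wind_ms):
    """Return (LHF, SHF) in W m^-2, positive from ocean to atmosphere."""
    q_s = saturation_specific_humidity(sst_c)
    q_a = qa_gkg / 1000.0  # g/kg -> kg/kg
    lhf = RHO_AIR * L_V * C_E * wind_ms * (q_s - q_a)
    shf = RHO_AIR * C_P * C_H * wind_ms * (sst_c - ta_c)
    return lhf, shf


lhf, shf = bulk_fluxes(sst_c=28.0, ta_c=27.0, qa_gkg=18.0, wind_ms=7.0)
print(f"LHF = {lhf:.1f} W m^-2, SHF = {shf:.1f} W m^-2")
```

The sketch makes the dependency chain in the abstract concrete: errors in Ta and Qa enter the fluxes linearly through the air–sea differences, scaled by wind speed.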
Status: open (until 23 Mar 2026)
- RC1: 'Comment on essd-2025-545', Anonymous Referee #1, 19 Feb 2026
- RC2: 'Comment on essd-2025-545', Anonymous Referee #2, 26 Feb 2026
This paper presents a method and dataset for global ocean heat flux over an almost 30 year period. The method draws on re-analysis data to machine-learn how to "complete" SSMI-series satellite fields of data. From those completed fields for variables such as surface temperature, humidity and wind speeds, a bulk-formulae based module computes fluxes (one has to read another paper to understand how that step is formulated more fully). In comparison with in situ based measurements of these components and associated fluxes, the authors present results suggesting the new product is scientifically competitive with established products. Some analysis of the drivers of long-term trends in the fluxes of the new product is shown.
It is an interesting contribution to the development of better quantification of air-sea fluxes. My critical comments on the work are as follows.
The method of "SSMI completion" is heavily machine-learning led. The approach is presented, inevitably, at a relatively high level. It sounds methodical and reasonable, but nonetheless, with such an approach, many specific design choices are made that affect the result. Other choices could have been made, and this structural uncertainty in the design is not explored. This seems to me to be a general problem with machine-learning approaches, where choices for processing are not really based on physical understanding or hypotheses: the inability to attribute the outcomes to scientific uncertainties, because machine-learning design choices are at least as important.
In this context, the total independence of comparison data from training data becomes crucial. But this is not always easy to be clear about, especially when re-analysis fields have been used as part of the training, as the assimilation products may well have ingested all or much of the comparison data. Comments focussed on and acknowledging any limitations of independence would help on this topic.
I was left a little unclear what the precise heat-flux measurand is. I infer the product aims for a global completed heat flux equivalent to the instantaneous heat fluxes one would be able to retrieve from the satellite data, and that the comparison data are matches to the nearest SSMI comparison time. If so, there is an issue with using the product for long-term change analysis in that there is a subdaily cycle in heat fluxes, and the satellite observation times are not consistent over the full period. (Targeting an explicitly daily-mean heat flux by machine learning might be a useful approach and could be validated against in situ data at high temporal resolution aggregated to daily values.)
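To illustrate the aggregation step suggested in the parenthesis above, the following minimal sketch averages high-frequency in situ samples into daily means and skips days with sparse coverage. The function name `daily_means` and the `min_samples` threshold are hypothetical choices for this example; neither the paper nor the review specifies them.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_means(times, values, min_samples=18):
    """Average sub-daily samples into daily means.

    Days with fewer than min_samples samples are dropped, so that a mean
    built from a few hours does not alias the subdaily cycle.
    """
    buckets = defaultdict(list)
    for t, v in zip(times, values):
        buckets[t.date()].append(v)
    return {
        day: sum(vs) / len(vs)
        for day, vs in sorted(buckets.items())
        if len(vs) >= min_samples
    }


# Hypothetical hourly record: one full day plus three samples of the next day.
start = datetime(2020, 1, 1)
times = [start + timedelta(hours=h) for h in range(27)]
values = [float(h) for h in range(27)]
print(daily_means(times, values))
```

Only the fully sampled day survives in this example, which is the behaviour one would want before comparing against a product that claims to represent a daily mean.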
OISST is used for SST trends. This is an unfortunate choice among the available options for a long-term SST record, as OISST's operational mode of production is associated with inconsistency of bias referencing over time (Journal of Climate 34, 2923–2939 (2021)), causing relatively out-of-family trends (instability) over the period of this dataset. (See: 10.1175/JCLI-D-20-0793.1 ; https://climate.esa.int/documents/2370/SST_CCI_D5.1_CAR_v1.1-signed.pdf.)
I would like to see in the paper a comparison of the accuracy statistics for matches that were present in the SSMI swaths compared to the infilled times-and-places. This would be a good measure of the effectiveness of the infilling in providing a "daily" complete product.
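The comparison requested above could be computed along the following lines. This is a minimal sketch assuming each matchup carries a boolean flag saying whether the grid cell was infilled or directly observed in an SSM/I swath; the flag and the function names are hypothetical, since the page does not describe how the dataset records this.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error; returns NaN for an empty sample."""
    if not pred:
        return float("nan")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def rmse_by_coverage(pred, obs, infilled):
    """Split matchups by the infilled flag and report RMSE for each group."""
    groups = {"observed": ([], []), "infilled": ([], [])}
    for p, o, flag in zip(pred, obs, infilled):
        key = "infilled" if flag else "observed"
        groups[key][0].append(p)
        groups[key][1].append(o)
    return {key: rmse(ps, os_) for key, (ps, os_) in groups.items()}


# Toy matchups (made-up numbers, for illustration only).
product = [1.0, 2.0, 3.0, 4.0]
buoy = [1.0, 2.0, 5.0, 4.0]
was_infilled = [False, False, True, True]
print(rmse_by_coverage(product, buoy, was_infilled))
```

A markedly larger RMSE in the infilled group than in the observed group would indicate that the gap filling, rather than the retrieval, dominates the error at those times and places.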
Overall, the paper is well written and presented. There is inconsistency in acronyms being presented with and without italics, and sometimes named differently in figures (e.g. SSW and WS). Table 1 is very confusing and needs to be aligned so that the reader can understand what is connected to what.
Citation: https://doi.org/10.5194/essd-2025-545-RC2
Data sets
DeepFlux v1.0: A Global Open Oceans Daily Heat Flux Dataset For 1992–2020 From SSMI Satellite Data Using Deep Learning Models Haoyu Wang et al. https://doi.org/10.12157/IOCAS.20250823.001
Model code and software
GDCM Haoyu Wang et al. https://doi.org/10.12157/IOCAS.20250823.001
MPFNet Haoyu Wang et al. https://doi.org/10.12157/IOCAS.20250823.001
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 388 | 177 | 34 | 599 | 86 | 33 | 61 |
This is a novel approach to calculating turbulent heat fluxes over the ocean. Aside from some usual manuscript omissions and corrections (noted below), my main concern is with the authors' description and understanding of the data sets used, in particular the SSM/I products that they presumably obtained from RSS. In my opinion, they lack a full understanding of this data set and what its limitations may be, but take it at face value. Here are just some of my concerns:
My primary suggestion would be to include more details on this data set and demonstrate your understanding of what you are using. Otherwise, it just seems to be a huge data exercise.
Some general comments (and this is not all of them)