This work is distributed under the Creative Commons Attribution 4.0 License.
Bowen ratio-constrained global dataset of air-sea turbulent heat fluxes from 1993 to 2017
Abstract. Air-sea turbulent heat fluxes, comprising the sensible heat flux (SHF) and latent heat flux (LHF), together with the Bowen ratio (β, the ratio of SHF to LHF), are crucial for understanding air-sea interaction and the global energy and water budgets. However, existing products, developed primarily with semi-empirical bulk aerodynamic methods or data-driven machine learning approaches, often suffer from limited accuracy and physical consistency owing to uncertainties in the environmental forcings and inappropriate parameterizations. In this study, we generated a global daily 0.25° product of air-sea turbulent heat fluxes using a Bowen ratio-constrained neural network (NN) model (the BrTHF model) that jointly estimates SHF and LHF, trained on observations from 197 globally distributed buoys together with multi-source remote sensing and reanalysis forcings. Spatial ten-fold cross-validation showed that the BrTHF model, with root mean square errors of 6.05 W/m2, 23.67 W/m2 and 0.22 and correlation coefficients of 0.93, 0.91 and 0.25 for SHF, LHF and β, respectively, outperformed a physics-agnostic NN model and seven widely used air-sea turbulent heat flux products (JOFURO3, IFREMER, SeaFlux, ERA5, MERRA2, OAFlux, and OHF). Furthermore, inter-comparison of the spatial distributions of multi-year means and of the intra-annual and inter-annual change patterns showed that the BrTHF product reliably reproduces global SHF, LHF and β, in contrast to the machine learning-based OHF product, which failed to replicate these patterns. The main advantage of the BrTHF model lies in the improved physical rationality of its β estimates, which eliminates the outliers observed in the physics-agnostic NN model and the seven typical products.
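The evaluation metrics quoted above (RMSE and correlation coefficient for SHF, LHF, and β) follow standard definitions that can be reproduced in a few lines of NumPy. This is a generic sketch of those formulas, not the authors' evaluation code; all function and variable names here are illustrative.

```python
import numpy as np

def bowen_ratio(shf, lhf):
    """Bowen ratio: beta = SHF / LHF (both fluxes in W/m2)."""
    return np.asarray(shf, float) / np.asarray(lhf, float)

def rmse(pred, obs):
    """Root mean square error between predicted and observed fluxes."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def corr(pred, obs):
    """Pearson correlation coefficient between predictions and observations."""
    return float(np.corrcoef(np.asarray(pred, float), np.asarray(obs, float))[0, 1])
```

Applied to daily predicted and buoy-observed flux series, these functions yield the kind of per-variable scores reported above (e.g., RMSE in W/m2 for SHF and LHF, dimensionless for β).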
The improved SHF, LHF, and β estimates enable more accurate quantification of the global air-sea energy and water budgets, enhance our understanding of air-sea interaction, and improve projections of climate change under global warming. The 0.25° daily global product covering 1993 to 2017 can be freely accessed from the National Tibetan Plateau Data Center (TPDC) [https://doi.org/10.11888/Atmos.tpdc.302578, Tang and Wang (2025)].
Status: open (until 12 Jul 2025)
RC1: 'Comment on essd-2025-272', Anonymous Referee #1, 19 Jun 2025
Summary and Merit:
Global air-sea flux estimates are useful for understanding the transport of heat and water throughout the globe. With this dataset, the authors use a physics-constrained data-driven method to generate a dataset at moderate resolution (0.25 degrees) covering 1993-2017. A key improvement is a realistic representation of the ratio of SHF to LHF. While I think the work itself is a very interesting exercise with strong potential to yield a useful dataset, I do have a significant concern that I would like to see discussed.
Main comment:
I am not entirely convinced that the training dataset has large enough spatial and temporal coverage for the neural network to generalize accurately and produce a product with global-scale coverage. In particular, from Figure 2, it looks like the training observations come disproportionately from the tropical ocean. Outside of the tropics, only the northeast Pacific and North Atlantic appear to have (visually) reasonable coverage. To evaluate performance at "unseen" locations, the authors employ spatially-informed cross-validation. While this procedure demonstrates that predictions are reasonably accurate in the different spatial domains that are part of the training set, it does not indicate that predictions will be accurate in regions where no data exist. For instance, there are many locations in the southern hemisphere presumably characterized by different dynamics than the locations in the training dataset. The comparisons between basins presented later are also, I think, only reflective of the locations in Fig 2. Of additional concern is that many variables used in training likely have relationships with air-sea fluxes that are very location-specific.
I do appreciate that the authors attempt to address this issue with the above, but I don't think this goes far enough. I also acknowledge that this is not an easy comment to address (i.e., more buoy measurements cannot be used if the buoys do not exist). But I still think the discussion of this could be improved. One idea might be to perform an even more targeted form of cross-validation, e.g., removing one of the isolated locations from training to see how well the neural network performs, and using this to quantify uncertainty. For example, remove the single location south of Australia from training and see how the NN performs at that location when only the others are used in training. The current Figures 3-5 lump data together from different regions, so it is not possible to determine performance at the isolated locations. Such an approach could be repeated for other single isolated locations to get a generalized idea of uncertainty at several of the remote locations not included in training. There could be other ways to address this as well. But in any case, there needs to be some form of disclaimer: the R values and RMSE shown represent performance at the locations used in training and do not necessarily indicate the same performance in a generalized global sense.
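The targeted cross-validation proposed above (hold out one isolated buoy location, train on all the others, score on the held-out site) can be sketched as a leave-one-group-out loop. This is a generic illustration with a toy mean predictor standing in for the neural network, not the authors' BrTHF training code; all names are hypothetical.

```python
import numpy as np

def leave_one_location_out(X, y, groups, fit, predict):
    """Hold out each buoy location (group) in turn, train on the
    remaining locations, and report RMSE at the held-out site."""
    scores = {}
    for g in np.unique(groups):
        held_out = (groups == g)
        model = fit(X[~held_out], y[~held_out])
        pred = predict(model, X[held_out])
        scores[int(g)] = float(np.sqrt(np.mean((pred - y[held_out]) ** 2)))
    return scores

# Toy stand-in for the NN: predict the mean of the training targets.
fit_mean = lambda X, y: y.mean()
predict_mean = lambda model, X: np.full(len(X), model)
```

Repeating this loop over the remote sites (e.g., the single location south of Australia) would give a per-site error estimate that quantifies extrapolation uncertainty at locations far from the bulk of the training data.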
Line-by-line comments and suggestions:
Title/abstract – It might be helpful to explicitly mention that these are bulk flux predictions
L66 – typo seriously “imped”
L68 – change “ascribed” to “attributed”
L70-77 – I think this section should be more explicit on what the problems are with existing parameterizations
L78 – clarify what upscaling means in this context
L93 – “patterns”
L103 – I don’t understand what “their synergistic changes” refers to
L107 – ambiguous whether “this work” refers to the 2024 work or the present paper
L118 – “three fold”
L146-161 – I think these datasets should be listed in table form, not as a long paragraph. It would make this much easier to read.
L202 – By forcing variables, it might be helpful to clarify that this means variables used in training the neural network
L214 – not sure it’s necessary to list these out in paragraph form. To be concise it might be better to simply refer to the relevant table.
L276 – I am concerned that the relationships between air sea fluxes and these 11 variables are not globally generalizable.
L316 – Might be helpful to add a short explanation on why you chose these metrics
L363-383, Fig 5 – While performance in terms of RMSE is clearly improved as explained, depending on the application it might be considered a deficiency that BrTHF does not reproduce extreme values of Bowen ratio that we know exist from the observations (i.e. the distribution is not necessarily better represented than the other models). I think this needs to be explicitly discussed.
L400+ - I think it might be useful to compare the performance by basin to the amount of data coverage between basins. This might help explain why the model performed the way it did.
Fig 7 – I would recommend using a color other than blue for the second and third columns. As is, it is confusing that dark blue = poor performance in column 1, but dark blue = good performance in columns 2 and 3.
I also think it should be very clear that the basins here just represent the buoy locations that are available in those basins; not uniform coverage in them.
L448-449 – From Figure 8, that looks true for all datasets, not just BrTHF. I would recommend clarifying.
Fig 8-9 – Is there a measure of uncertainty in these long-term averages that could be included on the plots?
L472 – “rest of the products”
L482-483 – I would recommend speculating on what regions/mechanisms may have caused this positive trend, as it differs from the other products.
Sec 3.3 – This section implies that performance between BrTHF and Seaflux-ERA5 is similar, even in regard to Bowen ratio which earlier seemed to be the point of significant improvement for BrTHF. Please comment on this.
Fig 13 – It’s a bit confusing that the labels on the color bar are below the plots on the left. It might be more intuitive to add a title above each subplot rather than a colorbar label.
L553-555 – Do we trust these results, considering that there was significant uncertainty at high latitudes (and the NN was trained on few observations from high latitudes)? Could this be an artifact of the training data/procedure?
L588 – “custom”
L590 – I’m unconvinced that the absence of outliers is an improvement, since outliers exist in the observations. Please comment on this.
L609-618 – I’m not sure that this isn’t also true for the present dataset based on looking at Figure 2
L666 – Performance in terms of SHF/LHF did not clearly look superior based on the plots. Please clarify that the largest improvement is in Bowen ratio.
Citation: https://doi.org/10.5194/essd-2025-272-RC1