Bowen ratio-constrained global dataset of bulk air–sea turbulent heat fluxes from 1993 to 2017

Wang, Yizhe; Tang, Ronglin; Liu, Meng; Huang, Lingxiao; Li, Zhao-Liang

doi:10.5194/essd-18-3391-2026

Articles | Volume 18, issue 5

https://doi.org/10.5194/essd-18-3391-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Data description article

19 May 2026

Download

Final revised paper (published on 19 May 2026)
Supplement to the final revised paper
Preprint (discussion started on 05 Jun 2025)
Supplement to the preprint

Interactive discussion

Status: closed

RC1:
'Comment on essd-2025-272', Anonymous Referee #1, 19 Jun 2025

Summary and Merit:

Global air-sea flux estimates are useful for understanding the transport of heat and water throughout the globe. With this dataset, the authors use a physics-constrained data-driven method to generate a dataset at moderate resolution (0.25 degrees) from 1993-2017. A key improvement is realistic representation of the ratio of SHF to LHF. While I think the work itself is a very interesting exercise and think this has strong potential to be a useful dataset, I do have a significant concern that I would like to see discussed.

Main comment:

I am not entirely convinced that the training dataset has large enough spatial and temporal coverage for the neural network to generalize accurately and produce a product with global-scale coverage. In particular, from Figure 2, it looks like the training observations are disproportionately from the tropical ocean. Outside of the tropics, only the northeast Pacific and North Atlantic appear to have (visually) reasonable coverage. To evaluate performance on “unseen” locations, the authors employ spatial-informed cross validation. While this procedure demonstrates that predictions are reasonably accurate at the different spatial domains that are part of the training set, it does not indicate that predictions will be accurate in regions where there are no existing data. For instance, there are many locations in the southern hemisphere presumably characterized by different dynamics than the locations in the training dataset. The comparisons between basins presented later are also only reflective of the locations in Fig 2, I think. Of additional concern is that there are many variables used in training that likely have a relationship with air-sea fluxes that is very location-specific.

I do appreciate that the authors attempt to address this issue with the above, but I don’t think this goes far enough. I also acknowledge that this is not an easy comment to address (i.e., more buoy measurements cannot be used if the buoys do not exist). But I still think the discussion of this could be improved. One idea might be to perform an even more targeted form of cross-validation, e.g., removing one of the isolated locations from training to see how well the neural network performs, and use this to quantify uncertainty. For example, remove the single location south of Australia from training, and see how the NN performs for predictions of that location when only the others are used in training. The current Figures 3-5 lump data together from different regions, so it is not possible to determine how good the performance is for the isolated locations. Such an approach could be repeated for other single isolated locations to get a generalized idea of uncertainty at several of the remote locations not included in training. There probably could be other ways to address it as well. But in any case, there needs to be some manner of disclaimer: the R values and RMSE shown represent performance at the locations used in training and do not necessarily indicate the same performance in a generalized global sense.
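The targeted cross-validation suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the function names (`leave_one_site_out`, `fit_ols`, `predict_ols`) are hypothetical, and an ordinary least-squares model stands in for the authors' neural network, whose architecture is not described on this page. Each site is held out in turn, the model is fitted on the remaining sites, and RMSE and R are reported per held-out site rather than pooled.

```python
import numpy as np

def leave_one_site_out(X, y, site_ids, fit, predict):
    """Hold out each site in turn; return {site: (rmse, r)} for the held-out site."""
    scores = {}
    for site in np.unique(site_ids):
        test = site_ids == site
        model = fit(X[~test], y[~test])      # train on all other sites only
        pred = predict(model, X[test])       # predict the unseen site
        rmse = float(np.sqrt(np.mean((pred - y[test]) ** 2)))
        r = float(np.corrcoef(pred, y[test])[0, 1])
        scores[site] = (rmse, r)
    return scores

# Stand-in model (assumption): least squares with an intercept, NOT the authors' NN.
def fit_ols(X, y):
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic example: 5 hypothetical buoy sites, 200 samples each, 3 predictors.
rng = np.random.default_rng(0)
n_sites, n_per_site = 5, 200
site_ids = np.repeat(np.arange(n_sites), n_per_site)
X = rng.normal(size=(n_sites * n_per_site, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=len(X))

scores = leave_one_site_out(X, y, site_ids, fit_ols, predict_ols)
for site, (rmse, r) in scores.items():
    print(f"held-out site {site}: RMSE={rmse:.3f}, R={r:.3f}")
```

In this synthetic case every site follows the same relationship, so the held-out scores are uniformly good; with real buoy data, a held-out site whose scores are much worse than the pooled figures would indicate the kind of location-specific behaviour the comment is concerned about.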

Line-by-line comments and suggestions:

Title/abstract – It might be helpful to explicitly mention that these are bulk flux predictions

L66 – typo seriously “imped”

L68 – change “ascribed” to “attributed”

L70-77 – I think this section should be more explicit on what the problems are with existing parameterizations

L78 – clarify what upscaling means in this context

L93 – “patterns”

L103 – I don’t understand what “their synergistic changes” refers to

L107 – ambiguous whether “this work” refers to the 2024 work or the present paper

L118 – “three fold”

L146-161 – I think these datasets should be listed in table form, not as a long paragraph. It would make this much easier to read.

L202 – By forcing variables, it might be helpful to clarify that this means variables used in training the neural network

L214 – not sure it’s necessary to list these out in paragraph form. To be concise it might be better to simply refer to the relevant table.

L276 – I am concerned that the relationships between air sea fluxes and these 11 variables are not globally generalizable.

L316 – Might be helpful to add a short explanation on why you chose these metrics

L363-383, Fig 5 – While performance in terms of RMSE is clearly improved as explained, depending on the application it might be considered a deficiency that BrTHF does not reproduce extreme values of Bowen ratio that we know exist from the observations (i.e. the distribution is not necessarily better represented than the other models). I think this needs to be explicitly discussed.

L400+ - I think it might be useful to compare the performance by basin to the amount of data coverage between basins. This might help explain why the model performed the way it did.

Fig 7 – I would recommend to use a color other than blue for the second and third columns. As is, it is confusing that dark blue = poor performance in column 1, but dark blue = good performance in columns 2 and 3.

I also think it should be very clear that the basins here just represent the buoy locations that are available in those basins; not uniform coverage in them.

L448-449 – That looks true for all datasets, not just BrTHF from Figure 8. I would recommend to clarify.

Fig 8-9 – Is there a measure of uncertainty in these long-term averages that could be included on the plots?

L472 – “rest of the products”

L482-483 – I would recommend to speculate on what regions/mechanism may have caused this positive trend, as it differs from the other products.

Sec 3.3 – This section implies that performance between BrTHF and Seaflux-ERA5 is similar, even in regard to Bowen ratio which earlier seemed to be the point of significant improvement for BrTHF. Please comment on this.

Fig 13 – It’s a bit confusing that the labels on the color bar are below the plots on the left. It might be more intuitive to add a title above each subplot rather than a colorbar label.

L553-555 – Do we trust these results, considering that there was significant uncertainty at high latitudes (and the NN was trained on few observations from high latitudes)? Could this be an artifact of the training data/procedure?

L588 – “custom”

L590 – I’m unconvinced that the absence of outliers is an improvement, since outliers exist in the observations. Please comment on this.

L609-618 – I’m not sure that this isn’t also true for the present dataset based on looking at Figure 2

L666 – Performance in terms of SHF/LHF did not clearly look superior based on the plots. Please clarify that the largest improvement is in Bowen ratio.

Citation: https://doi.org/10.5194/essd-2025-272-RC1
- AC3: 'Reply on RC1', Ronglin Tang, 18 Aug 2025
  
  Please see our responses in the attached file. We sincerely appreciate your valuable suggestions, which have greatly helped us improve the manuscript.
  
  Citation: https://doi.org/10.5194/essd-2025-272-AC3
RC2:
'Comment on essd-2025-272', Anonymous Referee #2, 03 Jul 2025

Please see the attached file

Citation: https://doi.org/10.5194/essd-2025-272-RC2
- AC2: 'Reply on RC2', Ronglin Tang, 18 Aug 2025
  
  Please see our responses in the attached file. We sincerely appreciate your valuable suggestions, which have greatly helped us improve the manuscript.
  
  Citation: https://doi.org/10.5194/essd-2025-272-AC2
EC1:
'Comment on essd-2025-272', Tobias Gerken, 11 Jul 2025

After reading the reviewer comments and considering the concerns raised by reviewers, I am inviting the authors to post their reply. However, based on the reviewers' concerns, I would like to inform the authors that unless there is a compelling case that dispels these concerns, a re-submission of the manuscript would likely lead to a rejection of the manuscript.

Citation: https://doi.org/10.5194/essd-2025-272-EC1
- AC1: 'Reply on EC1', Ronglin Tang, 18 Aug 2025
  
  Re: We appreciate the opportunity to revise our manuscript and address the reviewer comments in detail. We acknowledge the editor’s concern that a compelling case is needed to justify resubmission. In response, we have carefully considered all major and minor comments, substantially revised the manuscript to improve clarity and accuracy, and strengthened the scientific basis of our methodology and conclusions. We hope that our point-by-point replies and the revised manuscript sufficiently address the concerns raised.
  
  Citation: https://doi.org/10.5194/essd-2025-272-AC1

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Ronglin Tang on behalf of the Authors (12 Sep 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (18 Sep 2025) by Tobias Gerken

RR by Anonymous Referee #1 (27 Sep 2025)

Suggestions for revision or reasons for rejection

I appreciate the significant effort put forth by the authors in revision which has improved the manuscript. However, I still have several concerns. Please see my comments below.

1) My first comment is in regard to whether the product can be trusted in regions far away from buoy observations. I appreciate that the authors have adopted my suggestion to perform targeted cross-validation of an isolated buoy location, and determine that the performance is similar to other products for remote locations. This addition is most welcome, and demonstrates that the NN exhibits some degree of generalizability that was not shown in the original version. While this is a step in the right direction, I think (and the authors acknowledge) that spatial limitations of the training data still likely influence the results. Because of this, I question whether BrTHF is an improvement over existing products on a global scale. At minimum I think a stronger disclaimer is needed considering that the authors present this as a global product.

2) I think there needs to be a dedicated discussion in the text on why it is important to represent the Bowen ratio accurately. That is, what specific applications would this be useful for? For example, are there deficiencies in previous studies utilizing other products that would be resolved if the Bowen ratio was more accurately modeled (e.g., without the extreme outliers that exist in alternate products)? In terms of demonstrating that this is a useful dataset, I think this is essential to discuss. From the tables and as noted by the other reviewer, the quantitative improvement of individual SHF and LHF terms is incremental compared to other products and probably not useful on its own. The main “improvement” is in the Bowen ratio, yet I’m not sure I understand exactly why this improvement would be useful.

3) There are still some features of the predicted values that need further explanation. I agree that extreme values of Bowen ratio are eliminated in BrTHF (though this needs some more justification on why those are a problem: the authors state they result from measurement error, but I don’t understand how that can be the case for model-derived products). My concern is the lack of representation of the dynamic range of Bowen ratios. The authors state that this is a slight underestimate, while from Fig 5 it seems like a large underestimate. Relatedly, Fig 6 seems to show that for small Bowen ratios, the other products yield a more realistic range of values than BrTHF (if I’m interpreting that figure correctly). So essentially, it seems like BrTHF is eliminating one problem (extreme outliers) at the expense of creating another (too small of a dynamic range). The second problem also seems to be intentional (L213), i.e., the model is intentionally trained on a narrower range; I don’t understand how this is reasonable to do. These aspects need to be clearly discussed, and there needs to be an explanation on whether this trade-off is actually an improvement for potential future applications using the dataset.

4) I still notice some errors and unclear statements in the text and think this would benefit from another close read-through. A few examples:

L72 – I don’t understand what the “key process” is?
L106 – These aren’t really separate “approaches”, just studies focusing on different variables.
L211 – “measurement errors” is very vague. Also not sure it is reasonable to assume that since you are using published data, unless there’s specific information in the metadata on this
L365 – Add additional justification on the metrics. Particularly, all of these will be sensitive to extreme values, and it doesn’t necessarily seem fair to remove extremes from training then evaluate in this way without some more justification.
L605 – The Gulf Stream, Brazil Current, and Sea of Japan are not high latitude
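The trade-off raised in comment 3 and in the L365 note (error metrics that are sensitive to extremes versus a compressed dynamic range) can be illustrated with a small synthetic sketch. Everything here is an assumption for illustration: the Bowen ratio is taken as SHF divided by LHF, the numbers are made up, and `product_a` and `product_b` are hypothetical products that do not correspond to BrTHF or to any dataset named in the review.

```python
import numpy as np

def bowen_ratio(shf, lhf, min_lhf=1.0):
    """Bowen ratio B = SHF / LHF; NaN where |LHF| < min_lhf to avoid blow-up.

    The min_lhf threshold (same units as the fluxes) is an illustrative choice.
    """
    shf = np.asarray(shf, dtype=float)
    lhf = np.asarray(lhf, dtype=float)
    out = np.full(shf.shape, np.nan)
    ok = np.abs(lhf) >= min_lhf
    out[ok] = shf[ok] / lhf[ok]
    return out

def rmse(pred, obs):
    """Root-mean-square error over finite pairs; dominated by large errors."""
    ok = np.isfinite(pred) & np.isfinite(obs)
    return float(np.sqrt(np.mean((pred[ok] - obs[ok]) ** 2)))

def spread(x, lo=5, hi=95):
    """Width of the central 5-95 percentile range, a robust dynamic-range proxy."""
    return float(np.nanpercentile(x, hi) - np.nanpercentile(x, lo))

# Synthetic "observed" Bowen ratios (made-up distribution).
rng = np.random.default_rng(42)
obs = rng.normal(loc=0.1, scale=0.1, size=1000)

# Hypothetical product A: tracks observations closely but has 1% extreme outliers.
product_a = obs + rng.normal(scale=0.02, size=obs.size)
product_a[::100] += 20.0

# Hypothetical product B: no outliers, but variability shrunk toward the mean.
product_b = obs.mean() + 0.3 * (obs - obs.mean())

for name, prod in [("A (outliers)", product_a), ("B (compressed)", product_b)]:
    print(f"{name}: RMSE={rmse(prod, obs):.3f}, "
          f"spread={spread(prod):.3f} vs observed {spread(obs):.3f}")
```

In this sketch the compressed product scores a much lower RMSE than the product with outliers while reproducing well under half of the observed 5-95 percentile range, which is why a range-based diagnostic alongside RMSE would help readers judge whether the trade-off is acceptable for a given application.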

RR by Tobias Gerken (17 Dec 2025)

ED: Reconsider after major revisions (17 Dec 2025) by Tobias Gerken

AR by Ronglin Tang on behalf of the Authors (27 Jan 2026) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (12 Feb 2026) by Tobias Gerken

RR by Anonymous Referee #1 (02 Mar 2026)

ED: Publish subject to minor revisions (review by editor) (12 Mar 2026) by Tobias Gerken

AR by Ronglin Tang on behalf of the Authors (18 Mar 2026) Author's response Author's tracked changes Manuscript

ED: Publish as is (25 Mar 2026) by Tobias Gerken

AR by Ronglin Tang on behalf of the Authors (28 Mar 2026)

Post-review adjustments

AA – Author's adjustment | EA – Editor approval

AA by Ronglin Tang on behalf of the Authors (13 May 2026) Author's adjustment Manuscript

EA: Adjustments approved (13 May 2026) by Tobias Gerken

Short summary

We developed a new global daily dataset of turbulent heat exchanges between the ocean and atmosphere from 1993 to 2017. Utilizing a novel approach that combines machine learning with physical constraints, our model generates more accurate and physically reasonable estimates compared to existing datasets. This advancement enables improved understanding of ocean-atmosphere interactions, which are crucial for monitoring Earth's energy and water cycles and enhancing climate change projections.