MLAWind: A Monthly Sea Surface Wind Dataset Derived from an Interpretable Machine Learning Approach Integrating In-Situ Observations and Satellite Data

Guo, Weihao; Zhang, Rongwang; Wang, Xin; Wang, Dongxiao

doi:10.5194/essd-2025-725

13 Feb 2026

Status: this preprint is currently under review for the journal ESSD.

Abstract. A gridded sea surface wind dataset with long temporal coverage is crucial for understanding atmospheric circulation changes and air-sea interactions at different time scales. This study employs an interpretable machine learning model based on the random forest algorithm to generate a 1°×1° monthly sea surface wind dataset (MLAWind) from 1950 to 2023, covering the near-global ocean within 60° S–60° N. The data reconstruction model integrates the Cross-Calibrated Multi-Platform (CCMP) satellite data and the spatially sparse long-term International Comprehensive Ocean-Atmosphere Data Set (ICOADS), exhibiting robust interpretability and generalization capability. Evaluations demonstrate that the MLAWind dataset exhibits better agreement with remote sensing observations than existing reanalysis datasets during the training period (1993–2022), while maintaining robust performance during the independent testing period in 2023. Moreover, the performance of MLAWind since 1950 is assessed across multiple time scales. Its characteristics in climatology, annual cycle, and inter-annual variability are comparable to those of existing reanalysis datasets, even during the non-satellite period prior to 1993. Uncertainties remain in the long-term trends of different datasets. The trend derived from MLAWind is corroborated by independent coral records during 1950–1982, which demonstrates its strong capability in reconstructing historical sea surface wind variations. The results indicate that MLAWind serves as a reliable data resource for global climate change research. The reconstructed MLAWind dataset is publicly accessible at https://doi.org/10.5281/zenodo.17354864 (Guo et al., 2025b).
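The periods named in the abstract can be laid out in a short Python sketch. It encodes only the year ranges stated above (record 1950–2023, training 1993–2022, independent testing in 2023). The function name `period_of` and the label strings are hypothetical, and calling the pre-1993 years "reconstruction" is an assumption about how the non-satellite period is treated, not a description of the authors' code.

```python
# Year ranges taken from the abstract; all names here are hypothetical.
RECORD_START, RECORD_END = 1950, 2023   # full MLAWind record
TRAIN_START, TRAIN_END = 1993, 2022     # training period (satellite era)
TEST_YEAR = 2023                        # independent testing period


def period_of(year: int) -> str:
    """Return how a given year is used, according to the abstract."""
    if not RECORD_START <= year <= RECORD_END:
        raise ValueError(f"{year} is outside the {RECORD_START}-{RECORD_END} record")
    if year < TRAIN_START:
        # Assumption: no satellite data exist here, so values are reconstructed.
        return "reconstruction"
    if year <= TRAIN_END:
        return "training"
    return "testing"


# Count how many years fall into each period.
counts = {}
for year in range(RECORD_START, RECORD_END + 1):
    label = period_of(year)
    counts[label] = counts.get(label, 0) + 1

print(counts)
```

Under these ranges the non-satellite period is longer than the training period, which is why the evaluation before 1993 carries so much weight.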

Received: 27 Nov 2025 – Discussion started: 13 Feb 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 27 Apr 2026)

RC1: 'Comment on essd-2025-725', Anonymous Referee #1, 17 Feb 2026

The authors used random forest and SHAP algorithms to reconstruct a monthly sea surface wind dataset on a 1°×1° grid between 60°S and 60°N for 1950–2023, based on ICOADS observations. The results were validated against ERA5, JRA-55, NOAA-20C, NCEP1, NCEP2, and Mn/Ca records in terms of climatology, annual cycle, interannual variability, and long-term trend. The paper is well written, the dataset should be a good reference for user communities, and the paper can be published in ESSD after a major revision. My major concerns are (a) a lack of clarification on the use of CCMP as the label variable, (b) a lack of intercomparison against independent observations, and (c) a lack of validation of wind direction.

L28-30: The evaluation during the non-training period is more important; as stated later in L32-35, the results there are comparable. The improvement offered by the current study should therefore be stated clearly.
L90: It is not clear what “It” refers to. Is it “WASWind”?
L121: “an interpretable machine learning algorithm” needs a reference.
L136, WD: It is not clear why U, V, and WS were used in the validation data while the inputs from ICOADS used WS and WD.
L156: “Smith (1980)” is missing from the references.
L160-171: Have these satellite-based winds gone through a “bias-adjustment” process, as the in situ observations have? How do we know that these satellite observations are consistent with the in situ observations?
L208-211: It is not clear to me how CCMP (1993-2023) is used as the label variable while ICOADS (1950-2023) is used as the feature input. What is the label variable during 1950-1992?
Figure 3a,c: I assume the color shading represents wind speed. My question is: what are the wind directions? Are there any metrics to measure the success of the reconstruction? Clear differences in wind speed can be found in 2000 in the northern North Pacific and northern North Atlantic.
L309-316: I suggest adding a table comparing RMSE, R-squared, bias, etc., so that readers can easily understand the results.
Figure 4: Is it possible to check the wind direction v/u as a verification metric in Eqs. (1)-(5)?
Figures 7-10 are good for the intercomparisons. However, I wonder whether direct comparisons against observations, particularly independent observations, would show that MLAWind’s performance is comparable with that of the reference datasets.
Table 2: I have difficulty understanding how the Mn/Ca trend can be quantitatively compared with wind, because they are different variables in different units. Further, the differences among MLAWind, ERA5, NCEP1, and NOAA-20C are large. Why?
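
Several of the comments above ask for quantitative checks: U/V versus WS/WD (L136), a table of RMSE, R-squared, and bias (L309-316), and wind direction as a verification metric (Figure 4). As an illustration only, the following Python sketch shows one common way such quantities are computed. It assumes the meteorological convention in which WD is the direction the wind blows from, measured clockwise from north, and it takes R-squared to be the coefficient of determination. The manuscript's Eqs. (1)-(5) are not reproduced on this page and may differ. All function names are hypothetical.

```python
import math


def to_uv(speed, direction_deg):
    """Convert speed and 'from' direction (degrees clockwise from north) to (u, v)."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)


def to_speed_dir(u, v):
    """Convert (u, v) back to speed and 'from' direction in [0, 360)."""
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


def angular_diff(a_deg, b_deg):
    """Signed smallest difference a - b in degrees, wrapped to [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def bias(pred, ref):
    """Mean error (prediction minus reference)."""
    return sum(p - r for p, r in zip(pred, ref)) / len(ref)


def rmse(pred, ref):
    """Root-mean-square error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def r_squared(pred, ref):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumed definition)."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - p) ** 2 for p, r in zip(pred, ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - ss_res / ss_tot


def direction_rmse(pred_deg, ref_deg):
    """RMSE of direction that respects the 0/360 wrap-around."""
    diffs = [angular_diff(p, r) for p, r in zip(pred_deg, ref_deg)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up numbers, purely to show the calls.
ref_speed = [1.0, 2.0, 5.0]
pred_speed = [1.0, 2.0, 3.0]
print(bias(pred_speed, ref_speed), rmse(pred_speed, ref_speed))
print(direction_rmse([350.0, 10.0], [10.0, 350.0]))
```

Wrapping the angular difference matters because a naive subtraction would score 350° versus 10° as a 340° error rather than a 20° one.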

Citation: https://doi.org/10.5194/essd-2025-725-RC1

Data sets

Machine learning-assisted sea surface wind dataset (MLAWind), W. Guo et al., https://doi.org/10.5281/zenodo.17354864

Viewed

Total article views: 320 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
204	99	17	320	24	34

Views and downloads (calculated since 13 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	142	52	12	206
Mar 2026	62	47	5	114

Cumulative views and downloads (calculated since 13 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	142	52	12	206
Mar 2026	204	99	17	320

Viewed (geographical distribution)

Total article views: 315 (including HTML, PDF, and XML), thereof 315 with geography defined and 0 with unknown origin.

Latest update: 26 Mar 2026

Short summary

This study develops a monthly near-global sea surface wind dataset (MLAWind) from 1950–2023 through an interpretable machine learning framework integrating in-situ observations and satellite data. Evaluations show that MLAWind achieves comparable performance to the widely-used reanalysis datasets at different time scales. It provides reliable historical wind data during both satellite and non-satellite periods, demonstrating broad application prospects in ocean and climate research.

