Jingwei-Nutrients: A global spatiotemporal reconstruction of ocean nutrients (1965–2023) using multi-task deep learning

Wang, Zhaokun; Lu, Bin; Xin, Yi; Ito, Takamitsu; Zhou, Lei; Cheng, Lijing; Li, Yuanlong; Wang, Xinbing; Jin, Meng

doi:10.5194/essd-2026-309

Preprints

12 May 2026

Status: this preprint is currently under review for the journal ESSD.

Abstract. Dissolved nitrate, phosphate, and silicate are fundamental drivers of marine primary productivity and the biological carbon pump. However, the development of continuous, long-term global datasets has long been hindered by the extreme sparsity of historical data and by complex biogeochemical dynamics. Statistical interpolation methods struggle to simultaneously fill the sparse data gaps and capture the non-linear interactions among nutrients, which motivates the use of artificial intelligence (AI) methods that explicitly learn and leverage the underlying relationships. Nevertheless, most existing AI methods reconstruct each nutrient independently (i.e., Single-Task Learning), failing to exploit the synergistic effects inherent in cross-nutrient stoichiometry. In this study, we present Jingwei-Nutrients, a global monthly dataset at resolution from 0 to 2000 m depth spanning 1965 to 2023, reconstructed using a Transformer-based Multi-Task Learning (MTL) framework trained on a comprehensive, quality-controlled multi-source observational database. Evaluation on the validation set yields values of 0.980, 0.961, and 0.983, with RMSEs of 2.21, 0.23, and 6.35 for nitrate, phosphate, and silicate, respectively. Temporal K-fold cross-validation reveals that the MTL framework consistently achieves higher and lower RMSE for all three nutrients compared to single-task models, with larger accuracy gains in data-sparse earlier decades such as 1965–1975. The dataset reproduces global climatological patterns and seasonal cycles consistent with the World Ocean Atlas (WOA). Furthermore, independent evaluations against long-term monitoring stations (HOT and KERFIX) and GO-SHIP cruise sections (P16N, P16S, and P06E) demonstrate its skill in capturing multi-decadal temporal trends, spatial variability, and vertical changes. Additionally, an ensemble-based uncertainty analysis reveals interpretable spatial heterogeneities and a long-term decreasing trend in global uncertainty, which mirrors the historical transition from sparse early sampling to modern observing networks. This dataset fills a critical gap in historical ocean biogeochemical observations, providing a reliable, physically consistent foundation for marine biogeochemical modeling and climate change studies. The dataset is openly available at https://doi.org/10.5281/zenodo.19491198.
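The temporal K-fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the "model" is a plain linear fit standing in for the Transformer-based framework, the fold count `k=5` is arbitrary (the text above does not state K), and the coefficient of determination is used only as a generic skill score alongside RMSE. The helper names `temporal_kfold`, `rmse`, and `r2` are hypothetical.

```python
import numpy as np

def temporal_kfold(years, k):
    """Yield (train_idx, test_idx); each fold holds out a contiguous block of years."""
    unique_years = np.unique(years)
    for block in np.array_split(unique_years, k):
        test_mask = np.isin(years, block)
        yield np.where(~test_mask)[0], np.where(test_mask)[0]

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination (1 - SS_res / SS_tot)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in data (NOT the Jingwei-Nutrients observations):
# a nitrate-like variable that increases with depth, plus noise.
rng = np.random.default_rng(0)
n = 600
years = rng.integers(1965, 2024, size=n)      # sample year, 1965..2023
depth = rng.uniform(0.0, 2000.0, size=n)      # sample depth in metres
nitrate = 0.02 * depth + rng.normal(0.0, 1.0, size=n)

fold_scores = []
for train_idx, test_idx in temporal_kfold(years, k=5):
    # Placeholder model: ordinary least squares on depth only.
    slope, intercept = np.polyfit(depth[train_idx], nitrate[train_idx], deg=1)
    pred = slope * depth[test_idx] + intercept
    fold_scores.append((rmse(nitrate[test_idx], pred), r2(nitrate[test_idx], pred)))

for i, (fold_rmse, fold_r2) in enumerate(fold_scores):
    print(f"fold {i}: RMSE={fold_rmse:.3f}, R2={fold_r2:.3f}")
```

Holding out whole blocks of years, rather than random samples, keeps every test year out of training, which is how a comparison focused on data-sparse decades would typically be set up.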

Received: 23 Apr 2026 – Discussion started: 12 May 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 18 Jun 2026)

RC1: 'Comment on essd-2026-309', Anonymous Referee #1, 29 May 2026

The manuscript reconstructs a 60-year global spatiotemporal nutrient dataset using a robust artificial intelligence model, providing an important data resource for future studies on nutrient dynamics and marine ecosystem analyses. I particularly appreciate the comprehensive and detailed preprocessing procedures applied to the original nutrient observations, which substantially enhance the credibility and reliability of the resulting dataset. Overall, I believe the manuscript reaches the publication standard of ESSD. However, several issues require clarification before publication.
1. The manuscript provides detailed descriptions of the modeling framework and comparison experiments, especially the independent validations conducted for different time periods, which effectively demonstrate the robustness of the proposed workflow. However, I suggest that the authors additionally train and evaluate the model using the complete temporal dataset (i.e., all periods combined) and provide the corresponding results for reference. Furthermore, in the Abstract, are the reported values of 0.980, 0.961, and 0.983 intended to represent Average Accuracy?
2. Lines 45–46: The study by Deutsch indeed suggests strong relationships among multiple nutrients, and similar concepts are reflected by Prof. Redfield. However, it remains unclear why the reconstructed dataset itself should inherently preserve such nutrient interrelationships. Please provide a more in-depth explanation and justification.
3. Related to the previous comment, Lines 324–332: Figure 3 indicates that the multi-task learning (MTL) framework does not appear to provide substantial improvements over the single-task approach for nitrogen (N) and silicate (Si). What are the underlying reasons for this behavior? Please analyze the potential sources of error and discuss why the benefits of MTL differ among nutrients. In addition, please specify the value of K used in the K-fold cross-validation.
4. Line 103: The manuscript states that “the MTL architecture maximizes the utility of fragmented historical observations.” What evidence supports this statement? In particular, what periods are referred to as “data-sparse eras”? Please provide a clearer definition and supporting analysis.
5. Line 115: The term “physically consistent” is used multiple times throughout the manuscript. Its meaning should be explicitly defined and supported by appropriate references. Expressions such as “considering physical correlations” or “integrating biogeochemical information” may be more precise. A similar concern applies to the term “physically interpretable” in Line 389.
6. Table 1: The numbers of observations for nitrogen and phosphorus are relatively similar, which may weaken the effectiveness of the multi-task learning framework. However, phosphorus exhibits a good performance improvement under MTL. How can we explain this result? Please provide further discussion.
7. Line 243: The term “natural language” appears unrelated to the scope of this study and should be reconsidered. Line 263: I do not fully agree with the description involving “biogeochemical constraints.” Please consider revising the terminology.
8. Line 298: The word “constructive” is enclosed in quotation marks, which is somewhat confusing. Please provide a more direct and precise description rather than a figurative expression. Similarly, the phrase “autonomously infer” in Line 274 sounds unusual in this context and could be revised for clarity.
9. Figure 10: Regarding the HOT station, please clarify whether the comparison is based on dedicated long-term HOT station observations or on data extracted from the corresponding latitude and longitude location.
10. Several figures would benefit from labeling subpanels with letters (e.g., a, b, c, d) to facilitate clearer descriptions and discussions in the text.
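As an illustration of the kind of check that comment 2 points toward, the nitrate-to-phosphate relationship in a reconstructed field could be compared with the canonical Redfield N:P molar ratio of 16:1. The following is a minimal sketch under stated assumptions: the arrays are synthetic placeholders rather than Jingwei-Nutrients output, the helper name `np_slope` is hypothetical, and a single global regression slope is only one of several possible diagnostics.

```python
import numpy as np

REDFIELD_N_TO_P = 16.0  # canonical Redfield N:P molar ratio

def np_slope(nitrate, phosphate):
    """Least-squares slope of nitrate against phosphate, skipping non-finite pairs."""
    valid = np.isfinite(nitrate) & np.isfinite(phosphate)
    slope, _intercept = np.polyfit(phosphate[valid], nitrate[valid], deg=1)
    return float(slope)

# Synthetic placeholder fields (assumption): nitrate tied to phosphate
# at the Redfield ratio, with an offset, noise, and a few missing values.
rng = np.random.default_rng(42)
phosphate = rng.uniform(0.1, 3.0, size=500)
nitrate = REDFIELD_N_TO_P * phosphate - 2.0 + rng.normal(0.0, 0.5, size=500)
nitrate[::50] = np.nan

slope = np_slope(nitrate, phosphate)
deviation = slope - REDFIELD_N_TO_P
print(f"N:P slope = {slope:.2f}, deviation from Redfield = {deviation:+.2f}")
```

Applied separately to the MTL and single-task reconstructions, a diagnostic like this would show whether joint training actually yields fields closer to the expected stoichiometry, rather than assuming it does.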

Citation: https://doi.org/10.5194/essd-2026-309-RC1

Data sets

Jingwei-Nutrients: A global spatiotemporal reconstruction of ocean nutrients (1965–2023) using multi-task deep learning Zhaokun Wang et al. https://doi.org/10.5281/zenodo.19491198

Viewed

Total article views: 275 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
200	63	12	275	14	14

Views and downloads (calculated since 12 May 2026)

Month	HTML	PDF	XML	Total
May 2026	200	63	12	275
Jun 2026	0

Latest update: 02 Jun 2026

Short summary

We present Jingwei-Nutrients, a global monthly dataset of ocean nitrate, phosphate, and silicate from 1965 to 2023 down to 2000 meters. Built using a multi-task deep learning framework, it merges sparse historical data with ocean physics. This continuous record helps scientists understand marine ecosystems and climate change responses. We also provide the Jingwei web platform (https://jingwei.acemap.info/map) for dynamic data exploration and visualization without coding.

