https://doi.org/10.5194/essd-2025-654
12 Nov 2025
Status: this preprint is currently under review for the journal ESSD.

A historical nutrient dataset (1895–2024) for the North Pacific: reconstructed from machine learning and hydrographic observations

Chuanjun Du, Naiwen Zheng, Shuh-Ji Kao, Minhan Dai, Zhimian Cao, Dalin Shi, Qiancheng Li, Hao Wang, and Xiaolin Li

Abstract. Nutrients play a critical role in oceanic primary productivity and the biological pump. However, because nutrient measurements are labor-intensive and costly, nutrient observations are several orders of magnitude sparser than observations of hydrographic parameters such as temperature and salinity. In this study, we first established a rigorous quality control procedure to clean the hydrographic and nutrient (NO₃⁻, NO₂⁻, DIP, and Si(OH)₄) observations collected from the World Ocean Database (WOD) and the CLIVAR and Carbon Hydrographic Data Office (CCHDO) in the North Pacific. The cleaned, high-quality CCHDO dataset was then used to train three machine learning models (Random Forest, Light Gradient Boosting Machine (LightGBM), and Gaussian Process Regression) to establish relationships between nutrient concentrations and key variables: spatial coordinates (longitude, latitude, and depth), time variables (year and month), and water mass properties (indexed by potential temperature and salinity). Validation shows that the reconstruction closely matches the observations, with RMSEs of <1.41, <0.071, <0.089, and <3.07 mmol kg⁻¹ for NO₃⁻, NO₂⁻, DIP, and Si(OH)₄, respectively. The validated models were then applied to reconstruct nutrient concentrations from the hydrographic observations in WOD, most of which lacked direct nutrient measurements. This resulted in ~473 million reconstructed data points across 1.92 million stations for each nutrient, spanning 1895 to 2024, a 2,127- to 2,393-fold increase over the original nutrient observations in the North Pacific (197,539 to 222,234). This new dataset will be valuable for studying nutrient variability under climate change and anthropogenic influences, and for providing transient boundary conditions in ocean biogeochemical models.
The dataset generated in this study is openly available on Zenodo at https://zenodo.org/records/17451417.
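
Two quantities quoted in the abstract, the validation RMSE and the fold increase in data volume, follow from simple arithmetic. The sketch below is illustrative only and is not the authors' code: the function names `rmse` and `fold_increase` and the toy values are hypothetical, and the reconstructed count uses the rounded "~473 million" figure rather than the exact total.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fold_increase(reconstructed_count, original_count):
    """Ratio of reconstructed data points to original observations."""
    return reconstructed_count / original_count


# Counts quoted in the abstract. RECONSTRUCTED is the rounded "~473 million"
# figure (an assumption; the exact count is not given in the abstract).
RECONSTRUCTED = 473_000_000
ORIGINAL_MIN = 197_539   # fewest original observations for any nutrient
ORIGINAL_MAX = 222_234   # most original observations for any nutrient

if __name__ == "__main__":
    # Hypothetical toy values, only to show how an RMSE is evaluated.
    toy_observed = [1.0, 2.0, 3.0]
    toy_predicted = [1.0, 2.0, 5.0]
    print(f"toy RMSE: {rmse(toy_observed, toy_predicted):.3f}")

    low = fold_increase(RECONSTRUCTED, ORIGINAL_MAX)
    high = fold_increase(RECONSTRUCTED, ORIGINAL_MIN)
    print(f"fold increase: {low:.0f} to {high:.0f}")
```

With the rounded count, the ratios come out at roughly 2,128 and 2,394, about one unit above the reported 2,127 and 2,393. This is consistent with the exact number of reconstructed points being slightly below 473 million.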

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 19 Dec 2025)


Data sets

Validated temperature and salinity data, and reconstructed nutrient concentrations in the North Pacific (1895–2024) C. Du et al. https://zenodo.org/records/17451417


Latest update: 13 Nov 2025
Short summary
Nutrient levels govern oceanic primary production, but measuring them is labor-intensive and costly. To address this, we used machine learning models to learn the relationships between easy-to-measure ocean properties (such as temperature and salinity) and nutrient levels. Applying these models, we created ~470 million nutrient data points across the North Pacific from 1895 to 2024. These data will help in understanding nutrient and marine ecosystem variability under climate change.