<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="data-paper">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">ESSD</journal-id><journal-title-group>
    <journal-title>Earth System Science Data</journal-title>
    <abbrev-journal-title abbrev-type="publisher">ESSD</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Earth Syst. Sci. Data</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1866-3516</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/essd-18-2951-2026</article-id><title-group><article-title>A historical nutrient dataset (1895–2024) for the North Pacific: reconstructed from machine learning and hydrographic observations</article-title><alt-title>A historical nutrient dataset (1895–2024) for the North Pacific</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Du</surname><given-names>Chuanjun</given-names></name>
          <email>cjdu@hainanu.edu.cn</email>
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Zheng</surname><given-names>Naiwen</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Kao</surname><given-names>Shuh-Ji</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Dai</surname><given-names>Minhan</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-0550-0701</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Cao</surname><given-names>Zhimian</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Shi</surname><given-names>Dalin</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Li</surname><given-names>Qiancheng</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Wang</surname><given-names>Hao</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Luo</surname><given-names>Xunlan</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff2">
          <name><surname>Li</surname><given-names>Xiaolin</given-names></name>
          <email>xlli@xmu.edu.cn</email>
        <ext-link>https://orcid.org/0000-0001-9314-5716</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>School of Marine Sciences, Hainan University, Haikou 570228, China</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Chuanjun Du (cjdu@hainanu.edu.cn) and Xiaolin Li (xlli@xmu.edu.cn)</corresp></author-notes><pub-date><day>28</day><month>April</month><year>2026</year></pub-date>
      
      <volume>18</volume>
      <issue>4</issue>
      <fpage>2951</fpage><lpage>2969</lpage>
      <history>
        <date date-type="received"><day>30</day><month>October</month><year>2025</year></date>
           <date date-type="rev-request"><day>12</day><month>November</month><year>2025</year></date>
           <date date-type="rev-recd"><day>22</day><month>February</month><year>2026</year></date>
           <date date-type="accepted"><day>2</day><month>April</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Chuanjun Du et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026.html">This article is available from https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026.html</self-uri><self-uri xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026.pdf">The full text article is available as a PDF file from https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e177">Nutrients play a critical role in oceanic primary productivity and the biological pump. However, compared to hydrographic parameters such as temperature and salinity, nutrient observations are limited because their measurement is labor-intensive and costly. As a result, nutrient observations are several orders of magnitude sparser than hydrographic observations. In this study, we first established a rigorous data quality control procedure to clean the hydrographic and nutrient (including NO<inline-formula><mml:math id="M1" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>) observations collected from the World Ocean Database (WOD) and the CLIVAR and Carbon Hydrographic Data Office (CCHDO) in the North Pacific. Subsequently, the cleaned, high-quality CCHDO dataset was used to train three machine learning models – Random Forest, Light Gradient Boosting Machine (LightGBM), and Gaussian Process Regression – to establish relationships between nutrient concentrations and key variables, including spatial coordinates (longitude, latitude, and depth), time variables (year and month), and water mass properties (indexed by potential temperature and salinity).
Validation shows that the reconstruction closely matches the observations, with Root Mean Squared Errors (RMSEs) of <inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">1.41</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.071</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.089</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M7" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">3.07</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M8" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> for NO<inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>, respectively. The validated models were then applied to reconstruct nutrient concentrations from the hydrographic observations in WOD, most of which lacked direct nutrient measurements. This resulted in <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">473</mml:mn></mml:mrow></mml:math></inline-formula> million reconstructed nutrient data points across 1.92 million stations for each nutrient, spanning from 1895 to 2024, representing a 2127- to 2393-fold increase compared to the original nutrient observations in the North Pacific (197 539 to 222 234). 
This new dataset will be valuable for studying nutrient transport and budgets, spinning up and validating ocean biogeochemical models, and assessing long-term changes in nutrients and their stoichiometry driven by anthropogenic forcing and climate change. The dataset generated in this study is openly available via Zenodo (<ext-link xlink:href="https://doi.org/10.5281/zenodo.17451417" ext-link-type="DOI">10.5281/zenodo.17451417</ext-link>) (Du et al., 2025).</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>42494885</award-id>
<award-id>42576215</award-id>
<award-id>42494881</award-id>
<award-id>42276034</award-id>
</award-group>
<award-group id="gs2">
<funding-source>National Key Research and Development Program of China</funding-source>
<award-id>2023YFF0805001</award-id>
</award-group>
<award-group id="gs3">
<funding-source>Natural Science Foundation of Hainan Province</funding-source>
<award-id>624MS037</award-id>
</award-group>
<award-group id="gs4">
<funding-source>Hainan University</funding-source>
<award-id>XKTP2025A05</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

      
      </body>
    <back><notes notes-type="specialsection"><title>Key points</title>
    

      <p id="d2e328"><list list-type="bullet">
        
        <list-item>

      <p id="d2e335">Rigorous data quality control procedures were applied to clean nutrient and hydrographic data collected from multiple sources in the North Pacific, following state-of-the-art practices.</p>
        </list-item>
        <list-item>

      <p id="d2e341">Three machine learning models demonstrated low errors across diverse validation strategies.</p>
        </list-item>
        <list-item>

      <p id="d2e347">We reconstructed a large database of <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">473</mml:mn></mml:mrow></mml:math></inline-formula> million nutrient data points across 1.92 million stations (1895–2024), expanding the number of nutrient data points by a factor of 2127–2393 compared to original observations.</p>
        </list-item>
      </list></p>
  </notes>
<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e370">Bio-essential elements such as nitrogen, phosphorus, and silicon constitute the fundamental material basis for marine ecosystems. Their concentrations govern primary and new production (e.g., Browning and Moore, 2023; Lipschultz et al., 2002; Moore et al., 2013) and subsequently regulate oceanic uptake of atmospheric CO<sub>2</sub> (Deutsch and Weber, 2012; Sigman and Hain, 2012). However, traditional nutrient data collection relies heavily on ship-based cruises and subsequent sample analysis, which are labor-intensive, inefficient, and costly (Du et al., 2021). Consequently, compared to the abundant hydrographic data collected from multiple platforms, such as Conductivity-Temperature-Depth (CTD) and Array for Real-time Geostrophic Oceanography (Argo) profilers, nutrient observations are sparse in the ocean. This sparsity limits our understanding of both small-scale and long-term nutrient variations, and of the mechanisms driving changes in oceanic production and ecosystem dynamics (Bidigare et al., 2009; Yasunaka et al., 2021; Karl et al., 2021).</p>
      <p id="d2e382">To address this data sparsity, two main approaches have been commonly employed to augment the spatiotemporal coverage of the observed nutrient data. The first is objective analysis, which interpolates field measurements to generate broader spatial coverage, as implemented in products such as the World Ocean Atlas (WOA) (e.g., Reagan et al., 2024; Lee et al., 2023). The second is data fusion, which establishes statistical relationships between nutrients and environmental predictors such as temperature (e.g., Kamykowski, 1987, 2008; Kamykowski et al., 2002), density (e.g., Dugdale et al., 1989; Switzer et al., 2003), oxygen, salinity, and chlorophyll <inline-formula><mml:math id="M16" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> (Goes et al., 1999; Palacios et al., 2013; Sarangi et al., 2011). Statistical methods including cubic regression, multiple linear regression (Steinhoff et al., 2010; Arteaga et al., 2015; Madani et al., 2024; Zhong et al., 2024), and generalized additive models (Palacios et al., 2013) are frequently used in these efforts.</p>
      <p id="d2e392">Recent studies have demonstrated the potential of machine learning for enhancing the spatial and temporal coverage of nutrient data. For instance, Możejko and Gniot (2008) used Artificial Neural Networks (ANNs) to model time series of total phosphorus concentrations in the Odra River. Self-organizing maps (SOMs) were used to estimate mixed layer nitrate and sea surface nutrients in the open ocean (Steinhoff et al., 2010; Yasunaka et al., 2014). Liu et al. (2022) applied Support Vector Regression, Random Forest Regression, and ANNs to reconstruct monthly surface nutrient concentrations in the Yellow and Bohai Seas from 2003 to 2019. Their results revealed pronounced seasonal and spatial variability in nutrient levels and underscored the influence of environmental drivers such as sea surface temperature and salinity. Similarly, Sundararaman and Shanmugam (2024) employed Gaussian Process Regression (GPR) models to estimate global ocean surface macronutrient concentrations using satellite-derived data, achieving high accuracy and demonstrating their suitability for large-scale marine ecosystem monitoring. Yang et al. (2024) employed a U-net and Earthformer to reconstruct the three-dimensional nitrate distribution by integrating surface data including wind speed, sea surface temperature, chlorophyll <inline-formula><mml:math id="M17" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula>, solar radiation, and precipitation in the Indian Ocean. These advancements highlight the expanding role of machine learning in marine biochemical data fusion and provide novel insights into nutrient dynamics and their ecological impacts.</p>
      <p id="d2e402">However, many existing approaches rely solely on mathematical extrapolation or data fusion and often neglect the influence of physical seawater properties, such as water mass characteristics. Using the relationship between nutrient concentration and water masses (indexed by temperature and salinity), Du et al. (2021) successfully predicted nutrient concentrations in the South China Sea. Nevertheless, water masses and their relationship with nutrients can vary in space and time, and this variation should also be taken into account. In addition, most research has focused on nutrient predictions in surface waters – driven by readily available remote-sensing measurements of sea surface temperature and chlorophyll <inline-formula><mml:math id="M18" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> – while subsurface nutrient distributions remain poorly studied.</p>
      <p id="d2e413">The North Pacific Ocean is one of the largest marine biomes in the global ocean (Karl and Church, 2017), spanning a broader longitudinal range than any other ocean and a latitudinal range from tropical to polar regions. It includes a subtropical gyre characterized by extremely low surface nutrient concentrations due to Ekman convergence (e.g., Dave and Lozier, 2010; Browning et al., 2022; Dai et al., 2023), and subpolar gyres in the north with elevated nutrient concentrations driven by Ekman divergence. Atmospheric deposition (e.g., Martino et al., 2014; Qi et al., 2020), N<sub>2</sub> fixation (e.g., Dai et al., 2023), and denitrification (Bonnet et al., 2017) are thought to be the main nutrient sources and sinks, which are decoupled in space and time in the North Pacific. It has been reported that the North Pacific Subtropical Gyre (NPSG) plays an important role in fixed N inputs in summer, but also contributes disproportionately to losses due to intense water-column denitrification in the eastern Pacific low-oxygen zones (Eugster and Gruber, 2012; Wang et al., 2019).</p>
      <p id="d2e416">The North Pacific Ocean is influenced by multiple upwelling and current systems, including the equatorial and California upwelling systems, the North Equatorial Current, and the Kuroshio Current, which further modify nutrient levels in these regions. In addition, the North Pacific Ocean hosts abundant mesoscale eddies (Chelton et al., 2007), which play a critical role in redistributing nutrients and modulating biological activity (e.g., Benitez-Nelson et al., 2007; Ascani et al., 2013; Barone et al., 2022). The interaction of these multi-scale physical processes with biogeochemical processes results in highly dynamic nutrient variability in the upper ocean. Therefore, high-resolution and extensive nutrient datasets are essential to accurately resolve the nutrient dynamics. Although the WOA (Reagan et al., 2024) serves as a primary nutrient database and is widely used for boundary conditions in biogeochemical models, its applicability is constrained by relatively coarse spatial resolution (currently 1°) and climatological smoothing, which limit its ability to represent mesoscale and episodic features or to capture long-term variations.</p>
      <p id="d2e419">In the North Pacific, Yasunaka et al. (2014) used the SOM technique to generate monthly surface nutrient maps by integrating sea surface temperature, salinity, chlorophyll <inline-formula><mml:math id="M19" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula>, and mixed layer depth. These maps revealed seasonal and interannual variability in surface nutrient distributions in the northern North Pacific. To investigate long-term changes, Yasunaka et al. (2016) applied Optimal Interpolation to analyze the spatial and temporal evolution of surface nutrient concentrations. Lee et al. (2023) provided spatiotemporally gridded nitrate and phosphate data in the northwest Pacific from 1980 to 2019 using spatiotemporal kriging. Wang et al. (2023) used a deep neural network model to estimate nitrate concentrations in the upper northwestern Pacific Ocean, with temperature and salinity as the primary input parameters.</p>
      <p id="d2e429">In this study, we first collected nutrient data from public databases and applied rigorous quality control procedures. Using machine learning methods, we established relationships between nutrient concentrations and water mass properties, spatial coordinates, and temporal variables. We then evaluated the model performance through a comprehensive error analysis. Finally, the validated models were applied to reconstruct historical nutrient distributions across the North Pacific from 1895 to 2024.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Data and methods</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Observation data</title>
      <p id="d2e447">Field observations were downloaded from the Climate and Ocean: Variability, Predictability, and Change (CLIVAR) and Carbon Hydrographic Data Office (CCHDO), which distributes vessel-based hydrographic data from programs such as the World Ocean Circulation Experiment (WOCE), Joint Global Ocean Flux Study (JGOFS), GO-SHIP, CLIVAR, and other repeat hydrography efforts (<uri>https://cchdo.ucsd.edu/</uri>, last access: 1 October 2024). In total, data from 631 cruises in the North Pacific were collected, comprising 228 091, 197 617, 225 403, and 212 660 data points for NO<inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M21" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> NO<inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> (NO<inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>), NO<inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>, respectively (Table 1). The dataset spans 1973 to 2022; updates made to CCHDO after the download date of 1 October 2024 were not included in this study. The data cover a geographic range from 120.08° E to 95.17° W and from 2.05° S to 60.25° N. The study domain was slightly extended into the South Pacific to mitigate potential boundary effects during model development.</p>

<table-wrap id="T1"><label>Table 1</label><caption><p id="d2e521">Numbers of data points and stations for nutrients and their associated hydrographic data collected from the CLIVAR and Carbon Hydrographic Data Office (CCHDO), before and after quality control (QC).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="center"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry namest="col2" nameend="col3">Original data </oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry namest="col5" nameend="col6">Data information </oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry rowsep="1" namest="col2" nameend="col3">information </oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry rowsep="1" namest="col5" nameend="col6">after QC </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Data</oasis:entry>
         <oasis:entry colname="col3">Stations</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">Data</oasis:entry>
         <oasis:entry colname="col6">Stations</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Temperature</oasis:entry>
         <oasis:entry colname="col2">327 792</oasis:entry>
         <oasis:entry colname="col3">15 127</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">327 688</oasis:entry>
         <oasis:entry colname="col6">15 125</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Salinity</oasis:entry>
         <oasis:entry colname="col2">328 502</oasis:entry>
         <oasis:entry colname="col3">15 274</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">328 275</oasis:entry>
         <oasis:entry colname="col6">15 269</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">NO<inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">217 725</oasis:entry>
         <oasis:entry colname="col3">9588</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">213 962</oasis:entry>
         <oasis:entry colname="col6">9021</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">NO<inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">197 617</oasis:entry>
         <oasis:entry colname="col3">8233</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">197 539</oasis:entry>
         <oasis:entry colname="col6">8228</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">DIP</oasis:entry>
         <oasis:entry colname="col2">225 403</oasis:entry>
         <oasis:entry colname="col3">9623</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">222 234</oasis:entry>
         <oasis:entry colname="col6">9474</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Si(OH)<sub>4</sub></oasis:entry>
         <oasis:entry colname="col2">212 660</oasis:entry>
         <oasis:entry colname="col3">8220</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">210 447</oasis:entry>
         <oasis:entry colname="col6">8121</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e751">Hydrographic data for nutrient reconstruction were obtained from the World Ocean Database (WOD; Mishonov et al., 2024), which compiles observations from various platforms, including Autonomous Pinniped Bathythermograph (APB), Conductivity-Temperature-Depth profiler (CTD), Drifting Buoy (DRB), Glider (GLD), Mechanical Bathythermograph (MBT), Moored Buoy (MRB), Ocean Station Data (OSD), Profiling Float (PFL), and Undulating Oceanographic Recorder (UOR). Because the nutrient reconstruction models rely on relationships with water masses, only samples containing both temperature and salinity measurements were used; therefore, most APB observations, which record only temperature, were excluded. Among these platforms, CTD, OSD, and PFL provided the majority of usable data. Additionally, several marginal seas – including the South China Sea, the Yellow Sea, the Sea of Japan, and the Sea of Okhotsk – were excluded from this study because they are semi-enclosed and strongly influenced by terrestrial inputs. The spatial domain was consistent with that used for the CCHDO dataset, while the temporal coverage extended from 1875 to 2024. In total, 577 215 683 data points from 2 284 448 stations across 40 113 original cruises were collected (Table 2). In addition, OSD data collected before 1970 were extracted for nutrient validation (Sect. 3.1), yielding 102 424, 125 142, 447 335, and 294 734 data points for NO<inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>, respectively.</p>
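      <p>The requirement that a sample carry both temperature and salinity can be illustrated with a minimal sketch. It is not the processing code used in this study: the <monospace>Sample</monospace> record, its field names, and the function <monospace>keep_paired_samples</monospace> are hypothetical, and the demonstration values are invented for illustration only.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    """One hypothetical hydrographic sample (field names are assumptions)."""
    platform: str                 # platform code, e.g. "CTD" or "APB"
    temperature: Optional[float]  # None when temperature was not recorded
    salinity: Optional[float]     # None when salinity was not recorded


def keep_paired_samples(samples: List[Sample]):
    """Return only the samples that report both temperature and salinity."""
    return [
        s for s in samples
        if s.temperature is not None and s.salinity is not None
    ]


# Invented demonstration records, not values from WOD.
demo = [
    Sample("CTD", 12.4, 34.2),
    Sample("APB", 8.1, None),    # temperature only, so it is dropped
    Sample("PFL", 3.3, 34.6),
    Sample("OSD", None, 33.9),   # salinity only, so it is dropped
]
paired = keep_paired_samples(demo)
print([s.platform for s in paired])  # prints ['CTD', 'PFL']
```
</preformat>
      <p>In this sketch a temperature-only record, as is typical of most APB observations, is removed before any water-mass-based prediction is attempted, which mirrors the selection rule described above.</p>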

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e791">Numbers of data points, stations, and cruises for hydrographic data collected from the World Ocean Database, before and after quality control (QC). See the main text for the full names of the platform acronyms.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="8">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Platform</oasis:entry>
         <oasis:entry rowsep="1" namest="col2" nameend="col4" align="center">Original data information </oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry rowsep="1" namest="col6" nameend="col8" align="center">Data information after QC </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Data</oasis:entry>
         <oasis:entry colname="col3">Stations</oasis:entry>
         <oasis:entry colname="col4">Cruises</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">Data</oasis:entry>
         <oasis:entry colname="col7">Stations</oasis:entry>
         <oasis:entry colname="col8">Cruises</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">APB</oasis:entry>
         <oasis:entry colname="col2">692 302</oasis:entry>
         <oasis:entry colname="col3">46 454</oasis:entry>
         <oasis:entry colname="col4">189</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">543 714</oasis:entry>
         <oasis:entry colname="col7">37 209</oasis:entry>
         <oasis:entry colname="col8">154</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CTD</oasis:entry>
         <oasis:entry colname="col2">157 914 052</oasis:entry>
         <oasis:entry colname="col3">315 177</oasis:entry>
         <oasis:entry colname="col4">8785</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">135 584 007</oasis:entry>
         <oasis:entry colname="col7">297 036</oasis:entry>
         <oasis:entry colname="col8">8415</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">GLD</oasis:entry>
         <oasis:entry colname="col2">119 302 218</oasis:entry>
         <oasis:entry colname="col3">288 840</oasis:entry>
         <oasis:entry colname="col4">384</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">69 834 989</oasis:entry>
         <oasis:entry colname="col7">285 778</oasis:entry>
         <oasis:entry colname="col8">380</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">OSD</oasis:entry>
         <oasis:entry colname="col2">8 885 341</oasis:entry>
         <oasis:entry colname="col3">592 225</oasis:entry>
         <oasis:entry colname="col4">21 169</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">6 942 902</oasis:entry>
         <oasis:entry colname="col7">505 780</oasis:entry>
         <oasis:entry colname="col8">17 671</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">PFL</oasis:entry>
         <oasis:entry colname="col2">284 781 001</oasis:entry>
         <oasis:entry colname="col3">700 798</oasis:entry>
         <oasis:entry colname="col4">9511</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">255 423 345</oasis:entry>
         <oasis:entry colname="col7">680 531</oasis:entry>
         <oasis:entry colname="col8">9099</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">UOR</oasis:entry>
         <oasis:entry colname="col2">3 373 799</oasis:entry>
         <oasis:entry colname="col3">26 699</oasis:entry>
         <oasis:entry colname="col4">7</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">3 304 158</oasis:entry>
         <oasis:entry colname="col7">25 813</oasis:entry>
         <oasis:entry colname="col8">6</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">MRB</oasis:entry>
         <oasis:entry colname="col2">1 459 032</oasis:entry>
         <oasis:entry colname="col3">293 734</oasis:entry>
         <oasis:entry colname="col4">65</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">1 019 565</oasis:entry>
         <oasis:entry colname="col7">88 487</oasis:entry>
         <oasis:entry colname="col8">19</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">DRB</oasis:entry>
         <oasis:entry colname="col2">807 938</oasis:entry>
         <oasis:entry colname="col3">20 521</oasis:entry>
         <oasis:entry colname="col4">3</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">0</oasis:entry>
         <oasis:entry colname="col7">0</oasis:entry>
         <oasis:entry colname="col8">0</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Total</oasis:entry>
         <oasis:entry colname="col2">577 215 683</oasis:entry>
         <oasis:entry colname="col3">2 284 448</oasis:entry>
         <oasis:entry colname="col4">40 113</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">472 652 680</oasis:entry>
         <oasis:entry colname="col7">1 920 634</oasis:entry>
         <oasis:entry colname="col8">35 744</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Data quality control</title>
      <p id="d2e1114">Given that the data were collected from multiple platforms using various methods over a long time span and a broad spatial range, quality control (QC) was essential (Du et al., 2021; Wang et al., 2025). Following the QC procedures developed by the World Ocean Database (WOD) (Garcia et al., 2024), we applied comprehensive QC protocols (Fig. 1) to both the CCHDO and WOD datasets, covering hydrographic and nutrient variables.</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e1119">Data quality control procedures for temperature, salinity and nutrients collected from the CLIVAR and Carbon Hydrographic Data Office (CCHDO) and the World Ocean Database (WOD) datasets.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f01.png"/>

      </fig>

      <fig id="F2"><label>Figure 2</label><caption><p id="d2e1130">Spatial and temporal distributions of NO<inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> (nitrate plus nitrite) after quality control in the North Pacific. <bold>(a)</bold> Distribution of NO<inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> data locations, with points color-coded by year; <bold>(b)</bold> station counts per year; <bold>(c)</bold> station counts per month.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f02.png"/>

      </fig>

      <p id="d2e1173">Four levels of QC were applied to identify and remove potentially erroneous or low-quality records from the CCHDO and WOD datasets. The first level targeted individual measurements and comprised seven checks. (1) A range check was conducted by defining depth-dependent acceptable value ranges for each parameter; data falling outside these ranges were flagged as invalid. This check was applied to temperature, salinity, NO<inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>. Note that NO<inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> denotes the summed concentration of NO<inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and NO<inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>. 
At stations lacking direct NO<inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> measurements, NO<inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations were derived by summing discrete NO<inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> and NO<inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> observations. (2) An empirical relationship check was performed to verify consistency among paired variables based on predefined acceptable domains, including temperature–salinity, temperature–NO<inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, temperature–NO<inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, temperature–DIP, temperature–Si(OH)<sub>4</sub>, salinity–NO<inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, salinity–NO<inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, salinity–DIP, salinity–Si(OH)<sub>4</sub>, NO<inline-formula><mml:math id="M50" 
display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>–DIP, and NO<inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>–Si(OH)<sub>4</sub>. (3) A six-standard-deviation check was conducted by calculating the mean and standard deviation at each depth level; values falling beyond six standard deviations were flagged as outliers. (4) A gradient check assessed the vertical gradients of each parameter at each depth level across stations; data showing abnormal gradients exceeding five standard deviations from the mean were flagged as questionable. (5) A depth/potential density (<inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) inversion check was applied to detect unrealistic reversals in parameters such as temperature and nutrients, which typically exhibit monotonic relationships with depth or <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> in stratified waters; measurements violating preset thresholds for depth–temperature, depth–NO<inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, depth–DIP, depth–Si(OH)<sub>4</sub>, <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>–temperature, <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi 
mathvariant="italic">θ</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>–NO<inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>–DIP, and <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>–Si(OH)<sub>4</sub> were flagged. (6) A spike check was implemented to identify abrupt deviations (spikes) between a measurement and its adjacent vertical neighbors; if the difference exceeded a defined threshold, the data point was flagged as suspect. This check was applied to temperature, NO<inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>. (7) Only measurements with an original quality flag of “good” from CCHDO and WOD were retained, while those marked as questionable or erroneous were flagged as outliers.</p>
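      <p>As a minimal illustration of two of the individual-level checks described above, the following Python sketch applies a depth-dependent range check and a spike check to a single nitrate plus nitrite profile. The depth bins, value limits, spike threshold, and function names are hypothetical placeholders and are not the thresholds used in this study or by the WOD. The spike formula is one common formulation (deviation from the mean of the two vertical neighbors, reduced by half the neighbor difference) and is assumed here only for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from bisect import bisect_right

# Hypothetical depth-dependent limits for nitrate plus nitrite (umol/kg), given as
# (upper depth bound in m, minimum, maximum). Placeholder values for illustration.
NOX_LIMITS = [(200.0, 0.0, 35.0), (1000.0, 0.0, 50.0), (float("inf"), 20.0, 50.0)]


def range_check(depths, values, limits=NOX_LIMITS):
    """Flag (True) every value outside the accepted range of its depth bin."""
    upper_bounds = [limit[0] for limit in limits]
    flags = []
    for depth, value in zip(depths, values):
        # Select the first bin whose upper bound is greater than the depth.
        index = min(bisect_right(upper_bounds, depth), len(limits) - 1)
        _, low, high = limits[index]
        flags.append(not (low <= value <= high))
    return flags


def spike_check(values, threshold):
    """Flag (True) interior points that deviate abruptly from both vertical neighbors."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        previous, current, following = values[i - 1], values[i], values[i + 1]
        spike = abs(current - (previous + following) / 2.0) - abs((following - previous) / 2.0)
        flags[i] = spike > threshold
    return flags


# Synthetic profile with one implausible value at 300 m.
depths = [10.0, 100.0, 300.0, 500.0, 1500.0]
nox = [0.5, 5.0, 60.0, 30.0, 38.0]
print(range_check(depths, nox))          # only the 60.0 value is out of range
print(spike_check(nox, threshold=10.0))  # the same value is flagged as a spike
```
]]></preformat>
      <p>In this sketch the first and last points of a profile are never flagged by the spike check, because they lack one of the two vertical neighbors.</p>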
      <p id="d2e1526">Building on the individual-level QC, we implemented additional QC at the station and cruise levels. At the station level, if more than 20 % of the data points in a station profile were flagged, all data from that station were flagged as questionable. At the cruise level, if more than 30 % of a cruise's data were flagged, all data from that cruise were flagged. The final step integrated the flags from all three levels (individual, station, and cruise), and any data flagged at any level were excluded. This hierarchical QC protocol is designed to eliminate low-quality data. Although this strict approach may discard some high-quality measurements, the large volume of available data makes that loss acceptable in exchange for reliability.</p>
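      <p>The station-level and cruise-level rules can be sketched as follows. This is a simplified illustration rather than the processing code used in this study: the record layout and the function name are hypothetical, and the cruise-level fraction is assumed to be computed from the individual-level flags (before station-level propagation), which the text does not specify.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def hierarchical_flags(records, station_limit=0.20, cruise_limit=0.30):
    """Combine individual, station and cruise level flags.

    records: list of (cruise_id, station_id, flagged) tuples, where flagged is the
    individual-level result. Returns one final flag per record (True = excluded).
    """
    station_counts = defaultdict(lambda: [0, 0])  # (cruise, station) -> [flagged, total]
    cruise_counts = defaultdict(lambda: [0, 0])   # cruise -> [flagged, total]
    for cruise, station, flagged in records:
        for counts in (station_counts[(cruise, station)], cruise_counts[cruise]):
            counts[0] += int(flagged)
            counts[1] += 1

    final = []
    for cruise, station, flagged in records:
        s_flagged, s_total = station_counts[(cruise, station)]
        c_flagged, c_total = cruise_counts[cruise]
        bad_station = s_flagged / s_total > station_limit
        bad_cruise = c_flagged / c_total > cruise_limit
        # A record is excluded if it is flagged at any of the three levels.
        final.append(flagged or bad_station or bad_cruise)
    return final


# Synthetic example: station A/1 has 2 of 5 points flagged (40 %), so the whole
# station is excluded; cruise A as a whole has 2 of 10 flagged (20 %) and is kept.
records = (
    [("A", 1, True)] * 2 + [("A", 1, False)] * 3
    + [("A", 2, False)] * 5
    + [("B", 1, True)] + [("B", 1, False)] * 9
)
final = hierarchical_flags(records)
print(sum(final), "of", len(final), "records excluded")
```
]]></preformat>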
      <p id="d2e1529">After quality control, the CCHDO dataset retained 214 943 (9120), 197 539 (8228), 222 234 (9457) and 210 447 (8123) data points (stations), accounting for 94.2 % (95.1 %), 100.0 % (99.9 %), 98.6 % (98.5 %) and 99.0 % (98.8 %) of the original data points (stations) for NO<inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>, respectively (Table 1). The retained stations cover nearly the entire North Pacific Ocean (Fig. 2a), spanning from 1972 to 2023. Most observations were collected after 1980, with a substantial increase after 1990 (Fig. 2b). Seasonally, the number of stations in June, July, and August was approximately three times greater than that in March and December (Fig. 2c).</p>
      <p id="d2e1565">Following quality control, the final WOD dataset comprised 472 652 680 temperature and salinity data points from 1 920 634 stations across 35 744 cruises, spanning 1895 to 2024. These represent 81.9 % of the original observations, 84.1 % of the original stations, and 89.1 % of the original cruises, respectively (Table 2). Spatially, station counts per <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid cell range from 1 to 31 851, with a mean of 249 stations per cell (Fig. 3a). High sampling densities are found off eastern Japan and western North America, resulting from high-frequency observations from CTD and OSD platforms, whereas elevated counts in the southwestern North Pacific primarily result from MRB observations. Temporally, fewer than 300 stations per year were collected before 1930. The annual number of stations exceeded 10 000 after 1964 and peaked at approximately 100 000 in 2021 (Fig. 3b). Seasonally, station numbers are highest from May to August (Fig. 3c). Overall, the collected WOD dataset provides 2127–2393 times as many observations and 202 times as many station records as the CCHDO dataset.</p>
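      <p>The retention percentages and the observation ratio quoted above follow directly from the totals in Table 2 and the retained CCHDO data point counts given in the preceding paragraph. The short Python sketch below reproduces that arithmetic; the variable and function names are illustrative only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# WOD totals before and after quality control, taken from Table 2.
original = {"observations": 577_215_683, "stations": 2_284_448, "cruises": 40_113}
retained = {"observations": 472_652_680, "stations": 1_920_634, "cruises": 35_744}

# Retained CCHDO data points for the four nutrient variables, as quoted in the text.
cchdo_points = [214_943, 197_539, 222_234, 210_447]


def retained_percent(before, after):
    """Percentage of each quantity that survives quality control, to one decimal."""
    return {name: round(100.0 * after[name] / before[name], 1) for name in before}


def observation_ratio_range(wod_points, cchdo_counts):
    """Smallest and largest ratio of WOD data points to CCHDO data points."""
    ratios = [wod_points / count for count in cchdo_counts]
    return round(min(ratios)), round(max(ratios))


print(retained_percent(original, retained))
print(observation_ratio_range(retained["observations"], cchdo_points))
```
]]></preformat>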

      <fig id="F3"><label>Figure 3</label><caption><p id="d2e1586">Spatial and temporal distribution of the World Ocean Database (WOD) data after quality control. <bold>(a)</bold> Station counts per <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid cell; <bold>(b)</bold> station counts per year; <bold>(c)</bold> station counts per month.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f03.png"/>

      </fig>

</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Machine learning and nutrient reconstruction</title>
      <p id="d2e1628">After rigorous data quality control, the CCHDO data were used to train machine learning models. Three algorithms, namely Random Forest (RF), Light Gradient Boosting Machine (LightGBM), and Gaussian Process Regression (GPR), were applied to establish the relationship between environmental parameters and nutrient concentrations (Fig. 4). These methods are widely used in marine science (Hu et al., 2021; Huang et al., 2022; Yu et al., 2024; Chen et al., 2023; Sundararaman and Shanmugam, 2024). Using several different models helps reduce algorithm selection bias. RF is an ensemble technique based on bagging, which builds multiple independent decision trees and aggregates their outputs by voting or averaging (Liaw and Wiener, 2002). Its strengths include high predictive accuracy and reduced overfitting owing to the averaging over a large number of trees. RF has been applied to predict global primary production (Huang et al., 2021), chlorophyll concentrations (Madani et al., 2024), nutrients (Chen et al., 2023, 2024), dissolved iron (Huang et al., 2022), surface ocean <inline-formula><mml:math id="M70" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<sub>2</sub> (Chen et al., 2019), and N<sub>2</sub> fixation rates (Yu et al., 2024).</p>
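      <p>To make the bagging principle behind RF concrete, the following Python sketch fits many one-split regression trees (stumps) to bootstrap resamples of a one-dimensional synthetic dataset and averages their predictions. It is a deliberately simplified stand-in for RF, not the implementation used in this study: the trees have a single split, no random feature subsampling is performed, and all function names and data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs by minimising squared error."""
    overall_mean = sum(ys) / len(ys)
    best = (float("inf"), None, overall_mean, overall_mean)
    # Every distinct x except the smallest is a candidate threshold, so that both
    # sides of the split always contain at least one point.
    for candidate in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < candidate]
        right = [y for x, y in zip(xs, ys) if x >= candidate]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, candidate, left_mean, right_mean)
    _, threshold, low_value, high_value = best

    def predict(x):
        if threshold is None or x < threshold:
            return low_value
        return high_value

    return predict


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Train stumps on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n indices with replacement
        trees.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Synthetic step-like profile: low values in the upper layer, high values below.
depths = [float(d) for d in range(10)]
values = [0.0] * 5 + [10.0] * 5
model = fit_bagged_stumps(depths, values)
print(model(0.0), model(9.0))
```
]]></preformat>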

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e1658">Flowchart of the machine learning framework and its application to WOD hydrographic data for nutrient reconstruction.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f04.png"/>

      </fig>

      <p id="d2e1667">LightGBM is an ensemble learning algorithm based on Gradient Boosting Decision Trees (GBDT). Compared to standard GBDT, LightGBM employs a leaf-wise tree growth strategy and a histogram-based binning technique to improve predictive accuracy and computational efficiency (Ke et al., 2017). It has been successfully applied to predict water levels (Gan et al., 2021), salinity (Dong et al., 2022; Wang et al., 2022), and chlorophyll <inline-formula><mml:math id="M73" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> concentration (Su et al., 2021). GPR is a non-parametric Bayesian approach that infers relationships by defining a prior distribution over functions via kernel-based covariance matrices, rather than estimating fixed coefficients. This flexibility allows GPR to capture complex, nonlinear input–output relationships and to quantify prediction uncertainty. GPR has been used in oceanography to estimate global dissolved oxygen and nutrient concentrations (Sundararaman and Shanmugam, 2024).</p>
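      <p>The way GPR yields both a prediction and an uncertainty can be illustrated with a small pure-Python sketch. It assumes a zero-mean prior, a squared-exponential kernel with fixed (not optimised) hyperparameters, and one-dimensional synthetic inputs; these choices and all names are assumptions made for illustration and do not describe the configuration used in this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (unit prior variance)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gpr_predict(x_train, y_train, x_new, noise=1e-6, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at a single new input."""
    n = len(x_train)
    # Training covariance with the noise variance added on the diagonal.
    k = [[rbf(x_train[i], x_train[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x_new, xi, length_scale) for xi in x_train]
    alpha = solve(k, y_train)
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve(k, k_star)
    variance = rbf(x_new, x_new, length_scale) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(variance, 0.0)


x_train, y_train = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]
print(gpr_predict(x_train, y_train, 1.0))    # close to an observation: small variance
print(gpr_predict(x_train, y_train, 100.0))  # far from all data: reverts to the prior
```
]]></preformat>
      <p>The predictive variance grows as the new input moves away from the training data, which is the property that allows GPR to attach an uncertainty to each reconstructed value.</p>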
      <p id="d2e1678">In this study, we used spatial coordinates (longitude, latitude, depth), temporal variables (month and year), and water mass properties (represented by potential temperature and salinity) as environmental predictors of nutrient concentrations. The temporal predictors were expressed as decimal months and decimal years to capture seasonal, interannual, and long-term variability. The North Pacific contains distinct water masses, including North Pacific Subtropical Water, North Pacific Intermediate Water, Antarctic Intermediate Water, Western South Pacific Central Water, North Pacific Deep Water, and Pacific Deep Water, as well as Circumpolar Deep Water (e.g., Talley et al., 2011; Fuhr et al., 2021). These water masses mix to form different water types associated with distinct nutrient concentrations (Fig. 5). Water type has been found to be an important predictor for reconstructing nutrient concentrations in the South China Sea (Du et al., 2021). Potential temperature and salinity were therefore used as proxies for water mass identification.</p>
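      <p>One possible way to assemble the seven predictors for a single sample is sketched below in Python. The exact definition of the decimal month and decimal year is not given in the text, so the conventions used here (fraction of the month or year elapsed at the start of the sampling day), the ordering of the predictors, and the function names are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import calendar
from datetime import date


def decimal_year(d):
    """Year plus the fraction of the year elapsed at the start of day d."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return d.year + (d.timetuple().tm_yday - 1) / days_in_year


def decimal_month(d):
    """Month number plus the fraction of the month elapsed at the start of day d."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.month + (d.day - 1) / days_in_month


def build_predictors(longitude, latitude, depth, sampling_date, theta, salinity):
    """Return the predictor vector in an assumed, fixed order."""
    return [
        longitude,
        latitude,
        depth,
        decimal_month(sampling_date),
        decimal_year(sampling_date),
        theta,
        salinity,
    ]


# Synthetic sample near Station ALOHA; the numerical values are illustrative only.
print(build_predictors(-158.0, 22.75, 500.0, date(2020, 7, 2), 7.2, 34.1))
```
]]></preformat>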

      <fig id="F5"><label>Figure 5</label><caption><p id="d2e1683">Relationships between the water masses, indicated by salinity and potential temperature (<inline-formula><mml:math id="M74" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula>), and NO<inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> (NO<inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M77" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> NO<inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>; color shading) in the North Pacific. The temperature and salinity data were collected from the CCHDO dataset. The gray contour lines and numbers denote the potential density anomaly. The typical water masses shown are North Pacific Central Water (NPCW), North Pacific Subtropical Underwater (NPSTUW), North Pacific Subtropical Mode Water (NPSTMW), North Pacific Intermediate Water (NPIW), Dichothermal Water (DtW), Mesothermal Water (MtW), Antarctic Intermediate Water (AAIW), Western South Pacific Central Water (WSPCW), Pacific Deep Water (PDW), and Circumpolar Deep Water (CDW). The water masses and their acronyms follow the classifications in Talley et al. (2011) and Fuhr et al. (2021).</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f05.png"/>

      </fig>

</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Error estimation</title>
      <p id="d2e1758">Leave-one-out cross-validation was primarily used to quantify model reconstruction errors. The CCHDO dataset was divided into training and testing subsets for model development and performance evaluation, respectively. To assess how data partitioning affects error metrics, we implemented four validation methods based on different data-selection strategies (Fig. 6a). The first three methods partitioned the CCHDO dataset into training (80 %) and testing (20 %) subsets using three data-selection strategies: (1) sample-random, by withholding 20 % of individual samples; (2) station-random, by withholding 20 % of stations; and (3) cruise-random, by withholding 20 % of cruises. Predictions for the held-out subsets, generated from their spatial, temporal, and water mass property data, were compared against the withheld nutrient measurements to calculate error metrics. These partitioning strategies were designed to evaluate potential errors under the sparse and non-uniform spatiotemporal distribution of observations. Error 1 (sample-random) represented an optimistic estimate, because validation data are likely co-located with training data in space and time; Error 2 (station-random) provided an intermediate estimate, because validation data may share spatial and temporal context with training data from the same cruise; and Error 3 (cruise-random) represented a conservative, generalized scenario in which validation data are independent of training data. The choice of error metric (Error 1, 2, or 3) should be guided by the degree of extrapolation in the intended application relative to the spatiotemporal distribution of the training data.</p>
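      <p>The three data-selection strategies differ only in the unit that is withheld. The Python sketch below shows one way to implement them with a single grouping key; the record layout, function name, and random seed are hypothetical, and the synthetic dataset is balanced so that each strategy withholds exactly 20 % of the samples, which would not generally hold for real cruises of unequal size.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random

# The grouping key decides which unit is withheld as a whole.
GROUP_KEYS = {
    "sample": lambda index, sample: index,
    "station": lambda index, sample: (sample["cruise"], sample["station"]),
    "cruise": lambda index, sample: sample["cruise"],
}


def holdout_split(samples, level, test_fraction=0.2, seed=0):
    """Withhold a random fraction of samples, stations or cruises for testing."""
    key = GROUP_KEYS[level]
    groups = sorted({key(i, s) for i, s in enumerate(samples)})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    held_out = set(groups[:n_test])
    train, test = [], []
    for i, s in enumerate(samples):
        (test if key(i, s) in held_out else train).append(s)
    return train, test


# Synthetic dataset: 5 cruises, 4 stations per cruise, 5 depths per station.
samples = [
    {"cruise": cruise, "station": station, "depth_index": level}
    for cruise in "ABCDE"
    for station in range(4)
    for level in range(5)
]
for strategy in ("sample", "station", "cruise"):
    train, test = holdout_split(samples, strategy)
    print(strategy, len(train), len(test))
```
]]></preformat>
      <p>With the cruise-level key, no cruise contributes to both subsets, which is what makes the corresponding error estimate the most conservative of the three.</p>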

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e1763">Schematic of the error estimation procedure. <bold>(a)</bold> Error estimation based on three data-selection strategies; <bold>(b)</bold> assessment of temporal error evolution by excluding the data at Station ALOHA; <bold>(c)</bold> examination of the models' reconstruction error using the hydrographic and nutrient data from before 1970. <inline-formula><mml:math id="M79" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M80" display="inline"><mml:mi>S</mml:mi></mml:math></inline-formula> denote the potential temperature and salinity, respectively.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f06.png"/>

      </fig>

      <p id="d2e1795">The validation results for reconstructed NO<inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> versus observations under the first three data-selection strategies are shown in Fig. 7. RF and GPR exhibited nearly identical performance, with regression slopes of 0.992–0.998, <inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.992</mml:mn></mml:mrow></mml:math></inline-formula>, and Root Mean Squared Errors (RMSEs) between 0.734 and 1.313 <inline-formula><mml:math id="M83" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> (Fig. 7a, c, d, f, g and i). LightGBM showed slightly lower accuracy (slope: 0.991–0.995; <inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>: 0.991–0.996; RMSEs: 0.780–1.419 <inline-formula><mml:math id="M86" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>) (Fig. 7b, e and h). Across different data-selection strategies, sample-random (Error 1) yielded the lowest errors (RMSEs: 0.734–0.983 <inline-formula><mml:math id="M88" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>) (Fig. 7a–c), station-random (Error 2) was intermediate (RMSEs: 0.908–1.313 <inline-formula><mml:math id="M90" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>) (Fig. 
7d–f), and cruise-random (Error 3) produced the highest errors (RMSEs: 1.243–1.424 <inline-formula><mml:math id="M92" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>) (Fig. 7; Table 3). This gradient in error estimates underscores the necessity of employing different data-selection strategies for a comprehensive error assessment. The high slopes and <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values (<inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.99</mml:mn></mml:mrow></mml:math></inline-formula>) achieved across all algorithms and data-selection strategies confirmed the robustness of the nutrient reconstructions.</p>
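      <p>For reference, the three validation statistics reported above can be computed from paired observed and reconstructed values as sketched below in Python. The sketch assumes that the reconstructed values are regressed on the observations by ordinary least squares, which the text does not state explicitly; the function name and the data are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def validation_metrics(observed, predicted):
    """Slope, intercept, R squared and RMSE for predicted versus observed values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_p - slope * mean_o
    # For a simple linear regression, R squared equals the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return {"slope": slope, "intercept": intercept, "r_squared": r_squared, "rmse": rmse}


# Synthetic example: every reconstructed value is 1 unit too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(validation_metrics(observed, predicted))
```
]]></preformat>
      <p>The synthetic example shows why the slope and the coefficient of determination alone are not sufficient: a constant offset leaves both at 1 while the RMSE equals the size of the offset.</p>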

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e1962">Validation of the reconstructed NO<inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations using leave-one-out cross-validation with different data-selection strategies and machine learning methods. Rows 1 to 3 correspond to the sample-random <bold>(a–c)</bold>, station-random <bold>(d–f)</bold>, and cruise-random <bold>(g–i)</bold> strategies, respectively. Columns 1 to 3 correspond to Random Forest (RF; <bold>a, d, g</bold>), LightGBM <bold>(b, e, h)</bold>, and Gaussian Process Regression (GPR; <bold>c, f, i</bold>), respectively. The black lines and text show the fitted linear regressions, regression equations, coefficients of determination (<inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), <inline-formula><mml:math id="M98" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> values, and Root Mean Squared Errors (RMSEs). The color represents the data density (<inline-formula><mml:math id="M99" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula>, number of observations). Note that a logarithmic scale is applied to <inline-formula><mml:math id="M100" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula>.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f07.png"/>

      </fig>

<table-wrap id="T3" specific-use="star"><label>Table 3</label><caption><p id="d2e2037">The Root Mean Squared Errors of nutrient reconstruction from different error evaluation strategies (unit: <inline-formula><mml:math id="M101" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="16">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:colspec colnum="4" colname="col4" align="center"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="center"/>
     <oasis:colspec colnum="7" colname="col7" align="center"/>
     <oasis:colspec colnum="8" colname="col8" align="center"/>
     <oasis:colspec colnum="9" colname="col9" align="left"/>
     <oasis:colspec colnum="10" colname="col10" align="center"/>
     <oasis:colspec colnum="11" colname="col11" align="center"/>
     <oasis:colspec colnum="12" colname="col12" align="center"/>
     <oasis:colspec colnum="13" colname="col13" align="left"/>
     <oasis:colspec colnum="14" colname="col14" align="center"/>
     <oasis:colspec colnum="15" colname="col15" align="center"/>
     <oasis:colspec colnum="16" colname="col16" align="center"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Data selection</oasis:entry>
         <oasis:entry rowsep="1" namest="col2" nameend="col4">NO<inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry rowsep="1" namest="col6" nameend="col8">NO<inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry rowsep="1" namest="col10" nameend="col12">DIP </oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry rowsep="1" namest="col14" nameend="col16">Si(OH)<sub>4</sub></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">strategy</oasis:entry>
         <oasis:entry colname="col2">RF</oasis:entry>
         <oasis:entry colname="col3">LightGBM</oasis:entry>
         <oasis:entry colname="col4">GPR</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">RF</oasis:entry>
         <oasis:entry colname="col7">LightGBM</oasis:entry>
         <oasis:entry colname="col8">GPR</oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10">RF</oasis:entry>
         <oasis:entry colname="col11">LightGBM</oasis:entry>
         <oasis:entry colname="col12">GPR</oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry colname="col14">RF</oasis:entry>
         <oasis:entry colname="col15">LightGBM</oasis:entry>
         <oasis:entry colname="col16">GPR</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Sample random</oasis:entry>
         <oasis:entry colname="col2">0.724</oasis:entry>
         <oasis:entry colname="col3">0.924</oasis:entry>
         <oasis:entry colname="col4">0.760</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">0.049</oasis:entry>
         <oasis:entry colname="col7">0.054</oasis:entry>
         <oasis:entry colname="col8">0.079</oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10">0.056</oasis:entry>
         <oasis:entry colname="col11">0.070</oasis:entry>
         <oasis:entry colname="col12">0.055</oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry colname="col14">1.90</oasis:entry>
         <oasis:entry colname="col15">2.30</oasis:entry>
         <oasis:entry colname="col16">1.53</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Station random</oasis:entry>
         <oasis:entry colname="col2">0.780</oasis:entry>
         <oasis:entry colname="col3">0.983</oasis:entry>
         <oasis:entry colname="col4">0.908</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">0.065</oasis:entry>
         <oasis:entry colname="col7">0.068</oasis:entry>
         <oasis:entry colname="col8">0.072</oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10">0.058</oasis:entry>
         <oasis:entry colname="col11">0.071</oasis:entry>
         <oasis:entry colname="col12">0.065</oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry colname="col14">2.07</oasis:entry>
         <oasis:entry colname="col15">2.45</oasis:entry>
         <oasis:entry colname="col16">2.20</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Cruise random</oasis:entry>
         <oasis:entry colname="col2">1.313</oasis:entry>
         <oasis:entry colname="col3">1.409</oasis:entry>
         <oasis:entry colname="col4">1.243</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">0.054</oasis:entry>
         <oasis:entry colname="col7">0.057</oasis:entry>
         <oasis:entry colname="col8">0.071</oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10">0.080</oasis:entry>
         <oasis:entry colname="col11">0.089</oasis:entry>
         <oasis:entry colname="col12">0.084</oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry colname="col14">2.79</oasis:entry>
         <oasis:entry colname="col15">3.07</oasis:entry>
         <oasis:entry colname="col16">2.94</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">ALOHA validation</oasis:entry>
         <oasis:entry colname="col2">0.701</oasis:entry>
         <oasis:entry colname="col3">0.842</oasis:entry>
         <oasis:entry colname="col4">0.674</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">–</oasis:entry>
         <oasis:entry colname="col7">–</oasis:entry>
         <oasis:entry colname="col8">–</oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10">0.066</oasis:entry>
         <oasis:entry colname="col11">0.079</oasis:entry>
         <oasis:entry colname="col12">0.064</oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry colname="col14">2.13</oasis:entry>
         <oasis:entry colname="col15">2.48</oasis:entry>
         <oasis:entry colname="col16">2.32</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2411">Reconstruction errors for NO<inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub> are summarized in Figs. S1–S3 in the Supplement and Table 3. Across methods, the RMSEs did not exceed 0.079 <inline-formula><mml:math id="M108" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> for NO<inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, 0.089 <inline-formula><mml:math id="M111" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> for DIP, and 3.07 <inline-formula><mml:math id="M113" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> for Si(OH)<sub>4</sub>. DIP and Si(OH)<sub>4</sub> exhibited similar error trends: RMSEs increased from sample-random to station-random to cruise-random selection. By comparison, the NO<inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> reconstruction was less accurate than those of NO<inline-formula><mml:math id="M118" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>, with regression slopes of 0.48–0.68 and <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values of 0.32–0.72. 
RF and LightGBM outperformed GPR for NO<inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>. The poorer NO<inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> performance likely reflects its generally low concentrations (mostly <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M124" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>) and high biological variability. We therefore flag the NO<inline-formula><mml:math id="M126" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> reconstruction as high-uncertainty.</p>
      <p id="d2e2638">Understanding the spatiotemporal structure of reconstruction errors is also important for assessing where and when the reconstruction can be applied. As shown in Figs. S4–S7, the reconstruction errors of NO<inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub> are generally small in the surface layer, increase with depth to maxima at the nutricline, and then decrease to low values in deep layers. However, the random errors associated with individual cruise observations for Si(OH)<sub>4</sub> display no evident vertical pattern. Horizontally, we focused on surface waters, where concentration gradients are greatest. The errors are small in the western NPSG (a nutrient-depleted region) but large in the subarctic gyre and near the equator (nutrient-replete regions; Figs. S8–S11). We examined the nutrient reconstruction errors in the oligotrophic NPSG in particular. 
The oligotrophic regimes are defined as regions where NO<inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub> concentrations are <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.2</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.2</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.2</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">5.0</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M137" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>, respectively. 
As shown in Table 4, the reconstruction errors in these regimes are <inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">0.574</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">0.056</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">0.084</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">1.88</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M143" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> for NO<inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>, respectively, which are clearly lower than the overall RMSEs for the entire North Pacific (Table 3). This confirms that absolute errors decrease in oligotrophic regimes. Among the models, RF generally performs best. Because summer observations are up to three times more numerous than winter and spring observations, we further examined the seasonal variation of errors. Overall, no evident seasonal variation was found. 
Only in the case of random cruise selection was the NO<inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> error shown to be greater in spring (March to May) than in other seasons (Fig. S12). For other cases and nutrients, seasonal variation in error was not evident. On a decadal timescale, the reconstruction errors display a slight decreasing trend, particularly for DIP, from 1973 to 2020 (Fig. S13), implying that the errors might be smaller in recent decades than in previous ones.</p>
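      <p>The oligotrophic-regime error evaluation described above can be illustrated with a minimal sketch. It assumes that a sample is assigned to the oligotrophic regime when its observed concentration falls below the threshold stated in the text; the paper does not specify the exact masking procedure. The names <monospace>OLIGOTROPHIC_MAX</monospace>, <monospace>rmse</monospace>, and <monospace>oligotrophic_rmse</monospace> are hypothetical and are not taken from the authors' code.</p>
      <preformat preformat-type="code">
```python
import math

# Thresholds in µmol kg-1, taken from the oligotrophic definition in the text.
# The dictionary keys are illustrative labels, not dataset variable names.
OLIGOTROPHIC_MAX = {"NO3": 0.2, "NO2": 0.2, "DIP": 0.2, "SiOH4": 5.0}


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def oligotrophic_rmse(observed, predicted, nutrient):
    """RMSE over samples whose observed value lies below the threshold.

    Assumption: regime membership is decided per sample from the observed
    concentration; returns NaN when no sample qualifies.
    """
    limit = OLIGOTROPHIC_MAX[nutrient]
    pairs = [(o, p) for o, p in zip(observed, predicted) if limit > o]
    if not pairs:
        return math.nan
    obs, pred = zip(*pairs)
    return rmse(list(obs), list(pred))
```
      </preformat>
      <p>Restricting the comparison to low-concentration samples in this way is what makes the values in Table 4 smaller in absolute terms than those in Table 3, even though relative errors may be larger.</p>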

<table-wrap id="T4" specific-use="star"><label>Table 4</label><caption><p id="d2e2875">Root mean squared errors (RMSEs) of nutrient reconstruction under different error evaluation strategies in surface oligotrophic regimes (unit: <inline-formula><mml:math id="M149" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="16">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:colspec colnum="4" colname="col4" align="center"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="center"/>
     <oasis:colspec colnum="7" colname="col7" align="center"/>
     <oasis:colspec colnum="8" colname="col8" align="center"/>
     <oasis:colspec colnum="9" colname="col9" align="left"/>
     <oasis:colspec colnum="10" colname="col10" align="center"/>
     <oasis:colspec colnum="11" colname="col11" align="center"/>
     <oasis:colspec colnum="12" colname="col12" align="center"/>
     <oasis:colspec colnum="13" colname="col13" align="left"/>
     <oasis:colspec colnum="14" colname="col14" align="center"/>
     <oasis:colspec colnum="15" colname="col15" align="center"/>
     <oasis:colspec colnum="16" colname="col16" align="center"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Data selection</oasis:entry>
         <oasis:entry rowsep="1" namest="col2" nameend="col4">NO<inline-formula><mml:math id="M151" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry rowsep="1" namest="col6" nameend="col8">NO<inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry rowsep="1" namest="col10" nameend="col12">DIP </oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry rowsep="1" namest="col14" nameend="col16">Si(OH)<sub>4</sub></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">strategy</oasis:entry>
         <oasis:entry colname="col2">RF</oasis:entry>
         <oasis:entry colname="col3">LightGBM</oasis:entry>
         <oasis:entry colname="col4">GPR</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">RF</oasis:entry>
         <oasis:entry colname="col7">LightGBM</oasis:entry>
         <oasis:entry colname="col8">GPR</oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10">RF</oasis:entry>
         <oasis:entry colname="col11">LightGBM</oasis:entry>
         <oasis:entry colname="col12">GPR</oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry colname="col14">RF</oasis:entry>
         <oasis:entry colname="col15">LightGBM</oasis:entry>
         <oasis:entry colname="col16">GPR</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Sample random</oasis:entry>
         <oasis:entry colname="col2">0.290</oasis:entry>
         <oasis:entry colname="col3">0.567</oasis:entry>
         <oasis:entry colname="col4">0.444</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">0.018</oasis:entry>
         <oasis:entry colname="col7">0.035</oasis:entry>
         <oasis:entry colname="col8">0.048</oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10">0.028</oasis:entry>
         <oasis:entry colname="col11">0.042</oasis:entry>
         <oasis:entry colname="col12">0.039</oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry colname="col14">1.19</oasis:entry>
         <oasis:entry colname="col15">0.90</oasis:entry>
         <oasis:entry colname="col16">1.30</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Station random</oasis:entry>
         <oasis:entry colname="col2">0.303</oasis:entry>
         <oasis:entry colname="col3">0.457</oasis:entry>
         <oasis:entry colname="col4">0.474</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">0.030</oasis:entry>
         <oasis:entry colname="col7">0.030</oasis:entry>
         <oasis:entry colname="col8">0.043</oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10">0.036</oasis:entry>
         <oasis:entry colname="col11">0.045</oasis:entry>
         <oasis:entry colname="col12">0.043</oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry colname="col14">1.24</oasis:entry>
         <oasis:entry colname="col15">1.51</oasis:entry>
         <oasis:entry colname="col16">1.51</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Cruise random</oasis:entry>
         <oasis:entry colname="col2">0.378</oasis:entry>
         <oasis:entry colname="col3">0.457</oasis:entry>
         <oasis:entry colname="col4">0.574</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">0.030</oasis:entry>
         <oasis:entry colname="col7">0.029</oasis:entry>
         <oasis:entry colname="col8">0.056</oasis:entry>
         <oasis:entry colname="col9"/>
         <oasis:entry colname="col10">0.075</oasis:entry>
         <oasis:entry colname="col11">0.077</oasis:entry>
         <oasis:entry colname="col12">0.084</oasis:entry>
         <oasis:entry colname="col13"/>
         <oasis:entry colname="col14">1.85</oasis:entry>
         <oasis:entry colname="col15">1.88</oasis:entry>
         <oasis:entry colname="col16">1.75</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e3200">A fourth validation step assessed the model's temporal performance at Station ALOHA (Error 4; Fig. 6b). To test this, we withheld all observations from ALOHA (which, since 1988, represent 8.52 %, 8.45 %, and 8.11 % of the total Si(OH)<sub>4</sub>, NO<inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, and DIP records, respectively) from model training. We then reconstructed nutrient concentrations using space, time, and water-type predictors at Station ALOHA. NO<inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> was excluded due to insufficient observations. For NO<inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, the regression slopes between reconstruction and observations were 0.99, 0.98, and 0.99, with RMSEs of 0.701, 0.842, and 0.674 <inline-formula><mml:math id="M158" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> for RF, LightGBM, and GPR, respectively; <inline-formula><mml:math id="M160" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values exceeded 0.997 for all models (Fig. 8a). RF and GPR slightly outperformed LightGBM. All models accurately reproduced the NO<inline-formula><mml:math id="M161" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> profiles (Fig. 8b). 
The reconstruction errors for DIP were 0.066, 0.079, and 0.064 <inline-formula><mml:math id="M162" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> for RF, LightGBM, and GPR, respectively. The corresponding errors for Si(OH)<sub>4</sub> were 2.13, 2.48, and 2.32 <inline-formula><mml:math id="M165" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> (Table 3, Figs. S14 and S15).</p>
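      <p>The three statistics reported for the Station ALOHA validation (regression slope, RMSE, and R<sup>2</sup>) can be computed as in the following minimal sketch. It assumes an ordinary least-squares regression of reconstructed on observed values and takes R<sup>2</sup> as the squared Pearson correlation; the text does not state which regression or R<sup>2</sup> definition was used. The function name <monospace>validation_stats</monospace> is hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def validation_stats(observed, predicted):
    """Return slope, R^2, and RMSE for reconstructed vs. observed values.

    Assumptions: slope is from an ordinary least-squares fit of
    predicted on observed; R^2 is the squared Pearson correlation.
    """
    n = len(observed)
    if n != len(predicted) or 2 > n:
        raise ValueError("need two or more paired values")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")

    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return {"slope": sxy / sxx, "r2": sxy ** 2 / (sxx * syy), "rmse": rmse}
```
      </preformat>
      <p>A slope near 1 combined with a high R<sup>2</sup>, as reported above, indicates little systematic bias, whereas the RMSE summarizes the remaining scatter in concentration units.</p>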

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e3344">Validating the reconstructed nutrient concentrations at Station ALOHA. <bold>(a)</bold> Reconstructed NO<inline-formula><mml:math id="M167" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M168" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> NO<inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> (NO<inline-formula><mml:math id="M170" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>) vs. observations: Random Forest (RF; red dots), LightGBM (blue dots), and Gaussian Process Regression (GPR; green dots). <bold>(b)</bold> Profiles of observed (black dots) and reconstructed NO<inline-formula><mml:math id="M171" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> from RF (red dots), LightGBM (blue dots), and GPR (green dots).</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f08.png"/>

      </fig>

      <p id="d2e3415">Because nutrient variations occur primarily in the upper water column, we focused on the nutrient reconstruction in the upper 300 m at Station ALOHA. Overall, the models reproduced the profiles of NO<inline-formula><mml:math id="M172" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> from 1988 to 2021 well (Fig. 9a–d). The reconstruction errors were low at the surface and increased with depth, with most values <inline-formula><mml:math id="M173" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 3.0 <inline-formula><mml:math id="M174" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> (Fig. S16a–d). To evaluate the models' ability to reconstruct temporal nutrient variations, the nutrient concentrations were averaged monthly over the upper 300 m. Compared with observations, RF, LightGBM, and GPR all reconstructed the interannual variations of NO<inline-formula><mml:math id="M176" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> at Station ALOHA well, with most absolute errors <inline-formula><mml:math id="M177" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 0.5 <inline-formula><mml:math id="M178" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> (Figs. 9e and S16e). The corresponding validation results for DIP and Si(OH)<sub>4</sub> are shown in Figs. S17–S20.</p>

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e3508">Temporal variations of NO<inline-formula><mml:math id="M181" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations in the upper 300 m at Station ALOHA from 1988 to 2021 for observed <bold>(a)</bold> and reconstructed NO<inline-formula><mml:math id="M182" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> by Random Forest (RF; <bold>b</bold>), LightGBM <bold>(c)</bold>, and Gaussian Process Regression (GPR; <bold>d</bold>). <bold>(e)</bold> Time series of monthly averaged NO<inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations in the upper 300 m from observations, and reconstructions by RF, LightGBM, and GPR.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f09.png"/>

      </fig>

      <p id="d2e3569">A fifth validation step evaluates the models' reconstruction for the period before 1970 (Error 5; Fig. 6c). This is necessary because the training data (CCHDO) span 1973–2022, while the reconstructions are extrapolated back to 1895. We argue that this extrapolation is reasonable because variations in the temperature–salinity–nutrient relationships of the ocean's interior have probably been small over the past century, which provides a basis for temporal extrapolation. First, the residence time of nitrogen in deep and intermediate waters can be up to 2000 years in the North Pacific. Consequently, the imprint of centennial-scale change on nutrient inventories is attenuated. Second, long-term variations of nutrient concentrations are not evident within our core training period (1973–2022; Figs. 9e and 17). Finally, the mean nutrient profiles derived from the 1920–1970 and 1973–2022 periods are not evidently different in the central North Pacific (Fig. S21). Therefore, although the North Pacific may experience long-term variability, such variability might be masked by the reconstruction error, and the use of hydrographic properties as predictors for nutrients is justified for historical reconstructions.</p>
      <p id="d2e3573">However, when assessing the reconstruction errors before 1970, we first consider data quality issues. Prior to the standardization of modern oceanographic methods, nutrient measurements – particularly from earlier decades – were subject to greater analytical errors, inconsistent sampling protocols, and varied determination techniques. The data quality concern is evident in the sporadic and sometimes physically implausible deep nutrient profiles found in WOD for that era (Fig. S22). This is also the primary reason that nutrient data pre-1973 collected from sources like the OSD from WOD were not incorporated into model training. To evaluate data quality in earlier decades, we selected five specific years with more abundant observations: 1929, 1947, 1953, 1958, and 1966 (Fig. S23). After applying the same quality-control criteria outlined in Sect. 3.1, we used the historical hydrographic data (temperature and salinity) from those years to predict nutrient concentrations. A total of 52 277, 119 137, 284 472, and 193 339 data points were collected for NO<inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M185" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>, respectively, after QC. The comparison between these predictions and the quality-controlled observations yields the prediction errors for the pre-1970 period (Fig. 6c). 
Across models, the RMSEs were <inline-formula><mml:math id="M187" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">5.7</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.40</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">22.9</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M190" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> for NO<inline-formula><mml:math id="M192" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>, respectively (Figs. S24–S26), which are much larger than the corresponding errors for the period after 1970. We recommend that these values be considered a conservative estimate of the upper error bound, as they incorporate both errors in the nutrient observations and prediction errors. In addition, the hydrographic data are less reliable in the earlier period. We therefore acknowledge that reconstruction errors are likely higher for the pre-1973 period and that the error estimated here should be considered a “best estimate” with quantified uncertainties. We encourage users to consider these error bounds when applying the dataset to early 20th century conditions.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Reconstructed nutrients</title>
      <p id="d2e3689">The final reconstructed nutrient dataset aligns with the spatiotemporal coverage of the quality-controlled WOD hydrographic dataset, comprising 472 652 680 data points for each nutrient (NO<inline-formula><mml:math id="M194" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, NO<inline-formula><mml:math id="M195" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub>) from 1 920 634 stations across 35 744 cruises, spanning from 1895 to 2024 (Table 2). Most data points are located above 2000 m, with fewer observations at greater depths due to hydrographic platform limitations.</p>
      <p id="d2e3725">It is important to clarify the nature of the reconstructed dataset, which is fundamentally different from gridded products. This product provides nutrient concentrations linked to each hydrographic observation: nutrient values are reconstructed precisely at the locations, depths, and times of the original hydrographic observations (sourced from WOD), where direct nutrient measurements might be unavailable or of poor quality. This approach yields a point-wise dataset that aligns with the original hydrographic observations rather than a spatially or temporally interpolated field, an important distinction for users interpreting and applying the data.</p>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Climatology of nutrient distributions</title>
      <p id="d2e3736">To evaluate the reliability of our product, we binned and averaged the predicted nutrients within <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid cells for each month to produce a monthly climatology. This climatology represents a mean field that depends heavily on the spatiotemporal distribution of the underlying data and may be influenced by uneven data sampling. This reconstructed climatology was compared with the World Ocean Atlas 2023 (WOA23), which is derived from quality-controlled and objectively analyzed observational data. Since the large-scale patterns of NO<inline-formula><mml:math id="M198" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, DIP, and Si(OH)<sub>4</sub> are similar among different models (Figs. 10–13 and S27–S36), we focus on NO<inline-formula><mml:math id="M200" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> reconstructed by the RF model in this section unless stated otherwise.</p>
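      <p>The binning step described above can be sketched as follows. This is a minimal illustration under stated assumptions: records are taken to be already restricted to a single depth level, each record is a hypothetical tuple of latitude, longitude, calendar month, and predicted concentration, and cells are identified by their south-west corner. The function name <monospace>monthly_climatology</monospace> is not from the authors' code.</p>
      <preformat preformat-type="code">
```python
import math
from collections import defaultdict


def monthly_climatology(records, cell_deg=1.0):
    """Average point-wise values within lat/lon cells for each month.

    records: iterable of (lat, lon, month, value) tuples (assumed layout).
    Returns a dict keyed by (cell_south_lat, cell_west_lon, month).
    """
    sums = defaultdict(lambda: [0.0, 0])  # running [total, count] per cell
    for lat, lon, month, value in records:
        # floor() assigns each point to the cell whose south-west corner
        # lies at or below its coordinates, including negative latitudes.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg), month)
        sums[key][0] += value
        sums[key][1] += 1

    return {
        (i * cell_deg, j * cell_deg, month): total / count
        for (i, j, month), (total, count) in sums.items()
    }
```
      </preformat>
      <p>Because each cell mean is an unweighted average of whatever points fall inside it, cells with few or seasonally clustered observations inherit that sampling bias, which is the caveat noted above.</p>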
      <p id="d2e3788">Figures 10–13 present the monthly climatology of NO<inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> at 5, 100, 500, and 1000 m in the North Pacific. At 5 m, the reconstructed NO<inline-formula><mml:math id="M202" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> accurately captures the established spatial patterns, with elevated concentrations in the subpolar gyre, Bering Sea, and equatorial regions, and depleted concentrations in the NPSG (Fig. 10). Seasonally, the basin-averaged surface NO<inline-formula><mml:math id="M203" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations display the highest value of 3.50 <inline-formula><mml:math id="M204" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> in March, in contrast to the lowest value of 1.82 <inline-formula><mml:math id="M206" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> in September. These results agree with Yasunaka et al. (2014, 2021), who, using extensive surface nutrient observations (up to 14 000 for nitrate) in the North Pacific, reported similar spatial and seasonal patterns.</p>

      <fig id="F10" specific-use="star"><label>Figure 10</label><caption><p id="d2e3870">The monthly climatology of NO<inline-formula><mml:math id="M208" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> at 5 m in the North Pacific. Data are binned and averaged within <inline-formula><mml:math id="M209" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid cells. The values in the title represent the spatial mean values.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f10.png"/>

      </fig>

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e3910">The monthly climatology of NO<inline-formula><mml:math id="M210" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> at 100 m in the North Pacific. Data are binned and averaged within <inline-formula><mml:math id="M211" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid cells. The values in the title represent the spatial mean values.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f11.png"/>

      </fig>

      <fig id="F12" specific-use="star"><label>Figure 12</label><caption><p id="d2e3949">The monthly climatology of NO<inline-formula><mml:math id="M212" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> at 500 m in the North Pacific. Data are binned and averaged within <inline-formula><mml:math id="M213" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid cells. The values in the title represent the spatial mean values.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f12.png"/>

      </fig>

      <fig id="F13" specific-use="star"><label>Figure 13</label><caption><p id="d2e3988">The monthly climatology of NO<inline-formula><mml:math id="M214" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> at 1000 m in the North Pacific. Data are binned and averaged within <inline-formula><mml:math id="M215" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid cells. The values in the title represent the spatial mean values.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f13.png"/>

      </fig>

      <fig id="F14" specific-use="star"><label>Figure 14</label><caption><p id="d2e4027">Zonal and monthly climatology of NO<inline-formula><mml:math id="M216" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> in the upper 2000 m at 10° N in the North Pacific. Data were binned and averaged within <inline-formula><mml:math id="M217" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid cells.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f14.png"/>

      </fig>

      <p id="d2e4065">At 100 m, NO<inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations are elevated in the subarctic gyre, north of the Equator, and in the eastern North Pacific, whereas the central regions, particularly the NPSG, exhibit lower values (Fig. 11). At 500 m, NO<inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations display patterns similar to those at 100 m, except that NO<inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations in the western NPSG are clearly lower than those in other regions (Fig. 12). At 1000 m, concentrations in the southwestern North Pacific Ocean are markedly lower than those in other regions (Fig. 13). At 100 m and below, seasonal variability in NO<inline-formula><mml:math id="M221" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> is minimal (Figs. 11–13). These patterns are similar to those in WOA23 (Figs. S36–S44). 
The differences between the spatially averaged values of the two climatologies are generally <inline-formula><mml:math id="M222" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M223" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> at the surface and <inline-formula><mml:math id="M225" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">1.5</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M226" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> at 100 and 500 m. The maximum difference occurs in July at a depth of 500 m (Figs. 12g and S38g). In that month and layer, WOA23 shows a notably low mean NO<inline-formula><mml:math id="M228" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> value (31.94 <inline-formula><mml:math id="M229" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>) compared with its values in other months (33.15 to 34.64 <inline-formula><mml:math id="M231" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>; Fig. S38) and with our climatology (33.34 to 33.56 <inline-formula><mml:math id="M233" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>; Fig. 12). 
This discrepancy arises because the WOA23 climatology for July features a pronounced low-NO<inline-formula><mml:math id="M235" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> patch (down to 20 <inline-formula><mml:math id="M236" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup>) within the eastern subarctic gyre, surrounded by waters with concentrations of <inline-formula><mml:math id="M238" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">35</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M239" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> (Fig. S38g). These regional differences are clearly visible in the difference maps between the two products (Figs. S45–S47). Generally, our reconstructions capture finer spatial detail, exhibit less oversmoothing, and avoid artificial “bull's-eye” patterns.</p>
      <p id="d2e4313">Sectional distributions of NO<inline-formula><mml:math id="M241" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> in the upper 2000 m along 10° N and 180° E were used as examples to illustrate the vertical profile distributions of nutrients within the North Pacific. At 10° N, NO<inline-formula><mml:math id="M242" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations increase from <inline-formula><mml:math id="M243" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">0.0</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M244" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> at the surface to <inline-formula><mml:math id="M246" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">45.0</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M247" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> at <inline-formula><mml:math id="M249" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">1000</mml:mn></mml:mrow></mml:math></inline-formula> m, followed by a decrease to <inline-formula><mml:math id="M250" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">38.0</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M251" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>mol kg<sup>−1</sup> at 2000 m. 
NO<inline-formula><mml:math id="M253" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations increase from west to east in the North Pacific in the upper 300 m (Fig. 14). At 180° E, in the upper 500 m, NO<inline-formula><mml:math id="M254" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations increase northward from the Equator to the North Equatorial Current (<inline-formula><mml:math id="M255" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula>° N), decline within the subtropical gyre, and then increase toward the subarctic region (Fig. 15). Overall, seasonal differences in NO<inline-formula><mml:math id="M256" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations along both sections are not evident.</p>

      <fig id="F15" specific-use="star"><label>Figure 15</label><caption><p id="d2e4490">The monthly climatology of NO<inline-formula><mml:math id="M257" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mi>x</mml:mi><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> in the upper 2000 m along the 170° E section in the North Pacific. Data were binned and averaged within <inline-formula><mml:math id="M258" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="italic">°</mml:mi></mml:mrow></mml:math></inline-formula> grid cells.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f15.png"/>

      </fig>

</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Long-term variations of nutrients</title>
      <p id="d2e4535">We present an initial analysis of long-term nutrient changes in five representative regions of the North Pacific, covering the subarctic gyre, the subtropical gyre, and equatorial areas (Fig. 16). For regions 1–5, the data are binned by region, month, and depth (10, 100, 200, 300, 500, and 1000 m). As shown in Fig. 17, the resulting time series reveal notable interannual fluctuations in NO<inline-formula><mml:math id="M259" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> (with 2–5-year oscillations), providing a first-order view of the low-frequency variability captured by the reconstruction. However, no evident long-term trend is found for nutrients. DIP and Si(OH)<sub>4</sub> display patterns similar to NO<inline-formula><mml:math id="M261" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> (Figs. S48 and S49). In contrast, at depths of 200 and 300 m, NO<inline-formula><mml:math id="M262" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> displays an increasing trend in the central NPSG and a decreasing trend in the eastern NPSG during the 1970–2005 period (Fig. S50). More sophisticated trend analyses and basin-scale integrations are promising avenues for future work based on this newly reconstructed dataset.</p>

      <fig id="F16" specific-use="star"><label>Figure 16</label><caption><p id="d2e4585">Locations of five representative regions for analyzing long-term nutrient variations.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f16.png"/>

      </fig>

      <fig id="F17" specific-use="star"><label>Figure 17</label><caption><p id="d2e4596">Time series of reconstructed NO<inline-formula><mml:math id="M263" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> concentrations at 10 m <bold>(a)</bold>, 100 m <bold>(b)</bold>, 200 m <bold>(c)</bold>, 300 m <bold>(d)</bold>, 500 m <bold>(e)</bold>, and 1000 m <bold>(f)</bold> for regions 1–5 (see Fig. 16). Data were binned by depth and region and then averaged by month.</p></caption>
        <graphic xlink:href="https://essd.copernicus.org/articles/18/2951/2026/essd-18-2951-2026-f17.png"/>

      </fig>

</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Data availability</title>
      <p id="d2e4646">The database is freely available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.17451417" ext-link-type="DOI">10.5281/zenodo.17451417</ext-link> (Du et al., 2025). The files, containing the RF-, LightGBM-, and GPR-reconstructed data, are stored as text (.txt) files within a zip archive.</p>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions</title>
      <p id="d2e4660">In this study, we applied rigorous quality control procedures to clean hydrographic and nutrient observations from the CCHDO and WOD datasets. The cleaned CCHDO data were then used to train three machine-learning models that relate nutrient concentrations to spatial, temporal, and water-mass predictors. The models were applied to reconstruct nutrient concentrations from WOD hydrographic observations, most of which lack direct nutrient measurements. We assessed model performance using four data-partition strategies and found that all models reproduced held-out data with low RMSEs, with RF and GPR slightly outperforming LightGBM. Applying these models to WOD hydrography yielded 472 652 680 reconstructed nutrient concentrations across 1 920 634 stations and 35 744 cruises, spanning 1895 to 2024. This represents a 2127- to 2393-fold increase compared to the original volume of CCHDO nutrient data. The reconstruction captured the spatial, seasonal, and interannual variations of water-column nutrients in the North Pacific Ocean well. The reconstruction-based nutrient climatology exhibited more realistic spatial structures than the WOA23 climatology. This high-quality and high-resolution nutrient dataset provides historical nutrient estimates for locations and times with only hydrographic measurements. Additional potential applications of this dataset include (1) investigating nutrient transport and budgets in the North Pacific; (2) spinning up and validating ocean biogeochemical models; (3) assessing long-term nutrient trends driven by anthropogenic forcing and climate change; and (4) investigating nutrient stoichiometric changes and their ecological impacts under climate variability. 
Collectively, this resource facilitates advanced studies on marine biogeochemical cycles, ecosystem dynamics, and climate–nutrient interactions.</p><supplementary-material position="anchor"><p id="d2e4662">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/essd-18-2951-2026-supplement" xlink:title="pdf">https://doi.org/10.5194/essd-18-2951-2026-supplement</inline-supplementary-material>.</p></supplementary-material>
</sec><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e4672">CD and XL designed the study and dataset. CD, SK, MD, ZC, DS, and XL conceived the project and secured the funding. CD, NZ, QL, HW, and XL collected and processed the data, developed the code, and performed the analysis. SK, MD, ZC, and DS provided methodological guidance and advice. CD and NZ wrote the original draft. All authors reviewed and edited the manuscript.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e4678">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e4684">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e4690">We thank the CCHDO (<uri>https://cchdo.ucsd.edu/</uri>, last access: 1 October 2024) and the WOD (<uri>https://www.ncei.noaa.gov/products/world-ocean-database</uri>, last access: 18 December 2024) for providing the data used in this study. Special thanks are owed to all scientists involved in data collection, analysis, and management for these programs.</p><p id="d2e4698">Declaration of generative AI and AI-assisted technologies in the writing process: During the preparation of this work, the authors used DeepSeek to check spelling and grammar. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e4703">This research has been supported by the National Key R&amp;D Program of China (grant no. 2023YFF0805001), the National Natural Science Foundation of China (grant nos. 42494885, 42576215, 42494881, and 42276034), the Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China (grant no. JYB2025XDXM801), the Innovational Fund for Scientific and Technological Personnel of Hainan Province (grant no. KJRC2023B04), the Natural Science Foundation of Hainan Province (grant no. 624MS037), and the First-class Discipline Breakthrough Initiative of Hainan University (grant no. XKTP2025A05).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e4709">This paper was edited by Xingchen (Tony) Wang and reviewed by Hengdi Liang and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><mixed-citation>Arteaga, L., Pahlow, M., and Oschlies, A.: Global monthly sea surface nitrate fields estimated from remotely sensed sea surface temperature, chlorophyll, and modeled mixed layer depth, Geophys. Res. Lett., 42, 1130–1138, <ext-link xlink:href="https://doi.org/10.1002/2014GL062937" ext-link-type="DOI">10.1002/2014GL062937</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><mixed-citation>Ascani, F., Richards, K. J., Firing, E., Grant, S., Johnson, K. S., Jia, Y., Lukas, R., and Karl, D. M.: Physical and biological controls of nitrate concentrations in the upper subtropical North Pacific Ocean, Deep-Sea Res. Pt. II, 93, 119–134, <ext-link xlink:href="https://doi.org/10.1016/j.dsr2.2013.01.034" ext-link-type="DOI">10.1016/j.dsr2.2013.01.034</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><mixed-citation>Barone, B., Church, M. J., Dugenne, M., Hawco, N. J., Jahn, O., White, A. E., John, S. G., Follows, M. J., DeLong, E. F., and Karl, D. M.: Biogeochemical dynamics in adjacent mesoscale eddies of opposite polarity, Global Biogeochem. Cy., 36, e2021GB007115, <ext-link xlink:href="https://doi.org/10.1029/2021GB007115" ext-link-type="DOI">10.1029/2021GB007115</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><mixed-citation>Benitez-Nelson, C. R., Bidigare, R. R., Dickey, T. D., Landry, M. R., Leonard, C. L., Brown, S. L., Nencioli, F., Rii, Y. M., Maiti, K., Becker, J. W., Bibby, T. S., Black, W., Cai, W. J., Carlson, C. A., Chen, F., Kuwahara, V. S., Mahaffey, C., McAndrew, P. M., Quay, P. D., Rappé, M. S., Selph, K. E., Simmons, M. P., and Yang, E. J.: Mesoscale Eddies Drive Increased Silica Export in the Subtropical Pacific Ocean, Science, 316, 1017–1021, <ext-link xlink:href="https://doi.org/10.1126/science.1136221" ext-link-type="DOI">10.1126/science.1136221</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><mixed-citation>Bidigare, R. R., Chai, F., Landry, M. R., Lukas, R., Hannides, C. C. S., Christensen, S. J., Karl, D. M., Shi, L., and Chao, Y.: Subtropical ocean ecosystem structure changes forced by North Pacific climate variations, J. Plankton Res., 31, 1131–1139, <ext-link xlink:href="https://doi.org/10.1093/plankt/fbp064" ext-link-type="DOI">10.1093/plankt/fbp064</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><mixed-citation>Bonnet, S., Caffin, M., Berthelot, H., and Moutin, T.: Hot spot of N<sub>2</sub> fixation in the western tropical South Pacific pleads for a spatial decoupling between N<sub>2</sub> fixation and denitrification, P. Natl. Acad. Sci. USA, 114, E2800–E2801, <ext-link xlink:href="https://doi.org/10.1073/pnas.1619514114" ext-link-type="DOI">10.1073/pnas.1619514114</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><mixed-citation>Browning, T. J. and Moore, C. M.: Global analysis of ocean phytoplankton nutrient limitation reveals high prevalence of co-limitation, Nat. Commun., 14, 5014, <ext-link xlink:href="https://doi.org/10.1038/s41467-023-40774-0" ext-link-type="DOI">10.1038/s41467-023-40774-0</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><mixed-citation>Browning, T. J., Liu, X., Zhang, R., Wen, Z., Liu, J., Zhou, Y., Xu, F., Cai, Y., Zhou, K., Cao, Z., Zhu, Y., Shi, D., Achterberg, E. P., and Dai, M.: Nutrient co-limitation in the subtropical Northwest Pacific, Limnol. Oceanogr. Lett., 7, 52–61, <ext-link xlink:href="https://doi.org/10.1002/lol2.10205" ext-link-type="DOI">10.1002/lol2.10205</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><mixed-citation>Chelton, D. B., Schlax, M. G., Samelson, R. M., and de Szoeke, R. A.: Global observations of large oceanic eddies, Geophys. Res. Lett., 34, L15606, <ext-link xlink:href="https://doi.org/10.1029/2007GL030812" ext-link-type="DOI">10.1029/2007GL030812</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><mixed-citation>Chen, S., Hu, C., Barnes, B. B., Wanninkhof, R., Cai, W., Barbero, L., and Pierrot, D.: A machine learning approach to estimate surface ocean <inline-formula><mml:math id="M266" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<sub>2</sub> from satellite measurements, Remote Sens. Environ., 228, 203–226, <ext-link xlink:href="https://doi.org/10.1016/j.rse.2019.04.019" ext-link-type="DOI">10.1016/j.rse.2019.04.019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><mixed-citation>Chen, S., Meng, Y., Lin, S., Yu, Y., and Xi, J.: Estimation of sea surface nitrate from space: Current status and future potential, Sci. Total Environ., 899, 165690, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2023.165690" ext-link-type="DOI">10.1016/j.scitotenv.2023.165690</ext-link>, 2023. </mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><mixed-citation>Chen, S., Meng, Y., Shang, S., Zheng, M., Wang, Y., and Chai, F.: Remote estimates of sea surface nitrate and its trends from ocean color in the northwest Pacific, J. Geophys. Res., 129, e2023JC019846, <ext-link xlink:href="https://doi.org/10.1029/2023JC019846" ext-link-type="DOI">10.1029/2023JC019846</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><mixed-citation>Dai, M., Luo, Y., Achterberg, E. P., Browning, T. J., Cai, Y., Cao, Z., Chai, F., Chen, B., Church, M. J., Ci, D., Du, C., Gao, K., Guo, X., Hu, Z., Kao, S., Laws, E. A., Lee, Z., Lin, H., Liu, Q., Liu, X., Luo, W., Meng, F., Shang, S., Shi, D., Saito, H., Song, L., Wan, X. S., Wang, Y., Wang, W.-L., Wen, Z., Xiu, P., Zhang, J., Zhang, R., and Zhou, K.: Upper Ocean biogeochemistry of the oligotrophic North Pacific subtropical gyre: From nutrient sources to carbon export, Rev. Geophys., 61, e2022RG000800, <ext-link xlink:href="https://doi.org/10.1029/2022RG000800" ext-link-type="DOI">10.1029/2022RG000800</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><mixed-citation>Dave, A. C. and Lozier, M. S.: Local stratification control of marine productivity in the subtropical North Pacific, J. Geophys. Res., 115, C12032, <ext-link xlink:href="https://doi.org/10.1029/2010JC006519" ext-link-type="DOI">10.1029/2010JC006519</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><mixed-citation>Deutsch, C. and Weber, T.: Nutrient Ratios as a Tracer and Driver of Ocean Biogeochemistry, Annu. Rev. Mar. Sci., 4, 113–138, <ext-link xlink:href="https://doi.org/10.1146/annurev-marine-120710-100912" ext-link-type="DOI">10.1146/annurev-marine-120710-100912</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><mixed-citation>Dong, L., Qi, J., Yin, B., Zhi, H., Li, D., Yang, S., Wang, W., Cai, H., and Xie, B.: Reconstruction of subsurface salinity structure in the South China Sea using satellite observations: a LightGBM-Based Deep forest method, Remote Sens., 14, 3494, <ext-link xlink:href="https://doi.org/10.3390/rs14143494" ext-link-type="DOI">10.3390/rs14143494</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><mixed-citation>Du, C., He, R., Liu, Z., Huang, T., Wang, L., Yuan, Z., Xu, Y., Wang, Z., and Dai, M.: Climatology of nutrient distributions in the South China Sea based on a large data set derived from a new algorithm, Prog. Oceanogr., 195, 102586, <ext-link xlink:href="https://doi.org/10.1016/j.pocean.2021.102586" ext-link-type="DOI">10.1016/j.pocean.2021.102586</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><mixed-citation>Du, C., Zheng, N., Kao, S.-J., Dai, M., Cao, Z., Shi, D., Li, Q., Wang, H., and Li, X.: Validated temperature and salinity data, and reconstructed nutrient concentrations in the North Pacific (1895–2024) (Version 2), Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.17451417" ext-link-type="DOI">10.5281/zenodo.17451417</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><mixed-citation>Dugdale, R. C., Morel, A., Bricaud, A., and Wilkerson, F. P.: Modeling new production in upwelling centers: A case study of modeling new production from remotely-sensed temperature and color, J. Geophys. Res., 94, 18119–18132, <ext-link xlink:href="https://doi.org/10.1029/JC094iC12p18119" ext-link-type="DOI">10.1029/JC094iC12p18119</ext-link>, 1989.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><mixed-citation>Eugster, O. and Gruber, N.: A probabilistic estimate of global marine N-fixation and denitrification, Global Biogeochem. Cy., 26, GB4013, <ext-link xlink:href="https://doi.org/10.1029/2012GB004300" ext-link-type="DOI">10.1029/2012GB004300</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><mixed-citation>Fuhr, M., Laukert, G., Yu, Y., Nürnberg, D., and Frank, M.: Tracing water mass mixing from the Equatorial to the North Pacific Ocean with dissolved neodymium isotopes and concentrations, Front. Mar. Sci., 7, 603761, <ext-link xlink:href="https://doi.org/10.3389/fmars.2020.603761" ext-link-type="DOI">10.3389/fmars.2020.603761</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><mixed-citation>Gan, M., Pan, S., Chen, Y., Cheng, C., Pan, H., and Zhu, X.: Application of the Machine Learning LightGBM model to the prediction of the water levels of the Lower Columbia River, J. Mar. Sci. Eng., 9, 496, <ext-link xlink:href="https://doi.org/10.3390/jmse9050496" ext-link-type="DOI">10.3390/jmse9050496</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><mixed-citation>Garcia, H. E., Boyer, T. P., Locarnini, R. A., Reagan, J. R., Mishonov, A. V., Baranova, O. K., Paver, C. R., Wang, Z., Bouchard, C. N., Cross, S. L., Seidov, D., and Dukhovskoy, D.: World Ocean Database 2023: User's Manual, edited by: Mishonov, A. V., NOAA Atlas NESDIS 98, NOAA, 129 pp., <ext-link xlink:href="https://doi.org/10.25923/j8gq-ee82" ext-link-type="DOI">10.25923/j8gq-ee82</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><mixed-citation>Goes, J. I., Saino, T., Oaku, H., and Jiang, D. L.: A Method for Estimating Sea Surface Nitrate Concentrations from Remotely Sensed SST and Chlorophyll – A Case Study for the North Pacific Ocean Using OCTS/ADEOS Data, IEEE T. Geosci. Remote, 37, 1633–1644, <ext-link xlink:href="https://doi.org/10.1109/36.774702" ext-link-type="DOI">10.1109/36.774702</ext-link>, 1999.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><mixed-citation>Hu, C., Feng, L., and Guan, Q.: A machine learning approach to estimate surface chlorophyll <inline-formula><mml:math id="M268" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> concentrations in global oceans from satellite measurements, IEEE T. Geosci. Remote, 59, 4590–4607, <ext-link xlink:href="https://doi.org/10.1109/TGRS.2020.3016473" ext-link-type="DOI">10.1109/TGRS.2020.3016473</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><mixed-citation>Huang, Y., Nicholson, D., Huang, B., and Cassar, N.: Global estimates of marine gross primary production based on machine learning upscaling of field observations, Global Biogeochem. Cy., 35, e2020GB006718, <ext-link xlink:href="https://doi.org/10.1029/2020GB006718" ext-link-type="DOI">10.1029/2020GB006718</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><mixed-citation>Huang, Y., Tagliabue, A., and Cassar, N.: Data-Driven Modeling of Dissolved Iron in the Global Ocean, Front. Mar. Sci., 9, 837183, <ext-link xlink:href="https://doi.org/10.3389/fmars.2022.837183" ext-link-type="DOI">10.3389/fmars.2022.837183</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><mixed-citation>Kamykowski, D.: A preliminary model of the relationship between temperature and plant nutrients in the upper ocean, Deep-Sea Res., 34, 1067–1079, <ext-link xlink:href="https://doi.org/10.1016/0198-0149(87)90064-1" ext-link-type="DOI">10.1016/0198-0149(87)90064-1</ext-link>, 1987.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><mixed-citation>Kamykowski, D.: Estimating upper ocean phosphate concentrations using ARGO float temperature profiles, Deep-Sea Res. Pt. I, 55, 1580–1589, <ext-link xlink:href="https://doi.org/10.1016/j.dsr.2008.05.005" ext-link-type="DOI">10.1016/j.dsr.2008.05.005</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><mixed-citation>Kamykowski, D., Zentara, S.-J., Morrison, J. M., and Switzer, A. C.: Dynamic global patterns of nitrate, phosphate, silicate, and iron availability and phytoplankton community composition from remote sensing data, Global Biogeochem. Cy., 16, 1077, <ext-link xlink:href="https://doi.org/10.1029/2001GB001640" ext-link-type="DOI">10.1029/2001GB001640</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><mixed-citation>Karl, D. M. and Church, M. J.: Ecosystem structure and dynamics in the North Pacific Subtropical Gyre: new views of an old ocean, Ecosystems, 20, 433–457, <ext-link xlink:href="https://doi.org/10.1007/s10021-017-0117-0" ext-link-type="DOI">10.1007/s10021-017-0117-0</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><mixed-citation>Karl, D. M., Letelier, R. M., Bidigare, R. R., Björkman, K. M., Church, M. J., Dore, J. E., and White, A. E.: Seasonal-to-decadal scale variability in primary production and particulate matter export at Station ALOHA, Prog. Oceanogr., 195, 102563, <ext-link xlink:href="https://doi.org/10.1016/j.pocean.2021.102563" ext-link-type="DOI">10.1016/j.pocean.2021.102563</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><mixed-citation>Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y.: LightGBM: A highly efficient gradient boosting decision tree, Adv. Neural Inf. Process. Syst., 30, 3147–3155, <ext-link xlink:href="https://doi.org/10.48550/arXiv.1706.08357" ext-link-type="DOI">10.48550/arXiv.1706.08357</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib34"><label>34</label><mixed-citation>Lee, G. S., Lee, J. H., and Cho, H. Y.: Spatiotemporal estimation of nutrient data from the Northwest Pacific and East Asian seas, Sci. Data, 10, 354, <ext-link xlink:href="https://doi.org/10.1038/s41597-023-02602-4" ext-link-type="DOI">10.1038/s41597-023-02602-4</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><mixed-citation>Liaw, A. and Wiener, M.: Classification and regression by randomForest, R News, 2, 18–22, 2002.</mixed-citation></ref>
      <ref id="bib1.bib36"><label>36</label><mixed-citation>Lipschultz, F., Bates, N. R., Carlson, C. A., and Hansell, D. A.: New production in the Sargasso Sea: History and current status, Global Biogeochem. Cy., 16, 1001, <ext-link xlink:href="https://doi.org/10.1029/2000GB001320" ext-link-type="DOI">10.1029/2000GB001320</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bib37"><label>37</label><mixed-citation>Liu, H., Lin, L., Wang, Y., Du, L., Wang, S., Zhou, P., Yu, Y., Gong, X., and Lu, X.: Reconstruction of Monthly Surface Nutrient Concentrations in the Yellow and Bohai Seas from 2003–2019 Using Machine Learning, Remote Sens., 14, 5021, <ext-link xlink:href="https://doi.org/10.3390/rs14195021" ext-link-type="DOI">10.3390/rs14195021</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib38"><label>38</label><mixed-citation>Madani, N., Parazoo, N. C., Manizza, M., Chatterjee, A., Carroll, D., Menemenlis, D., Fouest, V. L., Matsuoka, A., Luis, K. M., Serra-Pompei, C., and Miller, C. E.: A machine learning approach to produce a continuous Solar-Induced chlorophyll fluorescence over the Arctic Ocean, J. Geophys. Res.-Mach. Learn. Comput., 1, e2024JH000310, <ext-link xlink:href="https://doi.org/10.1029/2024JH000215" ext-link-type="DOI">10.1029/2024JH000215</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib39"><label>39</label><mixed-citation>Martino, M., Hamilton, D. S., Baker, A. R., Jickells, T., Bromley, T., Nojiri, Y., Quack, B., and Boyd, P. W.: Western Pacific atmospheric nutrient deposition fluxes, their impact on surface ocean productivity, Global Biogeochem. Cy., 28, 712–728, <ext-link xlink:href="https://doi.org/10.1002/2013GB004794" ext-link-type="DOI">10.1002/2013GB004794</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib40"><label>40</label><mixed-citation>Mishonov, A. V., Boyer, T. P., Baranova, O. K., Bouchard, C. N., Cross, S. L., Garcia, H. E., Locarnini, R. A., Paver, C. R., Reagan, J. R., Wang, Z., Seidov, D., Grodsky, A. I., and Beauchamp, J. G.: World Ocean Database 2023, edited by: Bouchard, C., NOAA Atlas NESDIS 97, NOAA, <ext-link xlink:href="https://doi.org/10.25923/z885-h264" ext-link-type="DOI">10.25923/z885-h264</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib41"><label>41</label><mixed-citation>Moore, C. M., Mills, M. M., Arrigo, K. R., Berman-Frank, I., Bopp, L., Boyd, P. W., Galbraith, E. D., Geider, R. J., Guieu, C., Jaccard, S. L., Jickells, T. D., Lenton, T. M., Mahowald, N. M., Marañón, E., Marinov, I., Moore, J. K., Nakatsuka, T., Oschlies, A., Saito, M. A., Thingstad, T., Tsuda, A., and Ulloa, O.: Processes and patterns of oceanic nutrient limitation, Nat. Geosci., 6, 701–710, <ext-link xlink:href="https://doi.org/10.1038/ngeo1765" ext-link-type="DOI">10.1038/ngeo1765</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib42"><label>42</label><mixed-citation>Możejko, J. and Gniot, R.: Application of Neural Networks for the Prediction of Total Phosphorus Concentrations in Surface Waters, Pol. J. Environ. Stud., 17, 363–368, 2008.</mixed-citation></ref>
      <ref id="bib1.bib43"><label>43</label><mixed-citation>Palacios, D. M., Hazen, E. L., Schroeder, I. D., and Bograd, S. J.: Modeling the temperature-nitrate relationship in the coastal upwelling domain of the California Current, J. Geophys. Res.-Oceans, 118, 1–17, <ext-link xlink:href="https://doi.org/10.1002/jgrc.20216" ext-link-type="DOI">10.1002/jgrc.20216</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib44"><label>44</label><mixed-citation>Qi, J., Yu, Y., Yao, X., Yuan, G., and Gao, H.: Dry deposition fluxes of inorganic nitrogen and phosphorus in atmospheric aerosols over the Marginal Seas and Northwest Pacific, Atmos. Res., 245, 105076, <ext-link xlink:href="https://doi.org/10.1016/j.atmosres.2020.105076" ext-link-type="DOI">10.1016/j.atmosres.2020.105076</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib45"><label>45</label><mixed-citation>Reagan, J. R., Boyer, T. P., García, H. E., Locarnini, R. A., Baranova, O. K., Bouchard, C., Cross, S. L., Mishonov, A. V., Paver, C. R., Seidov, D., Wang, Z., and Dukhovskoy, D.: World Ocean Atlas 2023, NOAA National Centers for Environmental Information, Dataset, NCEI Accession 0270533, NCEI, <ext-link xlink:href="https://doi.org/10.25921/va26-hv25" ext-link-type="DOI">10.25921/va26-hv25</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib46"><label>46</label><mixed-citation>Sarangi, P. K., Thangaradjou, T., Saravanakumar, A., and Balasubramanian, T.: Development of nitrate algorithm for the southwest Bay of Bengal water and its implication using remote sensing satellite datasets, IEEE J. Select. Top. Appl. Earth Obs. Remote Sens., 4, 983–991, <ext-link xlink:href="https://doi.org/10.1109/JSTARS.2011.2165204" ext-link-type="DOI">10.1109/JSTARS.2011.2165204</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib47"><label>47</label><mixed-citation>Sigman, D. M. and Hain, M. P.: The Biological Productivity of the Ocean, Nat. Educ. Knowl., 3, 1–16, 2012.</mixed-citation></ref>
      <ref id="bib1.bib48"><label>48</label><mixed-citation>Steinhoff, T., Friedrich, T., Hartman, S. E., Oschlies, A., Wallace, D. W. R., and Körtzinger, A.: Estimating mixed layer nitrate in the North Atlantic Ocean, Biogeosciences, 7, 795–807, <ext-link xlink:href="https://doi.org/10.5194/bg-7-795-2010" ext-link-type="DOI">10.5194/bg-7-795-2010</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib49"><label>49</label><mixed-citation>Su, H., Lu, X., Chen, Z., Zhang, H., Lu, W., and Wu, W.: Estimating Coastal Chlorophyll-A Concentration from Time-Series OLCI Data Based on Machine Learning, Remote Sens., 13, 576, <ext-link xlink:href="https://doi.org/10.3390/rs13040576" ext-link-type="DOI">10.3390/rs13040576</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib50"><label>50</label><mixed-citation>Sundararaman, H. K. K. and Shanmugam, P.: Estimates of the global ocean surface dissolved oxygen and macronutrients from satellite data, Remote Sens. Environ., 311, 114243, <ext-link xlink:href="https://doi.org/10.1016/j.rse.2024.114243" ext-link-type="DOI">10.1016/j.rse.2024.114243</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib51"><label>51</label><mixed-citation>Switzer, A. C., Kamykowski, D., and Zentara, S.-J.: Mapping nitrate in the global ocean using remotely sensed sea surface temperature, J. Geophys. Res., 108, 345–359, <ext-link xlink:href="https://doi.org/10.1029/2001JC000833" ext-link-type="DOI">10.1029/2001JC000833</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bib52"><label>52</label><mixed-citation>Talley, L. D., Pickard, G. L., Emery, W. J., and Swift, J. H.: Descriptive Physical Oceanography: An Introduction, 6th Edn., Academic Press, 350–362, <ext-link xlink:href="https://doi.org/10.1016/B978-0-7506-4552-2.10010-1" ext-link-type="DOI">10.1016/B978-0-7506-4552-2.10010-1</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib53"><label>53</label><mixed-citation>Wang, C., Su, B., Sun, J., Hu, X., and Liu, J.: A regional ocean database for the Coastal China Sea, Sci. Data, 12, 1550, <ext-link xlink:href="https://doi.org/10.1038/s41597-025-05840-w" ext-link-type="DOI">10.1038/s41597-025-05840-w</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib54"><label>54</label><mixed-citation>Wang, L., Xu, Z., Gong, X., Zhang, P., Hao, Z., You, J., Zhao, X., and Guo, X.: Estimation of nitrate concentration and its distribution in the northwestern Pacific Ocean by a deep neural network model, Deep-Sea Res. Pt. I, 195, 104005, <ext-link xlink:href="https://doi.org/10.1016/j.dsr.2023.104005" ext-link-type="DOI">10.1016/j.dsr.2023.104005</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib55"><label>55</label><mixed-citation>Wang, W.-L., Moore, J. K., Martiny, A. C., and Primeau, F. W.: Convergent estimates of marine nitrogen fixation, Nature, 566, 205–211, <ext-link xlink:href="https://doi.org/10.1038/s41586-019-0911-2" ext-link-type="DOI">10.1038/s41586-019-0911-2</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib56"><label>56</label><mixed-citation>Wang, Z., Wang, G., Guo, X., Hu, J., and Dai, M.: Reconstruction of High-Resolution Sea Surface Salinity over 2003–2020 in the South China Sea Using the Machine Learning Algorithm LightGBM Model, Remote Sens., 14, 6147, <ext-link xlink:href="https://doi.org/10.3390/rs14236147" ext-link-type="DOI">10.3390/rs14236147</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib57"><label>57</label><mixed-citation>Yang, G. G., Wang, Q., Feng, J., He, L., Li, R., Lu, W., Liao, E., and Lai, Z.: Can three-dimensional nitrate structure be reconstructed from surface information with artificial intelligence? – A proof-of-concept study, Sci. Total Environ., 924, 171365, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2024.171365" ext-link-type="DOI">10.1016/j.scitotenv.2024.171365</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib58"><label>58</label><mixed-citation>Yasunaka, S., Nojiri, Y., Nakaoka, S., Ono, T., Whitney, F. A., and Telszewski, M.: Mapping of sea surface nutrients in the North Pacific: Basin-wide distribution and seasonal to interannual variability, J. Geophys. Res.-Oceans, 119, 7756–7771, <ext-link xlink:href="https://doi.org/10.1002/2014JC010318" ext-link-type="DOI">10.1002/2014JC010318</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib59"><label>59</label><mixed-citation>Yasunaka, S., Ono, T., Nojiri, Y., Whitney, F. A., Wada, C., Murata, A., Nakaoka, S., and Hosoda, S.: Long-term variability of surface nutrient concentrations in the North Pacific, Geophys. Res. Lett., 43, 3389–3397, <ext-link xlink:href="https://doi.org/10.1002/2016GL068097" ext-link-type="DOI">10.1002/2016GL068097</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib60"><label>60</label><mixed-citation>Yasunaka, S., Mitsudera, H., Whitney, F., and Nakaoka, S.: Nutrient and dissolved inorganic carbon variability in the North Pacific, J. Oceanogr., 77, 3–16, <ext-link xlink:href="https://doi.org/10.1007/s10872-020-00561-7" ext-link-type="DOI">10.1007/s10872-020-00561-7</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib61"><label>61</label><mixed-citation>Yu, X. R., Wen, Z., Jiang, R., Yang, J.-Y. T., Cao, Z., Hong, H., Zhou, Y., and Shi, D.: Assessing N<sub>2</sub> fixation flux and its controlling factors in the (sub)tropical western North Pacific through high-resolution observations, Limnol. Oceanogr. Lett., 9, 716–724, <ext-link xlink:href="https://doi.org/10.1002/lol2.10390" ext-link-type="DOI">10.1002/lol2.10390</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib62"><label>62</label><mixed-citation>Zhong, A., Wang, D., Gong, F., Zhu, W., Fu, D., Zheng, Z., Huang, J., He, X., and Bai, Y.: Remote sensing estimates of global sea surface nitrate: Methodology and validation, Sci. Total Environ., 950, 175362, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2024.175362" ext-link-type="DOI">10.1016/j.scitotenv.2024.175362</ext-link>, 2024.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>A historical nutrient dataset (1895–2024) for the  North Pacific: reconstructed from machine  learning and hydrographic observations</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>1</label><mixed-citation>
      
Arteaga, L., Pahlow, M., and Oschlies, A.: Global monthly sea surface nitrate fields estimated from remotely sensed sea surface temperature, chlorophyll, and modeled mixed layer depth, Geophys. Res. Lett., 42, 1130–1138, <a href="https://doi.org/10.1002/2014GL062937" target="_blank">https://doi.org/10.1002/2014GL062937</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>2</label><mixed-citation>
      
Ascani, F., Richards, K. J., Firing, E., Grant, S., Johnson, K. S., Jia, Y.,
Lukas, R., and Karl, D. M.: Physical and biological controls of nitrate concentrations in the upper subtropical North Pacific Ocean, Deep-Sea Res.
Pt. II, 93, 119–134, <a href="https://doi.org/10.1016/j.dsr2.2013.01.034" target="_blank">https://doi.org/10.1016/j.dsr2.2013.01.034</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>3</label><mixed-citation>
      
Barone, B., Church, M. J., Dugenne, M., Hawco, N. J., Jahn, O., White, A.
E., John, S. G., Follows, M. J., DeLong, E. F., and Karl, D. M.: Biogeochemical dynamics in adjacent mesoscale eddies of opposite polarity,
Global Biogeochem. Cy., 36, e2021GB007115, <a href="https://doi.org/10.1029/2021GB007115" target="_blank">https://doi.org/10.1029/2021GB007115</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>4</label><mixed-citation>
      
Benitez-Nelson, C. R., Bidigare, R. R., Dickey, T. D., Landry, M. R., Leonard, C. L., Brown, S. L., Nencioli, F., Rii, Y. M., Maiti, K., Becker,
J. W., Bibby, T. S., Black, W., Cai, W. J., Carlson, C. A., Chen, F., Kuwahara, V. S., Mahaffey, C., McAndrew, P. M., Quay, P. D., Rappé, M.
S., Selph, K. E., Simmons, M. P., and Yang, E. J.: Mesoscale Eddies Drive
Increased Silica Export in the Subtropical Pacific Ocean, Science, 316,
1017–1021, <a href="https://doi.org/10.1126/science.1136221" target="_blank">https://doi.org/10.1126/science.1136221</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>5</label><mixed-citation>
      
Bidigare, R. R., Chai, F., Landry, M. R., Lukas, R., Hannides, C. C. S.,
Christensen, S. J., Karl, D. M., Shi, L., and Chao, Y.: Subtropical ocean
ecosystem structure changes forced by North Pacific climate variations, J.
Plankton Res., 31, 1131–1139, <a href="https://doi.org/10.1093/plankt/fbp064" target="_blank">https://doi.org/10.1093/plankt/fbp064</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>6</label><mixed-citation>
      
Bonnet, S., Caffin, M., Berthelot, H., and Moutin, T.: Hot spot of N<sub>2</sub> fixation in the western tropical South Pacific pleads for a spatial
decoupling between N<sub>2</sub> fixation and denitrification, P. Natl. Acad. Sci. USA, 114, E2800–E2801, <a href="https://doi.org/10.1073/pnas.1619514114" target="_blank">https://doi.org/10.1073/pnas.1619514114</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>7</label><mixed-citation>
      
Browning, T. J. and Moore, C. M.: Global analysis of ocean phytoplankton
nutrient limitation reveals high prevalence of co-limitation, Nat. Commun.,
14, 5014, <a href="https://doi.org/10.1038/s41467-023-40774-0" target="_blank">https://doi.org/10.1038/s41467-023-40774-0</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>8</label><mixed-citation>
      
Browning, T. J., Liu, X., Zhang, R., Wen, Z., Liu, J., Zhou, Y., Xu, F., Cai, Y., Zhou, K., Cao, Z., Zhu, Y., Shi, D., Achterberg, E. P., and Dai, M.: Nutrient co-limitation in the subtropical Northwest Pacific, Limnol. Oceanogr. Lett., 7, 52–61, <a href="https://doi.org/10.1002/lol2.10205" target="_blank">https://doi.org/10.1002/lol2.10205</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>9</label><mixed-citation>
      
Chelton, D. B., Schlax, M. G., Samelson, R. M., and de Szoeke, R. A.: Global
observations of large oceanic eddies, Geophys. Res. Lett., 34, L15606,
<a href="https://doi.org/10.1029/2007GL030812" target="_blank">https://doi.org/10.1029/2007GL030812</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>10</label><mixed-citation>
      
Chen, S., Hu, C., Barnes, B. B., Wanninkhof, R., Cai, W., Barbero, L., and
Pierrot, D.: A machine learning approach to estimate surface ocean <i>p</i>CO<sub>2</sub> from satellite measurements, Remote Sens. Environ., 228, 203–226, <a href="https://doi.org/10.1016/j.rse.2019.04.019" target="_blank">https://doi.org/10.1016/j.rse.2019.04.019</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>11</label><mixed-citation>
      
Chen, S., Meng, Y., Lin, S., Yu, Y., and Xi, J.: Estimation of sea surface
nitrate from space: Current status and future potential, Sci. Total Environ., 899, 165690, <a href="https://doi.org/10.1016/j.scitotenv.2023.165690" target="_blank">https://doi.org/10.1016/j.scitotenv.2023.165690</a>, 2023.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>12</label><mixed-citation>
      
Chen, S., Meng, Y., Shang, S., Zheng, M., Wang, Y., and Chai, F.: Remote
estimates of sea surface nitrate and its trends from ocean color in the northwest Pacific, J. Geophys. Res., 129, e2023JC019846,
<a href="https://doi.org/10.1029/2023JC019846" target="_blank">https://doi.org/10.1029/2023JC019846</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>13</label><mixed-citation>
      
Dai, M., Luo, Y., Achterberg, E. P., Browning, T. J., Cai, Y., Cao, Z., Chai, F., Chen, B., Church, M. J., Ci, D., Du, C., Gao, K., Guo, X., Hu, Z., Kao, S., Laws, E. A., Lee, Z., Lin, H., Liu, Q., Liu, X., Luo, W., Meng, F., Shang, S., Shi, D., Saito, H., Song, L., Wan, X. S., Wang, Y., Wang, W.-L., Wen, Z., Xiu, P., Zhang, J., Zhang, R., and Zhou, K.: Upper Ocean
biogeochemistry of the oligotrophic North Pacific subtropical gyre: From
nutrient sources to carbon export, Rev. Geophys., 61, e2022RG000800,
<a href="https://doi.org/10.1029/2022RG000800" target="_blank">https://doi.org/10.1029/2022RG000800</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>14</label><mixed-citation>
      
Dave, A. C. and Lozier, M. S.: Local stratification control of marine
productivity in the subtropical North Pacific, J. Geophys. Res., 115, C12032,
<a href="https://doi.org/10.1029/2010JC006519" target="_blank">https://doi.org/10.1029/2010JC006519</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>15</label><mixed-citation>
      
Deutsch, C. and Weber, T.: Nutrient Ratios as a Tracer and Driver of Ocean
Biogeochemistry, Annu. Rev. Mar. Sci., 4, 113–138,
<a href="https://doi.org/10.1146/annurev-marine-120710-100912" target="_blank">https://doi.org/10.1146/annurev-marine-120710-100912</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>16</label><mixed-citation>
      
Dong, L., Qi, J., Yin, B., Zhi, H., Li, D., Yang, S., Wang, W., Cai, H., and
Xie, B.: Reconstruction of subsurface salinity structure in the South China
Sea using satellite observations: a LightGBM-Based Deep forest method, Remote Sens., 14, 3494, <a href="https://doi.org/10.3390/rs14143494" target="_blank">https://doi.org/10.3390/rs14143494</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>17</label><mixed-citation>
      
Du, C., He, R., Liu, Z., Huang, T., Wang, L., Yuan, Z., Xu, Y., Wang, Z., and Dai, M.: Climatology of nutrient distributions in the South China Sea based on a large data set derived from a new algorithm, Prog. Oceanogr., 195, 102586, <a href="https://doi.org/10.1016/j.pocean.2021.102586" target="_blank">https://doi.org/10.1016/j.pocean.2021.102586</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>18</label><mixed-citation>
      
Du, C., Zheng, N., Kao, S.-J., Dai, M., Cao, Z., Shi, D., Li, Q., Wang, H.,
and Li, X.: Validated temperature and salinity data, and reconstructed
nutrient concentrations in the North Pacific (1895–2024) (Version 2), Zenodo [data set], <a href="https://doi.org/10.5281/zenodo.17451417" target="_blank">https://doi.org/10.5281/zenodo.17451417</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>19</label><mixed-citation>
      
Dugdale, R. C., Morel, A., Bricaud, A., and Wilkerson, F. P.: Modeling new
production in upwelling centers: A case study of modeling new production
from remotely-sensed temperature and color, J. Geophys. Res., 94, 18119–18132, <a href="https://doi.org/10.1029/JC094iC12p18119" target="_blank">https://doi.org/10.1029/JC094iC12p18119</a>, 1989.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>20</label><mixed-citation>
      
Eugster, O. and Gruber, N.: A probabilistic estimate of global marine N-fixation and denitrification, Global Biogeochem. Cy., 26, GB4013,
<a href="https://doi.org/10.1029/2012GB004300" target="_blank">https://doi.org/10.1029/2012GB004300</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>21</label><mixed-citation>
      
Fuhr, M., Laukert, G., Yu, Y., Nürnberg, D., and Frank, M.: Tracing water mass mixing from the Equatorial to the North Pacific Ocean with dissolved neodymium isotopes and concentrations, Front. Mar. Sci., 7, 603761, <a href="https://doi.org/10.3389/fmars.2020.603761" target="_blank">https://doi.org/10.3389/fmars.2020.603761</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>22</label><mixed-citation>
      
Gan, M., Pan, S., Chen, Y., Cheng, C., Pan, H., and Zhu, X.: Application of
the Machine Learning LightGBM model to the prediction of the water levels of
the Lower Columbia River, J. Mar. Sci. Eng., 9, 496, <a href="https://doi.org/10.3390/jmse9050496" target="_blank">https://doi.org/10.3390/jmse9050496</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>23</label><mixed-citation>
      
Garcia, H. E., Boyer, T. P., Locarnini, R. A., Reagan, J. R., Mishonov, A.
V., Baranova, O. K., Paver, C. R., Wang, Z., Bouchard, C. N., Cross, S. L.,
Seidov, D., and Dukhovskoy, D.: World Ocean Database 2023: User's Manual, edited by: Mishonov, A. V., NOAA Atlas NESDIS 98, NOAA, 129&thinsp;pp.,
<a href="https://doi.org/10.25923/j8gq-ee82" target="_blank">https://doi.org/10.25923/j8gq-ee82</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>24</label><mixed-citation>
      
Goes, J. I., Saino, T., Oaku, H., and Jiang, D. L.: A Method for Estimating
Sea Surface Nitrate Concentrations from Remotely Sensed SST and Chlorophyll
– A Case Study for the North Pacific Ocean Using OCTS/ADEOS Data, IEEE
T. Geosci. Remote, 37, 1633–1644, <a href="https://doi.org/10.1109/36.774702" target="_blank">https://doi.org/10.1109/36.774702</a>, 1999.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>25</label><mixed-citation>
      
Hu, C., Feng, L., and Guan, Q.: A machine learning approach to estimate surface chlorophyll <i>a</i> concentrations in global oceans from satellite
measurements, IEEE T. Geosci. Remote, 59, 4590–4607, <a href="https://doi.org/10.1109/TGRS.2020.3016473" target="_blank">https://doi.org/10.1109/TGRS.2020.3016473</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>26</label><mixed-citation>
      
Huang, Y., Nicholson, D., Huang, B., and Cassar, N.: Global estimates of
marine gross primary production based on machine learning upscaling of field
observations, Global Biogeochem. Cy., 35, e2020GB006718,
<a href="https://doi.org/10.3389/fmars.2022.837183" target="_blank">https://doi.org/10.3389/fmars.2022.837183</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>27</label><mixed-citation>
      
Huang, Y., Tagliabue, A., and Cassar, N.: Data-Driven Modeling of Dissolved
Iron in the Global Ocean, Front. Mar. Sci., 9, 837183, <a href="https://doi.org/10.3389/fmars.2022.837183" target="_blank">https://doi.org/10.3389/fmars.2022.837183</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>28</label><mixed-citation>
      
Kamykowski, D.: A preliminary model of the relationship between temperature
and plant nutrients in the upper ocean, Deep-Sea Res., 34, 1067–1079,
<a href="https://doi.org/10.1016/0198-0149(87)90064-1" target="_blank">https://doi.org/10.1016/0198-0149(87)90064-1</a>, 1987.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>29</label><mixed-citation>
      
Kamykowski, D.: Estimating upper ocean phosphate concentrations using ARGO
float temperature profiles, Deep-Sea Res. Pt. I, 55, 1580–1589,
<a href="https://doi.org/10.1016/j.dsr.2008.05.005" target="_blank">https://doi.org/10.1016/j.dsr.2008.05.005</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>30</label><mixed-citation>
      
Kamykowski, D., Zentara, S.-J., Morrison, J. M., and Switzer, A. C.: Dynamic
global patterns of nitrate, phosphate, silicate, and iron availability and
phytoplankton community composition from remote sensing data, Global Biogeochem. Cy., 16, 1077, <a href="https://doi.org/10.1029/2001GB001640" target="_blank">https://doi.org/10.1029/2001GB001640</a>, 2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>31</label><mixed-citation>
      
Karl, D. M. and Church, M. J.: Ecosystem structure and dynamics in the North
Pacific Subtropical Gyre: new views of an old ocean, Ecosystems, 20, 433–457, <a href="https://doi.org/10.1007/s10021-017-0117-0" target="_blank">https://doi.org/10.1007/s10021-017-0117-0</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>32</label><mixed-citation>
      
Karl, D. M., Letelier, R. M., Bidigare, R. R., Björkman, K. M., Church, M. J., Dore, J. E., and White, A. E.: Seasonal-to-decadal scale variability
in primary production and particulate matter export at Station ALOHA, Prog.
Oceanogr., 195, 102563, <a href="https://doi.org/10.1016/j.pocean.2021.102563" target="_blank">https://doi.org/10.1016/j.pocean.2021.102563</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>33</label><mixed-citation>
      
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y.: Lightgbm: A highly efficient gradient boosting decision tree, Adv.
Neural Inf. Process. Syst., 30, 3147–3155, <a href="https://doi.org/10.48550/arXiv.1706.08357" target="_blank">https://doi.org/10.48550/arXiv.1706.08357</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>34</label><mixed-citation>
      
Lee, G. S., Lee, J. H., and Cho, H. Y.: Spatiotemporal estimation of nutrient data from the northwest pacific and east Asian seas, Sci. Data, 10, 354, <a href="https://doi.org/10.1038/s41597-023-02602-4" target="_blank">https://doi.org/10.1038/s41597-023-02602-4</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>35</label><mixed-citation>
      
Liaw, A. and Wiener, M.: Classification and regression by randomForest, R News, 2, 18–22, 2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>36</label><mixed-citation>
      
Lipschultz, F., Bates, N. R., Carlson, C. A., and Hansell, D. A.: New
production in the Sargasso Sea: History and current status, Global Biogeochem. Cy., 16, 1001, <a href="https://doi.org/10.1029/2000GB001320" target="_blank">https://doi.org/10.1029/2000GB001320</a>, 2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>37</label><mixed-citation>
      
Liu, H., Lin, L., Wang, Y., Du, L., Wang, S., Zhou, P., Yu, Y., Gong, X., and Lu, X.: Reconstruction of Monthly Surface Nutrient Concentrations in the Yellow and Bohai Seas from 2003–2019 Using Machine Learning, Remote Sens.,
14, 5021, <a href="https://doi.org/10.3390/rs14195021" target="_blank">https://doi.org/10.3390/rs14195021</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>38</label><mixed-citation>
      
Madani, N., Parazoo, N. C., Manizza, M., Chatterjee, A., Carroll, D.,
Menemenlis, D., Fouest, V. L., Matsuoka, A., Luis, K. M., Serra-Pompei, C., and Miller, C. E.: A machine learning approach to produce a continuous
Solar-Induced chlorophyll fluorescence over the Arctic Ocean, J. Geophys.
Res.-Mach. Learn. Comput., 1, e2024JH000310, <a href="https://doi.org/10.1029/2024JH000215" target="_blank">https://doi.org/10.1029/2024JH000215</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>39</label><mixed-citation>
      
Martino, M., Hamilton, D. S., Baker, A. R., Jickells, T., Bromley, T., Nojiri, Y., Quack, B., and Boyd, P. W.: Western Pacific atmospheric nutrient
deposition fluxes, their impact on surface ocean productivity, Global Biogeochem. Cy., 28, 712–728, <a href="https://doi.org/10.1002/2013GB004794" target="_blank">https://doi.org/10.1002/2013GB004794</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>40</label><mixed-citation>
      
Mishonov, A. V., Boyer, T. P., Baranova, O. K., Bouchard, C. N., Cross, S.
L., Garcia, H. E., Locarnini, R. A., Paver, C. R., Reagan, J. R., Wang, Z.,
Seidov, D., Grodsky, A. I., and Beauchamp, J. G.: World Ocean Database 2023,
edited by: Bouchard, C., NOAA Atlas NESDIS 97, NOAA, <a href="https://doi.org/10.25923/z885-h264" target="_blank">https://doi.org/10.25923/z885-h264</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>41</label><mixed-citation>
      
Moore, C. M., Mills, M. M., Arrigo, K. R., Berman-Frank, I., Bopp, L., Boyd, P. W., Galbraith, E. D., Geider, R. J., Guieu, C., Jaccard, S. L., Jickells, T. D., Lenton, T. M., Mahowald, N. M., Marañón, E., Marinov, I., Moore, J. K., Nakatsuka, T., Oschlies, A., Saito, M. A., Thingstad, T., Tsuda, A., and Ulloa, O.: Processes and patterns of oceanic nutrient limitation, Nat. Geosci., 6, 701–710, <a href="https://doi.org/10.1038/ngeo1765" target="_blank">https://doi.org/10.1038/ngeo1765</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>42</label><mixed-citation>
      
Możejko, J. and Gniot, R.: Application of Neural Networks for the Prediction of Total Phosphorus Concentrations in Surface Waters, Pol. J.
Environ. Stud., 17, 363–368, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>43</label><mixed-citation>
      
Palacios, D. M., Hazen, E. L., Schroeder, I. D., and Bograd, S. J.: Modeling
the temperature-nitrate relationship in the coastal upwelling domain of the
California Current, J. Geophys. Res.-Oceans, 118, 1–17,
<a href="https://doi.org/10.1002/jgrc.20216" target="_blank">https://doi.org/10.1002/jgrc.20216</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>44</label><mixed-citation>
      
Qi, J., Yu, Y., Yao, X., Yuan, G., and Gao, H.: Dry deposition fluxes of
inorganic nitrogen and phosphorus in atmospheric aerosols over the Marginal
Seas and Northwest Pacific, Atmos. Res., 245, 105076,
<a href="https://doi.org/10.1016/j.atmosres.2020.105076" target="_blank">https://doi.org/10.1016/j.atmosres.2020.105076</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>45</label><mixed-citation>
      
Reagan, J. R., Boyer, T. P., García, H. E., Locarnini, R. A., Baranova,
O. K., Bouchard, C., Cross, S. L., Mishonov, A. V., Paver, C. R., Seidov, D., Wang, Z., and Dukhovskoy, D.: World Ocean Atlas 2023, NOAA National Centers for Environmental Information, Dataset, NCEI Accession 0270533, NCEI,
<a href="https://doi.org/10.25921/va26-hv25" target="_blank">https://doi.org/10.25921/va26-hv25</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>46</label><mixed-citation>
      
Sarangi, P. K., Thangaradjou, T., Saravanakumar, A., and Balasubramanian, T.:
Development of nitrate algorithm for the southwest Bay of Bengal water and
its implication using remote sensing satellite datasets, IEEE J. Select. Top.
Appl. Earth Obs. Remote Sens., 4, 983–991, <a href="https://doi.org/10.1109/JSTARS.2011.2165204" target="_blank">https://doi.org/10.1109/JSTARS.2011.2165204</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>47</label><mixed-citation>
      
Sigman, D. M. and Hain, M. P.: The Biological Productivity of the Ocean,
Nat. Educ. Knowl., 3, 1–16, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>48</label><mixed-citation>
      
Steinhoff, T., Friedrich, T., Hartman, S. E., Oschlies, A., Wallace, D. W. R., and Körtzinger, A.: Estimating mixed layer nitrate in the North Atlantic Ocean, Biogeosciences, 7, 795–807, <a href="https://doi.org/10.5194/bg-7-795-2010" target="_blank">https://doi.org/10.5194/bg-7-795-2010</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>49</label><mixed-citation>
      
Su, H., Lu, X., Chen, Z., Zhang, H., Lu, W., and Wu, W.: Estimating Coastal
Chlorophyll-A Concentration from Time-Series OLCI Data Based on Machine
Learning, Remote Sens., 13, 576, <a href="https://doi.org/10.3390/rs13040576" target="_blank">https://doi.org/10.3390/rs13040576</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>50</label><mixed-citation>
      
Sundararaman, H. K. K. and Shanmugam, P.: Estimates of the global ocean surface dissolved oxygen and macronutrients from satellite data, Remote Sens. Environ., 311, 114243, <a href="https://doi.org/10.1016/j.rse.2024.114243" target="_blank">https://doi.org/10.1016/j.rse.2024.114243</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>51</label><mixed-citation>
      
Switzer, A. C., Kamykowski, D., and Zentara, S.-J.: Mapping nitrate in the
global ocean using remotely sensed sea surface temperature, J. Geophys. Res., 108, 345–359, <a href="https://doi.org/10.1029/2001JC000833" target="_blank">https://doi.org/10.1029/2001JC000833</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>52</label><mixed-citation>
      
Talley, L. D., Pickard, G. L., Emery, W. J., and Swift, J. H.: Descriptive
Physical Oceanography, An Introduction, in: 6th Edn., Academic Press,
350–362, <a href="https://doi.org/10.1016/B978-0-7506-4552-2.10010-1" target="_blank">https://doi.org/10.1016/B978-0-7506-4552-2.10010-1</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>53</label><mixed-citation>
      
Wang, C., Su, B., Sun, J., Hu, X., and Liu, J.: A regional ocean database for the Coastal China Sea, Sci. Data, 12, 1550, <a href="https://doi.org/10.1038/s41597-025-05840-w" target="_blank">https://doi.org/10.1038/s41597-025-05840-w</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>54</label><mixed-citation>
      
Wang, L., Xu, Z., Gong, X., Zhang, P., Hao, Z., You, J., Zhao, X., and Guo,
X.: Estimation of nitrate concentration and its distribution in the northwestern Pacific Ocean by a deep neural network model, Deep-Sea Res. Pt. I, 195, 104005, <a href="https://doi.org/10.1016/j.dsr.2023.104005" target="_blank">https://doi.org/10.1016/j.dsr.2023.104005</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>55</label><mixed-citation>
      
Wang, W.-L., Moore, J. K., Martiny, A. C., and Primeau, F. W.: Convergent
estimates of marine nitrogen fixation, Nature, 566, 205–211,
<a href="https://doi.org/10.1038/s41586-019-0911-2" target="_blank">https://doi.org/10.1038/s41586-019-0911-2</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>56</label><mixed-citation>
      
Wang, Z., Wang, G., Guo, X., Hu, J., and Dai, M.: Reconstruction of High-Resolution Sea Surface Salinity over 2003–2020 in the South China Sea
Using the Machine Learning Algorithm LightGBM Model, Remote Sens., 14, 6147,
<a href="https://doi.org/10.3390/rs14236147" target="_blank">https://doi.org/10.3390/rs14236147</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>57</label><mixed-citation>
      
Yang, G. G., Wang, Q., Feng, J., He, L., Li, R., Lu, W., Liao, E., and Lai,
Z.: Can three-dimensional nitrate structure be reconstructed from surface
information with artificial intelligence? – A proof-of-concept study, Sci.
Total Environ., 924, 171365, <a href="https://doi.org/10.1016/j.scitotenv.2024.171365" target="_blank">https://doi.org/10.1016/j.scitotenv.2024.171365</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>58</label><mixed-citation>
      
Yasunaka, S., Nojiri, Y., Nakaoka, S., Ono, T., Whitney, F. A., and Telszewski, M.: Mapping of sea surface nutrients in the North Pacific: Basin-wide distribution and seasonal to interannual variability, J. Geophys.
Res.-Oceans, 119, 7756–7771, <a href="https://doi.org/10.1002/2014JC010318" target="_blank">https://doi.org/10.1002/2014JC010318</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>59</label><mixed-citation>
      
Yasunaka, S., Ono, T., Nojiri, Y., Whitney, F. A., Wada, C., Murata, A.,
Nakaoka, S., and Hosoda, S.: Long-term variability of surface nutrient concentrations in the North Pacific, Geophys. Res. Lett., 43, 3389–3397,
<a href="https://doi.org/10.1002/2016GL068097" target="_blank">https://doi.org/10.1002/2016GL068097</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>60</label><mixed-citation>
      
Yasunaka, S., Mitsudera, H., Whitney, F., and Nakaoka, S.: Nutrient and
dissolved inorganic carbon variability in the North Pacific, J. Oceanogr.,
77, 3–16, <a href="https://doi.org/10.1007/s10872-020-00561-7" target="_blank">https://doi.org/10.1007/s10872-020-00561-7</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>61</label><mixed-citation>
      
Yu, X. R., Wen, Z., Jiang, R., Yang, J.-Y. T., Cao, Z., Hong, H., Zhou, Y.,
and Shi, D.: Assessing N<sub>2</sub> fixation flux and its controlling factors in the
(sub)tropical western North Pacific through high-resolution observations,
Limnol. Oceanogr. Lett., 9, 716–724, <a href="https://doi.org/10.1002/lol2.10390" target="_blank">https://doi.org/10.1002/lol2.10390</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>62</label><mixed-citation>
      
Zhong, A., Wang, D., Gong, F., Zhu, W., Fu, D., Zheng, Z., Huang, J., He, X., and Bai, Y.: Remote sensing estimates of global sea surface nitrate: Methodology and validation, Sci. Total Environ., 950, 175362,
<a href="https://doi.org/10.1016/j.scitotenv.2024.175362" target="_blank">https://doi.org/10.1016/j.scitotenv.2024.175362</a>, 2024.

    </mixed-citation></ref-html>--></article>
