A historical nutrient dataset (1895–2024) for the North Pacific: reconstructed from machine learning and hydrographic observations
Abstract. Nutrients play a critical role in oceanic primary productivity and the biological pump. However, compared to hydrographic parameters such as temperature and salinity, nutrient observations are limited due to their labor-intensive and costly measurements. Thus, nutrient observations are several orders of magnitude sparser than hydrographic observations. In this study, we first established a rigorous data quality control procedure to clean the hydrographic and nutrient (including NO₃⁻, NO₂⁻, DIP, and Si(OH)₄) observations collected from World Ocean Database (WOD) and CLIVAR and Carbon Hydrographic Data Office (CCHDO) in the North Pacific. Subsequently, the cleaned and high-quality CCHDO dataset was used to train three machine learning models – Random Forest, Light Gradient Boosting Machine (LightGBM), and Gaussian Process Regression – to establish relationships between nutrient concentrations and key variables, including space coordinates (longitude, latitude, and depth), time variables (year and month), and water mass properties (indexed by potential temperature and salinity). Validation shows that the reconstruction closely matches the observations, with RMSEs of <1.41, <0.071, <0.089 and <3.07 mmol kg⁻¹ for NO₃⁻, NO₂⁻, DIP, and Si(OH)₄, respectively. The validated models were then applied to reconstruct nutrient concentrations from the hydrographic observations in WOD, most of which lacked direct nutrient measurements. This resulted in ~473 million reconstructed nutrient data points across 1.92 million stations for each nutrient, spanning from 1895 to 2024, representing a 2,127 to 2,393-fold increase compared to the original nutrient observations in the North Pacific (197,539 to 222,234). This new dataset will be valuable for studying nutrient variability under climate change and anthropogenic influences, and for providing transient boundary conditions in ocean biogeochemical models.
The dataset generated in this study is openly available on Zenodo at https://zenodo.org/records/17451417.
Overall comment:
This manuscript presents a valuable contribution to the field of chemical oceanography. The authors have reconstructed a massive database of historical nutrient data points for the North Pacific, greatly expanding original observations. The rigorous four-level quality control and the use of multiple machine learning (ML) architectures make this a strong candidate for the journal Earth System Science Data.
This is a good paper overall. However, I have concerns about the temporal extrapolation that must be addressed to ensure the dataset's reliability for historical hindcasts. In addition, the paper would benefit from a more in-depth discussion of long-term nutrient trends, which would strengthen the utility of such a historical dataset.
I recommend minor revisions to further strengthen the methodology and discussion before publication.
Major comments:
1. Temporal extrapolation robustness: The use of three validation strategies (sample-random, station-random, and cruise-random) provides a transparent view of the error, and the cruise-random approach should be the most convincing test of spatial extrapolation. However, the training dataset (CCHDO) spans only 1973 to 2022, whereas the model is applied to reconstruct nutrients back to 1895. Including year as a predictor could introduce bias if a trend learned from 1973–2022 does not hold for the 1895–1972 era. The authors need to justify that their approach can be extrapolated temporally as well as spatially. This could be done by discussing whether the water mass-nutrient relationships remained relatively stationary over the last century (is this really true given the acceleration of anthropogenic forcing?), or by validating against withheld "time slices" to show that the temporal predictor is robust.
2. Missing long-term analyses: A major selling point of this paper is the temporal extent of the reconstruction (1895–2024). However, the Results section is dominated by climatological maps (Figs. 10–13), which effectively collapse the temporal dimension that the authors worked so hard to reconstruct. Providing 130 years of data without showing a single long-term trend analysis (e.g., decadal shifts in the nutricline depth, or basin-scale nutrient inventory changes) undermines the claim that this dataset is ‘historical.’ I suggest the authors add a section analyzing a long-term trend or any regime shift using their reconstructed nutrient data. This would serve as a proof of concept that the reconstruction captures low-frequency climate variability and is not just a high-resolution climatology.
3. Elaboration on potential future applications: I think the reconstructed dataset would be impactful and have broad utility, but its applications are described only generically. To increase the impact of this paper, I recommend expanding the discussion to list potential future applications explicitly. Specific examples include using this 4D dataset to spin up ocean biogeochemical models or to investigate changes in nutrient stoichiometry.
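To make the "time-slice" suggestion in comment 1 concrete, here is a minimal sketch of one way such a check could be set up: withhold the earliest years of the training record, fit on the remainder, and score the withheld slice. Everything in it is an assumption for illustration only. The function names (`time_slice_rmse`, `fit_ols`, `predict_ols`), the 1985 cutoff, the synthetic data, and the ordinary least squares stand-in are mine and are not the authors' models or data; in practice the authors' Random Forest, LightGBM, or Gaussian Process models would be passed in through `fit` and `predict`.

```python
import numpy as np

def time_slice_rmse(X, y, years, holdout_before, fit, predict):
    """Fit on samples with year >= holdout_before, score on earlier samples."""
    test = years < holdout_before
    train = ~test
    if not test.any() or not train.any():
        raise ValueError("both the training and the withheld slice need samples")
    model = fit(X[train], y[train])
    resid = predict(model, X[test]) - y[test]
    return float(np.sqrt(np.mean(resid ** 2)))

# Stand-in model (assumption): ordinary least squares with an intercept.
def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Synthetic, purely illustrative data: a nutrient that depends on
# potential temperature and salinity, with no underlying time trend.
rng = np.random.default_rng(0)
n = 2000
years = rng.integers(1973, 2023, size=n)      # 1973..2022 inclusive
theta = rng.uniform(0.0, 25.0, size=n)        # potential temperature
sal = rng.uniform(32.0, 35.0, size=n)         # salinity
nitrate = 40.0 - 1.5 * theta + 0.5 * (sal - 33.5) + rng.normal(0.0, 0.5, size=n)

# Year is included as a predictor, mirroring the setup under review.
X = np.column_stack([theta, sal, years])

# Withhold 1973-1984 and train on 1985-2022.
rmse_early = time_slice_rmse(X, nitrate, years, 1985, fit_ols, predict_ols)
print(f"RMSE on withheld 1973-1984 slice: {rmse_early:.3f}")
```

Comparing this withheld-slice RMSE against the cruise-random RMSE, and repeating it with and without year among the predictors, would indicate how much the temporal predictor helps or hurts when the model is pushed outside its training period. A withheld slice inside 1973–2022 is still only a proxy for the 1895–1972 era, so it complements rather than replaces the stationarity discussion.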
Minor comments:
- Section 2.1: Oxygen is a fundamental tracer for remineralization and is physically coupled with nutrients via the Redfield ratio and AOU, but is not included in the predictors. Can the authors explain why it is not included? Is it because many datasets lack this property?
- Table 1: The salinity data count increases after quality control. Typo?
- Figure 3a: The low station counts in the open ocean are hard to see. Consider using a logarithmic color scale.
- Line 373: The model performance for NO₂⁻ is notably lower (R² = 0.32–0.72) than for the other nutrients. Given that NO₂⁻ is biologically dynamic, the utility of a T/S-based reconstruction is questionable. Consider removing NO₂⁻ from the primary dataset or flagging it with a high-uncertainty warning.
- Line 417: The manuscript notes that "most data points are located above 2,000 m." How should we interpret the deep data then? Do they have larger RMSE? If so, to what extent can the reconstructed deep nutrient fields be considered reliable for full-depth modeling applications?
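The question about Line 417 could be answered with a depth-stratified error table. The sketch below shows one possible way to compute it. The function name `rmse_by_depth`, the bin edges, and the toy numbers are assumptions for illustration and are not taken from the manuscript.

```python
import numpy as np

def rmse_by_depth(depth, obs, pred, edges):
    """Return (lower, upper, n, rmse) for each depth bin [lower, upper)."""
    depth, obs, pred = (np.asarray(a, dtype=float) for a in (depth, obs, pred))
    idx = np.digitize(depth, edges) - 1   # index of the bin each sample falls in
    rows = []
    for i in range(len(edges) - 1):
        mask = idx == i
        n = int(mask.sum())
        # Empty bins are reported as NaN rather than dropped.
        rmse = float(np.sqrt(np.mean((pred[mask] - obs[mask]) ** 2))) if n else float("nan")
        rows.append((edges[i], edges[i + 1], n, rmse))
    return rows

# Toy values (assumption): three samples above 2,000 m, two below.
depth = [10, 500, 1500, 2500, 3000]
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.0, 2.0, 4.0, 4.0, 7.0]
table = rmse_by_depth(depth, obs, pred, edges=[0, 2000, 6000])
for lower, upper, n, rmse in table:
    print(f"{lower}-{upper} m: n={n}, RMSE={rmse:.3f}")
```

Reporting the sample count next to each RMSE matters here, because a low error in a sparsely sampled deep bin would say little about reliability for full-depth applications.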