A global 3D chlorophyll-a dataset derived from multimodal deep learning reconstruction
Abstract. Chlorophyll-a (Chl-a) is a key variable for characterizing marine phytoplankton biomass and upper-ocean biogeochemical variability. Existing global products are generally limited to the ocean surface and therefore cannot adequately resolve subsurface vertical structure. In this study, a multimodal Profile-Surface Transformer (PST) framework was developed by integrating Biogeochemical-Argo (BGC-Argo) Chl-a profiles, Core-Argo temperature-salinity profiles, and satellite-derived surface Chl-a to reconstruct the three-dimensional (3D) vertical structure of Chl-a. A global monthly mean 3D Chl-a dataset at 1° spatial resolution for 2005–2025 was subsequently constructed. Evaluation of the dataset demonstrates that PST exhibits robust stability and effective profile reconstruction capability across various generalization scenarios. The coefficients of determination under random split, year-based split, and spatial cross-validation are 0.8923, 0.8739, and 0.8588 ± 0.0105, respectively. Furthermore, independent external validation using ship-based observations confirms that the model maintains strong generalization ability and high accuracy across different observing systems. The constructed dataset successfully reproduces known global patterns of Chl-a vertical structure and its subsurface chlorophyll maximum (SCM), as well as seasonal variability and spatial characteristics in the upper ocean. The long-term evolution of global SCM depth during 2005–2025 is characterized primarily by regionally heterogeneous redistribution superimposed on a stable zonal background, rather than by a pronounced monotonic change in the global mean depth. Overall, the constructed long-term 3D Chl-a dataset integrates surface observations and discrete biogeochemical profiles, delivering a new observationally constrained global product. 
This dataset is well-suited for studies on the vertical ecological structure of phytoplankton, long-term ocean biogeochemical variability, and the responses of upper-ocean ecosystems to climate change. The dataset is publicly available from Zenodo (Meng et al., 2026a) at: https://doi.org/10.5281/zenodo.19494734.
General Comments:
This study presents a global, three-dimensional, monthly chlorophyll-a (Chl-a) dataset spanning more than 20 years, constructed using a multimodal Profile-Surface Transformer (PST) framework. The framework integrates BGC-Argo Chl-a profiles, Core-Argo temperature-salinity profiles, and MODIS satellite-derived surface Chl-a data. The authors examine the seasonal variability and long-term vertical structure of Chl-a, including the depth of the subsurface chlorophyll maximum (SCM). Chl-a is a critical variable for characterizing marine phytoplankton biomass and understanding the variability of upper-ocean biogeochemical cycles. Although machine learning and deep learning methods have been widely used to reconstruct surface Chl-a on a global scale, vertical reconstruction remains underdeveloped. The study is novel and significant for understanding the response of the global marine ecosystem and the carbon and nitrogen cycles to climate change. Overall, the paper is clearly written and easy to understand. However, I have some concerns about the organization, description, and analysis of the reconstructed data, which need further refinement to meet the publication standard of ESSD. Therefore, I recommend a major revision.
Main concerns:
1) The introduction and discussion should be streamlined
The current introduction is lengthy and contains redundant elements; it should be streamlined to improve readability, and certain parts should be relocated to the discussion. Both the introduction and the discussion would benefit from a more thorough literature review. The purpose of the literature review in the introduction is to inform readers about what has been investigated and what remains unexplored, thereby establishing the questions worth addressing. In contrast, the literature review in the discussion serves to compare the current study with existing research and to highlight the novelty of the work. The primary novelty of this research lies in its global reconstruction of three-dimensional chlorophyll-a (Chl-a). I recommend incorporating a detailed literature review of surface Chl-a reconstruction and Global Ocean Biogeochemical Models (GOBMs) in the introduction. Conversely, the literature review of regional three-dimensional reconstruction (lines 126-146) should be moved to, and expanded in, the discussion section. Additionally, the content of the methodology section "3.1 Problem Definition" should be integrated into the introduction and should not be restated. More detailed suggestions on the reorganization are given in the specific comments.
2) About the application of the dataset
The authors indicate in several sections that they use the data to analyze long-term trends in chlorophyll-a (Chl-a) and in the depth of the subsurface chlorophyll maximum (SCM) over the study period. However, the statistical results show that the long-term trend is not statistically significant at the global scale. Therefore, the emphasis on long-term analysis should be moderated. The interannual variability presented in Figure 6b appears significant and warrants more explicit discussion. I recommend that the authors highlight the dataset's applicability for multi-scale analyses, including seasonal variability, interannual variability, and long-term trends, rather than focusing excessively on statistically insignificant results.
3) Validation of the reconstructed data
I recommend that the authors discuss the reconstructed dataset in comparison with CMIP6/7 Global Ocean Biogeochemical Model (GOBM) products. The comparison does not necessarily need to be a full quantitative validation against multiple model outputs, but it would be useful to discuss broad spatial patterns, similarities, differences, and limitations relative to existing model-based products. The current validation framework already includes multiple validation strategies and independent ship-based observations, but such a comparison would help readers better understand the advantages and disadvantages of this dataset.
4) A few other main concerns
4.1) Figure display concerns: Figures 5 and 6 are too small and their color scales are unclear. These figures should be enlarged and the colorbars changed. Furthermore, a global horizontal distribution figure and representative reconstructed time series (e.g., SCM depth, maximum Chl-a concentration) could be presented.
4.2) Product resolution concerns: The authors describe resampling the MODIS surface Chl-a data to a 0.25° × 0.25° grid during preprocessing (Section 2.3) to balance large-scale patterns and regional details. However, the final reconstructed three-dimensional product is provided at a 1° × 1° resolution. The authors should briefly state why the vertical reconstruction cannot reach a higher resolution.
4.3) The analysis could be more detailed at the level of ocean biomes, superbiomes (Fay and McKinley, 2014), or individual ocean basins. Nevertheless, the current material is sufficiently detailed for the PST framework as presented. I recommend leaving those analyses to future work.
Reference: Fay, A. R., McKinley, G. A. (2014). Global Ocean Biomes: Mean and time-varying maps (NetCDF 7.8 MB) [Dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.828650
Specific Comments:
1. Introduction
The introduction would benefit from a clearer logical structure. The early paragraphs could better connect the importance of Chl-a, the limitation of surface observations, and the need for global 3D reconstruction. The review of surface Chl-a reconstruction methods and GOBMs should be organized more clearly, with emphasis on why these approaches cannot fully resolve vertical Chl-a structure. Some statements about BGC-Argo/Core-Argo and global-scale extension appear repetitive and could be merged and condensed. The discussion of recent regional 3D reconstruction studies could be shortened in the introduction and revisited in the discussion to better highlight the novelty of the present global dataset.
2. Data sources
2.1 A figure showing the data distribution of BGC-Argo and Core-Argo profiles should be placed in the Supporting Information and cited in both Sections 2.1 and 2.2. This would help readers understand the “observationally sparse regions” mentioned later in line 735.
3. Methodology
3.1 The problem has already been defined in the introduction. Therefore, there is no need to restate it here. Please remove “Problem definition” from the title and L199-L200.
3.2 L338: Is it reasonable to set the maximum search radius to 700 km? 700 km is an exceptionally large distance that can span multiple dynamical regimes. Applying such a wide radius, especially as a fallback in data-sparse regions (e.g., Southern Ocean), will likely introduce spurious signals and over-smooth the vertical structure of Chl-a. The authors should justify why 700 km is optimal compared to smaller alternatives.
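One way to frame the sensitivity check requested in comment 3.2 is to count how many profiles fall within each candidate radius of a target grid cell. The following is a minimal sketch under stated assumptions, not the authors' matching procedure: the function names, the candidate radii other than 700 km, and the spherical Earth radius of 6371 km are illustrative choices, and any profile coordinates passed in are hypothetical.

```python
import math

# Assumption: spherical Earth with mean radius 6371 km.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_by_radius(target, profiles, radii_km=(100, 300, 500, 700)):
    """Count profiles within each candidate search radius of a target cell.

    target   : (lat, lon) of the grid cell centre, in degrees
    profiles : iterable of (lat, lon) profile positions, in degrees
    radii_km : candidate radii (hypothetical values, except the 700 km
               maximum that is questioned in the comment above)
    """
    distances = [haversine_km(target[0], target[1], lat, lon)
                 for lat, lon in profiles]
    return {r: sum(d <= r for d in distances) for r in radii_km}


if __name__ == "__main__":
    # Hypothetical profile positions along the equator.
    print(count_by_radius((0.0, 0.0), [(0.0, 1.0), (0.0, 3.0), (0.0, 6.0)]))
```

Reporting such counts by region, together with the reconstruction skill obtained at each radius, would show whether 700 km is needed only as a fallback in data-sparse areas or is routinely reached.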
4. Validation and uncertainty assessment
4.1 L497-L506: This passage should be moved to the Methods section.
5. Examples of spatiotemporal variation characters based on the reconstructed results
5.1 I suggest first providing a global horizontal distribution figure, for example of the SCM depth or the maximum chlorophyll concentration. I also suggest providing a representative monthly time series, if appropriate, and discussing the reconstructed results in comparison with CMIP6/7 GOBM products.
5.2 Figure 5: The panels are too small to be seen clearly. Please reorganize them as upper, middle, and lower panels, and use a more distinct colormap (e.g., jet). Regarding Figure 5c, the Northern and Southern Hemispheres appear asymmetric. Please add a description of this asymmetry and provide a brief explanation.
5.3 Section 5.4: Given that the long-term trend is statistically insignificant, this section should not discuss it extensively. Both interannual variability and the long-term trend should be addressed, as Figure 6 includes both: panels (b) and (c) show interannual variability, while panel (a) shows the regional long-term trend. These should be separated into two distinct figures. Climate-mode analyses, such as during typical El Niño/La Niña years, could be mentioned as potential future applications rather than being required in the current manuscript.
Please note that lines 671–672 conclude that the main physical controls on SCM depth—particularly light availability, stratification intensity, and nutrient supply—remained broadly stable over 2005–2025. This conclusion may need more cautious wording in relation to the study by Cheng et al. (2025, NREE), which found that the ocean has become more stratified in a warming climate. It may also need to be better reconciled with analyses from data-assimilated global ocean circulation models. The authors should clarify that the absence of a statistically significant global mean SCM-depth trend does not necessarily confirm that the underlying physical controls remained stable over the same period.
Reference: Cheng, L., Li, G., Long, S.-M., Li, Y., von Schuckmann, K., Trenberth, K. E., Mann, M. E., Abraham, J., Du, Y., Cheng, X., Liu, H., Xu, Z., Liu, M., Peng, Q., Gong, X., Ma, Z., & Yuan, H. (2025). Ocean stratification in a warming climate. Nature Reviews Earth & Environment, https://doi.org/10.1038/s43017-025-00715-5
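The point in comment 5.3 that a non-significant trend does not demonstrate stability can be illustrated with a confidence interval on the fitted slope. The following is a minimal sketch, assuming annual global-mean SCM depths and ordinary least squares with independent residuals (autocorrelation is not handled); the function name is hypothetical, and the default critical value of 2.093 is the two-sided 95% Student t quantile for 19 degrees of freedom, which assumes 21 annual values (2005-2025).

```python
import math


def trend_with_ci(years, values, t_crit=2.093):
    """Ordinary least-squares slope with an approximate 95% confidence interval.

    years, values : equal-length sequences (e.g. year and annual-mean SCM depth)
    t_crit        : Student t critical value; 2.093 assumes n - 2 = 19
                    degrees of freedom (an assumption, adjust for other n)
    Returns (slope, lower_bound, upper_bound) in value units per year.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope - t_crit * se_slope, slope + t_crit * se_slope


if __name__ == "__main__":
    # Hypothetical series: no trend, alternating +/- 5 m interannual anomaly.
    years = list(range(2005, 2026))
    depths = [80.0 + 5.0 * (-1) ** i for i in range(len(years))]
    print(trend_with_ci(years, depths))
```

An interval that contains zero but is wide relative to the expected signal indicates that the record cannot exclude a trend, which is different from showing that the underlying controls were stable.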
6. Discussion
6.1 L702: For the comparison with previous studies, the literature review currently presented in the introduction (L126-L141) should be moved here. Please also see my comments 1.6 & 1.7.
6.2 L709: This statement is not placed appropriately, which makes the text read abruptly.
6.3 L720: These features are consistent with previous BGC-Argo based analysis. If feasible, I suggest discussing their similarities and differences with existing GOBM or data-assimilative products.
6.4 L723: long-term -> multi-timescale? Several statements should be rewritten once the description of the dataset's applications has been revised, e.g. L765-L769. Please also see my main concerns.
6.5 L735: The phrase "observationally sparse regions" is vague—where exactly are these regions? A data distribution figure should be provided, and the specific locations should be more clearly identified here.
Technical comments: