the Creative Commons Attribution 4.0 License.
High-spatiotemporal reconstruction of biogeochemical dynamics in Australia integrating satellites products and in-situ observations (2000–2022)
Abstract. Marine biogeochemical time-series products, including total alkalinity, inorganic carbon, nitrate, phosphate, silicate, and pH, provide a foundation for the ongoing monitoring of oceanic biogeochemical change. These products play a critical role in research on the dynamic monitoring of marine ecosystems and in fostering sustainable ocean development. However, existing monitoring methodologies have inherent limitations, notably the scarcity of observational products that offer both high spatial and high temporal resolution. Furthermore, the interpolation methods typically employed in these contexts often perform poorly at large scales, producing data with large spatial and temporal gaps that are difficult to use in applications aimed at monitoring large-scale ocean dynamics. To address these challenges, a novel integration of the CANYON-B and Random Forest regression methods was explored for reconstructing key marine biogeochemical parameters. This work reconstructs the sea-surface values of these parameters within Australia's Exclusive Economic Zone from 2000 to 2022 at 1-kilometre resolution. The approach combines multi-source in-situ ocean chemistry time-series observations with MODIS Terra ocean reflectance imagery and ocean colour products. This research highlights the substantial capability of machine learning for the large-scale reconstruction of ocean chemistry data and introduces a new, viable method for using in-situ measurements and optical imagery to reconstruct marine biogeochemical parameters, thereby enhancing our ability to monitor large-scale ocean dynamics. The datasets generated and analysed in this study are available on Science Data Bank (https://doi.org/10.57760/sciencedb.09331) (Zhang et al., 2024).
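The abstract does not say how the in-situ samples were paired with MODIS pixels before training. One common approach is a match-up window: each observation is paired with the nearest pixel within a fixed distance and time offset. The sketch below assumes a 1 km radius and a ±1 day window; the class names, the `bands` layout and the band key `rrs_443` are hypothetical and are not taken from the manuscript.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


@dataclass
class Pixel:
    """One satellite pixel; `bands` maps band name -> reflectance (hypothetical layout)."""
    lat: float
    lon: float
    time: datetime
    bands: dict = field(default_factory=dict)


@dataclass
class Observation:
    """One in-situ sample of a biogeochemical variable."""
    lat: float
    lon: float
    time: datetime
    value: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_up(observations, pixels, max_km=1.0, max_dt=timedelta(days=1)):
    """Pair each observation with the nearest pixel inside a space/time window.

    Observations with no pixel inside the window are dropped.
    """
    pairs = []
    for obs in observations:
        best, best_km = None, float("inf")
        for px in pixels:
            if abs(px.time - obs.time) > max_dt:
                continue  # outside the time window
            km = haversine_km(obs.lat, obs.lon, px.lat, px.lon)
            if km <= max_km and km < best_km:
                best, best_km = px, km
        if best is not None:
            pairs.append((obs, best))
    return pairs


t0 = datetime(2010, 1, 15, 2, 30)
pixels = [
    Pixel(-33.000, 151.3, t0, {"rrs_443": 0.004}),
    Pixel(-33.005, 151.3, t0, {"rrs_443": 0.005}),
]
observations = [
    Observation(-33.004, 151.3, t0 + timedelta(hours=3), 2.1),  # near both pixels
    Observation(-40.000, 150.0, t0, 1.0),                        # no pixel nearby
]
for obs, px in match_up(observations, pixels):
    print(obs.value, px.bands)
```

The brute-force double loop keeps the idea visible. For a 1 km grid over the whole EEZ and 23 years of imagery, a spatial index would be needed instead.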
Status: open (until 11 Dec 2024)
RC1: 'Comment on essd-2024-219', Henry Bittig, 20 Nov 2024
In this work, the authors present a methodology to densify sparse observations of ocean biogeochemistry into a high resolution dataset with the help of machine learning methods. Their starting point is the CANYON-B neural networks, based on GLODAPv2 ocean chemistry sparse sampling, as well as high resolution satellite remote sensing, which are combined by machine learning methods (random forest) to build a high resolution data cube of ocean chemistry in coastal waters around Australia to help ecosystem monitoring and management.
The manuscript is clearly structured, easy to read and well organized. My main criticism is that the authors take open-ocean-based parameterizations (CANYON-B) and apply them to highly dynamic coastal systems (Australian EEZ waters), where the processes and dynamics do not match.
Judging from the authors' references, they have a strong background in remote sensing, but less so in seagoing biogeochemistry and sample analysis:
The GLODAPv2 data set is a collection of past and recent basin-scale, coast-to-coast repeat hydrography cruises, carried out through programs such as WOCE or, more recently, GO-SHIP. Its focus is the deep open ocean. The GO-SHIP target is to cover the world's oceans with a decadal repeat frequency, i.e., every repeat hydrography section should be occupied at least once every 10 years. A handful of sections have a “higher frequency” (every 1 or 2 years), but they are mostly located in the Atlantic basin. In addition, research cruises tend to take place in favourable weather conditions, and the GLODAPv2 cruises are in fact seasonally biased towards summer rather than winter. In summary, the GLODAPv2 dataset is based on multi-decadal observations, with at best a multi-year time resolution between repeats. There is no unbiased seasonal sampling, let alone seasonal resolution. Consequently, since CANYON-B is based entirely on the GLODAPv2 data (and the information captured within them), CANYON-B's relations reflect processes acting in the open ocean on multi-annual time scales, not seasonal or sub-seasonal processes, coastal gradients, or complex dynamics. As the authors themselves state in lines 40-47:
"Most large-scale ocean chemistry datasets are derived from infrequent ship-based surveys or fixed-point observatories, which are then interpolated to create continuous spatial fields. This interpolation, while necessary, introduces substantial uncertainties, particularly in dynamic regions where biogeochemical properties can vary significantly over short distances and time periods. Traditional interpolation methods [...] may not adequately capture complex gradients or the temporal variability of ocean processes [...]. Such shortcomings can lead to misleading representations of marine biogeochemical environments, potentially skewing our understanding of oceanic processes and their responses to environmental changes."
The method used must therefore be fit for the application. Here, CANYON-B is fit neither for seasonal or sub-seasonal application nor for coastal waters, i.e., one of the assumptions of this work is, unfortunately, invalid. Even though CANYON-B is a machine learning-based method and not a “traditional interpolation method” like those listed by the authors, it is a data-driven method that inherits the limits of its parent training dataset. If information at other scales (notably seasonal dynamics and near-shore coastal processes) is not captured at all by the training data, even a sophisticated machine learning approach cannot infer it. In addition, CANYON-B's predictive skill is high for interior ocean biogeochemistry but degrades towards the surface and in surface applications, where variability is much stronger and where the tight biogeochemical coupling between oxygen cycling and CO2 cycling breaks down in waters in contact with the atmosphere (owing to the different time scales of air-sea gas exchange). (I.e., oxygen is a less adequate predictor of the chemical species of interest in surface waters than in the ocean interior.)
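The point that a data-driven model cannot infer what its training data never contained can be shown with a toy regression tree, the building block of the random forests used in the second stage. A tree predicts a piecewise-constant function, so beyond the training range it simply repeats the value of its outermost leaf. This is a minimal sketch on synthetic data (a linear relation y = 2x sampled only on 0 ≤ x ≤ 10); the function names are hypothetical. CANYON-B itself is a neural network and extrapolates differently, but equally without any support from data.

```python
def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(xs, ys, depth=3, min_leaf=2):
    """Fit a 1-D regression tree by greedy best split on squared error."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(xs) < 2 * min_leaf:
        return ("leaf", mean)
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        sse = _sse([y for _, y in pairs[:i]]) + _sse([y for _, y in pairs[i:]])
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:
        return ("leaf", mean)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left, right = pairs[:i], pairs[i:]
    return (
        "split",
        threshold,
        fit_tree([x for x, _ in left], [y for _, y in left], depth - 1, min_leaf),
        fit_tree([x for x, _ in right], [y for _, y in right], depth - 1, min_leaf),
    )


def predict(tree, x):
    """Walk the tree to the leaf containing x and return that leaf's mean."""
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]


# Synthetic training data: y = 2x, sampled only on 0 <= x <= 10.
xs = [i * 0.5 for i in range(21)]
ys = [2 * x for x in xs]
tree = fit_tree(xs, ys)

print(predict(tree, 5.0))   # inside the training range: roughly follows y = 2x
print(predict(tree, 20.0))  # outside: the rightmost leaf's mean, never above max(ys) = 20
```

A random forest averages many such trees, so its predictions are likewise bounded by the range of the training targets.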
I therefore have serious concerns about the validity of the presented data set. It may show some interesting dynamics, as such data interpolation methods always return an output number, but I would not consider these numbers trustworthy or reliable.
The remainder of their approach, i.e., using in-situ Argo or glider data (temperature, salinity, oxygen, chla, turbidity/POC) and combining them with ocean colour remote sensing via random forest regression to infer the same variables (temperature, salinity, oxygen, chla, turbidity/POC) in coastal waters, seems valid. Here, the scale of the observations (weekly to monthly time scales, a few km resolution) matches the scope of the intended data product. But, as outlined above, the transfer of Argo/glider data to CANYON-B outputs of nutrients or carbon system parameters is an invalid step in the manuscript's Australian EEZ setting. The monthly trends of the chemical variables therefore cannot be considered robust or reliable.
Only the validation against independent observations (section 4.2.2) is somewhat suited to indicate how closely the CANYON-B-derived quantities match field observations. I would have expected a presentation like Figure 10 for the Figure 12 comparison, in which it is hard to make out details. In addition, presenting the data grouped by month would have helped to assess the (in)validity of the monthly time-scale target (which seems to be discernible in Figure 13, where there are repeated (annual?) oscillations in the percentage error). What I can make out from Figures 12 and 13 and the provided statistics is that the CANYON-B-based products hit the ballpark of the actually observed carbon parameters and nutrients (as, at some point, the EEZ waters connect to the open ocean), but fail to capture their variability and dynamics.
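The month-grouped presentation suggested above amounts to a simple aggregation of the validation match-ups. The sketch below uses the mean absolute percentage error as one possible metric. The exact error definition behind Figure 13 is not given here, and the record layout and numbers are illustrative assumptions. A clear annual cycle in the per-month values would support the concern about seasonal skill.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_mape(records):
    """Mean absolute percentage error (MAPE, %) grouped by calendar month.

    records: iterable of (date, observed, predicted) tuples; observed must be non-zero.
    Returns a dict {month number: MAPE}, ordered by month.
    """
    errors_by_month = defaultdict(list)
    for day, observed, predicted in records:
        errors_by_month[day.month].append(abs(predicted - observed) / abs(observed) * 100.0)
    return {month: mean(errs) for month, errs in sorted(errors_by_month.items())}


# Hypothetical match-ups of observed vs. reconstructed nitrate (illustrative numbers only).
records = [
    (date(2015, 1, 10), 2.0, 2.2),
    (date(2016, 1, 5), 4.0, 3.6),
    (date(2015, 7, 1), 1.0, 1.5),
]
print(monthly_mape(records))
```

Grouping by calendar month pools all years. Grouping by (year, month) instead would separate a seasonal bias from a drift over time.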
A way to make their approach work would be to use the independent observations of the IMOS National Reference Stations and establish a regionalized/local “CANYON-B”-like parameterization to connect temperature/salinity/oxygen/... with the target parameters of interest (carbon parameters and nutrients) that is applicable to the system and timescales the authors work in.
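The structure of such a regionalized parameterization (fit predictors to targets at reference stations, then apply the fit elsewhere) can be sketched with the simplest possible stand-in: an ordinary least-squares regression of one target on temperature, salinity and oxygen. This is only an illustration under stated assumptions. CANYON-B is an ensemble of Bayesian neural networks that additionally uses date, position and pressure as inputs. The predictor set, the synthetic data and the function names below are hypothetical.

```python
def fit_linear(X, y):
    """Ordinary least squares with an intercept.

    X: list of predictor rows, e.g. [temperature, salinity, oxygen]; y: target values.
    Columns are centred first so the intercept does not inflate the condition
    number of the normal equations. Assumes the predictors are not collinear.
    Returns [intercept, b1, b2, ...].
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Z = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [t - y_mean for t in y]
    # Augmented normal-equation matrix [Z^T Z | Z^T yc].
    A = [
        [sum(z[i] * z[j] for z in Z) for j in range(p)]
        + [sum(z[i] * t for z, t in zip(Z, yc))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    slopes = [A[i][p] / A[i][i] for i in range(p)]
    intercept = y_mean - sum(b * m for b, m in zip(slopes, x_means))
    return [intercept] + slopes


def predict_linear(coef, row):
    """Apply fitted coefficients to one predictor row."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))


# Synthetic "station" data: [temperature, salinity, oxygen] and a target that is
# exactly 2000 + 10*T - 5*S + 0.5*O2, so the fit should recover these coefficients.
X = [[15, 33, 220], [18, 34, 205], [20, 35, 200], [22, 36, 185], [25, 35.5, 210], [17, 34.5, 230]]
y = [2000 + 10 * t - 5 * s + 0.5 * o for t, s, o in X]
coef = fit_linear(X, y)
print(coef)
print(predict_linear(coef, [19, 34.8, 215]))
```

In practice a nonlinear model and validation on withheld stations or years would be needed before trusting such a fit on the coastal, seasonal scales discussed here.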
Further comment: It remains unclear whether this is a surface data product or one that extends into the water column. I believe it covers the surface only, but this should be stated explicitly (e.g., in the title or abstract).
Citation: https://doi.org/10.5194/essd-2024-219-RC1
Data sets
Monthly Product of Marine Chemical Data in Australian Waters from 2000 to 2022 Xiaohan Zhang, Lizhe Wang, Jining Yan, and Sheng Wang https://doi.org/10.57760/sciencedb.09331
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
305 | 50 | 20 | 375 | 16 | 18 |