A 20-year (1998–2017) global sea surface dimethyl sulfide gridded dataset with daily resolution

Zhou, Shengqian; Chen, Ying; Huang, Shan; Gong, Xianda; Yang, Guipeng; Zhang, Honghai; Herrmann, Hartmut; Wiedensohler, Alfred; Poulain, Laurent; Zhang, Yan; Wang, Fanghui; Xu, Zongjun; Yan, Ke

DOI: https://doi.org/10.5194/essd-16-4267-2024

Articles | Volume 16, issue 9

© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Data description paper

19 Sep 2024

Download

Final revised paper (published on 19 Sep 2024)
Supplement to the final revised paper
Preprint (discussion started on 12 Dec 2023)
Supplement to the preprint

Interactive discussion

Status: closed

RC1: 'Comment on essd-2023-249', Murat Aydin, 27 Jan 2024

The manuscript by Zhou et al. offers a 20-year (1998–2017) global sea surface dimethyl sulfide (DMS) dataset with daily resolution. The new dataset is developed with an artificial neural network (ANN) ensemble model based on 9 environmental parameters. DMS is produced biogenically in the ocean and its emissions contribute to aerosol radiative forcing in the troposphere. There are a few other global ocean DMS emissions datasets, including one based on an artificial neural network. What makes this dataset unique is that it offers a high-time-resolution data product covering a 20-year period. The authors claim it is an improved emission inventory of oceanic DMS which can facilitate improved simulations of aerosols derived from DMS. This is a useful dataset with unique features that suits the Earth System Science Data journal's goal of publishing articles on original datasets. However, I would like to see the authors address my comments and questions below before the publication of the manuscript.
My primary concerns center on the comparisons between the ANN product and actual data displayed in Fig. 3. The statistical metrics chosen for arguing good agreement between the ANN product and the observations are R² and root mean square error (RMSE). These metrics are appropriate for testing the predictive capability of linear regressions, in other words the accuracy of a linear model, but they do not necessarily address the fidelity of the model data to reality. If we are to prefer the ANN data product over the observations to estimate DMS flux, the manuscript needs to present convincing evidence that the model vs. observations relationship is not only linear but also has a slope that centers around 1:1. In this context, it is not enough to show that there is a strong linear relationship between the ANN product and the observations; rather, the slope of the linear relationship should be quantified and ideally shown to be statistically indistinguishable from 1.
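The distinction between goodness of fit and fidelity can be illustrated with a minimal sketch on synthetic values. All numbers below are invented for illustration and are not taken from the manuscript. A prediction that is an exact linear function of the observations in log space has R² = 1 even when its slope is far from 1, while the RMSE against the 1:1 line stays nonzero.

```python
import math

def ols_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

def rmse(x, y):
    """Root mean square difference between y and x (distance from the 1:1 line)."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / len(x))

# Hypothetical log10(DMS_obs) values
log_obs = [-0.5, 0.0, 0.5, 1.0, 1.5]
# Hypothetical model with a damped response: slope 0.6 in log space
log_model = [0.1 + 0.6 * v for v in log_obs]

slope, intercept, r_squared = ols_fit(log_obs, log_model)
error = rmse(log_obs, log_model)
```

Here `r_squared` is 1 to within rounding although `slope` is 0.6, which is why the comment asks for the slope to be reported alongside R² and RMSE.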
Fig. 3 offers only qualitative information about the value of the slopes. I found the data density color scale helpful in trying to estimate what the slope of the best fits to these scatter plots might be, but the slopes should really be quantified in the manuscript. The manuscript includes a passing reference to a potential bias issue with regard to the coastal region. I agree that, if the bias is limited to high concentrations in that region alone, this would not be a big deal. However, looking at Fig. 3, I suspect that the slopes might be different from unity for multiple regions, although I cannot be certain without seeing a proper analysis. The fact that the entire analysis is in log-log space makes me more worried, because small-looking deviations in a log-log linear relationship can result in significant biases in actual concentration and flux calculations.
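To make the log-log concern concrete, here is a small sketch that assumes a hypothetical fit log10(DMS_model) = a + b·log10(DMS_obs). The coefficients are illustrative, not values from the manuscript. Converting the fitted line back to concentration space gives DMS_model / DMS_obs = 10^a · DMS_obs^(b − 1).

```python
def model_to_obs_ratio(obs, slope, intercept=0.0):
    """Ratio DMS_model / DMS_obs implied by a log10-log10 linear fit."""
    return 10 ** intercept * obs ** (slope - 1.0)

# Illustrative slope of 0.8 with zero intercept, so the fit crosses
# the 1:1 line at obs = 1 (in whatever concentration unit is used)
ratios = {obs: model_to_obs_ratio(obs, slope=0.8)
          for obs in (0.1, 1.0, 10.0, 100.0)}
```

Under these assumptions, a slope of 0.8 implies roughly 37 % underestimation at ten times the crossover concentration and about 60 % at a hundred times, with about 58 % overestimation at one tenth of it.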
One can think of many different ways to conduct this type of analysis, but I would advise investigating the residuals of the scatter plots in Fig. 3 from the 1:1 line versus DMS_obs. Fitting linear regression lines to these residual plots would be a good way to test for biases; ideally these slopes should equal zero within uncertainties, meaning the residuals do not have a positive or negative relationship with DMS_obs. For example, Figs. 4b,c display linear fits to the data and these slopes are different from unity. This is also noticeable in Fig. 4a, as most of the higher concentrations during July-Aug are underestimated by the model and conversely the lower concentrations in the winter tend to be overestimated, leading to a damped seasonal cycle. I grant that the differences look small in Fig. 4a, but given that this figure too is on a logarithmic scale, it would be good to see a formal quantification of fluxes generated with observed data versus the simulated ones. Do under- and overestimations at either end cancel each other out, or does one win out over the other, leading to biases in the annual fluxes? I have a cautionary note on conducting linear fits. For your raw data, I’m guessing the errors for individual DMS_obs will be very small compared with the dynamic range of the dataset, therefore the x errors can be ignored. Likewise, it is probably reasonable to assume y errors are uniform, meaning standard (least-squares in the y direction) linear regression analyses could be safely implemented. For the regionally averaged data shown in Fig. 4c, the x errors look very large and both x and y errors look nonuniform, meaning a standard linear regression approach will yield inaccurate estimates of the slope and its confidence band.
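The residual check suggested above could be sketched as follows. This is only one possible implementation on invented numbers. Residuals are taken as log10(model) − log10(obs), i.e. departures from the 1:1 line, and an ordinary least-squares slope with its standard error is fitted against log10(obs). Ordinary least squares assumes negligible x errors, which the comment considers reasonable for the raw data but not for the regionally averaged data of Fig. 4c.

```python
import math

def residual_trend(log_obs, log_model):
    """Regress residuals (model - obs) on obs.

    Returns (slope, standard_error). A slope near zero means no
    concentration-dependent bias.
    """
    n = len(log_obs)
    resid = [m - o for o, m in zip(log_obs, log_model)]
    mean_x = sum(log_obs) / n
    mean_r = sum(resid) / n
    sxx = sum((x - mean_x) ** 2 for x in log_obs)
    sxy = sum((x - mean_x) * (r - mean_r) for x, r in zip(log_obs, resid))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_x
    # Scatter about the fitted line, with n - 2 degrees of freedom
    sse = sum((r - (intercept + slope * x)) ** 2
              for x, r in zip(log_obs, resid))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err

# Hypothetical data: low values overestimated, high values underestimated
log_obs = [-0.4, -0.1, 0.2, 0.5, 0.8, 1.1]
log_model = [-0.22, -0.05, 0.21, 0.42, 0.66, 0.85]

slope, std_err = residual_trend(log_obs, log_model)
# Rough two-standard-error criterion (an assumption, not a formal test)
biased = abs(slope) > 2 * std_err
```

A negative residual slope, as in this invented example, corresponds to the damped response described for Fig. 4a.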
Some other shorter general comments and questions:
I’m confused about how the data from different time periods are treated during the training and validation steps of ANN model development. As far as I can tell, you use all data from all periods in training and validation. Once you have the ANN model, you input time variable parameters to estimate temporal changes in concentrations and fluxes, is this correct? Your criticism of previous work for using data from different time periods to estimate a global average flux does not seem justified because you seem to develop your model in the same fashion, or am I missing something?
It would be good to see how much data each region contributes to the full dataset. The coastal region appears to contribute the most even though the emissions from the coastal regions constitute only 3% of the global DMS flux, and conversely the trades regions have little data even though the integrated fluxes in these regions are high. Did you try training the model without the coastal data to see if the model results change?
What are the contributions of the 9 different model parameters to the final outcome? Which parameters carry more important information according to your ANN model?
Was the ANN allowed to freely choose model equations? Did you impose any restrictions or try other models?
Minor comments/corrections as they appear in the manuscript:
Lines 81-82: This sentence gives the impression that you are not using all data from all years with equal weight.
Line 113: Are you using exactly the same data that went into Hulswar et al. (2022)?
Lines 128-131: What happens in SI covered areas? What level of SI cover leads to zero emissions?
Line 143: Are the SeaWiFS and Aqua-MODIS data in reasonable agreement?
Fig. 4a: The markers look quite faint on my screen. I suggest sharper colors.
Lines 339-344: Refer to Fig. 9 somewhere.
Lines 388-390: What drives the trends in Kt?

Citation: https://doi.org/10.5194/essd-2023-249-RC1
- AC2: 'Reply on RC1', Shengqian Zhou, 07 Apr 2024
  
  Dear Dr. Murat Aydin,
  Thank you for your interest in our work and taking the time to review our submission. Your constructive comments and suggestions are of great help to improving our dataset and manuscript. Please find our replies to your comments in the attached pdf file.
  
  Citation: https://doi.org/10.5194/essd-2023-249-AC2
RC2: 'Comment on essd-2023-249', Anonymous Referee #2, 29 Jan 2024

The article of Zhou and colleagues presents a novel global gridded dataset of sea-surface DMS concentration and emission based on the ANN technique. Given that DMS is the main biogenic source of atmospheric sulfur globally, the development of approaches that enable the production of detailed DMS emission maps is crucial for atmospheric chemistry and climate studies. The advantage of the new dataset over previous ones is to be found in its daily temporal resolution and multiyear coverage, which (unfortunately) is not matched by increased spatial resolution.
The article is generally well written and gives compelling arguments (e.g. in a strong Introduction) for the wide use of this novel dataset. Beyond the time-resolved fields, other welcome innovations with respect to previous machine learning approaches are the exclusion of time and coordinates as predictor variables (which should enhance model generality and decrease the risk of overfitting) and the validation against fully independent datasets. Below I make some suggestions. I also propose some non-exhaustive corrections to the writing, and encourage the authors to undertake a general check of English grammar.

Specific comments:
L65: please consider citing:
Galí, M., & Simó, R. (2015). A meta‐analysis of oceanic DMS and DMSP cycling processes: Disentangling the summer paradox. Global Biogeochemical Cycles, 29(4), 496-515.
Hopkins, F. E., Archer, S. D., Bell, T. G., Suntharalingam, P., & Todd, J. D. (2023). The biogeochemistry of marine dimethylsulfide. Nature Reviews Earth & Environment, 4(6), 361-376.
L89: the reference to Galí 2021 is incorrect (no machine learning used in that study). The following references to machine learning studies should be included:
McNabb, B. J., & Tortell, P. D. (2022). Improved prediction of dimethyl sulfide (DMS) distributions in the northeast subarctic Pacific using machine-learning algorithms. Biogeosciences, 19(6), 1705-1721.
McNabb, B. J., & Tortell, P. D. (2023). Oceanographic controls on Southern Ocean dimethyl sulfide distributions revealed by machine learning algorithms. Limnology and Oceanography, 68(3), 616-630.
L91, entire paragraph: note that higher temporal resolution would be even more valuable if accompanied by higher spatial resolution. Daily resolution (e.g. satellite data) typically shows (sub)mesoscale patterns that are blurred at 1 degree or after monthly averaging.
L124: Is the information on Lat-Lon-Cap 90 really needed here?
L126: using climatological data (nutrients, O2) to produce daily multiyear datasets is a bit paradoxical (as discussed later)
Figure 4: Line P still shows large interannual variability in late summer (Aug) that is not captured by the ANN. This would be clearer if a linear (not log) y-scale was used, and may deserve some discussion.
L284: this is also consistent with the phytoplankton spring-summer bloom patterns…
Fig. 6 caption: “for each grid point”
Fig. 8: inclusion of Kt is welcome
L354: are these mean concentrations weighted by pixel (grid cell) area?
L362: this feature of G18 may be due to overestimation of Chl by satellites in coastal regions because of the interference of CDOM and non-algal detrital particles.
Fig. 8, 10, 11: I recommend reporting Kt in m d-1 rather than m s-1
L455: but note that G18 and W20 can be used to produce daily multiyear DMS fields as Z23. This is not possible for interpolated climatologies L11 and H22.
Fig. 12a: please use the same colours as in Fig. 9e to distinguish the different algorithms
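On the area-weighting question raised for L354: on a regular latitude-longitude grid, cell area is proportional to the cosine of the cell-centre latitude, so an unweighted mean over-represents high latitudes. A minimal sketch with hypothetical values, not the authors' procedure:

```python
import math

def area_weighted_mean(values, lats_deg):
    """Mean of per-cell values weighted by cos(latitude) of the cell centres."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical cell-centre latitudes and DMS concentrations
lats = [0.5, 30.5, 60.5]
dms = [2.0, 2.0, 5.0]

weighted = area_weighted_mean(dms, lats)
unweighted = sum(dms) / len(dms)
```

In this invented case the high value sits at high latitude, so the weighted mean is lower than the unweighted one.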

Suggested rewording:
L103: “demonstrated” >> “shown, depicted”
L171 “Root(ed) mean square error”
L176 “is larger” >> “exceeds”
L259: “off-line” >> “discrete sampling (Niskin bottle)”
L505: revise grammar

Citation: https://doi.org/10.5194/essd-2023-249-RC2
- AC1: 'Reply on RC2', Shengqian Zhou, 07 Apr 2024
  Thank you for your interest in our work and taking the time to review our submission. Your constructive suggestions and feedback have improved our study.
  To fully address reviewers’ concerns and further improve the model and subsequent DMS data product, we have reconstructed a new model. This involved incorporating additional training data, changing the data sources of input features, and implementing more reasonable data processing strategies. A new global daily multiyear DMS gridded dataset was obtained, and all figures in the manuscript have been updated accordingly. Here we provide an overview of the major modifications we have made in the model development and evaluation.
  We included more DMS observation data for training. These data originate from eight campaigns that have not been incorporated into the GSSD database but were included in Hulswar et al. (2022). The number of new samples is 6711.
  
  We changed the data sources of Chl a, nitrate, phosphate, silicate, and DO. Currently, the time resolutions of all input features are one day.
  
  We adjusted the fraction of coastal samples in the training, validation, and testing sets to mitigate the overrepresentation of coastal regions. We also applied a weighted resampling strategy in the data-splitting process to mitigate the data imbalance between extreme and moderate DMS values. This treatment mildly decreases the overall performance (a slight increase in overall RMSE), but significantly reduces the prediction biases for extremely high and extremely low DMS concentrations.
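The reply does not spell out the weighted resampling scheme, so the following is only a generic sketch of the idea under stated assumptions. Samples are binned by log10(DMS), and each sample is drawn with probability inversely proportional to the population of its bin, which flattens the distribution between moderate and extreme values. The bin width, the sample values, and the function name are hypothetical.

```python
import math
import random
from collections import Counter

def inverse_frequency_weights(dms_values, bin_width=0.5):
    """Weight each sample by 1 / (number of samples in its log10(DMS) bin)."""
    bins = [math.floor(math.log10(v) / bin_width) for v in dms_values]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]

# Hypothetical concentrations: many moderate values, few extremes
dms = [0.05] + [1.0, 1.5, 2.0, 2.5, 3.0, 2.2, 1.8, 2.8] + [40.0]
weights = inverse_frequency_weights(dms)

rng = random.Random(0)  # fixed seed so the draw is reproducible
resampled = rng.choices(dms, weights=weights, k=1000)
```

With these weights each occupied bin carries the same total probability, so the single high value is drawn about as often as the eight moderate values combined.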
  
  Other minor adjustments to make the model development and evaluation procedures more reasonable:
  We adjusted the model structure and applied L2 regularization to prevent overfitting.
  
  The fraction of the testing set was elevated, and the figures (e.g., Fig. 4c in the revised manuscript) to demonstrate model performance are based on the testing set, not based on training and validation sets as before.
  
  The discussion of fitting residual and bias has been added.
  
  To further increase the data volume for training, the data of the first two NAAMES campaigns were moved to the training set. We keep the third NAAMES campaign for independent testing.
  
  When comparing the predictions and observations for TRANSBROM SONNE and NAAMES campaigns, the data were not binned into 1°×1° first. Instead, they were binned into 0.05°×0.05°, following the treatment for the training set.
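Binning point observations onto a 0.05° × 0.05° grid, as described in the last item, could look like the sketch below. Averaging within each cell is an assumption here, since the reply does not state the aggregation statistic, and the coordinates and values are invented.

```python
import math
from collections import defaultdict

def bin_to_grid(samples, cell_deg=0.05):
    """Average (lat, lon, value) samples that fall in the same grid cell.

    Returns {(lat_index, lon_index): mean_value}. Cell (i, j) spans
    [i * cell_deg, (i + 1) * cell_deg) in latitude, likewise in longitude.
    """
    cells = defaultdict(list)
    for lat, lon, value in samples:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}

# Hypothetical ship-track samples: (latitude, longitude, DMS)
samples = [
    (45.01, -40.02, 2.0),
    (45.03, -40.04, 4.0),  # same cell as the first sample
    (45.07, -40.02, 6.0),  # next cell to the north
]
gridded = bin_to_grid(samples)
```

Using the same cell size for evaluation as for training keeps the two comparable, which appears to be the motivation given in the reply.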
  
  The point-to-point replies to your comments are presented in the attached pdf file. We have also checked the grammar thoroughly and improved the language.
  
  Citation: https://doi.org/10.5194/essd-2023-249-AC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Shengqian Zhou on behalf of the Authors (07 Apr 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (09 Apr 2024) by François G. Schmitt

RR by Murat Aydin (02 May 2024)

Suggestions for revision or reasons for rejection

I acknowledge the extensive nature of revisions to the paper. While many of the revisions resulted in improvements, some feel like a step backward. Given that this is the second go-around, I will not dwell on minor issues. I do recognize the value of the ANN data set in terms of high resolution in both temporal and spatial scales. My main concerns are related to the monthly and annual fluxes, specifically the fidelity of these estimates to reality as defined by the available observations.

There is only one way to test the accuracy of the ANN data product: it has to be compared with the DMS obs that underlie the training of the machine learning process. In my first review, I suspected that the linear regressions between the DMS obs and the ANN estimates yielded slopes significantly different from 1 and suggested the residuals might be correlated with the observations. I further added that the statistical metrics they relied on were insufficient to adequately evaluate the accuracy of the data product. The additional analyses the authors conducted based on the review confirm my suspicions were correct. While I appreciate the effort that went into the revisions, I do have misgivings about a major aspect of the revisions they implemented and suggest further revisions.

The weighting scheme implemented to increase the influence of low and high concentrations on the results is a data analysis gimmick aimed at improving the linear regressions with respect to the deficiencies I outlined in the first review. I do not believe it is appropriate to manipulate the distribution of the training data in this manner unless there are real-life reasons (related to the real-world ocean and how it has been sampled) why lower and higher DMS concentrations are underrepresented in the observational data sets. The manuscript offers no such justification. As such, they would be better off presenting the original ANN results as the main data product and offering the weighting-based results as supplementary analysis. When referring to this supplementary analysis, you should discuss in the main body of the manuscript why it was conducted. In my view, the implemented weighting scheme does not make enough of a difference in the end, and I remain unconvinced that the problematic aspects of the linear regressions are caused by extreme concentrations that constitute a small fraction of the data set. There appears to be a systematic issue for reasons that remain unclear to this reviewer.

Further, I do not like the fact that the comparison of the training data versus the observations is no longer shown in the main manuscript. If the number of figures in the manuscript is a problem, I suggest moving the residual figures to the supplement and showing the main comparison figures with respect to both the training and test data in the main body. The slope values should be displayed in all sets of figures. Most readers may not readily infer the implications of trending residuals, and the manuscript does not offer a detailed enough discussion.

A welcome revision to the manuscript is the inclusion of regional mean and normalized mean bias estimates presented in Table 2. However, this is the bare minimum necessary since the positive and negative biases that occur at high and low ends of the concentrations tend to cancel out during the averaging, therefore hindering insight into the biases at grid scale let alone how these biases impact the regional and global fluxes. I’m willing to accept these outstanding issues as subjects of future work as long as they are pointed out in the paper.
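The cancellation effect can be shown with a few invented numbers. Normalized mean bias is taken here as Σ(model − obs) / Σ(obs), a common definition that may differ from the one used in Table 2.

```python
def mean_bias(obs, model):
    """Average signed difference, model minus observation."""
    return sum(m - o for o, m in zip(obs, model)) / len(obs)

def normalized_mean_bias(obs, model):
    """Sum of signed differences divided by the sum of observations."""
    return sum(m - o for o, m in zip(obs, model)) / sum(obs)

def mean_absolute_error(obs, model):
    """Average unsigned difference, which does not cancel."""
    return sum(abs(m - o) for o, m in zip(obs, model)) / len(obs)

# Hypothetical region: low values overestimated, high values underestimated
obs = [1.0, 2.0, 6.0, 7.0]
model = [2.0, 3.0, 5.0, 6.0]

mb = mean_bias(obs, model)
nmb = normalized_mean_bias(obs, model)
mae = mean_absolute_error(obs, model)
```

Both signed metrics come out as zero here even though every individual prediction is off by one unit, so an unsigned measure or a binned breakdown is needed to expose grid-scale bias.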


ED: Reconsider after major revisions (04 May 2024) by François G. Schmitt

AR by Shengqian Zhou on behalf of the Authors (26 Jun 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (07 Jul 2024) by François G. Schmitt

RR by Murat Aydin (22 Jul 2024)

ED: Publish as is (22 Jul 2024) by François G. Schmitt

AR by Shengqian Zhou on behalf of the Authors (25 Jul 2024) Manuscript

Post-review adjustments

AA: Author's adjustment | EA: Editor approval

AA by Shengqian Zhou on behalf of the Authors (13 Sep 2024) Author's adjustment Manuscript

EA: Adjustments approved (18 Sep 2024) by François G. Schmitt

The requested paper has a corresponding corrigendum published. Please read the corrigendum first before downloading the article.

Short summary

Dimethyl sulfide (DMS) is a crucial natural reactive gas in the global climate system due to its great contribution to aerosols and subsequent impact on clouds over remote oceans. Leveraging machine learning techniques, we constructed a long-term global sea surface DMS gridded dataset with daily resolution. Compared to previous datasets, our new dataset holds promise for improving atmospheric chemistry modeling and advancing our comprehension of the climate effects associated with oceanic DMS.