the Creative Commons Attribution 4.0 License.
IPB-MSA&SO4: a daily 0.25° resolution dataset of In-situ Produced Biogenic Methanesulfonic Acid and Sulfate over the North Atlantic during 1998–2022 based on machine learning
Abstract. Accurate long-term marine-derived biogenic sulfur aerosol concentrations at high spatial and temporal resolutions are critical for a wide range of studies including climatology, trend analysis, model evaluation, accurate investigation of their contribution to aerosol burden, or to elucidate their radiative impacts and to provide boundary conditions for regional models. By applying machine learning algorithms, we constructed the first, publicly available, daily gridded dataset of in-situ produced biogenic methanesulfonic acid (MSA) and sulfate (SO4) concentrations covering the North Atlantic Ocean. The dataset is of high spatial resolution of 0.25° × 0.25°, spanning 25 years (1998–2022), far exceeding what observations alone could achieve both space- and time-wise. The machine learning models were generated by combining in-situ observations of sulfur aerosol data at Mace Head research station, west coast of Ireland, and from NAAMES cruises in the NW Atlantic, combined with the constructed sea-to-air dimethylsulfide flux (FDMS) and ECMWF-ERA5 reanalysis datasets. To determine the optimal method for regression, we employed four machine learning model types: support vector machines, ensemble, Gaussian process, and artificial neural networks. A comparison of the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) revealed that the Gaussian process regression (GPR) was the most effective algorithm, outperforming the other models in simulating the biogenic MSA and SO4 concentrations. For predicting daily MSA (SO4), GPR displayed the highest R² value of 0.86 (0.72) and the lowest MAE of 0.014 (0.10) µg m⁻³. The GPR partial dependence analysis suggests that the relationships between predictors and MSA and SO4 concentrations are complex rather than linear.
Using the GPR algorithm, we produced a high-resolution daily dataset of In-situ Produced Biogenic MSA and SO4 sea-level concentrations over the North Atlantic, which we named IPB-MSA&SO4. The obtained IPB-MSA&SO4 data allowed us to analyze the spatiotemporal patterns of MSA, SO4, and the ratio between them (MSA:SO4). A comparison with the existing CAMS-EAC4 reanalysis suggests that our high-resolution dataset reproduces with high accuracy the spatial and temporal patterns of the biogenic sulfur aerosol concentration and has high consistency with independent measurements in the Atlantic Ocean. The IPB-MSA&SO4 is publicly available at https://doi.org/10.17632/j8bzd5dvpx.1 (Mansour et al., 2023b).
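The three evaluation metrics named in the abstract (MAE, RMSE, R²) can be illustrated with a minimal Python sketch. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper; R² is computed here as 1 − SS_res/SS_tot, which is an assumption, since the abstract does not state which definition the authors used.

```python
import numpy as np


def regression_metrics(observed, predicted):
    """Return MAE, RMSE and R^2 for paired observations and predictions.

    Hypothetical helper for illustration only; R^2 is taken as
    1 - SS_res / SS_tot, which may differ from the authors' definition.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    residuals = obs - pred

    mae = float(np.mean(np.abs(residuals)))          # mean absolute error
    rmse = float(np.sqrt(np.mean(residuals ** 2)))   # root mean square error

    ss_res = float(np.sum(residuals ** 2))           # residual sum of squares
    ss_tot = float(np.sum((obs - obs.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up daily concentrations in µg m^-3, not values from the dataset.
observed = [0.02, 0.05, 0.10, 0.08]
predicted = [0.03, 0.04, 0.09, 0.10]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and RMSE carry the units of the predicted quantity, so they are comparable with the µg m⁻³ figures quoted above, whereas R² is dimensionless.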
Status: closed
RC1: 'Comment on essd-2023-352', Anonymous Referee #1, 07 Jan 2024
Mansour et al. used a machine-learning model to predict biogenic methanesulfonic acid (MSA) and sulfate (SO4) concentrations over the North Atlantic Ocean. Overall, the study is very interesting and falls within the scope of ESSD. However, the manuscript still suffers from some major weaknesses. I recommend the manuscript for publication in ESSD after the following comments have been addressed.
- The novelty of the dataset should be clarified in the introduction.
- Why do you not use physical model output (e.g., MITgcm, ROMS) as input for the machine-learning model? As shown in Figure 1, the central Atlantic lacks measurements, and this region might show large uncertainties when based on machine-learning models alone.
- You used many variables to train the machine-learning model. However, I felt these predictors were not strong proxies for MSA and sulfate. Why not use an SO2 satellite product for sulfate estimation?
- Why do you use only these four machine-learning models to predict MSA and sulfate? Please explain the reason. To the best of my knowledge, decision tree models and deep learning might perform better than ANN and SVM.
- In Section 4.6, the discussion of the spatiotemporal variations of MSA and sulfate seems very superficial; I suggest the authors add more in-depth analysis.
Citation: https://doi.org/10.5194/essd-2023-352-RC1
AC1: 'Reply on RC1', Karam Mansour, 21 Feb 2024
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2023-352/essd-2023-352-AC1-supplement.pdf
RC2: 'Comment on essd-2023-352', Anonymous Referee #2, 01 Feb 2024
The study of Mansour and colleagues represents a step forward towards the prediction of biogenic sulfur in aerosols, which has climatic and geochemical importance. The authors used several machine learning approaches, each with alternative configurations, to estimate the concentration of the two major atmospheric oxidation products of plankton-made dimethyl sulfide: non-sea-salt sulfate and methanesulfonate. Finally, the best performing model was used to produce daily gridded datasets for these compounds over the North Atlantic Ocean. I found the study methodologically robust and well written, but some issues should be addressed before publication.
General comments
I suggest using nss-SO4, not just SO4, throughout. Abbreviating nss-SO4 may confuse readers because, unlike MSA, SO4 has large anthropogenic and volcanic sources. The same applies to MSA:nss-SO4 ratios.
L141: Please provide a quantitative comparison between nss-SO4 and the non-refractory SO4 pool measured with the HR-ToF-AMS, e.g. an indication of the mean absolute and/or relative difference between the two estimates. Just stating they are “approximately equivalent” is not very reassuring. Can the authors exclude the possibility that, in some instances, significant proportions of nss-SO4 are in aerosol fractions not captured by the HR-ToF-AMS?
The authors use HYSPLIT driven by the Global Data Assimilation System (GDAS1) (1° × 1°) of the National Centers for Environmental Prediction (NCEP) to calculate back-trajectories (section 2.3). A different reanalysis, ERA5, is used to obtain meteorological predictor variables for machine learning methods (section 2.5), as well as the BLH used to analyze HYSPLIT-derived back trajectories (section 3.1.1). Can the use of different reanalyses in different parts of the study introduce inconsistencies?
Section 3.3: please consider reporting other metrics, like the Prediction-Observation linear slope (which would be 1 for perfect model predictions) -- OK, this is shown in Fig. 6 and 7. Just consider introducing this metric in section 3.3.
L272: How can this procedure prove causal relationships?
Specific
L25, L85…: “constructed” >> “reconstructed”
L27: what is the “ensemble” ML method? OK, later defined as "regression ensemble"
L42: marine phytoplankton >> marine microbes (phytoplankton are not the only DMS producers)
L49: elevated temperature and solar radiation >> elevated temperature OR solar radiation
L101 and paragraph: please reconsider whether the AMOC is the phenomenon you actually want to highlight here. Perhaps a mention of the Gulf Stream and the North Atlantic Current is enough (these are indeed components of the much wider phenomenon termed AMOC).
L238: were predictors averaged with or without the weighting factor e^(-t/72) used to compute R_0 and R_B? It would make sense to apply this weighting when using the meteorology along the BTs as a predictor.
L272: was MLR applied to untransformed or log-transformed data (as done for the correlation analysis)?
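To make the weighting raised in the L238 comment concrete, the following Python sketch averages a predictor along a back trajectory with weights e^(-t/72), where t is taken to be the time in hours before arrival. This is only an illustration of the suggestion: the function name `weighted_trajectory_mean`, the unit of t, and the sample values are assumptions, and the sketch does not describe the authors' actual procedure.

```python
import numpy as np


def weighted_trajectory_mean(values, hours_back, tau_hours=72.0):
    """Average a predictor along a back trajectory with exp(-t / tau) weights.

    Hypothetical helper: `hours_back` is assumed to be the time (in hours)
    before arrival at the receptor, so recent points weigh more.
    """
    v = np.asarray(values, dtype=float)
    t = np.asarray(hours_back, dtype=float)
    weights = np.exp(-t / tau_hours)
    return float(np.sum(weights * v) / np.sum(weights))


# Made-up wind speeds (m/s) sampled every 24 h along a 72 h back trajectory.
wind_speed = [8.0, 10.0, 12.0, 6.0]
hours_back = [0.0, 24.0, 48.0, 72.0]

print(weighted_trajectory_mean(wind_speed, hours_back))  # weighted mean
print(float(np.mean(wind_speed)))                        # unweighted mean
```

Comparing the two printed values shows how much the weighting shifts the predictor towards conditions close to the receptor.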
Typos
L232: “NAAMEAS” cruises
L477: “Quantitively”
L529: southern >> southward
Citation: https://doi.org/10.5194/essd-2023-352-RC2
AC2: 'Reply on RC2', Karam Mansour, 21 Feb 2024
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2023-352/essd-2023-352-AC2-supplement.pdf
Data sets
IPB-MSA&SO4: In-situ Produced Biogenic Methanesulfonic Acid and Sulfate over the North Atlantic. Karam Mansour, Stefano Decesari, Darius Ceburnis, Jurgita Ovadnevaite, Lynn Russell, Marco Paglione, Colin O'Dowd, and Matteo Rinaldi. https://doi.org/10.17632/j8bzd5dvpx.1