the Creative Commons Attribution 4.0 License.
Global biogeography of N2-fixing microbes: nifH amplicon database and analytics workflow
Abstract. Marine nitrogen (N) fixation is a globally significant biogeochemical process carried out by a specialized group of prokaryotes (diazotrophs), yet our understanding of their ecology is constantly evolving. Although marine dinitrogen (N2)-fixation is often ascribed to cyanobacterial diazotrophs, indirect evidence suggests that non-cyanobacterial diazotrophs (NCDs) might also be important. One widely used approach for understanding diazotroph diversity and biogeography is polymerase chain reaction (PCR)-amplification of a portion of the nifH gene, which encodes a structural component of the N2-fixing enzyme complex, nitrogenase. An array of bioinformatic tools exists to process nifH amplicon data; however, the lack of standardized practices has hindered cross-study comparisons. This has led to a missed opportunity to more thoroughly assess diazotroph biogeography, diversity, and their potential contributions to the marine N cycle. To address these knowledge gaps, a bioinformatic workflow was designed that standardizes the processing of nifH amplicon datasets originating from high-throughput sequencing (HTS). Multiple datasets are efficiently and consistently processed with a specialized DADA2 pipeline to identify amplicon sequence variants (ASVs). A series of customizable post-pipeline stages then detect and discard spurious nifH sequences and annotate the subsequent quality-filtered nifH ASVs using multiple reference databases and classification approaches. This newly developed workflow was used to reprocess nearly all publicly available nifH amplicon HTS datasets from marine studies, and to generate a comprehensive nifH ASV database containing 7909 ASVs aggregated from 21 studies that represent the diazotrophic populations in the global ocean. For each sample, the database includes physical and chemical metadata obtained from the Simons Collaborative Marine Atlas Project (CMAP).
Here we demonstrate the utility of this database for revealing global biogeographical patterns of prominent diazotroph groups and highlight the influence of sea surface temperature. The workflow and nifH ASV database provide a robust framework for studying marine N2 fixation and diazotrophic diversity captured by nifH amplicon HTS. Future datasets that target understudied ocean regions can be added easily, and users can tune parameters and studies included for their specific focus. The workflow and database are available, respectively, in GitHub (https://github.com/jdmagasin/nifH-ASV-workflow; Morando et al., 2024) and Figshare (https://doi.org/10.6084/m9.figshare.23795943.v1; Morando et al., 2024).
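The bookkeeping implied by a multi-stage workflow of this kind (how many reads survive each stage, and how ASV counts are normalized within a sample before cross-study comparison) can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the authors' implementation, which lives in the GitHub repositories cited above. The stage names, function names, and read counts are hypothetical and chosen only for the example.

```python
from typing import Dict, List

# Hypothetical stage names, for illustration only; the real workflow
# defines its own stages.
STAGES: List[str] = ["raw", "dada2_asvs", "quality_filtered", "annotated"]


def percent_retained(counts: Dict[str, int], stages: List[str]) -> Dict[str, float]:
    """Percent of the initial reads still present after each stage.

    `counts` maps a stage name to the number of reads remaining after it.
    The first entry in `stages` is treated as the starting point (100 %).
    """
    initial = counts[stages[0]]
    if initial <= 0:
        raise ValueError("initial read count must be positive")
    return {stage: 100.0 * counts[stage] / initial for stage in stages}


def relative_abundance(asv_counts: Dict[str, int]) -> Dict[str, float]:
    """Per-sample relative abundance (%) of each ASV.

    Dividing by the sample's own total removes differences in
    sequencing depth between samples.
    """
    total = sum(asv_counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {asv: 100.0 * n / total for asv, n in asv_counts.items()}


if __name__ == "__main__":
    # Made-up read counts for one hypothetical sample.
    example = {"raw": 1000, "dada2_asvs": 800, "quality_filtered": 600, "annotated": 500}
    for stage, pct in percent_retained(example, STAGES).items():
        print(f"{stage}: {pct:.1f}% of reads retained")
    print(relative_abundance({"ASV1": 30, "ASV2": 10}))
```

Tracking retention per stage in this way makes it easy to see where a given study loses most of its reads, while per-sample normalization is one common way to put samples of different sequencing depth on a comparable scale.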
Status: final response (author comments only)
RC1: 'Comment on essd-2024-163', Anonymous Referee #1, 21 Jul 2024
The manuscript by Morando et al. reports a standardized pipeline to process microbial molecular data for cross-study comparison and for revealing the distribution patterns of marine nitrogen-fixing organisms. Overall, the value of this study should be recognized given nitrogen-fixers’ ecological role in the ocean and their sensitivity to future climate change (i.e. temperature), and the study is well suited for prompt publication in ESSD. At present I suggest the authors consider a few issues to improve the clarity of the manuscript.
(1) The purpose of establishing such a novel processing approach was to allow cross-study comparisons. It would be helpful to present some improved aspects of this new pipeline compared to previous ones; especially for general marine biologists, some of the following questions could be better addressed: What would change if higher or lower retention of reads occurred (Fig. 3)? Could the interpretation of marine diazotroph distribution be somewhat different if higher or lower % retention of reads occurred at each stage of the post-pipeline workflow (Fig. 4)? The previously reported pipelines may be tuned to suit some specific goals; could the new pipeline be used to suit those goals?
(2) Similar to (1), I think it may be useful to compare the relative abundance of each nifH cluster in previous studies that used customized pipelines with that in the present study (shown in Fig. 7b). The authors could also explore the advantages and disadvantages of the new pipeline, e.g. whether it is more or less sensitive to particular nifH clusters.
(3) Does the temperature-dependent distribution of nifH clusters (Fig. 8b) include only surface samples? Were deep samples, e.g. below 75 m (shown in Fig. 6b), excluded? I think it is more reasonable to plot SST vs. surface samples, say the top 10 m. If all samples from surface to deep were included, can the authors explain the reasons?
(4) Please provide details of acquiring DNA and cDNA datasets (in Fig. 6) in the methods section preferably, and discuss the value of these datasets, their relative advantages and/or disadvantages, etc.
(5) In Figure 7, why is it necessary to show % total reads (in panel a) and % relative abundance (in panel b) together? Most of the studies have similar abundance vs. reads of respective clusters, but “Shiozaki_2020” shows an obvious difference, e.g. higher % reads of “cluster 3” (light purple) but higher abundance of “others” (deep purple). Can the authors provide more explanation?
Citation: https://doi.org/10.5194/essd-2024-163-RC1
RC2: 'Comment on essd-2024-163', Anonymous Referee #2, 25 Jul 2024
Morando and coauthors developed a bioinformatic workflow to analyze nifH gene amplicon sequences. This workflow was applied to analyze nifH amplicon datasets compiled from 21 studies and to build a nifH ASV database along with physical and chemical parameters extracted from CMAP. The workflow and nifH ASV database can facilitate comparison of marine diazotroph diversity and biogeography across studies. The manuscript is well-written. My major comments are: 1) In addition to the variations in software pipelines and parameters used to analyze nifH sequences by different studies, sample collection, DNA/RNA extraction and PCR conditions vary across studies, which makes cross-study comparisons challenging. Could you also provide some guidelines or best practices for these procedures? 2) How much difference does the new workflow make in the resulting diazotroph diversity compared with the sequence analysis procedures used in previous studies? It may be good to show an example in terms of the retained reads, identified ASVs, and relative abundance of different diazotrophs, comparing this new workflow with the previous studies.
Below are some minor comments.
Line 10: Marine nitrogen (N2) fixation…
Line 17: diazotroph diversity, biogeography, and …
Line 47: Benavides et al. reference year is missing.
Line 337: reference database (DB)
Line 381: only removing 94 samples out of total xx samples
Figure 3. Could you show the proportion of reads retained as a percentage, as you did in Table 4 and Figure 4?
Figure 4 caption: Study-specific loss of reads? Why not show the subplots in alphabetical order to be consistent with other figures and tables across the study?
Figure 5. Are these stacked bars? Or are the Northern and Southern Hemisphere bars overlapping?
Figure 6. Please double-check the sampling locations. Atlantic is missing.
Figure 7. Could you clarify the difference between % total reads and % relative abundance?
Line 597: nifH cluster 1E is not shown in the figure 7 legend.
Citation: https://doi.org/10.5194/essd-2024-163-RC2
AC1: 'Comment on essd-2024-163', Jonathan Magasin, 22 Oct 2024
Data sets
nifH ASV database [Global biogeography of N2-fixing microbes: nifH amplicon database and analytics workflow] Michael Morando, Jonathan Magasin, Shunyan Cheung, Matthew M. Mills, Jonathan P. Zehr, and Kendra A. Turk-Kubo https://doi.org/10.6084/m9.figshare.23795943.v1
Interactive computing environment
DADA2 nifH pipeline Michael Morando, Jonathan Magasin, Shunyan Cheung, Matthew M. Mills, Jonathan P. Zehr, and Kendra A. Turk-Kubo https://github.com/jdmagasin/nifH_amplicons_DADA2
nifH ASV workflow (post-pipeline stages) Michael Morando, Jonathan Magasin, Shunyan Cheung, Matthew M. Mills, Jonathan P. Zehr, and Kendra A. Turk-Kubo https://github.com/jdmagasin/nifH-ASV-workflow