Global oceanic diazotroph database version 2 and elevated estimate of global oceanic N2 fixation



Introduction
Dinitrogen (N2) fixation is a process carried out by select prokaryotes (diazotrophs) capable of converting N2 gas, which is not usable by most organisms, into bioavailable nitrogen (N). In the sunlit surface ocean, where dissolved inorganic forms of N such as nitrate (NO3−) and ammonium (NH4+) are scarce, N2 fixation plays an important role in providing N that can contribute to primary production, particularly in oligotrophic regions (Gruber, 2008). Globally, N2 fixation serves to compensate, at least partially, for fixed N removed via denitrification and anammox (Deutsch et al., 2007; Gruber, 2019).
Diazotroph abundance has been estimated from nifH gene copies using qPCR assays (Church et al., 2005b) or droplet digital PCR (ddPCR; Gradoville et al., 2017). The abundance of some cyanobacterial diazotrophs can also be obtained by counting them directly using microscopy-based techniques and in some cases flow cytometry. A recent work combined an image recognition pipeline with molecular mapping of the nifH gene to quantify diazotrophs in the Tara Oceans dataset. Gene copies of nifH have been more frequently measured than microscopy-based cell counts and can be more useful when evaluating the abundance of different diazotrophic groups. Caution must be taken because there can be discrepancies between cell-count-based and nifH-based diazotrophic abundances (Luo et al., 2012), a finding largely attributed to large variations in the number of nifH copies per diazotroph cell, thus far observed particularly in Trichodesmium and heterocystous cyanobacteria (Sargent et al., 2016; White et al., 2018; Karlusich et al., 2021). However, a recent regional study spanning over 200 km of the North Pacific Subtropical Gyre found a statistically significant linear correlation between nifH gene abundances and cell counts in UCYN-B (i.e., Crocosphaera; linear slope = 1.82) and heterocystous cyanobacteria (Richelia and Calothrix; linear slopes from 1.51 to 2.58) but not in Trichodesmium. A recent discussion highlighted the influence of the uncertainty in gene copy conversion to biomass and the need for further investigation of how to best take advantage of gene copy data for global diazotroph biogeography modeling purposes (Meiler et al., 2022; Zehr and Riemann, 2023); however, there is agreement that quantifying gene counts is a powerful tool for studying marine diazotroph distributions (Meiler et al., 2023; Zehr and Riemann, 2023). Meiler et al. (2023) proposed a number of topics of study for this field moving forward, and the exchange concluded that "we hope that future studies report nifH : cell and explore the mechanisms controlling this ratio." Both gene-based and microscopy cell counts have innate biases, which should be elucidated in future studies.
Given the importance of N2 fixation to ocean ecology and biogeochemistry, it is imperative that a database of up-to-date N2 fixation and diazotrophic abundance measurements be maintained. Currently, global estimates of marine fixed N inputs calculated via the N2 fixation rate mostly range from 100 to 170 Tg N yr⁻¹ (see summary in Zhang et al., 2020). This value, together with other bioavailable N sources to the ocean including riverine input and atmospheric deposition, is considerably lower than estimates of N losses from the ocean such as denitrification, anammox, and sediment burial (Zhang et al., 2020; Gruber, 2008; Zehr and Capone, 2021). While the overestimation of the N losses cannot be ruled out, one possible reason for this imbalance is the inaccurate estimation of global marine N2 fixation due to limited spatiotemporal coverage of rate measurements and the different methods employed in N2 fixation assays. Another possible reason is the limited knowledge of ecological niches of N2-fixing organisms. Over the last decade, the realm of marine N2 fixation has been expanded to include numerous non-paradigmatic habitats. N2 fixation has been demonstrated in coastal (Bentzon-Tilia et al., 2015b; Tang et al., 2020; Turk-Kubo et al., 2021), subpolar (Sato et al., 2021; Shiozaki et al., 2018a), and even polar ocean regions (Blais et al., 2012; Sipler et al., 2017; Harding et al., 2018; Shiozaki et al., 2020). Notably, N2 fixation in aphotic waters remains debated (Bonnet et al., 2013; Farnelid et al., 2013; Selden et al., 2021b; Rahav et al., 2013a; Hamersley et al., 2011; Benavides et al., 2018a; Moisander et al., 2017). Other studies have also suggested that non-cyanobacterial diazotrophs (NCDs) may be significant contributors to marine N2 fixation (Shiozaki et al., 2014b; Geisler et al., 2020; Delmont et al., 2021; Karlusich et al., 2021; Bombar et al., 2016; Moisander et al., 2017) and may occupy different niches than cyanobacterial diazotrophs.
Luo et al. (2012) compiled the first global oceanic diazotrophic database including in situ measurements of N2 fixation rates and cell-count-based and nifH-based diazotrophic abundance. Several years later, two studies supplemented the database with a collection of some newly reported diazotrophic data, although a substantial amount of additional data remained to be included. Here, we present an updated version of the global oceanic diazotrophic database with data not yet compiled. We describe the database information, a summary of the data updates, measurement methods, and data distribution. Furthermore, we conduct a first-order estimation of the global oceanic N2 fixation rate using the updated version of the database. In light of the aforementioned concerns about the nifH : cell ratio and the various N2 fixation methods (see Sect. 2.3), we also discuss the significance of employing different methodological approaches to estimate N2 fixation rates and abundance metrics. We use the data available in the database to analyze the discrepancies between N2 fixation rates measured using the 15N2 bubble and dissolution methods, and compare the observed ranges of nifH gene copies and diazotrophic cell abundance.

Database summary
This study updated the original global oceanic diazotrophic database of Luo et al. (2012; version 1 hereafter) with new in situ measurements of N2 fixation rates and abundances of diazotrophic cells and nifH gene copies. In total, there were 55 286 diazotrophic data points in the updated database (version 2 hereafter; Tables 1-3), including 13 565 data points from version 1 (Luo et al., 2012), 6736 measured in 2012-2018 and compiled by two previous studies, 26 597 data points measured in 1979-2023 and compiled by this study, and 8388 NCD data points, mostly from a previously compiled NCD dataset (see below). In version 2, some errors in the previously compiled datasets, mostly caused by unit conversions, were also corrected.
Version 2 was composed of six main sub-databases: (1) 9231 volumetric N2 fixation rates (5853 new data points; Tables 1 and 4); (2) 2590 depth-integrated N2 fixation rates (1805 new data points; Tables 1 and 4); (3) 9040 volumetric cell abundances (4154 new data points; Tables 2 and 5); (4) 1784 depth-integrated cell abundances (859 new data points; Tables 2 and 5); (5) 29 655 volumetric nifH gene copy abundances (26 506 new data points; Tables 3 and 6); and (6) 2986 depth-integrated nifH gene copy abundances (2544 new data points; Tables 3 and 6). Note that 2416 N2 fixation rates were measured with incubation periods of less than 24 h; they were listed in separate spreadsheets in the database for reasons discussed in Sect. 2.3. Additionally, we included a compiled NCD dataset in the database, which contained 7919 nifH gene copy abundances of primarily the most studied phylotype NCD Gamma A (Langlois et al., 2015), also referred to as 24774A11 and UMB (Bird et al., 2005), as well as other phylotypes, and updated the compilation with 469 additional nifH gene copy abundances of NCDs published more recently (Sato et al., 2022; Moore et al., 2018; Reeder et al., 2022; Wen et al., 2022; Bonnet et al., 2023). We also collected 468 cell-specific in situ N2 fixation rates and added them to version 2 (Table 7).
Depth-integrated data were either provided directly in published papers or calculated as part of this study for those vertical profiles with at least three volumetric data points in each profile. The measurements within a profile were first interpolated linearly with depth, with the shallowest datum taken to represent the layer between the sea surface and the depth of that datum. The profile was then integrated from the sea surface to the deepest recorded measurement. Most vertical profiles of N2 fixation rates were measured within the euphotic zone, with a few studies extending measurements to several hundred meters or deeper. In these cases, we only integrated to the deepest data point above 200 m, taking into account the scarcity of aphotic N2 fixation measurements in the global ocean and their controversial contribution to the global budget (Benavides et al., 2018a). As a result, it was possible that certain measurements below the euphotic zone but above 200 m were included in the integration. However, these measurements would typically have minimal impact on the depth-integrated N2 fixation rates due to their low rates and limited vertical extent in this range. N2 fixation rates were measured for whole seawater samples, for different size fractions (> 10 µm and < 10 µm), or specifically for Trichodesmium and heterocystous cyanobacteria. When whole-water N2 fixation rates were not reported, total N2 fixation rates were calculated as the sum of the N2 fixation rates of available groups.
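The depth integration described above can be illustrated with a minimal Python sketch. It is not the code used to build the database: the function name `depth_integrate`, its parameters, and the choice to apply the three-point rule before the 200 m cut-off (and to treat a datum at exactly 200 m as included) are assumptions made for illustration.

```python
import numpy as np


def depth_integrate(depths_m, values, max_depth_m=200.0, min_points=3):
    """Integrate a volumetric profile from the sea surface downward.

    depths_m : sampling depths in metres (positive downward)
    values   : volumetric values at those depths (e.g. umol N m-3 d-1)
    Returns the depth integral (per m2), or None if the profile is too sparse.
    """
    z = np.asarray(depths_m, dtype=float)
    v = np.asarray(values, dtype=float)
    order = np.argsort(z)
    z, v = z[order], v[order]

    # Assumption: the "at least three points" rule is checked on the full profile.
    if z.size < min_points:
        return None

    # Keep only data at or above the cut-off depth (200 m in the text).
    keep = z <= max_depth_m
    z, v = z[keep], v[keep]
    if z.size == 0:
        return None

    # The shallowest datum represents the layer between the surface and its depth.
    if z[0] > 0.0:
        z = np.insert(z, 0, 0.0)
        v = np.insert(v, 0, v[0])

    # Linear interpolation between data points is equivalent to the trapezoid rule.
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(z)))
```

For a profile sampled at 5, 25, 50, and 300 m, only the three upper points enter the integral, and the value at 5 m is extended unchanged to the surface.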
Sampling information (latitude, longitude, depth, and time) was provided for each data point. Physical, chemical, and biological parameters, including temperature, salinity, and concentrations of nitrate, phosphate, iron, and chlorophyll a, were also included when available.

Quality control
The N2 fixation rates and diazotrophic abundances in the database spanned several orders of magnitude. Extremely high rates and abundances usually occurred during algal blooms, and zero values indicated that diazotrophic activity was below detection or truly absent at the sampling time and stations. The positive-value data were first logarithmically transformed and then analyzed for outliers, considering that they were approximately log-normally distributed (Figs. S1-S5). For each parameter, we used Chauvenet's criterion to identify suspicious outliers whose probability of deviation from the mean is lower than 1/(2n), where n is the number of data points (Glover et al., 2011). Because N2 fixation rates and diazotroph abundances in the ocean can be extremely low, this filtering was only applied to data on the high side. Although these outliers (labeled in the database) could be true values, we flagged them to caution users.
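The outlier screening can be sketched as follows. This is an illustrative reading of Chauvenet's criterion, not the authors' script: the function name is hypothetical, and the use of base-10 logarithms, the sample standard deviation, and the two-sided normal tail probability are assumptions.

```python
import math

import numpy as np


def chauvenet_high_outliers(values):
    """Flag high-side outliers among positive values using Chauvenet's criterion.

    Positive values are log10-transformed; a value is flagged when the two-sided
    normal probability of a deviation at least as large as its own is below
    1/(2n) AND the value lies above the mean. Zeros are never flagged.
    Returns a boolean array aligned with the input.
    """
    x = np.asarray(values, dtype=float)
    flags = np.zeros(x.shape, dtype=bool)
    positive = x > 0
    n = int(positive.sum())
    if n < 3:
        return flags

    logx = np.log10(x[positive])
    sd = logx.std(ddof=1)  # assumption: sample standard deviation
    if sd == 0:
        return flags

    z = (logx - logx.mean()) / sd
    # Two-sided tail probability of the standard normal distribution.
    p = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    flags[positive] = (p < 1.0 / (2.0 * n)) & (z > 0)
    return flags
```

Because the threshold 1/(2n) shrinks as n grows, a larger sample requires a larger deviation before a value is flagged.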

Nitrogen fixation rate data
The commonly used methods for measuring marine N2 fixation rates include 15N2 tracer methods and the acetylene reduction assay (Mohr et al., 2010; Montoya et al., 1996; Capone, 1993). However, in the last decade, the community has turned largely to the use of 15N2 tracer methods. The acetylene reduction assay estimates gross N2 fixation rates indirectly from the reduction of acetylene to ethylene. Theoretical conversion factors of 3 : 1 and 4 : 1 have been used to convert acetylene reduction rates to N2 fixation rates (Postgate, 1998; Capone, 1993; Wilson et al., 2012), although a wide range of conversion factors from 0.93 to 56 has been reported (e.g., Mague et al., 1974; Graham et al., 1980; Montoya et al., 1996; Capone et al., 2005; Mulholland et al., 2006; Wilson et al., 2012). When using the 15N2 tracer method, samples are incubated in seawater with 15N2 gas; the 15N / 14N ratio of particulate nitrogen is measured at the beginning and the end of the incubation to calculate the N2 fixation rate (Capone and Montoya, 2001). Most measurements using the 15N2 tracer method only counted the fixed N in particulate forms and ignored the N that was fixed but then excreted by diazotrophs in the form of dissolved organic N (DON) during incubation, which could theoretically be counted by the acetylene reduction assay (Mulholland, 2007). In some studies using the 15N2 tracer method, this missing N was counted by also measuring the 15N enrichment in DON (Benavides et al., 2013a; Berthelot et al., 2015; Benavides et al., 2013b).
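As a rough illustration of the two calculations, the sketch below converts an ethylene production rate with a chosen C2H4 : N2 ratio and computes a tracer-based rate from the change in 15N enrichment of particulate N. The tracer formula is the commonly used mass-balance form and is an assumption here, since the text does not spell it out; function and parameter names are hypothetical.

```python
def n2_fixation_from_acetylene(ethylene_rate, ratio=4.0):
    """Convert an ethylene production rate (mol C2H4 per volume per time) to an
    N2 fixation rate (mol N2, same volume and time units).

    ratio is the assumed C2H4 : N2 conversion factor (3 or 4 in theory; the
    text notes reported values from 0.93 to 56).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ethylene_rate / ratio


def n2_fixation_from_15n(atom_pct_start, atom_pct_end, atom_pct_n2, pn_conc, hours):
    """Tracer-based N2 fixation rate (N per volume per hour).

    atom_pct_start, atom_pct_end : 15N atom % of particulate N before and after
    atom_pct_n2                  : 15N atom % of the dissolved N2 pool, assumed
                                   constant over the incubation
    pn_conc                      : particulate N concentration (e.g. umol N L-1)
    hours                        : incubation length
    """
    excess = atom_pct_n2 - atom_pct_start
    if excess <= 0 or hours <= 0:
        raise ValueError("N2 pool must be enriched and incubation time positive")
    # Fraction of particulate N derived from N2 during the incubation.
    fraction_new = (atom_pct_end - atom_pct_start) / excess
    return fraction_new * pn_conc / hours
```

The second function makes explicit why a constant, well-known enrichment of the N2 pool matters: any overestimate of `atom_pct_n2` lowers the computed rate in proportion.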
Compared to the 15N2 tracer method, the acetylene reduction assay requires less incubation time. However, in addition to the uncertainty in converting ethylene production to N2 fixation, the purity of acetylene gas, trace ethylene contamination, and the Bunsen gas solubility coefficient of produced ethylene can also affect the accuracy of estimated N2 fixation rates (Hyman and Arp, 1987; Breitbarth et al., 2004; Kitajima et al., 2009). Acetylene used in the assay can even impact the metabolic activities of diazotrophs (Giller, 1987; Hardy et al., 1973; Flett et al., 1976; Staal et al., 2001). Moreover, the acetylene reduction assay needs to preconcentrate cells for signal detection when diazotrophic biomass is low, which may lead to underestimated N2 fixation rates by perturbing cells during concentration and filtration (e.g., Capone et al., 2005; Barthel et al., 1989; Staal et al., 2007). In recent years, the acetylene reduction assay has undergone significant advancement. The sensitivity of ethylene detection has been improved by utilizing a reduced gas analyzer (Wilson et al., 2012) and by using highly purified acetylene gas to minimize the ethylene background (Kitajima et al., 2009). However, preparing high-purity acetylene with a low level of ethylene contamination remains a challenge. More recently, a new method named Flow-through incubation Acetylene Reduction Assays by Cavity ring-down laser Absorption Spectroscopy (FARACAS) has been introduced for high-frequency measurements of aquatic N2 fixation (Cassar et al., 2018). This method involves continuous flow-through incubations and spectral monitoring of acetylene reduction to ethylene. By employing short-duration flow-through incubations without cell preconcentration, potential artifacts are minimized. This approach also allows for near-real-time estimates, enabling adaptive sampling strategies.

Table 3. Summary of the number of data points for nifH gene copy abundances in the original database, the new data added to version 2, and their sum. UCYNs include UCYN-A1, UCYN-A2, UCYN-B, and UCYN-C. Heterocystous cyanobacteria include Het-1, Het-2, and Het-3.

The original 15N2 tracer method involves the addition of a known volume of 15N2-labeled bubbles to the incubation bottle (named original 15N2 bubble method hereafter). However, this method was found to underestimate rates because N2 gas solubility is low and tracer additions take a long time to equilibrate (Mohr et al., 2010; Großkopf et al., 2012; Jayakumar et al., 2017).
To address this issue, the 15N2 dissolution method has been employed, which involves pre-preparing 15N2-enriched seawater to maintain constant 15N2 atom % enrichment throughout the incubation (Mohr et al., 2010), similar to the method described in Glibert and Bronk (1994). However, the 15N2 dissolution method does not always yield higher N2 fixation rates than the original 15N2 bubble method (Table S4 in Großkopf et al., 2012; Saulia et al., 2020); it is still not conclusive what controls the magnitude of the underestimation (if it exists) in the original 15N2 bubble method. Compared to the original 15N2 bubble method, the 15N2 dissolution method is more susceptible to the introduction of contaminants (e.g., metals) during the preparation of the 15N2 inoculum due to its more complex process, which can alter the diazotrophic activities and abundance, thereby impacting the accuracy of N2 fixation measurements (Dabundo et al., 2014; Klawonn et al., 2015). For example, Needoba et al. (2007) reported that a low but detectable amount of Fe3+ contamination can be measured when protecting the needle of the gas-tight syringe with commercially available tubing. Additionally, pH and other chemical properties of the inoculum may be altered during its preparation, further affecting the measurements of N2 fixation. Despite these limitations, the 15N2 dissolution method remains the predominant assay for measuring N2 fixation rates due to its ability to satisfy the fundamental assumption of constant 15N2 atom % enrichment over the incubation period.
More recently, a modified 15N2 bubble method, known as the 15N2 bubble release method, has been proposed as an alternative to the 15N2 dissolution method (Klawonn et al., 2015; Chang et al., 2019; Selden et al., 2019). This method involves adding 15N2 gas to the incubation bottles and mixing for a brief period (∼ 15 min) to facilitate 15N2 equilibration and then removing the gas bubble. Compared to the original 15N2 bubble method, the 15N2 bubble release method ensures uniform 15N2 atom % enrichment throughout the incubation. Moreover, it causes less interference with the incubation matrix than the 15N2 dissolution method. However, the mixing of incubation bottles required to stimulate gas dissolution has been suggested to negatively affect diazotrophs, although no robust studies have yet been performed to assess this critique (Wannicke et al., 2018; White et al., 2020). Moreover, the 15N2 bubble release method requires a handling step, and additional costs for preparing tracers may be another challenge for researchers. Ultimately, White et al. (2020) "advise employing either the dissolution or bubble release method, whichever is best suited to the specific research objectives and logistical constraints" with additional recommendations on the need for determination of detection limits for all rate measurements.
We compared volumetric N2 fixation rates in the upper 50 m and depth-integrated N2 fixation rates in the database measured using acetylene reduction assays, the original 15N2 bubble method, and the 15N2 dissolution method and found that they span a similar range (Fig. 1). Meanwhile, for volumetric N2 fixation rates in the upper 50 m, the peak of the log-normal distribution of the measurements using the 15N2 dissolution method was approximately double that of the original 15N2 bubble method (Fig. 1a). The measurements using the 15N2 bubble release method were limited to several study sites, and their distribution was thus not presented in this study. A further analysis comparing the original 15N2 bubble method and the 15N2 dissolution method will be presented later (see Sect. 4.1).
The majority of N2 fixation rates (9405) were measured with incubation periods of 24 h and were reported as daily rates. In contrast, 2416 samples were incubated for less than 24 h and hourly N2 fixation rates were reported. Diel cycles of N2 fixation vary among samples and/or diazotrophic groups, and substantial errors may be introduced when extrapolating N2 fixation rates incubated for less than 24 h to daily rates. Therefore, the N2 fixation rates measured with incubation periods of less than 24 h were collected into separate data sheets in our database and were not used in further analysis within this study. Note that Konno et al. (2010) used incubation periods covering whole diurnal cycles (e.g., 24, 48, or 72 h), and the incubations in Yogev et al. (2011) lasted from 24 to 30 h. The daily N2 fixation rates reported by these two studies were also included in the 24 h data sheets and were used in our estimation of the global marine N2 fixation rate (see below).

Table footnotes: (a) Data are reported by data providers as depth-integrated nifH gene copy abundances (unlabeled depth-integrated abundances computed from volumetric data). (b) rnpB gene copies were determined.

Table 7. Summary of data points of cell-specific N2 fixation rates added to version 2 of the database. The rates were measured either by using the combination of CARD-FISH and nanoscale secondary ion mass spectrometry (nanoSIMS; method A) or via the measurements of bulk N2 fixation rates incubated with a known number of diazotrophic cells (method B; see Sect. 2.3). Note that all the data were reported as N2 fixation rates per cell, except for Filella et al. (2022) in which biomass-normalized rates in units of d⁻¹ were reported.
Cell-specific N2 fixation rates of diazotrophs (or symbioses) were mostly measured using catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH) and nanoscale secondary ion mass spectrometry (nanoSIMS), in combination with 15N2 addition experiments (Mills et al., 2020; Berthelot et al., 2019). Using specific oligonucleotide probes, CARD-FISH enables the visualization and location of the regions of interest in diazotrophs at a single-cell level using an epifluorescence microscope. The sample is subsequently prepared for the secondary electron image in nanoSIMS analysis. Importantly, the handling, fixation, and processing of the samples with CARD-FISH have been demonstrated to significantly impact the enrichment measured by nanoSIMS (see Musat et al., 2014; Woebken et al., 2015; Meyer et al., 2021). The nanoSIMS technique detects the enrichment of 15N atoms in the targeted regions, allowing for the calculation of the cell-specific rate. Additionally, in one study, handpicked Trichodesmium colonies or trichomes were incubated and the measured total N2 fixation rates were normalized to the number of cells (McCarthy and Carpenter, 1979).

Estimation of the global marine N2 fixation rate
Using these data, we performed a first-order estimation of the global marine N2 fixation rate. In a previous study (Luo et al., 2012), version 1 was utilized to estimate the global marine N2 fixation rate using all the depth-integrated N2 fixation rates. However, in this study, we employed more rigorous criteria to estimate the global rate using both version 1 and version 2, taking into account the reliability of different N2 fixation rate data discussed in the preceding section. Specifically, we exclusively used depth-integrated N2 fixation rates that met the following criteria: (1) measurements were taken from whole seawater samples, (2) incubation periods of 24 h were used, and (3) one of the three 15N2-based methods was employed, although we acknowledged that the rates obtained using the original 15N2 bubble method might be underestimated. N2 fixation rates obtained through the acetylene reduction method were excluded from this estimate due to the significant uncertainties described above. Applying these criteria, we selected 309 and 1642 depth-integrated N2 fixation rates from version 1 and version 2, respectively. The greater number of data in version 2 potentially provided more constraints on estimating global marine N2 fixation. We applied Chauvenet's criterion to identify outliers, using the log-transformed values of the selected data (see Sect. 2.2). As a result, two high-value outliers were removed in version 1 (one in the North Pacific and one in the South Pacific) while no outliers were detected in version 2. This difference can be attributed to the larger number of data samples in version 2, which allowed for a more relaxed threshold in identifying outliers.
The estimation of the global marine N2 fixation rate involved four steps. First, we calculated the arithmetic or geometric means of the depth-integrated N2 fixation rates within each 3° latitude × 3° longitude bin. Second, these mean values were further averaged using either arithmetic or geometric methods to determine the mean N2 fixation rates for different ocean basins, which included the North Atlantic, South Atlantic, North Pacific, South Pacific, Indian, Arctic, and Southern oceans, as well as the Mediterranean Sea. Third, we multiplied the arithmetic or geometric mean of each basin by its respective area to estimate the total N2 fixation rate for that specific basin, except when there was insufficient spatial coverage available. Finally, we obtained the global marine N2 fixation rate by summing up the individual rates calculated for each basin, with the errors associated with the basin rates propagated properly (Glover et al., 2011).
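The four steps can be sketched for the arithmetic-mean case as follows. This is a simplified illustration under stated assumptions: rates are in µmol N m⁻² d⁻¹, a year has 365 days, basin errors are independent and combined in quadrature, and all function names and the input layout are hypothetical rather than taken from the study.

```python
import math
from collections import defaultdict

# umol N m-2 d-1, multiplied by an area in m2, converted to Tg N yr-1:
# 365 d yr-1 * 1e-6 mol umol-1 * 14.007 g N mol-1 * 1e-12 Tg g-1
UMOL_N_M2_D_TO_TG_N_YR = 365.0 * 1e-6 * 14.007 * 1e-12


def bin_index(lat, lon, size_deg=3.0):
    """Index of the size_deg x size_deg bin that contains (lat, lon)."""
    return (math.floor((lat + 90.0) / size_deg), math.floor((lon + 180.0) / size_deg))


def basin_estimate(records, area_m2):
    """Steps 1-3 for one basin.

    records: iterable of (lat, lon, rate) with rate in umol N m-2 d-1.
    Returns (total, standard_error) in Tg N yr-1.
    """
    bins = defaultdict(list)
    for lat, lon, rate in records:
        bins[bin_index(lat, lon)].append(rate)
    bin_means = [sum(v) / len(v) for v in bins.values()]          # step 1
    n = len(bin_means)
    mean = sum(bin_means) / n                                     # step 2
    if n > 1:
        var = sum((m - mean) ** 2 for m in bin_means) / (n - 1)
        se = math.sqrt(var / n)
    else:
        se = float("nan")  # a single bin gives no error estimate
    scale = area_m2 * UMOL_N_M2_D_TO_TG_N_YR                      # step 3
    return mean * scale, se * scale


def global_estimate(basins):
    """Step 4: sum basin totals; errors combined in quadrature (assumption).

    basins: dict mapping basin name -> (records, area_m2).
    """
    total, var = 0.0, 0.0
    for records, area_m2 in basins.values():
        t, se = basin_estimate(records, area_m2)
        total += t
        var += se ** 2
    return total, math.sqrt(var)
```

Averaging within bins first keeps heavily sampled locations from dominating the basin mean.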
In the first two steps, the geometric means were derived from positive N2 fixation rates (NF⁺): if $\mu$ and SE represented the mean and standard error of $\ln(\mathrm{NF}^{+})$, respectively, the geometric mean was $e^{\mu}$. The confidence interval for the geometric mean, based on the standard error, ranged between $e^{\mu}/e^{\mathrm{SE}}$ and $e^{\mu} \cdot e^{\mathrm{SE}}$ (Thomas, 1979). To address the issue of not including zero-value N2 fixation rates, we adjusted the geometric means by multiplying them by the percentage of non-zero (positive) data within each 3° latitude × 3° longitude bin (in the first step) or within each basin (in the second step).
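A minimal sketch of the geometric-mean calculation is given below. It assumes that the zero-value adjustment scales the geometric mean by the fraction of positive values, which is the reading under which a sample without zeros is left unchanged, and that the same factor is applied to the interval bounds; the function name is hypothetical.

```python
import math


def adjusted_geometric_mean(rates):
    """Geometric mean of the positive rates with a standard-error interval,
    scaled by the fraction of positive values to account for zeros.

    Returns (adjusted_mean, lower, upper).
    """
    rates = list(rates)
    positive = [r for r in rates if r > 0]
    n = len(positive)
    if n == 0:
        return 0.0, 0.0, 0.0

    logs = [math.log(r) for r in positive]
    mu = sum(logs) / n
    if n > 1:
        sd = math.sqrt(sum((x - mu) ** 2 for x in logs) / (n - 1))
        se = sd / math.sqrt(n)
    else:
        se = 0.0  # no spread can be estimated from a single value

    frac_positive = n / len(rates)  # assumption: adjustment factor for zeros
    gm = math.exp(mu)
    return (gm * frac_positive,
            gm / math.exp(se) * frac_positive,
            gm * math.exp(se) * frac_positive)
```

For example, rates of 1, 100, 0, and 0 have a geometric mean of 10 over the positive values, which is halved to 5 because half of the data are zeros.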

Diazotrophic abundance data
Diazotroph cell abundances were determined by using standard light microscopy, and in some cases by using epifluorescence microscopy. A recent study used machine learning techniques to detect and enumerate diazotrophs in a large dataset of microscopic images. In the original database, only the cell abundances of Trichodesmium and heterocystous cyanobacteria were recorded.
Cell abundance of Trichodesmium was recorded as the number of trichomes per volume of water in our database, although it was also reported in some studies as the number of cells or colonies per volume of water. In the latter cases, the data were converted to trichomes per volume of water by using a commonly used factor of 200 (132-241) trichomes colony⁻¹ (Letelier and Karl, 1996), similar to the conversion used in the original database (Luo et al., 2012).
All the uncertainties reported in this paper reflect one standard error of the means unless specified.

Data distribution
Version 2 of the database significantly expanded N2 fixation rate measurements, filling spatial gaps, particularly in the Indian Ocean and the Southern Hemisphere. Overall, data on N2 fixation and diazotrophic abundance remained more limited in the Arctic and Southern oceans, with a number of rate measurements reporting values below detection limits. Version 2 added data at all latitudinal ranges (Fig. 4). In particular, version 2 extended the range of data from tropical and subtropical areas to include polar regions in the Arctic Ocean (Harding et al., 2018) and the Antarctic coast (Shiozaki et al., 2020).
The data in version 2 reduced the difference in the number of data points across months, especially for nifH gene copies, for which substantially more samples were collected in January and February (Fig. 5). When considering seasons in both the Northern Hemisphere and the South Atlantic and Pacific, the data were distributed more evenly (Fig. 6). Although most of the new data were measured in near-surface waters, numerous nifH gene copy abundance data were also sampled in deeper layers in the euphotic zone (Fig. 7). Additionally, active N2 fixation and the existence of diazotrophs were found below the euphotic zone (e.g., depth > 200 m; Benavides et al., 2016a, 2018b; Selden et al., 2019; Hamersley et al., 2011; Loescher et al., 2014; Benavides et al., 2015; Fig. 7).

N2 fixation rates
The volumetric N2 fixation rates in five vertical layers and the depth-integrated N2 fixation rates were binned in 3° latitude × 3° longitude bins, and the arithmetic means in each bin are displayed (Fig. 8). The depth-integrated N2 fixation rates ranged over orders of magnitude, from 10⁻⁴ to 10³ µmol N m⁻² d⁻¹ (mostly from 1 to 10² µmol N m⁻² d⁻¹; Fig. 8a). Some high rates (i.e., 10² to 10³ µmol N m⁻² d⁻¹) were found in the western Pacific Ocean, the regions near the Hawaiian Islands, and the western tropical Atlantic Ocean. Approximately 10 % of the depth-integrated N2 fixation rates were < 1 µmol N m⁻² d⁻¹ and were mainly from the North Atlantic and Indian oceans. Within the water column, the N2 fixation rates were highest in the upper 25 m (Fig. 8b and c), below which the rates rapidly decreased with depth (Fig. 8d-f). In the upper 25 m, volumetric N2 fixation rates in the southwestern Pacific were higher than those in other areas, mostly ranging from 1 to 100 µmol N m⁻³ d⁻¹. Undetectable N2 fixation rates were reported mostly in subpolar regions, as well as in certain tropical and subtropical regions (Fig. 8). Cell-specific N2 fixation rates spanned a range from 10⁻⁴ to 10³ fmol N cell⁻¹ d⁻¹, although they were mostly on the order of 10⁻² to 10² fmol N cell⁻¹ d⁻¹ (Fig. 9). The mean cell-specific N2 fixation rates of Trichodesmium, UCYN-A2, and heterocystous cyanobacteria were 1 to 2 orders of magnitude higher than those of other diazotrophic groups (Fig. 9 and Table S1).

Diazotrophic abundance
The depth-integrated cell abundances and volumetric cell abundances in the upper 25 m are also shown as the arithmetic means in 3° latitude × 3° longitude bins (Fig. 10). Trichodesmium abundance generally decreased from the west to the east in the Atlantic Ocean (Fig. 10a and b). In the Pacific Ocean, Trichodesmium appeared more abundant in the west. The abundance data of heterocystous diazotrophs were still scarce (Fig. 10c and e). The volumetric cell-count-based abundance data are also displayed in three additional depth intervals (Fig. S6).
Gene copies of nifH had better spatial coverage than the cell-count data (Fig. 11). Depth-integrated Trichodesmium nifH copies were also more abundant in the western Pacific and western Atlantic oceans (Fig. 11a). Some high depth-integrated nifH abundances of UCYN-A and UCYN-B were also reported in the northwestern and southwestern Pacific Ocean (Fig. 11c and e). High nifH abundances of Richelia were found in the southwestern Pacific and western Atlantic oceans (Fig. 11i). The nifH abundance data for UCYN-C and Het-3 were sparse. The volumetric nifH abundance data are displayed in three depth intervals (Figs. 11 and S7). Almost all diazotrophs were more abundant in the upper 25 m than in deeper water.

First-order estimate of global oceanic N2 fixation rate
Compared to version 1, the spatial coverage of data in version 2, in terms of the fraction of 3° latitude × 3° longitude bins, was greatly increased in all ocean basins (Table 8). The spatial data coverage was very low in the Southern and Arctic oceans (1 % and 2 % of total bins, respectively; Table 8), and we therefore did not estimate total N2 fixation rates for these two basins. Note that the inaccurate areas of the North and South Pacific oceans used in estimating the global oceanic N2 fixation rate by Luo et al. (2012) were corrected in this study (Table 8).
We first compared the N2 fixation rates estimated based on arithmetic means of version 1 and version 2 (Table 8). Using available data in version 2, the global N2 fixation rate was determined to be 223 ± 30 Tg N yr⁻¹, which was 3 times that obtained from version 1 (Table 8). The substantial increase was mostly driven by notable changes in the South Pacific, North Atlantic, and Indian oceans. In the South Pacific Ocean, numerous high N2 fixation rates were observed in the western subtropical region over the past decade (Fig. 12), resulting in a substantial increase of 68 ± 23 Tg N yr⁻¹ in the estimated N2 fixation rate for this basin (Table 8). It is worth noting that these newly recorded measurements in the western subtropics of the South Pacific Ocean might even be underestimated since most of them were obtained using the original 15N2 bubble method. In the North Atlantic Ocean, the estimated N2 fixation rate also increased by 30 ± 9 Tg N yr⁻¹ (Table 8), without any discernible pattern regarding the locations of the new high N2 fixation measurements (Fig. 13). Furthermore, in the Indian Ocean, the improved data coverage in version 2 (Fig. 8a) supported the estimation of an N2 fixation rate of 35 ± 14 Tg N yr⁻¹ for this basin (Table 8), which was not possible to calculate using version 1 due to insufficient data availability.

Table 8. First-order estimates of N2 fixation rates based on their arithmetic means in different ocean basins. Data are first binned to 3° latitude × 3° longitude grids before being used to calculate arithmetic means in each basin. The arithmetic means are multiplied by the basin areas to calculate the N2 fixation rates of each basin. NQ: not quantified due to limited data points. ND: no data. The values in the parentheses are the percentages of 3° × 3° bins in each basin that have measurements. The reported uncertainties are one standard error of the mean.
However, when estimating the global marine N 2 fixation rate using geometric means, both version 1 and version 2 yielded similar rates of approximately 50 Tg N yr −1 (Table 9). The N 2 fixation rates in each basin tended to follow a log-normal distribution (Fig. 14), with the geometric mean aligning near the peak of the distribution. In the South Pacific Ocean, as discussed earlier, version 2 included a substantial number of newly observed high N 2 fixation rates, but it also incorporated a significant number of rates that were much lower than those in version 1 (Fig. 14c). This could be partially attributed to improved detection limits in measurements. Consequently, while version 2 yielded a much higher arithmetic mean N 2 fixation rate than version 1 for the South Pacific Ocean (Table 8), their geometric means remained quite similar (Table 9). In the North Pacific Ocean, for the same reasons, the arithmetic mean N 2 fixation rates obtained from both versions were very close, while the geometric mean of version 1 was even higher than that of version 2 (Tables 8 and 9; Fig. 14a). These analyses reveal that, despite the similarity in geometric means of N 2 fixation rates obtained from both versions of the database, the higher arithmetic means in version 2 were not coincidental. Instead, they were the direct outcome of the improved measurement methods and the expanded spatial and temporal coverage of marine N 2 fixation over the past decade. Consequently, previous assessments likely underestimated the global marine N 2 fixation rate because these new measurements were not yet available.
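The divergence between the two averages follows from a standard property of the log-normal distribution. As a sketch, assume that the rates X in a basin are exactly log-normal, with ln X normally distributed with mean μ and variance σ² (these symbols are introduced here for illustration and are not part of the original analysis):

```latex
\begin{align}
\text{geometric mean} &= e^{\mu}, \\
\text{arithmetic mean} &= e^{\mu + \sigma^{2}/2}, \\
\frac{\text{arithmetic mean}}{\text{geometric mean}} &= e^{\sigma^{2}/2}.
\end{align}
```

Under this idealization, adding both very high and very low rates widens σ while leaving μ nearly unchanged, which raises the arithmetic mean but not the geometric mean. This is in line with the South Pacific comparison described above.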
We must emphasize that this calculation simply used the average N 2 fixation rates in different ocean basins; therefore, our calculation can only be considered a first-order estimate. Furthermore, limited measurements have shown a large range of N 2 fixation rates in the Southern Ocean (Fig. 8). Considering its vast area, future measurements expanding coverage of N 2 fixation rates in the Southern Ocean (see White et al., 2022) may help to better constrain the contribution of N 2 fixation to the N budget of the global ocean. The new database presented here also expands opportunities for improved statistical estimates of N 2 fixation patterns and global rates based on the modeling of environmental controls (Luo et al., 2014).
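The first-order procedure described in this section (bin to 3° × 3°, average the bins, multiply by basin area) can be sketched as follows. This is a minimal illustration rather than the code used for the database: the function names, the record layout, the assumed input unit (depth-integrated rates in µmol N m −2 d −1 ), and the example values are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

BIN_DEG = 3.0  # bin width in degrees, as stated in the text


def bin_key(lat, lon, width=BIN_DEG):
    """Return the (row, col) index of the width x width degree bin holding a point."""
    return (math.floor(lat / width), math.floor(lon / width))


def bin_means(records, width=BIN_DEG):
    """Average all rates falling in the same bin; returns one value per bin."""
    bins = defaultdict(list)
    for lat, lon, rate in records:
        bins[bin_key(lat, lon, width)].append(rate)
    return [mean(values) for values in bins.values()]


def basin_estimate(records, area_m2):
    """Arithmetic mean of bin means times basin area.

    Assumes rates are depth-integrated, in umol N m-2 d-1 (an assumption).
    Returns (total, standard_error_of_mean) in Tg N yr-1.
    """
    values = bin_means(records)
    n = len(values)
    avg = mean(values)
    sem = stdev(values) / math.sqrt(n) if n > 1 else float("nan")
    # umol -> mol (1e-6), mol N -> g (14.007), d -> yr (365), g -> Tg (1e-12)
    to_tg = 1e-6 * 14.007 * 365.0 * 1e-12 * area_m2
    return avg * to_tg, sem * to_tg


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(mean(math.log(v) for v in values))


# Hypothetical records: (latitude, longitude, rate); not taken from the database.
example_records = [
    (10.5, 150.2, 100.0),
    (11.0, 151.0, 300.0),  # same 3 x 3 degree bin as the record above
    (20.0, 160.0, 50.0),
]
example_area_m2 = 1.0e14  # illustrative basin area, not a value from Table 8

total, sem = basin_estimate(example_records, example_area_m2)
print(f"arithmetic estimate: {total:.1f} +/- {sem:.1f} Tg N yr-1")
print(f"geometric mean of bins: {geometric_mean(bin_means(example_records)):.1f}")
```

Averaging within bins before averaging across bins keeps heavily sampled locations from dominating the basin mean, which is presumably why the binning step comes first.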

Discussion
Comparison of N 2 fixation measured using 15 N 2 bubble and dissolution methods
To date, the origin of the discrepancy in the N 2 fixation rates estimated using different 15 N 2 tracer methods remains unclear. As shown above, the volumetric N 2 fixation rates obtained by the original 15 N 2 bubble method and the 15 N 2 dissolution method spanned a similar range (Fig. 1), while the average rates using the former method were significantly lower than those measured using the latter method (one-tailed Wilcoxon test, p < 0.001, n = 2460 and 1128). With substantial data accumulated over the past decade, we further compared N 2 fixation rates measured using the two methods at nearby locations and sampling times, although the samples were not identical. We first binned data collected from the same months, horizontal locations (3° latitude × 3° longitude), and depth intervals (0-5, 5-25, 25-100, and 100-200 m) and calculated the average rates for each method in each bin. The results showed that the original 15 N 2 bubble method produced lower rates than the 15 N 2 dissolution method in 69 % of the cases (Fig. 13). Furthermore, our analysis employing the generalized additive model (GAM) revealed that the relationship between the rates measured using the original 15 N 2 bubble method and those obtained through the 15 N 2 dissolution method closely adhered to the 1 : 1 line, albeit with slightly lower values in the former (Fig. 15). Please note that these slightly lower values can still result in significant underestimation in measured N 2 fixation rates because the GAM was applied in logarithmic space. It is crucial to reiterate that the rates being compared were derived from different samples, emphasizing the necessity for future investigations that directly compare the two methods using the same samples with controlled parameters such as temperature, volume of injected 15 N 2 , and incubation volume.
Despite this limitation, our analysis suggests that the extensive body of historical marine N 2 fixation rate data obtained through the original 15 N 2 bubble method is still valuable, particularly in the examination of spatial and temporal variations in N 2 fixation. We also used the same procedure to compare the N 2 fixation rates measured using acetylene reduction assays and the 15 N 2 tracer methods. However, there were insufficient pairs of data available for reliable comparisons (n = 16 for acetylene reduction versus the 15 N 2 dissolution method; n = 6 for acetylene reduction versus the original 15 N 2 bubble method).
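The paired-bin comparison used in this subsection (same month, same 3° × 3° cell, same depth interval) can be sketched as below. It is an illustration under stated assumptions: the record layout, the method labels, the half-open treatment of depth boundaries, and the example values are hypothetical and are not taken from the database.

```python
import math
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

DEPTH_EDGES = [0, 5, 25, 100, 200]  # metres; intervals 0-5, 5-25, 25-100, 100-200
BIN_DEG = 3.0


def depth_interval(depth_m):
    """Index of the depth interval, treating each as [lower, upper); None if outside."""
    if depth_m < DEPTH_EDGES[0] or depth_m >= DEPTH_EDGES[-1]:
        return None
    return bisect_right(DEPTH_EDGES, depth_m) - 1


def paired_bin_means(records):
    """Return {bin: (bubble_mean, dissolution_mean)} for bins sampled by both methods.

    Each record is (method, month, lat, lon, depth_m, rate), where method is
    "bubble" or "dissolution" (labels assumed for this sketch).
    """
    bins = defaultdict(lambda: defaultdict(list))
    for method, month, lat, lon, depth_m, rate in records:
        idx = depth_interval(depth_m)
        if idx is None:
            continue  # deeper than the deepest interval considered
        key = (month, math.floor(lat / BIN_DEG), math.floor(lon / BIN_DEG), idx)
        bins[key][method].append(rate)
    return {
        key: (mean(by_method["bubble"]), mean(by_method["dissolution"]))
        for key, by_method in bins.items()
        if "bubble" in by_method and "dissolution" in by_method
    }


def fraction_bubble_lower(pairs):
    """Share of paired bins in which the bubble mean is below the dissolution mean."""
    lower = sum(1 for bubble, dissolution in pairs.values() if bubble < dissolution)
    return lower / len(pairs)


# Hypothetical records for illustration only.
example_records = [
    ("bubble", 7, 10.0, 150.0, 2.0, 1.0),
    ("bubble", 7, 11.0, 151.0, 3.0, 3.0),
    ("dissolution", 7, 10.5, 150.5, 4.0, 5.0),
    ("bubble", 1, -20.0, -30.0, 50.0, 4.0),
    ("dissolution", 1, -19.0, -29.5, 60.0, 2.0),
    ("bubble", 3, 0.0, 0.0, 10.0, 1.0),  # no dissolution partner, so it is dropped
]

example_pairs = paired_bin_means(example_records)
print(len(example_pairs), fraction_bubble_lower(example_pairs))
```

Only bins containing both methods enter the comparison, so the resulting fraction describes co-located averages rather than identical samples, which is the limitation noted in the text.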

Comparison between diazotrophic cell counts and nifH copies
Regardless of whether nifH copies can be used to infer diazotrophic abundance and to study diazotrophic biogeography, challenges still remain in the conversion of gene counts to biomass, as a large range in the number of nifH copies per diazotrophic cell has been reported (Table S2). In version 2, we first converted Trichodesmium trichome abundance to cell abundance using the same conversion factor of 100 cells trichome −1 as that used in Luo et al. (2012). This conversion resulted in a mean and variance of log 10 -transformed Trichodesmium cell abundance (10^(6.5±1.3) cells L −1 ) very similar to those of Trichodesmium nifH gene copies (10^(6.6±1.5) copies L −1 ; Fig. 16a). More recently, however, a much lower conversion factor of 13.2 ± 2.3 cells trichome −1 was suggested for Trichodesmium based on larger sample sizes, although a very large range of 1.2-685 cells trichome −1 was reported (White et al., 2018). Hence, when a conversion factor of 10 cells trichome −1 was applied, the Trichodesmium nifH gene copy abundance was 1 order of magnitude higher than its cell abundance (Fig. 16a). This result was within the reported mean nifH : cell ratios for Trichodesmium, albeit based on sparse samples, on the order of 10-100 (Table S2). It is worth noting that there have been suggestions that the observed nifH : cell ratio for Trichodesmium may be overestimated due to methodological limitations. Our analyses underscore the importance of enumerating Trichodesmium cells, rather than solely focusing on trichomes, in correctly evaluating Trichodesmium abundance, which has been suggested for future studies by White et al. (2018). While counting all Trichodesmium cells may be impractical, it would be valuable to report the number of cells in random samples of Trichodesmium trichomes.
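The order-of-magnitude shift can be checked with a line of arithmetic. Writing T for trichome abundance and f for the conversion factor in cells trichome −1 (both symbols are introduced here for illustration), the log-transformed cell abundance is log 10 (fT) = log 10 f + log 10 T, so changing f moves the mean but leaves the spread unchanged:

```latex
\begin{align}
\log_{10}(10\,T) &= \log_{10}(100\,T) - 1, \\
\overline{\log_{10}(\text{cells})}\,\big|_{f=10} &\approx 6.5 - 1 = 5.5, \\
6.6 - 5.5 &= 1.1 \ \text{log units between nifH copies and cells}.
\end{align}
```

A gap of about 1.1 log units corresponds to roughly a tenfold difference, matching the 1 order of magnitude stated above.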
The same analyses for heterocystous cyanobacteria showed that the nifH gene copy abundances were approximately 2 orders of magnitude greater than the cell abundances in terms of both mean and distribution (Fig. 16b and c). It must be noted that this simple analysis used all the data in our database. The limited in situ measurements for identical samples resulted in a mean nifH : cell ratio of 76 for heterocystous cyanobacteria (Table S2), consistent with our simple analysis.
In contrast, much lower nifH : cell ratios (1.51-2.58) were derived from regression analysis for heterocystous cyanobacteria and UCYN-B collected in the subtropical North Pacific. Considering these overall scarce measurements and the outcomes of our analysis, it is plausible that there is substantial variability in nifH : cell ratios. We expect that future studies, focusing on constraining these ratios and identifying mechanisms underlying variability in these ratios, will contribute to a more comprehensive understanding of the connection between nifH gene counts and diazotrophic cell abundance.
The application of qPCR assays for nifH-based abundance (DNA) and expression (RNA) has emerged as a critical step forward in our understanding of the distribution, abundance, and physiology (e.g., expression of nifH) of diazotrophs (Zehr and Riemann, 2023). Previously, estimating the abundances of diazotrophs was limited to those that could be identified by microscopy, e.g., Trichodesmium, heterocystous cyanobacteria (e.g., Richelia, Calothrix, Anabaena, Nodularia, Aphanizomenon), and some unicellulars (e.g., Cyanothece, later Crocosphaera). Thus, qPCR enabled the study of diazotrophic targets (and their activity) without the need for microscopy to identify them; microscopic identification came later, as some diazotrophs did (and still do) require application of FISH techniques for identification (Biegala and Raimbault, 2008). Additionally, qPCR allowed the study of in situ activity (gene expression) by diazotrophs without the need for cultivation. Although beyond the scope of the work presented here, important considerations should be taken into account when applying microscopy and qPCR datasets (Table S3), for example, to biogeochemical models (Meiler et al., 2023).

Biomass conversion factor
For possible further usage of cell-counted abundance data, here we suggest carbon biomass conversion factors for different diazotrophic groups (Tables 10 and S4). Most biomass conversion factors suggested here are the same as those used in Luo et al. (2012), excluding UCYN-A and heterocystous cyanobacteria, where new information has become available or additional consideration is necessary. A recent study has discovered a new symbiotic association between the unicellular diazotroph (UCYN-C) and diatom Epithemia strains (Schvarcz et al., 2022). However, the conversion factor of UCYN-C could not be updated in this study due to insufficient information on the biovolumes of host cells.
The conversion factor for UCYN-A was updated because it has been found to live symbiotically with the haptophyte Braarudosphaera bigelowii and relatives (Thompson et al., 2012; Hagino et al., 2013). Because the host and UCYN-A should function together, the host biomass is allocated to UCYN-A. It has been reported that each haptophyte cell hosts one UCYN-A1 cell (Cornejo-Castillo et al., 2019) or one UCYN-A2 cell (Suzuki et al., 2021). We used the empirically derived equation of Verity et al. (1992) to estimate carbon biomass from cell biovolume.

Because heterocystous cyanobacteria and their host diatoms form DDAs, similar to UCYN-A, we also suggest allocating the biomass of host diatoms to each associated diazotrophic cell (Table S4). The biomasses of heterocystous cells and vegetative cells in Richelia filaments were updated according to the cell dimension data reported in Caputo et al. (2019) using the same empirical equation above. The carbon biomass of host diatom cells was calculated using an empirical equation (Menden-Deuer and Lessard, 2000) that relates the diatom cell carbon biomass C (pg C cell −1 ) to the average cell biovolume V (µm 3 ) of each diatom genus, for which values from a database (Harrison et al., 2015) were used in this study (Table S4). Each host diatom can associate with multiple heterocysts. The numbers of Richelia heterocysts associated with Hemiaulus, Rhizosolenia, and Chaetoceros were observed to be within the range of 1-2, 1-5, and 3-10, respectively (Yeung et al., 2012; Caputo et al., 2019); we used both the maximum and the minimum in the estimation. The number of vegetative cells per heterocyst was also updated according to Caputo et al. (2019). Conversion factors for DDAs were estimated by dividing the total biomass of each DDA by the number of associated heterocysts.
Changing the number of Richelia heterocysts per Rhizosolenia cell (1 or 5) causes a large variation in its conversion factor, possibly due to the large host biomass; therefore, we retain both values so that users can apply this conversion factor with caution. The resulting biomass conversion factors of Richelia-Hemiaulus and Richelia-Chaetoceros associations were estimated to be 350 pg C heterocyst −1 (range: 150-1030) and 50 pg C heterocyst −1 (range: 9-300), respectively (Tables 10 and S4), as the number of filaments did not have a large impact on these conversion factors.
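The per-heterocyst bookkeeping for DDAs can be sketched as follows. This is an illustration only: it assumes a generic power-law biovolume-to-carbon relation C = a · V^b with user-supplied coefficients (the published coefficients are not reproduced here), assumes one heterocyst per Richelia filament, and uses hypothetical carbon values that are not taken from Table S4.

```python
def carbon_from_biovolume(volume_um3, a, b):
    """Generic power-law carbon content, C = a * V**b (pg C cell-1, V in um3).

    The coefficients a and b must be supplied by the caller; none are assumed here.
    """
    return a * volume_um3 ** b


def dda_conversion_factor(host_c, heterocyst_c, vegetative_c,
                          n_heterocysts, n_vegetative_per_heterocyst):
    """Biomass per heterocyst (pg C heterocyst-1) for one diatom-diazotroph association.

    Total DDA biomass = host diatom + all Richelia cells, assuming each
    filament carries exactly one heterocyst (an assumption of this sketch).
    """
    per_filament = heterocyst_c + n_vegetative_per_heterocyst * vegetative_c
    total = host_c + n_heterocysts * per_filament
    return total / n_heterocysts


# Hypothetical carbon contents in pg C, chosen only to show the sensitivity
# of the factor to the heterocyst count when the host is large.
host_c, heterocyst_c, vegetative_c = 1200.0, 10.0, 5.0

one = dda_conversion_factor(host_c, heterocyst_c, vegetative_c, 1, 4)
five = dda_conversion_factor(host_c, heterocyst_c, vegetative_c, 5, 4)
print(one, five)
```

With a large host, most of the biomass is shared among however many heterocysts are present, so the factor falls steeply as the heterocyst count rises. This is the behavior described above for Rhizosolenia.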
It is important to reiterate that these biomass conversion factors are only applicable to cell-count data. Attempting to convert nifH gene copies to biomass is not recommended due to significant uncertainties associated with nifH : cell ratios, as previously discussed.

Conclusions
In this study, we updated the global oceanic diazotrophic database by Luo et al. (2012) by adding new measurements reported in the past decade. Although the spatial coverage of the data was greatly expanded by this effort, the data distribution is still uneven, with most measurements reported from the Pacific and Atlantic oceans. Using the updated database, the estimate of global oceanic N 2 fixation based on arithmetic mean rates in ocean basins increased from 74 ± 7 to 223 ± 30 Tg N yr −1 . This change is largely attributable to a new estimate for the Indian Ocean and a much elevated estimate for the South Pacific Ocean, the latter of which would account for ∼ 40 % of global N 2 fixation. This high estimate for the South Pacific Ocean is in line with its qualification as a hotspot for diazotrophy (Messer et al., 2016), partly due to iron fertilization processes in this region (Bonnet et al., 2023). Due to data sparsity, our updated estimate did not include N 2 fixation in the Southern and Arctic oceans. Furthermore, data were more concentrated in surface seawater, and a significant amount of data were measured with incubation periods shorter than a daily cycle (24 h), limiting reliable evaluations of depth-integrated N 2 fixation rates. Although the elevated estimate suggests more balanced N inputs and losses in the global ocean than the previous estimate suggested, large uncertainties still exist. We also compared the N 2 fixation rates measured using either the addition of a bubble of labeled gas or the addition of dissolved 15 N 2 gas, for data reported at the same location and month (not necessarily in identical samples). The results indicated that the original 15 N 2 bubble method produces lower rates than the 15 N 2 dissolution method in 69 % of the cases. These results reveal that, despite decades of effort, the ocean is still undersampled in terms of the distribution of diazotrophs and N 2 fixation rate measurements. Our analyses suggest that prioritizing N 2 fixation measurements in the South Pacific Ocean, Indian Ocean, and high northern latitudes can significantly reduce the current uncertainty of N 2 fixation rates in the global ocean. Nevertheless, we believe that this updated diazotrophic database, supplemented with enhanced data from the past decade, is timely and can be helpful to scientists studying the marine biogeochemical cycle of N.

Figure 16. Comparison of all cell-count and nifH gene copy abundance data in the database. The box plots show the median (central line), 25th and 75th percentiles (lower and upper edges of the boxes), 5th and 95th percentiles (error lines), and outliers (red crosses) of the log 10 -transformed data. The comparisons are conducted for (a) Trichodesmium, (b) het-1/2, and (c) het-3. Note that the two conversion factors of 10 and 100 cells trichome −1 are used for Trichodesmium.

Table 10. Biomass conversion factors for different diazotrophic groups.
Trichodesmium (pg C cell −1 ): recommended 300; likely range 100-500
UCYN-A1 (pg C cell −1 ): recommended 2; likely range 1-3
UCYN-A2 (pg C cell −1 ): recommended 30; likely range 10-50
UCYN-B (pg C cell −1 ): recommended 20; likely range 4-50
UCYN-C (pg C cell −1 ): recommended 10; likely range 5-24
Het-1, Richelia-Hemiaulus (pg C heterocyst −1 ): recommended 350; likely range 150-1030
Het-2, Richelia-Rhizosolenia (pg C heterocyst −1 ): recommended 450 (5 heterocyst DDA −1 ) or 1900 (1 heterocyst DDA −1 ); likely range 19-5700
Het-3, Richelia-Chaetoceros (pg C heterocyst −1 ): recommended 50; likely range 9-300
Author contributions. YWL conceived and designed the structure of the database. ZS, YX, HW, WL, LW, YH, and YWL collected the data and updated the database. ZS, YX, HW, SCD, and YWL analyzed the data. The other authors contributed to the data. ZS, YX, and YWL wrote the first draft of the manuscript, and all authors revised the manuscript.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Financial support. This research has been supported by the National Natural Science Foundation of China (grant nos. 41890802 and 42076153). Individual authors were also supported by other awards.
Review statement. This paper was edited by Xingchen Wang and reviewed by Christopher Somes and one anonymous referee.