GRiMeDB: the Global River Methane Database of concentrations and fluxes

Despite their small spatial extent, fluvial ecosystems play a significant role in processing and transporting carbon in aquatic networks, which results in substantial emission of methane (CH4) into the atmosphere. For this reason, considerable effort has been put into identifying patterns and drivers of CH4 concentrations in streams and rivers and estimating fluxes to the atmosphere across broad spatial scales. However, progress toward these ends has been slow because of pronounced spatial and temporal variability of lotic CH4 concentrations and fluxes and by


Introduction
Despite their small areal extent, running-water (fluvial) ecosystems play a significant role in processing and transporting carbon (C) in and through aquatic networks, including the production, consumption, transport, and evasion of carbon dioxide (CO2) and methane (CH4). The profound planetary warming effects of CH4 in the atmosphere, its erratic but accelerating rate of increase over recent years (NOAA, 2022), the significant contributions of natural sources to the growing atmospheric pool (Turner et al., 2019), and improvements in gas measurement technologies have all contributed to a rapid increase in studies of CH4 dynamics in aquatic environments in general and fluvial ecosystems in particular. These studies reveal widespread supersaturation of CH4 in running waters that underlies their larger-than-expected contribution to the atmospheric pool.
Efforts to quantify fluvial CH4 dynamics at regional, continental, and global scales have been fraught with uncertainty, reflecting the inherent variability of this gas in surface waters combined with a notable limitation in data availability. Sources and sinks of CH4 are often unevenly distributed over space and time within drainage systems and, as a result, concentrations can vary over 1-3 orders of magnitude over short time periods (subdaily to subweekly; e.g., Natchimuthu et al., 2017; Smith and Böhlke, 2019) or relatively small spatial extents (< 10 to < 100 m for small streams and large rivers; e.g., Anthony et al., 2012; Crawford et al., 2017; Bretz et al., 2021; Robison et al., 2021). Similarly, several drivers or predictors of CH4 have been identified in the literature, and these properties also have variable spatial and temporal distributions. Thus, efforts to estimate the total emissions from world rivers have relied on relatively small data sets composed of site-specific values that have been averaged over time and have then used upscaling strategies based on Monte Carlo techniques or extrapolations using predictor variables that have little or no significant statistical relationships with large-scale patterns of gas concentrations or fluxes (Hutchins et al., 2020). Consequently, current global-scale estimates of riverine emissions are poorly constrained and highly uncertain (Saunois et al., 2020; Rosentreter et al., 2021).
The combination of rapidly increasing atmospheric concentrations of CH4, the significant role of fluvial systems in emitting this gas, and, critically, current difficulties in explaining or predicting concentrations and fluxes with reasonable certainty inspired the central goal of this paper: to assemble a comprehensive database of CH4 concentrations and fluxes for fluvial ecosystems that includes broadly relevant concurrent physical and chemical data. This effort expands upon a prior compilation of CH4 and CO2 data (named MethDB; Stanley et al., 2015) that was constructed to emphasize among-site differences and included 1496 concentration records and 532 flux records from 1080 sites. In this more comprehensive Global River Methane Database (GRiMeDB), most data are date-specific (i.e., not averaged over time), the breadth of site types is expanded to include marginal fluvial habitats as well as disturbed and artificial waterways, and CH4 data are supported by a broad suite of site-specific physical and chemical attributes along with concurrent measurements of CO2 and N2O where available. Given the more finely resolved scale of the data and the growth of the field in the past decade, GRiMeDB represents a significant expansion beyond MethDB. Building GRiMeDB with greater detail and breadth of data was done with the intent of increasing opportunities to identify and predict spatial and temporal variation in CH4, to test hypotheses related to greenhouse gas dynamics, and to reduce uncertainty in future upscaled estimates of gas emissions. In this paper, we (1) provide a detailed description of the components of the database and its construction, (2) summarize some basic patterns of gas concentrations and fluxes from GRiMeDB, and (3) highlight critical data gaps and possible future research opportunities for improving current understanding of CH4 dynamics in streams and rivers.
ber 2021 for completeness. We also used informal "word-of-mouth" approaches to discover additional, often unpublished data sets.
All potential data sources were first screened to determine their appropriateness for inclusion in GRiMeDB. Several criteria were established a priori to ensure the usability of the data and that they were derived from inland running water systems. Coastal sites with > 1 ppt salinity were considered estuarine and thus were excluded. Similarly, sites that were situated in reservoirs or immediately upstream of small dams, dam spillways, beaver ponds, or lake outlets or that were subject to experimental manipulation were omitted. We did not enter fluxes derived from chambers attached to collars or inserted into sediments because we could not be certain that such measurements were capturing air-water fluxes. Sources that reported minimum and maximum gas concentrations or fluxes only were not included. Finally, rates expressed on an annual basis were also excluded to avoid introducing uncertainty associated with different upscaling assumptions and methods.

Source table
The source table contains the list of all sources used to build GRiMeDB, a unique identification number (Source_ID) for each CH4 data source, and basic bibliographic information for the data source (Title, Author, Source, publication year Pub_year, and digital object identifiers Paper_DOI or Data_DOI_primary or another persistent identifier; all column titles for this table are defined in Table A1). In several cases, data sources were supplemented with additional supporting information (e.g., associated physicochemical data) from separate sources (described further in Sect. 2.3) or additional or corrected information from authors (Fig. 2). In the latter case, we contacted authors when questions arose regarding their data (e.g., clarification regarding units) and/or to request supporting information or site- or date-specific concentrations or fluxes if published values were aggregated. Inclusion of additional unpublished data from authors is noted in the source table along with a description of the addition or correction. If supporting data from separate published sources were used, the DOI or another persistent identifier for the secondary source was listed in a separate column (Data_DOI_supporting).

Site table
The site table reports basic information on attributes for all sites where CH4 was sampled. Each site has a unique identification code (Site_ID) and name (usually taken directly from the data source) and is linked to the source table via the Source_ID (see Table A2 for detailed descriptions of all columns in the site table). What comprises a "site" (i.e., the spatial extent of data collection) varied among data sources and includes (1) discrete sampling points, (2) geomorphically distinct study reaches, and/or (3) larger channel sections, drainage networks, or other geographic units. The second case typically corresponded to reaches such as riffles or pools in small streams. In the third case, multiple points were often sampled within the "site" and data were then presented as averages. The distances between sampling points that had been averaged varied widely but were typically > 1 km and in some cases exceeded 100 km. Because land use, channel order, and slope can vary substantially across such distances, we included fields to indicate whether a site was an aggregation of widespread points ("aggregated") and, if so, the number of locations in the aggregation (if available). We also limited the resolution of latitude and longitude for these sites to less than three decimal places. At the opposite extreme, gas sampling at points very close to one another (a "high-density site" sensu Fig. 3) has the potential to create ambiguities for site delineation and data analysis. To avoid these pitfalls, we combined points with slightly different latitude-longitude values to represent a single site for three specific cases. First, multiple samples collected at different points and/or depths within a channel cross section were averaged to form a single site. Second, some drainages or regions were surveyed repeatedly (particularly the Congo River basin and streams in Pennsylvania, USA), and it was not always clear whether closely situated (ca. 10-50 m) points from different surveys were intended to be a repeated sampling of the same location or sampling of discrete sites. Some judgment was involved in choosing between these two possibilities, and in a subset of cases, points in close proximity to one another that were sampled on separate dates were treated as a single site. What constituted "close proximity" varied between small streams and large rivers but was always < 100 m and typically < 50 m. Finally, three data sources had extremely high sampling densities within discrete reaches (50 to > 20 000 samples per reach; Crawford et al., 2016; Call et al., 2018; Loken et al., 2018). Because closely adjacent gas samples can be spatially autocorrelated and including all individual values from these studies would have resulted in their overrepresentation in the database, individual point measurements were treated as within-reach replicates.
For a site used in multiple studies, the Site_ID was assigned to the earliest paper and a comment was added to the site entry noting its use in other data sources (Fig. 3). Latitude and longitude coordinates were available for most sites; however, in several cases, location information was acquired from authors or estimated from study site figures using Google Earth (© Google Earth, 2020, https://earth.google.com/, last access: 10 April 2023). All sites were plotted on Google Earth and inspected (Fig. 3) to identify and correct data errors. If a site's coordinates were immediately adjacent to but not on a channel, the coordinates were adjusted to fall on the channel, and this modification was noted in the Comments field. If available, additional variables drawn from the data sources were entered to characterize the site, including stream name, basin or region name, elevation, channel slope, Strahler order, basin area, and codes denoting distinct channel or site types (described below). To supplement the available elevation data, we also estimated elevation for all sites except aggregated sites or sites with poorly resolved coordinates (fewer than three decimal places for both latitude and longitude) after snapping coordinates to the nearest stream.
Figure 1. General structure of GRiMeDB and connections between its four tables. Information flow began with entering information about each data source into the source table and assigning a unique Source_ID. Site information for each site within a data source was then entered into the site table. The site was given a unique Site_ID and linked to its data source by the Source_ID. Source_IDs and Site_IDs were carried over to all concentration and flux observations in their respective tables. Methane (CH4) observations include site-date combinations with only concentration data (orange), only flux data (green), or both concentration and flux data (brown). Concentrations and available supporting data (described in Sect. 2.3) were entered into the concentration table, and each observation was given a unique observation (obs) name. For site-date combinations that had both concentration and flux observations, the Source_ID, Site_ID, observation name, and date information were copied to the flux table for data entry. Site-date combinations with flux data only were entered into the flux table and given a unique observation name. If a flux observation had associated supporting data, the Source_ID, Site_ID, observation name, and date information were copied to the concentration table for supporting data entry. However, if there were no supporting data, matching rows were not added to the concentration table.
To determine the adjusted within-channel coordinates, we first downloaded a digital elevation model (DEM) for each site using the function get_elev_raster() from the package "elevatr" (version 0.4.2; Hollister et al., 2021) for R statistical software (version 4.2, R Core Team, 2021) at a resolution of 6-9 m depending on the location on the globe. Second, the DEM was processed for hydrological correctness using the package "whitebox" (version 1.2.0, Wu, 2020) by filling single-cell pits (fill_single_cell_pits() function) and breaching depressions (breach_depressions() function) to obtain a flow-direction model (d8_pointer() function). Finally, we calculated a flow-accumulation model (d8_flow_accumulation() function). If the coordinates reported in the data source had a flow accumulation of < 10 cells (indicating that they were not located in a preferential flow path), new coordinates were assigned to the cell with the highest flow accumulation within a 50 m radius. If the initial site had a high flow-accumulation value (> 10 cells), we assumed that the site was in a stream channel. Typically, the snapping procedure resulted in very minor changes to a site's location (median < 3 m).
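The snapping rule described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R workflow used to build the database; the function name `snap_to_channel`, the list-of-lists grid, and the choice to treat a value of exactly 10 cells as already in the channel are assumptions made for the example.

```python
import math

def snap_to_channel(flow_acc, row, col, cell_size_m, radius_m=50.0, threshold=10):
    """Return the (row, col) of a site after snapping on a flow-accumulation grid.

    flow_acc    : 2D list of flow-accumulation values (number of upstream cells)
    row, col    : grid cell containing the reported site coordinates
    cell_size_m : DEM resolution in metres (hypothetical input)
    """
    # Sites already on a preferential flow path are left where they are.
    # Assumption: a value of exactly `threshold` counts as "in channel".
    if flow_acc[row][col] >= threshold:
        return row, col

    reach = int(radius_m // cell_size_m)  # search window half-width in cells
    best, best_acc = (row, col), flow_acc[row][col]
    for r in range(max(0, row - reach), min(len(flow_acc), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(flow_acc[0]), col + reach + 1)):
            # Keep only cells whose centres lie within the circular search radius.
            if math.hypot(r - row, c - col) * cell_size_m > radius_m:
                continue
            if flow_acc[r][c] > best_acc:
                best, best_acc = (r, c), flow_acc[r][c]
    return best

# Toy 5 x 5 grid with a channel running down column 2 (10 m cells).
grid = [
    [1, 1, 1, 1, 1],
    [1, 2, 3, 1, 1],
    [1, 1, 40, 1, 1],
    [1, 1, 80, 1, 1],
    [1, 1, 1, 1, 1],
]
snapped = snap_to_channel(grid, 1, 1, cell_size_m=10)
```

With the default 50 m radius, the toy site moves to the cell with the largest accumulation in the window; a smaller radius restricts the search to nearer cells.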
Many studies of CH4 dynamics have been undertaken to determine whether and how specific phenomena such as the presence of upstream reservoirs, point source discharges, thermokarst features, or oil and gas extraction potentially affect fluvial CH4 (and other constituents), usually with an expectation of a net enhancement of concentrations and fluxes. Similarly, other studies have examined sites that may be expected to be enriched in CH4 but whose fluvial identity might be considered marginal or ambiguous (e.g., springs, floodplain backwaters, ditches, canals). Inclusion of such "methane hunting" studies has the potential to bias the data set toward higher values. Nonetheless, we included these studies in GRiMeDB because they provide an opportunity to investigate the consequences of human activity and gain a more comprehensive understanding of fluvial CH4 dynamics. However, to accommodate future analyses in which use of such data might be unsuitable, or alternatively, when these sites might be the sole focus of a study, we generated a set of channel codes to identify targeted site types (Table 1). Information about four of the codes was not consistently available among data sources, and thus their assignment often involved judgment calls. The first case involved determining whether the presence of an upstream dam (code DD) was relevant for sites of varying distances downstream. We used a distance of 7 km as a cutoff for this category, although the zone of influence may be far shorter or extend far beyond this distance depending on dam size and operation, respectively. To provide some context for this code, a site's distance from a dam was acquired from the data source or estimated in Google Earth using the Path tool and reported in the Comments field whenever possible. The second case involved straight, symmetrical channels that are common in many agricultural and urban areas.
Frequently, it was not known whether this unnatural geometry was due to channelization (straightening) of a stream (code CH) or creation of a new channel (ditches and canals; codes DIT and CAN). In the absence of specific information, straight channels were classified as CH. Third, channels draining or passing through wetlands (WS) were often difficult to identify, particularly given seasonal variation in wetland appearance. Finally, floodplain channels presented a distinct challenge because of the complex nature of these environments and their potential to be classified as either riverine or wetland systems. We used the floodplain (FP) code to indicate habitats that were described as or appeared to be lentic (i.e., backwaters or connected floodplain lakes) but that were persistently connected to the main river channel and thus were part of the fluvial system. Given these ambiguities, we recommend that these four codes be viewed and used with care.

Concentration table and flux table
The concentration table and the flux table contain the primary gas data central to GRiMeDB, and the concentration table also hosts physical and chemical variables associated with concentration and/or flux observations (see Tables A3 and A4 for the full list of columns and their descriptions). The vast majority of concentration and flux data were extracted from tables within data sources or data repositories or were provided by authors. However, in some cases, values were acquired from figures using graphical digitizing software (WebPlotDigitizer, https://automeris.io/WebPlotDigitizer/, last access: 10 April 2023; GetData, http://getdata-graph-digitizer.com/, last access: 10 April 2023; or DigitizeIt, https://www.digitizeit.xyz/, last access: 10 April 2023). Plots with log scales or that were difficult to interpret were not digitized. The accuracy and consistency of this method were evaluated by comparing data generated by different individuals digitizing a set of common figures and by comparing digitized results to known results. Agreement in both comparisons was strong (average slope = 0.994, average R² = 0.9996 for five comparisons between individuals digitizing the same data set and average slope = 0.998, average R² = 0.997 for digitized versus actual data for seven data sets; see Table S1 in the Supplement for further details), demonstrating the reliability of this method of data gathering.
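For readers who want to repeat this kind of check on their own digitized values, the following sketch computes the two agreement statistics named above. It assumes an ordinary least-squares fit with an intercept, which the text does not specify; the function name `slope_and_r2` and the example numbers are hypothetical.

```python
def slope_and_r2(reference, digitized):
    """Least-squares slope and R^2 of digitized values regressed on reference values."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(digitized) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in digitized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, digitized))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, r2

# Hypothetical example: known values versus values read off a figure.
known = [0.5, 1.2, 3.4, 7.8, 12.0]
read_from_plot = [0.52, 1.18, 3.41, 7.75, 12.05]
slope, r2 = slope_and_r2(known, read_from_plot)
```

A slope near 1 together with an R² near 1 indicates that digitizing neither rescaled nor scattered the values.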
Whenever possible, concentrations and fluxes were entered as values for individual sites on individual days (i.e., not averaged across sites or days) (Fig. 4). Because 1 d represented the lowest level of temporal resolution in GRiMeDB, repeated measurements made on a subdaily scale were averaged and expressed as a daily value and were not considered to be aggregated over time. If multiple replicates were collected at different times on the same day (e.g., a study of diurnal gas dynamics), this was noted in the Comments field, and measurements prior to and after 12:00 local time (LT) were entered as separate, consecutive days. Observations resolved to the daily scale can be identified using either a "No" in the Aggregated_Time field or by having the same reported starting (Date_start) and ending (Date_end) dates. If the specific start and end dates were not specified in the data source, we entered the day as the 15th of the month and noted this approximation in the Comments field. If available, we also reported minimum and maximum values and standard deviations (SDs) for entries that were aggregated over space and/or time. SDs but not minima and maxima were reported for replicates from non-aggregated sampling when available, except for reach-averaged entries with multiple within-reach measurements and diel studies with multiple within-day values. In these cases, minima and maxima were also included.
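A minimal sketch of the two simplest date conventions above (collapsing subdaily measurements to a daily mean and defaulting an unknown day to the 15th) might look as follows. The record layout, the site label "S1", and the function names are assumptions for illustration; the before/after 12:00 split used for diel studies is not implemented here.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean

def daily_means(records):
    """Average subdaily values to one value per site and calendar day.

    records: iterable of (site_id, timestamp, value) tuples.
    Returns {(site_id, date): mean_value}.
    """
    groups = defaultdict(list)
    for site_id, timestamp, value in records:
        groups[(site_id, timestamp.date())].append(value)
    return {key: mean(values) for key, values in groups.items()}

def approximate_date(year, month, day=None):
    """Return (date, comment); an unknown day is set to the 15th and flagged."""
    if day is None:
        return date(year, month, 15), "day not reported; set to the 15th"
    return date(year, month, day), ""

records = [
    ("S1", datetime(2020, 7, 1, 8, 0), 2.0),
    ("S1", datetime(2020, 7, 1, 16, 0), 4.0),
    ("S1", datetime(2020, 7, 2, 9, 0), 5.0),
]
daily = daily_means(records)
```

Keeping the comment alongside the approximated date mirrors the practice of recording such approximations in the Comments field.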
Dealing with concentration data reported as a negative value, zero, or below a detection limit (BDL) is problematic because of inconsistencies in detection limits and reporting practices, and any decision about handling these records introduces some bias (Stow et al., 2018). For example, using a non-numerical format such as BDL or < 0.01 is likely to lead to the elimination of these entries during data analysis and thus would introduce a bias against low-value observations. Alternatively, converting any such value to zero would introduce a bias in the opposite direction. As a compromise solution, concentrations recorded as zero in the original data source were entered as zero in GRiMeDB, and other below-detection values were entered as −999 999. In this latter case, the original data entry format was noted in the Comments column. For fluxes, negative and zero values were entered without modification or comment.
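Because the below-detection code is numeric, it will silently distort summary statistics unless it is handled explicitly. The sketch below shows one hedged way to separate these entries before analysis; the helper names are hypothetical, and the choice to exclude coded values from the mean (rather than substitute a value) is only one of several defensible options and itself biases the mean upward.

```python
from statistics import mean

BDL_SENTINEL = -999999  # numeric code used for below-detection concentrations

def split_bdl(values):
    """Separate measured values from below-detection entries."""
    measured = [v for v in values if v != BDL_SENTINEL]
    return measured, len(values) - len(measured)

def summarize(values):
    """Summarize a list of concentrations while reporting how many were BDL."""
    measured, n_bdl = split_bdl(values)
    return {
        "n_measured": len(measured),
        "n_bdl": n_bdl,
        "fraction_bdl": n_bdl / len(values) if values else 0.0,
        # Reported zeros are kept; only coded BDL entries are excluded.
        "mean_measured": mean(measured) if measured else None,
    }

summary = summarize([0.0, 1.5, -999999, 3.0, -999999])
```

Reporting the BDL fraction next to the mean keeps the low-value observations visible even when they are excluded from the average.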
The flux table reports diffusive, ebullitive, and total CH4 fluxes along with CO2 and N2O diffusive fluxes. Given the diverse strategies for measuring each of the three CH4 flux pathways and potential biases associated with different approaches (Chen et al., 2021), values are accompanied by brief categorical descriptions of methods used for each flux type as well as for CO2 fluxes and the gas exchange coefficient k. For a small number of entries, CH4 fluxes were not directly reported in the data source, but information was available (dissolved gas concentration, temperature, and a corresponding k value) that allowed us to calculate these fluxes. We also entered BDL values for flux for one data source in which fluxes had been calculated from concentration, but fluxes associated with BDL concentrations had been omitted from the results. Finally, a small number of observations listed diffusive and ebullitive but not total fluxes, so diffusion and ebullition were summed and entered as total flux. In all cases, the added calculations are noted in the Comments field.
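The text does not give the equation used for the calculated fluxes. A common form for diffusive air-water exchange is F = k (C_water − C_eq), where C_eq is the concentration in equilibrium with the atmosphere; the sketch below assumes that form and should not be read as the exact procedure used for the database. The function names, units, and example numbers are illustrative.

```python
def diffusive_flux(c_water_umol_l, c_eq_umol_l, k_m_per_d):
    """Diffusive flux in mmol m-2 d-1, assuming F = k * (C_water - C_eq).

    Concentrations are in umol L-1, which is numerically equal to mmol m-3,
    so multiplying by k (m d-1) gives mmol m-2 d-1.
    """
    return k_m_per_d * (c_water_umol_l - c_eq_umol_l)

def total_flux(diffusive, ebullitive):
    """Total flux as the sum of the diffusive and ebullitive pathways."""
    return diffusive + ebullitive

# Hypothetical supersaturated stream: 1.0 umol/L in water, 0.003 umol/L at equilibrium.
f_diff = diffusive_flux(1.0, 0.003, k_m_per_d=2.0)
f_total = total_flux(f_diff, ebullitive=0.5)
```

A negative result indicates undersaturation and net uptake from the atmosphere, which is why negative fluxes can be legitimate entries.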
The GRiMeDB concentration table includes physicochemical measurements in support of concentration and flux observations (Figs. 1 and 4, Table A3). Availability of this supplemental information varied widely among data sources and was limited to data collected concurrently with gas samples. For data sources with gas fluxes and physicochemical data but not gas concentrations, we created rows in the concentration table to capture the supporting data. These records are identified by a "Yes" in the FluxYesNo column, SampleCount = 0, and NA in the CH4mean column. Finally, water temperature was estimated for entries if it was needed to convert gas units and entered in the WaterTemp_degC_estimated column. Estimates were typically based on values from adjacent sites or the same site at a similar time (e.g., averages of temperature from the prior and subsequent dates or from the same month in an adjacent year). Error introduced from these estimates should be small, e.g., ca. < 10 % of the actual value if the estimated temperature is off by 3 °C.
Following completion of all data entry, gas and physicochemical variables were converted to "new" standard units (Tables A3 and A4). The identities of the new and original units are included in both the concentration table and flux table for clarity. Elevation was used to estimate atmospheric pressure if needed for unit conversions. We used Henry's law, water temperature, and atmospheric pressure to convert reported dissolved gas values (ppm, ppb, µatm, and percentage saturation; ∼ 13 % of observations). For observations that reported gas values as percentage saturation (< 1 % of all observations), we also used the global average CH4, CO2, and N2O atmospheric concentrations from the NOAA Global Monitoring Laboratory (https://gml.noaa.gov/ccgg/, last access: 12 June 2023) for the year 2013, which corresponds to the median observation year in the database.
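As a rough illustration of such a conversion, the sketch below turns a CH4 mole fraction (ppm) into a dissolved concentration using elevation-derived pressure and a temperature-dependent Henry's law constant. The constants are assumptions, not values taken from the database documentation: a solubility of about 1.4 × 10⁻³ mol L⁻¹ atm⁻¹ at 25 °C with a temperature coefficient of 1600 K is a commonly tabulated pair for CH4, and the pressure formula is the standard-atmosphere approximation. Water vapour pressure is ignored for brevity.

```python
import math

KH_CH4_298 = 1.4e-3      # mol L-1 atm-1 at 298.15 K (illustrative, verify before use)
DLNKH_D_INV_T = 1600.0   # K, temperature dependence of ln(KH) (illustrative)

def pressure_atm(elevation_m):
    """Approximate atmospheric pressure (atm) from elevation, standard atmosphere."""
    return (1.0 - 2.25577e-5 * elevation_m) ** 5.25588

def henry_constant_ch4(temp_c):
    """Henry's law solubility of CH4 (mol L-1 atm-1) at water temperature temp_c."""
    temp_k = temp_c + 273.15
    return KH_CH4_298 * math.exp(DLNKH_D_INV_T * (1.0 / temp_k - 1.0 / 298.15))

def ppm_to_umol_per_l(ppm, temp_c, elevation_m):
    """Convert an equilibrated CH4 mole fraction (ppm) to umol L-1."""
    partial_pressure_uatm = ppm * pressure_atm(elevation_m)  # ppm x atm = uatm
    return henry_constant_ch4(temp_c) * partial_pressure_uatm  # uatm x mol/L/atm = umol/L

c_sea_level = ppm_to_umol_per_l(1000.0, temp_c=25.0, elevation_m=0.0)
c_mountain = ppm_to_umol_per_l(1000.0, temp_c=25.0, elevation_m=2000.0)
```

The same mole fraction corresponds to a lower dissolved concentration at high elevation and to a higher one in colder water, which is why both elevation and temperature enter the conversion.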

Assessment of representativeness
We assessed the representativeness of sites in GRiMeDB relative to the global distribution of biological, physical, and climatic properties following van den Hoogen et al. (2021). Briefly, we first assigned each site to a corresponding river reach in HydroSHEDS (Linke et al., 2019), which is a global hydrological network database that contains spatial data for a wide array of hydrological, physiographical, climatic, land cover, geological, edaphic, and anthropogenic variables for each river reach. HydroSHEDS thus provides a multidimensional characterization of global rivers that is well suited for assessing how representative GRiMeDB sites are in terms of key biophysical and anthropogenic features. After excluding non-numerical variables (e.g., biome) and variables with monthly values (e.g., monthly precipitation), we performed a principal component analysis (PCA) on all HydroSHEDS subcatchments using all possible combinations of the 54 remaining HydroSHEDS variables. From this, we selected all principal components (PCs) needed to explain 90 % of the variance in the PCA, which corresponded to 28 PCs and 378 possible bivariate combinations of these PCs. For each unique PC pair, we computed the convex hull of all sampled sites to determine the distribution of these sites relative to all global river subcatchments for the specified PCs (Fig. 5). Each HydroSHEDS subcatchment was then assigned a value of 1 or 0 if it fell within or outside the convex hull, respectively. This process was repeated for each of the 378 possible PC combinations. To collapse this information, we calculated the fraction of cases where a given subcatchment fell within the convex hull for all PC combinations to obtain a dimensionless summary value ranging from 0 to 1. A subcatchment with a value of 1 for this index of "representativeness" means that it fell within the convex hull for 100 % of the PC combinations, indicating that its overall characteristics are well captured in the database. 
It is important to note that this analysis only captures average catchment properties of relatively large river reaches (average subcatchment area: 130 km²). Given the strong local controls on CH4 concentrations and fluxes, interpretations from this analysis should be made with some caution.
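The convex-hull index described above can be sketched without any spatial libraries. The example below is a minimal pure-Python illustration, not the code used for the published analysis: it takes principal component scores as given (the PCA itself is omitted), and the function names and toy data are assumptions. Points on the hull boundary are counted as inside.

```python
from itertools import combinations

def cross(o, a, b):
    """z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, p):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    if len(hull) < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % len(hull)], p) >= 0
               for i in range(len(hull)))

def representativeness(sampled_scores, catchment_scores):
    """Fraction of PC pairs for which each catchment falls within the sampled hull."""
    n_pcs = len(catchment_scores[0])
    pairs = list(combinations(range(n_pcs), 2))
    hulls = {(i, j): convex_hull([(s[i], s[j]) for s in sampled_scores])
             for i, j in pairs}
    return [sum(inside_hull(hulls[(i, j)], (c[i], c[j])) for i, j in pairs) / len(pairs)
            for c in catchment_scores]

# Toy example with three PCs: sampled sites at the corners of a unit cube.
sampled = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
catchments = [(0.5, 0.5, 0.5), (2.0, 0.5, 0.5), (2.0, 2.0, 2.0)]
index = representativeness(sampled, catchments)
```

With 28 retained PCs the same loop runs over 378 pairs, matching the number of bivariate combinations reported in the text.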

Data checking and data analysis
Several approaches were taken to check the accuracy of data in GRiMeDB. This included evaluation of the reliability of digitized data (Sect. 2.3) along with several additional inspection steps. Entries were error checked by a coauthor other than the individual who entered the data, including confirmation of site location information, validation of units for all variables, and spot or complete checking of entered gas data (independent units and data check in Fig. 4), depending on dataset length and whether data were manually entered or imported directly from a file. Once values had been converted to standard units, all variables were plotted to identify outliers (outlier check; Fig. 4), and extreme values were checked against the original data source. In cases in which errors were present in the original data, authors were contacted for clarification if possible. In the rare cases in which issues could not be resolved, the data were excluded. These and all other calculations and analyses were performed in R (version 4.2, R Core Team, 2021) using the "dplyr" (version 1.0.7, Wickham et al., 2021) and "data.table" (Dowle and Srinivasan, 2021) packages for data analysis, the "sf" package (version 1.0, Pebesma, 2018) for spatial data processing, and the "ggplot2" (version 3.3.5, Wickham, 2016) and "patchwork" (Pedersen, 2022) packages for visualization.

Overview of GRiMeDB data
GRiMeDB includes 24 024 records of CH4 concentration and 8205 CH4 flux values from 5037 unique sites along with 17 655 and 8409 concurrent measurements of concentration and 4444 and 1521 of flux for CO2 and N2O, respectively (Table S2). Although the first concentration and flux values in GRiMeDB were published in 1973 and 1987 (de Angelis and Lilley, 1987), respectively, over 70 % of all CH4 concentrations and 80 % of flux observations became available after 2015 (the year of publication of MethDB; Fig. 6, Fig. S1 in the Supplement). This growth in data availability has occurred predominantly along the spatial axis, as almost two-thirds of all the sites were added in or after 2015 and over half of all the sites in the database have a single concentration and/or flux observation. Conversely, long time series are rare, with only 8 % of the 5037 sites having > 10 concentration observations and 4 % having > 10 diffusive flux records (Figs. 6 and S1). The longest concentration record includes 590 observations distributed over 28 years (Toolik Inlet, Site_ID 9025; Kling, 2019a; Kling, 2022), while the longest flux record has 82 observations of diffusive flux over 4 years (Site_ID 3644). Further, among the 15 sites with time series > 5 years, 12 are situated in either the Toolik Lake region of Alaska, USA (Kling, 2019a, b, 2022), or within the Krycklan watershed in Sweden.

Spatial and temporal distribution of data
Spatially, 40 % of all sites and 52 % of all CH4 concentration observations are in North America, followed by Europe (25 % of all sites and 26 % of all CH4 concentration values; Table S2). Conversely, there are vast geographic areas with moderate to high channel densities with few or no observations, such as central Canada, central America, South America beyond the Amazon mainstem area, most of Russia, central and western Asia, New Zealand, and the Malay Archipelago (Fig. 7a). Geographic limitations in availability of flux data, particularly of ebullition, are pronounced given a smaller number of observations and domination of diffusion measurements. Observations of ebullition are absent or limited to one to two studies for Africa, Oceania, central America, South America, and Russia (Fig. S2). Despite these gaps, there is surprisingly good representation in terms of the range of hydrological, physiographical, climatic, land cover, geological, edaphic, and anthropogenic conditions that exist globally (Fig. 7b). Areas that are poorly represented are characterized by very low channel density associated with arid or polar climates as well as high-altitude regions (Greenland, northern Canada, northern Africa, central Australia, Middle Eastern nations, western China, Mongolia, Chile, southern Argentina). Evaluating the distribution or representativeness of sites in terms of system size is difficult given the limited availability of relevant information such as Strahler stream order or basin area, which were reported for only 26 % and 28 %, respectively, of all sites (Table S2). For sites with these data, counts of observations decline with increasing stream order (Fig. 8) in a log-linear fashion (R² = 0.92 for concentration and 0.90 for flux; P < 0.0005 for both regressions after excluding zero-order counts), consistent with Horton's law of stream numbers (Horton, 1945). Thus, other than the extreme underrepresentation of zero-order channels, this predictable decline suggests reasonable representation by order. Nonetheless, this result should be interpreted with caution given the scarcity of relevant data. The distribution of counts by basin size follows a similar pattern of underrepresentation of sites draining very small basins and also indicates a potential overrepresentation of some large basin sizes (basins of ca. 10 000 km²; Fig. 8).
[Figure caption fragment: (2015) indicates the year of MethDB publication. See Fig. S1 for data accumulation and length resolved by CH4 flux type.]
The distribution of observations among months illustrates seasonal sampling regimes dominated by summer sampling at northern (> 40°) and southern (< −20°) latitudes contrasted by even or erratic sampling at mid latitudes (Fig. 9).
Consistent with the lower representation of Southern Hemisphere rivers and streams, several months lack concentration and/or flux measurements south of −10° latitude, particularly during winter months. Beyond these gaps, the only months missing data in the Northern Hemisphere are fluxes in January and February at sites north of 60° latitude and several missing months north of 70°, presumably due to pervasive ice and snow cover.

CH 4 flux methodology
Records of CH 4 flux are dominated by diffusive flux measurements, which represent 85 % of all flux values in the database, with ebullition (8 %) and total flux (7 %) accounting for the remaining entries (Fig. 10). Not surprisingly, a variety of methods have been used to quantify each flux type, although diffusive flux methods are dominated by calculations based on dissolved gas concentration and a gas exchange coefficient (k) (74 % of all observations), while chamber-based methods are most common for quantifying total flux (93 % of all observations). Similarly, k is most commonly estimated via physical models (n = 3188). Several models have been employed for this calculation, as indicated by > 25 different references for k model sources listed in GRiMeDB.

Overview of concentration and flux data
Concentrations and fluxes of all three gases are characterized by log-normal distributions that vary over several orders of magnitude (Fig. 11) and large coefficients of variation (CVs) for CH 4 and especially N 2 O (Table 2). The vast majority (∼ 95 %) of CH 4 and CO 2 concentrations appear to be supersaturated, whereas only 67 % of N 2 O observations exceed the saturation threshold. Reports of concentrations below detection are scarce for all gases, including N 2 O (Table 2).
There were no meaningful univariate relationships between variables that may be used for upscaling (latitude, basin area, and stream order) and mean site concentration or flux (Fig. 12, Table S3). Linear regressions indicated that latitude and basin area accounted for a very small percent of the variation in both concentration (R 2 = 0.006 and 0.002, respectively) and flux (R 2 = 0.036 and 0.055) among sites. Similarly, Kruskal-Wallis tests among stream orders suggested possible differences for concentration (χ 2 = 47.165, df = 8, P < 0.001) and, more weakly, for flux (χ 2 = 14.777, df = 8, P = 0.070). However, results of corrected pairwise comparisons (using the method of Benjamini and Hochberg, 1995) among orders were ambiguous. For flux, these comparisons suggested no differences among orders. For concentration, they indicated possible differences in distributions only between seventh-order channels and all other orders and between sixth- and first-order sites. Collectively, these results indicate a lack of a consistent change in CH 4 magnitude across channel orders for flux.
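To show how the Benjamini and Hochberg (1995) correction can move marginal pairwise results across a significance threshold, the sketch below implements the standard step-up adjustment in plain Python. The raw P values are hypothetical and the function name `benjamini_hochberg` is ours; this is not the code used for the analyses reported here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from six pairwise comparisons (NOT GRiMeDB results).
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.60]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw={p:.3f}  adjusted={q:.4f}")
```

In this made-up example, three comparisons with raw P values just below 0.05 no longer fall below that threshold after adjustment, which is one way pairwise results can appear ambiguous even when the overall test suggests differences.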
As with relationships between CH 4 and physical site attributes, relationships between CH 4 concentration or flux and water chemistry parameters are also characterized by substantial variability. Representative examples indicate increasing, decreasing, and ambiguous relationships between CH 4 concentrations and fluxes and selected chemical constituents (Fig. 13). One source of the variation in the relationships shown in Fig. 13 may be attributed to differences among sites, as is illustrated for the case of CH 4 concentration versus discharge. The cluster of points in this plot (Fig. 14a) does not suggest an obvious linear relationship between concentration and discharge; however, resolving the data to the site level for sites with multiple observations reveals several significant trends (Fig. 14b). Among 57 sites with > 30 observations, 42 had significant relationships (P < 0.05) between concentration and discharge, and 30 of these 42 trends were negative.

Table 2. Summary statistics for methane (CH 4 ), carbon dioxide (CO 2 ), and nitrous oxide (N 2 O) concentrations and fluxes. The %BDL (below detection level) column reports the percent of all observations that are below detection limits (including values reported as zero) for concentration. See Table S2 for counts and Table S3 for statistical summaries for all other variables. Standard deviation (SD) and coefficient of variation (CV).

Median site concentrations for most categories of targeted channels (Fig. 15) differed from "normal" (NORM) sites (Kruskal-Wallis test χ 2 = 460.1, df = 12, P < 0.0001 after dropping channel types with < 10 observations to improve test reliability). Pairwise Wilcoxon comparisons adjusted to account for multiple comparisons (Benjamini and Hochberg, 1995) indicated that springs (SP) and delta channels (DC) were similar to NORM sites (P > 0.4), and impoundment-influenced (IMP) sites were marginally different (P = 0.053). Concentrations in channels at glacial termini (GT) and FP backwaters were lower (P < 0.0001), whereas all other site types had higher site-average CH 4 concentrations than NORM sites. Fluxes also varied among channel types (Kruskal-Wallis test χ 2 = 126.4, df = 8, P < 0.0001 after dropping channel types with < 10 observations), and similar to concentration, fluxes in DC and channelized sites (CH) were similar to NORM channels, while all other channel types considered had higher median fluxes.

Discussion
The rapid increase in availability of aquatic CH 4 (as well as CO 2 and N 2 O) data over the past 5-10 years has been remarkable and creates new opportunities for examining patterns and drivers of these gases in lotic ecosystems across broad spatial scales. Similarly, constructing GRiMeDB provided us with an unprecedented opportunity to identify tendencies in when, where, and how CH 4 has been sampled in streams and rivers. Examination of such data collection tendencies can reveal important biases and gaps within a field (Gómez-Gener et al., 2021b) and thus point to future research needs and opportunities. Below, we discuss the distribution of sampling efforts, methodological issues, and preliminary data analyses and consider questions that GRiMeDB can help to answer.

When and where: sampling effort considerations
The growth of greenhouse gas (GHG) studies in flowing water systems in the past decade includes a geographic expansion beyond the large body of historic and current work in temperate regions of North America and Europe. In particular, recent research in Africa, Australia, and especially Southeast Asia has greatly improved the global coverage of available data. However, studies in arid drainages remain scarce, even beyond what would be expected given their small river surface area. A possible explanation for the limited study of CH 4 in these systems may be the pervasive focus on the contribution of streams and rivers to the global atmospheric CH 4 pool and the corresponding assumption that aridland systems play a minor role in this context. However, we suggest that limited study in arid and semi-arid drainages represents a missed opportunity to understand metabolism and carbon cycling in a set of streams and rivers that drain nearly half of the global land surface, are increasingly stressed by growing human water demands (e.g., Sabo et al., 2010; Lian et al., 2021; Stringer et al., 2021), and support ecosystem process rates that are amplified by warm temperatures and highly variable flow regimes (Fisher et al., 1982; Ran et al., 2021). Beyond arid and semi-arid basins, further research emphasis in tropical and high-latitude regions would also be beneficial even given recent improvements in data availability and geographic representation of both areas. Existing data for tropical forests and grasslands are dominated by studies of African rivers (especially the Congo drainage) and the Amazon River system. In fact, observations from tropical areas of the Indomalayan and northern Australasian region represent < 3 % of all sites, and central America is represented by a single study. Tropical drainages are frequently characterized by high CH 4 concentrations and fluxes along with rapid changes in land use and river regulation that are affecting C cycling and GHG dynamics (Flecker et al., 2022). However, understanding or detecting the magnitude and consequences of these anthropogenic changes in fluvial CH 4 is constrained by these current sampling limitations. Finally, while high-latitude regions (north of the Arctic Circle) are well represented in GRiMeDB with > 3600 concentration observations, more than 80 % of these values are derived from studies in the vicinity of the Toolik Field Station in Alaska, USA, and thus do not capture the full biophysical diversity of Arctic biomes (Metcalfe et al., 2018). Given that climate change at high latitudes is progressing faster than elsewhere on the planet (IPCC, 2021) and that the global north stores massive quantities of C in soils (Hugelius et al., 2014), more extensive coverage of CH 4 across Arctic drainage systems is warranted.

Figure 10. Counts of methane (CH 4 ) flux observations by type (left) by major methodological categories for each pathway (middle) and for the method type used to estimate the gas exchange coefficient k (right). For clarity, the chamber category includes all chamber types and patterns of gas increase in the chamber unless specified; more resolved methodological data are presented in the GRiMeDB flux table. See Table A4 for further details about category definitions.
Although the spatial coverage of CH 4 data has improved markedly over the past decade, expansion across temporal dimensions has lagged. The predominant mode of sample collection has been and continues to be through surveys that yield one or a few observations from individual sites (e.g., Bouillon et al., 2012;Kuhn et al., 2017;Jin et al., 2018;Ho et al., 2022), and studies characterizing seasonal dynamics or responses to a site-specific environmental change are limited. Indeed, long-term (> 5-year) CH 4 data sets in general are extremely rare (Leng et al., 2021); no such data are currently available for fluxes, and most long-term concentration records are derived from just a few clustered locations. Determining the consequences of changes in land use or habitat attributes for fluvial CH 4 dynamics has instead relied on space-for-time substitutions (e.g., Smith et al., 2017;Woda et al., 2020) rather than on direct observations of change over time. Although this strategy has been successful in revealing variation in GHG dynamics among different site types, current knowledge about how gases vary over time and respond to perturbations is poorly developed because of these data limitations. This deficit may be particularly consequential in the case of climate change, as the broad scope of this phenomenon will inevitably limit the effectiveness of spatial sampling approaches.
The discussion above regarding the "when" and "where" of sampling emphasizes large spatial and relatively long temporal scales, consistent with the extent of GRiMeDB. However, another current deficit in our understanding relates to the degree of heterogeneity of this gas at fine spatial and temporal scales and thus whether current sampling strategies are missing meaningful variation. Recent studies of CO 2 provide a cautionary tale in this context, as failure to account for diurnal variation in this gas results in a consistent underestimation of fluvial emissions that is quantifiable at regional (Attermeyer et al., 2021) and global (Gómez-Gener et al., 2021b) scales. Similar questions may arise for spatial variation: that is, what is the minimum grain size or appropriate spatial scale for sampling CH 4 in running waters (Lupon et al., 2019)? Examining very short-term variation is not possible using GRiMeDB data because of our decision to average within-day measurements given the current small number (ca. 20) of these temporally detailed studies. Assessment of fine-scale spatial variation is also limited because of limited fine-scale sampling in general and by decisions made both by investigators and during database construction. For example, geomorphologically distinct units (e.g., an individual riffle or pool) are often used as a basic sampling unit, and results are presented as averages of replicates collected at different points within the study reach (e.g., Hlaváčová et al., 2006; Smith et al., 2017). In general, information about replication was frequently omitted, or if reported, information about variability among replicates was frequently absent. In addition to this limitation, our decision to combine replicates taken at different points in a channel cross section and within individual channel units that had hundreds to thousands of data points to avoid ambiguities for site delineation and data analysis also constrains the opportunity to examine variation at fine spatial scales. However, we anticipate that this situation will change over the next few years as in situ sensors or other devices capable of collecting high-frequency/high-density gas measurements become more widely available.

[Figure caption, truncated] Table S2. Dashed vertical lines in the concentration histograms indicate the 100 % saturation concentration based on the median estimated elevation (250 m) and water temperature (12.5 °C) for all sites and atmospheric concentrations of 1.83, 400, and 0.325 ppm for methane (CH 4 ), carbon dioxide (CO 2 ), and nitrous oxide (N 2 O), respectively.

[Figure caption, truncated] Table 1 but briefly are as follows: NORM: non-targeted sites; CAN: canals; CH: channelized streams; DC: river delta channels; DD: downstream of dams; DIT: ditches; FP: floodplain backwaters; GT: glacial outflows; IMP: impounded reaches; PI: permafrost (thermokarst) influenced; PS: point source influenced; SP: springs; TH: thermogenic CH 4 inputs; WS: wetland streams. The number of sites per channel type are listed on the right-hand side of each plot. The vertical black line denotes the median concentration and flux for NORM sites. Because a log scale is used in these plots, zeros and negative values were excluded. The actual median for non-targeted sites represented by the vertical line is therefore slightly different than the median displayed in the corresponding boxplot because of this exclusion. The upper and lower edges of each box are the 25th and 75th percentiles, whiskers are drawn up to 1.5 times the interquartile range, and points are plotted if beyond the whiskers.

How: methodological considerations
Measuring dissolved GHG concentrations or fluxes involves multiple steps and calculations. Field and laboratory protocols vary widely in the literature, and methodological variety is particularly conspicuous for flux determination. Ironically, even though many studies of lotic CH 4 dynamics are framed in terms of understanding the contribution of these ecosystems to the rapidly increasing atmospheric CH 4 pool, flux measurements lag far behind those of concentrations, and the vast majority (ca. 85 %) of observations are of the diffusive pathway alone. Further, the most common method for estimating this pathway involves combining dissolved CH 4 concentration with k, the gas exchange coefficient. Quantifying k is notoriously challenging (Hall and Ulseth, 2020), and the large number of approaches for calculating k used among data providers is concerning and undoubtedly introduces substantial uncertainty. A more in-depth consideration of the consequences of different models or strategies for arriving at a k value was beyond the scope of this paper, but inclusion of methodological information should be useful for such an analysis in the future.
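The concentration-plus-k calculation referred to above follows the relationship flux = k(C w − C eq ) given in the appendix category definitions. The sketch below is a minimal illustration under assumed units (k in m d −1 , concentrations in µmol L −1 ); the function name and all input values are hypothetical, and it does not reproduce any particular k model used by data providers.

```python
def diffusive_flux(k_m_per_day, c_water_umol_per_l, c_eq_umol_per_l):
    """Diffusive flux = k * (C_w - C_eq), returned in mmol m^-2 d^-1.

    1 umol L^-1 equals 1 mmol m^-3, so multiplying by k in m d^-1
    gives mmol m^-2 d^-1 without a further conversion factor.
    """
    return k_m_per_day * (c_water_umol_per_l - c_eq_umol_per_l)

# Hypothetical inputs (NOT GRiMeDB values): a supersaturated stream sample.
c_water = 1.5   # measured dissolved CH4, umol L^-1
c_eq = 0.003    # assumed equilibrium concentration, umol L^-1

# The same concentration yields very different fluxes under different k values,
# which is why the choice of k model matters.
for k in (0.5, 2.0, 8.0):
    print(f"k={k:>4} m/d -> flux={diffusive_flux(k, c_water, c_eq):.3f} mmol m^-2 d^-1")
```

Because flux scales linearly with k in this formulation, any relative error in k propagates directly into the flux estimate.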
Ebullition measurements are notably scarce despite the potential of this pathway to account for a large fraction of total emissions in some streams (e.g., from 30 % to 98 % of total CH 4 emissions, as shown in Baulch et al., 2011; Crawford et al., 2014; Chen et al., 2021). The conventional approach to quantifying ebullition involves a combination of capturing bubbles just below the water surface to determine the area and time-specific rate of bubble volume reaching the surface and measuring CH 4 content of recently erupted bubbles. The episodic nature and extreme spatial heterogeneity of ebullition (Robison et al., 2021) require multiple bubble trap replicates that need to be deployed over several days to generate reliable measurements. Given the logistic challenges and labor-intensive work involved, indirect approaches are becoming more common. These approaches typically use the difference between a chamber-based measurement of flux, which is assumed to represent total flux (diffusion + ebullition), and diffusion calculated from dissolved CH 4 and k (i.e., the "chamber - [concentration + k]" method in Fig. 10) to estimate ebullition (e.g., Campeau et al., 2014). We suggest that this approach should be used cautiously, however. First, this strategy is arguably inappropriate for situations in which the gas content within a chamber increases in a linear fashion during the measurement period, consistent with the occurrence of diffusive flux alone. Second, it is not clear whether it is reasonable to assume that chamber-based measurements capture both diffusion and ebullition, even if a chamber-based flux value is greater than that calculated from a dissolved CH 4 concentration. Relatively short chamber deployments are likely to miss or incompletely capture bubble releases, while long-term deployments are vulnerable to sampling artifacts associated with altered concentration gradients within and/or turbulence around the chamber (Lorke et al., 2015). Given these challenges, it is not altogether surprising that comparisons between direct and indirect measurements of ebullition can yield substantially different results (e.g., Yang et al., 2012).

The final and most profound knowledge gap in the collection of flux data is the absence of measurements of plant-mediated emissions. Plant-mediated fluxes can account for a substantial fraction of total emissions from wetlands and shallow lake habitats, but the contribution of this pathway is unknown in fluvial systems. Indeed, we did not include plant-mediated fluxes in GRiMeDB because we encountered only two papers that had explicitly quantified this pathway in streams. Although aquatic macrophytes are sparse or absent from many streams and rivers, they can be abundant in low-gradient, low-disturbance environments (Riis and Biggs, 2003; Gurnell et al., 2010), where diffusive fluxes would be constrained by low gas exchange rates. Sediment trapping and venting by macrophytes enhance both methanogenesis and methane emission in these systems, but the significance of such processes and the contribution of plant-mediated fluxes at larger spatial scales remain to be determined for fluvial systems.
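To make the indirect ebullition estimate discussed earlier in this section concrete, the sketch below computes ebullition as the residual between a chamber-based flux (assumed to represent total flux) and diffusion calculated from dissolved CH 4 and k. All numbers, units, and the function name are hypothetical; the example is meant only to show how sensitive the residual is to the assumed k.

```python
def indirect_ebullition(chamber_flux, k, c_water, c_eq):
    """Return (diffusive, ebullitive) fluxes in mmol m^-2 d^-1.

    chamber_flux is ASSUMED to capture total flux (diffusion + ebullition);
    k is in m d^-1 and concentrations are in umol L^-1 (= mmol m^-3).
    """
    diffusive = k * (c_water - c_eq)
    ebullitive = chamber_flux - diffusive  # residual; can come out negative
    return diffusive, ebullitive

# Hypothetical measurements (NOT GRiMeDB values).
chamber_flux = 5.0  # mmol m^-2 d^-1
c_water, c_eq = 1.5, 0.0

for k in (2.0, 3.0, 4.0):
    diffusive, ebullitive = indirect_ebullition(chamber_flux, k, c_water, c_eq)
    print(f"k={k}: diffusion={diffusive:.1f}, residual ebullition={ebullitive:.1f}")
```

In this made-up case, plausible changes in k move the residual from a large positive value to a physically meaningless negative one, which is consistent with the caution urged above.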

Concentration and flux patterns
Not surprisingly, the massive increase in data availability has led to differences in averages and measures of variability for CH 4 concentrations and fluxes compared to our previous efforts. Median values for all three CH 4 flux pathways in GRiMeDB are 1.2-2.2 times lower than those reported by Stanley et al. (2016) and those from Rosentreter et al. (2021). Conversely, measures of variability (SD, CV) in GRiMeDB are almost 3-fold greater than previous estimates, undoubtedly due to the far larger number of observations, the associated expansion of geographic scope and channel types, and the inclusion of temporally resolved data. It is not yet clear whether the sample sizes are sufficient to capture the true global-scale variability of fluvial concentrations and fluxes, and future database updates could be used to examine this possibility.
Despite the slight lowering of median values compared to previous estimates, supersaturated concentrations and positive fluxes are the norm for CH 4 as well as for CO 2 and N 2 O. However, it is likely that CH 4 concentrations and fluxes below detection limits (BDLs) are underreported, as is common with environmental data in general (Stow et al., 2018), so these latest estimates may still be slight overestimations of true population medians. Even given the modest number of zero or undetectable CH 4 concentrations in GRiMeDB (< 2.5 %), decisions about handling BDLs can have a small but detectable effect on the estimation of global averages. For example, if these observations are excluded, median CH 4 concentrations for all other observations increase from 1.49 to 1.51 µmol L −1 . If we keep all of these observations and assign them a value of zero (an unlikely scenario but used here to provide a lower limit for this example), then the overall median declines to 1.46 µmol L −1 . Although these differences are relatively small, they would likely be consequential for upscaling estimates. At a minimum, we urge GRiMeDB users to be aware of how these values are handled and encourage future researchers to determine and report detection limits and include samples that fall below these limits in their results.
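The effect of BDL handling on a median can be checked with a few lines of code. The sketch below compares two of the scenarios described above (excluding BDLs and setting them to zero) and adds a third, substituting half the detection limit, which is a common convention that we include as an assumption rather than something applied in GRiMeDB. The data and function name are hypothetical.

```python
from statistics import median

def median_scenarios(values, detection_limit):
    """Compare medians under different treatments of below-detection values.

    values: concentrations, with None marking a below-detection result.
    """
    detected = [v for v in values if v is not None]
    n_bdl = len(values) - len(detected)
    return {
        "bdl_excluded": median(detected),
        "bdl_as_zero": median(detected + [0.0] * n_bdl),
        # Half-detection-limit substitution: an assumed convention, not from the source.
        "bdl_as_half_dl": median(detected + [detection_limit / 2] * n_bdl),
    }

# Hypothetical concentrations in umol L^-1 (NOT GRiMeDB values).
values = [None, 0.2, 0.8, 1.5, 2.0, 5.0, None, 12.0]
result = median_scenarios(values, detection_limit=0.1)
for name, med in result.items():
    print(f"{name}: {med:.2f}")
```

With only a small fraction of BDLs, as in GRiMeDB, the shift in the median is much smaller than in this toy example, but the direction of the effect is the same.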
A goal of assembling GRiMeDB was to centralize CH 4 data to foster future research efforts. To this end, we included information about habitat conditions that allows the exploration of relationships between CH 4 and potential explanatory variables and covariates. To demonstrate this opportunity, we provided a limited number of examples of CH 4 versus variables that have been identified as potential predictors or drivers of CH 4 production, concentration, or flux, and these plots suggest both the presence and absence of relationships. For example, increasing CH 4 concentrations have been associated with low or decreasing dissolved oxygen and/or increasing organic carbon (e.g., Borges et al., 2018; Jin et al., 2018), and these relationships are recognizable for concentration but ambiguous for flux across the entirety of the GRiMeDB data set. Similarly, CH 4 production and emissions tend to be elevated in nutrient-rich (eutrophic) lakes (DelSontro et al., 2018) and polluted rivers (Ho et al., 2022), consistent with positive relationships between CH 4 flux and total nitrogen (TN) and total phosphorus (TP). However, nutrient enrichment in rivers often occurs concurrently with fine sediment and organic matter input; thus, it remains to be determined whether positive relationships in Fig. 13g and h are correlative or reflect a causal mechanism. Finally, increases in discharge have been linked to declines in gas concentration, likely due to source limitation (i.e., dilution) of terrestrial supply (Gómez-Gener et al., 2021a) and/or greater water turbulence, which increases gas exchange and thus reduces supersaturated CH 4 stocks (Kokic et al., 2018). This relationship was not obvious when all data were considered en masse but became more apparent when examining within-site dynamics.
In contrast to these three confirmatory examples, although latitude and channel size have also been identified as determinants of CH 4 concentrations or used to extrapolate site-specific gas measurements to larger (even global) scales (e.g., Bastviken et al., 2011;Rosentreter et al., 2021), evidence for such relationships is absent from our analysis. Further, even for the former examples that indicated relationships between CH 4 concentration and dissolved oxygen (DO), dissolved organic carbon (DOC), or discharge, there is substantial variability present in these relationships, the strength of these predictors is likely to vary across scales, and they explain little of the variability for diffusive fluxes. In short, substantial opportunities exist to identify multivariate relationships between different predictors and CH 4 concentrations and fluxes across different scales, and pursuit of these opportunities should be improved by the substantial increase in data for both gases and potential predictor variables.
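The contrast between pooled and within-site concentration-discharge relationships noted above can be illustrated with a small synthetic example. In the sketch below, two hypothetical sites each show a negative slope of log concentration against log discharge, yet the pooled regression is weakly positive because the sites differ in baseline concentration. The data, site names, and helper `ols_slope` are invented for illustration and are not drawn from GRiMeDB.

```python
def ols_slope(x, y):
    """Slope of an ordinary least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

# Hypothetical log10-transformed discharge (log_q) and CH4 concentration (log_c).
sites = {
    "site_A": {"log_q": [0.0, 1.0, 2.0], "log_c": [1.0, 0.5, 0.0]},
    "site_B": {"log_q": [2.0, 3.0, 4.0], "log_c": [2.0, 1.5, 1.0]},
}

# Within-site slopes: each site is fitted separately.
site_slopes = {name: ols_slope(d["log_q"], d["log_c"]) for name, d in sites.items()}

# Pooled slope: all observations lumped together, ignoring site identity.
pooled_q = [q for d in sites.values() for q in d["log_q"]]
pooled_c = [c for d in sites.values() for c in d["log_c"]]
pooled_slope = ols_slope(pooled_q, pooled_c)

print("within-site slopes:", site_slopes)
print("pooled slope:", round(pooled_slope, 3))
```

This is one reason resolving observations to individual sites, as in Fig. 14b, can reveal trends that a pooled scatterplot obscures.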
The disproportionate contribution of streams and rivers to atmospheric inputs and the utility of CH 4 as an indicator of anthropogenic influences on drainage systems have inspired several studies that focus on fluvial habitats that are expected to have high concentrations and fluxes. Many of these "methane-hunting" studies have demonstrated significant increases in CH 4 concentrations and/or fluxes associated with phenomena such as point source inputs (Alshboul et al., 2016), ditch and canal construction, oil and gas extraction, or passage through wetlands (Taillardat et al., 2022). Such signals persist at the global scale (Fig. 15), highlighting widespread human enhancement of CH 4 emissions from lotic ecosystems. Not all targeted sites are CH 4 -rich, however. Low concentrations in GT likely reflect the effects of cold temperatures and/or low organic carbon availability (Crawford et al., 2015; Burns et al., 2018), while low values at FP sites may be attributable to their more characteristically lentic conditions, which favor higher rates of CH 4 oxidation in the water column. Indeed, oxidation has been shown to represent a significant CH 4 sink in floodplain lakes associated with the Amazon River (Barbosa et al., 2018), and most of the FP sites in GRiMeDB are part of the Amazon system.
As noted in Sect. 3.4, the availability of supporting information is inconsistent; for example, only ∼ 25 % of data sources provided data on channel order or basin size. However, open-access regional and global geospatial data sets that provide information about site characteristics (e.g., Linke et al., 2019; Yang et al., 2020) have increased rapidly in the past decade, to the benefit of analyses seeking to link landscape attributes to CH 4 distribution among sites. Recent upscaling analyses (Rosentreter et al., 2021; Liu et al., 2022) have, for example, capitalized on improved estimates of the surface area of world streams and rivers (Allen and Pavelsky, 2018; Yang et al., 2020), while the diverse data sets in HydroSHEDS (Linke et al., 2019) allowed us to evaluate the global representativeness of GRiMeDB sites. As new global-scale data sets become available and become more spatially resolved, we anticipate that their pairing with GRiMeDB data will result in significant improvements in the strength and certainty of data-assimilation models, regional- to global-scale analyses of CH 4 distribution and drivers, and quantification of fluvial emissions into the atmosphere.

Code and data availability
GRiMeDB and its associated metadata are available from the Environmental Data Initiative: https://doi.org/10.6073/pasta/f48cdb77282598052349e969920356ef.

Conclusion
The data gathered in GRiMeDB highlight many new opportunities, both through analysis of CH 4 and supporting data in the database and by revealing gaps that currently exist across fluvial CH 4 studies. The most conspicuous data limitations include deficits in measurements of non-diffusive flux pathways and underrepresentation of sites from arid, tropical, and arctic biomes. Challenges associated with quantifying ebullition discussed earlier also emphasize the need for more intercomparisons among flux methodologies. Regardless of pathway, flux is a difficult process to quantify and can be highly sensitive to methods or gas exchange model choices, yet there are few comparisons (such as Raymond et al., 2012; Lorke et al., 2015) available to inform these decisions. Finally, we highlight that the expansion of GHG data over the past decade has proceeded largely across spatial rather than temporal dimensions. While this expansion has vastly improved the geographic representativeness of the data, long-term data sets are rare despite their power for generating ecological understanding and informing policy/management in the face of environmental change. GHGs are rarely included as routine components of water quality monitoring programs. Thus, we emphasize the compelling need to establish such sampling efforts and perpetuate those few that do exist.
Given the rapid growth in both research interest and data in fluvial GHG dynamics, we imagine future updates and expansion of GRiMeDB, and we welcome data sets and associated research products (e.g., theses, journal publications, reports). To facilitate the data acquisition and updating process, a downloadable spreadsheet template and detailed information about its use and submission are available at https://stanley.limnology.wisc.edu/GRiMe (last access: 12 June 2023). Regardless of database updates, we recommend that the minimum basic information to collect along with GHG data that would be most valuable for later analyses include well-resolved site location data (latitude and longitude), information about site size (Strahler order and/or basin size at the sampling site), disturbance or modification relevant to GHGs (e.g., categories listed in Table 1), specific sample dates and times, discharge, dissolved oxygen, and temperature at the time of sample collection, and clear information about units and method(s) used to measure gas flux. Finally, we strongly encourage data package (sensu Gries et al., 2022) publication in a trustworthy public data repository such as the Environmental Data Initiative that requires metadata to meet findability, accessibility, interoperability, and reusability (FAIR) data principles and thereby increases data findability, accessibility, and reuse (Wilkinson et al., 2016).
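One way a data provider might check a record against the minimum information recommended above is sketched below. The class and field names are hypothetical and do not correspond to GRiMeDB column names or to the submission template; the example only encodes the listed items and reports which are missing, treating Strahler order and basin size as alternatives.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional

@dataclass
class GasObservation:
    """Hypothetical record holding the recommended minimum supporting data."""
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    strahler_order: Optional[int] = None
    basin_area_km2: Optional[float] = None
    channel_type: Optional[str] = None          # e.g., a Table 1 category
    sampled_at: Optional[datetime] = None       # specific date and time
    discharge_m3_s: Optional[float] = None
    dissolved_oxygen_mg_l: Optional[float] = None
    water_temp_c: Optional[float] = None
    flux_units: Optional[str] = None
    flux_method: Optional[str] = None

    def missing_items(self):
        """List recommended items that have not been provided."""
        size_fields = ("strahler_order", "basin_area_km2")
        missing = [f.name for f in fields(self)
                   if f.name not in size_fields and getattr(self, f.name) is None]
        # Site size is satisfied by either stream order or basin area.
        if self.strahler_order is None and self.basin_area_km2 is None:
            missing.append("strahler_order or basin_area_km2")
        return missing

# Hypothetical, partially documented observation.
obs = GasObservation(latitude=43.07, longitude=-89.40, strahler_order=3,
                     sampled_at=datetime(2021, 7, 14, 10, 30),
                     water_temp_c=18.2, flux_units="mmol m-2 d-1")
print(obs.missing_items())
```

A check of this kind could be run before submission so that gaps in supporting data are identified while the information is still recoverable.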
Despite highlighting areas of data limitation, it is important to underscore the opportunities that the growth in GHG data availability, especially of CH 4 data, now provides. Assembly of GRiMeDB was motivated by the goal of having a centralized, standardized resource to facilitate further studies of CH 4 pattern and process in flowing water systems. Our strategy in developing this database was to maximize opportunities for identifying patterns and relationships involving this gas in future analyses. Past difficulties with such efforts may well be a product of the common practice of averaging values over time or among sites and/or of including non-fluvial sites in analyses. Thus, we carefully documented the data and resolved observations to individual sites and dates whenever possible to match the pronounced spatial and temporal variance of this gas. Similarly, while we included a range of habitat types in GRiMeDB, unconventional or targeted sites are easily identifiable. Further, we carefully examined sites to ensure that they were not situated within reservoirs/impoundments or estuaries where distinct processes such as methane oxidation, tidal cycles, or elevated sulfate reduction may obscure or overtake relationships present in inland-flowing water systems. Thus, we are optimistic that analysis of GRiMeDB data by themselves or in concert with other complementary data sets will provide new and unprecedented opportunities to examine relationships between CH 4 and environmental drivers or correlates and provide broad contextual information for site-based studies of fluvial carbon and GHG dynamics.

Identity of the outlet for the data (e.g., journal, data repository, or agency that presented the data). For titles with published papers paired with published data sets, the journal is listed in this column.

Pub_year
Year of publication, data release, or acquisition of an unpublished data set

Source_ID
Unique data source identifier

Additional_data
"Yes" in this column indicates that additional data were acquired directly from the author for any field. Additions are described in the "Comments" field.

Comments
Additional information or clarification about the data source

Paper_DOI
DOI or hyperlink for journal article or other publication based on the CH 4 data

Data_DOI_primary
DOI or hyperlink for CH 4 data posted in a data repository

Data_DOI_supporting
DOI or hyperlink for separate data sets providing supporting data

Comments
Additional information or clarification about the site source

WaterTemp_degC_estimated
Estimated water temperature (°C). This field was populated only for cases in which temperature was needed for gas unit conversion. Most estimates were based on temperatures from adjacent sites, averaging temperatures from preceding and subsequent sample dates, or an adjacent day of the year but from another year.

Cond_uScm
Specific conductance (µS cm −1 )

Methodological category used to measure diffusive gas flux. Categories (with brief explanations) are the following.
- chamber (unspecified) - unspecified response: use of an unspecified type of chamber (suspended, tethered, or free-floating), with the pattern of change in gas concentration over time during flux measurements also not specified
- chamber (unspecified) - linear response: unspecified type of chamber with a linear increase in chamber gas concentration over time or use of a linear model to calculate flux
- suspended/tethered chamber - unspecified response: chamber restrained to maintain its position and not float downstream during flux measurement
- suspended/tethered chamber - linear response
- floating chamber - unspecified response: chamber unrestrained and able to float downstream during flux measurement
- floating chamber - linear response
- conc+k: diffusive flux calculated using the equation flux = k(C w − C eq ), where k = gas exchange coefficient, C w = CH 4 concentration measured in water, and C eq = CH 4 concentration in water in equilibrium with the atmosphere
- other: methods other than those described above

Methodological category used to measure total CH 4 flux. Categories (with brief explanations) are the following.
- conc+k and ebullition: total flux calculated as the sum of separate measurements of diffusion determined by the conc + k method plus ebullition determined from the bubble trap or echosounder approach combined with bubble CH 4 analysis
- floating chamber: free-floating chamber assumed to capture diffusive flux and ebullitive flux (if present)
- suspended/tethered chamber: suspended or tethered chamber assumed to capture diffusive flux and ebullitive flux (if present)
- chamber and ebullition: total flux calculated as the sum of separate measurements of diffusion determined using a floating or suspended/tethered chamber plus ebullition determined from the bubble trap or echosounder approach combined with bubble CH 4 analysis
- mass balance: total flux represents the difference between all measured inputs to a reach (e.g., dissolved CH 4 from upstream flow, groundwater discharge, and methanogenesis) minus all outputs other than efflux to the atmosphere (e.g., downstream export, methane oxidation)
- other: methods other than those described above

k_Method
Methodological category used for estimating the gas exchange coefficient, k; categories (with brief explanations) are the following.
Author contributions. EHS conceived of the project idea and led data entry, manuscript preparation, and data curation. LCL developed the code used for unit conversions, was responsible for data conversion and QA/QC, and contributed to data visualization, data analysis, and code curation. GRR was responsible for spatial analyses and contributed to data visualization, code curation, and manuscript preparation. The structure and composition of the paper resulted from collaborative discussions among EHS, GRR, LCL, NJC, SKO, and RAS. All the authors contributed to data acquisition, data entry, data checking, and substantial manuscript revision and editing. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.