Decomposability of soil organic matter over time: the Soil Incubation Database (SIDb, version 1.0) and guidance for incubation procedures

The magnitude of carbon (C) loss to the atmosphere via microbial decomposition is a function of the amount of C stored in soils, the quality of the organic matter, and physical, chemical, and biological factors that comprise the environment for decomposition. The decomposability of C is commonly assessed by laboratory soil incubation studies that measure greenhouse gases mineralized from soils under controlled conditions. Here, we introduce the Soil Incubation Database (SIDb) version 1.0, a compilation of time series data from incubations, structured into a new, publicly available, open-access database of C flux (carbon dioxide, CO2, or methane, CH4). In addition, the SIDb project also provides a platform for the development of tools for reading and analysis of incubation data as well as documentation for future use and development. In addition to introducing SIDb, we provide reporting guidance for database entry and the required variables that incubation studies need at minimum to be included in SIDb. A key application of this synthesis effort is to better characterize soil C processes in Earth system models, which will in turn reduce our uncertainty in predicting the response of soil C decomposition to a changing climate. We demonstrate a framework to fit curves to a number of incubation studies from diverse ecosystems, depths, and organic matter content using a built-in model development module that integrates SIDb with the existing SoilR package to estimate soil C pools from time series data. The database will help bridge the gap between point location measurements, which are commonly used in incubation studies, and global remote-sensed data or data products derived from models aimed at assessing global-scale rates of decomposition and C turnover.
The SIDb version 1.0 is archived and publicly available at https://doi.org/10.5281/zenodo.3871263 (Sierra et al., 2020), and the database is managed under a version-controlled system and centrally stored in GitHub (https://github.com/SoilBGC-Datashare/sidb, last access: 26 June 2020). Published by Copernicus Publications.


Introduction
Temperature, soil moisture, soil type, plant-microbe interactions, microbial community compositions, physical protection of organic matter (e.g., sorption on minerals and aggregation), and physical disconnection of microbes/enzymes and their substrates all control microbial decomposition processes and fluxes of greenhouse gases to the atmosphere (Conant et al., 2011; Schmidt et al., 2011). The relative importance of all these factors in controlling decomposition processes is poorly quantified but is important to understand as warming temperatures shift rates of microbial processes, potentially increasing releases of soil-stored carbon (C) to the atmosphere (Davidson and Janssens, 2006).
Research synthesis (e.g., meta-analysis) has become an increasingly important tool in science to overcome site-specific results, identify universal patterns across ecosystems and at global scales, and assess what is known and what needs further research (Gurevitch et al., 2018; Gurevitch and Hedges, 1999; Hillebrand and Gurevitch, 2013; Osenberg et al., 1999). Numerous reviews, syntheses, and meta-analyses have been performed using laboratory incubation studies (e.g., Conant et al., 2011; Hamdi et al., 2013; Schädel et al., 2014, 2016; Treat et al., 2015) to answer questions about the relative decomposability or stability of soil organic matter, the temperature response of soil respiration, and the ratio of CO2 : CH4 production in anaerobic incubations. New experiments are continuously contributing to the growing body of soil incubation literature. While individual soil incubation studies are performed to answer specific research questions that may not require measuring a large variety of variables, the more details that are provided and the more comprehensive the metadata are, the greater the utility of an individual study beyond its original use (Hillebrand and Gurevitch, 2013). Metadata help to characterize these datasets, enable identification of data through relevant criteria, and provide the information needed for data archiving (Hillebrand and Gurevitch, 2013; Jiang et al., 2015), making individual incubation studies as useful as possible.
Here, we report on the development and compilation of a subset of available incubation data into a new, publicly available Soil Incubation Database (SIDb). In addition to introducing SIDb, we provide clear reporting guidance for database entry and the required variables that incubation studies need at minimum to be included in SIDb. Further, we provide guidance and associated recommendations to help inform best practices for conducting consistent, comparable soil incubation studies while retaining the adaptability required for individual research groups and projects.
A key application of this synthesis effort is to better characterize soil C processes in Earth system models, which will in turn reduce our uncertainty in predicting the response of soil C decomposition to a changing climate. Soil C decomposition is traditionally represented by a simple first-order decay function (Jenkinson, 1990) in C cycle models assuming one or more conceptual C pools (Davidson and Janssens, 2006; Parton et al., 1987; Trumbore, 1997) with fast and slower rates of C turnover. The models are described by several parameters such as the decay rate of each pool, as well as the transfer rates among pools. These parameters can be utilized to predict the evolution of CO2 one would observe in an incubation over time. Incubation time series data could therefore be used to constrain the parameters of these models by solving the corresponding inverse problem.
We demonstrate a framework to fit such curves to a number of incubation studies from diverse ecosystems, depths, and organic matter content using a built-in model development module that integrates SIDb with the existing SoilR package (Sierra et al., 2012) to estimate soil C pools from time series data. This allows users to test different model structures against their data, representing a benefit of contributing data to SIDb. We hope the database will help bridge the gap between localized measurements, which are commonly used in incubation studies, and global remote-sensed data or data products derived from models aimed at assessing global-scale rates of decomposition and C turnover (Carvalhais et al., 2014; Koven et al., 2017). This work also complements other compilations of soil-C-related datasets such as the International Soil Carbon Network (https://iscn.fluxdata.org/, last access: 26 June 2020); the open-source Continuous Soil Respiration database, COSORE (https://github.com/bpbond/cosore, last access: 26 June 2020); the Global Database of Soil Respiration Data, Version 4.0 (Bond-Lamberty and Thomson, 2018); and the International Soil Radiocarbon Database (ISRaD, http://soilradiocarbon.org/, last access: 26 June 2020; Lawrence et al., 2020).

Laboratory incubations as a tool to assess soil C decomposability
Laboratory soil incubation studies are a commonly used method to estimate the decomposability of soil organic matter by measuring greenhouse gas release as C is mineralized from soils under controlled conditions. Results from incubation studies can inform global models about C pool sizes and rates of soil organic matter processing (mostly derived from long-term incubations) and sensitivities of process rates with respect to changes in abiotic factors such as soil temperature, moisture, and pH. Incubation durations may vary from less than 1 d to many years. Short-term incubations (a few days to a few months) provide information on how much C is readily decomposable and may be closer to the initial conditions experienced within the soil profile. Long-term incubations (months to years) may diverge from the conditions found within the profile but can give insights into the potential decomposability of slower cycling C (e.g., Schädel et al., 2014). At the beginning of laboratory incubations, respiration of fast-cycling C dominates total C respired, but it declines rapidly, whereas slow-cycling C accounts for most of the C being respired after the fast C pool is mostly depleted (Fig. 1). In this respect, laboratory incubations serve as a method to biologically partition soil C into different kinetic pools using the microbes themselves as the main partitioning agent. The time series produced is often well approximated by a sum of exponential functions, which are the solution of systems of first-order linear differential equations with constant coefficients (Metzler and Sierra, 2018). Fitting data from incubations to these types of functions has been done for individual site-level studies (e.g., Schädel et al., 2013, 2014; Sierra et al., 2017). Like all methods, incubations have their advantages and disadvantages.
[Figure 1 caption: Fast-cycling C dominates total CO2-C flux at the beginning of the incubation and is later replaced by slower-cycling C pools.]
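The shift from fast- to slow-cycling C described above can be illustrated with a minimal Python sketch of two independent (parallel) first-order pools, whose summed flux is a sum of two exponentials. The pool sizes and decay rates are hypothetical values chosen only for illustration; they are not estimates from SIDb, and the function names are our own.

```python
import math

def two_pool_flux(t, c_fast, c_slow, k_fast, k_slow):
    """Return the (fast, slow) respiration fluxes at time t (days)
    for two independent first-order pools."""
    fast = k_fast * c_fast * math.exp(-k_fast * t)
    slow = k_slow * c_slow * math.exp(-k_slow * t)
    return fast, slow

# Hypothetical parameters (assumptions, not fitted values):
# a small fast pool with a high decay rate and a large slow pool.
C_FAST, C_SLOW = 5.0, 95.0      # pool sizes, arbitrary C units
K_FAST, K_SLOW = 0.1, 0.0005    # decay rates, per day

def fast_fraction(t):
    """Fraction of the total flux at time t that comes from the fast pool."""
    fast, slow = two_pool_flux(t, C_FAST, C_SLOW, K_FAST, K_SLOW)
    return fast / (fast + slow)

for day in (0, 30, 100, 365):
    print(day, round(fast_fraction(day), 3))
```

With these assumed values the fast pool supplies most of the flux on day 0 and a negligible share after one year, which is the qualitative pattern the text describes.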
Many laboratory methods exist for splitting soil C into pools of various purported stabilities (e.g., density fractionation, Sollins et al., 2006; sequential extraction, Heckman et al., 2018; and thermal analysis, Barré et al., 2016), but incubations are the only biological assay for testing soil C stability, an ultimately biological process. Carbon stability is a measure of how resistant and inaccessible organic molecules are to microbial decay.
Another distinct advantage of incubations is the high level of control they allow, compared to field methods. For example, incubations that test the temperature sensitivity of C flux (e.g., Conant et al., 2008) offer a greater level of control compared to field measurements in several ways. First, in situ soil respiration is a mixture of both heterotrophic microbial respiration and autotrophic root respiration; soil incubations isolate the heterotrophic flux. Second, in situ temperatures change daily and seasonally, thereby confounding any direct effects of temperature with the phenology of C inputs such as root exudates and litter fall. At many locations, such as those under Mediterranean climate regimes, temperature is highly correlated with soil moisture so that the effects of one are impossible to disentangle from the other (Sierra et al., 2015; Subke and Bahn, 2010). With incubations, temperature and moisture effects can be tested both in isolation and with interactions. Incubations are a tractable and accessible method that can be run with minimal equipment (scale, gas-tight jars, and a CO2 analyzer). Much of the utility of incubations lies in their simplicity. Lastly, as described above, the time series data collected by most incubations can be connected to soil C models (Sierra et al., 2012, 2014). The main shortcoming of incubations is their isolation from the soil ecosystem. Incubations lack new inputs, which could otherwise prime the decomposition of the existing soil C pool (Huo et al., 2017). However, the lack of inputs simplifies the system and allows a focus on decay processes. Substrates can be added to incubations to measure the decomposability of specific compounds or materials (particularly if they are isotopically labeled), or to measure the priming effect under experimentally controlled conditions, a common extension of incubation methods (e.g., Finley et al., 2018; Pegoraro et al., 2019).
Additionally, the microbial community in incubations may not reflect in situ communities. For example, constant environmental conditions in incubations may reduce the available niches and potentially result in a decline of microbial diversity, an effect that has yet to be tested. The lack of inputs can also induce changes in the microbial community as more oligotrophic microbes are favored over time. Lastly, soils used in incubations are always disturbed to varying degrees during removal from the field and often further in the laboratory: during sieving or root-picking procedures, or through rewetting prior to the start of the incubation. For example, at the time of publication, half of the studies in our database report sieving prior to incubation, while a third do not report pre-incubation procedures. This disturbance may increase the susceptibility of occluded soil C to decay via disruption of aggregates, potentially overestimating the amount of C released during incubations relative to field conditions (Salomé et al., 2010). In general, the experimental control of incubations allows for most of these criticisms to be explicitly tested and accounted for as needed, and overall, the advantages of incubations far outweigh their drawbacks when the goal is understanding C pool structure, C stability, and C sensitivity to drivers such as temperature and moisture.

The Soil Incubation Database (SIDb)
The Soil Incubation Database (SIDb) version 1.0 is an open-source software project that provides open access to data and is a platform for the development of tools for reading and analysis of data as well as documentation for future use and development. The data are freely available at https://doi.org/10.5281/zenodo.3871263, and the database is managed under a version-controlled system and centrally stored in GitHub (https://github.com/SoilBGC-Datashare/sidb, last access: 26 June 2020).

The repository
The structure of the SIDb project contains three main folders: data, docs, and Rpkg, which provide access to the database, the website (https://soilbgc-datashare.github.io/sidb/, last access: 26 June 2020), and the R package. The tree structure of the essential repository components is as follows

The database
The open-source approach to SIDb allows data access, manipulation, analysis, and contribution to be accomplished without proprietary software. The soil incubation data are stored in the data folder. Each entry in the database consists of a folder containing three files. The folder follows the naming convention AuthornameYEAR (optionally with journal name abbreviation appended), with the suffix "a" or "b" if multiple entries for one author and year exist. (1) The metadata.yaml file contains the following required sections: citation and curator information, basic site information (siteInfo), experimental setup of incubation (incubationInfo), and the metadata for the variable in the time series data (variables). The structure of the metadata file allows for flexible inclusion of many types of experimental and incubation designs. (2) The initConditions.csv file includes site, treatment, and initial soil characteristics (C content, texture conditions, etc.; Table 1).
(3) The timeSeries.csv file contains measurements made over the course of the incubation. Column headers in the timeSeries.csv file are required to match the values entered for variable names in the variables section of the metadata.yaml file (e.g., V1:name, V2:name). The Readme.md file in the data folder provides a detailed explanation of how to add entries to the data folder. Note that for entries to be ingested in SIDb they must pass certain QA-QC tests (described in detail in Sect. 3.2.4 in the R package).
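The entry layout described above can be checked mechanically. The following Python sketch is only an illustration and is not part of the SIDb QA-QC tests: the function name is hypothetical, and the regular expression is our own reading of the AuthornameYEAR convention (letters, a four-digit year, then optional letters for a journal abbreviation or an "a"/"b" suffix).

```python
import re
import tempfile
from pathlib import Path

# The three files each entry folder is expected to contain.
REQUIRED_FILES = ("metadata.yaml", "initConditions.csv", "timeSeries.csv")

# Assumed pattern for AuthornameYEAR with optional trailing letters.
ENTRY_NAME = re.compile(r"^[A-Za-z]+\d{4}[A-Za-z]*$")

def check_entry_folder(folder):
    """Return a list of human-readable problems found in one entry folder."""
    folder = Path(folder)
    problems = []
    if not ENTRY_NAME.match(folder.name):
        problems.append(f"folder name {folder.name!r} does not look like AuthornameYEAR")
    for name in REQUIRED_FILES:
        if not (folder / name).is_file():
            problems.append(f"missing file: {name}")
    return problems

# Demonstration on a temporary folder that lacks timeSeries.csv.
with tempfile.TemporaryDirectory() as tmp:
    entry = Path(tmp) / "Crow2019a"
    entry.mkdir()
    for name in REQUIRED_FILES[:2]:
        (entry / name).write_text("")
    demo_problems = check_entry_folder(entry)

print(demo_problems)
```

A check of this kind only confirms that the files are present; the content of each file still has to satisfy the rules described in the following sections.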

The metadata file
The metadata file is a simple text file that includes all relevant information about the incubation study. The .yaml format is both human and machine readable. YAML (YAML Ain't Markup Language) files are text files that utilize indent hierarchy to store information in an iterable and queryable format. Thus, data stored under main headings may contain subcategories and arrays of information. In an array, each line is started with a hyphen, followed by a space, then the data. A heading of any level must end with a colon, followed by a line break. The metadata.yaml file contains four sections. The first section consists of bibliographical data about the database entry, including DOI and contact information (Fig. 2). The second section, siteInfo, includes geographic data, land cover, vegetation, and soil data (Fig. 3). The third section, incubationInfo, provides data on laboratory experimental setup and sample treatment (Fig. 4). The fourth section, variables, contains metadata for the individual columns of the timeSeries.csv file (Fig. 5).
One advantage of the .yaml format is the ease with which specific types of data can be grouped in a hierarchical array. For example, in Fig. 3 site is a subfield of siteInfo, and latitude is a subfield of coordinates. More subfields can be added to the siteInfo subfield as necessary; however, adding a secondary subfield beneath existing subfields should be avoided in SIDb as consistent data structure is required for data aggregation. For example, in the siteInfo section, the variables coordinates, country, MAT, MAP, landCover, vegNotes, and soilTaxonomy must all have the same length as the site array (Fig. 3).
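The length requirement on the siteInfo fields can be sketched in Python as follows. The dictionary stands in for a parsed siteInfo section; its exact layout, the site names, and all values are hypothetical, and the function is our own illustration rather than a SIDb test.

```python
# Fields that, according to the text, must match the length of the site array.
PER_SITE_FIELDS = ("coordinates", "country", "MAT", "MAP",
                   "landCover", "vegNotes", "soilTaxonomy")

def mismatched_fields(site_info, fields=PER_SITE_FIELDS):
    """Return the fields that are missing, not arrays, or of the wrong length."""
    n_sites = len(site_info["site"])
    bad = []
    for field in fields:
        values = site_info.get(field)
        if not isinstance(values, list) or len(values) != n_sites:
            bad.append(field)
    return bad

# Hypothetical parsed siteInfo with two sites; MAP is deliberately too short.
site_info = {
    "site": ["siteA", "siteB"],
    "coordinates": [{"latitude": 10.0, "longitude": 20.0},
                    {"latitude": 11.0, "longitude": 21.0}],
    "country": ["X", "X"],
    "MAT": [15.0, 14.0],
    "MAP": [1200.0],
    "landCover": ["forest", "grassland"],
    "vegNotes": ["", ""],
    "soilTaxonomy": ["unknown", "unknown"],
}

print(mismatched_fields(site_info))
```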
In Fig. 4, the incubationInfo field has a subfield with a description on how the incubations were carried out. This is important information for documenting the experimental conditions under which the incubations were conducted. However, specific treatments and experimental conditions (temperature, moisture, etc.) should be explicitly entered under the appropriate corresponding subfields (Fig. 4).
The last fields that must be filled in are in the variables section (Fig. 5). This section consists of, in sequential order, subsections containing the metadata that correspond to the respiration time series observations (columns) of the timeSeries.csv file. The number of variables (V1-Vn) must therefore correspond to the number of columns in the timeSeries.csv file. The first column in the timeSeries file must be a vector of time (in days or other consistent unit), and thus the first variable name (V1:name) in the variables section must also be "time". Experimental and incubation treatments listed in the incubationInfo section must be specified under each variable (V2, V3, etc.). Note that if a treatment has only one level it will be reported in the incubationInfo section and does not need to be repeated in the variables section. For example, if all incubations were conducted at the same temperature, the incubation temperature would be reported under the temperature subheading in the incubationInfo section, and the information will be automatically propagated to all of the variables (example of Crow2019a in the database). However, if a treatment has multiple levels, e.g., an incubation study utilizing three temperatures, the temperature subheading under incubationInfo would be left blank, and the temperature level would need to be specified for each variable in the variables section in a subheading called "temperature" (example of Bracho2018SBB in the database).
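The rules for the variables section can be summarized in a short Python sketch. It assumes the metadata have already been parsed into plain dictionaries keyed V1, V2, and so on; the function names, the treatment list, and the example values are hypothetical, and the propagation step only mimics the behaviour described in the text rather than reproducing the package's code.

```python
def variable_names(variables):
    """Return variable names ordered by their V1..Vn keys."""
    keys = sorted(variables, key=lambda key: int(key[1:]))
    return [variables[key]["name"] for key in keys]

def check_variables(variables, header):
    """Compare the variables section with the timeSeries.csv column headers."""
    errors = []
    names = variable_names(variables)
    if len(names) != len(header):
        errors.append("number of variables differs from number of columns")
    if not names or names[0] != "time":
        errors.append("first variable must be named 'time'")
    if names != list(header):
        errors.append("variable names do not match column headers")
    return errors

def propagate_treatments(incubation_info, variables,
                         treatments=("temperature", "moisture")):
    """Copy single-level treatments from incubationInfo to every data variable."""
    filled = {}
    for key, meta in variables.items():
        meta = dict(meta)  # copy so the input is left unchanged
        if meta["name"] != "time":
            for treatment in treatments:
                shared = incubation_info.get(treatment)
                if treatment not in meta and shared is not None:
                    meta[treatment] = shared
        filled[key] = meta
    return filled

# One temperature for all incubations, two moisture levels set per variable.
incubation_info = {"temperature": 20, "moisture": None}
variables = {
    "V1": {"name": "time"},
    "V2": {"name": "c1", "moisture": 40},
    "V3": {"name": "c2", "moisture": 60},
}
header = ["time", "c1", "c2"]

errors = check_variables(variables, header)
filled = propagate_treatments(incubation_info, variables)
print(errors)
print(filled["V2"])
```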

Data entries
The timeSeries.csv file for each entry in the database contains the time series of incubation data in comma-separated format. The first column of the data file must contain the times at which gas measurements were taken. Subsequent columns must contain the respiration measurements. The format of the data (e.g., units) is not prescribed, as long as the relevant information to identify each respiration column is described in the variables field of the metadata file.
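As an illustration of this layout, the Python sketch below reads a small comma-separated table whose first column is time and whose remaining columns are respiration series. The sample values are synthetic, and treating empty cells or "NA" as missing is our assumption, not a documented SIDb rule.

```python
import csv
import io

# Synthetic example in the timeSeries.csv layout (not data from SIDb).
SAMPLE = """time,c1,c2
0,0.50,0.40
7,0.31,0.27
30,0.12,NA
"""

def read_time_series(text):
    """Return (name of the time column, dict of column name -> list of values)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = {name: [] for name in header}
    for row in body:
        for name, cell in zip(header, row):
            # Assumption: empty cells and "NA" mark missing measurements.
            columns[name].append(None if cell in ("", "NA") else float(cell))
    return header[0], columns

time_name, columns = read_time_series(SAMPLE)
print(time_name, columns["c1"])
```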

The website
Documentation of the project, which includes the database and the R package, is presented on the project's website (https://soilbgc-datashare.github.io/sidb/, last access: 26 June 2020). The website is publicly served by GitHub Pages. Every time new changes are pushed to the SIDb repository, the website is rebuilt and served automatically by GitHub.

The R package
Data in SIDb are stored in a format that can be read in any programming language. We provide an R package to allow users to compile or read the database into R and a platform to facilitate future analyses. To install the package, open R and run
    install.packages("devtools")
    devtools::install_github('SoilBGC-Datashare/sidb/Rpkg/', build_vignettes=TRUE)
Once the R package 'sidb' is installed and loaded, a browser-based html version of the available vignettes can be accessed using
    browseVignettes('sidb')
There are currently two vignettes available: "sidb-QueryReportPlot" and "Fitting data to models". The first vignette describes a simple workflow for querying, generating reports, and plotting data with SIDb. The second vignette demonstrates the model fitting functions built into the R package "sidb".
In the sidb R package two main functions are provided: loadEntries.R and readEntry.R. As their names suggest, loadEntries.R collects all metadata and data from all entries and produces an "R list" with the entire database. The function readEntry.R reads individual entries from the database and also produces an "R list". The package also provides a function that "flattens" and coerces the database list object into a simpler data structure for easier querying (flatterSIDb.R), as well as stand-alone functions to query the entire database in its native list format for specific variables. For instance, the function coordinates.R extracts all latitudes and longitudes for each entry in the database. Similarly, other functions are provided to extract C and nitrogen (N) content, or the incubation duration of each entry.
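The idea of "flattening" a nested list can be sketched generically in Python. This is not the implementation of flatterSIDb.R; it only shows how a nested record might be turned into a single level of dotted keys that is easier to query. The entry name and all values below are hypothetical.

```python
def flatten(nested, prefix=""):
    """Flatten a nested dict into one level, joining keys with dots."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

# Hypothetical nested entry (illustrative structure and values only).
entry = {
    "citationKey": "Example2020",
    "siteInfo": {"coordinates": {"latitude": 10.0, "longitude": 20.0}},
    "incubationInfo": {"temperature": 20},
}

flat = flatten(entry)
print(sorted(flat))
```

Once every entry is flattened this way, selecting all records with a given incubation temperature reduces to a simple lookup on one key.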
Quality control is provided for code testing and data validation. A brief overview is given here and more details can be found in the Readme.md file located in the directory "sidb/tests" within the SIDb GitHub repository. Code testing can be done both locally and remotely. For local testing we have written a shell script that runs an R CMD check on the package directory (github: sidb/tests/pkg_test.sh). For remote testing, we use Travis Continuous Integration to run an R CMD check on the Rpkg directory of the SIDb GitHub repository. This ensures that any modifications to the functions or other aspects of the sidb R package are tested every time a new commit is made in the repository and that we will be notified of any errors, warnings, or issues.
For data validation, raw SIDb data (entry files that live outside the R package in the "data" directory) can be tested for conformity to SIDb standards using the file "data_test.R" (github: sidb/tests/data_test.R). This R script runs all tests in the subdirectory "testthat". Tests can be run from the command line or directly inside R using the R package devtools. Contributors of new data or code must run these tests before contributing to SIDb and no pull requests will be accepted if any of the tests fail.

Summary statistics in SIDb version 1.0
The database is a work in progress: currently SIDb includes 31 studies with 684 time series, representing a total number of 42 545 data points (Fig. 6). Most entries contain multiple time series of CO2 fluxes. Incubations reported in SIDb were performed under temperatures ranging from 0 to 40 °C with the majority of incubations under normal laboratory temperature (20-25 °C) (Fig. 6a). Soil temperature is the most frequently reported laboratory treatment, while soil moisture is less frequently reported despite the fact that it is also a key factor in incubation studies. The omission of soil moisture data may be related to inconsistencies in reporting conventions, a topic that is discussed further in Sect. 4.3. All soils listed in our database included surface soil samples; however, some studies considered soil depth as a treatment and report incubation data from soil layers as deep as 1.2 m (Fig. 6a).
Important geographic and ecological gaps exist in SIDb version 1.0. Coverage is highest in temperate followed by arctic regions, with only a few studies in tropical areas, while the continents of Africa and Australia are barely represented (Fig. 6b). Incubation data from the tropics are currently poorly represented in SIDb despite their vulnerability and the importance of tropical regions to global C cycling and therefore should be a priority for both future ingestion into SIDb and further study. For most ecosystems, there are still many incubation studies to be included into SIDb in the future. Additionally, recent work (Fontaine et al., 2007; Hicks Pries et al., 2018; Mathieu et al., 2015) has highlighted the importance of understanding deep soil processes and potential changes due to global warming. In fact, warming effects on respiration have been observed at depths as great as 1 m (Hicks Pries et al., 2017). Incubations of deep soils thus represent a major gap in SIDb, which is reflective of the lack of deep soil incubation studies more broadly, and present a large potential for future study. It was not our intention with SIDb to produce a comprehensive database. Instead, we want to introduce SIDb's structure, tools, and the current capacity of the database to the broader scientific community, with the potential to expand.

Required and suggested data reporting for inclusion into SIDb
While consistent methods across studies facilitate meta-analysis, incubation studies must remain adaptable to each research question, available resources, and soil properties. Nonetheless, in developing SIDb and the entry template, the most critical required components of incubations for making comparisons across studies emerged. On the basis of these observations, we have generated a list of variables, including information about the sites, soils, and setup of the incubation itself, that we require in order for a study to be ingested in SIDb (Table 1). Here, we discuss the issues associated with these critical variables and make suggestions for other useful variables to report that, while not required, will increase the interpretability of results and allow for broader inclusion into syntheses and meta-analyses (Table 1). In the Supplement, we also offer a limited discussion of methodologies and measurements such as incubation setup, sample preparation, additional variables to measure, and special considerations for radiocarbon incubations.

Site information
Site characteristics provide a context for the inherent conditions of the soils. General site characteristics, such as latitude and longitude, mean annual temperature, and mean annual precipitation are important in drawing out the similarities or differences between studies. Descriptions of the ecosystem and the aboveground vegetation give information on litter input and chemistry, which can be a direct link to organic matter quality. Additionally, providing information on the soil order and taxonomy helps to put findings into context with other studies (Schimel and Chadwick, 2013).

Soil characteristics
There are ultimately two essential soil variables that must be reported for incubation studies and a myriad of suggested variables that facilitate comparisons among and explorations of potential drivers. The first essential soil variable is depth, which is a major organizing factor of many soil characteristics. No matter whether an individual incubation study measured soil from a single depth increment or multiple depth increments, either the depth increment (top, bottom, and middle) or the horizon must be reported. Ideally, both depth and horizon should be reported as samples can be taken from a generic depth or from a mixture of horizons (when sampled to a certain depth). All subsequent soil characteristics should then be reported for each depth increment or horizon incubated and provided in the initConditions.csv file. When reporting the sampling depth, it is necessary to report whether depth is in relation to the soil surface, which can be defined as the top of the mineral soil or the top of the organic horizon depending on the system, or within a specific soil horizon. Additionally, specifics of the geography and topography of the sampling locations, such as permafrost zone, active layer thickness, or water table depth in permafrost and peatlands, are crucial to report.
The second required soil variable is either the initial C (reported in milligrams of carbon per gram dry weight or percent) or organic matter (which can be converted to C), which is essential for facilitating comparisons across studies and for normalizing rates of C losses during incubations. Other common and useful variables to measure are initial N (reported in milligrams of nitrogen per gram dry weight or percent), bulk density in grams per cubic centimeter, soil texture, and pH.
Most soil characteristics, as listed in Table 1, can be measured at the beginning of an incubation on a subsample of the soil being incubated, while others like pH, redox, or microbial biomass may be best measured multiple times during the course of an incubation (see Supplement for more details). For anaerobic incubations, we strongly recommend measuring redox potential because it may not be sufficient to assume that anoxic conditions (e.g., soils inundated with water and headspace filled with N2 or He) will result in the production of CH4 during the incubation as there can be a considerable lag period before CH4 production occurs (Knoblauch et al., 2018; Treat et al., 2015).

Incubation information
Details of incubation studies should be reported as they enhance the value of a primary study, but also, critically, they determine whether or not they can be included in a synthesis or meta-analysis. Thus, most of the information about how an incubation and its treatments are carried out is required in SIDb. Incubation duration, temperature, and soil moisture are among the most important details to provide because they directly affect microbial activity and therefore C flux rates (Table 1). For temperature and soil moisture, it needs to be clarified whether temperature and moisture were controlled at a single value or whether there were multiple temperature or moisture treatment levels. For temperature, details on how incubation temperature was achieved should be provided (e.g., water bath, freezer, or controlled environment chamber). For moisture, it should be specified whether the soils were all brought to the same moisture content or left at field conditions. For below-freezing incubation temperatures, unfrozen soil water can also be quantified, if possible, as temperature responses of CO2 production at subzero temperatures are influenced by water availability (Öquist et al., 2009). Moisture treatments range from fully aerobic (either drier than or at field capacity) to fully anoxic (headspace of jar flushed with N2 or helium) to fluctuating moisture conditions. In aerobic incubations, soils are often freely drained and deionized water is added over the course of the incubation to maintain constant moisture content. However, care should be taken to maintain constant moisture throughout the incubation and not allow soils to dry out, as drying and rewetting of soils can affect C mineralization rates and microbial activity (Birch, 1958; Rey et al., 2005; Unger et al., 2010). In addition, adjustments to soil moisture are ideally made at least 24-48 h prior to making measurements to minimize confounding effects of water addition (Rey et al., 2005).
For anaerobic incubations it may not be necessary to add water during the course of the incubation as incubation vessels typically remain closed, but caution should be taken if water is added as it often contains dissolved oxygen. Other critical parameters to report about the incubation from the synthesis perspective include whether replicates are field (i.e., spatially different soil cores) or analytical replicates, whether soil samples were homogenized (e.g., by soil sieving), or whether roots were removed prior to incubation (see Supplement for more information). Lastly, the duration of a pre-incubation should be reported if carried out.

Flux measurements
Incubation data are most commonly published as C flux rates or cumulative C release over time for the whole incubation period. SIDb is designed around incubation studies that report respiration rates and cumulative release over time (timeSeries.csv), and time series data are required for inclusion in SIDb. Reporting only one average flux value, one maximum production value, or one single cumulative C release value for the whole incubation period may be useful for comparison of treatments within a study but omits key information about changes in C dynamics over time and precludes our ability to model dynamics of different C pools.
Even if changes in C dynamics over time are not of interest for a specific study, time series data should still be provided in supplementary material or in a data repository such as SIDb.
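To illustrate what a single cumulative value hides, the following minimal sketch integrates a short series of flux rates into cumulative C release. It is written in Python purely for illustration and is not part of the SIDb R package; the trapezoidal rule, the function name `cumulative_release`, and the example numbers are assumptions chosen for this sketch, not values from any SIDb entry.

```python
def cumulative_release(times_d, rates):
    """Integrate flux rates over time with the trapezoidal rule.

    times_d: sampling times in days, strictly increasing.
    rates:   flux rates at those times (e.g., mg CO2-C g^-1 dry soil d^-1).
    Returns the cumulative release at each sampling time, starting at 0.
    """
    if len(times_d) != len(rates):
        raise ValueError("times_d and rates must have the same length")
    cumulative = [0.0]
    for i in range(1, len(times_d)):
        dt = times_d[i] - times_d[i - 1]
        if dt <= 0:
            raise ValueError("sampling times must be strictly increasing")
        # Area of the trapezoid between two consecutive measurements.
        cumulative.append(cumulative[-1] + 0.5 * (rates[i] + rates[i - 1]) * dt)
    return cumulative


# Hypothetical example: rates decline as the fast-cycling C is consumed.
times = [0, 1, 3, 7]
rates = [0.10, 0.08, 0.05, 0.03]
cum = cumulative_release(times, rates)
print([round(c, 3) for c in cum])  # [0.0, 0.09, 0.22, 0.38]
```

The final value (0.38) alone would not reveal that the rate fell by more than two-thirds within a week, which is the information needed to separate C pools.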
Flux rates can be provided on a per gram of dry soil or per gram of soil C basis (mg CO2-C g^-1 dry weight d^-1 or mg CO2-C g^-1 soil C d^-1). These units can be easily converted to one another using the required initial C data (Table 1). Providing flux rates on a wet-weight soil basis or per volume of soil slurry is discouraged, as SIDb does not support this format and it precludes comparisons to other studies. If units of dry weight are not available, then soil moisture content and bulk density need to be reported so that data can be converted to standard units. Reporting C release on a per gram of C basis captures information about C decomposability and reveals information about the relative C release from a given soil that is independent of its C quantity; this is particularly useful for comparisons among soils, sites, and incubation studies (Schädel et al., 2014).
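The unit conversions described above amount to simple arithmetic, sketched here in Python for illustration only. The function names and example values are hypothetical, and the sketch assumes that initial C is expressed as a mass fraction (g C per g dry soil) and moisture as gravimetric water content (g water per g dry soil). The per-volume case, which additionally requires bulk density, is not covered.

```python
def per_soil_to_per_carbon(flux_per_g_soil, c_fraction):
    """Convert mg CO2-C g^-1 dry soil d^-1 to mg CO2-C g^-1 soil C d^-1.

    c_fraction: initial soil C as g C per g dry soil (e.g., 0.04 for 4 % C).
    """
    if not 0 < c_fraction <= 1:
        raise ValueError("c_fraction must be in (0, 1]")
    return flux_per_g_soil / c_fraction


def per_carbon_to_per_soil(flux_per_g_c, c_fraction):
    """Inverse conversion: per gram soil C back to per gram dry soil."""
    if not 0 < c_fraction <= 1:
        raise ValueError("c_fraction must be in (0, 1]")
    return flux_per_g_c * c_fraction


def wet_to_dry_basis(flux_per_g_wet, gravimetric_moisture):
    """Convert a flux per gram wet soil to a flux per gram dry soil.

    gravimetric_moisture: g water per g dry soil, so 1 g of wet soil
    contains 1 / (1 + gravimetric_moisture) g of dry soil.
    """
    if gravimetric_moisture < 0:
        raise ValueError("gravimetric_moisture must be non-negative")
    return flux_per_g_wet * (1 + gravimetric_moisture)


# Hypothetical soil with 4 % C respiring 0.02 mg CO2-C g^-1 dry soil d^-1.
per_c = per_soil_to_per_carbon(0.02, 0.04)   # about 0.5 mg CO2-C g^-1 C d^-1
per_dry = wet_to_dry_basis(0.016, 0.25)      # about 0.02 mg CO2-C g^-1 dry soil d^-1
```

Two soils with the same flux per gram of soil can differ severalfold per gram of C, which is why the per-C basis is the more informative one for comparing decomposability.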

Case study: fitting time series data to pool models in SIDb version 1.0
Our incubation database can be easily integrated with other R packages for further analyses. For instance, it is possible to integrate soil C pool modeling from the SoilR package (Sierra et al., 2012) with parameter optimization from the FME package (Soetaert and Petzoldt, 2010). We illustrate this functionality with a simple example. The entry Crow2019a in the database contains a large number of long-term incubations (371 d). From those incubations, we selected data from a native forest in Hawaii and fitted a set of first-order models with two or three pools. Following the procedure described in Sierra et al. (2015), we optimized two- and three-pool models with parallel, series, and feedback connections among the pools (Fig. 7). Depending on the question asked, different criteria can be considered to select the best model (e.g., Akaike information criterion or number of parameters, Table 2). Identifying the best model is beyond the scope of this paper; we simply show the basics of an example using SIDb.
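The idea behind fitting a pool model can be sketched without SoilR or FME. The Python example below is an illustration under stated assumptions, not the procedure used for Fig. 7: it uses the closed-form solution of a two-pool parallel first-order model (no transfers among pools), replaces the FME optimizer with a brute-force grid search, and fits noise-free synthetic data generated from made-up parameters rather than the Crow2019a entry. All function names are hypothetical.

```python
import math
from itertools import product


def two_pool_parallel(t, gamma, k1, k2):
    """Cumulative C respired at time t (d) as a fraction of initial C.

    gamma: fraction of initial C in the fast pool.
    k1, k2: first-order decay rates (d^-1) of the fast and slow pools.
    """
    return gamma * (1 - math.exp(-k1 * t)) + (1 - gamma) * (1 - math.exp(-k2 * t))


def one_pool(t, k):
    """Single-pool comparator: cumulative fraction of initial C respired."""
    return 1 - math.exp(-k * t)


def sse(model, params, times, observed):
    """Sum of squared errors between model predictions and observations."""
    return sum((model(t, *params) - y) ** 2 for t, y in zip(times, observed))


def grid_fit(model, grids, times, observed):
    """Return (best_params, best_sse) over the Cartesian product of grids."""
    best = min(product(*grids), key=lambda p: sse(model, p, times, observed))
    return best, sse(model, best, times, observed)


# Synthetic, noise-free observations from assumed parameters (not real data).
times = [0, 3, 7, 14, 28, 56, 112, 224, 371]
true_params = (0.05, 0.10, 0.0005)
observed = [two_pool_parallel(t, *true_params) for t in times]

two_pool_grids = (
    [i / 100 for i in range(1, 21)],     # gamma: 0.01 to 0.20
    [i / 100 for i in range(2, 31, 2)],  # k1: 0.02 to 0.30 d^-1
    [i / 10000 for i in range(1, 21)],   # k2: 0.0001 to 0.0020 d^-1
)
one_pool_grids = ([i / 10000 for i in range(1, 101)],)  # k: 0.0001 to 0.0100 d^-1

fit2, sse2 = grid_fit(two_pool_parallel, two_pool_grids, times, observed)
fit1, sse1 = grid_fit(one_pool, one_pool_grids, times, observed)
print("two-pool fit:", fit2, "SSE:", sse2)
print("one-pool fit:", fit1, "SSE:", sse1)
```

Because the data are noise-free, the two-pool search recovers the generating parameters and the one-pool model cannot match them. With real incubation data, residuals would be nonzero for every structure, and a criterion such as the Akaike information criterion would be computed from them to weigh goodness of fit against the number of parameters.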

SIDb connections to other databases
There are two approaches to database building, which can be characterized by tradeoffs between the scope and quantity of data, the ease of data analysis, and the simplicity of data entry. SIDb has a narrow scope (i.e., incubation time series), allowing for the flexibility to incorporate studies with different variable types and experimental designs, while the data themselves are highly structured in order to facilitate data analysis. Other soil databases, such as the International Soil Radiocarbon Database (ISRaD, Lawrence et al., 2020) or the International Soil Carbon Network (ISCN, https://iscn.fluxdata.org/, last access: 26 June 2020), have the advantage of a much larger quantity of data and a much broader scope. However, maintenance and data ingestion with these larger databases become much more challenging and require either (a) relaxing control of data structure, units of variables, and direct data oversight, as is the case with the International Soil Carbon Network, or (b) increasing the complexity of the data structure while enforcing strict variable control (e.g., allowable names, factor levels for categorical data, and numerical limits for quantitative data), as is the case with the International Soil Radiocarbon Database. Owing to the broader scope, maintaining these larger databases inevitably requires additional time and effort. However a database is structured, establishing a common set of required measurements, metadata, and site-level data provides transparency that helps both to identify and to reduce systematic bias.
Figure 7. Results from a parameter optimization procedure applied to soil incubation data from a native tropical forest of Hawaii. The parallel model structures do not consider transfers of C among pools, while the series model structures transfer C sequentially from fast- to slow-cycling pools. In all cases, the models fitted the data relatively well (Table 2) and identified the relative contribution of the different pools to the overall respiration flux.
The statistical power provided by the wealth of data points in a database such as SIDb is only useful as long as any potential systematic bias is identified. For example, all studies in SIDb report data at the variable level with respect to a time variable, as well as provide information about the experimental design, where the samples were collected from, who performed the study, and how to access the original data. Additionally, providing data such as geographic coordinates, land cover, mean annual temperature, mean annual precipitation, soil taxonomy, and soil C content enables leveraging of databases that may have a different scope but contain potentially useful supporting data. For example, respiration time series data from SIDb could be compared to 14C content of bulk soil or respired 14CO2 from ISRaD (Lawrence et al., 2020) by stratifying both databases along common variables, or a query could be made using geographic coordinates, DOI, or other variables.
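A cross-database query of this kind could look roughly like the following Python sketch, which links records either by DOI or by geographic proximity. It is illustrative only: the records, field names, placeholder DOIs, and distance thresholds are all invented for this example and do not reflect the actual SIDb or ISRaD schemas.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def match_by_doi(left, right):
    """Pair record ids whose DOIs are equal, ignoring case."""
    index = {}
    for rec in right:
        index.setdefault(rec["doi"].lower(), []).append(rec)
    return [(l["id"], r["id"]) for l in left for r in index.get(l["doi"].lower(), [])]


def match_by_location(left, right, max_km):
    """Pair record ids whose coordinates lie within max_km of each other."""
    return [
        (l["id"], r["id"])
        for l in left
        for r in right
        if haversine_km(l["lat"], l["lon"], r["lat"], r["lon"]) <= max_km
    ]


# Hypothetical records; ids, DOIs, and coordinates are placeholders.
incubation_records = [
    {"id": "inc_A", "doi": "10.0000/placeholder-1", "lat": 19.80, "lon": -155.30},
    {"id": "inc_B", "doi": "10.0000/placeholder-2", "lat": 64.80, "lon": -147.70},
]
radiocarbon_records = [
    {"id": "rc_X", "doi": "10.0000/PLACEHOLDER-1", "lat": 19.81, "lon": -155.31},
    {"id": "rc_Y", "doi": "10.0000/placeholder-3", "lat": 64.90, "lon": -147.60},
]

same_study = match_by_doi(incubation_records, radiocarbon_records)
nearby = match_by_location(incubation_records, radiocarbon_records, max_km=5.0)
```

Matching by DOI links measurements from the same study, whereas matching by location can link independent studies of the same site; the appropriate distance threshold depends on how spatially variable the soils are.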
The database is open for reuse, and the usage license follows the MIT license (https://opensource.org/licenses/MIT, last access: 26 June 2020). When using the database or R package, users should cite this definition publication and consider citing individual studies (publication or dataset).

Conclusion
Currently, SIDb is a compilation of a wide range of incubation studies with built-in capabilities to summarize the database and conduct model comparisons for fitting curves to time series data. There is great potential benefit for the soil C community through identification and ingestion of new datasets into SIDb. Every incubation study is planned and performed to answer a specific question; however, when analyzed in aggregate, syntheses of incubation studies can help answer fundamental questions about soil C pools and their stability and vulnerability to global change. Furthermore, setting up incubation studies involves several decision points, such as whether to sieve or pre-incubate the soil, whose consequences have not yet been tested systematically and which could be tested using SIDb.
A comprehensive collection of existing laboratory incubation data will be invaluable for the synthesis of spatial, methodological, and functional trends, as well as for identifying key gaps in our current knowledge. Individual researchers are encouraged to add individual study results to the database, thereby helping fill gaps in our broader understanding of soil C cycling. A key goal for the next stages of SIDb development is to expand the geographical and ecological coverage of the entries.
SIDb is specifically designed to host incubation data with time series of respiration rates to facilitate synthesis studies. We encourage researchers to archive their data in the format presented here, but we caution that this database is not a long-term archive. SIDb not only collects data in a structured format; it also provides tools for data analysis and reporting through an R package and a website. Soil incubations are a commonly used technique for answering many different kinds of research questions, and here we provide recommendations on best practices, as well as a common data infrastructure for reporting. We expect the size of this database to grow in the future as it can be used as a standard repository for time series soil incubation data following open-source standards.
Author contributions. CAS designed the database; CAS, CS, JBM, MAR, SEC, AP, CHP, SS, and AMH built and populated the database while JBM provided technical database support. CS, JE, and CT developed the first version of incubation recommendations and CS wrote up the initial draft of the SIDb manuscript. All authors contributed to the writing.