A restructured and updated global soil respiration database (SRDB-V5)

Field-measured soil respiration (RS, the soil-to-atmosphere CO2 flux) observations were compiled into a global soil respiration database (SRDB) a decade ago, a resource that has been widely used by the biogeochemistry community to advance our understanding of RS dynamics. Novel carbon cycle science questions require updated and augmented global information with better interoperability among datasets. Here, we restructured and updated the global RS database to version SRDB-V5. The updated version has all previous fields revised for consistency and simplicity, and it has several new fields to include ancillary information (e.g., RS measurement time, collar insertion depth, collar area). The new SRDB-V5 includes published papers through 2017 (800 independent studies), where total observations increased from 6633 in SRDB-V4 to 10 366 in SRDB-V5. The SRDB-V5 features more RS data published in the Russian and Chinese scientific literature and has an improved global spatio-temporal coverage and improved global climate space representation. We also restructured the database so that it has stronger interoperability with other datasets related to carbon cycle science. For instance, linking SRDB-V5 with an hourly timescale global soil respiration database (HGRsD) and a community database for continuous soil respiration (COSORE) enables researchers to explore new questions. The updated SRDB-V5 aims to be a data framework for the scientific community to share seasonal to annual field RS measurements, and it provides opportunities for the biogeochemistry community to better understand the spatial and temporal variability in RS, its components, and the overall carbon cycle. The database can be downloaded at https://github.com/bpbond/srdb and will be made available in the Oak Ridge National Laboratory’s Distributed Active Archive Center (ORNL DAAC).
All data and code to reproduce the results in this study can be found at https://doi.org/10.5281/zenodo.3876443 (Jian and Bond-Lamberty, 2020). Published by Copernicus Publications.


Introduction
Soil respiration (RS), the soil-surface-to-atmosphere CO2 flux, is one of the largest carbon fluxes between the terrestrial land surface and atmosphere (Luo and Zhou, 2010). The majority of RS is released by soil microbes and fauna (heterotrophic respiration) and plant roots (autotrophic respiration). Soils hold a large amount (> 2000 Pg C to 1 m depth) of carbon, more than the total carbon stock in the atmosphere and aboveground plants (Batjes, 2016; Tarnocai et al., 2009). Thus, soil C efflux to the atmosphere has major implications for our understanding of ecosystem- to global-scale biogeochemical cycling. For better monitoring of soil carbon dynamics as well as to investigate how soil carbon responds to global climate change, it is important to measure RS across different vegetation types and climate conditions.
Many field experiments have been conducted in recent decades to measure RS in different climate conditions and vegetation types (Bond-Lamberty and Thomson, 2010b; Davidson et al., 1998; Raich and Potter, 1995). However, the resulting estimates of seasonal to annual RS fluxes are scattered throughout the scientific literature in a variety of formats. Therefore, compiling past RS measurements into a standardized data framework to support synthesis analysis is very important for advancing carbon cycle science.
Published site-scale RS measurements across the globe have been compiled and standardized into global soil respiration databases to support synthesis studies, macro- to global-scale RS estimates, and investigations of soil carbon response to climate change (Bond-Lamberty and Thomson, 2010a; Raich and Schlesinger, 1992). Schlesinger (1977) compiled one of the earliest listings of RS estimates from diverse ecosystems. Raich and Schlesinger (1992) subsequently integrated RS from published papers which covered 13 ecosystems and developed a simple linear model between RS and climate factors (i.e., temperature and precipitation), estimating global RS to be 68 ± 4 Pg C yr−1. Later, more RS measurements (especially those made using the infrared gas analyzer, IRGA) were added, and the global RS was updated to 76–81 Pg C yr−1 (Raich et al., 2002; Raich and Potter, 1995). In 2010, Bond-Lamberty and Thomson (2010a) compiled a comprehensive global soil respiration database (SRDB), and this database was released for public usage. The SRDB contains annual and seasonal RS measurements, ancillary carbon pools and fluxes (e.g., gross primary production, net primary production, ecosystem respiration), response of RS to temperature and moisture (i.e., model parameters to describe the relationship between RS and temperature and moisture), and sites' background information (e.g., latitude, longitude, elevation, mean annual temperature, mean annual precipitation) (Bond-Lamberty and Thomson, 2010a). With more IRGA-based RS measurements added and alkaline-based measurements excluded, Bond-Lamberty and Thomson (2010b) estimated the global RS to be 98 ± 12 Pg C yr−1 and estimated that global RS was increasing at a rate of 0.1 Pg C yr−2.
The SRDB has been widely used in the decade since the first version was published (Bond-Lamberty and Thomson, 2010a): to date it has been cited 359 times (searched in Google Scholar on 20 May 2020), and its use continues to increase each year (Fig. 1).
The SRDB of Bond-Lamberty and Thomson (2010a), however, only recorded seasonal to annual RS fluxes, hindering analyses at finer temporal resolutions. Based on the SRDB, Jian et al. (2018c) collected SRDB studies reporting diurnal RS and compiled these into a global hourly soil respiration database (HGRsD). Similarly, Jian et al. (2018a) further collected detailed monthly and daily timescale RS measurements into a global monthly and daily soil respiration database (MGRsD). More recently, Bond-Lamberty et al. (2020) have built a database (COSORE) of continuous (typically half-hourly or hourly) datasets from globally distributed sites. With these different-timescale databases, RS temporal variability and its time-related driving processes and uncertainties can be analyzed (Jian et al., 2018a, b, c). There is still a need to improve interoperability among RS databases to expand available information, improve database usage, and advance our understanding of RS dynamics across multiple spatial and temporal scales.
In approaching a decadal reworking of the SRDB, we envisioned that it required improvements to increase its usage across different disciplines. Some important information (e.g., collar area, collar insertion depth, RS measurement time, soil temperature, soil moisture, soil temperature measurement depth, and soil moisture measurement depth) was not included in the older versions (hereafter named SRDB-V1 to SRDB-V4), and thus important questions such as whether RS survey time (Cueva et al., 2017), collar insertion depth (Heinemeyer et al., 2011), and/or collar cover area affected RS measurement accuracy could not be addressed. In addition, SRDB-V4 included data mainly published in English (∼ 98 %), while data published in other languages (∼ 2 %) were rarely included (Epule, 2015). Some metadata such as manipulation and measurement method were not standardized and thus were difficult to use in subsequent meta-analyses. For instance, the attempt to link SRDB to the Forest Carbon Database (ForC) showed that the old SRDB structure required modification before it could be linked with ForC (Anderson-Teixeira et al., 2018). Finally, information about how heterotrophic (RH) and autotrophic respiration (RA) respond to environmental conditions (i.e., temperature and soil moisture) was not included.
The older SRDB followed certain data integration principles, including inclusion criteria, database structure design, and quality control (Bond-Lamberty and Thomson, 2010a), but improvements could be made. We have updated it to a new version (hereafter named SRDB-V5) following FAIR protocols (i.e., findable, accessible, interoperable, and reusable) (Wilkinson et al., 2016). This has been accomplished by (1) restructuring SRDB and improving its interoperability so that data from SRDB-V5 can more easily be linked to external datasets; (2) moving the RS, RH, and RA temperature and soil moisture response functions into a separate file to simplify the database and improve its reusability; (3) adding collar area, collar insertion depth, and RS measurement time information to SRDB-V5; (4) collecting more RS data published in the Russian and Chinese scientific literature; (5) updating RS records available throughout the world from recently published literature (until 2017); and (6) improving the metadata description. We hope that these efforts will significantly improve the future interoperability and reusability of SRDB-V5.

Soil respiration database restructuring
We restructured the SRDB for easier data collection and quality control. The previous global RS database versions (SRDB-V1 to SRDB-V4) mainly included two files: a "studies" file, which recorded the detailed metadata for all published papers examined by the SRDB, and a "data" file, which stored all the RS data; a variety of ancillary site, soil, and carbon cycle data (e.g., gross primary production, GPP; net primary production, NPP; ecosystem respiration); and related background information such as site location, ecosystem type, and management (Bond-Lamberty and Thomson, 2010a). In SRDB-V5 the "studies" file remains unchanged, but the "data" file is now separated into two files: "srdb-data" and "srdb-equations". This simplifies the structure of the former while moving all the "Response of RS to temperature and moisture" columns in the SRDB to the latter. Note that the SRDB-V5 file format remains the same as the older versions, as comma-separated value data are easy to work with and universally readable by software.

Metadata
We standardized the background information of SRDB-V5. Most of the metadata are described by Bond-Lamberty and Thomson (2010a), and here we only describe newly added columns or metadata with updates (Tables 1 to 3). We added five columns (i.e., Site_ID, Collar_height, Collar_depth, Chamber_area, Time_of_day) in SRDB-V5. Four columns (Rs_max, Rs_maxday, Rs_min, Rs_minday) were deleted (Table 1) because they were rarely reported and, according to our literature search, have never been used by the community in the past 10 years. In the Quality_flag column, we added two more flags related to RS temperature equations: Q15 means the equation was developed based on seasonal RS data rather than covering at least a whole year, and Q16 notes that there is a soil water content (SWC) component within the reported equation (Table 1).
For many analyses SRDB needs to be connected with other datasets, and a unique observation ID is essential for this process. In the SRDB-V5, we added a "Site_ID" column to guarantee a unique ID for each Rs_annual observation within a study, enabling users to easily link SRDB-V5 records with external data such as MGRsD and HGRsD. The Site_ID is in the form of "CC-RC-IC", where CC is the ISO Alpha-2 country code (https://www.nationsonline.org/oneworld/country_code_list.htm, last access: 31 January 2021), RC is region code (state/province), and IC is identity code. Country code and region code are always present, but some studies report only one annual RS value, and thus IC may or may not be present.
Fragment of Table 1 recovered from the extracted text: [field name not recovered]: see Table 2; standardized in SRDB-V5. Partition_method: see Table 3; standardized in SRDB-V5.
Fragment of Table 2 recovered from the extracted text: [method name not recovered]: indirectly calculate soil respiration rate (e.g., through relationship between soil respiration and GPP). Isotope, 3: determine soil respiration rate using isotope (e.g., 13C). Unknown, 27: none of the above.
We standardized the coding of experimental manipulation, collapsing the previous ad hoc categories into a smaller set of standardized terms. This decreased the number of unique Manipulation field values from 689 to 276. We used the following criteria to simplify the manipulation in SRDB-V5: (1) measurements from no treatment (i.e., control) were categorized as "None"; (2) manipulation names were standardized (e.g., "clipping", "clip", and "clipped" are now all standardized as "Clip"); (3) we used the manipulation level to further describe the difference within a specific manipulation (e.g., "Litter manipulation" could have "double litter", "50 % litter removal", "100 % litter removal"). With manipulation standardized, scientists can further analyze how manipulation affects RS.
For instance, comparing RS measurements from the "CO2" group (i.e., elevated CO2 concentration treatment) with "None" (i.e., control) enables researchers to analyze how RS responds to the CO2 concentration increase caused by CO2 released from fossil fuel combustion. Similarly, data from the "Warm" and "Precipitation amount change" groupings will enable scientists to more easily explore how soil carbon responds to global climate change. Barba et al. (2018) suggested that bias could arise from measurements made in "hotspots" (i.e., areas with high values compared with the surrounding environment), and groupings such as "Ant mound" and "High N" facilitate data interpretation and analyses regarding "hotspots".
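The manipulation standardization described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build SRDB-V5: only the "Clip" and "None" examples come from the text, while the lookup table, the raw labels "control" and "no treatment", and the function name are hypothetical.

```python
# Hypothetical lookup table. Only the "Clip" variants and the rule that
# untreated controls become "None" are taken from the text; the raw labels
# "control" and "no treatment" are assumed spellings.
CANONICAL = {
    "clipping": "Clip",
    "clip": "Clip",
    "clipped": "Clip",
    "control": "None",
    "no treatment": "None",
}


def standardize_manipulation(raw):
    """Map a free-text manipulation label to a standardized term."""
    key = raw.strip().lower()
    # Labels without a known mapping are returned (stripped) for manual review.
    return CANONICAL.get(key, raw.strip())


labels = ["Clipping", "clip ", "Clipped", "control", "Warm"]
standardized = [standardize_manipulation(label) for label in labels]
print(standardized)
```

A lookup table keeps every mapping explicit and reviewable, which matters when hundreds of ad hoc labels have to be collapsed by hand.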
We also standardized the RS measurement method (Meas_method) and RS partition method (Partition_method) fields. Measurement method was grouped into nine types (Table 2), and the partition method was grouped into eight types (Table 3). With these changes, scientists can more easily investigate whether different measurement methods affect RS results as well as whether different partition methods affect RH and RA partitioning.
Latitude and longitude are key metadata as they can be used to link RS measurements to spatial data (e.g., precipitation and air temperature). During the data collecting process, latitude and longitude values reported in the original paper were recorded in our database, generally to two significant digits. However, the precision of SRDB latitude and longitude can be affected by many factors: first, studies report latitude and longitude at different and sometimes uncertain levels of precision; second, studies use different methods for recording latitude and longitude; and finally, some studies have multiple nearby sites but report one general latitude and longitude for all those sites. Nevertheless, it is unlikely that the error is very large, and in general we assume that linking RS measurements to relatively coarse spatial data (e.g., 0.1–0.5° resolution) should be unproblematic. When linking to high-spatial-resolution data (such as 30 m resolution remote-sensing images), users should be aware that the variable and uncertain SRDB latitude and longitude precision may cause data quality issues. That said, SRDB-V5 was revised to avoid unrealistic locations such as points in the ocean. Furthermore, the latitude and longitude fields should be within −90 to 90° and −180 to 180°, respectively; whenever they are out of these ranges, a warning is raised.
Fragment of Table 3 recovered from the extracted text: [method name not recovered], 16: determining heterotrophic and autotrophic respiration through total belowground carbon allocation calculation. Other, 122: none of the above.

Soil respiration database update
We updated the SRDB-V5 so that it has temporal coverage to 2017 and made an effort to collect RS data published in the Russian and Chinese literature to be more inclusive and expand its spatial coverage. Papers published in English are the majority (∼ 98 %) of sources in SRDB, while papers published in other languages are rarely included (Bond-Lamberty and Thomson, 2010a). This reflects the dominance of English as the language of international science, but there are some data available from the Russian-language literature, representing data from a large area (Russia represents ∼ 11 % of the terrestrial land surface) and a variety of climate types and vegetation types. In addition, in MGRsD and HGRsD, there were some Chinese-language papers or recently published papers (103 studies, ∼ 5 % of the total studies in SRDB-V5) which were not included by SRDB. We have now compiled data from those papers into SRDB-V5.

Data quality control
We developed an R (R Core Team, 2019) script to perform data quality and consistency checks. For example, the latitude and longitude fields have to be in specific ranges, otherwise a warning is raised. For details about the data constraints used to check each column in SRDB-V5, please see the "srdb_check.R" script, which is available in the GitHub repository and as part of every release download (https://github.com/bpbond/srdb/releases, last access: 31 January 2021). This script is also run on all pull requests to the GitHub repository, which enables us to flag data quality problems before changes are made to the database.
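The range check mentioned above can be sketched as follows. This is a minimal Python illustration of the idea, not a port of "srdb_check.R". The function name, the column names "Latitude" and "Longitude", and the example records are assumptions made for this sketch.

```python
def check_coordinates(records):
    """Return warning messages for records whose coordinates are out of range.

    Each record is a dict; the keys "Latitude" and "Longitude" are assumed
    column names for this sketch. Missing values (None) are skipped.
    """
    warnings = []
    for i, rec in enumerate(records):
        lat = rec.get("Latitude")
        lon = rec.get("Longitude")
        if lat is not None and not -90 <= lat <= 90:
            warnings.append(f"row {i}: Latitude {lat} outside [-90, 90]")
        if lon is not None and not -180 <= lon <= 180:
            warnings.append(f"row {i}: Longitude {lon} outside [-180, 180]")
    return warnings


# Made-up records: the second and third each violate one range.
records = [
    {"Latitude": 45.5, "Longitude": -122.7},
    {"Latitude": 95.0, "Longitude": 10.0},
    {"Latitude": -33.9, "Longitude": 181.2},
]
problems = check_coordinates(records)
for message in problems:
    print(message)
```

Returning messages rather than raising an error mirrors the warning-based behavior described in the text, so that all problems in a submission can be reported at once.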

Data coverage analysis
We compared mean annual temperature (MAT) and mean annual precipitation (MAP) of sites from SRDB with the global MAT and MAP to test the representation of the SRDB. We connected the sites from SRDB with external climate data (Willmott and Matsuura, 2001) through latitude and longitude and obtained MAT and MAP. Barren area was masked according to the MODIS land cover (Friedl et al., 2002). Climate region was retrieved from the Köppen climate classification (Peel et al., 2007). We also obtained International Geosphere-Biosphere Programme (IGBP) vegetation classification of the SRDB sites by connecting IGBP classification data (IGBP, 1990); vegetation was grouped into agriculture, arctic, desert, tropical forest (tropic FOR), temperate & boreal forest (T&B FOR), grassland, savanna, shrubland, urban, and wetland. If the MAT and MAP distribution of SRDB sites is similar to the global MAT and MAP distribution, it should mean that the SRDB better represents the global RS flux distribution as well. We also assume that as data sample size increases, the new database (e.g., SRDB-V5) should improve its representation compared with the older version (e.g., SRDB-V1). We tested the representation of sites in different vegetation types (IGBP, 1990).

Results
The number of records of SRDB-V5 is much larger compared with older versions. Collecting RS measurements from newly published literature (until 2017) greatly increased the total number of observations in the database (from 6633 in SRDB-V4 to 10 366 in SRDB-V5) but only somewhat improved its spatial coverage (Fig. 2). The Northern Hemisphere midlatitude regions, where SRDB-V4 has the most RS sites, had the largest increase in RS data in SRDB-V5 as well (blue dots in Fig. 2). Adding literature in Chinese did not substantially improve the spatial coverage either, possibly because more and more RS measurements in China have been published in the English scientific literature. However, most sites in China are from the eastern part of the country, and measurements from western China, if available, will be important to include in future SRDB updates. We collected ∼ 50 papers published in Russian, but only 14 of them (∼ 0.7 % of total studies of all languages in SRDB-V5) met the criteria (see Bond-Lamberty and Thomson, 2010a, for details) and were included in the database. This small number of papers nonetheless substantially improved the database's spatial coverage of the Russian landmass (orange circles in Fig. 2). MAT and MAP distributions of SRDB sites are very similar to the global distribution in agriculture, forest, and grassland regions, indicating good representativeness of SRDB sites in these three types of vegetation (Figs. 3 and 4). For shrublands, sites in the older versions of the database (e.g., SRDB-V4) did not represent the global distribution well, but this representation was greatly improved as more RS measurements were included in SRDB-V5 (Fig. 3). Sites from other vegetation types, however, were less representative of the corresponding global climate space, with barren lands masked out (Fig. 3, right panel).
More specifically, arctic sites in SRDB have relatively narrow MAT and MAP coverage compared with the global arctic MAT and MAP distribution, probably because many regions in the arctic are covered by snow all year round, and thus it is difficult to measure RS at those sites (Virkkala et al., 2019). Desert SRDB sites have lower MAT but higher MAP than the global distribution, probably because (1) the disproportionate number of samples in temperate regions (Fig. 2) means that most samples in deserts are likely from wetter deserts; (2) the Sahara has low MAP and high MAT and covers a large area of the world, but few studies were conducted there, so that area of the world may simply account for the bias; and (3) many "deserts" that have been studied are in relatively close proximity to urban developments (e.g., southwestern USA, southern Europe), and those deserts are neither as harsh nor as extensive as the Sahara. Urban and savanna sites in SRDB had lower MAT compared to their global distribution, probably because many tropical cities and savannas in South America, Asia, and Africa were rarely measured (Martin et al., 2012). We suggest that papers written in other languages, especially those in Portuguese, Spanish, and French, could potentially increase the RS measurements in South America and Africa.
Adding new measurements in SRDB-V5 has substantially increased total observations, and the spatial coverage of sites was improved compared with SRDB-V4 (Fig. 2). However, the distributions of annual RS and seasonal RS (growing, dry, wet, spring, summer, autumn, and winter season RS) were similar in the SRDB-V5 compared to SRDB-V4 (Fig. 5). We suspect that new RS measurements are collected disproportionately from the same regions as previously sampled, and thus future studies should focus more on those regions with fewer data. For the future SRDB update, measurements from the Southern Hemisphere, desert, arctic, and tropical forests, if available, will be important to include.

Forecasting global RS, RH, and RA
The updated SRDB-V5 provides opportunities for constraining global RS estimates in the future. Currently, estimates of global RS range from 68 to 101 Pg C yr−1, with many uncertainties associated with measurements and propagation of errors evident when upscaling site-specific RS measurements to regional and global scales (Bond-Lamberty and Thomson, 2010b; Jian et al., 2018a, b; Raich et al., 2002; Raich and Potter, 1995; Raich and Schlesinger, 1992; Warner et al., 2019). For example, RS has usually been measured during daylight hours, implicitly assuming that measurements during this period represent the mean daily RS. In a water-limited ecosystem, however, Cueva et al. (2017) estimated a time-of-day bias ranging from −29 % to +40 %. On the global scale, based on the HGRsD, Jian et al. (2018c) found that not measuring RS continuously over 24 h contributed less than 6 % of bias when estimating diurnal RS. Quantifying the amount of bias requires detailed information about when RS was measured and how long the measurement lasted (Jian et al., 2018c). In the SRDB-V5, we revised all the studies and collected the "Time_of_day" information, which should enable future analyses of how RS measurement bias is related to when RS measurements were collected.
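To make the notion of a time-of-day bias concrete, the following minimal Python sketch computes the percent difference between the mean flux over the sampled hours and the 24 h mean. This definition of bias is an assumption chosen for illustration and is not necessarily the one used by Cueva et al. (2017) or Jian et al. (2018c). The function name and the flux values are hypothetical.

```python
def time_of_day_bias(hourly_flux, sampled_hours):
    """Percent bias of the sampled-hours mean relative to the 24 h mean.

    hourly_flux: 24 flux values, one per hour of day (index 0 = 00:00).
    sampled_hours: hours of day (0-23) at which measurements were taken.
    """
    if len(hourly_flux) != 24:
        raise ValueError("hourly_flux must contain exactly 24 values")
    hours = list(sampled_hours)
    if not hours:
        raise ValueError("sampled_hours must not be empty")
    daily_mean = sum(hourly_flux) / 24
    sampled_mean = sum(hourly_flux[h] for h in hours) / len(hours)
    return 100 * (sampled_mean - daily_mean) / daily_mean


# Synthetic diurnal cycle (arbitrary units): low flux in the first half of
# the day, high flux in the second half. Not real measurements.
hourly_flux = [1.0] * 12 + [3.0] * 12
afternoon_bias = time_of_day_bias(hourly_flux, range(12, 16))
full_day_bias = time_of_day_bias(hourly_flux, range(24))
print(afternoon_bias, full_day_bias)
```

With a calculation of this kind, a "Time_of_day" field makes it possible to ask which sampling windows come closest to the daily mean.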
It is also widely accepted that chamber properties (e.g., volume, area) (Davidson et al., 2002) and collar insertion depth (Heinemeyer et al., 2011) affect the RS measurement accuracy, but to our knowledge this has not been quantitatively tested on a global scale. We added information in the SRDB-V5 to enable researchers to investigate whether chamber area (smaller chambers are more vulnerable to edge effects, while larger chambers may experience inadequate air mixing), collar height (which may affect air mixing in the chamber), and insertion depth (which may cut off roots) affect RS measurement accuracy and bias at seasonal to annual scales.
Figure caption (fragment recovered from the extracted text): The size of circles represents the sample size at each measurement site (i.e., bigger circles represent more data).
Comparing SRDB-V1 through SRDB-V5, we found that the uneven spatial distribution of RS sites has improved, but bias still remains, with measurements conducted unevenly around the world and in climate space (Figs. 2-4). The reason for the spatially uneven coverage of RS sites is a combination of economy, national policy, environmental conditions, spatial heterogeneity, and many other issues. Most obviously, the Northern Hemisphere has much more data than the Southern Hemisphere, as the most economically developed and wealthiest countries tend to be in the midlatitudes of the Northern Hemisphere, and thus more funds, infrastructure, and a broader and deeper pool of students and technical experts are all available to support on-site RS measurement in these regions.
Improving modeling frameworks may help mitigate the uneven spatial distribution of RS sites. For example, Jian et al. (2018b) found that how RS responds to temperature is significantly different among climate regions, and therefore climate-specific models may be more appropriate than a single global model to estimate global RS. Alternatively, machine-learning approaches that account for non-linearity and multiple potential combinations of environmental factors have been used to estimate global RS (Warner et al., 2019). SRDB-V5 also significantly increased the RS sample size, and analyses could be conducted to test whether the increasing sample size of RS helps reduce uncertainty when upscaling from site- to global-scale RS. We recognize that there are many other possible sources of bias, but it is nonetheless possible that the biogeochemistry community will be able to use SRDB-V5 to improve the confidence of global RS modeling and constrain global carbon cycle estimates.
Linking SRDB-V5, MGRsD, HGRsD, and COSORE provides an opportunity for global RH and RA estimates. Soil respiration mainly consists of two parts, RH and RA, but it is difficult to separate these two components, and far fewer RH and RA data are available in the SRDB (Bond-Lamberty and Thomson, 2010a). Due to this lack of data, far fewer studies have analyzed RH and RA and estimated global RH and RA in the past decades. To our knowledge, there are only four global RH (or RA) estimates based on the very limited extant data (n < 500) (Hashimoto et al., 2015; Konings et al., 2019; Tang et al., 2020; Warner et al., 2019). In the "srdb-equations" file, the response of RH and RA to temperature and moisture will be recorded, which should encourage the study of RH and RA and how they respond to temperature and soil moisture in the future. Further, we argue that a big advantage of global soil respiration databases with finer temporal resolution (i.e., MGRsD, HGRsD, and COSORE) is that the sample size of RH and RA could be greatly increased (e.g., sample size could be increased 10-fold if using a monthly timescale). In addition, the spatial coverage of RH and RA data could also be improved. Based on the monthly RH and RA data and how they relate to environmental conditions (such as temperature and precipitation), monthly global RH and RA products could be generated, which would provide useful data products for benchmarking earth system models (ESMs). The disadvantage of the finer-timescale databases (MGRsD, HGRsD, and COSORE) is that those databases usually have much less spatial coverage, and much more data are available from the growing season than from the non-growing season. Therefore, spatial upscaling including time may result in additional bias and associated uncertainty that must be carefully investigated.
Figure 3. Comparison of mean annual temperature (MAT; °C) around the globe (in red) vs. MAT from the sites in the global soil respiration database (SRDB; in teal) by the vegetation types. SRDB-V4 represents the older SRDB released in 2018, and SRDB-V5 represents the newest SRDB published in 2020. Data from SRDB cover 10 vegetation types (agriculture, arctic, desert, tropical forest (tropic FOR), temperate and boreal forest (T&B FOR), grassland, savanna, shrubland, urban, and wetland). Comparing the fourth version (SRDB-V4) to the newest version (SRDB-V5), MAT values of agriculture, forest, and grassland sites generally represent the global MAT well; in contrast, MAT from shrubland sites in the database did not represent global means well in the older SRDB-V4, but their representation significantly improved in the newest SRDB-V5; for other vegetation types (arctic, desert, savanna, urban, and wetland (including peatland) in the right panel), the MAT of the database sites does not represent the global MAT distribution well. Note that the barren region was masked using MODIS land cover data. The number within each panel represents the number of records for each vegetation type.

Perspective
The updated SRDB-V5 will further support the analysis of how different manipulations affect RS. In the past decades, many field experiments have been conducted to study different questions, for example, how soil carbon responds to global climatic warming and changes in precipitation patterns (Vicca et al., 2014) or how human activities (forest management, agriculture cultivation, and pollution) affect terrestrial carbon cycling and soil carbon stock (Carrillo et al., 2014; Jasek et al., 2014). However, inconsistent results from different experiments have generated debate regarding the effects of environmental factors and manipulations on RS. SRDB-V5 now includes RS measurements from both control and different kinds of treatments, providing opportunities for synthesis analysis of how manipulation affects RS. These treatment data were rarely used in the past decade, however, as the manipulation information in older versions of SRDB was not standardized and thus could not easily be used. The updated and standardized SRDB-V5 manipulation codes have the potential to enable manipulation-driven studies on the macro to global scale.

Future improvements
We made an effort to resolve some issues in the old versions of SRDB (V1-V4), but the database needs to be continuously improved in the future. There is much more potentially useful information that could be included in future SRDB updates, although it is important to remember that every additional piece of information comes with a never-ending cost (in terms of data entry time, quality assurance and quality control, etc.).
1. Number_of_collar: the number of collars within a certain study area is important information to evaluate the representativeness of the RS measurements.
2. Soil organic carbon (SOC): SOC measured in situ or obtained from regional or global datasets should be compiled into the database (Guevara et al., 2020;Hengl et al., 2017).
3. Currently, Site_ID in SRDB-V5 is only compatible with Site_ID of MGRsD and HGRsD; further updates to Site_ID are necessary so it can connect with more external datasets (e.g., FLUXNET, COSORE, AmeriFlux, and a global database of forest carbon stocks and fluxes (ForC); Anderson-Teixeira et al., 2018).
4. Annual_soil_moisture: including a mean value of soil moisture or intra-annual soil variability derived from remote sensing (Guevara and Vargas, 2019) when this variable was not measured at the site.
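As an illustration of the Site_ID linkage discussed in item 3, the following minimal Python sketch splits an identifier of the "CC-RC-IC" form described in the Metadata section and joins SRDB-style records to an external table on Site_ID. The example identifiers, the flux values, the external field "n_monthly_records", and the assumption that region codes contain no hyphen are all hypothetical.

```python
def parse_site_id(site_id):
    """Split a "CC-RC-IC" Site_ID into its parts.

    The country and region codes are required; the identity code is
    optional. This sketch assumes region codes contain no hyphen.
    """
    parts = site_id.split("-")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"malformed Site_ID: {site_id!r}")
    identity = "-".join(parts[2:]) or None
    return {"country": parts[0], "region": parts[1], "identity": identity}


# Hypothetical SRDB-style records and a hypothetical external table keyed
# by the same Site_ID (as MGRsD or HGRsD entries would be).
srdb = [
    {"Site_ID": "US-WA-01", "Rs_annual": 800.0},
    {"Site_ID": "CN-BJ", "Rs_annual": 650.0},
]
external = {"US-WA-01": {"n_monthly_records": 12}}

# Inner join: keep only records whose Site_ID also appears externally.
linked = [
    {**rec, **external[rec["Site_ID"]]}
    for rec in srdb
    if rec["Site_ID"] in external
]
print(parse_site_id("US-WA-01"), linked)
```

Joining on a single shared key keeps the link independent of coordinates, whose reported precision varies between studies.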
In addition, some meta information can be improved. For example, there are still 276 manipulation types in the SRDB-V5, and many of them (n = 96 out of 276) have only one row of records. Efforts could be made in future database updates to further simplify the manipulation categories of SRDB. We recognize that, with thousands of publications included in the SRDB, some entries are incorrect and some information may have been missed during literature collection. In the past years, users have pointed out many data input errors and missing data issues in the SRDB; we made a great effort to check these, and many corrections have been made. However, it is inevitable that mistakes and missing information still exist; therefore, there is a pressing need to continue with the development of quality assurance and quality control for each update.

Reducing interoperability barriers
High interoperability is needed to maximize the benefits of SRDB-V5 to improve our understanding of the global carbon cycle. Interoperability has been defined as an organized collective effort with the ultimate goal to maximize sharing and using information to produce knowledge, and high interoperability is achieved by reducing conceptual, technological, organizational, and cultural barriers. The improved SRDB-V5 has reduced conceptual barriers as it provides a standardized and replicable framework to organize global RS information that has been used for over a decade (Bond-Lamberty and Thomson, 2010a). It has reduced technological barriers by improving the standardization of data fields (see Tables 1-3), by using data formats compatible with other databases, and by providing flexible R scripts (for details please see Sect. 2.4) in a GitHub repository for end users and potential data contributors. We recognize that measuring RS has other technological barriers (e.g., standardization of instrumentation, electrical power supply) that limit the collection of new measurements in harsh environments or wide implementation in developing countries. Organizational barriers remain a challenge as this is a bottom-up effort in need of long-term support to continue improving the quality and the development of new versions of the SRDB. Finally, we believe that cultural barriers have been reduced as the global scientific community has increasingly recognized the importance of standardized databases and data sharing following FAIR principles.

Figure 5. Comparison of annual soil respiration (RS) and seasonal RS (growing, dry, and wet seasons; spring, summer, autumn, and winter) observations from SRDB-V4 vs. those from SRDB-V5. In summary, adding new measurements does not change the distribution of annual RS or seasonal RS in the databases.

Code availability
All data and code to reproduce the results in this study can be found at https://doi.org/10.5281/zenodo.3876443 (Jian and Bond-Lamberty, 2020).

Data availability
Findability and accessibility were well considered and described when SRDB-V1 was published (Bond-Lamberty and Thomson, 2010a). To summarize the update history, SRDB-V1 was the first fully available dataset, released on 28 May 2010; SRDB-V2 was released on 13 March 2012, and RS data of publications from 2011 were integrated into the database; SRDB-V3 was released on 4 August 2014, and RS data of the literature from 2012 were collected and added; SRDB-V4 was released on 21 November 2018, and RS data of the literature through 2015 were collected and compiled into the database; SRDB-V5 was released on 24 April 2020, and RS data of the literature through 2017 were collected and added. The version release information was recorded at the Oak Ridge National Laboratory's Distributed Active Archive Center (ORNL DAAC). All data and code to reproduce the results in this study can be found at https://doi.org/10.5281/zenodo.3876443 (Jian and Bond-Lamberty, 2020).

Using and citing SRDB-V5
SRDB-V5 can be used for individual, academic, research, commercial, and other purposes and can be repackaged without written permission. Research and non-research products using SRDB-V5 should cite this publication.

Conclusions
A global soil respiration database (SRDB) was developed a decade ago to integrate soil respiration measurements from around the globe. Since the first release in 2010 (SRDB-V1), it has been widely used to advance our understanding of questions related to carbon dynamics. Here, we restructured SRDB to a new version (SRDB-V5) following FAIR principles. We show that SRDB-V5 has substantially improved representativeness and spatial coverage compared with the older versions (SRDB-V1 to SRDB-V4; Figs. S1 and S2 in the Supplement). A primary goal of SRDB-V5 is to improve interoperability and reusability and to make it possible for scientists to contribute in the future, with the ultimate goal of improving our understanding of the global carbon cycle. With those goals in mind, the revised SRDB-V5 is now more user-friendly for the ecology, biogeochemistry, and modeling communities.
Author contributions. BBL and JJ designed the new version of the global soil respiration database (SRDB-V5). BBL searched for and downloaded the new papers published through 2017 and compiled the meta information. BBL, MH, RM, JM, DP, and JJ contributed to data collection; NK collected data in Russian; KAT and VH offered many useful suggestions while working to integrate with ForC; RV and ES provided feedback and insights in all phases. JJ wrote the paper in close collaboration with all authors.
Competing interests. The authors declare that they have no conflict of interest.