Spatial- and temporal-patterns of global soil heterotrophic respiration in terrestrial ecosystems

. Soil heterotrophic respiration (RH) is one of the largest and most uncertain components of the terrestrial carbon cycle, directly reflecting carbon loss from soils to the atmosphere. However, high variations and uncertainties of RH existing 20 in global carbon cycling models require RH estimates from different angles, e.g., a data-driven angle.an urgent development of data-derived RH dataset. To fill this knowledge gap, this study applied a Random Forest (RF) algorithm – a machine learning approach, to (1) develop a globally gridded RH dataset and (2) investigate its spatial- and temporal-patterns from 1980 to 2016 at the global scale by linking field observations from the Global Soil Respiration Database and global environmental drivers – temperature, precipitation, soil water content, etc. Finally, a globally gridded RH dataset was and was primarily distributed at high latitude areas. While the areas dominated by precipitation and soil water content were mainly semi-arid and tropical areas, accounting for 36% and 25% of the global land, respectively, suggesting variations in the dominance of environmental controls on the spatial patterns of RH. The developed globally gridded RH dataset will further aid in understanding of the mechanisms of global soil carbon dynamics, serving as a benchmark to constrain terrestrial biogeochemical global vegetation models. The dataset is publicly available at https://doi.org/10.6084/m9.figshare.8882567 (Tang et al., 2019a). gridded data-derived dataset

of data-derived RH dataset. To fill this knowledge gap, this study applied a Random Forest (RF) algorithma machine learning approach, to (1) develop a globally gridded RH dataset and (2) investigate its spatial-and temporal-patterns from 1980 to 2016 at the global scale by linking field observations from the Global Soil Respiration Database and global environmental driverstemperature, precipitation, soil water content, etc. Finally, a globally gridded RH dataset was developed covering from 1980 25 to 2016 with a spatial resolution of half degree and a temporal resolution of one year. Globally, the average annual RH was 57.2±0.6 Pg C a -1 from 1980 to 2016, with a significantly increasing trend of 0.036±0.007 Pg C a -2 . However, the temporal trend of the carbon loss from RH varied with climate zones that RH showed a significant and increasing trend in boreal and temperate areas, in contrast, such trend was absent in tropical regions. Temperature -driven RH dominated 39% of global land 2 and was primarily distributed at high latitude areas. While the areas dominated by precipitation and soil water content were 30 mainly semi-arid and tropical areas, accounting for 36% and 25% of the global land, respectively, suggesting variations in the dominance of environmental controls on the spatial patterns of RH. The developed globally gridded RH dataset will further aid in understanding of the mechanisms of global soil carbon dynamics, serving as a benchmark to constrain terrestrial biogeochemical global vegetation models. The dataset is publicly available at https://doi.org/10.6084/m9.figshare.8882567 (Tang et al., 2019a). 35

Introduction
Global soils and surface litter store up to 2-or 3-fold the amount of carbon present in the atmosphere (Trumbore, 2009) and therefore, a small change in soil carbon content could have profound effects on atmospheric CO 2 and climate change (Köchy et al., 2015). Although global carbon flux from soil-to-atmosphere is increasing , the degree to which future 40 climate change will stimulate soil carbon loss via heterotrophic respiration (RH) remains highly uncertain (Bond-Lamberty et al., 2018;Friedlingstein et al., 2014;Trumbore and Czimczik, 2008), particularly for areas with a high temperature sensitivity, or rapid changes in precipitation frequency and intensity.
Soil RH represents the carbon loss from the decomposition of litter detritus and soil organic matter by microorganisms (Subke et al., 2006), accounting for one of the largest components of the terrestrial carbon cycle (Bond-Lamberty et al., 2016). 45 However, RH's feedback to climate variability is poorly understood. RH could affect future climate change via the mineralization of long-stored soil carbon, offsetting net primary production (NPP) and even converting terrestrial ecosystems from a carbon sink to a carbon source (Tremblay et al., 2018). Conversion of the sink/source role depends on how strongly large-scale process affected by environmental drivers, e.g. temperature, precipitation and soil organic carbon content (Hursh et al., 2017;Sierra et al., 2015), or extreme conditions, such as fire, human disturbance and drought (Kurz et al., 2013;50 Metsaranta et al., 2011). Although it is widely recognized that warming enhances CO 2 release from soils, the magnitude of such release is uncertain due to variations in the temperature sensitivity of soil organic matter decomposition (Suseela et al., 2012). In addition, environmental drivers of RH, e.g. temperature and soil moisture, are still undergoing changes under climate warming and can affect RH individually or interactively. Therefore, reducing RH uncertainty and clarifying the response of RH to environmental factors are essential for future projections of the impact of climate change on the terrestrial carbon balance. 55 Due to the diurnal, seasonal and annual variability in RH, in addition to the difficultiesty of large-scale measurements, regional and global RH estimations mainly depend on modelling approaches using regional or global variables, such as temperature, precipitation and carbon supply (Bond-Lamberty and Thomson, 2010b;Hashimoto et al., 2015;Hursh et al., 2017). Besides temperature and precipitation, soil variables, such as water, carbon and nitrogen contents, are also important factors in the regulation of RH and should be considered for accurate RH estimations (Hursh et al., 2017;Zhao et al., 2017), 60 although these variables vary with biome and climate.
Observational studies have examined the responses of soil respiration to different climatic variables at different locations across the globe (Bond-Lamberty and Thomson, 2010a;Zhou et al., 2016). Hashimoto et al. (2015) and Bond-Lamberty and Thomson (2010b) predicted global soil respiration rates using climate-derived models driving by temperature and precipitation, however, these models commonly explain less than 50% variations of soil respiration, requiring new techniques, potential new 65 numerical/algorithmic methods to better quantifyication and understanding of the large-scale soil carbon fluxes (Bond-Lamberty, 2018). To improve the modelling accuracy, more recent studies have used linear regression or machine learning approaches including more abiotic or biotic variables, such as soil carbon supply, soil properties and NPP (Hursh et al., 2017;Zhao et al., 2017) and observations collected from newly published measurements (Jian et al., 2018;Zhao et al., 2017). On the other hand, including more variables in linear or non-linear regression models may cause overfitting and autocorrelation issues 70 (Long and Freese, 2006). To overcome overfitting and autocorrelation, machine learning approaches, such as the Random Forest (RF, Breiman, 2001), have been applied to explore the hierarchical importance of environmental factors, such as temperature, soil water content (SWC), NPP and soil pH (Hursh et al., 2017). Machine learning techniques are highly effective as because they are fully data adaptive, and do not require initial assumptions on functional relationships and can function with nonlinear dependencies (Bodesheim et al., 2018). Therefore, these approaches are beginning used using in earth science, 75 particularly in carbon and water flux modelling (Jung et al., 2010;Jung et al., 2017;Yao et al., 2018b), and may provide more reliable estimates of soil respiration (Bond-Lamberty, 2018;Zhao et al., 2017). However, no study to date has assessed the global variability of RH using empirically field observations to bridge the knowledge gap between local, regional and global scales.
The newly-emerged Dynamic Global Vegetation Models from the TRENDY model ensembles and Earth System Models 80 have been widely used to investigate major physiological and ecological processes and ecosystem structures, providing a novel database and approach to examine and estimate RH at the global scale , although RH improvements in Earth System Models are still required (Shao et al., 2013). TRENDY and Earth System Model simulations incorporating a RH component are commonly calibrated and validated by eddy covariance measurements, e.g. net ecosystem carbon exchange (Yang et al., 2013), however, modelled RHs from these models have not yet been calibrated and validated using field RH 85 observations. Therefore, these modelled RHs may be fundamentally different from observed values and no global observations exist to evaluate model effectiveness. Consequently, the a data-driven RH database dataset could improve our understanding of the underlying mechanisms of RH variability to future climate change at the global scale, and could serve as a benchmark to constrain terrestrial biogeochemicalglobal vegetation models.
Thus, we used a RF algorithm to estimate global RH based on updated RH observations from the Global Soil Respiration 90 Dataset (SRDB, Bond-Lamberty and Thomson, 2010a) with the objectives of: (1) developing a globally gridded RH product (named data-derived RH); (2) detecting the temporal and spatial-and temporal-patterns of RH; (3) identifying the dominant driving factors for spatial-and temporal-variabilities of RHs; and (4) comparing the data-derived RH dataset with data RH generated by Dynamic Global Vegetation Models from the TRENDY ensembles. 4 95 2 Materials and methods

Soil heterotrophic respiration database development
The basis of the database developed here included observed global RH values from SRDB (Bond-Lamberty and Thomson, 2010a), which was freely obtained at https://github.com/bpbond/srdb. The database was further updated using observations collected from Chinese peer-review publications at the China Knowledge Resource Integrated Database (www.cnki.net) until 100 March 2018. This study included the RH data for: (1) annual RH as directly reported in publications with at least one year continuous measurements; (2) the start-and end-year were extracted from SRDB, directly from publications or calculated by the "years of data" in the SRDB; (3) observations measured by alkali absorption or soda lime approaches were not included because of their potential underestimation of respiration flux with an increasing pressure in the measurement chamber (Pumpanen et al., 2004); (4) experiments with treatments, such as nitrogen manipulation, or fertilization, were excluded, and 105 only RH measurements from the control treatment were included (Jian et al., 2018); (5) SRDB observations labelled as "potential problem" ("Q10"), "suspected problem" ("Q11"), "known problem" ("Q12"), "duplicate" ("Q13") and "inconsistency" ("Q14") were not included (Bond-Lamberty and Thomson, 2010a). Finally, the newly updated databaseset included 504 RH observations in total. Although most of the observations were from China, America and Europe, this database covered all the major terrestrial biomes across the world (Fig. 1). 110

Climate and soil data
To investigate the global spatial-temporal RH patterns, global spatial-temporal grids of RH driving factors were required.
A total of 9 nine global variables were included (Supplementary Table S1): Mmonthly gridded data of temperature, precipitation, diurnal temperature range from Climatic Research Unit TS v.4.01 over 1901(https://crudata.uea.ac.uk, Harris et al., 2014; shortwave radiation (SWR, https://www.esrl.noaa.gov, Kalnay et al., 1996); gridded soil organic carbon 115 content (Hengl et al., 2017) and nitrogen content from https://webmap.ornl.gov/ogc/index.jsp (Global Soil Data, 2000); monthly gridded nitrogen deposition dataset from the global Earth System Models of GISS-E2-R, CCSM-CAM3.5 and GFDL-AM3 from 1850 to 2000s (https://www.isimip.org, Lamarque et al., 2013); monthly Palmer Drought Severity Index (PDSI, https://www.esrl.noaa.gov/psd/, Dai et al., 2004) and (SWC, https://www.esrl.noaa.gov, van den Dool et al., 2003). Before further data analysis, monthly data were aggregated to an annual scale. These variables could stand for different environmental 120 controls on RH. For example, temperature, precipitation and SWC are critical environmental controls on microbial activities for soil organic carbon matter decomposition (Jian et al., 2018;Suseela et al., 2012;Tremblay et al., 2018). Soil organic matter, soil carbon stock and soil nitrogen are important carbon and nitrogen substrates for microbes that are related to the decomposition of soil organic matter (Tremblay et al., 2018). The drought index (PDSI) and diurnal temperature range to represent water and temperature stress on RH (Berryman et al., 2015;Zhu and Cheng, 2011). Then tThe global environmental 125 drivers for each given site were extracted by site longitudes and latitudes corresponding to annual RH observations. If the environmental driver is not in a spatial resolution of 0.5 o , we first re-sampled this environmental driver to a 0.5 o resolution using a the bilinear interpolation.

RH from TRENDY models
In the last several decades, TRENDY models were developed to simulate key processes (e.g. photosynthesis, respiration, 130 evapotranspiration, phenology and carbon allocation) that drive the dynamics of global terrestrial ecosystems . TRENDY models follow a common protocol and use the same climate-forcing data from National Centres for Environmental Prediction at a spatial resolution of 0.5°. For modelled products with different spatial resolutions, new errors will be produced when re-sampling to 0.5°. Therefore, to compare the dynamics in the data-derived RH dataset and TRENDY RH dataset, we used model outputs from seven TRENDY models: (Community Land Model  Zeng et al., 2005); and Vegetation Integrative Simulator for Trace gases (VISIT, Kato et al., 2013). Additionally, the RH dataset generated empirically by Hashimoto et al. (2015) was compared (termed as Hashimoto RH), which was publicly available (http://cse.ffpri.affrc.go.jp/shojih/data/index.html) and estimated from a global relationship between RH and soil 140 respiration (Bond-Lamberty et al., 2004), and the total soil respiration was predicted from a climate-driven model using precipitation and temperature estimated using empirical total soil respiration relationships using a climate-driven model (termed as Hashimoto RH) based on the observations from SRDB. More details can be found in Hashimoto et al. (2015).

RF-based RH Modelling
RF is a machine learning approach that uses a large number of ensemble regression trees, but a random selection of predictive 145 variables (Breiman, 2001). Two free parameter settings are required, which are the number of trees and candidate variables for each split. However, the RF model is not usually sensitive to the number of trees or variables. A RF regression can deal with a large number of features, assisting a feature selection based on the importance value of each variable and the avoidance of overfitting (Bodesheim et al., 2018;Jian et al., 2018). In the present study, a RF model was trained using 9 variables (supplementary Table 1) in the "caret" package (version 6.0-80, accessed on May 27, 2018) in R (R Core Team, 2018), which 150 was then implemented to predict RH for each grid at a spatial resolution of 0.5°. To characterize the performance of RF, a 10fold cross-validation was applied, which meants that the dataset was stratified into 10 parts and each part contained roughly equal number of samples. The target values for each of these 10 parts were predicted based on the training models using the remaining nine parts. Two model evaluation statistics were used, including modelling efficiency (R 2 ) and root mean square error (RMSE, Tang et al., 2019b;Yao et al., 2018b). 155

Trend analysis
A Ttrend analysis of RH was estimated by the Theil-Sen linear regression and tested with the Mann-Kendall non-parametric test. The Theil-Sen estimator is a non-parametric slope estimator based on median values, and this approach was widely used 6 for time-series analysis, e.g. carbon fluxes (Dai et al., 2016), and vegetation greening and browning (Pan et al., 2018) . The Mann-Kendall non-parametric test was employed to investigate the significant changes in RH trend at a. The significance level 160 was atof 0.05.

Relationships between RH and temperature, precipitation and SWC
Although previous studies have used precipitation as a proxy for SWC (Bond-Lamberty and Thomson, 2010b;Chen et al., 2010), this may result in variability in soil respiration estimates (Jassal et al., 2007;Zhang et al., 2006), because the relationship between SWC and soil respiration is was much more complex than that between soil respiration and temperature or 165 precipitation (Jian et al., 2018). Therefore, the mean annual temperature (MAT), precipitation (MAP) and SWC were all considered as potentially important proxies driving RH (Bond-Lamberty et al., 2016;Reichstein and Beer, 2008). Annual mean RH was regressed against the three proxies. The relationships between the data-derived RH and temperatureMAT/precipitationMAP/SWC were assessed locally for each grid cell by calculating the correlations using partial correlation analysis. When analysing the partial correlations between RH and the proxy, the other two proxies were controlled 170 to remove their confounding effects on RH. The correlation strengths of temperatureMAT, precipitation MAP and SWC were used to derive RGB combinations and indicate the drivers of RH.

The comparison map profile method
To detect the spatial similarity and difference patterns between the data-derived RH and TRENDY and Hashimoto or reported RH values from 1981 to 2010, we utilized the comparison map profile (CMP) method (Gaucherel et al., 2008). This 175 method was based on the absolute distances (D) and cross-correlation coefficients (CC) across multiple scales, with D and CC reflecting the similarity and the spatial structures of two compared images with the same sizes, respectively (Gaucherel et al., 2008). The D value between moving windows (from 3 × 3 to 41 × 41 pixels in present study) of two compared images wasere calculated by Equation (1): (1) 180 Where, ̅ and ̅ represent mean values calculated over two moving windows. Finally, the mean D value was calculated as an average of different moving windows.
The CC was calculated according to Equation (2): Where, x ij and y ij are pixel values at i th row and j th column of the moving windows of two compared images, respectively; N represents the total number of pixels covered by each of moving windows; σ x and σ y stand for standard deviations of two 7 moving windows. Low D values reflect a goodness between the compared images, while low CC values suggest a low similarity. Finally, the mean D and CC were calculated as the average from different moving windows.
All data analysis mentioned above were conducted in R (version 3.5.0, access on April 2018). 190

Spatial patterns of RH
Based on the 10-fold cross-validation, model efficiency (R 2 ) and RMSE, were 50% and 143 g C m -2 a -1 (Fig. S1), respectively. This indicates that the RF algorithm effectively captured the spatial-and temporal-variability of RH, therefore enabling deriving of a global gridded RH dataset. 195 The Ddata-derived RH dataset showed a strong spatial pattern globally (Fig. 2a). The largest RH fluxes occurred in tropical areas (e.g. Amazon tropical forests) at > 700 g C m -2 a -1 , followed by the subtropics, such as South China and America, and humid temperate areas, e.g. North America, Western and Central Europe, with an annual RH of 400-600 g C m -2 a -1 . Relatively low annual RH less than 200 g C m -2 a -1 was generally observed in areas with cold and dry climates, such as boreal areas, characterized by low temperatures and short growing seasons, dry or semi-arid regions (e.g. Northwest China), where water 200 availability limits ecosystem development. However, the most variable changes in RH over the time from 1980 to 2016 -using standard deviation and coefficient of variation (CV, the ratio of the standard deviation and the mean) as proxies a proxy ( Fig.   2b an S3), were found in boreal regions with RH more higher than 70 g C m -2 a -1 or a CV > 0.7, . wWhile the majority areas of RH variabilitiesy exhibited smallerwere lower than 30 g C m -2 a -1 or a CV < 0.3. Similarly, TRENDY and previously reported RH values showed similar patterns with the highest RH in warm and humid areas and the lowest RH in cold and dry 205 regions (Fig. S2). However, differences existed in the absolute RH fluxes (Fig. S2). For example, CLM4 and VISIT models predicted RH to be more higher than 1400 g C m -2 a -1 within Amazon forest regions, while ISAM and LPJ-GUESS estimates were typically low at around 1000 g C m -2 a -1 . However, the data-derived RH dataset and Hashimoto RH showed the highest RH fluxes in tropical regions of about 800 g C m -2 a -1 .
To examine the similarity in the patterns of between the data-derived RH dataset and those established by TRENDY models 210 and /Hashimoto RH, the CMP method was employed (Fig. 3). Larger D and lower CC values indicate less consistent magnitudes and a local gradient distribution between the two compared images. The Ddata-derived RH dataset and Hashimoto RH differed greatly in East Canada and the Middle East with D values above higher 200 g C m -2 a -1 and CC values lower than -0.5. Interestingly, the most noticeable differences between the data-derived RH values and TRENDY model mean TRENDY RH values, occurred in East Asia and the Middle East, where D was higher than 500 g C m -2 a -1 , while CCs was were around 215 -0.1. When assessing each TRENDY model individually (Figs. S4 and S5), the differences between the data-derived RH dataset and TRENDY RH were even larger. The most remarkable differences was were found for CLM4 and VISIT models in regions where D was above 800 g C m -2 a -1 with CC values of about -0.3 (East Asia and America).
Across the latitudinal gradients, zonal mean RH values increased from cold or dry areas (e.g. tundra, and desert or semi-arid areas) to warm or humid areas (e.g. temperate and tropical areas, Fig. S6). The Ddata-derived RH dataset varied from 60±12 220 at about 75 o N to 640±71 g C m -2 a -1 at the equator, reflecting a higher resource limitation in high latitude areas and a less resource limitation in low latitude areasless stress from environmental limitations. In the dry tropical areas (10 o S -25 o S and 10 o N -25 o N) limited by water, the zonal mean RH decreased slightly. With the increase of water availability, RH showed a second peak in the Northern and Southern Hemispheres around 20 o N and 40 o S, respectively. NonethelessHowever, there was a high level of variability between the data-derived RH and those predicted by TRENDY/Hashimoto RH in the equatorial 225 regions (Fig. S6), with predictions generally overestimating RH at the equator. Peak RH values in the equatorial region ranged from 660±65g C m -2 a -1 for ORCHIDEE modelin previously published estimations, to above 1200±460 g C m -2 a -1 for CLM4 model, resulting in a considerably higher peak RH value for the model mean (950±300 g C m -2 a -1 ).

Total RH
Over the last 37 years, the global RH has increased from 55.8 Pg C a -1 (1 Pg = 1×10 15 g) in 1992 to 58.3 Pg C a -1 in 2010, 230 with an average of 57.2±0.6 Pg C a -1 and strong annual variabilitiesy (Fig. 4). Compared to the data-derived RH dataset, TRENDY/Hashimoto RH was underestimated global RH values (Fig. 56a), with the exception of the VISIT model. ISAM predicted the lowest global RH of 34.8±0.4 Pg C a -1 , while the VISIT model produced the highest RH of 59.9±0.6 Pg C a -1 (Fig. 5a). The model mean RH was 47.6±0.5 Pg C a -1 , underestimating RH by 9.6 Pg C a -1 (16%) in comparison to the dataderived RH dataset. Due to this large divergence, the strength of correlation between the data-derived RH and 235 TRENDY/Hashimoto RH varied greatly from 0.06 to 0.72 (Fig. 5b). Boreal, temperate and tropical regions were the three most important contributors for the global RH according to the Köppen -Geiger climate classification system (Peel et al., 2007), contributing 76% of the total global RH. The mean RH of boreal, temperate and tropical areas were 10.8±0.3, 12.9±0.1 and 19.5±0.2 Pg C a -1 , accounting for 19%, 22% and 35% of the total global RH, respectively (Fig. S8).

Trends in RH 240
Globally, although there was a great inter-annual variability in RH, the total RH has significantly increased at the a rate of 0.036±0.007 Pg C a -2 from 1980 to 2016 (p = 0.000, Fig. 4). Comparison of the data-derived RH dataset and that TRENDY RH generated by TRENDY models during the period of 1981 to 2010 was performed. The Ddata-derived RH increased at 0.041±0.01 Pg C a -2 (Fig. S7), which was lower than that of TRENDY RH (0.057±0.009 Pg C a -2 ) and Hashimoto RH (0.057±0.009 Pg C a -2 )exhibiting slower increasing trends of 0.05±0.007 and 0.057±0.009 Pg C a -2 of TRENDY/Hashimoto 245 RH, respectively. Additionally, temporal trends varied greatly among TRENDY models (Fig. S7), with the largest increasing trend of 0.123±0.013 Pg C a -2 for LPJ-GUESS and the largest decreasing trend of -0.018±0.007 Pg C a -2 for ISAM.
Temporal trends varied according toamong climate zones. RH in boreal and temperate areas increaseding by 0.020±0.004 and 0.007±0.002 Pg C a -2 from 1980 to 2016 (both ps = < 0.000001, Fig. S8), respectively, while RH in tropical areas did not show a significant temporal trend, although inter-annual variabilities were observed (p = 0.362, Fig. S8). TRENDY/Hashimoto 250 9 RHs showed significant increasing temporal trends in boreal, temperate and tropical regions, except those estimated by ISAM and ORCHIDEE models (Fig. S9-11). However, the increasing magnitude varied among different TRENDY models.
From 1980 to 2016, the global RH was expected to be driven by multiple environmental factors, such as temperature and precipitation. During this period, MAT and MAP levels increased significantly by 0.34±0.032 o C and 6.69±2.399 mm per decade, respectively (p < 0.01, Fig's. S12a and b). Therefore, the correlations between RH and temperatureMAT/precipitation 255 MAP were evaluated. Globally, RH was significantly correlated with temperature MAT (R 2 = 0.56, p = < 0.0010) and precipitation MAP (R 2 = 0.42, p = < 0.000001, Fig. 6) anomalies. On average, the global RH increased by 1.08±0.163 Pg C a -1 per 1 o C increase in MAT and 0.23±0.046 Pg C a -1 per 10 mm increment in MAP.

Spatial pattern of RH trends
Spatially, the data-derived RH trends presented heterogeneous geographical patterns (Fig. 2c). Positively increasing trends in 260 of RH were found for more than half of the global land areas (59%, calculated from cell areas; Fig. S13). Generally, the increasing rates of RH were lower than 3 g C m -2 a -2 , in contrast, the highest RH increase was above 6 g C m -2 a -2 in boreal regions, such as Russia, North Canada and the Tibetan Plateau. RH exhibited a decreasing trend in 41% of the global land area and most considerably in South Asia (Fig. 2c). Similar to the data-derived RH trends, RH trends estimated by the TRENDY/Hashimoto RH trend also showed heterogeneous geographical patterns (Fig. S14). However, large discrepancies 265 were found among TRENDY/Hashimoto RH (Fig. 2c and S14). Generally, the largest increase in RH trends occurred in boreal areas, except for outputs by LPJ-GUESS and LPJ models, which showed a decreasing trend for most boreal areas. There was a decreasing trend across most of tropics (e.g. Southeast Asia), with the exception of VEGAS model results (Fig. S14).

Dominant factors in RH annual variability
Annual mean temperature and precipitationMAT and MAP were the most important factors dominating RH in 39% and 36% 270 of global land areas, respectively (Fig. S15). While SWC dominated the remaining 25% of global land areas. Spatially, the dominant drivers controlling RH varied greatly across the globe (Fig. 2d), with the area dominated by temperature mainly distributed in boreal areas above 50 o N. This was also observed in the relatively high and positive partial correlation coefficient between temperature and RH (Fig. S15a). While precipitation dominated temperate areas between 25 o N and 50 o N (such as North China, the Middle East and America), where a wide distribution of desert or semi-arid regions occur, SWC dominated 275 in tropic areas, such the Amazon, India and Africa. Similarly, water availability (SWC and precipitation) were also main driving factors for RH in Australia.
Spatial patterns in environmental controls on TRENDY/Hashimoto RH varied greatly compared to the data-derived RH dataset or among TRENDY models (Figs. 2d and S15-17). Water availability (including precipitation and SWC) appeared to be more important than temperature. The percentage of the areas dominated by temperature (mainly distributed in boreal areas, 280 except for in ISAM model outputs), wasere less than the areas dominated by precipitation and SWC (globally distributed) (Fig.  S17). In terms of the modelled mean TRENDY RH, precipitation dominated most of global land areas (43%), followed by SWC (36%) and temperature (21%) (Fig. S15).

Comparison with Hashimoto RH
Despite the increasing efforts to quantify the global carbon cycle, large uncertainties still remain in the spatial-and temporal -patterns in RH. To the best of our knowledge, this is the first study to apply the RF approach to predict of the temporalspatialand spatial temporal-patterns of global RH using field observations. Globally, the mean RH amounted to 57.2±0.6 Pg C a -1 from 1980 to 2016, 13.6 Pg C a -1 higher than RH from a satellite-driven estimates (Konings et al., 2019) , and 6.4 Pg C a -1 290 higher than Hashimoto RH (Hashimoto et al., 2015). The differences between the data-driven RH in this study and Hashimoto may be due to several reasons, Firstly, the two RH products covered different land areas, with the data-derived RH dataset in the present study covering a higher land area. If the data-derived RH dataset was masked by Hashimoto RH over 1981-2010 the total RH was 51.8±0.6 Pg C a -1 , close to that of Hashimoto RH with 51.1±0.7 Pg C a -1 (Fig. S18), respectively. However, the temporal spatial-and spatial temporal-patterns varied greatly (Figs. 3 and 5). 295 Secondly, the two RH products used different variables and algorithms for RH predictions. RH was not only affected by temperature and precipitation, but also by carbon substrates, soil nutrient levels and other variables (Hursh et al., 2017). Besides temperature and precipitation, we also included SWC, soil nitrogen and carbon contents as indicators for environmental and nutrient constraints on RH. Conversely, Hashimoto RH was estimated from a climate-driven model including only temperature and precipitation as the driving variables (Hashimoto et al., 2015). This simple model can partly explain the reasons that 300 Hashimoto RH could not capture the significant decrease in RH in 1982 and 1991 due to El Chichón and Pinatubo eruptions, respectively (Zhu et al., 2016), while the data-derived RH dataset and TRENDY RH successfully captured such effects.
Thirdly, the linear model between total soil respiration and RH was developed based on forest ecosystems (Bond-Lamberty et al., 2004;Hashimoto et al., 2015), which could be another uncertainty when applying this linear model to other ecosystems, e.g. croplands and grasslands. 305

Comparison with TRENDY models
As data-derived RH dataset often serve as a benchmark for terrestrial biogeochemicalglobal vegetation models, the dataderived RH datasetempirically derived global patterns of annual RH were was compared with TRENDY models from 1980 to 2010 results. Although the data-derived RH dataset lied within the model range (34.8±0.4 Pg C a -1 for ISAM to 59.9±0.6 Pg C a -1 for VISIT, Fig. 5a), the model mean TRENDY RH was underestimated RH by 16% compared to the data-derived RH 310 dataset. Due to the different temporal trends among TRENDY models outputs and their low spatial correlations to the of dataderived RH dataset and TRENDY RH (correlation efficiencies ranging from 0.06 to 0.72, Fig. 6b), TRENDY RH clearly have different sensitivities to climate variations. Additionally, the difference in RH magnitude and spatial pattern varied considerably, as shown by analysis of absolute distances and cross-correlations. This effect was mostly notable in tropical areas in VISIT and CLM4 models outputs (Figs. S4 and S5). This phenomenon may be associated to several reasons. Firstly, 315 plant functional types differed among TRENDY models. For example, the VEGAS model included four plant functional types (Zeng et al., 2005), while the LPJ model defined ten plant functional types (Sitch et al., 2003).
Secondly, for each set of equations, constant vegetation parameters (e.g. photosynthetic capacity) were applied across time and space for most TRENDY models, which may induce an RH bias. Model parameters using short-term observations do not account for the inter-annual variability of climatic and soil conditions, generating a simplistic representation of RH due to the 320 inability to capture the response of RH to new environmental controls in short-term observations. Thirdly, models that do not consider nitrogen constraint could overestimate the increasing trend of RH, because nitrogen limitation was globally observed (LeBauer and Treseder, 2008). This could explain why the CLM4 model with a nitrogen control constraint produced a much smaller increasing trend compared to other TRENDY models, with the exception of ISAM (Fig. S7). Therefore, including soil nitrogen as a driving variable in modelling RH in this dataset had the advantage to detect 325 the nitrogen constrain on RH.
Fourthly, the lacking of the representation of human activities and agricultural management (e.g. fertilization and irrigation) may underestimate RH, because fertilization and irrigation were important practices to increase RH Zhou et al., 2016). This could explain why five of seven TRENDY models could not explain the significant increasing change of RH in middle China (Fig. S14), which experienced an intensive use of fertilization for food security in recent decades. 330 Finally, uncertainties and differences in model structures could also lead to inconclusive RH estimations. Although the same climatic data, e.g. temperature and precipitation, were used for TRENDY models to reduce the uncertainty causing by various meteorological forcings, systematic errors may be caused by applying a particular forcing and the errors might be propagated to model outputs (Anav et al., 2015). Therefore, TRENDY models should be improved by incorporating more processes such as nutrient constrains and an assessment of the model response to environmental variability (Keenan et al., 2012;Wang et al., 335 2014;Yao et al., 2018b).

Linkage to global carbon balance
If assuming the global ratio of RH/total soil respiration ranged from 0.56 ( (Hashimoto et al., 2015)) to 0.63 ((Bond-Lamberty et al., 2018)), annual soil respiration varied from 90.8 to 102.1 Pg C a -1 , within the reported values of 83 to 108 Pg C a -1 based on recent studies (Bond-Lamberty and Thomson, 2010b;Hursh et al., 2017). This indirectly highlights the reliability of the 340 use of RF for global RH prediction. Moreover, these findings also have important indications to carbon balance estimations.
According to a recent NPP estimate from observations and IPCC report data (IPCC, 2013;Li et al., 2017), the global NPP ranged from 61.5 to 60 Pg C a -1 , respectively. The residual between RH and NPP (net ecosystem production) was 2.8-4.3 Pg C a -1 , which is similar to the global estimates of net ecosystem production from the International Geosphere-Biosphere Programme, which ranged from 1.9 to 4.1 Pg C a -1 from 1959 to 2016 (Le Qué ré et al., 2013;Le Qué ré et al., 2016;Le Qué ré 345 et al., 2014). With a 1 o C increase in global MAT, RH will increase by 1.08 Pg C a -1 globally, and it such increase is 0.23 Pg C a -1 for 10 mm increment in global MAP. These findings indicate that carbon fluxes from the decomposition of soil organic matter and litter (RH) maybe positively feedback to future global climate change -typically characterized by the increasing temperature and alterations the changes in precipitation (IPCC, 2013).

Dominant factors in RH 350
Dominant factors driving RH varied spatially. As temperature and energy were the most limited climatic factors in high latitude areas, temperature was a dominant factor for RH in high latitudinal regions above 50 o N (Fig. 2d), with low temperatures leading to low RH (Fig. 2a). Similarly, due to the limited amount of precipitation, RH in semi-arid areas was mainly controlled by precipitation, which iswas consistent with reported both field observations (Bai et al., 2008) and modelling studies (Gerten et al., 2008). SWC control of RH in tropical areas could be explained by the mechanisms of RH. 355 Excessively high SWC can reduce the diffusion of oxygen, while excessively low SWCs could limit water and soluble substrate availabilityies, preventing microbial activitiesy (Luo and Zhou, 2006;Xu et al., 2004). Suseela et al. (2012) proposed that RH fluxes declined sharply when volumetric soil moisture reduced below ~15% or exceeded ~26%, which supportsed the findings of the present study. However, it should be noted that dominant environmental controls on spatial carbon flux gradients might vary among different years (Reichstein et al., 2007), such as with climatic extremes. 360

Temporal variability of tropical, temperate and boreal areas
Temporally, RH in tropical areas did not exhibit a significantly temporal pattern between 1981 and 2010 (Fig. S8, p = 0.362), indicating that in our model, climate change did not affect RH fluxes in these areas, negatively feeding back to future climate change. However, RH in boreal and temperature areas experienced significant increasing trends of 0.020±0.004 and 0.007±0.002 Pg C a -21 , respectively (Fig. S7), suggesting that a positive feedback may occur with future climate change. 365 Tremblay et al. (2018) proposed that the increased RH was mainly related to the increasing temperatures in boreal forest soils, which supportsed the findings of the present study. It should be noted that both the data-derived RH dataset and Hashimoto/TRENDY RH in boreal areas showed a temporally increasing trend from 1981 to 2010, although the magnitude of increase differed (Fig. S9). Furthermore, despite the ISAM model showing a decreasing trend for temperate and tropic regions, the ISAM model had an increasing trend in RH from 1981 to 2010 in boreal areas (Fig. S9). These results indicate that boreal 370 regions are becoming increasingly important in global carbon cycling and that the increasing trend may continue due to the a large amount of carbon stored in soils. Therefore, climate change may fundamentally alter carbon cycling in boreal areas through changes in the decomposition rate of soil organic matter (Crowther et al., 2016;Hashimoto et al., 2015;Schuur et al., 2015). Furthermore, the response of RH to climate variability varied with climate zone, indicating that different carbon loss rates from RH will occur in different regions to future climate change. 375

Advantages, limitations and uncertainties
Based on the updated SRDB databaseset, we used a RF algorithm to predict the temporal and spatial-and temporal-patterns of RH at the global scale and its response to environmental variables, and empirically derived global patterns of annual RHthe data-derived RH could serve as a benchmark for terrestrial biogeochemicalglobal vegetation models and reduce RH uncertainties. This developed data-derived RH dataset provideds several advantages to the estimation of global RH compared 380 to previous studies, e.g. Hashimoto et al. (2015) and Konings et al. (2019). Firstly, we compiled up-to-date field observations from SRDB and Chinese peer-review literatures up to March 2018, including 504 observations in total covering the majority global terrestrial ecosystems and climate zones (Fig. 1). Secondly, total RH and its inter-annual variability were assessed for boreal, temperate and tropical zonesthree-three main global climate zones. Analysis from the data-derived RH dataset further concludeds that RH in different climate zones responded differently to global climate change. Thirdly, we applied the 385 RF to predict and mapped RH at the global scale using climate and soil predictors. Compared to the linear regression analysis for predicting soil respiration (as no such global RH predictions were previously available for comparison), which has reported with model efficiencies of <50% (Bond-Lamberty and Thomson, 2010b;Hashimoto et al., 2015;Hursh et al., 2017), the RF algorithms achieved a higher model efficiency of 50%. In addition to, allowing a feature selection according the importance value of each variable and avoiding overfitting (Bodesheim et al., 2018;Jian et al., 2018), the RF algorithm improveding RH 390 modelling accuracies and reduceding uncertainties. AdditionallyMeanwhile, the data-derived RH dataset was cross-validated globally by a 10-fold cross-validation (see "materials and methods" section), which could improve its reliability and feasibility compared to TRENDY based RH that are were not validated and calibrated by field observations, bridging the knowledge gap between local, regional and global scales temporally and spatially with a large number of empirical field measurements.
However, although empirically derived global patterns of annualthe data-derived RH dataset could be used as a benchmark 395 for the verification of global carbon cycle modelling, bridging the knowledge-gaps between local, regional and global scales, few uncertainties and limitations still remained. Firstly, the RF algorithms constructsed a model based on the training dataset and is was typically data limited in terms of quantity, quality and representativeness. Uneven data distribution has been a known issue in many ecological studies across the world, e.g. Bond-Lamberty and Thomson (2010b), Jung et al. (2011), Xu andShang (2016) and Yao et al. (2018a). The RH observations were mainly from China, Europe and North America, while 400 there were a lack of RH observations in Russia, Africa, Australia and Southwestern Asia in our study. Thus Uthe uneven coverage of the observations iswas an important source of uncertainty to develop the data-derived RH dataset, which may cause a bias of the RF model towards the areas with more observations. However, our dataset covered a large climatic and edaphic gradient covering the major land covers and climate zones. Therefore, in future studies, increasing field observations in unsampled areas should greatly improve our ability to evaluate spatial-and temporal-patterns of RH at the global scale and 405 model global carbon cycle to future climate change.
Secondly, the misrepresentation of human activities, particularly regarding to land management and land use change, could result in uncertainties in RH (Bond-Lamberty et al., 2016;Tang et al., 2016). These human activities include both site-level in situ information and the corresponding global grids. Otherwise, such information must not be included as corresponding site 14 information or respective globally gridded datasets are missing or insufficient. Although soil organic carbon stock, soil nitrogen 410 content, SWC and shortwave radiation were selected as inputs for the development of the RF model, which could partly capture land use change, the impacts of land use change on the inter-annual variability of RH have not been fully qualified in the present study. Therefore, further efforts are required to characterize and quantify the effects of land use changes on the global RH.
Thirdly, the developed date-derived RH dataset was derived at an annual timescale, which may cause an additional 415 uncertainty regarding to the inter-annual variability of RH. Therefore, the need for a larger number of global observations and to develop finer-scale temporal dynamics need further exploration, in combination with remote-sensing measurements and field observations, which may provide new insights into terrestrial ecosystem carbon dynamics at the global scale. Besides, without consideration of the temporal changes of soil organic carbon content from 1980 to 2016 might be another uncertainty because the increase of productivity driven by CO 2 fertilization would increase litter input into soils. However, there is a lack 420 of soil organic carbon content that considering its temporal changes based on observations, which has constrained the further analysis of the effects of the temporal changes of soil organic carbon content on RH.
Finally, we developed a global RH at a half-degree spatial resolution, which included a scale mismatch between the observations and global gridded variables. This could be a great challenge for spatial modelling and using global gridded variables with a finer resolution is encouraged to overcome this limitation (Xu and Shang, 2016). On the other hand, the study 425 sites were globally distributed and there was a large climatic and edaphic gradient covering the major land covers and biomes, which should reflect a larger variability than the site-to-grid mismatch.

Data availability
The developed globally gridded RH database, the field RH observation dataset and R codes to produce the main results are publicly free for scientific purpose, which can be downloaded at https://doi.org/10.6084/m9.figshare.8882567 (Tang et al., 430 2019a).

Conclusions
Data-derived global RH dataset may be used as a benchmark for global vegetation terrestrial biogeochemical models, however, no such study has yet been conducted to assess the global variability in RH using a large dataset of empirical measurements to bridge the knowledge gap between local, regional and global scales. To fill this knowledge gap, we developed 435 a globally gridded RH dataset, which was 0.5 o × 0.5 o from 1980 to 2016 with an annually temporal resolution, using a RF algorithm by linking field observations and global variables. Robust conclusions include: (1) Annual mean RH was 57.2±0.6 Pg C a -1 between 1980 and 2016, with an increasing trend of 0.036±0.007 Pg C a -2 , indicating an increase in carbon loss from soils to the atmosphere with future climate change; (2) Significant temporal trends were observed in the RH in boreal and temperate areas, although none were found in tropical regions. This indicates that the temporal trend in RH varied with climate 440 zones, highlighting their different sensitivities to future climate change; (3) The magnitude and dominant factors in of the data-15 derived RH TRENDY RHresults generated by data-derived and TRENDY models varied greatly, indicating that future efforts should focus on improving the representation of RH in ecosystem models and the ecosystemand its response to environmental variability in terrestrial biogeochemical models; (4) More field observations are required in areas with limited observational datasets, with the integration of smaller-scale temporal dynamics (rather than annual timescales) potentially providing new 445 insights into terrestrial ecosystem carbon dynamics at the global scale; (5) The develop globally gridded data-derived RH dataset could serve as a benchmark to constrain the terrestrial biogeochemicalglobal vegetation models, further contributing to improve our understanding of the mechanisms of global soil carbon dynamics.
Author contributions. XT, SF and WY design the study; XT, WZ, SG, MD and ZY contributed to data analysis, including 450 improving R code; SL and GC provided constructive comments. All authors contributed to review the manuscript.