A global map of root biomass across the world’s forests

Roots play a key role in plant growth and functioning. Here we combine 10307 field measurements of forest root biomass worldwide with global observations of forest structure, climatic conditions, topography, land management and soil characteristics to derive a spatially explicit global high-resolution (~1 km) root biomass dataset, including fine and coarse roots. In total, 142 ± 32 Pg of live dry matter biomass is stored below-ground, corresponding to a global average root:shoot biomass ratio of 0.25 ± 0.10. Our estimates of total root biomass in tropical, temperate and boreal forests are lower than those of earlier studies (refs. 1–3), which exceed ours by 44-226%. The smaller estimate is attributable to the updated forest area, the spatially explicit above-ground biomass density used to predict the patterns of root biomass, new root measurements and the upscaling methodology. We show specifically that root-shoot allometry is one underlying driver of the methodological overestimation of root biomass in previous estimates.

Abstract. As a key component of the Earth system, roots play a key role in linking Earth's lithosphere, hydrosphere, biosphere and atmosphere. Here we combine 10307 field measurements of forest root biomass worldwide with global observations of forest structure, climatic conditions, topography, land management and soil characteristics to derive a spatially explicit global high-resolution (~1 km) root biomass dataset, including fine and coarse roots. In total, 142 ± 25 (95% CI) Pg of live dry matter biomass is stored below-ground, representing a global average root:shoot biomass ratio of 0.25 ± 0.10. Our estimates of total root biomass in tropical, temperate and boreal forests are lower than those of earlier studies (Jackson et al., 1997; Robinson, 2007; Saugier et al., 2001), which exceed ours by 44-226%. The smaller estimate is attributable to the updated forest area, the spatially explicit above-ground biomass density used to predict the patterns of root biomass, new root measurements and the upscaling methodology. We show specifically that root-shoot allometry is one underlying driver of the methodological overestimation of root biomass in previous estimates.

Introduction
Roots act as a hub that connects complex feedbacks among biomes, soil, water, air, rocks and nutrients. Roots mediate nutrient and water uptake by plants, below-ground organic carbon decomposition, the flow of carbohydrates to mycorrhizae, species competition, soil stabilization and plant resistance to windfall (Warren et al., 2015). The global distribution of root biomass is related to how much photosynthate plants must invest below-ground to obtain water, nitrogen and phosphorus for sustaining photosynthesis, leaf area and growth. Root biomass and activity also control the land surface energy budget through plant transpiration (Wang et al., 2016; Warren et al., 2015). While Earth Observation data combined with field data enable the derivation of spatially explicit estimates of above-ground biomass with a spatial resolution of up to 30 meters over the whole globe (GlobalForestWatch, 2019; Santoro, 2018b), the global carbon stock and the spatial details of the distribution of below-ground root biomass (fine + coarse) have so far relied on sparse measurements and coarse extrapolation, and therefore remain highly uncertain.
More than twenty years ago, Jackson et al. (1996, 1997) provided estimates of the average biomass density (weight per unit area) and vertical distribution of roots for 10 terrestrial biomes. Multiplying their average root biomass density by the area of each biome results in a global root biomass pool of 292 Pg, with forests accounting for ~68%. Saugier et al. (2001) estimated global root biomass to be 320 Pg by multiplying biome-average root to shoot ratios (R:S) by shoot biomass density and the land area of each biome. Mokany et al. (2006) argued that the use of mean R:S values at the biome scale is a source of error because root biomass measurements are performed at small scales, whereas root distributions are highly spatially heterogeneous and their size distribution spans several orders of magnitude, fine roots being particularly difficult to sample (Jackson et al., 1996; Taylor et al., 2013). With updated R:S and broader vegetation classes, Mokany et al. (2006) [...] Robinson (2007) further suggested that R:S was underestimated by 60%, which translated into an even higher global root biomass of 540-560 Pg. These studies provided a first-order estimation of the root biomass for different biomes, but not of its spatial details. Further, it is worth noting that estimations of the global total root biomass have increased with time.
An alternative approach to estimate root biomass is through allometric scaling, dating back to Enquist (1997, 1999) and Enquist and Niklas (2002). The allometric scaling theory assumes that biological attributes scale with body mass, and in the case of roots, an allometric equation verified by data takes the form $R \propto S^{\beta}$, where R is the root mass, S the shoot mass and β a scaling exponent. In contrast to the studies listed above, which assume the R:S ratio to be uniform, this equation implies that the R:S ratio varies with shoot size when β is not equal to one (Cairns et al., 1997; Enquist and Niklas, 2002; Jiang and Wang, 2017; McCarthy and Enquist, 2007; Niklas, 2005; Robinson, 2004; Zens and Webb, 2002).
Allometric equations also predict that smaller trees generally have a larger R:S when β < 1, which is well supported by measurements of trees of different sizes (Cairns et al., 1997; Jiang and Wang, 2017; Niklas, 2005; Robinson, 2004; Zens and Webb, 2002). The allometric equation approach has been applied to various forest types, and the scaling exponent was observed to differ across sites (Yunjian Luo et al., 2018), species (Cheng and Niklas, 2007), age (Cairns et al., 1997), leaf characteristics (Luo et al., 2012), elevation (Moser et al., 2011), management status (Ledo et al., 2018), climatic conditions such as temperature (Reich et al., 2014), soil moisture and climatic water deficit (Ledo et al., 2018), as well as soil nutrient content and texture (Jiang and Wang, 2017). Despite the successful application of allometric equations in site- and species-specific studies (Yunjian Luo et al., 2018), their use to predict global root biomass patterns appears to be limited and challenging.
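The dependence of R:S on plant size implied by the allometric equation can be illustrated with a minimal sketch. The prefactor `a`, the exponent `beta = 0.9` and the shoot masses below are illustrative assumptions, not values fitted to the database described in this paper.

```python
# Sketch: R = a * S**beta implies R:S = a * S**(beta - 1).
# `a`, `beta` and the shoot masses are illustrative, not fitted values.

def root_mass(shoot_mass, a=0.5, beta=0.9):
    """Root mass predicted by the power law R = a * S**beta."""
    return a * shoot_mass ** beta

def root_shoot_ratio(shoot_mass, a=0.5, beta=0.9):
    """R:S implied by the same power law."""
    return root_mass(shoot_mass, a, beta) / shoot_mass

shoot_sizes = [0.01, 1.0, 100.0]  # kg per plant, hypothetical
ratios = [root_shoot_ratio(s) for s in shoot_sizes]
for s, r in zip(shoot_sizes, ratios):
    print(f"S = {s:g} kg -> R:S = {r:.3f}")
```

With β < 1 the printed ratio falls as shoot mass grows, while setting `beta=1.0` gives the same ratio for every size, which corresponds to the uniform R:S assumed in the biome-average studies.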

Overview
We use a new approach to upscale root biomass of trees at the global scale (Supplementary Figure 1) [...] (Figure 2). We compared allometric upscaling with three machine learning techniques (random forest, artificial neural networks and multivariate adaptive regression splines), searched through a pool of 47 predictor variables that include shoot biomass and other vegetation, edaphic, topographic, anthropogenic and climatic variables (Supplementary Table 1), and selected the random forest (RF) model, which performs best on cross-validation samples (see section Building predictive models below). Using this RF model, we mapped the root biomass of an average tree over an area of ~1 km x 1 km across the globe, using as predictors gridded maps of shoot biomass (weight per area) (Santoro, 2018a), tree height (Simard et al., 2011), soil nitrogen (Shangguan et al., 2014), pH (Shangguan et al., 2014), bulk density (Shangguan et al., 2014), clay content (Shangguan et al., 2014), sand content (Shangguan et al., 2014), base saturation (Shangguan et al., 2014), cation exchange capacity (Shangguan et al., 2014), water vapor pressure (Fick and Hijmans, 2017), mean annual precipitation (Fick and Hijmans, 2017), mean annual temperature (Fick and Hijmans, 2017), aridity (Trabucco and Zomer, 2019) and water table depth (Fan et al., 2013) (Supplementary Figures 10, 11, 12). Combining this map with the tree density (number of trees per area) (Crowther et al., 2015) at the global scale, we quantified the global forest root biomass.

Field measurements
Our dataset was compiled from the literature and existing forest biomass structure or allometry databases (Falster et al., 2015; Ledo et al., 2018; Schepaschenko et al., 2017, 2018). We included studies and databases that reported georeferenced location, root biomass and shoot biomass. For example, Poorter et al. (2015) is not included due to the lack of georeferenced locations, and Iversen et al. (2017) is not used because we also need measurements of other plant compartments such as shoot biomass. Repeated entries from existing databases were removed. One of the databases (Falster et al., 2015) reported data on woody plants, which also include shrub species. We kept the shrub data partly because the remote sensing products we used to generate our root map do not clearly separate trees from shrubs. Around 82% of the extracted entries also recorded plant height and management status. Height was identified as an important predictor in our model assessment, and entries were discarded when height was missing (18% of data). As woody plant age was reported in only 19% of the entries, the values of this variable were determined from another source of information, i.e. a composite global map introduced in the next section. Species names were systematically reported, but biotic, climatic, topographic and soil information was missing for a substantial proportion of entries, and values of these variables were thus extracted from

Preparing predictor variables
We used 47 predictors that broadly cover 5 categories: vegetative, edaphic, climatic, topographic and anthropogenic (Supplementary Table 1). Vegetative variables include shoot biomass, height, age, maximum rooting depth, biome class and species. Edaphic predictors cover soil bulk density, organic carbon, pH, sand content, clay content, total nitrogen, total phosphorus, Bray phosphorus, total potassium, exchangeable aluminium, cation exchange capacity, base saturation (BS), soil moisture and water table depth (WT). Climatic predictors are mean annual temperature (MAT), mean annual precipitation (MAP), the aridity index, which represents the ratio between precipitation and the reference evapotranspiration, solar radiation, potential evapotranspiration (PET), vapor pressure, cumulative water deficit (CWD = PET - MAP), wind speed, mean diurnal range of temperature (BIO2), isothermality (BIO2/BIO7) (BIO3), temperature seasonality (BIO4), max temperature of warmest month (BIO5), min temperature of coldest month (BIO6), temperature annual range (BIO7), mean temperature of wettest quarter (BIO8), mean temperature of driest quarter (BIO9), mean temperature of warmest quarter (BIO10), mean temperature of coldest quarter (BIO11), precipitation of wettest month (BIO13), precipitation of driest month (BIO14), precipitation seasonality (BIO15), precipitation of wettest quarter (BIO16), precipitation of driest quarter (BIO17), precipitation of warmest quarter (BIO18) and precipitation of coldest quarter (BIO19). The topographic variable is elevation, and we take the management status (managed or not) as the anthropogenic predictor. All references are given in Supplementary
https://doi.org/10.5194/essd-2021-25
As in-situ field measurements of above-ground biomass (AGB) do not offer full global coverage, gridded shoot biomass data were derived from satellite AGB products to predict root biomass at the global scale with a 1 km by 1 km spatial resolution.
The gridded global shoot biomass dataset used in our study has been extensively calibrated with in-situ observations and is currently the most reliable source of information on shoot biomass offering global coverage (Baccini et al., 2017; Santoro, 2018a). To derive the shoot biomass or AGB per tree (in units of weight per tree) needed to generate spatially explicit global root biomass, we combined the GlobBiomass-AGB satellite data product (Santoro, 2018a) (in units of weight per unit area) with a tree density map (number of trees per unit area) (Crowther et al., 2015). The GlobBiomass dataset was based on multiple remote sensing products (radar, optical, LiDAR) and a large pool of in-situ observations of forest variables (Santoro et al., 2015; Santoro, 2018b). The original GlobBiomass-AGB map was generated at 100 m spatial resolution; for this study, the map was averaged to 1 km pixels by considering only those pixels that were labeled as forest (Santoro, 2018b). A pixel was labeled as forest when the canopy density was larger than 15% according to the dataset of Hansen et al. (2013) (Hansen2013) averaged at 100 m. The 1-km resolution global tree density map was constructed by upscaling 429,775 ground-based tree density measurements with a predictive regression model for forests in each biome (Crowther et al., 2015).
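The two steps described above, averaging the 100 m AGB cells labeled as forest into one 1 km pixel and dividing by tree density to obtain AGB per tree, can be sketched as follows. The units (Mg per hectare, trees per hectare), the function names and the tiny 2 x 2 block are assumptions made for illustration; the text does not specify them.

```python
# Sketch under assumed units: AGB in Mg/ha, tree density in trees/ha.
# Function names and the example block are hypothetical.

def aggregate_forest_mean(agb_block, forest_mask_block):
    """Mean AGB over the forest-labeled fine cells of one coarse pixel.

    Returns None when no fine cell in the block is labeled as forest.
    """
    values = [
        agb
        for agb_row, mask_row in zip(agb_block, forest_mask_block)
        for agb, is_forest in zip(agb_row, mask_row)
        if is_forest
    ]
    return sum(values) / len(values) if values else None

def agb_per_tree_kg(agb_mg_per_ha, trees_per_ha):
    """AGB per tree (kg) = AGB per area / trees per area; None if no trees."""
    if trees_per_ha is None or trees_per_ha <= 0:
        return None
    return agb_mg_per_ha / trees_per_ha * 1000.0  # Mg -> kg

# A 2 x 2 block stands in for the 10 x 10 fine cells of a real 1 km pixel.
block_mean = aggregate_forest_mean(
    [[100.0, 200.0], [0.0, 300.0]],
    [[True, True], [False, True]],
)
print(block_mean, agb_per_tree_kg(block_mean, 500.0))
```

Excluding non-forest cells from the average keeps sparse pixels from being diluted by zero-biomass cells, which is one plausible reading of "considering only those pixels that were labeled as forest".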
The forest canopy height map took advantage of the Geoscience Laser Altimeter System (GLAS) aboard ICESat (Ice, Cloud, and land Elevation Satellite). Forest definitions are slightly different among these three maps. The forest area of the tree density map was based on a global consensus land cover dataset that merged four land cover products (Tuanmu and Jetz, 2014), and which gave a total tree count equal to that obtained with the Hansen et al. (2013) land cover product (Crowther et al., 2015). The canopy height map used the Globcover land cover map (Hagolle et al., 2005) as reference to define forest land. We took Hansen2013 with a 15% canopy cover threshold as our base forest cover map. We approximated the missing values in tree density and height (due to mismatches in forest cover) by the mean of a 5 x 5 window centered on the corresponding pixel. We quantified the potential impact of mismatches in forest definition by testing two alternative canopy cover thresholds: 0% and 30%.
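The 5 x 5 window gap filling can be sketched as below. This is a minimal illustration, assuming that missing cells are encoded as `None`, that only valid cells in the window enter the mean, and that windows are truncated at the grid edges; none of these details is given in the text.

```python
# Sketch: fill missing cells with the mean of the valid cells in the
# 5 x 5 window centered on them. Missing values are encoded as None
# (an assumption); windows are clipped at the grid boundary.

def fill_missing_5x5(grid):
    n_rows, n_cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]
    for i in range(n_rows):
        for j in range(n_cols):
            if grid[i][j] is not None:
                continue
            # Read from the original grid so filled values do not propagate.
            window = [
                grid[r][c]
                for r in range(max(0, i - 2), min(n_rows, i + 3))
                for c in range(max(0, j - 2), min(n_cols, j + 3))
                if grid[r][c] is not None
            ]
            if window:
                filled[i][j] = sum(window) / len(window)
    return filled

example = [[1.0, 2.0, 3.0], [4.0, None, 6.0], [7.0, 8.0, 9.0]]
print(fill_missing_5x5(example)[1][1])
```

Cells whose whole window is missing stay `None` in this sketch rather than being extrapolated from farther away.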
We merged several regional age maps to generate a global forest age map. The base age map was derived from biomass through age-biomass curves, similarly to what was done for tropical regions in Poulter (2019). This age map does not cover the northern region beyond 35° N. We filled the missing northern region with a North American age map (Pan et al., 2011) and a second age map covering China (Zhang et al., 2017). Remaining missing pixels were further filled with the age map derived from MODIS disturbance observations. As a final step, we filled the remaining pixels with the GFAD V1.1 age map (Poulter, 2019). GFAD V1.1 has 15 age classes and 4 plant functional types (PFTs). We chose the middle value of each age class and estimated the age as the average among the different PFTs.
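The sequential filling of the age map amounts to taking, for each pixel, the first available value from a priority-ordered list of maps. A minimal sketch, assuming all maps are already on the same grid and that missing pixels are encoded as `None` (the map contents below are made up):

```python
# Sketch: merge co-registered maps by priority; None marks a missing pixel.
# The example grids are hypothetical, not real age data.

def merge_by_priority(maps):
    """Return a grid holding, per pixel, the first non-missing value
    found when scanning `maps` from highest to lowest priority."""
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    merged = [[None] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            for candidate in maps:
                if candidate[i][j] is not None:
                    merged[i][j] = candidate[i][j]
                    break
    return merged

base = [[30, None], [None, None]]      # e.g. biomass-derived base map
regional = [[99, 80], [None, None]]    # e.g. a regional age map
fallback = [[1, 2], [3, None]]         # e.g. a coarse global map
merged_age = merge_by_priority([base, regional, fallback])
print(merged_age)
```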
Detailed information on all ancillary variables is listed in Supplementary Table 1. For consistency, we re-gridded each map to a common 1 km x 1 km grid using the nearest-neighbour method.

Building predictive models
We investigated the performance of the allometric scaling and three non-parametric models: RF, ANN and MARS.
Allometric upscaling relates root biomass to shoot biomass in the form $R \propto S^{\beta}$. RF is an ensemble machine learning method that builds a number of decision trees from training samples (Breiman, 2001). A decision tree is a flow-chart-like structure, where each internal (non-leaf) node denotes a binary test on a predictor variable, each branch represents the outcome of a test, and each leaf (or terminal) node holds a predicted value of the target variable. By combining many trees (models), RF generally increases the overall prediction performance and reduces over-fitting. ANN computes through an interconnected group of nodes, inspired by a simplification of neurons in a brain. MARS is a non-parametric regression method that builds multiple linear regression models across a range of predictors.
Tree shoot biomass from the in-situ observation data spans a wider range than shoot biomass per plant derived from global maps ($1 \times 10^{-7}$ to 8800 vs. $7.9 \times 10^{-5}$ to 933 kg/plant). To reduce potential mapping errors, we selected training samples with shoot biomass between $5 \times 10^{-5}$ and 1000 kg/plant. The medians and means of shoot biomass, root biomass and R:S from the selected training samples are similar to those from the entire database. Also, to reduce the potential impact of outliers, we analyzed samples with R:S falling between the 1st and 99th percentiles, which leaves 9589 samples with R:S ranging from 0.05 to 2.47, with a mean of 0.47 and a median of 0.36. Sample filtering slightly deteriorated model performance and had a minor impact on the final global root biomass prediction (145 Pg from all samples vs. 142 Pg from filtered data). We chose root biomass as our target variable instead of R:S because big and small trees contribute equally to R:S while big trees are relatively more important in biomass quantification. In our observation database, we have more samples from small woody plants (Supplementary Figure 6).
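The sample selection described above can be sketched as a two-step filter. The sketch assumes that the R:S percentiles are computed after the shoot biomass range filter, that both bounds are inclusive, and that percentiles use linear interpolation; the text does not state these details, and the sample data are synthetic.

```python
# Sketch of the training-sample filter; thresholds come from the text,
# the ordering of the two steps and the percentile rule are assumptions.

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def filter_samples(samples, s_min=5e-5, s_max=1000.0, q_lo=1, q_hi=99):
    """samples: list of (shoot_kg, root_kg) pairs for individual plants."""
    in_range = [(s, r) for s, r in samples if s_min <= s <= s_max]
    if not in_range:
        return []
    ratios = sorted(r / s for s, r in in_range)
    lo, hi = percentile(ratios, q_lo), percentile(ratios, q_hi)
    return [(s, r) for s, r in in_range if lo <= r / s <= hi]

# Synthetic data: 101 plants with R:S from 0.01 to 1.01, plus two plants
# outside the shoot biomass range.
samples = [(1.0, 0.01 * (i + 1)) for i in range(101)]
samples += [(2000.0, 500.0), (1e-6, 1e-6)]
kept = filter_samples(samples)
print(len(samples), "->", len(kept))
```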
A model with good overall performance does not guarantee good predictions for woody plants with higher biomass. We furthermore split the in-situ measured shoot biomass into three groups, namely measurements with shoot biomass smaller than 0.1, between 0.1 and 10, and larger than 10 kg/plant, and trained a specific model for each class. The rationale behind this splitting is (1) to remove the bias towards small plants in the distribution of in-situ measured woody shoot biomass (Supplementary Figure 6); (2) to account for the shift of root-shoot allometry with tree size (Ledo et al., 2018; Poorter et al., 2015; Zens and Webb, 2002); (3) to improve the performance of independent validation, as found through numerous splitting trials; and (4) that tests weighting or resampling the samples (e.g., over-sampling using the Synthetic Minority Over-sampling Technique) showed no better performance.
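Routing each plant to the model of its size class can be sketched as follows. The class boundaries (0.1 and 10 kg/plant) come from the text; treating both boundaries as belonging to the middle class is an assumption, and the three power-law callables are placeholders standing in for the class-specific trained models.

```python
# Sketch: dispatch a prediction to one of three size-class models.
# The lambdas are hypothetical stand-ins for trained models.

def size_class(shoot_kg):
    """Return the size class of a plant from its shoot biomass (kg)."""
    if shoot_kg < 0.1:
        return "small"
    if shoot_kg <= 10.0:
        return "medium"
    return "large"

placeholder_models = {
    "small": lambda s: 0.6 * s ** 0.95,
    "medium": lambda s: 0.5 * s ** 0.90,
    "large": lambda s: 0.4 * s ** 0.90,
}

def predict_root(shoot_kg, models=placeholder_models):
    """Predict root biomass with the model of the matching size class."""
    return models[size_class(shoot_kg)](shoot_kg)

for s in (0.05, 1.0, 50.0):
    print(s, size_class(s), round(predict_root(s), 4))
```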
Model performance was assessed by 4-fold cross-validation using two criteria: the mean absolute error (MAE) and the R-squared value ($R^2$). MAE quantifies the overall error while $R^2$ estimates the proportion of variance in root biomass that is captured by the predictive model. We favoured the model with the smallest MAE, the highest $R^2$ and the minimum number of predictors. For non-parametric models, starting from a model with all 47 predictors, we sequentially excluded, one after another, predictors that did not improve model performance. The order in which predictors were removed was random. After a combination of trials, the best model was the RF model, and the final set of predictors included shoot biomass, height, soil nitrogen, pH, bulk density, clay content, sand content, base saturation, cation exchange capacity, vapor pressure, mean annual precipitation, mean annual temperature, aridity and water table depth.
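The evaluation loop can be sketched in plain Python. A least-squares line stands in for the predictive model, $R^2$ is taken as the coefficient of determination, and held-out predictions are pooled across the four folds before scoring; these are assumptions for illustration, since the text does not specify how $R^2$ was computed or how fold scores were combined.

```python
# Sketch of 4-fold cross-validation scored by MAE and R^2.
# The straight-line model and the synthetic data are illustrative only.
import random

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def k_fold_indices(n, k=4, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx

def cross_validate(x, y, k=4):
    obs, pred = [], []
    for fold in k_fold_indices(len(x), k):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        obs += [y[i] for i in fold]
        pred += [slope * x[i] + intercept for i in fold]
    return mae(obs, pred), r_squared(obs, pred)

x = [float(v) for v in range(20)]
y = [2.0 * v + 1.0 for v in x]
cv_mae, cv_r2 = cross_validate(x, y)
print(cv_mae, cv_r2)
```

A backward-elimination loop over predictors would wrap `cross_validate`, dropping a randomly chosen predictor whenever its removal does not worsen the two scores.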

Generation of the global root biomass map
Over an area of 1 km x 1 km, we assumed that a tree with average shoot biomass follows the RF model trained above. [...] within-vegetation competition), we implicitly accounted for some sub-pixel variability (e.g., resource competition and responses to environmental conditions) in root biomass. We combined the RF model with global maps of the selected predictor variables to produce the root biomass map in units of weight per tree. This map was multiplied by tree density at 1 km resolution to obtain the final root biomass map in units of weight per area (Supplementary Figure 1).
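The final conversion from root biomass per tree to root biomass per area is a per-pixel product. A minimal sketch, assuming kilograms per tree, trees per square kilometre and `None` for pixels without data (units and names are illustrative, not taken from the dataset):

```python
# Sketch: per-pixel conversion from kg/tree to Mg/km^2 (assumed units).

def root_biomass_per_area(root_kg_per_tree, trees_per_km2):
    """Root biomass in Mg per km^2, or None if either input is missing."""
    if root_kg_per_tree is None or trees_per_km2 is None:
        return None
    return root_kg_per_tree * trees_per_km2 / 1000.0  # kg -> Mg

print(root_biomass_per_area(50.0, 40000.0))
```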

Uncertainty quantification
We estimated the overall uncertainty of the root biomass estimates by quantifying the errors caused by predicting root biomass at the 1-km resolution ($\eta_{\mathrm{pred}}$) and by converting root biomass per tree to root biomass per unit area ($\eta_{\mathrm{con}}$). We quantified the prediction uncertainty through an ensemble of predictions. We collected 8 additional global predictor datasets (3 shoot biomass, 2 soil and 3 climate datasets) (Supplementary Table 2) and carried out 8 x 4 (4 folds) sets of additional predictions, replacing the predictors by each of these additional data maps. We calculated the standard deviation among the 36 predictions for each pixel (Supplementary Figure 4a). Root biomass is converted from per tree to per area through the tree density (Crowther et al., 2015). We assumed that the coefficient of variation (CV, i.e., the ratio of the standard deviation to the mean) in tree density mapping caused the same relative uncertainty in our per unit area root biomass. The CV in tree density mapping at the biome scale was derived from Crowther et al. (2015) by dividing the uncertainties in total tree numbers by the total tree numbers. $\eta_{\mathrm{con}}$, in terms of standard deviation, is therefore equal to the product of CV and the mean root biomass at each pixel (Supplementary Figure 4b). Finally, we propagated these two sources of uncertainty assuming that the errors were random and independent. Note that we did not account for uncertainties in the in-situ root biomass measurements used in training the RF model. The overall uncertainty (standard deviation) at the pixel level was calculated as

$$\eta = \sqrt{\eta_{\mathrm{pred}}^{2} + \eta_{\mathrm{con}}^{2}} \qquad (1)$$

At the biome and global scales, we obtained total root biomass for each of the 36 predictions and estimated the standard deviation of the total root biomass. $\eta_{\mathrm{con}}$ was estimated by multiplying CV by the biome- or global-level root biomass. We propagated these two sources of uncertainty through Equation 1.
Note that the semivariogram of the random forest prediction errors does not show a clear autocorrelation pattern (Supplementary Figure 10).
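Equation 1 can be evaluated per pixel as in the following minimal sketch. It assumes the sample standard deviation for the ensemble spread and uses the ensemble mean as the "mean root biomass" entering the conversion term; neither choice is stated in the text, and the numbers are made up.

```python
# Sketch of Equation 1 for one pixel. The ensemble values and the CV
# are hypothetical; sample standard deviation is an assumption.
import math
import statistics

def pixel_uncertainty(ensemble_predictions, cv_tree_density):
    """Combine prediction and conversion errors as independent terms."""
    mean_root = statistics.mean(ensemble_predictions)
    eta_pred = statistics.stdev(ensemble_predictions)  # spread of ensemble
    eta_con = cv_tree_density * mean_root              # CV x mean biomass
    return math.sqrt(eta_pred ** 2 + eta_con ** 2)

eta = pixel_uncertainty([8.0, 10.0, 12.0], cv_tree_density=0.15)
print(eta)
```

The same function applies at the biome or global scale when the ensemble holds the 36 totals and the CV is the biome- or global-level value.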

Relative importance of predictor variables
The impact of predictors on R:S was estimated through Spearman's rank-order correlation at both the global and biome scales. We log-transformed R:S and shoot biomass before standardizing these datasets. Partial dependence plots (Hastie et al., 2009) show the marginal effect that one predictor has on root biomass in a machine learning model, and serve as a supplement to the Spearman correlation.
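Spearman's rank-order correlation is the Pearson correlation of the ranks of two variables. The sketch below implements it with average ranks for ties and applies it to log-transformed synthetic data in which R:S follows a power law of shoot biomass; the data and parameter values are illustrative only.

```python
# Sketch: Spearman correlation as Pearson correlation of average ranks.
# The shoot masses and the power law generating R:S are synthetic.
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

shoot = [0.01, 0.1, 1.0, 10.0, 100.0]
rs = [0.5 * s ** (0.9 - 1.0) for s in shoot]  # R:S falls with size
rho = spearman([math.log(s) for s in shoot], [math.log(r) for r in rs])
print(rho)
```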

Results
We estimated a global total root biomass of 142 ± 25 (95% CI) Pg (see Methods for uncertainty estimation and Supplementary Figures 3, 4) for forests, when forest is defined as all areas with tree cover larger than 15% from the Hansen [...] Table 3). Given our use of a tree cover threshold of 15% at 30 m resolution, our estimate ignores the roots of isolated woody plants present in arid or cold regions (Staver et al., 2011), as well as in heterogeneous (e.g. urban or agricultural) landscapes, and is possibly an underestimate. Total root biomass decreases from 151 to 134 Pg when the canopy cover threshold used to define forest land is increased from 0% to 30%. The root biomass density per unit of forest area is highest in tropical moist forest, followed by temperate coniferous and Mediterranean forest ([...]
Broadly speaking, locations with small trees, low precipitation, strong aridity, deep water table depth, high acidity, low bulk density, low base saturation and low cation exchange capacity are more likely to have higher fractional root biomass (Figure 3). In line with allometric theory, shoot biomass emerged as the most important predictor of R:S and root biomass, as given by the Spearman correlation analysis shown in Figure 3 and the partial importance plots (Supplementary Figures 11, 12 [...] (Figure 3) (Ledo et al., 2018), with trees and woody plants in dry regions generally having higher R:S (Supplementary Tables 3, 4), and with a stronger dependence on precipitation, especially when precipitation is low, and on water table depth when the water table is deep. Temperature is slightly negatively correlated with R:S at the global scale, in line with Reich et al. (2014). However, the relationship between temperature and below-ground biomass is not consistent among biomes (Figure 3) and biomass size groups (Supplementary Figures 11, 12, 13).
The relationship between total soil nitrogen and root biomass is negative when soil nitrogen content is below 0.1-0.2% (Supplementary Figures 11, 12, 13). Root biomass and R:S generally increase with soil alkalinity (Figure 3, Supplementary Figures 11, 12, 13). Low pH is toxic to biological activities and roots, and fine roots are especially sensitive to soil acidification, as revealed by a recent meta-analysis (Cheng Meng et al., 2019). Our results also indicate overall positive correlations between CEC, BS and R:S, but the processes that may account for these correlations are less clear from the literature. Age has been shown to be important for R:S (Schepaschenko et al., 2018). How age regulates R:S remains elusive, with studies showing both positive (Waring and Powers, 2017) and slightly negative (Mokany et al., 2006) relationships between R:S and age. Including forest age (see Methods: Preparing predictor variables) as a predictor only marginally improved our model prediction (see Supplementary Information for details). It is likely that shoot biomass partially accounts for age information, and the quality of the global forest age data might also limit the power of this variable to improve root biomass predictions.
S1, Tropical moist forest (Biome 1), tropical dry forest (Biome 6), tropical/subtropical coniferous forest (Biome 11) and forest in tropical/subtropical grasslands/savannas and shrublands (Biome 3) are aggregated to represent tropical systems (Tr).
S3, Estimation based on allometric equations and the global above-ground biomass dataset from Santoro (2018b). See Supplementary Table 7 for details.
* RDS1, the relative difference of Tr + Te + Bo between this study (S1) and previous quantifications. RDS1 = (previous study - this study)/this study x 100%. For example, in the column headed Jackson, RDS1 = (200 - 139)/139 x 100% = 44%.
& RDS2, the same as RDS1, but with the S2 definition of tropical, temperate and boreal systems.
[...] root biomass from this study, of above-ground biomass used for the prediction, and of modelled R:S ratios at the global and biome scales. (e) is a heat plot of observed vs. predicted root biomass in kg of root per individual woody plant (see Supplementary Figures 7, 8, 9 for cross-validation at biome, tree size class and continental scales). (f) shows the mean (purple) and median (grey) R:S as a function of shoot biomass from observations. A shift of the shoot biomass towards a larger size ((a), (c)) results in a smaller predicted mean R:S at the global scale ((b), (d)) (see Supplementary Table 4

Discussion
Our lower estimate of root biomass compared to earlier studies is attributable to differences in forest area (Supplementary [...] For example, the forest area in temperate zones used in Jackson et al. (1997) was about one third higher than in this study.
Using the root biomass density (Supplementary Table 5) and estimation method from Jackson et al. (1997), but with the updated forest area map from this study, we estimated the total root biomass of tropical, temperate and boreal forests to be 147 Pg (or 184 Pg if sparse forests in tropical/subtropical/temperate grasslands/savannas and shrublands and the tundra region are accounted for; S1 biome definition in Table 1). This value is smaller than the 200 Pg from Jackson et al. (1997), but still larger than the 121 Pg (or 139 Pg) (Table 1) from our machine learning approach. Our lower values of root biomass compared to Saugier et al. (2001), Mokany et al. (2006) and Robinson (2007) are caused mainly by our lower above-ground biomass density and R:S (Supplementary Table 5). Shoot or above-ground biomass density (AGB) of tropical zones is 70% lower in our study than in Robinson (2007), who used sparse plot data collected more than a decade ago (Supplementary Table 5, case S2), and this lower AGB explains 27-46% of our lower root biomass (Supplementary Tables 5, 6). On the other hand, lower biome-average R:S explains 41-48% of our lower estimate compared to Robinson (2007). To elucidate this difference, we calculated weighted biome-average R:S ratios by dividing total biome-level root biomass by total biome-level shoot biomass (i.e., weighted mean R:S). These weighted mean R:S, ranging between 0.19 and 0.31 across biomes [...] (Supplementary Table 5).
The common practice of estimating root biomass through an average R:S, without considering the spatial variability of biomass and of this ratio, is a source of systematic error, leading to an overestimation of global root biomass for two reasons.
Firstly, upscaling ratios through arithmetic averages (possibly weighted by the number of trees or area, but not accounting for the fine-grained distribution of biomass) systematically overestimates the true mean R:S because R:S is a convex, decreasing function of S, given by $R{:}S \propto S^{\beta - 1}$, with β taking typical values of about 0.9 (Mokany et al., 2006; West et al., 1997, 1999) (see also Supplementary Information: Arithmetic mean R:S section). This explains why the high-resolution S data used to diagnose weighted mean R:S ratios in our approach generally give smaller values than arithmetic means across grid cells at the biome level (Weighted R:S Ratio in Supplementary Table 3 [...] Multiplying this biome-level arithmetic mean R:S by the average biome-level shoot biomass (Supplementary Table 3) yielded a global forest root biomass of 155 Pg, larger than 142 Pg. Secondly, available measurements tend to sample more small woody plants than big trees compared to real-world distributions, because small plants are easier to excavate for measuring roots (see Figure 2a, 2c), but smaller plants tend to have larger R:S (Figure 2e; see also Niklas, 2002; Zens and Webb, 2002). This sampling bias shifts R:S towards larger values. If we use the biome-level mean R:S from our in-situ database (Mean (Obs) in Supplementary Table 4), multiplying it by the shoot biomass (Supplementary Table 3) yields a global value of 233 Pg, larger than that obtained using the mean R:S across grid cells through RF (155 Pg). Our RF approach uses in-situ data for training, but in the upscaling it accounts for realistic distributions of plant size (Supplementary Figure 5; Supplementary Table 4). We further verified that our upscaled R:S ratios are robust to sub-sampling the training data in observed distributions, so that the bias of training data towards small plants does not translate into a bias of upscaled results (see Methods, Supplementary Figure 8).
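The first source of overestimation can be reproduced numerically. The sketch below compares the arithmetic mean of R:S across units with the biomass-weighted mean (total root over total shoot) when R:S follows a power law of S. The prefactor, the exponent of 0.9 and the shoot values are illustrative assumptions, not values from the dataset.

```python
# Sketch: arithmetic vs. biomass-weighted mean R:S under R = a * S**beta.
# All numbers are illustrative.

def arithmetic_mean_rs(shoot, a=0.5, beta=0.9):
    """Unweighted mean of the per-unit ratios a * S**(beta - 1)."""
    return sum(a * s ** (beta - 1.0) for s in shoot) / len(shoot)

def weighted_mean_rs(shoot, a=0.5, beta=0.9):
    """Total root biomass divided by total shoot biomass."""
    total_root = sum(a * s ** beta for s in shoot)
    return total_root / sum(shoot)

shoot = [1.0, 10.0, 100.0, 1000.0]  # hypothetical shoot biomass per unit
print(arithmetic_mean_rs(shoot), weighted_mean_rs(shoot))
```

Because large units carry most of the shoot biomass and have the smallest ratios, multiplying total shoot biomass by the arithmetic mean gives more root biomass than summing the units directly; with `beta=1.0` the two means coincide.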
The upscaling approach using allometric equations should also tend to overestimate global root biomass (see Supplementary Information: Allometric upscaling section) due to the curvature of these allometric functions (Enquist and Niklas, 2002; Zens and Webb, 2002). The global forest root biomass ranges between 154 and 210 Pg when root biomass is upscaled through different allometric equations collected from the literature and fitted to our database (Supplementary Table 7), generally larger than from the RF mapping. The global root biomass is likely to be smaller than when applying the allometric equation to the spatial average of shoot biomass (Supplementary Figures 14, 15, 16, 17). Thus, future in-situ characterization of the distribution of tree sizes across the world's forests (see Supplementary Information: Allometric upscaling section) would greatly improve root biomass quantification. Note that how well our global estimate reflects the real root biomass depends on the accuracy of the in-situ root measurement database used to train our RF model. Under-sampling is a common issue in many root studies due to the fractal distribution of root systems in soils and the difficulty of implementing an efficient sampling strategy (Taylor et al., 2013), especially for large trees. We did not quantify the uncertainty of our estimates associated with in-situ root measurements due to the lack of reliable information.
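The curvature argument made at the start of this paragraph can be checked with a small example: for β < 1 the allometric function is concave, so applying it to the mean tree of a pixel and multiplying by the number of trees gives at least as much root biomass as summing over the actual trees. The tree sizes and parameters below are hypothetical.

```python
# Sketch: allometry applied to the mean tree vs. to every tree.
# Parameters and tree sizes are illustrative.

def allometric_root(shoot_kg, a=0.5, beta=0.9):
    return a * shoot_kg ** beta

def root_from_mean_tree(tree_shoots):
    """Apply the allometry to the mean shoot biomass, then scale by count."""
    mean_shoot = sum(tree_shoots) / len(tree_shoots)
    return len(tree_shoots) * allometric_root(mean_shoot)

def root_from_all_trees(tree_shoots):
    """Apply the allometry to each tree and sum."""
    return sum(allometric_root(s) for s in tree_shoots)

sizes = [1.0, 10.0, 100.0]  # kg shoot per tree, hypothetical
print(root_from_mean_tree(sizes), root_from_all_trees(sizes))
```

The gap between the two totals vanishes when all trees in the pixel have the same size, which is why information on tree size distributions matters for this kind of upscaling.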
An accurate spatially explicit global map of root biomass helps to improve our understanding of Earth system dynamics by facilitating fundamental studies on resource allocation, carbon storage, plant water uptake, nutrient acquisition and other aspects of biogeochemical cycles. For example, the close correlation (correlation coefficient: 0.8) between root biomass and rooting depth (Fan et al., 2017) at the global scale and the importance of roots for plant water uptake and transpiration reflect close interactions between vegetation and hydrological cycles. The quest for drivers that affect the allocation and consumption of photosynthetic production is a major focus of comparative plant ecology and evolution, as well as the basis of plant life history, ecological dynamics and global changes (McCarthy and Enquist, 2007). Turnover time and allocation are two key aspects that contribute to large uncertainties in current terrestrial biosphere model predictions (Bloom et al., 2016; Friend et al., 2014). Our root biomass map does not provide data on turnover or allocation, but rather an outcome of their aggregated effects. Future studies combining the root biomass map with upscaled root turnover data could shed light on the allocation puzzle. The growth of the fast-turnover part of roots, mostly fine roots, and that of leaves are closely linked. If we assume an annual turnover of leaves and fine roots, a preliminary estimation of average forest fine root biomass (from leaf biomass) reaches 6.7-7.7 Pg (see Supplementary Information: Preliminary estimation of fine root biomass). Despite being a small portion of total plant biomass and highly uncertain, fine roots are temporally variable and functionally critical in ecosystem dynamics. Future studies on the global distribution and temporal dynamics of fine roots would be valuable.
Considering specific biomes, tropical savannas would benefit from better root biomass estimation due to their large land area, and in tropical dry forests, field measurements of root and shoot biomass are needed to refine root biomass quantifications.

Competing interests
The authors declare that they have no conflict of interest.