A biomass equation dataset for common shrub species in China

Shrub biomass equations provide an accurate, efficient and convenient method for estimating the biomass of shrubland ecosystems and of the shrub layer in forest ecosystems at various spatial and temporal scales. In recent decades, many shrub biomass equations have been reported, mainly in journals, books and postgraduate dissertations. However, these equations apply to only a limited number of the many shrub species widely distributed in China, which has severely restricted studies of terrestrial ecosystem structure and function, such as biomass, production and carbon budget. Therefore, we first carried out a critical review of the published literature (1982 to 2019) on shrub biomass equations in China, and then developed biomass equations for the dominant shrub species using a unified method based on field measurements at 738 sites in shrubland ecosystems across China. Finally, we constructed the first comprehensive biomass equation dataset for China’s common shrub species. This dataset consists of 822 biomass equations specific to 167 shrub species and is highly representative of the geographical, climatic and shrubland vegetation features across China. The dataset is freely available at https://doi.org/10.11922/sciencedb.00641 (Wang et al., 2021) for non-commercial scientific applications. It fills a significant gap in woody biomass equations and provides key parameters for biomass estimation in studies of terrestrial ecosystem structure and function.

estimation is also partly attributed to insufficient shrub biomass equations in estimating the shrub layer biomass (Estornell et al., 2011;Lin et al., 2010).
Shrub biomass equations are quantitative relationships between the biomass of a whole individual or of its components (such as stems, branches, leaves and roots) and one or several dendrometric variables (such as shrub height, basal diameter and crown projection area) (Chojnacky et al., 2013; Lambert et al., 2005; Whittaker and Woodwell, 1968).
In recent decades, biomass equations for certain shrub species have been developed, primarily following the methods used in tree biomass equation research: harvesting samples in the growing season and establishing the optimal biomass equation through variable selection, model selection and precision evaluation (Chave et al., 2014; Jenkins et al., 2003; Ludwig et al., 1975). Representative studies have mainly focused on shrub species such as Ostryopsis davidiana, Spiraea pubescens, Rosa xanthina, Caragana korshinskii, Hedysarum scoparium and several species of Artemisia (Chen et al., 2002; Li et al., 2014; Zhang et al., 1993; Zhang, 1989).
However, because shrub species are diverse and widely distributed, shrub biomass estimation still faces severe problems. First, biomass equations are mostly limited to certain research areas and species (Hounzandji et al., 2015; Muukkonen, 2007; Zeng et al., 2015). Second, equation forms are diverse and lack a relatively unified method (Foroughbakhch et al., 2005; Picard et al., 2015). Third, only a few studies have verified the accuracy of the equations with independent field-measured data (Dong et al., 2012; Kozak and Kozak, 2003). It is therefore necessary to develop a biomass equation dataset for common shrub species in China using a relatively unified method based on a large number of field measurements (Cifuentes Jara et al., 2015), after collecting and screening biomass equations from published studies. In this research, we first scrutinized the literature on shrub biomass equations published in recent decades and collected shrub biomass equations of high quality. We then developed biomass equations for the dominant species of shrublands using a unified method, based on a large number of field-measured biomass data obtained from a national-scale shrubland ecosystem investigation. From these two sources, we constructed a comprehensive biomass equation dataset for common shrub species in China. The dataset covers broad geographical and climatic gradients, represents the common shrub communities across China, and facilitates woody biomass estimation for terrestrial ecosystems at large scales.

Equation collection and screening
Using the following criteria, we critically scrutinized the collected literature to obtain reliable biomass equations.

Scope
Natural and planted shrublands, as well as those formed through human disturbance (deforestation, overcutting of arbor species, forest fire, etc.), were all investigated in this study. Biomass equations for both total biomass and the biomass of different components were developed mainly for the dominant shrub species, but a few tree species with shrub-like architecture (dwarfed by long-term disturbance) were also included (Zhang et al., 2013). In addition, biomass equations for understory shrub species in forest ecosystems were collected and compiled.

Measurement method
Procedures of field investigation and biomass measurement followed a robust and unified method proposed in a technical specification for field investigation and laboratory analysis of carbon sequestration in shrub ecosystems (Xie and Tang, 2015), including plot setting, selection of sample shrubs, classification of morphological form and measurement of biomass (the oven-dried mass). Generally, plot areas were not smaller than 25 m², and at least ten sample shrubs were harvested and weighed to determine the biomass of each component (leaf, stem, branch, root, etc.). The division of shrub components is summarized in Fig. 1. Aboveground biomass mainly included three components (leaf, stem and branch), but in some cases, such as during the flowering and fruiting periods, flowers and fruits were also included.
Belowground biomass was determined by full excavation of the entire root system to avoid significant underestimation, although some loss of fine roots was inevitable during excavation.

Equation building
Predictor variables were not restricted, but equations had to be developed with robust regressions, explicit equation forms (e.g., power, linear and quadratic functions) and validation evaluations. If the differences (<0.05) in goodness-of-fit of the biomass equations, such as the coefficients of determination (R²), were small (e.g., |R²| ≤ 0.1) among all equation forms, the priority order for selection was power, linear and then quadratic equations. Higher-degree polynomial functions were excluded because they lack biological significance (i.e., they do not represent plant growth and development processes). In addition, equations developed from larger sample sizes were preferred, and equations with fewer and easier-to-measure predictor variables were given priority.

Quality checking
Shrub biomass equations differ greatly because of the long time span covered, the different investigation methods and the various methods used to build the equations. Some studies also contained human errors, such as printing errors or wrong records in figures and charts. Therefore, biomass equations considered for inclusion were checked or corrected with the following steps. First, if the original data were available, the dendrometric variables (e.g., height, basal diameter and crown projection area) of the sample shrubs were used to verify the biomass equations. Second, relative growth relationships between different components and biomass allocations (the percentages of biomass allocated to leaf, stem, branch and root, and the root-shoot ratio) served as important references in equation collection: they were expected to fall within reasonable ranges, allowing for differences among species and habitats.

Classification of shrub types
All shrubs were classified into three types according to the morphological characteristics (Xie and Tang, 2015). Shrubs with explicit and dispersed branch structure were defined as "Type A" shrubs (e.g. Cotinus coggygria). Shrubs with implicit and unkempt branch structure were defined as "Type B" shrubs (e.g. Potentilla fruticosa). Shrubs with implicit and clustered branch structure were defined as "Type C" shrubs (e.g. Sophora moorcroftiana).
In general, for both "Type A" and "Type C" shrubs, biomass could be estimated using biomass equations developed from the law of allometric growth. For "Type B" shrubs, however, the dendrometric variables were hard to measure accurately, so biomass was measured by destructive harvesting and weighing. Therefore, in this study we focused on developing biomass equations for "Type A" and "Type C" shrubs.

Equation creation
For "Type A" shrubs, the compound variable D²H (D, basal diameter in cm; H, shrub height in m) was used as the predictor, while for "Type C" shrubs, Ac (crown projection area, Ac = π(L1×L2)/4, where L1 is the longest axis of the shrub crown and L2 is the shorter axis perpendicular to it, both in m) or Vc (crown projection volume, Vc = Ac×H) was used.
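As a small illustration of the predictor variables defined above, the following Python sketch computes D²H, Ac and Vc from field measurements. The function names are hypothetical and are not part of the dataset or of the authors' workflow; only the formulas and units come from the text.

```python
import math


def d2h(basal_diameter_cm, height_m):
    """Compound predictor D^2 * H for "Type A" shrubs (D in cm, H in m)."""
    return basal_diameter_cm ** 2 * height_m


def crown_area(l1_m, l2_m):
    """Crown projection area Ac = pi * (L1 * L2) / 4, in m^2.

    l1_m is the longest crown axis and l2_m the shorter axis
    perpendicular to it, both in metres.
    """
    return math.pi * l1_m * l2_m / 4.0


def crown_volume(l1_m, l2_m, height_m):
    """Crown projection volume Vc = Ac * H, in m^3."""
    return crown_area(l1_m, l2_m) * height_m
```

For a shrub with D = 2 cm and H = 1.5 m, `d2h(2.0, 1.5)` gives 6.0; for a crown with L1 = 2 m and L2 = 1 m, `crown_area(2.0, 1.0)` gives π/2 m².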
A power equation was preferred because it reflects a natural law, namely the allometric growth relationships between related variables during plant growth and development. The growth relationship can be expressed as Eq. (1):

$$Y = aX^{b} \tag{1}$$

Its linear form, obtained through natural logarithmic transformation, is more commonly used, Eq. (2) (Baskerville, 1972):

$$\ln(Y) = \ln(a) + b\,\ln(X) \tag{2}$$

where Y is the dry weight of the shrub component to be estimated; X is the corresponding predictor; ln denotes the natural logarithm (base e); a is the constant of the regression equation (it enters Eq. (2) as ln(a)); and b is the scaling coefficient of the relative growth relationship.

Where the goodness-of-fit or prediction accuracy of the power equation did not meet the requirements, a linear equation was used, Eq. (3):

$$Y = a + bX \tag{3}$$

where Y is the dry weight of the shrub component to be estimated; X is the corresponding predictor; and a and b are the intercept and slope of the regression equation, respectively.

Biomass equations were fitted with linear regression analysis in R statistical software (R version 3.3.0), using the ordinary least squares (OLS) method. For power equations, a standard error correction factor (cf) was applied (Snowdon, 1991; Sprugel, 1983).
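The fitting step can be illustrated with a minimal Python sketch; it is not the authors' R code. It fits the power equation by OLS on the log-transformed variables and back-transforms with a correction factor. The text does not spell out the formula for cf, so the form used here, cf = exp(SEE²/2) with SEE² the residual variance of the log-log regression, is an assumption based on the form commonly attributed to Sprugel (1983). The function names are hypothetical.

```python
import numpy as np


def fit_power_equation(x, y):
    """Fit Y = a * X**b by OLS on ln(Y) = ln(a) + b * ln(X).

    Returns (a, b, cf). cf = exp(SEE^2 / 2) is an ASSUMED form of the
    correction factor; the source only cites Snowdon (1991) and
    Sprugel (1983) without giving the formula.
    """
    lx = np.log(np.asarray(x, dtype=float))
    ly = np.log(np.asarray(y, dtype=float))
    b, ln_a = np.polyfit(lx, ly, 1)            # slope, intercept
    resid = ly - (ln_a + b * lx)
    see2 = np.sum(resid ** 2) / (len(lx) - 2)  # residual variance, log units
    cf = np.exp(see2 / 2.0)
    return np.exp(ln_a), b, cf


def predict_biomass(x, a, b, cf=1.0):
    """Back-transformed prediction Y = cf * a * X**b."""
    return cf * a * np.asarray(x, dtype=float) ** b
```

The correction factor is applied because back-transforming a prediction made in log units tends to underestimate the arithmetic mean of Y; with noise-free data the residual variance is zero and cf equals 1.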

Equation evaluation
Equation evaluation covered both goodness-of-fit and the accuracy of future prediction. The regression equations and regression coefficients were first tested for significance. The adjusted R² (R²) or the fitness index (FI) was used to evaluate goodness-of-fit. A simple linear regression between predicted and field-measured values was fitted without an intercept, and the regression slope (b), R² and the relative error (RE) were used to evaluate prediction accuracy (Table 1).
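As a rough illustration of this evaluation procedure, the Python sketch below computes the three prediction statistics and averages them over repeated random 75%/25% splits of the kind described in the next paragraph. Several details are assumptions, because Table 1 is not reproduced here: the predicted values are regressed on the measured values, R² is computed with the uncentered total sum of squares (the convention R uses for models without an intercept), and RE is taken as the mean percentage deviation of predicted from measured values. The 10% independent hold-out and the correction factor are left out for brevity, and the function names are hypothetical.

```python
import numpy as np


def prediction_stats(measured, predicted):
    """Slope, R^2 and RE of the no-intercept regression predicted ~ measured.

    ASSUMPTIONS: regression direction, uncentered R^2 and RE as the
    mean relative deviation in percent (Table 1 is not available here).
    """
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    slope = np.sum(m * p) / np.sum(m * m)       # OLS through the origin
    ss_res = np.sum((p - slope * m) ** 2)
    r2 = 1.0 - ss_res / np.sum(p * p)           # uncentered R^2
    re = np.mean((p - m) / m) * 100.0           # relative error, %
    return slope, r2, re


def repeated_split_fit(x, y, n_iter=1000, train_frac=0.75, seed=0):
    """Average log-log OLS coefficients and prediction statistics
    over n_iter random train/test splits.

    Returns the means of (ln_a, b, slope, r2, re).
    """
    rng = np.random.default_rng(seed)
    lx = np.log(np.asarray(x, dtype=float))
    ly = np.log(np.asarray(y, dtype=float))
    n = len(lx)
    n_train = int(round(train_frac * n))
    results = []
    for _ in range(n_iter):
        idx = rng.permutation(n)
        train, test = idx[:n_train], idx[n_train:]
        b, ln_a = np.polyfit(lx[train], ly[train], 1)
        predicted = np.exp(ln_a + b * lx[test])
        slope, r2, re = prediction_stats(np.exp(ly[test]), predicted)
        results.append((ln_a, b, slope, r2, re))
    return np.mean(results, axis=0)
```

A slope close to 1 together with a small RE indicates that the equation neither systematically over- nor underestimates biomass on the samples withheld from fitting.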
In this study, we split the data into two parts (Picard and Cook, 1984): 10% of the sample shrubs of each species were randomly selected and used as an independent test dataset, and the remaining 90% of the samples were used for equation creation and evaluation. With the resampling methods (bootstrap and cross-validation), the samples for equation creation and evaluation were randomly allocated into two groups, and the sampling test was iterated 1000 times for each shrub species. In each iteration, 75% of the samples were used to fit the biomass equation and obtain the goodness-of-fit parameters, and the remaining 25% were used to analyze the prediction accuracy. The equation coefficients and the parameters of goodness-of-fit and prediction accuracy were the mean values of the corresponding results over the 1000 random tests. The optimal biomass equation was selected through a comprehensive analysis of goodness-of-fit and prediction accuracy. Finally, the independent test dataset was used to test the accuracy of the species-specific biomass equations in future prediction.

These compiled studies and equations varied greatly with shrubland types, stem forms and shrub species (https://doi.org/10.11922/sciencedb.00641, Wang et al., 2021). The studied shrublands were categorized into five types: deciduous broadleaved shrubland, evergreen broadleaved shrubland, evergreen coniferous shrubland, open shrubland and the understory shrub layer in forest ecosystems (Fig. 3). Note that the understory shrub layer was informally treated as a shrubland type to make the biomass equations in this dataset easier to retrieve and use. Among the five types, deciduous broadleaved shrubland had the most equations (71.5% of the total), followed by evergreen broadleaved shrubland (14.4%), open shrubland (6.9%), the understory shrub layer (6.3%) and evergreen coniferous shrubland (0.9%).
The ten most commonly studied species contributed 18.2% of biomass equations and six of them were dominant species in xeromorphic shrublands.
In previous studies root biomass was not always measured, so equations for roots were relatively few compared with those for the aboveground components. Equations for total biomass, aboveground biomass, stem biomass, current-year branch biomass, leaf biomass and root biomass accounted for 22.3%, 21.2%, 17.2%, 6.2%, 15.5% and 17.0% of the total 822 equations, respectively (Fig. 4). Only 0.7% of the equations were for other shrub organs, such as stem bark and fruit.
A small proportion (4.4%) of the total 822 equations did not specify the sample size (i.e., the number of sample shrubs used in developing the biomass equations). The sample size varied from 5 to 312 shrubs; the most common sample sizes were between 9 and 40 shrubs, accounting for 79.3% of the 786 equations with specified sample sizes.

The dataset is freely available at https://doi.org/10.11922/sciencedb.00641 (Wang et al., 2021) for non-commercial scientific applications, but the free availability of the dataset does not constitute permission to reproduce or publish it.

Conclusion
In this study, we developed the first biomass equation dataset for common shrub species in China. The dataset contains comprehensive background information and covers broad geographical, climatic and shrub vegetation gradients. It significantly expands and supplements existing woody biomass equation datasets, such as the biomass equation dataset for China's tree species (Luo et al., 2020), and thus fills an important gap in the estimation of woody biomass and the terrestrial ecosystem carbon budget.