Soil represents the largest phosphorus (P) stock in terrestrial
ecosystems. Determining the amount of soil P is a critical first step in
identifying sites where ecosystem functioning is potentially limited by soil
P availability. However, global patterns and predictors of soil total P
concentration remain poorly understood. To address this knowledge gap, we
constructed a database of total P concentration of 5275 globally
distributed (semi-)natural soils from 761 published studies. We quantified
the relative importance of 13 soil-forming variables in predicting soil
total P concentration and then made further predictions at the global scale
using a random forest approach. Soil total P concentration varied
significantly among parent material types, soil orders, biomes, and
continents and ranged widely from 1.4 to 9630.0 (median 430.0 and mean
570.0) mg kg
In terrestrial ecosystems, to a depth of 1 m from the land surface, most of the P is found in the soil (Zhang et al., 2021). The amount and form of P determine the supply of soil P to plants, which in turn regulates the structure and function of global terrestrial ecosystems (Vitousek et al., 2010; Hou et al., 2020; Elser et al., 2007; Hou et al., 2021). Moreover, the amount or total concentration of P in soils determines P concentration in all major forms in soils (Hou et al., 2018a; Turner and Engelbrecht, 2011). Therefore, it is important to determine the total concentration of P in soils, which varies by up to 3 orders of magnitude across the globe (Yanai, 1998; Augusto et al., 2010; Zhang et al., 2021). Despite the large variation in soil total P concentration, its global patterns and drivers remain poorly resolved, and closing this knowledge gap is needed to better represent the P cycle in Earth system models (Fleischer et al., 2019; Goll et al., 2017; Reed et al., 2015; Wang et al., 2015; Wieder et al., 2015; Zhang et al., 2011; Achat et al., 2016a).
Soil total P concentration is the outcome of climatic, biotic, and landscape processes interacting over time on soil parent material (Dokuchaev, 1883; Jenny, 1941; Buendía et al., 2010). Each of these factors may be characterized by a few variables; for example, climate may be characterized by mean annual temperature (MAT) and precipitation (MAP). Relationships between soil total P concentration and variables such as parent material type and P concentration, MAT, MAP, site slope, and soil organic carbon (SOC) have been reported in previous studies but mostly at local to regional scales (Brédoire et al., 2016; Cheng et al., 2018; Li et al., 2019; Porder and Chadwick, 2009; Wang et al., 2009). Few studies have quantified the relative importance of these variables for predicting soil total P concentration at a global scale (Delgado-Baquerizo et al., 2020; Augusto et al., 2017; Yang et al., 2013). Such an understanding can guide the management of the soil P supply in agroecosystems of different regions (Ringeval et al., 2017) and is crucial for both mapping soil total P concentration in natural terrestrial ecosystems (Reed et al., 2015) and simulating ecosystem functioning (Achat et al., 2016a).
While each soil-forming factor can determine soil total P concentration, the roles of some factors (e.g., climate and vegetation) are less understood than other factors (e.g., parent material and soil age). Since P in soil is derived mainly from parent materials, the control of parent material on soil total P concentration has been well recognized (Augusto et al., 2017; Porder and Ramachandran, 2013). Soil chronosequences provide a unique opportunity to isolate the effect of soil age from other soil-forming factors on soil P dynamics and have shown that soil age negatively impacts soil total P concentration (Wardle et al., 2004; Delgado-Baquerizo et al., 2020; Vitousek et al., 2010; Walker and Syers, 1976). Due to climate change, there is an increasing interest in how climate impacts soil total P concentration (Augusto et al., 2017; Vitousek and Chadwick, 2013; Hou et al., 2018a). Yet the effects of climate, vegetation, and topography on soil total P concentration remain largely unknown. Recently, Delgado-Baquerizo et al. (2020) surveyed 32 ecosystem properties, including soil total P concentration, in 16 soil chronosequences globally. They found that climate, vegetation, topography, and soil age together explained only about 60 % of the variation in soil total P concentration, despite examining 30 predictors and considering all possible interactions among predictors. This finding reflects our incomplete understanding of the controls of soil total P concentration.
Several pressing global issues, such as mitigating climate change, increasing food security, and reducing nutrient runoff to bodies of water, rely on accurate soil P maps (Alewell et al., 2020; Ringeval et al., 2017; Beusen et al., 2015; Wang et al., 2010). While several maps of soil total P concentration have been produced (Viscarra Rossel and Bui, 2016; Ballabio et al., 2019; Hengl et al., 2017a; Delmas et al., 2015), to our knowledge, there are only two published maps of soil total P concentration in natural terrestrial ecosystems (Shangguan et al., 2014; Yang et al., 2013). These two maps have been used to explore global patterns of soil P supply (Yang et al., 2013), to estimate P limitation on future terrestrial C sequestration (Sun et al., 2017), and as baseline information to quantify P supply in agricultural ecosystems (Ringeval et al., 2017). They are also used frequently in land surface models to benchmark soil P modules (Yang et al., 2014; Goll et al., 2012). However, the two maps may suffer from large uncertainties due to the limited numbers of predictors used and/or low spatial coverage of global soils. First, Yang et al. (2013) mapped soil total P concentration based only on parent material and soil chronosequence measurements, and the map by Shangguan et al. (2014) was based on a database that had poor coverage of many parts of the world (e.g., high latitudes, Africa, South America). Second, both maps only focus on the surface layers of soils, though subsoils are known to contribute to the P nutrition of plants and P leaching to groundwater (Rodionov et al., 2020; Andersson et al., 2013).
To address these issues, we constructed a global database of total P concentration of 5275 (semi-)natural soils from 761 published studies. We defined (semi-)natural ecosystems as ecosystems without any documented significant anthropogenic activities such as tillage, fertilization, and heavy grazing. We then used random forest algorithms to quantify the relative importance of soil-forming variables for predicting soil total P concentration and further predicted it at the global scale. In our predicted map, we did not remove cropland or other heavily influenced areas (e.g., cities and roads), so the predicted map represents a potential natural background without direct anthropogenic influence. With our enlarged dataset and our map of global soil P distribution, we addressed the following research questions: (1) which factors are the most important for predicting the spatial variation in soil total P concentration in the top 1 m of soil? (2) How does soil total P concentration differ among regions and soil layers? (3) How large is the global total P stock in the top 1 m of soil?
Given the very large number of soil total P concentration measurements in the literature, it is practically infeasible to collect them all. Therefore, we collected soil total P concentration measurements in (semi-)natural terrestrial ecosystems mainly from existing global or regional databases, as well as from literature with a focus on the underrepresented regions identified in global databases, to ensure a good coverage of global terrestrial ecosystems. We defined (semi-)natural ecosystems as ecosystems without any documented significant anthropogenic activities such as tillage, fertilization, and heavy grazing. Forests with a stand age greater than 10 years were considered (semi-)natural ecosystems. We carefully checked the description of soil sampling in every cited paper for any such anthropogenic activities and excluded the affected samples. Despite our efforts to exclude soils affected by anthropogenic activities, some soils in our database might be influenced by undocumented anthropogenic activities (e.g., P fertilization in reforested lands), particularly in western Europe and the eastern USA (e.g., De Schrijver et al., 2012). We compiled the database in four steps, which are described as follows.
First, we searched existing global or regional databases that may include
soil total P concentration measurements in (semi-)natural ecosystems in the
Web of Science using keywords “global OR terrestrial OR meta-analysis”
AND “soil phosphorus” NOT “crop OR agriculture” in the topic search field. This search
returned 714 papers up to 15 September 2020. After excluding site-level
studies and studies with artificial treatments (e.g., treatment with
fertilizer, elevated temperature, or elevated CO
Second, we used “soil phosphorus” as keywords to search global or regional
databases stored in public data repositories on 10 October 2020,
including Figshare (
Third, we included 1693 measurements of soil total P concentration in a global database of soil extractable P concentration (Hou et al., unpublished) and 262 measurements of soil total P concentration in a global database of soil P fractions (He et al., unpublished). Original data sources of the two databases are given in Text S1 in the Supplement. After step 3, we combined measurements collected in steps 1–3 and deleted 22 duplicated ones (i.e., measurements with the same site coordinates and soil total P concentration), resulting in a total of 4734 site-level measurements of soil total P concentration from the 11 databases listed in Table S1.
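The duplicate-removal rule described above (same site coordinates and same soil total P concentration) can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the `Measurement` record, its field names, and the example values are hypothetical, and exact equality of coordinates and values is assumed as the matching criterion.

```python
from typing import NamedTuple

class Measurement(NamedTuple):
    # Hypothetical record layout; the actual database columns may differ.
    lat: float
    lon: float
    total_p: float  # soil total P concentration, mg kg-1
    source: str

def drop_duplicates(records):
    """Keep the first record for each (lat, lon, total_p) combination."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec.lat, rec.lon, rec.total_p)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Made-up example values, for illustration only.
records = [
    Measurement(45.10, 7.50, 612.0, "database_A"),
    Measurement(45.10, 7.50, 612.0, "database_B"),  # same site and same value
    Measurement(45.10, 7.50, 580.0, "database_B"),  # same site, different value
    Measurement(-3.20, 60.00, 210.0, "database_C"),
]
unique = drop_duplicates(records)
print(len(records) - len(unique), "duplicate(s) removed")
```

Matching on both coordinates and concentration keeps repeated measurements at one site (for example, different depths with different values) while dropping records that were copied between source databases.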
Fourth, we searched the Web of Science for additional soil total P concentration measurements from the underrepresented regions identified in steps 1–3, using “soil phosphorus” together with keywords for each underrepresented region (listed in detail in Table S2). According to the criteria above, we only collected soil total P concentration measurements in (semi-)natural terrestrial ecosystems. In this step, we collected 541 additional site-level measurements of soil total P concentration from 60 additional papers (Table S2, Text S1).
Following these steps, our database included 5275 measurements of soil
total P concentration at 1894 sites from 761 studies (Text S1
and Fig. S1), with 4536 measurements in the top 30 cm of soil and 739 measurements in
deeper soil (depth
Summary of training data used to predict soil total P concentration. P10 and P90 indicate the percentile ranks of 10 % and 90 %.
MAT: mean annual temperature; MAP: mean annual precipitation; SOC: soil
organic carbon; NPP: net primary production.
Soil total P concentration is thought to be influenced by five soil-forming factors, which are parent material, climate, vegetation productivity, topography, and soil age (Delgado-Baquerizo et al., 2020; Jenny, 1941; Dokuchaev, 1883). Four of the five factors were directly considered here (Table 1): parent material, climate (i.e., mean annual temperature (MAT), mean annual precipitation (MAP), and biome), vegetation (i.e., net primary production (NPP)), and topography (e.g., slope and elevation). As soil age was rarely reported, we used USDA (United States Department of Agriculture) soil orders as a proxy for age with three classes: slightly, intermediately, and strongly weathered (Yang et al., 2013; Smeck, 1985). Among the 12 USDA soil orders, Entisols, Inceptisols, Histosols, Andisols, and Gelisols are classified as slightly weathered soils. Alfisols, Mollisols, Aridisols, and Vertisols are classified as intermediately weathered soils. Oxisols, Ultisols, and Spodosols are classified as strongly weathered soils (Yang et al., 2013; Smeck, 1985). Moreover, we have classified each soil in our database according to soil types of the World Reference Base for Soil Resources (WRB) (Table S3). We extracted the WRB soil type of each site from a global WRB soil type map (Hengl et al., 2017b) based on geographical coordinates.
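The grouping of the 12 USDA soil orders into three weathering classes, used here as a proxy for soil age, can be written as a simple lookup. The sketch below is in Python for illustration only; the names `WEATHERING_CLASSES`, `ORDER_TO_CLASS`, and `weathering_class` are hypothetical, while the class membership follows the text above.

```python
# Weathering classes of the 12 USDA soil orders, as listed in the text.
WEATHERING_CLASSES = {
    "slightly": ["Entisols", "Inceptisols", "Histosols", "Andisols", "Gelisols"],
    "intermediately": ["Alfisols", "Mollisols", "Aridisols", "Vertisols"],
    "strongly": ["Oxisols", "Ultisols", "Spodosols"],
}

# Invert to a soil order -> class lookup table.
ORDER_TO_CLASS = {
    order: cls
    for cls, orders in WEATHERING_CLASSES.items()
    for order in orders
}

def weathering_class(soil_order: str) -> str:
    """Return the weathering class for a USDA soil order name."""
    return ORDER_TO_CLASS[soil_order.strip().capitalize()]

print(weathering_class("Oxisols"))   # strongly
print(weathering_class("gelisols"))  # slightly
```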
In addition to predictors of soil total P concentration related to soil-forming factors, we collected information about the properties of the soils (e.g., soil organic carbon (SOC), soil pH, soil clay content (Clay) and soil sand content (Sand), and soil depth (Depth); Table 1). These soil properties were used as additional predictors. We extracted predictors from each original publication when available. In cases where information on predictors was not reported, we extracted the missing data from gridded datasets (Table S3) based on the geographical coordinates of the measurement sites.
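The fallback from reported values to gridded datasets can be sketched as below. This is an illustrative Python example under stated assumptions: the grid is a regular latitude-longitude array with row 0 at 90° N and column 0 at 180° W, and the function names, the toy grid, and its resolution are all hypothetical. The actual gridded products and extraction tools used in the study are those listed in Table S3.

```python
import math

def cell_index(lat, lon, res):
    """Row and column of the grid cell containing (lat, lon).

    Assumes a regular grid with row 0 at 90 N and column 0 at 180 W.
    """
    n_rows = round(180.0 / res)
    n_cols = round(360.0 / res)
    row = math.floor((90.0 - lat) / res)
    col = math.floor((lon + 180.0) / res)
    # Clamp so that lat = -90 or lon = 180 fall into the last cell.
    return min(max(row, 0), n_rows - 1), min(max(col, 0), n_cols - 1)

def fill_predictor(reported, lat, lon, grid, res):
    """Prefer the value reported in the publication; otherwise use the grid."""
    if reported is not None:
        return reported
    row, col = cell_index(lat, lon, res)
    return grid[row][col]

# Toy 2 x 4 grid of soil pH at 90-degree resolution (made-up values).
ph_grid = [
    [5.1, 5.2, 5.3, 5.4],
    [6.1, 6.2, 6.3, 6.4],
]
print(fill_predictor(4.8, 45.0, 10.0, ph_grid, 90.0))      # reported value kept
print(fill_predictor(None, 45.0, 10.0, ph_grid, 90.0))     # taken from the grid
print(fill_predictor(None, -10.0, -100.0, ph_grid, 90.0))  # taken from the grid
```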
In the random forest model, correlated predictors can be substituted for
each other so that the importance of correlated predictors will be shared,
making the estimated importance smaller than the true value (Strobl et
al., 2008). Thus, we did not include soil total nitrogen content as it is
correlated with SOC (
Among the 5275 soil total P measurements, there were 15 extremely high
values (
The distribution of our site-level training data. The database
contains 5275 observations
Soil total P concentration (mg kg
Soil total P concentration (mg kg
We compared a suite of algorithms using the aforementioned 13 predictors,
including three generalized linear models, a cubist model, a boosted tree
model, and a random forest model (Table S4). Model performance was compared in
terms of
Finally, we applied the above trained model to global databases of the 13
predictors to generate a global map of soil total P concentration. The
gridded driver variables used for the global prediction were all re-gridded
to a spatial resolution of 0.05
Soil depth was used as a covariate so that the models could predict soil
total P concentration for any given depth (Hengl et al., 2017b). The
partial dependence plot indicated that soil total P concentration
approximately linearly decreased with soil depth in the top 30 cm and there
was no apparent trend with depth in the subsoil (
The prediction uncertainty of each cell in the global gridded map was assessed
using bootstrap samples with the quantile regression forest technique
(Meinshausen, 2006). Standard deviation was calculated to represent the
uncertainty using the quantregForest function in the “quantregForest” R package (Meinshausen, 2017).
Individual predictions of each tree in the random forest model (
All statistical analyses and plotting were performed in the R environment (v. 4.0.2) (R Core Team, 2018).
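The idea behind the per-cell uncertainty estimate described above can be illustrated with a simplified stand-in: treat the predictions of the individual trees for one grid cell as a sample and summarize their spread with a standard deviation. The Python sketch below is not the quantregForest procedure itself, and the per-tree values are made up for illustration.

```python
import statistics

def cell_uncertainty(tree_predictions):
    """Mean and sample standard deviation of per-tree predictions for one cell."""
    mean = statistics.fmean(tree_predictions)
    sd = statistics.stdev(tree_predictions)
    return mean, sd

# Hypothetical per-tree predictions (mg kg-1) for two grid cells.
cells = {
    "cell_a": [480.0, 510.0, 495.0, 505.0, 510.0],  # trees agree closely
    "cell_b": [300.0, 700.0, 450.0, 620.0, 430.0],  # trees disagree
}
results = {name: cell_uncertainty(preds) for name, preds in cells.items()}
for name, (mean, sd) in results.items():
    print(f"{name}: mean = {mean:.1f}, sd = {sd:.1f} mg kg-1")
```

Cells where the trees disagree strongly, as is typical for regions far from any training site, receive a larger standard deviation.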
Our soil total P concentration database included 5275 measurements from
1894 geographically distinct sites and covered six continents, all major
biomes, and all 12 USDA soil orders in terrestrial ecosystems (Fig. 1a–d
and Table S5). The database was highly right-skewed (Fig. 1b) and revealed
that the soil total P concentration in natural soils of terrestrial
ecosystems varied from 1.4 to 9636.0 mg kg
The database revealed that soil total P concentration varied within and
among biomes. The soil from tundra and boreal biomes had the highest soil
total P concentrations. Mediterranean and temperate soils had intermediate
soil total P concentrations. Soils in the desert and tropics had relatively
lower soil total P concentrations (Table 2 and Fig. 2b). Soil total P
concentration also varied with different soil orders (Table 3). Soil total P
concentration decreased from slightly weathered soil (mean value 719.4 mg kg
Soil total P concentration in relation to parent material, biome, and soil weathering extent. For visualization, we chose to limit the
The random forest regression model explained 65 % of soil total P
concentration variability across all sites, with an RMSE of 288.8 mg kg
Results of the random forest model predicting soil total P
concentration.
Partial dependence plots showing the dependence of soil total P
concentration on predictors. Soil total P concentration in relation to
In our predicted global map (Fig. 5), we did not remove cropland or other
heavily influenced areas (e.g., cities and roads), so the predicted map can
be used to represent a natural background without direct anthropogenic
influence. The predicted soil total P indicated that the total global P
stocks in the topsoil (0–30 cm) and subsoil (30–100 cm) were 26.8 (standard deviation 3.1) Pg and 62.2 (standard deviation 8.9) Pg, respectively (excluding Antarctica; Table 4). Estimated area-weighted average soil total P concentrations in the topsoil and subsoil were 529.0 and 502.3 mg kg
Global maps of total P concentration in the 0–30 and 30–100 cm layers of
soils. Panels
Analysis of the predicted global map of soil total P. The area-weighted average soil total P concentration was calculated based on our predicted map. Soil total P concentration was converted to soil total P content and stock using soil bulk density (Hengl et al., 2017b) and land area.
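The two calculations named in this caption can be sketched as follows. This Python example assumes that the P content per unit area of a layer is concentration × bulk density × layer thickness, and that grid-cell area on a regular latitude-longitude grid is proportional to the cosine of latitude. The function names and all input values are hypothetical and are not taken from the study.

```python
import math

def p_content_kg_per_m2(conc_mg_per_kg, bulk_density_kg_per_m3, thickness_m):
    """P content of a soil layer per unit area (kg P m-2).

    mg kg-1 is converted to kg kg-1 (factor 1e-6), then multiplied by the
    soil mass per unit area (bulk density x thickness).
    """
    return conc_mg_per_kg * 1e-6 * bulk_density_kg_per_m3 * thickness_m

def area_weighted_mean(values, latitudes_deg):
    """Mean over grid cells weighted by cos(latitude), a proxy for cell area."""
    weights = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Illustrative inputs only: 500 mg kg-1, 1300 kg m-3, 0-30 cm layer.
topsoil = p_content_kg_per_m2(500.0, 1300.0, 0.30)
print(f"{topsoil:.3f} kg P m-2 = {topsoil * 1e4:.0f} kg P ha-1")

# Two cells: one at 60 degrees latitude (smaller area), one at the Equator.
mean_conc = area_weighted_mean([800.0, 400.0], [60.0, 0.0])
print(f"area-weighted mean: {mean_conc:.1f} mg kg-1")
```

Multiplying the per-area content by the land area of each cell and summing over all cells would then give a stock in kg, which can be expressed in Pg (1 Pg = 1e12 kg).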
The estimated global map of soil total P concentration revealed latitudinal
patterns (Fig. 5), which were also found from analysis of the site-level
data (Fig. S4k). Soil total P concentration significantly increased from the
Equator to the poles in both hemispheres (
Highlands and mountains at low latitudes (e.g., the Tibetan Plateau, the Andes, the African highlands, western India) had high soil total P concentrations. Our map also indicated some regional differences in soil total P; for example, central Australia was low in soil total P compared with eastern and western Australia. On a larger scale, South America, Oceania, and Africa had the lowest soil total P concentrations, while soil total P concentration was highest in Europe, North America, and Asia (Table 4). The estimated soil total P concentrations in the subsoil showed similar patterns to those found in the topsoil (Fig. 5a, c).
With our soil total P concentration dataset, we quantified soil total P concentration in natural ecosystems, identified its key drivers, and predicted it for terrestrial ecosystems globally. Our work goes beyond previous studies (Delmas et al., 2015; Hengl et al., 2017a; Shangguan et al., 2014; Viscarra Rossel and Bui, 2016; Yang et al., 2013; Cheng et al., 2016), which used limited data that did not represent the heterogeneous conditions found on Earth well and did not separate natural soils from human-managed soils and therefore may not be able to distinguish natural drivers from anthropogenic factors (e.g., land use type, mineral fertilizer). In addition, we mapped soil total P concentration by considering more predictors and multiple soil depths.
Given the larger number of measurements that we considered, the range of
total P concentration in our study (1.4–9630.0 mg kg
In agreement with previous studies, soil total P concentration was largely predicted by parent material type (Deiss et al., 2018; Augusto et al., 2017; Porder and Ramachandran, 2013). This result supports the use of parent material to map soil total P concentration at the global scale (Yang et al., 2013). Parent material can affect soil total P concentration both directly and indirectly. Some parent materials tend to have higher P concentrations, which then translate into higher soil total P (Mage and Porder, 2013; Dieter et al., 2010; Kitayama et al., 2000). Parent material also affects soil total P indirectly via its influence on soil physicochemical properties such as soil texture, pH, and Al and Fe oxides (Siqueira et al., 2021; Mehmood et al., 2018). For example, the retention of P in soil can be influenced by the soil content of clay, soluble calcium, and Fe oxyhydroxides (Delgado-Baquerizo et al., 2020; Mehmood et al., 2018; Achat et al., 2016b). As such, parent material type is a critical predictor of soil total P from local to global scales.
Interestingly, we found that SOC was one of the two most important predictors of soil total P concentration. The positive relationship between soil total P and SOC has two possible explanations. First, this relationship may reflect the coupling between P and C in soils (Hou et al., 2018a) given that soil organic matter is characterized by a rather narrow range of C : P ratio values (Cleveland and Liptzin, 2007; Spohn, 2020; Tipping et al., 2016). Second, P and organic C are stabilized and retained through similar processes in soil (Doetterl et al., 2015). For example, reactive minerals can simultaneously stabilize both P and organic C in soil (Helfenstein et al., 2018). As such, the strong relationship between SOC and total P at the global scale suggests that SOC is an integrated measure of biotic (e.g., soil microbial activity) and abiotic (e.g., cation exchange capacity) factors that regulate soil total P (Spohn, 2020; Wang et al., 2020).
Consistent with a recent global synthesis that focused on soil P fractions (Hou et al., 2018a), our results indicated that MAT was a more important predictor of soil total P concentration than MAP. The negative relationship between MAT and soil total P concentration may arise because soils under low MAT are often found at high latitudes, where soils were eroded during the last glaciation. These soils tend to be much younger than soils at low latitudes with high MAT and thus have experienced less loss of P (Vitousek et al., 2010). In addition, high MAT and MAP generally promote soil weathering as well as plant growth and P uptake, resulting in the depletion of soil P (Huston and Wolverton, 2009; Arenberg and Arai, 2019; Huston, 2012).
Further, we provide two explanations for the negative relationship between
soil total P concentration and sand content. First, soil sand content is a
surrogate for quartz content (Bui and Henderson, 2013), and the quartz
content of siliceous rocks is usually negatively correlated with their total
P content (Hahm et al., 2014; Vitousek et al., 2010). Second, sand retains
nutrients, including P, less effectively than other soil fractions
(Augusto et al., 2017). For example, loamy soils regularly lose 0.3–0.5 kg P ha
Based on our predicted global map, we estimated that the area-weighted
global average of soil total P concentration was 529.0 and 502.3 mg kg
Our predicted soil total P concentrations decreased significantly with decreasing latitude in both hemispheres. This result is consistent with our theoretical understanding of the evolution of soils in soil chronosequences (Walker and Syers, 1976) and with the stark differences in soil age and weathering intensity between low- and high-latitude regions. It also agrees with a recent meta-analysis which revealed that P limitation to plant growth decreased significantly with latitude (Hou et al., 2021). Lowland tropical soils tend to be more weathered than soils at high latitudes because a warmer and more humid climate promotes weathering (Hou et al., 2018a). Moreover, the last glaciation could have eroded soils at high northern latitudes, leaving relatively young and P-enriched soils (Vitousek et al., 2010; Reich and Oleksyn, 2004). Our result is consistent with Xu et al. (2013); by comparing soil total P concentration across the major biomes, the authors found the highest soil total P concentration in the tundra and the lowest in the tropical and subtropical forest. Previous global maps of soil total P concentration were not able to capture the latitudinal trend of soil total P concentrations (Yang et al., 2013; Shangguan et al., 2014), likely due to the poorer spatial coverage of their measurements. For example, their measurements were mostly from the USA and China, with a very small proportion of measurements from high latitudes.
While we found a latitudinal gradient in soil total P concentration,
heterogeneity in soil total P concentration at the regional and local scales
was large. For example, consistent with Brédoire et al. (2016), we found
that the soil total P concentration was higher in Siberia than in northern
Europe, both of which have similar latitudes. First, this difference may be
due to the fact that glaciation was more regular and intense in Siberia than
in northern Europe (Wassen et al., 2021), leading to a more intensive
rejuvenation of soils. Second, the warmer and wetter climate in northern
Europe may promote weathering which releases P from parent material (Goll
et al., 2014) and makes it subject to loss (Fig. S8). Regional variation in
soil total P concentration may also be attributable to regional variation in
parent material. For example, higher soil total P concentration in eastern
Australia than in central Australia was probably due to P-enriched basaltic
lithologies in eastern Australia (6500–8700 mg kg
Despite our unprecedented effort to construct a database and perform global
predictions, our study has some limitations. First, some regions were still
underrepresented, e.g., northern Canada, Russia, central Asia, and the interior
of Australia, which may result in low accuracy of predicted values in these
regions (Ploton et al., 2020). Further, our assumption that soils which
are or have been in agricultural use can be characterized in their native
state by the same relationships as (semi-)natural soils might not hold true.
For example, fertile soils are preferentially selected for agriculture and forestry.
Second, subsoils (
Raw datasets, R code, and global maps generated in this study are available at
By constructing a global database of soil total P concentration, we quantified the relative importance of multiple soil-forming variables for predicting soil total P concentration and further estimated it at the global scale. Our results indicated that no single variable can be used to predict soil total P concentration. Instead, a combination of variables is needed to reliably predict soil total P concentration, among which SOC, parent material, MAT, and soil sand content are the most important predictors. Soil total P concentration was positively correlated with SOC and negatively correlated with both MAT and soil sand content. Our predicted map captures the latitudinal gradient in potential soil total P concentration expected from our theoretical understanding. We estimated that P stocks in the topsoil (0–30 cm) and subsoil (30–100 cm) of natural ecosystems (excluding Antarctica) were 26.8 and 62.2 Pg, respectively. Our improved global map of soil total P will be an important resource for future work which aims to tackle issues related to P cycling, including predicting future land carbon sink potential and P losses to aquatic and marine ecosystems as well as modeling the P needs of crops to increase food security.
The supplement related to this article is available online at:
XH and EH designed this study. XH, EH, LA, ZW, and YY collected the data. XH, EH, LA, DSG, BR, YW, JH, YH, and KY discussed the analysis methods. XH conducted the analysis and drafted the manuscript. All authors discussed the results and contributed to the manuscript.
The contact author has declared that neither they nor their co-authors have any competing interests.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We would like to thank Morgan Furze at Yale University for his assistance with English language and grammatical editing.
This research has been supported by the National Natural Science Foundation of China (grant nos. 32071652 and 31870464), the China Postdoctoral Science Foundation (grant no. 2020M673123), the Chongqing Technology Innovation and Application Demonstration Major Theme Special Project (grant no. cstc2018jszx-zdyfxmX0007), and the ANR CLAND Convergence Institute.
This paper was edited by Martin Schultz and reviewed by Christine Alewell and Tom Bruulsema.