the Creative Commons Attribution 4.0 License.
A global database of soil microbial communities and associated climate, soil and vegetation factors
Abstract. Few scholars have compiled databases of soil microbial communities and associated climate, soil and vegetation factors at the global scale. However, many studies involving high-throughput sequencing of soil bacteria and fungi have been published in the past decade. In this study, we constructed a global database of soil microbial communities and the associated climate, soil and vegetation factors, with sites on each of the seven continents and in eleven ecosystem types. The database contains 8490 sets of soil bacterial and fungal community data for the different treatments and study sites. Soil bacterial and fungal diversities were highly variable across ecosystems. There was a highly significant (R2 = 0.4037, P < 0.001) linear regression relationship between the fungal and bacterial Shannon indices. Proteobacteria and Ascomycota were the most species-rich bacterial and fungal phyla, respectively, in most ecosystems. The median relative abundances of Proteobacteria and Ascomycota were 29.30 % and 57.49 %, respectively. The information in the database (e.g., site names and ecosystem types) enables researchers to investigate where the most abundant bacterial or fungal phylum is located and whether the ecosystem type affects bacterial and fungal diversities and compositions at the global scale. We anticipate that this database can be further improved by adding more detailed information, such as bacterial and fungal compositions at the class, order, family, and genus levels. The database is available via Zenodo at https://doi.org/10.5281/zenodo.16195889 (Chen et al., 2025).
Status: open (until 08 Dec 2025)
RC1: 'Comment on essd-2025-501', Anonymous Referee #1, 07 Nov 2025
AC1: 'Reply on RC1', Shutao Chen, 10 Nov 2025
We greatly thank Reviewer 1 for the important comments. Our main responses are detailed below.
Point 1: Microbial data, especially diversity, is heavily influenced by sequencing depth. Did the authors organize the raw sequencing results from these datasets to check if the diversity was subjected to rarefaction during calculation, in order to enhance the comparability of data across different studies?
Authors' response: We have added a regression analysis to the revised manuscript to examine the relationships between bacterial and fungal OTU richness and soil depth. The relationship between bacterial OTU richness and soil depth could be fitted with an exponential function (R2 = 0.011, P < 0.001). No significant correlation between fungal OTU richness and soil depth was found (R2 = 0.0002, P > 0.05). In our database, the microbial diversity at different depths was associated with the compiled soil properties. Therefore, the rarefaction during calculation can be represented by the vertical variations in soil properties.
Point 2: Although there are now many available data points globally, some regions still lack usable data (as shown in Figure 1 of the paper). As a dataset, users may wish to obtain data for regions without experimental points by using certain methods. Have the authors considered using geostatistical methods to extrapolate the data from the existing points, along with the corresponding soil or climatic characteristics, to cover all continents?
Authors' response: We investigated the key environmental drivers of the spatial and temporal variations in bacterial and fungal diversities, as these potential drivers may contribute to modeling microbial diversity. However, the environmental drivers exhibited weak correlations with microbial diversity (Table 2), which may be caused by the great variability in microbial communities across ecosystem types. Our study demonstrates the complexity of the spatial and temporal variations in soil microbial communities, which increases the difficulty of accurately simulating soil microbial communities using geostatistical methods.
Point 3: The authors frequently refer to ecosystem functions in the paper. While amplicon sequencing can predict microbial functions through certain software (such as PICRUSt2), due to the limitations of these technologies, their application in ecological research remains challenging. Recently, shotgun sequencing methods to analyze the entire genetic information of soil microbes have been extensively documented in the literature. Did the authors consider supplementing the existing dataset with this approach?
Authors' response: In this study, we focused on soil microbial community compositions and diversities. Shotgun sequencing involves fragmenting a genome into small, overlapping DNA segments, sequencing them in parallel, and using computational assembly to reconstruct the original sequence. High-throughput sequencing enables parallel sequencing of millions to billions of DNA fragments, which is more suitable for investigating soil microbial community compositions and diversities than shotgun sequencing methods.
Point 4: Microbial community composition or diversity can vary significantly across different seasons or crop growth periods. Did the authors consider the impact of seasonal temperature and humidity, or specific meteorological events, on microbial communities in their data analysis?
Authors' response: Seasonal temperature and humidity may affect microbial community compositions and diversities. However, it is hard to determine the effects of growing season length when dealing with seasonal temperature and humidity. Moreover, the compiled literature contains little information on specific meteorological events that may have influenced the soil microbial community on the sampling date. In this study, we considered that soil microbial community compositions and diversities depended on the cumulative effects of long-term precipitation (i.e., mean annual precipitation) and temperature (i.e., mean annual temperature). In addition, we compiled soil temperature and moisture data on the sampling date to explore the short-term effects of micrometeorological factors on the soil microbial community (Tables 1 and 2).
Once again, thank you for your kind comments and suggestions.
Citation: https://doi.org/10.5194/essd-2025-501-AC1
Data sets
A global database of soil microbial communities and associated climate, soil and vegetation factors Shutao Chen et al. https://doi.org/10.5281/zenodo.16195889
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 225 | 25 | 12 | 262 | 45 | 16 | 14 |
With the advancement of sequencing technologies, it has become possible to collect global microbial community data to form databases that can serve soil biological scientists or evaluate the stability of global soil health and ecosystems under future climate change scenarios. The authors of this paper have compiled and organized a large amount of data through a literature review, resulting in a dataset on the bacterial and fungal community composition across different continents and ecosystems. The results are highly significant. However, I still have several concerns regarding the data collection process, which include the following points:
Microbial data, especially diversity, is heavily influenced by sequencing depth. Did the authors organize the raw sequencing results from these datasets to check if the diversity was subjected to rarefaction during calculation, in order to enhance the comparability of data across different studies?
Although there are now many available data points globally, some regions still lack usable data (as shown in Figure 1 of the paper). As a dataset, users may wish to obtain data for regions without experimental points by using certain methods. Have the authors considered using geostatistical methods to extrapolate the data from the existing points, along with the corresponding soil or climatic characteristics, to cover all continents?
The authors frequently refer to ecosystem functions in the paper. While amplicon sequencing can predict microbial functions through certain software (such as PICRUSt2), due to the limitations of these technologies, their application in ecological research remains challenging. Recently, shotgun sequencing methods to analyze the entire genetic information of soil microbes have been extensively documented in the literature. Did the authors consider supplementing the existing dataset with this approach?
Microbial community composition or diversity can vary significantly across different seasons or crop growth periods. Did the authors consider the impact of seasonal temperature and humidity, or specific meteorological events, on microbial communities in their data analysis?
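The first concern above, that diversity depends on sequencing depth, can be illustrated with a minimal sketch. It is not the authors' workflow: the two count vectors are hypothetical toy samples, and `rarefy` and `shannon` are illustrative helper names. The sketch subsamples both samples to a common read depth without replacement and compares the Shannon index before and after.

```python
import math
import random
from collections import Counter


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from per-taxon read counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one taxon label per read, then draw reads.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    drawn = Counter(rng.sample(reads, depth))
    return [drawn.get(taxon, 0) for taxon in range(len(counts))]


# Hypothetical toy samples (not taken from the database):
# the deeper sample detects rare taxa that the shallow one cannot.
shallow = [50, 30, 15, 5]                    # 100 reads
deep = [5000, 3000, 1500, 490, 4, 3, 2, 1]   # 10000 reads

rng = random.Random(0)  # fixed seed so the subsampling is repeatable
common_depth = min(sum(shallow), sum(deep))

for name, sample in [("shallow", shallow), ("deep", deep)]:
    rarefied = rarefy(sample, common_depth, rng)
    print(name, "raw:", round(shannon(sample), 3),
          "rarefied:", round(shannon(rarefied), 3))
```

Rarefying to the smallest library size is only one common choice. Whether the compiled studies applied it, and at what depth, would have to be checked study by study.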