the Creative Commons Attribution 4.0 License.
A global database of soil microbial communities and associated climate, soil and vegetation factors
Abstract. Few scholars have compiled databases of soil microbial communities and associated climate, soil and vegetation factors at the global scale. However, many studies involving high-throughput sequencing of soil bacteria and fungi have been published in the past decade. In this study, we constructed a global database of the soil microbial communities and the associated climate, soil and vegetation factors, with sites on each of the seven continents and eleven ecosystem types. There were 8490 sets of soil bacterial and fungal community data for the different treatments and study sites in the database. Soil bacterial and fungal diversities were highly variable across various ecosystems. There was a highly significant (R2 = 0.4037, P < 0.001) linear regression relationship between the fungal and bacterial Shannon indices. Proteobacteria and Ascomycota were the most species-rich bacterial and fungal phyla, respectively, in most ecosystems. The median relative abundances of Proteobacteria and Ascomycota were 29.30 % and 57.49 %, respectively. The information (e.g., site names and ecosystem types) in the database enabled researchers to investigate where the most abundant bacterial or fungal phylum was located and whether the ecosystem type affected bacterial and fungal diversities and compositions at the global scale. We anticipated that this database could be further improved by adding more detailed information, such as bacterial and fungal compositions at the class, order, family, and genus levels. The database was available via Zenodo at https://doi.org/10.5281/zenodo.16195889 (Chen et al., 2025).
Status: final response (author comments only)
-
RC1: 'Comment on essd-2025-501', Anonymous Referee #1, 07 Nov 2025
-
AC1: 'Reply on RC1', Shutao Chen, 10 Nov 2025
We greatly thank Reviewer 1 for the important comments. The main responses are detailed below.
Point 1: Microbial data, especially diversity, is heavily influenced by sequencing depth. Did the authors organize the raw sequencing results from these datasets to check if the diversity was subjected to rarefaction during calculation, in order to enhance the comparability of data across different studies?
Authors′ response: We have added a regression analysis in the revised manuscript to examine the relationships between the bacterial and fungal OTU richness values and soil depth. The results showed that the relationship between bacterial OTU richness and soil depth could be described by an exponential function (R2 = 0.011, P < 0.001). No significant correlation (R2 = 0.0002, P > 0.05) was found between fungal OTU richness and soil depth. In our database, the microbial diversity at different depths was associated with the compiled soil properties. Therefore, the rarefaction during calculation can be represented by the vertical variations in soil properties.
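As a point of reference for the term used in Point 1, rarefaction usually means subsampling every sample to a common number of sequencing reads before diversity is calculated, so that richness is not inflated in more deeply sequenced samples. The following is a minimal sketch of that idea, not the procedure used for the database; the function names `rarefy` and `richness` and all OTU counts are hypothetical.

```python
import random


def rarefy(otu_counts, depth, seed=0):
    """Subsample reads without replacement to a fixed depth.

    otu_counts maps an OTU label to its read count in one sample.
    Returns a new dict of counts whose values sum to `depth`.
    """
    # Expand the count table into one entry per read.
    reads = [otu for otu, count in otu_counts.items() for _ in range(count)]
    if depth > len(reads):
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)
    rarefied = {}
    for otu in rng.sample(reads, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def richness(otu_counts):
    """Number of OTUs with at least one read."""
    return sum(1 for count in otu_counts.values() if count > 0)


# Hypothetical samples sequenced to different depths (1000 vs. 100 reads).
deep = {"otu1": 500, "otu2": 300, "otu3": 150, "otu4": 40, "otu5": 10}
shallow = {"otu1": 60, "otu2": 30, "otu3": 10}

# Rarefy both samples to the depth of the shallower one before comparing.
common_depth = min(sum(deep.values()), sum(shallow.values()))
print(richness(rarefy(deep, common_depth)), richness(rarefy(shallow, common_depth)))
```

Comparing richness only after both samples have been reduced to `common_depth` is what makes values from studies with different sequencing effort comparable.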
Point 2: Although there are now many available data points globally, some regions still lack usable data (as shown in Figure 1 of the paper). As a dataset, users may wish to obtain data for regions without experimental points by using certain methods. Have the authors considered using geostatistical methods to extrapolate the data from the existing points, along with the corresponding soil or climatic characteristics, to cover all continents?
Authors′ response: We tried to investigate the key environmental drivers of the spatial and temporal variations in bacterial and fungal diversities, as the potential drivers may contribute to modeling microbial diversity. However, the environmental drivers exhibited weak correlations with microbial diversity (Table 2), which may be caused by the great variability in microbial community across various ecosystem types. Our study demonstrated the complexity of the spatial and temporal variations in soil microbial community, increasing the difficulty in accurately simulating soil microbial community using geostatistical methods.
Point 3: The authors frequently refer to ecosystem functions in the paper. While amplicon sequencing can predict microbial functions through certain software (such as Picrust2), due to the limitations of these technologies, their application in ecological research remains challenging. Recently, shotgun sequencing methods to analyze the entire genetic information of soil microbes have been extensively documented in the literature. Did the authors consider supplementing the existing dataset with this approach?
Authors′ response: In this study, we focused on the soil microbial community compositions and diversities. Shotgun sequencing involves fragmenting a genome into small, overlapping DNA segments, sequencing them in parallel, and using computational assembly to reconstruct the original sequence. High-throughput sequencing enables parallel sequencing of millions to billions of DNA fragments, which is more suitable for investigating soil microbial community compositions and diversities than shotgun sequencing methods.
Point 4: Microbial community composition or diversity can vary significantly across different seasons or crop growth periods. Did the authors consider the impact of seasonal temperature and humidity, or specific meteorological events, on microbial communities in their data analysis?
Authors′ response: Seasonal temperature and humidity may affect microbial community compositions and diversities. However, it is hard to determine the effects of growing season length when dealing with seasonal temperature and humidity. Moreover, the compiled literature contains very little information on specific meteorological events that may have influenced the soil microbial community on the sampling date. In this study, we considered that the soil microbial community compositions and diversities depended on the cumulative effects of long-term precipitation (i.e., mean annual precipitation) and temperature (i.e., mean annual temperature). Moreover, we compiled soil temperature and moisture data on the sampling date to explore the short-term effects of micrometeorological factors on the soil microbial community (Tables 1 and 2).
Once again, thanks a lot for your kind comments and suggestions.
Citation: https://doi.org/10.5194/essd-2025-501-AC1
-
RC2: 'Comment on essd-2025-501', Anonymous Referee #2, 30 Nov 2025
The authors established a global database of soil microbial communities with associated climate, soil, and vegetation factors, and examined the relationships between bacterial and fungal diversity indices and environmental variables. This database is valuable and necessary, as it provides a useful resource for researchers studying global soil bacterial and fungal diversity and composition. The study is generally well structured, but I offer a few suggestions to help improve the clarity and rigor of the work.
- Abstract: It would strengthen the abstract if the authors highlighted the importance and broader relevance of their work at the beginning.
- Abstract: The authors may consider specifying the temporal scale of the data included in the database within the abstract.
- Methods: Are all data points unique, or are there duplicate observations for the same site at the same or different times? How did the authors handle multiple data points at a given site?
- Methods: For the environmental factors (e.g., treatment, climate), are these based on annual average data or single-time measurements? Providing more details on the temporal scale of the data and the treatments would help clarify how the data were handled.
- Methods Section 2.3, line 186:
-- If the authors are unsure whether the relationship is linear, nonlinear, or nonexistent, why was only a linear relationship assumed and tested?
-- Did you also consider potential time effects in the model, given that the data were collected in different years? There may be temporal trends or relationships worth examining.
-- What about potential interactions among the variables?
-- Additionally, including an equation in the Methods section would be helpful for clarity.
- Results: Figure 1 and its description may be better placed in the Methods section.
- Figures 2–3: A brief explanation of how to interpret these plots would be helpful for readers who are not familiar with this type of visualization.
- Results (lines 261–262): Do you have any plots that show the spatial distribution of the results?
Citation: https://doi.org/10.5194/essd-2025-501-RC2
AC2: 'Reply on RC2', Shutao Chen, 06 Dec 2025
We greatly thank Reviewer 2 for the crucial comments. The main responses are detailed below.
The authors established a global database of soil microbial communities with associated climate, soil, and vegetation factors, and examined the relationships between bacterial and fungal diversity indices and environmental variables. This database is valuable and necessary, as it provides a useful resource for researchers studying global soil bacterial and fungal diversity and composition. The study is generally well structured, but I offer a few suggestions to help improve the clarity and rigor of the work.
Authors′ response: We greatly appreciate your crucial comments. We have revised the manuscript and Table S1 accordingly.
- Abstract: It would strengthen the abstract if the authors highlighted the importance and broader relevance of their work at the beginning.
Authors′ response: The first sentence has been revised accordingly.
Examining the spatial and temporal variations in soil microbial compositions and diversities and associated climate, soil and vegetation factors may help in understanding the roles of microorganisms in soil ecosystems.
- Abstract: The authors may consider specifying the temporal scale of the data included in the database within the abstract.
Authors′ response: The temporal scale of the data has been added.
There were 8490 sets of soil bacterial and fungal community data for the different treatments and study sites in the database and the data were published from 2016 to 2024.
- Methods: Are all data points unique, or are there duplicate observations for the same site at the same or different times? How did the authors handle multiple data points at a given site?
Authors′ response: This point has been indicated in detail in the Methods section.
There were a number of duplicate observations for the same site at the same or different times in the database, and all reported microbial community data at a given site were included. These duplicate data mainly occurred in studies with different experimental treatments, or they were measured at different times. The microbial community data and the associated soil properties and/or vegetation characteristics were collected together in the database. In most cases, the microbial community data differed under different values of soil properties and/or vegetation characteristics.
- Methods: For the environmental factors (e.g., treatment, climate), are these based on annual average data or single-time measurements? Providing more details on the temporal scale of the data and the treatments would help clarify how the data were handled.
Authors′ response: We greatly appreciate your crucial point. We have revised the manuscript and Table S1 accordingly.
The treatment factors of fertilization and the application of manure, biochar, straw, and compost were based on total seasonal values. We have changed the unit "yr-1" to "season-1" in these cases in Table S1. Specifically, there was only one growing season per year in the grasslands, forests, shrublands, and wetlands.
The treatment factors of warming, precipitation manipulation, and CO2 and O3 elevation were based on annual average data (Table S1).
- Methods Section 2.3, line 186:
-- If the authors are unsure whether the relationship is linear, nonlinear, or nonexistent, why was only a linear relationship assumed and tested?
Authors′ response: We have revised this sentence.
Regression analysis was used to explore the relationships between FOTUr and BOTUr, between FCHA and BCHA, between FSHA and BSHA, between FSIM and BSIM, between FACE and BACE, and between FGOC and BGOC. Both linear and nonlinear models were tested in the regression analysis. The linear regression model [Equation (1)] best explained the relationships between FOTUr and BOTUr, between FCHA and BCHA, between FSHA and BSHA, and between FACE and BACE.
-- Did you also consider potential time effects in the model, given that the data were collected in different years? There may be temporal trends or relationships worth examining.
Authors′ response: We did not consider potential time effects in the regression analysis, on the assumption that second-generation high-throughput sequencing platforms did not change significantly from 2016 to 2024.
-- What about potential interactions among the variables?
Authors′ response: The potential interactions among the variables were not considered. Moreover, the meaning of the previous sentence was unclear and it has been revised. "Regression analysis was used to explore the relationships between FOTUr and BOTUr, between FCHA and BCHA, between FSHA and BSHA, between FSIM and BSIM, between FACE and BACE, and between FGOC and BGOC."
-- Additionally, including an equation in the Methods section would be helpful for clarity.
Authors′ response: An equation has been added.
The linear regression model [Equation (1)] best explained the relationships between FOTUr and BOTUr, between FCHA and BCHA, between FSHA and BSHA, and between FACE and BACE.
FDI = a × BDI + b (1)
In the model, FDI and BDI represent the fungal diversity index and the bacterial diversity index, respectively; a and b are regression parameters.
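Equation (1) can be fitted by ordinary least squares. The sketch below is one hedged way to do this in plain Python and is not the authors' analysis code; the function name `fit_linear` and the example values are assumptions made for illustration and are not taken from the database.

```python
def fit_linear(bdi, fdi):
    """Least-squares fit of FDI = a * BDI + b.

    Returns (a, b, r_squared) for paired bacterial (bdi) and
    fungal (fdi) diversity index values.
    """
    n = len(bdi)
    mean_x = sum(bdi) / n
    mean_y = sum(fdi) / n
    sxx = sum((x - mean_x) ** 2 for x in bdi)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bdi, fdi))
    a = sxy / sxx                # slope
    b = mean_y - a * mean_x      # intercept
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(bdi, fdi))
    ss_tot = sum((y - mean_y) ** 2 for y in fdi)
    return a, b, 1.0 - ss_res / ss_tot


# Made-up Shannon index pairs, for illustration only.
bacterial_shannon = [5.1, 5.8, 6.2, 6.9, 7.4]
fungal_shannon = [2.9, 3.4, 3.3, 4.1, 4.2]
a, b, r2 = fit_linear(bacterial_shannon, fungal_shannon)
print(f"a = {a:.3f}, b = {b:.3f}, R2 = {r2:.3f}")
```

The returned R2 is the coefficient of determination of the fitted line, the same kind of statistic reported for the fungal and bacterial Shannon indices in the abstract.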
- Results: Figure 1 and its description may be better placed in the Methods section.
Authors′ response: Figure 1 and its description have been placed in the Methods section.
- Figures 2–3: A brief explanation of how to interpret these plots would be helpful for readers who are not familiar with this type of visualization.
Authors′ response: A brief explanation has been added in the caption of Figures 2–3.
Violin plots indicate the distribution status and probability density of data in a specific ecosystem. The lower, middle and upper black lines in the violin plots represent the 25th, 50th and 75th percentiles, respectively.
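The three lines described in this caption can be computed directly from the values of one ecosystem. The sketch below assumes the inclusive quantile method of Python's standard library; the interpolation rule used by the authors' plotting software is not stated, so results may differ slightly. The function name `violin_quartiles` and the sample values are hypothetical.

```python
import statistics


def violin_quartiles(values):
    """Return the 25th, 50th and 75th percentiles of one group of values."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return q25, q50, q75


# Made-up Shannon index values for one hypothetical ecosystem.
sample = [4.8, 5.2, 5.9, 6.1, 6.4, 6.6, 7.0]
lower, middle, upper = violin_quartiles(sample)
print(lower, middle, upper)
```

Half of the observations in the group fall between the lower and upper values, and the width of the violin at any height shows how densely observations cluster around that value.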
- Results (lines 261–262): Do you have any plots that show the spatial distribution of the results?
Authors′ response: The plot (Fig. 3a) has been added.
The FOTUr ranged from 24 to 3433 no. copies g-1, with the lowest and highest values occurring in El Reno, USA, and Qingyuan, China, respectively (Fig. 3a).
Data sets
A global database of soil microbial communities and associated climate, soil and vegetation factors Shutao Chen et al. https://doi.org/10.5281/zenodo.16195889
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 294 | 63 | 23 | 380 | 50 | 26 | 34 |
With the advancement of sequencing technologies, it has become possible to collect global microbial community data to form databases that can serve soil biological scientists or evaluate the stability of global soil health and ecosystems under future climate change scenarios. The authors of this paper have compiled and organized a large amount of data through a literature review, resulting in a dataset on the bacterial and fungal community composition across different continents and ecosystems. The results are highly significant. However, I still have several concerns regarding the data collection process, which include the following points:
Microbial data, especially diversity, is heavily influenced by sequencing depth. Did the authors organize the raw sequencing results from these datasets to check if the diversity was subjected to rarefaction during calculation, in order to enhance the comparability of data across different studies?
Although there are now many available data points globally, some regions still lack usable data (as shown in Figure 1 of the paper). As a dataset, users may wish to obtain data for regions without experimental points by using certain methods. Have the authors considered using geostatistical methods to extrapolate the data from the existing points, along with the corresponding soil or climatic characteristics, to cover all continents?
The authors frequently refer to ecosystem functions in the paper. While amplicon sequencing can predict microbial functions through certain software (such as Picrust2), due to the limitations of these technologies, their application in ecological research remains challenging. Recently, shotgun sequencing methods to analyze the entire genetic information of soil microbes have been extensively documented in the literature. Did the authors consider supplementing the existing dataset with this approach?
Microbial community composition or diversity can vary significantly across different seasons or crop growth periods. Did the authors consider the impact of seasonal temperature and humidity, or specific meteorological events, on microbial communities in their data analysis?