GSOCS-LULCC: the Global Soil Organic Carbon Stock dataset after Land Use and Land Cover Change

Chen, Songchao; Shuai, Qi; Arrouays, Dominique; Chen, Zhongxing; Dai, Lingju; Hong, Yongsheng; Hu, Bifeng; Huang, Yuyang; Ji, Wenjun; Li, Shuo; Liang, Zongzheng; Ma, Yuxin; Richer-de-Forges, Anne C.; Schillaci, Calogero; Su, Yang; Teng, Hongfen; Wang, Nan; Wang, Xi; Wang, Yanyu; Wang, Zheng; Wang, Zhige; Xu, Dongyun; Xue, Jie; Ye, Su; Zhang, Xianglin; Zhou, Yin; Zhu, Peng; Shi, Zhou

doi:10.5194/essd-2024-373

Preprints

23 Sep 2024

Status: this discussion paper is a preprint. It has been under review for the journal Earth System Science Data (ESSD). The manuscript was not accepted for further review after discussion.

Abstract. The direction and magnitude of soil organic carbon stock (SOCS) change following land use and land cover change (LULCC) are highly uncertain, largely due to the lack of relevant global soil data. Great efforts have been made to build SOCS databases following LULCC at regional, national and even sub-continental scales; however, a comprehensive and open-access global database has not yet been developed, hindering a deep understanding of the impact of LULCC on SOCS dynamics. In this study, we introduce a new global SOCS database for LULCC, compiled from 639 articles documented in the Web of Science through the end of 2023. Targeting five major land uses (cropland, grassland, forest, plantation, and savanna), this database, named the Global Soil Organic Carbon Stock dataset after Land Use and Land Cover Change (GSOCS-LULCC), includes 1,206 sites with 5,982 records at various sampling depths. The database will enable users to assess the global impact of LULCC on SOCS dynamics and identify the factors that control SOCS changes for specific types of LULCC. The GSOCS-LULCC database is freely available from the Zenodo platform at https://doi.org/10.5281/zenodo.11183819 (Chen et al., 2024).

How to cite. Chen, S., Shuai, Q., Arrouays, D., Chen, Z., Dai, L., Hong, Y., Hu, B., Huang, Y., Ji, W., Li, S., Liang, Z., Ma, Y., Richer-de-Forges, A. C., Schillaci, C., Su, Y., Teng, H., Wang, N., Wang, X., Wang, Y., Wang, Z., Wang, Z., Xu, D., Xue, J., Ye, S., Zhang, X., Zhou, Y., Zhu, P., and Shi, Z.: GSOCS-LULCC: the Global Soil Organic Carbon Stock dataset after Land Use and Land Cover Change, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2024-373, 2024.

Received: 27 Aug 2024 – Discussion started: 23 Sep 2024

Competing interests: PZ is a member of the editorial board of the journal Earth System Science Data.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1: 'Comment on essd-2024-373', Anonymous Referee #1, 28 Oct 2024

This data description paper presents a dataset of soil organic carbon stock before and after land use and land cover change, covering almost 6000 records from 639 articles. The dataset targets an important area of research and seems like a useful resource for analysis of the effect of land use change on soil carbon stocks. This dataset does appear to be broader in terms of studies included compared to other recent meta-analyses of land use change and soil carbon stocks.

While the dataset itself is useful, there are some deficiencies in the meta-data and the information provided in the dataset which, if improved, could make the dataset more useful and easier to understand. First, the dataset lacks detailed metadata explaining the meaning of each column, which should be included in the data archive. Several columns are not clear from the titles alone. The dataset should include a metadata file with a short description of the source and meaning of each column.

Second, several columns in the dataset have limited use because they do not use consistent classifications. The “Region” column is a mix of different information, with some cities, some broad regions (like river basin, name of forest reserve, etc. which are likely not meaningful outside the context of the specific study), and some that appear to be general classifications (e.g., “Dry forest”). Because they are not systematic, these values are not very useful in the context of a meta-analysis. Similarly, the soil type column uses a mixture of different classification approaches that are not compatible and therefore are not useful for systematic analysis. Some values are soil classes (e.g., Haplaquepts), some are descriptive or maybe from a different system (“Red earth”, “Brown soils”), some are soil orders (“Mollisols”, “Alfisol”). To be useful, this column needs to have a common classification. I understand that this information is likely not available for all soils in the dataset. I would suggest having multiple columns, one for soil order (in one classification scheme), which should be available for all sites or from larger-scale databases, and another column for more detailed classification or study-specific information about soil type.

The “Mass correction” field is not explained at all.

The “Year range” and “Year mean” are not clear either. A more useful approach here would be to have a column for “start year” and “end year”.

The “sampling depth” field seems to be redundant with the upper and lower depth columns. And speaking from experience, intermixing different formats (some numbers separated with a dash, some with a ~) is likely to make dealing with this data more difficult because any users will need to add post-processing code to account for multiple formats. “Year range” has similar issues with some entries having one number and some having multiple numbers separated by a ~.
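
The post-processing burden the referee describes can be illustrated with a minimal Python sketch. It assumes only what the comment states: entries are either a single number or two numbers joined by a dash or a tilde. The helper name `parse_range` and the sample strings are hypothetical and are not taken from the dataset.

```python
import re
from typing import Optional, Tuple

# Assumption: values look like "0-20", "20~40", or a single number such as
# "1998". The real strings in the dataset may differ from these examples.
_RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:[-~]\s*(\d+(?:\.\d+)?))?\s*$")


def parse_range(text: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) for 'a-b', 'a~b', or a single number 'a'."""
    match = _RANGE.match(text)
    if match is None:
        return None  # unrecognized format: leave it for manual inspection
    start = float(match.group(1))
    # A single number is treated as a degenerate range (start == end).
    end = float(match.group(2)) if match.group(2) is not None else start
    return (start, end)


print(parse_range("0-20"))   # dash-separated
print(parse_range("20~40"))  # tilde-separated
print(parse_range("1998"))   # single value
```

Storing the two numbers in separate numeric columns, as the referee suggests, would make this kind of parsing unnecessary for downstream users.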

Also, the land use change field would be more useful from a machine reading/analysis perspective if it were split into two columns, before and after.

Generally, I suggest adding columns with flags for when data were gap-filled or modeled as opposed to being present in the original papers. This is particularly important for bulk density, which is linearly related to SOCS and is modeled using a very simple function.
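
The linear dependence on bulk density mentioned above can be made concrete with a short Python sketch. It uses the commonly used fixed-depth stock formula (SOC concentration × bulk density × layer thickness, ignoring coarse fragments), which is an assumption here and not necessarily the manuscript's own equation. The record fields, including the `bd_source` flag, are hypothetical names chosen for illustration.

```python
def soc_stock(soc_g_per_kg: float, bd_g_per_cm3: float, thickness_cm: float) -> float:
    """Fixed-depth SOC stock in Mg C per hectare (coarse fragments ignored)."""
    # g/kg * g/cm^3 * cm = 1e-3 g C per cm^2; times 1e8 cm^2 per ha
    # gives 1e5 g per ha, i.e. 0.1 Mg per ha.
    return soc_g_per_kg * bd_g_per_cm3 * thickness_cm * 0.1


# Hypothetical record: "bd_source" is the kind of flag the referee asks for,
# marking whether bulk density was measured or modeled.
record = {
    "soc_g_per_kg": 20.0,
    "bd_g_per_cm3": 1.3,
    "thickness_cm": 30.0,
    "bd_source": "modeled",
}

baseline = soc_stock(record["soc_g_per_kg"], record["bd_g_per_cm3"], record["thickness_cm"])
# Apply a 10% overestimate to bulk density and recompute the stock.
biased = soc_stock(record["soc_g_per_kg"], record["bd_g_per_cm3"] * 1.10, record["thickness_cm"])
relative_change = biased / baseline - 1.0

print(f"baseline stock: {baseline:.1f} Mg C/ha")
print(f"relative change from a 10% BD error: {relative_change:.2%}")
```

Under this formula any relative error in a modeled bulk density carries over one-to-one to the stock, which is why a per-record flag would let users filter or down-weight modeled values.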

Other comments on the data description:

Line 40: Remove “promotes”

Line 51: Change “revelled” to “revealed”

Line 61: I would change to “English-language academic journals” to specify that it is referring to the language and not the country

Figure 1: I found this diagram very helpful for explaining the approach

Line 87: Savanna should be included in this list of land use types (there are 5, not 4). And some detail on how a plantation was defined and differentiated from a forest would be helpful.

Line 102: This bulk density calculation seems overly simple given the pivotal importance of bulk density for calculating SOCS. This could introduce a lot of error to the database. I would recommend, at minimum, including a column in the dataset that indicates whether this model was used or if bulk density measurements were present in the original data. I would also suggest incorporating soil texture data into the bulk density function or using SoilGrids bulk density if it is available since it would incorporate more covariates.

Line 107: Was the 10% standard deviation consistent with the reported standard deviations of points in the dataset where mean and standard deviation were actually provided? It would be helpful to include a flag in the database for this similar to bulk density so it’s clear when this was provided versus modeled.

Line 120: Include a citation for the source of the biome information

Line 130: I don’t think it’s useful to show this before/after information in panel a of Figure 4, because the types of land use change are so different from each other. I would suggest having one bar plot with the distribution of means, or color coding them by the initial land use type (before conversion) instead of just before/after. Alternatively, show one distribution of the mean and a separate panel with the distribution of change (after minus before) across all points, or show a two-dimensional scatter plot of SOCS before versus after conversion.

Line 142: What kind of mass correction is this referring to? I don’t think it was ever explained, and it is also present in the actual dataset without explanation

Citation: https://doi.org/10.5194/essd-2024-373-RC1

RC2: 'Comment on essd-2024-373', Anonymous Referee #2, 09 Nov 2024

The authors collected a global dataset of SOC changes due to LULCC and presented its characteristics from different aspects. Several areas could be improved to enhance the dataset’s value:
Specific comments:
The dataset compilation from 639 articles is impressive, as illustrated in Figure 1. The results section effectively presents the dataset’s characteristics. However, a high-impact journal such as ESSD would require a more thorough review and more comprehensive dataset development. The current manuscript is overall simplistic, with a brief introduction and no discussion. The introduction could be strengthened by reviewing existing datasets on SOC changes, placing this study within the broader research context, and identifying any knowledge gaps it addresses. Comparing the findings with other datasets would help to validate the dataset. For instance, Huang et al. (2024) conducted a similar study that included 790 articles and emphasized identifying key SOC change drivers. Noting potential overlaps with Huang et al. (2024) and clarifying what makes this dataset unique would underscore its contribution.
Exploring how environmental factors influence SOC changes due to LULCC, and how those changes vary across different regions and climate zones, would provide deeper insights into the underlying patterns and mechanisms. For example, generating a gridded SOC change map via modeling or spatial interpolation could increase this dataset’s utility and value.
Reference: Huang, X., Ibrahim, M. M., Luo, Y., Jiang, L., Chen, J., & Hou, E. (2024). Land use change alters soil organic carbon: Constrained global patterns and predictors. Earth's Future, 12, e2023EF004254. https://doi.org/10.1029/2023EF004254

Citation: https://doi.org/10.5194/essd-2024-373-RC2

RC3: 'Comment on essd-2024-373', Anonymous Referee #3, 05 Dec 2024

The manuscript addresses a critical issue related to soil carbon emissions from LULCC, which contribute significantly to anthropogenic carbon emissions. The study is important, but more comprehensive data are necessary to draw broad, general conclusions. Below are specific comments and suggestions to improve the manuscript:
The manuscript would benefit from a more comprehensive review of the carbon emissions resulting from LULCC. This would help illustrate the importance of this study in the broader context. Additionally, the authors should clarify whether any global studies on SOCS have already been conducted, and how this work contributes to the existing body of knowledge.
Is it possible to estimate BD based on particle size analysis of soil samples? The combined use of Equations 1 and 2 may introduce additional uncertainty in the SOC calculations. The authors should consider discussing this potential source of error and how it might affect the overall conclusions.
In Line 130, the manuscript refers to a missing figure (Figure #). Please ensure that all figures are properly cited and included in the manuscript.
Figure 4 would benefit from a more detailed analysis, potentially breaking down the data by continent or biome. This could reveal region-specific trends and improve the applicability of the results to diverse environmental contexts.
For a global-scale study, the sample size may be insufficient for robust spatial distribution analysis, particularly for some types of land-use conversions where the sample size is fewer than 100. Additionally, plantation forests are intensively managed and may require more focused analysis to account for the nuances of their conversion. The authors should discuss the implications of sample size limitations on the study’s findings.
In Figure 6, the unit of soil depth should be clarified. Additionally, a cumulative histogram may make it easier to interpret the data, as it would allow for a clearer understanding of how shallow soils are represented if deeper soils are also studied. The authors should also add a column of the maximum soil depth studied in the dataset, as this is critical for transparency and completeness.
The conclusion proposes several avenues for future work, but the authors should specify a plan or methodology for updating the database over time. Furthermore, the exclusion of peatlands and wetlands from the current version of the database should be justified. These ecosystems are significant in carbon storage, and their omission needs to be addressed.

Citation: https://doi.org/10.5194/essd-2024-373-RC3

Data sets

GSOCS-LULCC: the Global Soil Organic Carbon Stock dataset after Land Use and Land Cover Change Songchao Chen et al. https://doi.org/10.5281/zenodo.11183818

Viewed

Total article views: 3,006 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
2,153	484	369	3,006	77	104

Views and downloads (calculated since 23 Sep 2024)

Month	HTML	PDF	XML	Total
Sep 2024	242	58	4	304
Oct 2024	223	43	4	270
Nov 2024	160	31	4	195
Dec 2024	155	39	1	195
Jan 2025	90	21	4	115
Feb 2025	62	11	41	114
Mar 2025	65	8	59	132
Apr 2025	45	19	85	149
May 2025	67	10	96	173
Jun 2025	63	20	39	122
Jul 2025	78	15	4	97
Aug 2025	86	15	2	103
Sep 2025	311	9	3	323
Oct 2025	121	31	2	154
Nov 2025	81	21	5	107
Dec 2025	88	23	4	115
Jan 2026	93	35	4	132
Feb 2026	74	24	1	99
Mar 2026	49	51	7	107
Apr 2026	0

Viewed (geographical distribution)

Total article views: 2,926 (including HTML, PDF, and XML) Thereof 2,926 with geography defined and 0 with unknown origin.

Latest update: 02 Apr 2026

Short summary

The impact of land use and land cover change (LULCC) on soil organic carbon stock (SOCS) is uncertain due to limited global data. Despite regional efforts, a comprehensive global SOCS database has been lacking. This study introduces the Global Soil Organic Carbon Stock dataset after LULCC (GSOCS-LULCC), compiled from 639 articles covering 1,206 sites and 5,982 records across five major land uses. This open-access database enables global assessment of LULCC's effects on SOCS dynamics.

