OpenLandMap-soildb: global soil information at 30 m spatial resolution for 2000–2022+ based on spatiotemporal Machine Learning and harmonized legacy soil samples and observations
Abstract. There is increasing interest in global dynamic soil information, with changes in soil properties mapped over time and at high spatial resolution. Thanks to long-term, multi-temporal, fine- and medium-resolution satellite missions such as Landsat, MODIS, and Copernicus Sentinel, it is possible to produce globally consistent predictions of key soil variables that match other 10–30 m spatial resolution global data sets. This paper describes data preparation, modeling, and production of OpenLandMap-soildb: global dynamic predictions of soil organic carbon (SOC) content, SOC density, bulk density, soil pH in H2O, soil texture fractions (clay, sand and silt) and soil types (USDA soil taxonomy subgroups) at 30 m spatial resolution based on spatiotemporal Machine Learning (Quantile Regression Random Forest, with output predictions showing the mean plus the lower and upper limits of the 68 % probability prediction interval). To train the models, a large compilation of soil samples imported from legacy soil projects was used: 216,000 soil samples with soil carbon density (kg m-3), 408,000 soil samples with soil carbon content (g kg-1), 272,000 samples with soil pH in H2O, 363,000 samples with clay, silt, and sand (%), and 134,000 samples with oven-dry bulk density (t m-3). Soil carbon and soil pH were mapped at 5-year time intervals; soil texture fractions, bulk density, and soil types were mapped for recent years only. The cross-validation results indicate RMSE of 17.7 (kg m-3; 0.486 in log-scale) and CCC of 0.88 for SOC density, RMSE of 51.3 (g kg-1; 0.574 in log-scale) and CCC of 0.87 for SOC content, RMSE of 0.15 (t m-3) and CCC of 0.92 for bulk density of fine earth, RMSE of 0.51 and CCC of 0.91 for soil pH, RMSE of 8.4 % and CCC of 0.87 for soil clay content, and RMSE of 12.6 % and CCC of 0.84 for soil sand content.
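For readers unfamiliar with the accuracy measures quoted above, the following is a minimal sketch of how RMSE and the concordance correlation coefficient (CCC, in Lin's formulation) can be computed from paired observed and predicted values. It is not the authors' evaluation code. The function names are hypothetical, and the use of log(1 + x) for the log-scale RMSE is an assumption, since the abstract does not state which log transform was applied.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)

def ccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    # Penalizes both scatter (via cov) and bias (via the mean difference).
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

def rmse_log(obs, pred):
    """RMSE in log scale; log1p is an ASSUMED transform, not from the paper."""
    return rmse([math.log1p(o) for o in obs], [math.log1p(p) for p in pred])

# Toy values only, not data from the study.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
print(rmse(observed, predicted), ccc(observed, predicted))
```

Unlike a plain correlation coefficient, CCC drops below 1 when predictions are systematically shifted, which is why the toy example above (a constant offset of 1) yields perfect correlation but a CCC well below 1.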
The most important variables for predicting soil organic carbon density (kg m-3) were soil depth, Landsat-based uncalibrated Gross Primary Productivity (GPP), Normalized Difference Vegetation Index (NDVI), and CHELSA bioclimatic indices. The global distribution of soil pH can be primarily explained by the CHELSA Aridity Index (long-term), annual precipitation, and salinity grade. Global SOC stocks for the 2020–2022+ period and the 0–30 cm depth interval are estimated at 461 Pg (petagrams); the results further indicate that, in the last 25 years, the world has lost at least 11 Pg of SOC in the topsoil. Suggestions are made on how to set up global permanent monitoring stations to accurately track land degradation and enable land restoration projects. The training dataset is available at https://doi.org/10.5281/zenodo.4748499 (Hengl and Gupta, 2025), while the resulting data products can be accessed at https://doi.org/10.5281/zenodo.15470431 (Consoli et al., 2025). Both datasets are released under a CC-BY license.
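As a rough illustration of the unit arithmetic behind a stock estimate in Pg, the sketch below converts a SOC density (kg m-3) over a depth interval into an areal stock (kg m-2) and then into a total mass in petagrams for a given land area. The function names, the constant-density assumption across the 0–30 cm interval, and the example inputs are all hypothetical; this is not the aggregation procedure used to obtain the global estimate.

```python
def soc_stock_kg_m2(density_kg_m3, thickness_m=0.30):
    """Areal SOC stock for a layer, assuming constant density with depth."""
    return density_kg_m3 * thickness_m

def total_stock_pg(density_kg_m3, area_km2, thickness_m=0.30):
    """Total SOC mass in Pg for an area with uniform SOC density."""
    area_m2 = area_km2 * 1e6           # 1 km2 = 1e6 m2
    mass_kg = soc_stock_kg_m2(density_kg_m3, thickness_m) * area_m2
    return mass_kg / 1e12              # 1 Pg = 1e15 g = 1e12 kg

# Illustrative inputs only: 10 kg m-3 over 1 million km2, 0-30 cm.
print(soc_stock_kg_m2(10.0))           # areal stock in kg m-2
print(total_stock_pg(10.0, 1e6))       # total mass in Pg
```

In practice the density varies per pixel and per depth, so a map-based estimate would sum per-pixel stocks weighted by pixel area rather than apply a single density to a whole region.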
Competing interests: Tomislav Hengl, Davide Consoli, Xuemeng Tian, Mustafa Serkan Isik, Leandro Parente, Yu-Feng Ho and Rolf Simoes are employed by OpenGeoHub.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.