CHLSOC: the Chilean Soil Organic Carbon database, a multi-institutional collaborative effort

A critical aspect of predicting soil organic carbon (SOC) concentrations is the lack of available soil information; where information on soil characteristics is available, it is usually focused on regions of high agricultural interest. To date, in Chile, a large proportion of the SOC data have been collected in areas of intensive agricultural or forestry use; however, vast areas beyond these forms of land use have few or no soil data available. Here we present a new SOC database for the country, which is the result of an unprecedented national effort under the framework of the Global Soil Partnership. This partnership has helped build the largest database of SOC to date in Chile, named the Chilean Soil Organic Carbon database (CHLSOC), comprising 13 612 data points compiled from numerous sources, including unpublished and difficult-to-access data. The database will allow users to fill spatial gaps where no SOC estimates were publicly available previously. Presented values of SOC range from 6 × 10⁻⁵ % to 83.3 %, reflecting the variety of ecosystems that exist in Chile. The database has the potential to inform and test current models that predict SOC stocks and dynamics at larger spatial scales, thus enabling benefits from the richness of geochemical, topographic and climatic variability in Chile. The database is freely available to registered users at https://doi.org/10.17605/OSF.IO/NMYS3 (Pfeiffer et al., 2019b) under the Creative Commons Attribution 4.0 International Public License.
Earth Syst. Sci. Data, 12, 457–468, 2020, www.earth-syst-sci-data.net/12/457/2020/


Introduction
Soil organic carbon (SOC) stocks play a vital role in the global carbon (C) cycle and make up nearly two-thirds of the total terrestrial carbon pool (Eswaran, 2000; Sarmiento and Gruber, 2002). Therefore, knowledge of the contents and dynamics of SOC stocks is essential for estimating trends in the evolution of atmospheric carbon dioxide (CO₂) to be used as an input and applied to models of global climate change (Jones et al., 2005; Davidson and Janssens, 2006). However, predictions of SOC stocks vary widely due to the limited availability of soil data for remote regions and existing soil datasets being biased towards highly managed forest and agroecosystems (Duarte-Guardia et al., 2018). Chile is not exempt from these difficulties, having many of its publicly available soil and SOC data focused on intensively cultivated central regions (Padarian et al., 2012, 2017). In fact, vast areas of the country are situated in the high Andean mountains, the hyperarid Atacama Desert or the inaccessible Magellanic moorland of the Patagonian fjords, for which very few soil data are available. These areas are of particular interest for SOC dynamics and stock predictions as they represent the extreme ends of a huge latitudinal climate gradient, from Earth's driest extreme in the north (Atacama Desert) to the very humid conditions of the Patagonian Pacific margin, all flanked by the second-highest mountain range in the world (Garreaud et al., 2009; Ewing et al., 2008; Loisel and Yu, 2013).
Access to spatially explicit, consistent and reliable soil data is essential to model and map the status of soil resources globally at an increasingly detailed resolution in order to respond to and assess global issues (Arrouays et al., 2014; FAO, 2015; Hengl et al., 2014; Omuto et al., 2013). Furthermore, soil datasets are one of the most important inputs for Earth system models (ESMs) to address, for example, the importance of terrestrial sinks and sources of greenhouse gases (Dai et al., 2019; Luo et al., 2016). At the same time, soils in ESMs are one of the largest sources of uncertainty (Dai et al., 2019). Hence, in recent years, there has been a growing effort to improve access to and quality of soil datasets, a key goal of the Global Soil Partnership Pillar 4 Implementation Plan sponsored by the Food and Agriculture Organization of the United Nations (Batjes et al., 2017; Omuto et al., 2013). Efforts to increase access to harmonized soil products containing comparable and consistent datasets, including soil carbon, are highly valuable and appreciated by an increasing number of users (Arora et al., 2013; Baritz et al., 2014; Batjes et al., 2017; Hendriks et al., 2016; Jones and Thornton, 2015; Luo et al., 2016; Maire et al., 2015).
In an unprecedented national effort, between May 2018 and April 2019, a group of professionals from 39 public and private institutions joined together to build the largest Chilean SOC database to date (CHLSOC). The database was compiled from varied data sources including soil surveys, publications, private reports, unpublished research data, and cryptic documents unknown to the public and often difficult to access. This work resulted in a harmonized database of 13 612 points, which is a great improvement considering that the most up-to-date harmonized SOC data previously available for Chile comprised 45 points in WoSIS (Batjes et al., 2017).
The entire CHLSOC database (13 612 data points from 25 sources; summarized in Table 1) is freely available for registered users to download at https://doi.org/10.17605/OSF.IO/NMYS3 (Pfeiffer et al., 2019b). This joint effort has resulted in a comprehensive Chilean soil database that is available to the international community for analysis, exchange and interpretation.

Database sources
In order to fill the gaps in the current data, 889 soil profiles and 12 723 topsoil samples from all over Chile (Table 2) were gathered, curated and harmonized. Of this information, 89 % had previously been unpublished or unavailable to the national and global scientific community. The resultant soil information was from all of the administrative regions and 16 out of 17 ecological zones of Chile (Fig. 1; Table 3).
Data compiled from the literature are referenced in Table 1. Sources include legacy soil surveys, environmental assessment reports, research papers, private reports, theses and unpublished data provided by researchers. The minimum requirements for inclusion in the database were geographic coordinate information, records of soil horizon depth and soil organic carbon content (or organic matter content). Other soil variables, such as bulk density, texture and/or coarse fragments, sampling depth, sampling year, and measurement methods, were included where available. Approximately 20 % of horizon samples included information on bulk density (BLD) measured using the clod or the core (cylinder) method, and only 382 horizons (< 3 %) included information about coarse fragments (CRF).
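The inclusion rule described above can be sketched as a simple record filter. This is only an illustration: the field names (`lat`, `lon`, `top_cm`, `bottom_cm`, `soc_pct`, `om_pct`) are hypothetical and need not match the column names used in CHLSOC, and the conversion of organic matter to organic carbon with the conventional van Bemmelen factor (1.724) is an assumption, since the text does not state which factor, if any, was applied.

```python
from typing import Optional

# Conventional van Bemmelen factor; an assumption, not confirmed by the source.
OM_TO_SOC_FACTOR = 1.724


def soc_percent(record: dict) -> Optional[float]:
    """Return SOC (%) from a record, deriving it from organic matter if needed."""
    if record.get("soc_pct") is not None:
        return float(record["soc_pct"])
    if record.get("om_pct") is not None:
        return float(record["om_pct"]) / OM_TO_SOC_FACTOR
    return None


def meets_minimum_requirements(record: dict) -> bool:
    """Check that coordinates, horizon depth and SOC (or organic matter) are present."""
    has_coordinates = record.get("lat") is not None and record.get("lon") is not None
    has_depth = record.get("top_cm") is not None and record.get("bottom_cm") is not None
    return has_coordinates and has_depth and soc_percent(record) is not None


# Hypothetical example records (values invented for illustration only).
complete = {"lat": -36.8, "lon": -72.1, "top_cm": 0, "bottom_cm": 20, "om_pct": 3.448}
no_depth = {"lat": -36.8, "lon": -72.1, "soc_pct": 2.0}
print(meets_minimum_requirements(complete), meets_minimum_requirements(no_depth))
```

Optional variables such as bulk density or coarse fragments are deliberately not checked here, because the text treats them as included only where available.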
The resulting database (summarized in Table 1) includes datasets of variable size, source and composition. Unpublished data sources are referenced in the database to the coauthor and group who provided the data. Examples of unpublished data sources are shown in Table 1 and include those of the Oficina de Estudios y Políticas Agrarias (ODEPA), with 782 points provided by José Ramirez, and Methanobase (Table 1), corresponding to surface samples (0–25 cm) from the Magallanes Region collected in 2016 and provided by Lea Cabrol and Maialen Barret (Table 3). A further 51 data points from the Environmental Impact Assessment System (SEIA) were included from mostly underrepresented areas, such as the Andes and the Atacama Desert.
The largest contributor to CHLSOC (9935 data points) was the SOC dataset of the Agricultural and Livestock Service (SAG by its Spanish abbreviation). The data comprised SOC obtained from the first 20 cm of soil by auger or excavation methods, sampled by beneficiaries (farmers) of the SAG subsidy program. Another important data contributor was the legacy soil survey data compiled by the Centro de Información de Recursos Naturales (CIREN), reported as regional soil surveys that were carried out from the 1960s up to 2007. In total, CIREN compiled 37 soil surveys, totaling 540 data points over 177 500 km² (equal to about 24.5 % of the total Chilean territory), many of which are already compilations of former studies originally not referenced by CIREN (CIREN, 1996a, b, 1997a, b, 1999, 2002, 2003, 2005a, b, 2007).

Data harmonization processing and caveats
The assembled data were sampled over several decades and compiled by different authors and institutions. Data users should note the following caveats. First, for some data points it was not possible to find or verify the original data source. Second, a potential source of uncertainty is the analytical method employed; for most samples (97 %), SOC content was analyzed using the wet-oxidation method, and a small number were analyzed by total combustion (CN elemental analyzer). Discrepancies in SOC results between combustion methods have identified wet combustion as a less reliable assessment method for SOC, as it tends to underestimate organic carbon at higher SOC contents (Kumar et al., 2019) and potentially overestimates it in highly reduced soils (Chatterjee et al., 2009). This issue has not been addressed in Chile to date. The recommended methods for SOC determination are currently wet oxidation and loss on ignition; however, dry combustion is a more accurate alternative (Sadzawka et al., 2006). Future data collection initiatives should stress consistent analytical procedures, as a revision of local standards is urgently required. Finally, a possible source of bias in data from SAG is the fact that samples were taken by farmers following SAG guidelines, under which a composite sample is taken for each parcel.

Spatial distribution
To date, CHLSOC is the most complete data compilation for mainland Chile, comprising 13 612 points, a great improvement in comparison with former databases used in Chile for SOC assessments. For example, national SOC mapping studies (Padarian et al., 2017; Reyes Rojas et al., 2018) were based almost exclusively on CIREN data (540 points). CHLSOC can be used to show the influence of soil, vegetation and climatic conditions on SOC concentrations. Table 3 shows the number of data points compiled in this work, by vegetation formation. It is important to note that the scheme of Luebert and Pliscoff (2006) corresponds to the potential vegetation belts that originally occupied the territory and does not necessarily reflect current land use. We refer to vegetation formations as "ecosystems" as this is a more common term and to avoid further specific disciplinary discussion, which is outside the scope of this work. To express how well each ecosystem (by vegetation formation) is represented in CHLSOC, we use the number of data points divided by the total coverage of the ecosystem in Chile. Most of the data (85.73 %) are sampled from a concentrated area (25 % of the total country area) found in the following four ecosystems: deciduous forest, broad-leaved forest, sclerophyllous forest and thorny forest. The first two ecosystems are located in the northern section of the temperate macrobioclimatic zone and the second two in the southern section of the Mediterranean macrobioclimatic zone (Moreira-Muñoz, 2011). These ecosystems are characterized by a combination of benign climate, high-quality soils and water availability (for irrigation), resulting in a long history of agricultural activity and human settlement (Armesto et al., 2010). For this reason, these areas have experienced the highest land use conversion to agriculture, forestry and urban use in the country (Echeverría et al., 2006; Schulz et al., 2010; Arroyo et al., 2008).
Deciduous forests (14.7 % of the country) are the most represented, with 52.14 % of the data points collected in CHLSOC located between latitudes 35 and 41° S (Fig. 1).
The second-largest pool of data (8.6 % of the total data compiled in this work) is for evergreen forest, steppe and grassland (Table 3), which comprise 10.3 % of the country's area. These ecosystems are located between 41 and 53° S in the temperate macrobioclimate (Moreira-Muñoz, 2011), a thermally homogeneous territory with a considerable precipitation gradient that can reach several meters of mean annual precipitation on its western section, along the Pacific coast (Garreaud et al., 2009). These areas contain vast sections of pristine forest, with only 8 % of the land being converted to other land use (Pliscoff and Fuentes-Castillo, 2011). Most of the data collected here correspond to the eastern section of the administrative region of Aysén in Patagonia. The relatively high representation of these ecosystems in the database can be attributed to (i) the intense agricultural use of the northern section of the evergreen forest and (ii) an unprecedented effort in soil sampling in the Aysén Region (43.5–49° S) by SAG and the Agricultural Research Institute (INIA by its Spanish acronym; Table 1; Hepp and Stolpe, 2014).
Table 3. Distribution of SOC data points per ecosystem (vegetation formation) according to Luebert and Pliscoff (2006).
Arguably the most important ecosystem in terms of SOC stocks for Chile is that of the moorlands, which comprise a large area located on the Pacific coast of Patagonia where the landscape is fragmented into fjords and small islands (between 44 and 55° S). The moorlands cover a significant section (9.1 %) of the country's area and are probably the largest soil carbon reservoir in Chile, with an almost continuous carpet of thick peat bog to a depth of 5 m in some places (Loisel and Yu, 2013; Minasny et al., 2019). Despite the importance of moorland soils, most of our knowledge of this ecosystem comes from the northern and eastern borders, whereas there is limited information about peat soils in remote areas of the western fjords (20 observations in this database). The Atacama Desert section of Chile (Table 3; desert, low desert scrub and desertic scrub) comprises 2.18 % of the CHLSOC database but corresponds to 6 % of the country's area. However, the number of data points compiled for this region (298) constitutes a great improvement compared with previous national work on SOC for the Atacama Desert, which only included 3 points (Padarian et al., 2017).
The scarce SOC information for this region may be due to the extreme aridity of the region, low biological activity and low SOC accumulation (McKay et al., 2003). Vegetation is restricted to a narrow belt along the coast that receives water from fog, deep valleys that cross the desert and the western flank of the Andes (Moreira-Muñoz, 2011).
Regions of high altitude and mountainous areas comprise 102 data points (0.74 % of the database) representing 16.2 % of the country's area. Two characteristic alpine vegetation formations exist in the Chilean Andes between 18 and 38° S (Fig. 1) that comprise herbaceous alpine vegetation and alpine dwarf scrub. Most of the data are concentrated on the lower part (alpine dwarf scrub), while virtually no soil data are available for the higher section of the Andes (above 3000 m a.s.l.). The scarcity of soil data for this region means that assessment of the impact of climate change on soil C stocks is uncertain as large quantities of SOC are stored in this ecosystem (Bockheim and Munroe, 2014).
Few data are available for the coniferous forest, deciduous shrubland, thorny shrubland and arborescent shrubland areas of vegetation (Table 3) located in areas of low forestry or agricultural interest, but these areas comprise less than 2.5 % of the country.
In summary, the data we have compiled demonstrate the imbalance between areas of agricultural and forestry interest and areas beyond those land uses. Three areas of high value in terms of ecological, scientific and ecosystem services nationwide (and worldwide) are underrepresented in terms of soil data: the high Andes, the Atacama Desert and western Patagonia. Government efforts to develop soil surveys in these regions should be promoted urgently. In particular, a SOC inventory of western Patagonia is essential to properly assess the national stock of SOC and the potential to include this area in carbon offset programs.
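The imbalance summarized above can be made concrete with a small calculation. The sketch below uses only the shares quoted in this section and computes a normalized variant of the representation measure, namely the share of data points divided by the share of country area; this normalization is an assumption for illustration and not necessarily the exact index used to build Table 3. The grouping labels are shorthand for the ecosystem groups named in the text.

```python
# Shares quoted in the text: (percent of CHLSOC data points, percent of country area).
SHARES = {
    "deciduous forest": (52.14, 14.7),
    "evergreen forest, steppe and grassland": (8.6, 10.3),
    "Atacama Desert formations": (2.18, 6.0),
    "high-altitude and mountainous areas": (0.74, 16.2),
}


def representation_ratio(data_share_pct: float, area_share_pct: float) -> float:
    """Ratio > 1: over-represented relative to area; ratio < 1: under-represented."""
    return data_share_pct / area_share_pct


ratios = {name: representation_ratio(d, a) for name, (d, a) in SHARES.items()}

# List ecosystems from most to least represented relative to their area.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.2f}")
```

Under this measure only the deciduous forest has a ratio above 1, while the Atacama and high-altitude groups fall well below it, which is consistent with the underrepresentation described in the text.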

Temporal distribution
The date of sample collection is provided for more than 90 % of the included data (12 318 data points). The majority of points were sampled in 2006 and between 2010 and 2018 (Fig. 2). The high number of data from the last decade enables users to estimate modern carbon in Chilean soils. Most of the data that report the year in which they were sampled are concentrated in a short timeframe and mainly correspond to the SAG database (2010–2018) and to sampling efforts re- [...]
Data from CIREN (Table 1) did not report a sampling date. However, as they consist of a compilation of known former soil surveys, we can limit the period in which samples were collected and analyzed to the period between 1970 and 2007. The oldest data points correspond to those collected by Holdgate (1961) in the western Patagonian fjords in 1959.

Data availability
Data are available at https://doi.org/10.17605/OSF.IO/NMYS3 (Pfeiffer et al., 2019b). Each data point is identified by a code defining the soil name: soils from the CIREN data source are identified by a three-letter code corresponding to the soil series, and data from 10 other sources are identified by the site or author name. Geographical coordinates are given in UTM (WGS 84).
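As a rough illustration of the identifier convention described above, the following heuristic separates three-letter codes from longer site or author names. It is a sketch under assumptions: the function name is hypothetical, the example identifiers are invented, and a short site or author name of exactly three letters would be misclassified, so the source column of the database should be preferred whenever it is available.

```python
def looks_like_ciren_series_code(identifier: str) -> bool:
    """Heuristic: CIREN soil series are described as three-letter codes."""
    code = identifier.strip()
    return len(code) == 3 and code.isalpha()


# Invented identifiers, for illustration only.
for example in ["ABC", "Holdgate", "A1C"]:
    print(example, looks_like_ciren_series_code(example))
```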

Conclusions
The process of generating this database was a distributed data collection effort, which is a step forward under the efforts of the GlobalSoilMap.net project and the guidelines of the FAO Global Soil Partnership. The database presented here increases the public availability of SOC data for Chile 10-fold thanks to a joint effort of dozens of researchers and institutions. A high proportion of this database, 89 % (12 125 data points), consists of unpublished data that have now been made available. CHLSOC now contains a valuable SOC representation of a mosaic of ecosystems in Chile, which represents one of Earth's most extreme climate gradients. However, there are still big differences in the number of data obtained from managed (agro)ecosystems and natural systems in areas of low population density. We would like to stress the urgency of generating a discussion at a national level regarding the need for a comprehensive soil survey program to increase the sampling in these underrepresented areas. Moreover, to include more data in the next versions of CHLSOC, future official CIREN soil surveys in Chile and other datasets should be encouraged to report holistic metadata covering sampling designs, locations, sampling dates and analysis methods.
Author contributions. MP, GFO, MG, RO, NB, JB, MF, GG, AG, JM, JR, CR, IS and SB designed the framework to produce the database. The paper was written by MP and JP with contributions from all other authors, who reviewed and provided input on the paper.
Competing interests. The authors declare that they have no conflict of interest.