A taxonomically harmonized and temporally standardized fossil pollen dataset from Siberia covering the last 40 kyr

Xianyong Cao, Fang Tian, Andrei Andreev, Patricia M. Anderson, Anatoly V. Lozhkin, Elena Bezrukova, Jian Ni, Natalia Rudaya, Astrid Stobbe, Mareike Wieczorek, and Ulrike Herzschuh

Pollen records from Siberia are mostly absent in global or Northern Hemisphere synthesis works. Here we present a taxonomically harmonized and temporally standardized pollen dataset that was synthesized using 173 palynological records from Siberia and adjacent areas (northeastern Asia, 42–75° N, 50–180° E). Pollen data were taxonomically harmonized, i.e. the original 437 taxa were assigned to 106 combined pollen taxa. Age–depth models for all records were revised by applying a consistent Bayesian age–depth modelling routine. The pollen dataset is available as count data and percentage data in a table format (taxa vs. samples), with age information for each sample. The dataset has relatively few sites covering the last glacial period between 40 and 11.5 ka (calibrated thousands of years before 1950 CE), particularly from the central and western part of the study area. In the Holocene period, the dataset has many sites from most of the area, with the exception of the central part of Siberia. Of the 173 pollen records, 81 % of pollen counts were downloaded from open databases (GPD, EPD, PANGAEA) and 10 % were contributions by the original data gatherers, while a few were digitized from publications. Most of the pollen records originate from peatlands (48 %) and lake sediments (33 %). Most of the records (83 %) have ≥3 dates, allowing the establishment of reliable chronologies. The dataset can be used for various purposes, including pollen data mapping (example maps for Larix at selected time slices are shown) as well as quantitative climate and vegetation reconstructions. The datasets for pollen counts and pollen percentages are available (Cao et al., 2019a), together with the site information, data source, original publication, dating data, and the plant functional type for each pollen taxon.

1 Introduction

Continental or sub-continental pollen databases are essential for spatial reconstructions of former climates and past vegetation patterns of the terrestrial biosphere and in interpreting their driving forces (Cao et al., 2013); they also provide data for use in palaeodata–model comparisons at a continental scale (Gaillard et al., 2010; Trondman et al., 2015). Continental pollen databases from North America, Europe, Africa, and Latin America have been successfully established (Gajewski, 2008), and a fossil pollen dataset has been established for the eastern part of continental Asia (including China, Mongolia, southern Siberia, and parts of Central Asia; Cao et al., 2013). These datasets have been used to infer the locations of glacial refugia and migrational pathways by pollen mapping (e.g. Magri, 2008; Cao et al., 2015) and to reconstruct biome or land cover (e.g. Ni et al., 2014; Trondman et al., 2015; Tian et al., 2016) and climates at broad spatial scales (e.g. Mauri et al., 2015; Marsicek et al., 2018).

Pollen records from Siberia have rarely been included in global or Northern Hemisphere synthesis works (Sanchez Goñi et al., 2017; Marsicek et al., 2018), probably because (1) few records are available in open databases or (2) available data are not taxonomically harmonized and lack reliable chronologies. Binney et al. (2017) established a pollen dataset together with a plant macrofossil dataset for northern Eurasia (excluding East Asia; the dataset has not been made accessible yet), but the chronologies were not standardized and the pollen data were restricted to 1000-year time slices. In addition, the few works that make use of Siberian fossil pollen data either present biome reconstructions (Binney et al., 2017; Tian et al., 2018), which do not require taxonomic harmonization of the data, or restrict the analyses to selected time slices such as 18, 6, and 0 ka (Tarasov et al., 1998, 2000; Bigelow et al., 2003).

Here we provide a new taxonomically harmonized and temporally standardized fossil pollen dataset for Siberia and adjacent areas.

2 Dataset description

2.1 Data sources

We obtained 173 late Quaternary fossil pollen records (generally since 40 ka) from Siberia and surrounding areas (42–75° N, 50–180° E) from database sources and/or contributors or by digitizing published pollen diagrams (Appendix A; this table is available in PANGAEA). A total of 102 raw pollen count records were downloaded from the Global Pollen Database (GPD; last access: August 2010); 18 pollen count records were downloaded from the European Pollen Database (EPD; last access: July 2016); 20 pollen records (16 with pollen count data, the other 4 with pollen percentages) were collected from the PANGAEA website (Data Publisher for Earth and Environmental Science, which also includes most pollen records found in GPD and EPD; last access: July 2016); raw pollen count data of 17 sites were contributed directly by the data gatherers; and pollen percentages for the remaining 16 sites were digitized from the published pollen diagrams.
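The source tally above can be cross-checked against the rounded shares quoted in the abstract. The following is a minimal sketch in Python; the dictionary keys and the helper name `share` are our own labels and are not part of the dataset.

```python
# Record counts per source, as stated in the text above.
records_by_source = {
    "GPD": 102,
    "EPD": 18,
    "PANGAEA": 20,
    "contributed by data gatherers": 17,
    "digitized from publications": 16,
}

total_records = sum(records_by_source.values())


def share(n, total):
    """Percentage share of n in total, rounded to the nearest whole percent."""
    return round(100.0 * n / total)


# Records obtained from the three open databases.
open_database_records = sum(
    records_by_source[name] for name in ("GPD", "EPD", "PANGAEA")
)
open_database_share = share(open_database_records, total_records)
contributed_share = share(
    records_by_source["contributed by data gatherers"], total_records
)
digitized_share = share(
    records_by_source["digitized from publications"], total_records
)
```

Under these counts the five sources add up to the 173 records of the dataset, and the open-database and contributor shares round to the 81 % and 10 % given in the abstract.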

Figure 1. Spatial distribution of fossil pollen records (+) in the study area. The number of each site is used as its ID in Table A1.

2.2 Data processing

Pollen standardization follows Cao et al. (2013), including homogenization of taxonomy, generally at family or genus level (437 pollen names were combined into 106 taxa; Appendix B; this table is available in PANGAEA), and re-calculation of pollen percentages on the basis of the total number of terrestrial pollen grains. To obtain comparable chronologies, age–depth models for these pollen records were re-established using Bayesian age–depth modelling with the IntCal09 radiocarbon calibration curve (“Bacon” software; Blaauw and Christen, 2011). We set up a gamma distribution accumulation rate with a shape parameter equal to 2, a beta distribution with a “strength” of 20 for all records for the accumulation variability, a mean “memory” of 0.1 for lake sediments, and a high memory of 0.7 for peat and other sediment types (following Blaauw and Christen, 2011). For the 20 pollen records without raw pollen counts, we set the terrestrial pollen sum based on the descriptions given in the original publications. Approximate values or ranges were provided for 16 records, e.g. more than 600 grains for the pollen record from Chernaya Gorka palsa and between 452 and 494 grains for Two-Yurts Lake; these pollen sums were set to 600 and 470, respectively. A pollen sum of 400 was assigned to the other four records because no information was provided in the publications. The “pollen counts” were then back-calculated from the pollen percentages and the pollen sum. Finally, the pollen datasets are provided as both count data and percentage data in table format (Excel; taxa vs. samples), with age and location information for each sample.
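The two conversions described above (percentages from terrestrial counts, and back-calculated “counts” from percentages and an assigned pollen sum) can be sketched as follows. This is an illustrative Python sketch only; the function names are hypothetical, and rounding the back-calculated counts to whole grains is our assumption, since the text does not state how fractional grains were handled.

```python
def counts_to_percentages(counts):
    """Express each taxon as a percentage of the terrestrial pollen sum."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("terrestrial pollen sum must be positive")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def percentages_to_counts(percentages, pollen_sum):
    """Back-calculate pseudo-counts from percentages and an assigned pollen sum.

    Rounding to whole grains is an assumption made for this sketch.
    """
    return {
        taxon: round(p * pollen_sum / 100.0)
        for taxon, p in percentages.items()
    }


# Example with the pollen sum of 470 assigned to Two-Yurts Lake in the text;
# the percentages themselves are made up for illustration.
example_percentages = {"Betula": 50.0, "Pinus": 30.0, "Poaceae": 20.0}
example_counts = percentages_to_counts(example_percentages, 470)
```

Because of rounding, back-calculated counts need not add up exactly to the assigned pollen sum when percentages are not round numbers, which is one reason such values are referred to as “pollen counts” in quotation marks.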

2.3 Data quality

The Siberia pollen dataset includes pollen count data and percentages from 173 pollen sampling sites (Fig. 1). Sites are distributed reasonably evenly in eastern and western Siberia, but geographic gaps still exist in central Siberia (55–70° N, 90–120° E), where no published pollen records exist.

The dataset includes 83 pollen records from peat sediments, 57 records from lake sediments, 23 from fluvial sediments, 6 from coastal or marine sediments, 3 from palaeosol profiles, and 1 from palsa sediment (Appendix A). The peat and lake sediments generally have reliable chronologies and high sampling resolutions of the pollen records. About 83 % of the pollen records have ≥3 dates (∼57 % have ≥5 dates); 73 % of the pollen records have sampling resolutions of <500 years per sample and only 14 % of the sites have >1000 years per sample (Appendix A).
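As an illustration of how a sampling resolution in years per sample might be derived and binned, consider the Python sketch below. The definition used here (age span divided by the number of sampling intervals) is our assumption; the text does not specify how the values in Appendix A were computed, and the function names are hypothetical.

```python
def mean_sampling_resolution(sample_ages):
    """Mean number of years between consecutive samples of one record."""
    ages = sorted(sample_ages)
    if len(ages) < 2:
        raise ValueError("need at least two samples")
    return (ages[-1] - ages[0]) / (len(ages) - 1)


def resolution_class(years_per_sample):
    """Assign a record to the resolution classes quoted in the text."""
    if years_per_sample < 500:
        return "<500"
    if years_per_sample > 1000:
        return ">1000"
    return "500-1000"


# Made-up sample ages (cal yr BP) for two hypothetical records.
fine_record = [0, 400, 800, 1200]
coarse_record = [0, 3000, 6000]
```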

Within this dataset, 91 % of the pollen records (157 sites) have raw pollen count data or percentages with complete pollen assemblages (Appendix A). Although there might be some rare pollen taxa excluded from the published pollen diagrams (16 sites) that were digitized, these pollen taxa are likely of minor importance within the pollen assemblages. In addition, during digitizing we ensured that the sum of pollen percentages for each pollen assemblage was within 100±10 %, to minimize artificially introduced errors.
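The quality check applied during digitizing can be expressed in a few lines. The Python sketch below is illustrative; the function name and default arguments are ours, and only the 100±10 % tolerance is taken from the text.

```python
def sum_within_tolerance(percentages, target=100.0, tolerance=10.0):
    """Return True if the digitized percentages sum to target +/- tolerance."""
    return abs(sum(percentages) - target) <= tolerance


# Made-up digitized assemblages for illustration.
acceptable_sample = [42.0, 31.5, 18.0, 4.5]   # sums to 96.0
suspicious_sample = [42.0, 31.5, 8.0, 4.5]    # sums to 86.0
```

A sample failing this check would typically point to a misread curve or a taxon omitted from the published diagram and would need to be re-digitized.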

The pollen records were counted by different scientists, who gave different pollen names to the same pollen types, so taxonomic homogenization was required (from 437 original taxa to 106 combined taxa). However, this reduces the taxonomic resolution of the dataset. In cases where homogenization would have grouped pollen taxa with different growth forms (herb, shrub, or tree) together, we kept the taxa separate even though not all analysts separated them (for instance, Betula pollen is separated into Betula_shrub, Betula_tree and Betula_undiff). We also append the original pollen names to the dataset to ensure the feasibility of future studies on various topics using these data.
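Taxonomic harmonization of this kind amounts to a lookup from original names to combined taxa, followed by summing the counts that fall into the same combined taxon. The Python sketch below illustrates the idea; the lookup entries are hypothetical examples chosen by us and are not copied from Appendix B, and only the three Betula target names come from the text.

```python
from collections import defaultdict

# Hypothetical excerpt of an original-name -> combined-taxon lookup.
HARMONIZATION = {
    "Betula nana": "Betula_shrub",
    "Betula sect. Nanae": "Betula_shrub",
    "Betula sect. Albae": "Betula_tree",
    "Betula": "Betula_undiff",
    "Artemisia": "Artemisia",
    "Artemisia-type": "Artemisia",
}


def harmonize(sample_counts, lookup=HARMONIZATION):
    """Sum the counts of one sample by combined taxon.

    Unknown names raise an error instead of being dropped silently, so that
    gaps in the lookup table are noticed.
    """
    merged = defaultdict(int)
    for original_name, count in sample_counts.items():
        if original_name not in lookup:
            raise KeyError(f"no combined taxon for {original_name!r}")
        merged[lookup[original_name]] += count
    return dict(merged)


# Made-up sample for illustration.
harmonized_sample = harmonize(
    {"Betula nana": 12, "Betula sect. Nanae": 3, "Betula": 20, "Artemisia-type": 5}
)
```

Keeping growth forms apart in the lookup, as in the Betula entries, preserves the shrub versus tree distinction wherever the original analyst recorded it.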

The chronologies of most pollen records are based on a reasonable number of dates (mostly 14C; at least 3 dates per record). However, we also included pollen records from under-represented areas or periods that do not meet this criterion. Furthermore, most of the pollen records cover only part of the last 40 kyr, and comparatively few pollen records cover (parts of) the last glacial (i.e. >11 ka). We interpolated pollen abundances at 16 key time slices (40, 25, 15, 13, 11, 10, 9, 8, 7, 6, 5, 4, 3, 1.5, and 0.5 ka) using the interp.dataset function in the R package rioja (Juggins, 2012) to produce pollen presence–absence maps for Larix as an example of the distribution of available sites at these key time slices (Fig. 2). We also present boxplots for 10 major pollen taxa from all available sites at the same time slices (Fig. 3), which illustrate the general temporal patterns.
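The interpolation to time slices can be illustrated with a plain linear interpolation in Python. This is a stand-in sketch and not the rioja implementation; in particular, returning no value outside the sampled age range and treating any interpolated percentage above zero as “present” are our assumptions, and all function names are hypothetical.

```python
def interpolate_at(ages, values, target_age):
    """Linearly interpolate values at target_age; None outside the age range."""
    pairs = sorted(zip(ages, values))
    if not pairs or target_age < pairs[0][0] or target_age > pairs[-1][0]:
        return None
    for (a0, v0), (a1, v1) in zip(pairs, pairs[1:]):
        if a0 <= target_age <= a1:
            if a1 == a0:
                return v0
            return v0 + (v1 - v0) * (target_age - a0) / (a1 - a0)
    return pairs[0][1]  # record with a single sample at exactly target_age


def presence_at_slices(ages_ka, taxon_percent, slices_ka):
    """Map each time slice to True (present), False (absent) or None (no data)."""
    result = {}
    for t in slices_ka:
        value = interpolate_at(ages_ka, taxon_percent, t)
        result[t] = None if value is None else value > 0
    return result


# Made-up Larix percentages for one hypothetical site (ages in ka).
site_ages = [0.2, 1.0, 2.0, 5.5]
site_larix = [0.0, 0.0, 1.0, 3.0]
site_presence = presence_at_slices(site_ages, site_larix, [0.5, 1.5, 6])
```

A site contributes to a map only for those time slices that fall within its dated interval, which is why the number of plotted sites varies between time slices.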

Figure 2. Pollen-inferred presence–absence maps for Larix at key time slices. Black squares indicate presence while empty circles indicate absence.

Figure 3. Boxplots of percentages of 10 major pollen taxa at all available sites at key time slices. La: Larix; Pc: Picea; Pn: Pinus; Be: Betula; Al: Alnus; Sa: Salix; Cy: Cyperaceae; Er: Ericaceae; Po: Poaceae; Ar: Artemisia.


3 Potential use of the Siberian fossil pollen dataset

Fossil pollen data mapping can be used to reveal broadscale spatial distributions over time, as Cao et al. (2015) demonstrate. In this paper, we present presence–absence maps for Larix as an example (Fig. 2). Larix has extremely low pollen productivity (e.g. Niemeyer et al., 2015) that causes the under-representation of Larix pollen compared to its cover in the pollen source vegetation (Lisitsyna et al., 2011). Accordingly, Larix pollen is accepted as an indicator of the presence of Larix locally (e.g. Lisitsyna et al., 2011). The pollen presence–absence maps for Larix (Fig. 2) show a wide geographical range over the last 40 000 years, even during the Last Glacial Maximum, when there was very likely a relatively low density of larch. Our results generally confirm the distribution revealed by Larix macrofossil analysis (Binney et al., 2009). The Larix distribution changes revealed by our pollen dataset exemplify the usability of the dataset for vegetation reconstruction.

The Siberian fossil pollen dataset has already been used for biome reconstruction (Tian et al., 2018), although an integration of this dataset into global or Northern Hemisphere-wide biomization research is still pending.

Pollen percentages in pollen assemblages do not directly reflect species abundance in the vegetation community because of different pollen productivity. Therefore, quantitative vegetation composition is modelled using pollen productivity estimates (e.g. Sugita et al., 2010; Trondman et al., 2015). Our pollen dataset was recently used to reconstruct plant cover quantitatively using the REVEALS model to describe the compositional changes in space and time, which is more reliable than using pollen percentages directly (Cao et al., 2019b).
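Why productivity matters can be seen from a deliberately simplified correction: divide each pollen percentage by a relative pollen productivity estimate (PPE) and renormalize. The Python sketch below shows only this step. It is not the REVEALS model, which additionally accounts for pollen dispersal and basin size, and the PPE values are hypothetical numbers chosen for illustration.

```python
def productivity_corrected_cover(pollen_percent, ppe):
    """Rescale pollen percentages by relative pollen productivity.

    Simplified illustration only: each percentage is divided by the taxon's
    PPE and the results are renormalized to 100 %.
    """
    weighted = {
        taxon: pollen_percent[taxon] / ppe[taxon] for taxon in pollen_percent
    }
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("no pollen to rescale")
    return {taxon: 100.0 * w / total for taxon, w in weighted.items()}


# Hypothetical PPEs: a low producer (Larix) versus a high producer (Betula).
example_cover = productivity_corrected_cover(
    {"Larix": 10.0, "Betula": 90.0},
    {"Larix": 0.5, "Betula": 4.5},
)
```

With these made-up values, a pollen assemblage of 10 % Larix and 90 % Betula corresponds to equal estimated cover of the two taxa, illustrating how strongly a low pollen producer can be under-represented in percentages.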

Modern pollen data have been published from many sites in Siberia (e.g. Tarasov et al., 2007, 2011; Müller et al., 2010; Klemm et al., 2015). These modern pollen datasets can be used to investigate modern pollen–climate relationships, and these modern relationships can be used to make quantitative climate reconstructions, as has been done previously (e.g. Marsicek et al., 2018).

4 Data availability

Five datasets, comprising overview and reference (site information), dating data, plant functional type for each pollen taxon, and pollen count and pollen percentage for each sample, are available (Cao et al., 2019a).

5 Summary

We present a taxonomically harmonized and temporally standardized fossil pollen dataset of 173 palynological records with counts and percentages from Siberia and adjacent areas (northeastern Asia, 42–75° N, 50–180° E).

Our open-access dataset is a key component that can help provide quantitative estimates of vegetation or climate, which can be used to validate palaeo-simulation results of general circulation models for the Northern Hemisphere.

Appendix A

Table A1. Details of the fossil pollen records in the Siberian pollen dataset. NA – not available.

* Indicates the inclination of age–depth model with Lake Biwa. Elev. indicates elevation. Res. (year) indicates the temporal resolution. GPD: Global Pollen Database; EPD: European Pollen Database; Pan: PANGAEA. Material codes for radiocarbon dating: A = terrestrial plant macrofossil; B = non-terrestrial plant macrofossil; C = peat–gyttja bulk; D = pollen; E = total organic matter from silt; F = animal remains and shells; G = charcoal; H = CaCO3; U = unknown.

Appendix B

Table B1. Pollen taxa used in the dataset and their corresponding original Latin names.

Author contributions

UH and XC designed the pollen dataset. XC and FT compiled the standardization for the dataset and wrote the draft. Other authors provided pollen data and all authors discussed the results and contributed to the final paper.

Competing interests

The authors declare that they have no conflict of interest.

Acknowledgements

The authors would like to express their gratitude to all the palynologists who, either directly or indirectly, contributed their pollen records to the dataset or accessible databases.

Financial support

This data collection and research were supported by the German Research Foundation (DFG), Palmod project (German Ministry of Science and Education), the GlacialLegacy project (consolidator grant of the European Research Council of UH, grant agreement no. 772852), and the Russian Fund for Basic Research (for AVL, research project no. 19-05-00477).

