Compilation of relative pollen productivity (RPP) estimates and taxonomically harmonised RPP datasets for single continents and Northern Hemisphere extratropics

Relative pollen productivity (RPP) estimates are fractionate values, often in relation to Poaceae, that allow vegetation cover to be estimated from pollen counts with the help of models. RPP estimates are especially used in the scientific community in Europe and China, with a few studies in North America. Here we present a comprehensive compilation of available northern hemispheric RPP studies and their results arising from 51 publications with 60 sites and 131 taxa. This compilation allows scientists to identify data gaps in need of further RPP analyses but can also aid them in finding an RPP set for their study region. We also present a taxonomically harmonised, unified RPP dataset for the Northern Hemisphere and subsets for North America (including Greenland), Europe (including Arctic Russia), and China, which we generated from the available studies. The unified dataset gives the mean RPP for 55 harmonised taxa as well as fall speeds, which are necessary to reconstruct vegetation cover from pollen counts and RPP values. Data are openly available at https://doi.org/10.1594/PANGAEA.922661 (Wieczorek and Herzschuh, 2020).


Introduction
Pollen records are widely used for the reconstruction of vegetation composition (e.g. Bartlein et al., 1984; Li et al., 2019). However, such records need to be interpreted carefully, as different taxa have different pollen productivities and dispersal abilities. While some taxa produce abundant and/or light pollen, which is transported over large distances and is thus overrepresented in pollen records relative to the vegetation, others produce little and/or heavy pollen, which is rarely found in pollen records despite a high abundance of the taxon in the vegetation (e.g. Prentice, 1985; Prentice and Webb, 1986). To overcome these problems, relative pollen productivity (RPP) has been estimated and fall speed of pollen (FSP) measured or calculated for major plant taxa in several regions of the world (e.g. Baker et al., 2016; Broström et al., 2004; Commerford et al., 2013; Wang and Herzschuh, 2011). Most of these studies are limited to north-central Europe and China. Some major review studies provide RPP estimates for a number of sites and taxa (e.g. Broström et al., 2008; Li et al., 2018; Mazier et al., 2012), but a study compiling all available RPP estimates from the Northern Hemisphere, which would be useful to identify the most suitable dataset for a site-specific reconstruction, is not available. For an informed selection of the best-fitting RPP values, a consistent overview of metadata and information on the RPP data assessment is required.
Table 1. Publications returned by our literature search for relative pollen productivity (RPP) estimates. Literature not included in all further evaluations is given in italics and marked with an x. If a study has been further examined but did not use the ERV model, this is noted in brackets.
Combined large-scale RPP datasets are available for Europe (Mazier et al., 2012) and temperate China (Li et al., 2018). Such a compilation has, until now, not been available for North America.
By including recent studies, we created new datasets for North America (including Greenland), Europe (including Arctic Russia), and China (including subtropical regions). Combining these into one northern hemispheric RPP dataset might allow for vegetation reconstructions using broad-scale pollen datasets by adopting a consistent approach.
Here we present a compilation of available RPP publications, four large-scale datasets of RPP estimates, and fall speeds (FSPs) for major northern hemispheric plant taxa.

Literature search
To find literature on relative pollen productivity estimates (RPP or PPE), we conducted internet searches in Google Scholar (https://scholar.google.de/, last access: 24 June 2020) and Web of Science (https://apps.webofknowledge.com/, last access: 24 June 2020) for the terms "PPE", "RPP", "Pollen productivity", "Pollen productivity estimates", and various combinations of our search terms. Furthermore, we used literature cited in publications on RPP estimates to gain the most complete overview possible of existing literature about northern hemispheric RPP estimates. Of the resulting 63 publications from our literature search, 12 were excluded a priori (e.g. if they did not provide RPP estimates or consisted only of compilations of previously available RPP data) and are marked with an x in Table 1.

RPP compilation
All RPP values and, if given, their standard deviation (SD) or standard error (SE) were collected from the literature. If the data were only presented as figures, values were extracted with the help of CorelDRAW X6. The RPP values from an unpublished study by Li et al. and from the studies of Ge et al. (2015), He et al. (2016), Wu et al. (2013), and Zhang et al. (2017), which are only available in Chinese, were extracted from Li et al. (2018), while the values of Chen et al. (2019) were extracted from Jiang et al. (2020).
While different approaches exist to estimate RPP, the extended R value (ERV) model is the most common approach. Details on the ERV model and related assessment criteria can be found in, for example, Abraham and Kozáková (2012), Bunting et al. (2013), and Li et al. (2018). The maximum likelihood method (decreasing likelihood function score or increasing log-likelihood with distance) can be used to identify the relevant source area of pollen (RSAP) and should reach an asymptote with increasing sampling distance (Sugita, 1994). For reliable results, the vegetation sampling area should be ≥ RSAP (Sugita, 1994). Unexpected behaviour of the maximum likelihood method can occur if assumptions of the ERV model are not met (Li et al., 2018). Furthermore, a sufficient number of randomly selected sites (number of sites greater than or equal to the number of taxa for RPP estimation) is necessary (Li et al., 2018). Finally, for the correct application of the REVEALS model, RPP estimates need to be provided with a standard deviation to allow for correct estimation of the vegetation cover.
To allow for further assessment of the presented RPP data, we collected information on, for example, the maximum likelihood, the vegetation sampling radius, and the site distribution used in the different studies (Table A2, Wieczorek and Herzschuh, 2020, https://doi.org/10.1594/PANGAEA.922661). This will help researchers when creating customised RPP datasets. If RPP estimates for several models (e.g. ERV submodel 1, 2 or 3) were presented in the original study, we used all of them for the RPP compilation and added the information on which one was chosen as the best fit by the original author and/or in the RPP compilations of Mazier et al. (2012) and Li et al. (2018) (Tables A1, A3, Wieczorek and Herzschuh, 2020, https://doi.org/10.1594/PANGAEA.922661).

Continental RPP datasets
To develop large-scale datasets for North America (including Greenland), Europe (including Arctic Russia), China, and the Northern Hemisphere, we confined ourselves to those studies in which the prerequisites for the ERV model are met, i.e. a correct maximum likelihood curve, vegetation sampling radius greater than or equal to RSAP, and number of sites greater than or equal to the number of taxa. Furthermore, we only used studies providing standard errors or standard deviations. However, some exceptions were made: studies without information on RSAP or likelihood, for example, were included if they were previously found to be reliable by Mazier et al. (2012) or Li et al. (2018). In North America particularly, only a few studies are available. We thus incorporated further studies and indicate which assumptions are not met. We followed the authors of the original publications in the choice of the most reliable ERV model, but we included previous assessments of Li et al. (2018) and Mazier et al. (2012).
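The selection criteria listed above can be sketched as a simple screening step. The following Python snippet is only an illustration under our own assumptions: the class `StudyMeta`, its field names, and the two example studies are hypothetical and do not correspond to entries in the published tables. The exceptions described above (e.g. studies previously judged reliable by other compilations) remain a manual decision and are deliberately not encoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StudyMeta:
    """Hypothetical metadata record for one RPP study."""
    name: str
    likelihood_ok: Optional[bool]     # None means "not reported"
    sampling_radius: Optional[float]  # same unit as rsap
    rsap: Optional[float]
    n_sites: int
    n_taxa: int
    has_sd_or_se: bool


def unmet_criteria(s: StudyMeta) -> List[str]:
    """Return the list of prerequisites a study does not (verifiably) meet."""
    unmet = []
    if s.likelihood_ok is not True:
        unmet.append("likelihood curve")
    if s.sampling_radius is None or s.rsap is None or s.sampling_radius < s.rsap:
        unmet.append("sampling radius >= RSAP")
    if s.n_sites < s.n_taxa:
        unmet.append("sites >= taxa")
    if not s.has_sd_or_se:
        unmet.append("SD or SE")
    return unmet


# Two invented studies for illustration only.
study_a = StudyMeta("study A", True, 2000.0, 1500.0, 30, 12, True)
study_b = StudyMeta("study B", None, 500.0, 1500.0, 8, 10, False)

for study in (study_a, study_b):
    print(study.name, unmet_criteria(study))
```

Returning the list of unmet criteria, rather than a single yes/no flag, makes it possible to keep a study while still documenting which assumptions are not met.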
To be able to compare RPP estimates of different studies, it is necessary that all use the same reference, in our case Poaceae in accordance with most other studies. It is possible to recalculate RPP values based on other reference taxa by setting the original reference taxon to the RPP value resulting from other studies and recalculating all other RPP estimates based on that ratio (Mazier et al., 2012; Li et al., 2018). Of those studies selected for the continental RPP datasets, three did not have Poaceae as the original reference and did not include an RPP for Poaceae. The study of Bunting et al. (2005, reference taxon Quercus) did not provide standard deviations, so we used the values provided by Mazier et al. (2012) for this study, including the standard error. The RPP estimates of Li et al. (2015, reference taxon Quercus) were recalculated based on the mean Quercus RPP provided by F. , Zhang et al. (2017, Changbai), and . The RPP estimates of Matthias et al. (2012, reference taxon Pinus) were recalculated based on the mean Pinus RPP provided by Räsänen et al. (2007) and Abraham and Kozáková (2012). The study of Jiang et al. (2020) used Quercus as the reference taxon but included a value for Poaceae, which was used as the basis for recalculation.
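The two recalculation cases described above can be illustrated with a short Python sketch. The function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow. The sketch rescales the RPP values only and does not propagate the uncertainty of the reference value.

```python
def rescale_with_external_reference(rpp, ref_rpp_vs_poaceae):
    """Study has no Poaceae value.

    rpp: dict taxon -> RPP relative to the study's own reference taxon (= 1).
    ref_rpp_vs_poaceae: RPP of that reference taxon relative to Poaceae,
    taken from other studies (e.g. a mean value).
    """
    return {taxon: value * ref_rpp_vs_poaceae for taxon, value in rpp.items()}


def rescale_with_internal_reference(rpp, new_reference="Poaceae"):
    """Study reports a value for the new reference taxon itself."""
    base = rpp[new_reference]
    return {taxon: value / base for taxon, value in rpp.items()}


# Hypothetical study with Quercus as reference and no Poaceae value;
# assume other studies give a mean Quercus RPP of 4.0 relative to Poaceae.
external = rescale_with_external_reference({"Quercus": 1.0, "Betula": 2.0}, 4.0)
print(external)

# Hypothetical study with Quercus as reference that also reports Poaceae.
internal = rescale_with_internal_reference(
    {"Quercus": 1.0, "Poaceae": 0.25, "Pinus": 2.0}
)
print(internal)
```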
With the remaining RPP estimates, two datasets of RPP were created. To obtain a reasonable taxonomic harmonisation, we assigned broader taxonomic levels to some taxa of the original publications. We kept all original values for the analyses, and calculated means per harmonised taxon for the final datasets if more than one value of finer taxonomic levels was available (Table 2).
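As a minimal sketch of this harmonisation step, the following Python snippet groups original taxa under a harmonised name and averages their RPP values. The mapping shown is a small illustrative excerpt, the RPP numbers are invented, and the function name `harmonised_means` is our own.

```python
from collections import defaultdict
from statistics import mean

# Illustrative excerpt of a harmonisation table (original -> harmonised taxon).
HARMONISED = {
    "Rumex acetosella": "Rumex",
    "Rumex acetosa t.": "Rumex",
    "Tilia cordata": "Tilia",
}


def harmonised_means(values):
    """values: list of (original_taxon, rpp). Returns harmonised taxon -> mean RPP."""
    grouped = defaultdict(list)
    for taxon, rpp in values:
        # Taxa without an entry in the table keep their original name.
        grouped[HARMONISED.get(taxon, taxon)].append(rpp)
    return {taxon: mean(rpps) for taxon, rpps in grouped.items()}


# Invented values for illustration only.
example = [("Rumex acetosella", 2.0), ("Rumex acetosa t.", 4.0), ("Tilia cordata", 1.0)]
print(harmonised_means(example))
```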
In the choice of reliable values, we mainly followed the strategy of Mazier et al. (2012) and Li et al. (2018).
Dataset v1 includes all values of the chosen studies, except those RPP estimates which have an SD (or SE) greater than the RPP.
Dataset v2 is further reduced with the following steps.
- If N ≥ 5, the highest and lowest RPP estimates are excluded.
- If N = 4, the value deviating most from the taxon-specific mean is excluded. An exception is as follows: if two values are from the same study (they are generally similar), their mean is calculated and used for the overall mean (Salix in North America; Betula, Fabaceae, and Larix in China; Rumex in Europe). The most deviating value is then chosen based on the resulting mean. A further exception in North America is Betula, for which all four values from only two studies are kept.
- If N = 3, a value is only excluded if it is strongly deviating (> 100 % of the mean of all values), like Caryophyllaceae in China (RPP of an unpublished study by Li et al. in Li et al. (2018)). Exceptions are as follows: in North America Asteraceae and in Europe Apiaceae with three values from only two studies are all kept, as the two similar ones came from the same study.
Table 2 (excerpt). Original taxa combined into harmonised taxa: Rubiaceae + Galium type; Rumex: Rumex + Rumex sect. acetosa + Rumex acetosella + Rumex acetosa t.; Tilia: Tilia + Tilia begoniifolia + Tilia tomentosa + Tilia cordata.
Dataset v2 was created separately for each continent and is comparable to the Alt-1 dataset of Li et al. (2018) and PPE.st2 of Mazier et al. (2012).
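The general rules for datasets v1 and v2 can be sketched in Python as follows. This is a simplified illustration: the taxon-specific exceptions (e.g. averaging two values from the same study first, or keeping all values for Betula in North America) are not implemented, and the function names and example values are hypothetical.

```python
from statistics import mean


def dataset_v1(estimates):
    """estimates: list of (rpp, sd_or_se). Drop estimates whose SD (or SE) exceeds the RPP."""
    return [(rpp, sd) for rpp, sd in estimates if sd <= rpp]


def dataset_v2(values):
    """values: RPP estimates of one taxon (after the v1 filter). Returns the retained values."""
    vals = sorted(values)
    n = len(vals)
    if n >= 5:
        # Exclude the lowest and the highest estimate.
        return vals[1:-1]
    if n in (3, 4):
        m = mean(vals)
        worst = max(vals, key=lambda v: abs(v - m))
        # N = 4: always drop the most deviating value.
        # N = 3: drop it only if it deviates by more than 100 % of the mean.
        if n == 4 or abs(worst - m) > m:
            vals.remove(worst)
    return vals


print(dataset_v1([(2.0, 0.5), (1.0, 1.5)]))
print(dataset_v2([1.0, 2.0, 3.0, 4.0, 10.0]))
print(dataset_v2([1.0, 1.2, 1.1, 5.0]))
print(dataset_v2([1.0, 1.5, 20.0]))
```

With one or two values per taxon, nothing is excluded in this sketch.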
To calculate the SE of averaged RPP estimates, the delta method (Stuart and Ord, 1994, details in the supplement of Li et al., 2020) was applied. For the calculation of an RPP from pollen counts, a variance-covariance matrix is created. If only RPP ± SD (or SE) are available, the covariance is set to 0 and the final equation results in
$$\mathrm{SE}\left(\overline{\mathrm{RPP}}\right) = \frac{1}{n}\sqrt{\sum_{i=1}^{n} \mathrm{SD}_i^{2}},$$
where n is the number of averaged RPP estimates and SD_i is the SD (or SE) given for estimate i.
Some problems arise from the labelling of standard errors and standard deviations. While some studies provide standard deviations, others provide standard errors or give no information. Some studies provide standard deviations, which are labelled as standard errors in other studies. Given this ambiguity, we used every value as it is and noted whether standard deviation or standard error are said to be given.
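For the averaging step with covariances set to 0, the delta method reduces to the square root of the summed variances divided by the number of estimates. The Python sketch below illustrates this special case with invented numbers. It treats each reported uncertainty as given, without distinguishing SD from SE, and the function name is our own.

```python
import math


def mean_rpp_with_se(estimates):
    """estimates: list of (rpp, sd_or_se) for one taxon.

    Returns (mean RPP, SE of the mean), assuming zero covariance
    between the individual estimates.
    """
    n = len(estimates)
    mean_rpp = sum(rpp for rpp, _ in estimates) / n
    se = math.sqrt(sum(sd ** 2 for _, sd in estimates)) / n
    return mean_rpp, se


# Invented example: two estimates for one taxon.
mean_rpp, se = mean_rpp_with_se([(2.0, 0.3), (4.0, 0.4)])
print(mean_rpp, round(se, 3))
```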

Northern hemispheric dataset
The majority of RPP studies concentrate on China and Europe, with one study from Arctic Russia and few studies from North America. We thus decided to create a northern hemispheric dataset to be applied only for broad-scale studies for which RPP data for various taxa would otherwise be lacking. The dataset for the whole Northern Hemisphere was calculated with all data of the continental datasets. We conducted Kruskal-Wallis tests on the dataset v2 between the continents for each taxon. Additionally, we conducted the tests on the variability between taxa, once for the Northern Hemisphere and separately for each continent, including only taxa with n > 2. Statistical analyses have been conducted with R software, version 3.5.3 (R Core Team, 2019).
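The analyses reported here were run in R. Purely as an illustration of the test statistic, the following Python sketch computes the tie-corrected Kruskal-Wallis H for a set of groups (e.g. the RPP values of one taxon on each continent). The function name and the example values are hypothetical. The p value, which follows from a chi-squared distribution with k - 1 degrees of freedom for k groups, is not computed here.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """groups: list of lists of values. Returns the tie-corrected H statistic."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign average ranks (1-based) so that tied values share one rank.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction; this is undefined (division by zero) if all values are identical.
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h / correction


# Invented values for three groups without ties.
h_value = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_value, 3))
```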

Fall speeds
To use RPP values with, for example, the REVEALS model, fall speeds are necessary for the distance weighting of pollen input. Fall speeds were extracted from the compiled literature of the RPP datasets. If several values were available for one taxon (see Table A4), we calculated the mean with unique values, so if several studies had the same fall speed for one taxon, we used only one of them. Taxonomic levels were combined according to Table 2. Fall speeds for continental datasets were calculated based on studies used for RPP data.
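The averaging of fall speeds over unique values can be sketched as below. The values are invented and the function name is our own. Duplicates are detected by exact equality, which assumes that identical fall speeds are reported with identical digits.

```python
def mean_unique_fall_speed(values):
    """Mean of the distinct fall speed values reported for one taxon."""
    unique = set(values)
    return sum(unique) / len(unique)


# Invented example: two studies report the same value, a third a different one.
fsp = mean_unique_fall_speed([0.031, 0.031, 0.035])
print(round(fsp, 4))
```

Counting each distinct value only once avoids giving extra weight to a single measurement that several studies have merely adopted from the same source.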

RPP compilation
The compilation of RPP studies includes data from 49 studies, 43 of them using a form of the ERV model (Tables A1-A3). Twenty-nine studies used Poaceae as the reference taxon, while 20 studies used different taxa. The summary provides original RPP values with the given reference taxon. Only those used for the RPP datasets contain further RPP values recalibrated to Poaceae as the reference. An overview of all locations of the compiled RPP studies is given in Fig. 1, which clearly shows the absence of studies in Central Asia and large parts of Russia. Only a few studies have been conducted in North America. Not all studies provide information on the likelihood or RSAP, hampering the assessment of the reliability of the presented RPP values. Other studies do not provide standard deviations, leading to inaccurate results in subsequent applications.

RPP datasets
Of 60 RPP datasets, 28 (coming from 23 studies) were excluded prior to the calculation of the combined RPP datasets.
The likelihood function score should decrease and approach an asymptote when reaching the RSAP (see Sect. 2). Within the sampled vegetation area, the curve does not approach an asymptote in the studies of Calcote (1995) and Chaput and Gajewski (2018), meaning that vegetation composition is not studied up to the RSAP. As Poaceae was also not used as the reference taxon, we decided not to use these data despite the scarcity of studies in North America. In the studies of Han et al. (2017) and Xu et al. (2014), the likelihood function score increases. We followed the assessment of Li et al. (2018) and did not incorporate these RPP estimates. The likelihood function score also increases in the study of Ge et al. (2017, year 2014). Data from He et al. (2016) are not used, in accordance with Li et al. (2018), as pollen was sampled from a pollen trap, which might behave differently compared to moss polsters or lakes. In the study of Hjelle and Sugita (2012), the likelihood function score does not approach an asymptote. Sugita et al. (1999, 2006) do not provide information on the likelihood, and RPP values are given without information on standard deviation or standard error. The studies of Twiddle et al. (2012) and Li et al. (2011) do not provide standard deviations or errors for the presented RPP values. The study of Wu et al. (2013, original publication in Chinese) was rejected by Li et al. (2018) because the sampling area was too large, and we followed this assessment. Theuerkauf et al. (2013) do not provide information on the maximum likelihood or the RSAP. Data from Chen et al. (2019) were extracted from Jiang et al. (2020), which included insufficient information on the study design and the ERV approach. Data from the study of Qin et al. (2020) have been rejected as they had very high values for most taxa compared to other studies, which we assume reflects a systematic problem of the study. The study of Fang et al. (2019) was excluded because it was designed to test different methods for RPP estimation and was carried out in patchy vegetation without enough sites.
On the other hand, some studies were incorporated despite missing information or likelihood curves that did not meet our criteria. Hjelle (1998) and Nielsen (2004) do not provide information on the likelihood but have been included in the dataset of Mazier et al. (2012), i.e. they were assessed by experts. Bunting et al. (2013) neither provide information on the likelihood nor sample vegetation up to the RSAP. The scarcity of data from North America together with the use of Poaceae as the reference taxon led us to the decision to keep these RPP estimates. While the likelihood function score should decrease and reach an asymptote at the radius of the RSAP, the log-likelihood should increase before reaching the asymptote. This is not the case for the study of Commerford et al. (2013), but the data have been included due to the scarcity of North American studies. At the boreal forest site of Hopla (2017), the likelihood function score does not reach an asymptote. Again, these data have been included due to the scarcity of North American studies.

Continental and northern hemispheric RPP datasets
All RPP data in the final dataset are given relative to Poaceae. Of 49 publications covering 60 sites, 27 publications and 31 sites are included in the final RPP datasets (10 studies and 11 datasets for China, 14 studies and 16 datasets for Europe, 3 studies and 4 datasets for North America). We have RPP data for 33 taxa in China, 34 taxa in Europe, and 25 taxa in North America. The northern hemispheric dataset consists of RPP values and fall speeds for 55 taxa (Tables 3-6, Wieczorek and Herzschuh, 2020, https://doi.org/10.1594/PANGAEA.922661). Twenty-eight taxa are available in only one of the continental datasets (13 in China, 6 in North America, 9 in Europe). In dataset v1, 11 RPP values have an SD < 1 between the different datasets, while 15 have an SD > 1 (Fig. 2). The size of RPP as well as the variability of RPP values between continents partly differs between datasets v1 and v2 (Figs. 2, 3).
Table 3. Overview of continental and northern hemispheric relative pollen productivity (RPP) estimates for woody vegetation with their standard error (SE) (dataset v1) and fall speeds. All values are relative to Poaceae. See Table A1 for information on original RPP data, Table A4 for information on original fall speed values, and methods on the creation of dataset v1 (Wieczorek and Herzschuh, 2020).
Comparison with taxa available in the compilations of Mazier et al. (2012, Europe) and Li et al. (2018, temperate China) clearly shows differences in absolute RPP values or a high absolute deviation for some taxa (Fig. 5, e.g. Juniperus, Artemisia, Rosaceae), while many others (e.g. Alnus, Quercus, or Ranunculaceae) have a similar range of values, especially when considering the absolute deviation.

RPP compilation
The compilation is, to our knowledge, the first overview of available RPP studies covering the whole Northern Hemisphere. It highlights data gaps with respect to certain regions and taxa and as such guides the design of future RPP studies. Good geographic coverage is, to date, limited to central/northern Europe and China (Fig. 1). RPP studies in Russian and North American boreal forests as well as in tropical regions are largely lacking. The compilation covers most common taxa, mostly at the genus level, but the taxonomic resolution of available RPP estimates varies between studies and depends on the level to which pollen has been identified. Furthermore, while some taxa have a large number of available RPP estimates, for 24 taxa (i.e. ∼ 40 %) only one or two datasets are available. By including additional metadata, our compilation is useful for the identification of available RPP sets at specific sites and regions and indicates how suitable they may be for further research. For many studies, however, missing details needed for the evaluation (e.g. information on the maximum likelihood method) or use (e.g. standard deviation) of the RPP values lower their usefulness. It should therefore be stated clearly whether data are presented with standard deviation or standard error.

Continental and hemispheric PPE datasets
Using RPP estimates for pollen-based quantitative vegetation reconstruction (Sugita, 2007; Theuerkauf et al., 2016) has improved our understanding of environmental change (e.g. Marquer et al., 2014). In this paper, we present RPP datasets for three continents and one dataset of northern hemispheric extratropical RPP estimates and corresponding fall speeds, based on a compilation of studies.
We found that RPP values partly vary between the three continental datasets. Some uncertainty arises due to the use of inconsistent reference taxa. Most studies used Poaceae, a widespread family, whose pollen is easy to identify and often preserved in a good state. However, as discussed by Broström et al. (2008), the pollen cannot be identified to the species level, and different studies may thus have used different species of Poaceae for the reference. Other taxa at higher taxonomic resolution such as Quercus or Acer are therefore sometimes used as the reference taxon (see Table A1, Wieczorek and Herzschuh (2020); https://doi.org/10.1594/PANGAEA.922661).
Reasons for variable RPP values have been discussed in depth by Broström et al. (2008) and Li et al. (2018) and are mainly methodological factors such as different sampling designs and environmental factors such as vegetation characteristics. Furthermore, pollen taxa from different sites can contain different species. Li et al. (2018) discussed in detail for Pinus and Artemisia that vegetation structure and climate of different Chinese study regions, but also methodological differences like the pollen sample type (moss vs. lake sediment) and vegetation sampling method, can explain the variability of RPP estimates within one taxon even better than the occurrence of different taxa. This will be even more apparent when combining data for the whole Northern Hemisphere. However, our compilation clearly indicates that taxa have mostly characteristic RPP values (i.e. within-species variability is low compared to variability between species), while we found no significant differences between continents (i.e. variability within continents is not lower than variability between continents). This implies, when aiming to compare vegetation change between continents, that transformation of pollen data using RPP from another continent is better than keeping the data untransformed. While one has to keep in mind the limited amount of data influencing the statistical power, we conclude that there is no particular reason to not set up a northern hemispheric RPP dataset. Still, before applying one of the datasets presented, researchers should consult the original publication to be sure it fits their needs and standards and be aware of the rather problematic use of SD and SE, which might have influenced our presented SEs.

How to use the datasets
The RPP compilation can be used to get a good overview of existing RPP studies, to identify research gaps, and to find RPP estimates to apply at one's study area. It is important (i) to use only those RPP data which have been evaluated by experts or the author as best fit and (ii) to look at the original publication for further information on how the RPP estimates have been generated.
The continental datasets can be applied to assess vegetation changes using broad-scale pollen datasets. It is important to keep in mind that different taxa with different pollen productivities and dispersal abilities are combined in one RPP value and the application to such broad-scale datasets can only be an approximation. This is especially important for the northern hemispheric dataset, which should not be applied to calculate site-specific vegetation compositions. This dataset fills data gaps of RPP values in various regions, but at the cost of accuracy. We consider the presented averaged RPP values as a tool for data transformation to be applied to broad-scale pollen datasets. Using the dataset in this way can account for differences in pollen productivities and transportation rather than obtaining fully reliable quantitative information about the vegetation cover around a specific site.
Author contributions. MW and UH designed the study and wrote the manuscript. MW carried out the analyses and produced the tables and figures.
Competing interests. The authors declare that they have no conflict of interest.