Long-term phenological data set of multi-taxonomic groups, agrarian activities and abiotic parameters from Latvia, northern Europe

A phenological data set collected by citizen scientists from 1970 to 2018 in Latvia is presented, comprising almost 47,000 individual observations of eight taxonomic groups, in addition to agrarian activities and abiotic parameters, covering in total 159 different phenological phases. These original data, published offline in annual issues of the Nature and History Calendar (in Latvian, Dabas un vēstures kalendārs), have been digitized, harmonized and geo-referenced. Overall, the possible use of such data is extensive, as phenological data are excellent bioindicators for characterising climate change and can be used for the elaboration of adaptation strategies in agriculture, forestry, and environmental monitoring. The data can also be used in cultural–historical research; for example, the database includes data on sugar beet and maize, the cultivation of which was imposed on collective farms during the Soviet period. Thus, such data are not only important in the Earth sciences but can also be applied to the social sciences. The data significantly complement current knowledge on European phenology, especially regarding northern regions and the temporal biome. The data here cover two climate reference periods (1971–2000; 1981–2010), in addition to more recent years, and are particularly important in monitoring the effects of climate change. The database can be considered the largest open phenological data set in the Baltics. The data are freely available to all interested at (Kalvāne


Introduction
From the relatively narrow study of natural rhythms, phenology has developed into an interdisciplinary field of science, the application of which is constantly expanding. Phenological data are important for agriculture, forestry (Peñuelas et al., 2009), understanding ecological processes (Walther et al., 2002), human health (Dierenbach et al., 2013) (knowledge of allergen flowering times and implementation of early warning and monitoring systems can protect those affected by pollinosis and reduce economic losses due to illness), tourism and education (Kalvāne, 2011). Onsite phenological observations are also used as ground truth for satellite data calibration (Nagai et al., 2018). Additionally, phenological data are increasingly being used as bioindicators of climate change (Jochner and Menzel, 2015; Menzel et al., 2020; Ovaskainen et al., 2013).
The oldest phenological data series in Europe dates back to 1354 and holds the dates of grape harvest in Beaune, Burgundy, France (Labbé et al., 2019); other early series come from 15th-century France and Austria (Schleip et al., 2008). The earliest known data collection in the Baltic region, although fragmentary in nature, dates back to the 17th century and was obtained by reconstructing the date of the rye harvest in Estonia (Ahas, 2008). In Lithuania and Estonia, more systematic observations were undertaken in botanical gardens: from 1782, when the Botanical Garden of Vilnius University was founded in Lithuania (Romanovskaja and Baksiene, 2008), and from 1865, with the founding of the Botanical Garden of the University of Tartu in Estonia (Ahas, 2008). In Russia, a network of volunteer observers was established in 1848, when the Russian Geographical Association distributed a questionnaire to the scientific community, including schoolteachers (Ovaskainen et al., 2013).
Fragmentary data on phenology on the territory of present-day Latvia can be found in the collections of German clergy, as well as in the mid-19th century collections of articles of the Riga Society of Nature Researchers (in German, Naturforscher-Verein zu Riga; in Latvian, Rīgas Dabaspētnieku biedrība). Systematic networks of voluntary observers across the Baltic states were established in the 1920s (Ahas, 2008; Grišule and Briede, 2008; Romanovskaja and Baksiene, 2008) and have, with varying success and changes in the maintainers of the data, continued to operate to this day, building up an extremely valuable database.
Here we have described the significance of phenological data and the history of phenology in Latvia. We have evaluated the quality and reliability of the phenological data, describing the methods of data quality assessment and the structure of the database in detail. We have outlined the potential for its use in climate change research, using as examples a seasonal, chronological analysis (the phenological calendar method) and an assessment of long-term and annual fluctuations. The presented data cover a period of almost 50 years, including not only plant phenology, but also animal, insect and fungal phenology, as well as agrarian seasonal practices and abiotic parameters such as first snow, frost, and ice melting.
The goal of the publication is to (1) contribute to open data and science in the field of phenology, (2) outline the possible applications of the phenological data set, and (3) provide insight into the development of phenology in Latvia. The data set includes 159 phenological phases over 1971 to 2018 for eight taxonomic groups, as well as abiotic and agrarian phenomena that can be attributed to a temporal biome in northern Europe.
The greatest value of our data is the large number (159) of reported phenological phases, which allows one not only to fully characterize the annual seasonal developments across the Latvian landscape, but also to analyse long-term changes, mainly in the context of climate change. It is important to note that the data are not only useful for the Earth sciences, but also have societal value. Agrarian observations allow one to gain insight into cultural and historical events: for example, the database contains data on the initiation of grazing in Latvia. Historically, the first cattle grazing after the long winter (Latvia has 96 to 155 frost days annually (Avotniece et al., 2017)) was a significant event, which was celebrated with entertainment such as traditional water fights (rumulēšanās in Latvian). Along with changes in agricultural practices, such events were no longer recorded. During the Soviet period, volunteer observers recorded the phenology of maize (Zea mays) and sugar beet (Beta vulgaris), crops that had previously rarely been cultivated in Latvia and were, in a way, "imposed". Observations of maize and sugar beet were recorded over 1970 to 1976. The publication takes an expanded look at the application of data in the Earth sciences.
The phenological data described here are available in the Zenodo data repository at https://doi.org/10.5281/zenodo.3982086 (Kalvāne et al., 2020), and are integrated into the common European phenological database PEP725 (Pan European Phenology Project) (Templ et al., 2018).
The original data source is volunteer observations (Fig. 2). With a varying degree of success, and along with changes of the data maintainers, this network of volunteer observers has been operating since 1927, with interruptions during World War II. In this publication we have collected phenological data, digitising them from the above-mentioned sources for the last two climatic reference periods (1971–2000; 1981–2010), as well as for more recent years, which are especially important for monitoring the effects of climate change. In this data source (Nature and History Calendar), it is usually the starting date or first occurrence of a given phenomenon in the observation area that is recorded. The dates are grouped by season: spring, summer and autumn.
In total, the database contains almost 47,000 records of 159 different phenological phases from 103 observation points (Fig. 4). Observations of the same phenological phase from the same location are considered a single data series. According to this definition, there are 7892 data series. However, only 1980 data series have 10 or more yearly observations.
The table in the image shows phenological observations recorded in different locations (1. Vērgale, 2. Aizpute … 26. Višķi, 27. Šķaune); the header row lists the phenological phases: first individuals of starlings (in Latvian, atlido strazdi), etc.
Leontīna Pelše gave a general description of autumn (handwritten text below the table): "autumn is wet and rainy, without Indian summer. Harvest is good. A lot of berries and nuts. Often thunderstorm, hail. The first snow was on 26 October, 23.4 cm. 28 October: first river ice. 30 October: Air temperature during night -9°C".
The coordinates of the administrative units (village or town) were recorded as the location of the reported observations; it is evident that these were not the exact locations of actual observations. The locations of the observations were coded by the village or town name.
The dates were entered into an Excel spreadsheet following the format of the original publication, with each sheet covering one year of observations. The data set was then processed in R (R Core Team, 2019). We have coded the phytophenological phases according to the BBCH scale (Meier et al., 2001). For the start of crop sowing, we assigned code BBCH00 (dry seed, innate or enforced dormancy, tuber not sprouted); however, the sowing date is usually the end date of this phase rather than its starting date.

Quality control
A multi-step quality control procedure was adopted for the presented data set. There are two main sources of possible errors in the data set: (1) errors related to the digitization of the paper publications and (2) errors related to observations, processing and publication in the original source. If identified, errors of the first type were corrected manually. The presence of errors of the second type can only be inferred from internal inconsistencies or from impossible or implausible values of reported observations. When such internal inconsistencies were identified, the values were flagged in the final data set, but not eliminated. Users of the data set can decide for themselves whether to exclude such flagged values. The following error screening procedure, similar to that used by Rutishauser et al. (2019), was used.
1. Test-1: Global outlier identification was applied if at least 4 observations of the given phase in a given year were available; 98.4% of all observations met this criterion. First, the deviation of each observation from the median value for the given year and phenological phase across all stations (Fmed-dev) was calculated. Then, the phenophase-specific standard deviation of these deviations (SDmed-dev) was calculated for all years and stations combined. All observations deviating from the yearly median by more than 4 standard deviations (Fmed-dev > 4 SDmed-dev) were double checked against the original publication and corrected if necessary. After that, the remaining observations not passing Test-1 were individually considered by the authors and, if so decided, flagged as implausible. After manual correction of typing errors, only 16 observations were flagged by Test-1 and only 4 of these were flagged as implausible according to expert judgement (Fig. 3);
2. Test-2: Local outlier identification was carried out for series with more than 10 observations for a given phenological phase (65% of all observations). Observations that deviated from the station median of the given phenological phase by more than 3 standard deviations were flagged and double checked for typing errors against the original paper publication. After error correction, 94 observations were flagged by Test-2 and 9 of those were flagged as implausible according to expert judgement (Fig. 3);
3. Test-3: This test identifies an impossible order of phenological phases. Certain phenological phases must follow a strict order; e.g. fruiting (BBCH87) can take place only after flowering (BBCH61). We defined two lines of phase order: vegetative and generative. We checked these for each species, station and year.
A special case is the development of winter cereals: sowing and emergence normally take place in the autumn, that is, later in the calendar year than the onset of the spring and summer development phases. Therefore, the arbitrary code "ReassumedGrowth", instead of a regular BBCH code, is assigned to the first phase of spring development. In Test-3 we were able to consider only 9214 or 19.6% of all observations. Of those, only 14 (0.15%) failed this test and were marked both as "Wrong order" and "Implausible" (Fig. 3).
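The order check of Test-3 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' R implementation: the record fields (`species`, `station`, `year`, `bbch`, `doy`) are hypothetical names, dates are assumed to be stored as day of year, and the two phase lines below are illustrative only, since the text names just the flowering-before-fruiting example.

```python
from collections import defaultdict

# Illustrative phase lines; the exact codes in each line are an assumption.
GENERATIVE = (61, 87)  # flowering before fruiting (example given in the text)
VEGETATIVE = (0, 11)   # sowing before leaf unfolding (assumed for illustration)


def find_order_violations(records, line):
    """Return (key, earlier_code, later_code) for phases observed out of order."""
    rank = {code: position for position, code in enumerate(line)}
    groups = defaultdict(list)
    for rec in records:
        if rec["bbch"] in rank:
            groups[(rec["species"], rec["station"], rec["year"])].append(rec)

    violations = []
    for key, group in groups.items():
        # Sort by expected biological order, then compare neighbouring dates.
        group.sort(key=lambda rec: rank[rec["bbch"]])
        for earlier, later in zip(group, group[1:]):
            if later["doy"] < earlier["doy"]:
                violations.append((key, earlier["bbch"], later["bbch"]))
    return violations
```

Because the comparison uses day of year within one calendar year, autumn-sown winter cereals would be reported as violations unless their spring phases are recoded first, which is the role of the "ReassumedGrowth" code described above.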
The questionable observations identified in Test-1 and Test-2 were evaluated by experts (the authors of this publication) and were flagged as "implausible" if considered unrealistic. Test-1 and Test-2 were reiterated after excluding implausible values.
In total, only 43 observations (less than 0.1% of the data set) were flagged as "implausible".
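The automatic part of Test-1 and Test-2 can be sketched as follows. This is a hedged illustration in Python rather than the R code used for the data set. The field names (`phase`, `station`, `year`, `doy`) are hypothetical, the sample standard deviation is assumed, and for Test-2 the standard deviation is assumed to be that of the station series itself, which the text does not state explicitly.

```python
from collections import defaultdict
from statistics import median, stdev


def flag_global_outliers(records, min_obs=4, k=4.0):
    """Test-1 sketch: deviation from the yearly all-station median of a phase."""
    by_phase_year = defaultdict(list)
    for rec in records:
        by_phase_year[(rec["phase"], rec["year"])].append(rec)

    deviations = defaultdict(list)  # phase -> [(record, deviation in days)]
    for (phase, _year), group in by_phase_year.items():
        if len(group) < min_obs:
            continue  # fewer than 4 observations of this phase in this year
        yearly_median = median([rec["doy"] for rec in group])
        for rec in group:
            deviations[phase].append((rec, rec["doy"] - yearly_median))

    flagged = []
    for pairs in deviations.values():
        sd = stdev([dev for _, dev in pairs])  # one SD per phase, all years pooled
        flagged += [rec for rec, dev in pairs if abs(dev) > k * sd]
    return flagged


def flag_local_outliers(records, min_obs=10, k=3.0):
    """Test-2 sketch: deviation from the station median of a phase."""
    series = defaultdict(list)
    for rec in records:
        series[(rec["station"], rec["phase"])].append(rec)

    flagged = []
    for group in series.values():
        if len(group) <= min_obs:
            continue  # the text requires more than 10 observations
        doys = [rec["doy"] for rec in group]
        station_median, sd = median(doys), stdev(doys)
        flagged += [rec for rec in group if abs(rec["doy"] - station_median) > k * sd]
    return flagged
```

Both functions only return candidates. In the procedure described above, each candidate was then checked against the original publication and judged by the authors before being flagged as implausible.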

Figure 3: Association between Test-1 (standard deviation from yearly median of the given phase) and Test-2 (standard deviation from station median of the given phase); black lines indicate thresholds of 4 (Test-1) or 3 (Test-2) standard deviations.

Temporal trends
In the presented data set, the time series from individual observation locations are rather short. There are indications in the literature that a shift in meteorological conditions in the Baltic region occurred around 1990 (Apsīte et al., 2013; Jaagus et al., 2017). Similar Europe-wide changes towards earlier spring phenology starting from 1988 were reported by Wu et al. (2016). Therefore, to gain insight into the temporal phenological trends, we sub-selected series that match two criteria: (1) at least four observations before 1987 and (2) at least four observations after 1992. This ensures that data points from either side of 1990 are included. Further, we calculated a simple linear trend for each station-phase time series and aggregated the trends by major phase group: start of leaf unfolding (BBCH11), flowering (BBCH61), ripening (BBCH83-87) and plant senescence (BBCH92-93), first arrival and departure of migratory birds, in addition to abiotic and agrarian phenological phases.
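The series selection and the trend calculation can be sketched as below. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, dates are taken as day of year, and "simple linear trend" is read as an ordinary least-squares slope in days per year.

```python
def is_eligible(years, min_each=4, before=1987, after=1992):
    """True if the series has enough observations on both sides of 1990."""
    n_before = sum(1 for y in years if y < before)
    n_after = sum(1 for y in years if y > after)
    return n_before >= min_each and n_after >= min_each


def linear_trend(years, doys):
    """Ordinary least-squares slope of day of year against year (days/year)."""
    n = len(years)
    mean_year = sum(years) / n
    mean_doy = sum(doys) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (d - mean_doy) for y, d in zip(years, doys))
    return sxy / sxx
```

A negative slope means that the phase tends to occur earlier over time. Aggregation by phase group would then collect the slopes of all eligible station-phase series that share a BBCH code or event type.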

Phenological calendar
An overview of the most frequently observed phenological phases and their sequence can be presented as a phenological calendar. We selected only phases with more than 500 observations, arranged them by the average reported date of phase onset and presented them as a box plot. The box plot was supplemented with the relative number of yearly observations, indicated by a density plot. The phenological calendar includes data on 46 phenological phases, arranged in chronological order by the median values.
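The selection and ordering step behind the calendar can be sketched as follows. This is an illustrative Python fragment, not the authors' code: the field names `phase` and `doy` are hypothetical, and the median is used for ordering, as in the last sentence above.

```python
from collections import defaultdict
from statistics import median


def calendar_order(records, min_obs=500):
    """Return (phase, median day of year) pairs in chronological order."""
    by_phase = defaultdict(list)
    for rec in records:
        by_phase[rec["phase"]].append(rec["doy"])
    # Keep only well-observed phases (more than min_obs observations).
    kept = {phase: median(doys) for phase, doys in by_phase.items()
            if len(doys) > min_obs}
    return sorted(kept.items(), key=lambda item: item[1])
```

The per-phase lists of dates gathered in `by_phase` are also what a box plot of the calendar would draw, one box per phase in the returned order.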

The history of phenology in Latvia: a brief overview
For Latvia, fragmentary phenological data can be found in certain regions in the diaries of clergy (Jansons, 1929; Kalvāne, 2011) from the beginning of the 19th century. The first data on the phenology of birds, bees and crops were published in 1866 in Terbata (now Tartu, Estonia) in Korrespondenzblatt der Naturforscher-Vereins zu Riga (Naturforscher-Vereins zu Riga, 1867). In the beginning of the 20th century, the text Nature phenomena in annuals changes by Jesens (1920) [...] data" as phenological data were seen as complementary to a region's climatic information (Jansons, 1929). The guidelines also stated that "...it is also important that the observation includes plants that are indeed well-known, which are related to some other interests, including some quite practical ones, and such that can be observed, one could say, right before the eyes" (Jansons, 1929). It should be noted that the guidelines have not changed significantly over time. The text Phenological Observations in Latvia was published until 1935. Observations were taken at more than 70 different points, and this was the largest observation network to operate in Latvia.
Phenological data for the 1940s and 1950s were summarised in the publications of Zirnītis (1956) as the average period values, without indicating the sources of the obtained data, which might have been for political reasons. Information is lacking about the maintainer of data during this period.
After World War II the network of phenological observations was renewed by the Hydrometeorological Service. Observations of crops and wildlife were taken at 20 meteorological stations and 20 posts. The obtained data were used for agricultural purposes. Since the establishment of the Phenological Commission of the Geographical Society of the Latvian Soviet Socialist Republic in 1959, a network of public correspondents, or phenologists, has operated within the Society (Sproģe, 1979). Since 1971, phenological data have also been published annually in the Nature and History Calendar; these data have been digitised as part of the study.
In recent years (since 1999) the phenological network has been voluntarily coordinated by the co-author, Ģērmanis, continuing to send observation questionnaires, data forms (Fig. 2) [...]
In parallel with the traditional network of volunteer observers, nature observations in Latvia are recorded in the Dabasdati.lv portal, maintained by the NGO Latvian Fund for Nature. A smartphone application is also available and has significantly increased interest in nature observations. For example, in the first 3 months of 2020, more than 34,000 observations (16,000 with photographs) were recorded, mainly of birds, as Latvia has a long history of ornithology (Priedniece, 2020).
Ornithological data are published on the latvijasputni.lv online portal.

Database structure
The digitised historical data of volunteer observers, after data quality control (see the methods section), are combined in a database. The data are freely available in the Zenodo repository at https://doi.org/10.5281/zenodo.3982086.

Phenological phases
The database contains data on eight different taxonomic groups, as well as on abiotic phenomena (first snow, snowmelt, ice moving, first thunder, first and last frost) and agrarian activities (planting of seedlings, planting and harvesting of potatoes, ploughing in autumn, etc.). However, it should be noted that the largest data group is for plants (almost 70%), the second largest group is birds (11%), followed by abiotic phenomena (8%), agricultural data (7%) and insects (3%). Other taxonomic groups comprise less than 1% (Fig. 6). Within the taxonomic group of plants, almost half of the observations concern the flowering phase. The majority of volunteers note the first observed specimen/individual, which is often the earliest value rather than the average value specific to the observation area; this is important to keep in mind when using these data. According to one volunteer observer (Ģērmanis Agris, personal communication 2020), it usually takes another 2 to 3 days for other specimens/individuals to reach the developmental stage reported in the observations.
Overall, the quality of the data can be assessed as high, as evidenced by the very low proportion of observations that were determined to be unrealistic. For example, data from the same source (Nature and History Calendar) were successfully used to calibrate the initial model of the budding and flowering of the bird cherry (Padus avium) and silver birch (Betula pendula; Kalvāns et al., 2015). The uncertainty of the model (2 to 4 days) was comparable to or higher than that in other similar studies (Siljamo et al., 2008).

Data limitations
When using the data, it is important to note that (1) the georeferencing is approximate; (2) most of the observations represent the spring phases; (3) the observers record the first instance of the phenomenon; and (4) the number of observation points varies from year to year:
1. The lack of metadata must be mentioned as a significant shortcoming of the database; we only know the approximate location of observation points. The coordinates of administrative units were taken as the location of the reported observations although it is evident that this is not the exact location of the actual observation. However, the study region (Latvia) is on the East European Plain with poorly articulated terrain and elevations not exceeding 311 m above sea level; considering the relatively flat and homogeneous territory, this should not have a major impact when analysing data globally or regionally.
2. As is common in an area of strong seasonality, the volunteer observers often pay more attention to the spring phases and less to summer or autumn, which may affect data analysis, for example of annual season lengths or assumptions about growth stages and their length.
3. Volunteers record the first appearance of the phenomena in the area. It is possible that the significance of phenological trends has also been influenced by the methodology of data collection. Voluntary observers do not observe the same specific individuals from year to year but record the earliest observed value: this is event monitoring and not status monitoring. For example, in the study by Forrest and Miller-Rushing (2010), it was stated that larger plants (with better physiological conditions) bloom earlier than smaller ones in the same population. Also, the healthiest birds lay eggs earlier than those that are in a weaker physiological state (Forrest and Miller-Rushing, 2010).
4. The number of observers varies greatly from year to year, reaching a peak in the 1970s, and no continuous data series from 1970 to 2018 are available for any observation point (Fig. 7), although such series are required for analyses of long-term changes and for regional studies. There are two ways to solve this problem. The usual approach is to combine series of observations from adjacent observation points, as has been done in the past (Kalvāne et al., 2009). The second approach is to calibrate a phenological model using the available observation data set and to use, for example, gridded meteorological data (E-OBS), reanalysis data or operational meteorological models for regional calculations (Kalvāns et al., 2015; Wu et al., 2016).
The above-mentioned factors must be considered before using the data in our database; however, they do not reduce the value of the database or the applicability of the data.

Applicability of phenological data: temporal changes and seasonality in the landscape
A large amount of data, as in our case, allows one to describe chronological changes in the landscape in general (the phenological calendar method) and to analyse long-term data changes that serve as evidence of climate change. This makes it possible to compare annual fluctuations in the context of both species and region.

Seasonality in the landscape
Phenological calendars are long established; they are compiled either by identifying the most important indicator species (with a specific, characteristic and easily identifiable development phase) (Kalvāne, 2011) or by describing natural seasonal phenomena in general. The methods of creating phenological calendars and the forms of representation differ, but their goal is to demonstrate events in nature in chronological order.
In this study, based on the quality of the data (phenological phases were selected with at least 500 observations throughout Latvia), a phenological calendar for the temporal biome has been created.
In Latvia, snowmelt marks the beginning of the phenological season, and its end is marked by leaves falling off the apple tree (Malus domestica; BBCH93). In total, 46 phenological phases are described in the calendar. As can be seen in Fig. 7, the spring and autumn phases show a relatively larger scattering of values, i.e. larger interannual variation. For example, the beginning of flowering of pioneer species (Kolářová et al., 2014) such as hazel (Corylus avellana) and grey alder (Alnus incana; BBCH61) in Latvia may occur from the end of December to the beginning of May (Kalvāne and Kalvāns, 2021).
It should be noted that during recent years, the beginning of flowering can already be observed in March. The timing of the summer phases is more consistent. In general, the green-up in Latvia is observed from the end of April to the beginning of May. As it is an area with high seasonality, which includes a long winter period, it is reasonable that spring indicators are recorded more often than, for example, summer or autumn phases.
Figure 7 shows that the data coverage varies from year to year: most observations were recorded in the 1970s, and recently the number of observations has been declining.

Phenological data as the representative of long-term and annual fluctuations
Analysis of phenological trends is a generally accepted method in bioclimatology, which characterises both long-term and short-term changes. Figure 8 shows three indicators: the flowering of hazel (Corylus avellana), the phenological indicator of the beginning of spring; the flowering of linden (Tilia cordata), which marks the middle of summer; and the yellowing of the leaves of the silver birch (Betula pendula), which in the Baltic region is considered to be the beginning of autumn (Kalvāne, 2011).
Autumn phases (both of birch and other wild deciduous trees) are characterised by large variation among observation points within a year. Within one year, yellowing can be recorded over a range of up to 2 months. Interestingly, after the year 2000, the observations become even more heterogeneous: the data demonstrate a larger deviation among the observation points. We explain this by changes in the humidity regime: in recent years, the yellowing of the leaves due to drought has become more common (also observed for linden, Tilia cordata). In general, the spring and summer phases tend to occur earlier, and autumn trends are less pronounced.
Phenological observations are mentioned in the scientific literature as bioclimatic indicators of climate change, indicating significant phenological changes in all regions of the world. The evaluations of our data also coincide with those mentioned in the literature. When the phenological data are grouped by taxonomic group, and the plant phases are separated into leaf unfolding, flowering, maturation and leaf colouring, the resulting histograms show that significant seasonal changes have taken place in the landscape of Latvia (Fig. 9).
The largest changes have been recorded for the leaf unfolding (BBCH11) and flowering (BBCH61) phases of plants: almost 90% of the data included in the database demonstrate a negative trend. The ripening (BBCH83-87) phase also has a negative trend: most crops and wild plants ripen earlier.
The onset of the autumn phase, as the leaves change colour (BBCH92) and fall (BBCH93), shows a trend towards later dates, with differences at the regional and interspecies level. For example, on average Norway maple (Acer platanoides) and aspen (Populus tremula) tend to colour later, while linden and birch colour earlier. Bird migration trends are ambiguous: some bird species return earlier and some later. In the autumn migration, for example, on average the white stork leaves earlier, while geese migrate later (Kalvāne and Kalvāns, 2021).
Interestingly, the phase BBCH93 (the beginning of leaf fall) rather frequently occurs before phase BBCH92 (the start of leaf senescence) for such tree species as birch, cherry, apple, aspen, and maple. There are 30 such cases. However, empirical observations indicate that such a situation is possible: the first leaves can fall while still green. It should be noted that such cases have been recorded in the last decade, which may indicate a change in influencing factors; for example, the risk of drought has increased during recent years. Therefore, the sequence of the phases BBCH92 and BBCH93 is not universal and was excluded from Test-3 (Section 2.2).
For the abiotic phenomena, the greatest changes have been recorded for the first thunder and snowmelt, which occur significantly earlier, and for autumn frosts and first snow, which occur later. Changes in agrarian activities have been observed in those taking place in autumn, such as field ploughing and winter sowing, which, on average, are presently carried out later than was the case in the middle of the 20th century, indicating an extension of the season of field work. In general, significant seasonal changes have taken place across the Latvian landscape.

Summary and conclusions
We have presented the historical phenological data of volunteer observers in Latvia for a period of almost 50 years as an open data set. The data can primarily be used to describe the seasonal events of the temporal biome and to analyse influencing factors.
It should be noted that the role of phenological observations is growing. As shown in the example publications, the types of application vary; for example, the data provide a good characterization of the seasonality of the landscape, of chronology (Fig. 7), or of changes in chronology due to environmental changes (for example, the trophic mismatch between pollinators and plant flowering, which has been extensively described; Hegland et al., 2009). Seasonality changes can also be observed in the landscape of Latvia, especially in the early spring phases (Fig. 8).
Phenological data allow for the analysis of long-term and annual fluctuations (Fig. 8), which in combination with meteorological studies create a comprehensive climatic characterization of the territory; this has also long been a priority for phenologists in Latvia (Section 3.1). Phenology studies play an important role in the research of climate change, because phenological data are used as bioindicators to identify changes on regional and global scales. Satellite data are used at the global modelling scale, and in their calibration phenological data, as the ground truth, play an invaluable role. The greater the number of open ground truth observations available for validation, the higher the quality of the satellite imagery product. In turn, modelling is important for the development of agricultural and forestry adaptation strategies. Future projections are based on the assessment of complex factors and phenological parameters, and the inclusion of influencing factors provides a more comprehensive and complete assessment, which is important for third parties such as insurance companies and decision-making bodies.
This database is unique as it covers not only phytophenological data but also other taxonomic groups, as well as abiotic and agricultural phenomena. The latter have major potential applications in the Earth sciences, as well as in the social sciences.

Author contributions
Gunta Kalvāne was responsible for the study's conceptualization, investigation and supervision, and prepared the manuscript with contributions from all co-authors. Andis Kalvāns designed the methodology, prepared the phenological data for the database, and undertook the statistical analyses, programming and visualization. Andris Ģērmanis undertook data curation and validation. The roles are described using the CRediT contributor roles taxonomy.
The authors declare that they have no conflict of interest.