The PROFOUND Database for evaluating vegetation models and simulating climate impacts on European forests

Process-based vegetation models are widely used to predict local and global ecosystem dynamics and climate change impacts. Due to their complexity, they require careful parameterization and evaluation to ensure that projections are accurate and reliable. The PROFOUND Database (PROFOUND DB) provides a wide range of empirical data on European forests to calibrate and evaluate vegetation models that simulate climate impacts at the forest stand scale. A particular advantage of this database is its wide coverage of multiple data sources at different hierarchical and temporal scales, together with environmental driving data as well as the latest climate scenarios. Specifically, the PROFOUND DB provides general site descriptions, soil, climate, CO2, nitrogen deposition, tree and forest stand level, and remote sensing data for nine contrasting forest stands distributed across Europe. Moreover, for a subset of five sites, time series of carbon fluxes, atmospheric heat conduction and soil water are also available. The climate and nitrogen deposition data contain several datasets for the historic period and a wide range of future climate change scenarios following the Representative Concentration Pathways (RCP2.6, RCP4.5, RCP6.0, RCP8.5). We also provide pre-industrial climate simulations that allow for model runs aimed at disentangling the contribution of climate change to observed forest productivity changes. The PROFOUND DB is available freely as a “SQLite” relational database or “ASCII” flat file version (at https://doi.org/10.5880/PIK.2020.006/; Reyer et al., 2020). The data policies of the individual contributing datasets are provided in the metadata of each data file. The PROFOUND DB can also be accessed via the ProfoundData R package (https://CRAN.R-project.org/package=ProfoundData; Silveyra Gonzalez et al., 2020), which provides basic functions to explore, plot and extract the data for model set-up, calibration and evaluation.


Introduction
Process-based models are key tools for understanding systems and forecasting climate change impacts in ecology and Earth system science (Schellnhuber, 1999). Vegetation is a crucial component of the Earth system, and forests are particularly relevant through their influence on hydrological and biogeochemical cycles, biodiversity and ecosystem services. Process-based vegetation models are used as diagnostic tools to disentangle the influence of different environmental and human drivers on biogeochemical cycling as well as vegetation structure, from the local or plot level (Eastaugh et al., 2011; Fontes et al., 2010; Pretzsch et al., 2015; Tiktak and van Grinsven, 1995) to the global scale (Ito et al., 2017). At the same time, these models are also the main tools to project climate change impacts on vegetation under changing environmental conditions, again from local (Reyer, 2015; Rötzer et al., 2013) to global levels (Zhu et al., 2016).
As models become more complex, include more and more processes and are increasingly used to make quantitative projections for policy and management, there is a strong need for quality control of their performance. A basic requirement is that models are actually able to match observed data. Moreover, while informal methods for calibration and model comparisons were often used in the past, the community has shifted in recent years towards more formal statistical methods for such tasks (Dietze et al., 2013; Hartig et al., 2012), which creates a need for systematic benchmarking data. For all these tasks, the availability of a wide range of data types crossing different spatial-temporal scales is generally viewed as beneficial (Grimm and Railsback, 2012).
The process of formal calibration, comparison and evaluation of complex vegetation models is often hindered by the availability and the harmonization of suitable data. The data necessary to drive a vegetation model are often complex and need to be compiled from different data sources (e.g. Bagnara et al., 2019). In particular for model comparisons, besides data for the evaluation of individual models, common input and driving data for process-based vegetation models are needed to ensure fair comparisons between the participating models. Although model comparisons have a long tradition in vegetation modelling (Cramer et al., 1999, 2001; Bugmann et al., 1996; Morales et al., 2005), they have often been limited by overall data availability and comparability. Common databases that are ready to use for thorough model evaluation would allow the community to gain a better appreciation of model differences, explore structural uncertainties and provide a basis for more systematic ensemble projections of climate impacts.
Recently, several initiatives have started compiling model evaluation, input or driving data for a wide range of applications of process-based vegetation models (Huntzinger et al., 2013; Kelley et al., 2013; Warszawski et al., 2014; Sitch et al., 2015; Collier et al., 2018). Although these initiatives have leveraged important scientific progress, many of them have focussed on the global scale, mostly providing evaluation, input and driving data from global products. Such global products generally lack the breadth and depth of process-level detail required to rigorously assess model performance at smaller scales; for example, they lack long-term and detailed measurements of forest stand structure. The database for the project "Towards robust projections of European forests under climate change" (hereafter PROFOUND DB) described here aims to bring together data from a wide range of data sources to evaluate vegetation models and simulate climate impacts at the forest stand scale. It has been designed to fulfil two objectives: to allow for a thorough evaluation of complex, process-based vegetation models using multiple data streams covering a range of processes at different temporal scales and to allow for climate impact assessments by providing the latest climate scenario data.
The PROFOUND DB only provides data for individual forest stands but contains a number of elements that are designed to foster comparison of both global/regional models and local models. The climate data, for example, are provided locally (or bias-corrected using local data) in the same way that stand-scale vegetation models would need them and also extracted from global gridded datasets that global vegetation models would use. The PROFOUND DB is also designed to allow for disentangling of uncertainties that affect quantitative model predictions in ecology (see Lindner et al., 2014, and Dietze, 2017, for an explanation of different uncertainty types), for example by facilitating standardized evaluations of structural or process uncertainties via model comparisons. Model input and driver uncertainty are addressed through a wide range of climate data from different sources, covering the full range of Representative Concentration Pathways (RCPs). Collalti et al. (2018, 2019), for example, have used the PROFOUND DB to study the effects of thinning on carbon use efficiency across a combination of all four RCPs and five global climate models. Finally, parametric uncertainty can be assessed through the wide range of data that can be used for inverse calibration. In the following we describe the main components of the PROFOUND DB and an R package developed to explore the database and allow rapid and easy access for modellers.
Table 1. Overview of the data types available in the PROFOUND DB and the years covered at each site (rows: soil, local climate, reanalysis climate, forest tree data, forest stand data, MODIS, flux, meteorological, atmospheric heat conduction, soil flux series; the per-site entries did not survive extraction).

The PROFOUND Database

Forest site selection and concept
The forest sites featured in the PROFOUND DB were selected to provide a wide array of data sources across a European gradient. We focussed in particular on providing long time series of tree- and stand-level growth and yield as well as carbon cycle data available from eddy-flux measurements because these variables are most commonly used in calibrating and evaluating process-based vegetation models. The selected sites spread along a wide climatic gradient across Europe (Fig. 1, Table 3) and cover some of the most common European forest types, as well as the main central European forest management history of favouring monospecific, even-aged forests or mixtures of two tree species. We compiled the data from existing data sources and collected the definitions of variables, their units and information about the main measurement methods from the site principal investigators (PIs) and from official descriptions of the data to harmonize the variables as much as possible. The overall guiding principle for the compilation of the data was to provide data that can be easily used by modellers for setting up and evaluating their models. In order to allow for data uncertainty to be reflected in model calibration studies, we also included uncertainty estimates for the measured data, such as those available for carbon flux measurements (see Sect. 3.2.9), wherever possible.

Data sources
The PROFOUND DB provides information on the site, soil and forest stand as well as data for climate, atmospheric CO2 concentration, nitrogen deposition, carbon fluxes, atmospheric heat conduction and remote sensing at a range of different temporal resolutions (i.e. from 30 min to decadal measurements). Table 1 provides an overview of the different data types and their temporal resolution available in the PROFOUND DB. All variables available are listed in Tables S1-S13 in the Supplement. In the following we describe how the individual sub-datasets of the PROFOUND DB have been brought together and describe the key variables and characteristics of each dataset.
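Because the database is distributed as an SQLite file, its contents can be inspected with any SQLite client before a model is set up. The following is a minimal sketch in Python using only the standard library. It queries SQLite's own catalogue, so it makes no assumption about the actual table names in the PROFOUND DB; the `demo_site` table and its columns are hypothetical stand-ins that let the sketch run without the real file.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an SQLite database, sorted by name."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def column_names(connection, table):
    """Return the column names of one table using PRAGMA table_info."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    return [row[1] for row in connection.execute(f"PRAGMA table_info({quoted})")]


if __name__ == "__main__":
    # Stand-in database so the sketch runs without the real file. For the real
    # database, replace ":memory:" with the path to the downloaded SQLite file.
    con = sqlite3.connect(":memory:")
    # Hypothetical table and columns, for illustration only.
    con.execute("CREATE TABLE demo_site (site_id INTEGER, name TEXT, elevation_m REAL)")
    con.execute("INSERT INTO demo_site VALUES (1, 'example', 100.0)")
    for table in list_tables(con):
        print(table, column_names(con, table))
    con.close()
```

The ProfoundData R package mentioned above offers dedicated functions for the same purpose; this sketch only illustrates that the relational version can also be read without it.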

Site information
For each forest site, the PROFOUND DB contains information on general site characteristics such as coordinates, elevation and forest type (Table 2). There is also information on the potential natural vegetation and main tree species belonging to the regional flora (not shown).

Soil data
The description of the soil profiles contains information about physical and chemical properties of each soil horizon including the organic layer. Unfortunately the soil data are very heterogeneous for the sites, and considerable amounts of data are missing. In order not to lose the data that are available for only a subset of sites, we did not harmonize the individual variables, but for each site we provide the soil data in a consistent format. Despite these limitations, for most sites important soil data such as the depth of horizons, soil texture, bulk density, field capacity, wilting point, carbon and nitrogen content, and pH of the soil solution are available (see Table S2).

Local climate
For every site we compiled the locally observed daily meteorological data, either from measurement towers or from nearby meteorological stations. These time series cover the main climatic variables required by vegetation models and different time periods for each site (Table 3). They represent the best possible climate information for each site and are most suitable for model simulations comparing simulation output to observations.

Reanalysis products
In order to cover longer historical time periods and to assess uncertainties due to the choice of different climate inputs, the PROFOUND DB also provides long historical daily climate time series for each of the sites extracted from four different global reanalysis/observational products (GSWP3, Princeton, WATCH and WFDEI; see Table 3). Climate variables for the forest stands were extracted from the 0.5° × 0.5° grid cell of the global reanalysis/observational product in which the forest stand is located. The data are kept at the original 0.5° × 0.5° resolution to allow for comparing the effects of choosing climate inputs for a vegetation model from a global reanalysis product as opposed to the local data presented in Sect. 3.2.3. The difference between the local data and the reanalysis data is most obvious for those sites located in complex, hilly terrain such as Collelongo or KROOF (Table 2). In these hilly locations the grid box average heights of the reanalysis products differ substantially from the heights of the site measurements.
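The grid-cell extraction described above amounts to finding the cell of a regular global grid that contains the site coordinates. The sketch below illustrates this under the assumption that cell edges fall on integer multiples of the resolution (so that 0.5° cells are centred on .25 and .75), which is the common layout for global 0.5° products but is not stated in the text. The function name and the example coordinates are illustrative and are not taken from the database.

```python
import math


def grid_cell_center(lat, lon, resolution=0.5):
    """Return the (lat, lon) centre of the regular grid cell containing a point.

    Assumes cell edges fall on integer multiples of `resolution`.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    # Clamp points lying exactly on the upper boundary into the last cell.
    lat = min(lat, 90.0 - 1e-9)
    lon = min(lon, 180.0 - 1e-9)
    cell_lat = math.floor(lat / resolution) * resolution + resolution / 2
    cell_lon = math.floor(lon / resolution) * resolution + resolution / 2
    return cell_lat, cell_lon


if __name__ == "__main__":
    # Illustrative coordinates only (a northern and a south-western location).
    print(grid_cell_center(61.85, 24.29))
    print(grid_cell_center(44.70, -0.77))
```

A single 0.5° cell spans roughly 50 km in latitude, which is why the grid-box mean elevation can differ markedly from the site elevation in mountainous terrain.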

Climate scenarios
Table 3. Averages of the daily maximum temperature (Tmax), daily minimum temperature (Tmin), daily mean temperature (Tmean), annual precipitation sum (P), daily mean relative humidity (RH), daily mean air pressure (AP), annual sum of global radiation (R, direct + diffuse shortwave radiation) and daily mean wind speed (W) for each of the sites in the PROFOUND DB from five different sources: a locally observed climate and four different global reanalysis/observational products (GSWP3, Princeton, WATCH, WFDEI). The column "Year" indicates the years for which the mean climates have been calculated for the different sources. Please note that the two Solling sites have the same climate. NA: not available.

The PROFOUND DB provides climate scenarios based on simulations performed for CMIP5 (https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip5; last accessed: 5 June 2020) that were bias-corrected and interpolated to a common grid resolution of 0.5° × 0.5° according to Hempel et al. (2013). The climate variables available for each site were extracted from the grid cell of the downscaled climate forcing dataset in which the forest plot is located. The data can be used in very different ways by the vegetation modelling community.
- The "ISIMIP Fast Track" scenarios (ISIMIPFT) consist of daily climate data available from five different global climate models (S1). For RCP2.6, RCP4.5 and RCP8.5 from IPSL-CM5A-LR, HadGEM2-ES and MIROC5, additional data are also available for the period 2100-2299. These long-term climatic pathways stabilize at around 1-2 °C by the end of the 23rd century compared to 1980-2005 for RCP2.6, around 3-5 °C for RCP4.5 and up to 16 °C for RCP8.5. For all four GCMs, there are also time series of pre-industrial climatic conditions available from 1661 to 2299 (or 1661-2099 for GFDL-ESM2M), the so-called pre-industrial control run. The pre-industrial climates from each GCM for the time period 1661-1860 can be combined with the historical climates from 1861 to 2005 and any future time periods from the corresponding GCM to create a long-term time series of climate data from 1661 to 2299 (or 2099 depending on the GCM-RCP combinations) with almost no resampling. The ISIMIP2b data are best suited to test the implications of long-term stabilization pathways and different degrees of warming relative to pre-industrial conditions in vegetation models.
- The "ISIMIP2b locally bias-corrected" scenarios (ISIMIP2bLBC) have the same structure as the ISIMIP2b data but have been bias-corrected using an improvement of the method of Hempel et al. (2013) as described in Frieler et al. (2017) and Lange (2017) and the local observed climatologies presented in Sect. 3.2.3. The ISIMIP2bLBC data are hence best suited for scenario studies that require climatic data to be as consistent as possible with the observational data (Fig. 3).
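The combination of pre-industrial, historical and future segments described above can be sketched as a simple splice by year. The example below is only an illustration: it uses one value per year instead of daily records, the function and argument names are hypothetical, and it assumes that the scenario period starts in the year after the historical period ends (2006), which the text implies but does not state.

```python
def splice_climate(picontrol, historical, future,
                   picontrol_years=(1661, 1860),
                   historical_years=(1861, 2005)):
    """Combine three {year: value} series into one continuous series.

    Pre-industrial control values are used for `picontrol_years`, historical
    values for `historical_years`, and scenario values for every later year
    present in `future`.
    """
    combined = {}
    for year in range(picontrol_years[0], picontrol_years[1] + 1):
        combined[year] = picontrol[year]
    for year in range(historical_years[0], historical_years[1] + 1):
        combined[year] = historical[year]
    for year in sorted(future):
        if year > historical_years[1]:
            combined[year] = future[year]
    # Check that the spliced series has no gaps.
    years = sorted(combined)
    if years != list(range(years[0], years[-1] + 1)):
        raise ValueError("spliced series has missing years")
    return combined


if __name__ == "__main__":
    # Synthetic placeholder values, not real climate data.
    pic = {y: 0.0 for y in range(1661, 2300)}
    hist = {y: 1.0 for y in range(1861, 2006)}
    rcp = {y: 2.0 for y in range(2006, 2100)}
    series = splice_climate(pic, hist, rcp)
    print(min(series), max(series), len(series))
```

All three segments should come from the same GCM, as noted above, so that the spliced series stays internally consistent.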

Atmospheric CO2 concentrations
Time series of atmospheric CO2 concentrations are provided as annual, global data, hence as one time series for all sites of the PROFOUND DB, assuming a well-mixed atmosphere. Figure 4 shows the historical increase in CO2 concentrations since 1765 and the projected future concentrations according to the different RCPs. From RCP2.6 to RCP8.5 the total level of CO2 increases strongly, and stabilization is also reached much later in RCP8.5. RCP2.6 is the only RCP that projects declining CO2 levels in the long run.

Nitrogen deposition
The nitrogen deposition data, reported as total deposition of reduced and oxidized wet and dry nitrogen deposition, respectively, have been extracted for each site of the PROFOUND DB from two different datasets which serve different purposes.
- EMEP data. For detailed model evaluation studies that require the best possible estimates of local nitrogen deposition, we extracted data from the "Co-operative programme for monitoring and evaluation of long-range transmission of air pollutants in Europe" (EMEP) for the time period 1980-2014 (EMEP/CEIP, 2014a, b). Sea-salt-corrected data are available from 1980 to 1995 in 5-year steps and from 1986 to 2014 at annual time step and are derived by atmospheric transport modelling (Simpson et al., 2012).
- ISIMIP data. For model simulation studies, we also provide nitrogen deposition estimates based on atmospheric chemistry modelling for a historical time period and four future scenarios, where nitrogen deposition follows the four RCPs. The data are further described in Lamarque et al. (2013a, b), sea-salt-corrected and consistent with the global nitrogen deposition data provided within ISIMIP. The data are taken from the global dataset without further corrections and hence are not intended to represent realistic, local forecasts but rather rough estimates of future nitrogen projections.
For the 1980-2014 time period, the ISIMIP data are typically lower and less dynamic than the EMEP estimates (Fig. 5). However, while they do not seem suitable for historical model evaluations, they cover a much longer time period and are clearly interesting for scenario studies because they feature different nitrogen deposition pathways consistent with RCP climates and CO2 pathways. It is also important to note that measured throughfall of NO3 and NH4 is on average lower than modelled total deposition, due to canopy uptake (Marchetto et al., 2020). Moreover, for the two Solling sites the data presented here are identical, while in reality total N deposition rates in the spruce stand should be higher because of higher dry deposition. In fact, the ratio between Solling spruce and Solling beech is 1.4 for NH4 throughfall fluxes, 1.6 for NO3 throughfall fluxes, 1.4 for NH4 total deposition and 1.4 for NO3 total deposition, both using a canopy budget model (Ulrich, 1994) for the period 1980-2014. However, these ratios are not constant and show an increasing trend over time.

Forest inventory data
For each site, the PROFOUND DB provides information about the forest stand at tree and stand level (Table 4). The data are available for different time periods and have different measurement intervals, but generally they cover mostly the second half of the 20th century and the first decade of the 21st century (Table 1). The data also cover a wide array of height-age and diameter at breast height (DBH)-age relationships (Figs. 6-7). For seven out of nine sites individual tree DBH and height measurements are available. The time series length ranges between 15 and 65 years within the time period 1948-2015. For the Sorø site, the DBH and heights have been reconstructed from tree-ring data (Babst et al., 2014), and the full stand reconstruction is available from 1996 to 2010 at annual resolution (see Sect. S1 in the Supplement). Individual tree data allow analysis and comparison of model simulations with data on single-tree growth. From the tree data, we calculated a range of widely used stand variables (see Table S8). Additional stand-level data are available for some of the sites, such as leaf litter production or leaf area index, and have been included (see Table S8).
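To illustrate how stand variables follow from individual tree measurements, the sketch below derives stem density, basal area per hectare, the arithmetic mean DBH and the basal area-weighted mean DBH from a list of tree diameters. The definitions are the conventional ones and are an assumption here, since the exact formulas behind Table S8 are not reproduced in this text; the function names and the example plot are hypothetical.

```python
import math


def basal_area_m2(dbh_cm):
    """Cross-sectional stem area at breast height (m2) for a DBH given in cm."""
    return math.pi * (dbh_cm / 200.0) ** 2


def stand_summary(dbh_cm_list, plot_area_ha):
    """Summarise tree-level DBH measurements into common stand variables."""
    if not dbh_cm_list or plot_area_ha <= 0:
        raise ValueError("need at least one tree and a positive plot area")
    areas = [basal_area_m2(d) for d in dbh_cm_list]
    total_ba = sum(areas)
    return {
        "stem_density_per_ha": len(dbh_cm_list) / plot_area_ha,
        "basal_area_m2_per_ha": total_ba / plot_area_ha,
        "mean_dbh_cm": sum(dbh_cm_list) / len(dbh_cm_list),
        # Each tree's DBH is weighted by its own basal area.
        "ba_weighted_mean_dbh_cm": sum(a * d for a, d in zip(areas, dbh_cm_list)) / total_ba,
    }


if __name__ == "__main__":
    # Three hypothetical trees on a 0.1 ha plot.
    for key, value in stand_summary([10.0, 20.0, 30.0], 0.1).items():
        print(key, round(value, 3))
```

Because large trees carry more weight, the basal area-weighted mean DBH is always at least as large as the arithmetic mean, which matters when comparing sites for which different means are reported.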

Flux data
The carbon fluxes, i.e. net ecosystem exchange (NEE), ecosystem respiration (RECO) and gross primary production (GPP), are taken from the Tier One FLUXNET2015 dataset (http://fluxnet.fluxdata.org/, last access: 5 June 2020). We provide estimates of fluxes calculated using different estimates for gap-filled and partitioned fluxes to give a rough estimate of the uncertainty added to the long-term budgets in the process. NEE data are filtered using two different methods to calculate uStar thresholds (Barr et al., 2013, and a modified version of Papale et al., 2006; see also FLUXNET2015, 2017). Daytime (i.e. Lasslop et al., 2012) and night-time (i.e. Reichstein et al., 2005) refer to whether ecosystem respiration parameters were estimated from only night-time fluxes or also using daytime data (zero intercept of GPP light response curve). In many cases the number of accepted night-time fluxes is low and the temperature range is narrow, which leads to high uncertainty in the estimated respiration. This can be improved by also using daytime fluxes. On the other hand, in the daytime method the uncertainties of photosynthetic light, temperature and possible vapour pressure deficit (VPD) responses may be attributed to respiration parameters. Further information about the daytime and night-time methods is available in Lasslop et al. (2010) and Reichstein et al. (2005) and also FLUXNET2015 (2017). We also extracted different uncertainty estimates for each variable. Additionally, we provide time series of the sensible and latent heat flux, soil (soil water and soil temperature) and meteorological variables at a 30 min time resolution from the FLUXNET2015 database including measurement uncertainty estimates. Table 5 provides an overview of the main carbon fluxes at each of the sites featured in the PROFOUND DB. Tables S9 and S11-S13 provide the full list of available variables.
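Annual carbon budgets such as those summarized in Table 5 are obtained by integrating the half-hourly fluxes over a year. The sketch below shows this integration under the assumption that fluxes are given in µmol CO2 m-2 s-1, as in the half-hourly FLUXNET2015 files; the units stored in the PROFOUND DB should be checked in the variable tables before applying it. The second function illustrates the sign convention of the Table 5 caption, in which GPP is stored as a negative flux, so that NEE is approximately the sum of RECO and GPP. The function names are hypothetical.

```python
MOLAR_MASS_C = 12.011      # g C per mol
SECONDS_PER_STEP = 1800    # one 30 min record


def annual_sum_gC(flux_umol_m2_s):
    """Integrate half-hourly fluxes (umol CO2 m-2 s-1) to g C m-2."""
    total = 0.0
    for value in flux_umol_m2_s:
        if value is None:
            raise ValueError("gap-fill missing values before integrating")
        # umol -> mol, mol CO2 -> g C, per second -> per 30 min step.
        total += value * 1e-6 * MOLAR_MASS_C * SECONDS_PER_STEP
    return total


def nee_from_components(gpp, reco):
    """NEE when GPP is stored as a negative (downward) flux: NEE = RECO + GPP."""
    return reco + gpp


if __name__ == "__main__":
    # A constant flux of 1 umol m-2 s-1 over a 365-day year (17520 records).
    print(round(annual_sum_gC([1.0] * 17520), 2))
    # Hypothetical annual sums in g C m-2 yr-1; a negative NEE indicates a sink.
    print(nee_from_components(gpp=-1500.0, reco=1200.0))
```

With partitioned FLUXNET2015 products the identity holds only approximately, in particular for the daytime method, where GPP and RECO are estimated from a fitted model rather than derived as a residual.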

Remote sensing data
The PROFOUND DB includes remote sensing information at different spatial scales and temporal frequencies, specific for each product. We included five MODIS products (ORNL DAAC, 2008a-e) and several vegetation indices calculated from the surface reflectance data for each of the forest sites. The original MODIS scenes are available at the NASA Land Processes Distributed Active Archive Center (LP DAAC) (https://lpdaac.usgs.gov/, last access: 5 June 2020). The specific time series included in the PROFOUND DB were downloaded from the Land Product Subset Web Service of the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) (https://daac.ornl.gov/MODIS/, last access: 5 June 2020). The ORNL DAAC MODIS subsetting web service is implemented to allow users access to massive amounts of remote sensing data (Santhana-Vannan et al., 2011). In addition, a second set of vegetation indices was calculated from the reflectance values. A summary of this information is shown in Table 6. The full list of variables and how they were aggregated is provided in Table S10. The main difference among the forest sites is the data quality, which is highly dependent on the presence of clouds. When possible, low-quality observations have been substituted by interpolated values; otherwise the cell was left blank. In any case the alteration of the original data was minimal. It is also important to note that the size of the pixel is large compared to the plot size of the forest stands, which means the pixel data also contain other vegetation types than the ones present at the sites.

Figure caption. Time series of tree diameter at breast height (DBH) versus age of the forest stands in the PROFOUND DB. The basal area-weighted mean DBH is shown for all stands with the exception of Le Bray, for which the arithmetic mean DBH is shown (marked by asterisks). For Sorø, the DBHs have been reconstructed (see text in Sects. 4.9 and S1).
Figure caption. The basal area-weighted mean height is shown for all stands with the exception of Le Bray, for which the arithmetic mean height is shown (marked by *). For Sorø, the heights have been reconstructed (see text in Sects. 4.9 and S1).

Table 5. Summary of the observed carbon fluxes at the sites in the PROFOUND DB. Shown is the range (min and max) and the average (in brackets) of the annual sums in the observational period. All data are estimates based on the CUTRef method with daytime data included for RECO and GPP. GPP is expressed with negative values because it is considered a downward flux from the atmosphere. Likewise, negative NEE values indicate a carbon sink and positive values a carbon source.

Three general types of data are included: (1) geophysical variables as measured from the MODIS sensor, i.e. reflectance and temperature; (2) spectral indices derived directly from reflectance values at different wavelengths; and (3) vegetation properties (i.e. FPAR, LAI, GPP and net photosynthesis) as estimated from physical variables through a range of models.

Although the MODIS sensor acquires daily information, the PROFOUND DB includes only composite data; that is, for each pixel the best value during a period of time (8 or 16 d) is selected as being representative of that specific period. Spatial resolution is also specific for each product and is dependent on the physical and technical limitations in the acquisition process of the variables involved in the product computation.
The NDVI and EVI at 250 m spatial resolution coming from the MOD13Q1 product were calculated from the visible and near-infrared spectral regions. A temporal frequency (16 d composite) was chosen to minimize the effect of clouds. The EVI index was developed to correct for atmospheric and background effects so that it shows a larger dynamic range in areas with high vegetation density (Didan et al., 2015).
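For readers who want to recompute or check these indices from surface reflectance, the sketch below implements the standard NDVI formula and the EVI formulation used for the MODIS vegetation index products, with the usual coefficients (gain 2.5, aerosol coefficients 6 and 7.5, canopy background adjustment 1). The function names and the reflectance values in the example are hypothetical and are not taken from the database.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    denominator = nir + red
    if denominator == 0:
        return None  # undefined for zero total reflectance
    return (nir - red) / denominator


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced vegetation index with the standard MODIS coefficients."""
    denominator = nir + c1 * red - c2 * blue + canopy_l
    if denominator == 0:
        return None
    return gain * (nir - red) / denominator


if __name__ == "__main__":
    # Hypothetical reflectances (0-1 scale) for a dense green canopy.
    nir, red, blue = 0.40, 0.05, 0.03
    print(round(ndvi(nir, red), 3), round(evi(nir, red, blue), 3))
```

The blue band enters only the EVI, where it is used to reduce aerosol effects; the additive canopy term in the denominator is what keeps the index from saturating as quickly as NDVI over dense vegetation.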
The spectral profiles in the whole optical domain (i.e. 459-2155 nm) for each 8 d composite are represented by the surface spectral reflectance at seven wavelengths coming from the MOD09A1 product at 500 m spatial resolution. The criteria for the compositing process are low cloudiness, cloud shadows and low solar zenith angle; when several of these criteria are fulfilled the selection is based on the minimum value in the blue band (Vermote et al., 2015).
The second set of spectral indexes was computed from the MOD09A1 product. The indices based on the spectral shape have the advantage of combining information on three bands instead of two, and when the bands used are located in the SWIR region relevant information related to water is captured (Palacios-Orueta et al., 2005;Khanna et al., 2007;Palacios-Orueta et al., 2012).

Description of the forest sites
The most northern site is Hyytiälä in Finland with a boreal climate, while the most southern sites are Le Bray in France and Collelongo in Italy with an oceanic and Mediterranean montane climate, respectively. All other sites represent temperate climatic conditions ranging, however, from oceanic (Belgium, Denmark) to temperate (France, Germany) to subcontinental (Czech Republic). Unfortunately, sites representing more continental and (east) Mediterranean forests from southern and southeastern Europe are missing.

Table fragment (stand history; site name not recoverable): … Lary, 2015) and should be used for future regeneration. Historically, the site was seeded with 3000-5000 seedlings per hectare and then cleared once or twice to reach a density of 1250 ha−1 at 7 years old when seedlings reach the size for DBH recruitment.

Table fragment (stand history), Solling (beech): 6000 (5000-7000); 2; 0.6 (0.5-0.7); 5. The actual stand was established in 1847 from natural regeneration. Until the beginning of measurements in 1966, the stand was regularly thinned.

Bily Kriz (CZ)
The Bily Kriz site belongs to the ICP Forests Level II network and is a FLUXNET site located in the Moravian-Silesian Beskydy Mountains, Czech Republic, at an altitude of 875 m a.s.l. The climate is temperate with an annual mean temperature of 7.4 °C and an annual precipitation sum of 1434 mm over the 2000-2008 period. The soil is classified as a Haplic Podzol. The site is typical for mountain regions of temperate Europe such as the Black Forest, Bohemian Forest Šumava and forested Carpathians (Hercynian (spruce-)fir-beech forests) but also the higher mountain belts in the (sub-)Mediterranean. Stand-forming tree species for such sites are Fagus sylvatica, Abies alba and Picea abies. Currently, a large part of mixed mountain forests are strongly managed for timber production. The main tree species occurring in Bily Kriz are Picea abies, rarely with a small proportion of Fagus sylvatica. The stand data represent an (even-aged) Picea abies monoculture with a mean DBH of 19 cm (year 2015). The potential vegetation belongs to the geobiocoenetype groups: Abieti-fageta (5AB3) - Abies alba Mill. + Fagus sylvatica L. with understory: Calamagrostis arundinacea (L.) Roth, Oxalis acetosella L., Vaccinium myrtillus L., Deschampsia flexuosa (L.) Trin. More information about the site can be found in Kratochvilova et al. (1989) and Meteorological yearbook (2012).

Collelongo (IT)
The experimental site of Collelongo is located in Selva Piana, a pure Fagus sylvatica forest in Collelongo (AQ, central Italy) at 1560 m a.s.l. Located 100 km from Rome, it is one of the first Italian sites of the ICP network and also part of the ILTER international network. The climate is Mediterranean montane, with a mean annual temperature of 7.2 °C and a mean annual precipitation of 1179 mm in the period 1996-2014. Bedrock consists of Cretaceous limestone. Soil depth exhibits high spatial variability ranging from 40 to 100 cm, and the soil is classified as a Humic Alisol (Chiti et al., 2010) or Dystric Luvisol according to the FAO classification. The stand is a typical Apennine beech forest dominated by Fagus sylvatica with sporadic trees of Taxus baccata. The phytosociological association is Polysticho-Fagetum (Feoli and Lagonegro, 1982). Currently, Collelongo constitutes a managed

Hyytiälä (FI)
The most northern site included in the PROFOUND DB is the ICP Forests Level II site Hyytiälä, Finland, which lies at 185 m a.s.l. It is also a FLUXNET site and the coldest site, with a mean annual temperature of 4.4 °C and 604 mm annual precipitation during the 1996-2014 period. The soil is classified as a Haplic Podzol. Picea abies is the naturally dominant tree species building Fennoscandian moss-rich spruce forests with Pinus sylvestris. A Pinus sylvestris stand was sown in 1962, today with admixtures of Picea abies and hardwood species (Betula pendula, Betula pubescens and Populus tremula). Mean DBHs were 17 cm for P. sylvestris, 5 cm for P. abies and 7 cm for hardwood species in the year 2008. More information about the site can be found in Haataja and Vesala (1997)

KROOF (DE)
The KROOF forest belongs to the "Kranzberg Forest Roof Experiment" of the Technical University of Munich (TUM) and the Helmholtz Zentrum München. The site is located close to Freising, Germany, in the Kranzberger Forst at 502 m a.s.l. (wc-alt.). Mean annual temperature is around 8.2 °C, and annual rainfall is around 849 mm during the period 1998-2010. The soil type, Luvisol, is typical for the region. The potential natural vegetation is (sessile oak-) beech forest (Fagus sylvatica, Quercus petraea, Quercus robur).
The establishment of the research plot dates back to 1992. The mixed stand comprises large groups of Fagus sylvatica surrounded by Picea abies with mean DBHs of 26 and 33 cm in 2010, respectively. Other occurring species are Acer platanoides (20 cm), Pinus sylvestris (31 cm), Larix decidua (26 cm) and Quercus robur (29 cm). More information about the site can be found in Pretzsch et al. (1998, 2014) and Matyssek et al. (2014).

Le Bray (FR)
The ICP Forests site Le Bray is located 20 km southwest of Bordeaux, France, at an altitude of 61 m a.s.l. Mean annual temperature is about 13.4 °C, and precipitation is 920 mm during the 1996-2008 period, constituting a moderate oceanic climate. The soil type is Arenosol (sandy and hydromorphic podzol), which is one of the most common soils in the region. … Loustau (1998), Bosc et al. (2003) and Berbigier et al. (2001).

Peitz (DE)
Peitz is a long-term research plot in eastern Brandenburg, Germany. The site lies at about 50 m a.s.l. The annual rainfall amounts to more than 608 mm, and annual mean temperature is around 9. More information about this site can be found in Riek and Stähr (2004) and Noack (2011, 2012) and about the climate data in Gerstengarbe et al. (2015).

Solling spruce (DE)
Solling 305 is also a long-term intensive forest monitoring plot of the ICP Forests Level II network in central Germany. Like the Solling beech site, it belongs to the LTER (site LTER_EU_DE_009) and is a permanent soil monitoring plot of the state of Lower Saxony. It is situated close to the Solling beech site at an elevation of about 508 m a.s.l. and has site conditions similar to those of the Solling beech stand.
Potential natural vegetation is a Luzulo luzuloido Fagetum. Dominant species of the actual ground vegetation are Vaccinium myrtillus, Polytrichum formosum and Deschampsia flexuosa (Bolte et al., 2004). The forest is a 133-year-old Norway spruce (Picea abies) stand with a mean DBH of 46.6 cm and a mean height of 33.1 m in 2016. More information about the site can be found in Le Mellec et al. (2010), Bonten et al. (2011), Fleck et al. (2016) and Wegehenkel et al. (2017).

Sorø (DK)
The ICOS site Sorø (DK-Sor in the FLUXNET and ICOS databases) is located in Denmark at an elevation of 40 m a.s.l. The climate is warm temperate and fully humid with a mean annual temperature of 9 °C and annual precipitation sum of 774 mm during the period 1996-2010. The soil has been classified as Alfisols and Mollisols. Potential natural vegetation is deciduous broadleaved forest dominated by Fagus sylvatica. Other species occurring in the area are Fraxinus excelsior, Larix decidua, Picea abies, Quercus spp. and Acer spp. However, the region is mostly used as cropland. Data on tree DBH are reconstructed from tree-ring measurements (Babst et al., 2014) and historical management information for the time period from 1994 to 2017. Stand data are derived from these tree data for the same period (see Sect. S1). The mean DBH of this Fagus sylvatica stand was 41 cm in the year 2017. More information about the site can be found in Ladekarl (2001), Pilegaard et al. (2003, 2011) and Wu et al. (2013).

Forest management of the sites
The sites available in the PROFOUND DB are managed forests, and the historic management can be derived from the tree and stand-level data (in terms of reduction of stem numbers). However, for future scenario studies, generic, simple management and planting guidelines are available (Tables 7-8). This future management corresponds best to "intensive even-aged forestry" as defined by Duncker et al. (2012).
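As an illustration of how historic management might be read from the stand-level data, the sketch below flags years in which stem number dropped between consecutive records. The function name, the input layout (year mapped to stems per hectare) and the numbers are hypothetical and not taken from the database. A drop in stem number can also reflect natural mortality, so the result approximates management rather than documenting it.

```python
def stem_reductions(stems_per_ha):
    """Return [(year, removed stems/ha, removed fraction)] for every
    drop in stem number between consecutive records."""
    years = sorted(stems_per_ha)
    events = []
    for prev, curr in zip(years, years[1:]):
        removed = stems_per_ha[prev] - stems_per_ha[curr]
        if removed > 0:
            fraction = round(removed / stems_per_ha[prev], 3)
            events.append((curr, removed, fraction))
    return events


# Hypothetical stand record (assumption for illustration only).
record = {2000: 1000, 2001: 1000, 2002: 800, 2003: 800, 2004: 600}
events = stem_reductions(record)
print(events)
```

Each tuple gives the year of the record in which the reduction first appears, so the intervention itself took place at some point since the previous record.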

The PROFOUND R package (ProfoundData)
The ProfoundData R package provides functions to access the PROFOUND DB (Figs. S2 and S3). The ProfoundData package plus a detailed vignette explaining the functionalities are available on CRAN (https://CRAN.R-project.org/package=ProfoundData, last access: 5 June 2020). The ProfoundData package serves as an interface for users who want to access the PROFOUND DB as a relational database via the R statistical software (R Core Team, 2016). The following main functions are included to achieve this goal:
- "getData" to download data (data can be downloaded for one forest site and one underlying dataset at a time);
- "browseData" to check the available forest sites, datasets, variables for a dataset, datasets for a forest site and the database version, metadata, data policy and original data source;
- "plotData" to quickly inspect any variable of the datasets visually;
- "summarizeData" to summarize data from the database;
- "queryDB" to pass self-defined queries;
- "writeSim2netCDF" to write netCDF files, which can be used to convert data (and other files such as model simulation output) into netCDF files.
While the ProfoundData R package is meant to provide easy access to the PROFOUND DB, the database is also fully functional without the R package.
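Because the database is distributed as a SQLite file, it can in principle be queried from any language with a SQLite driver. The following minimal sketch uses Python's standard `sqlite3` module. The table name `tree`, its columns, the site label and all values are hypothetical stand-ins built in memory; the actual schema should be checked against the database documentation before adapting the query.

```python
import sqlite3


def mean_dbh_by_species(conn, site, year):
    """Return {species: mean DBH in cm} for one site and one year."""
    rows = conn.execute(
        "SELECT species, AVG(dbh_cm) FROM tree "
        "WHERE site = ? AND year = ? "
        "GROUP BY species ORDER BY species",
        (site, year),
    ).fetchall()
    return {species: round(mean, 1) for species, mean in rows}


# In-memory stand-in for the database file (hypothetical schema and values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (site TEXT, year INTEGER, species TEXT, dbh_cm REAL)"
)
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?, ?)",
    [
        ("demo_site", 2010, "Fagus sylvatica", 25.0),
        ("demo_site", 2010, "Fagus sylvatica", 27.0),
        ("demo_site", 2010, "Picea abies", 33.0),
        ("demo_site", 2009, "Picea abies", 32.0),
    ],
)
result = mean_dbh_by_species(conn, "demo_site", 2010)
print(result)
```

With the downloaded database, `sqlite3.connect` would take the path to the SQLite file instead of `":memory:"`, and the query would use the real table and column names.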

Conclusions
A wide range of data are needed to properly evaluate complex process-based vegetation models. The PROFOUND database compiles data from soil, climate, stand and flux measurements with data from remote sensing, atmospheric nitrogen modelling and climate modelling. Moreover, by providing data at 0.5° × 0.5° grid level plus locally bias-corrected climate data, the datasets can be used to compare local forest models to global vegetation models. The PROFOUND database thus facilitates model evaluation, calibration, uncertainty analysis and model intercomparisons, highlighting the immense value of long-term environmental monitoring data for robust inferences about causal processes and future dynamics of forests.
Supplement. The supplement related to this article is available online at: https://doi.org/10.5194/essd-12-1295-2020-supplement.

Appendix A: List of FLUXNET sites
Author contributions. CPOR and RSG contributed equally to the paper. CPOR and FH initiated the research. CPOR, RSG, KD and FH designed the PROFOUND database. CPOR, RSG, YH and KD harmonized and prepared data for the PROFOUND database. RSG programmed the PROFOUND database and R package together with FH, FB and JS. LK and JK provided data for Bily Kriz. AC, GM, CT and EA provided data for Collelongo. PK, AM, TV, IM and JP provided data for Hyytiälä. TR and HP provided data for KROOF. DL, LMB, PB, DP and SL provided data for Le Bray. MN and PLB provided data for Peitz. HM, SF and MW provided data for the Solling sites. AI, KP and FB provided data for Sorø. DC and MV prepared the EMEP nitrogen data. HT and MB prepared the ISIMIP nitrogen data. AP, VC and RSG prepared the MODIS data. MB, JV, SL and HK prepared the climate data. SL bias-corrected the climate data. MM and MG checked the data and R package. All other authors provided expertise on individual datasets and how to prepare them. CPOR wrote the manuscript with the support of all authors.
Competing interests. The authors declare that they have no conflict of interest.

Acknowledgements.
We are grateful for the support of all contributing data entities. The climate scenarios have been provided by ISIMIP (BMBF, grant no. 01L1201A1). The initial plot selection was supported with data from the International Co-operative Programme on Assessment and Monitoring of Air Pollution Effects on Forests (ICP Forests) operating under the UNECE Convention on Long-range Transboundary Air Pollution (CLRTAP). The data collection in Bily Kriz was supported by the Ministry of Education, Youth and Sports of CR within the CzeCOS programme, grant number LM2015061. Data collection at the Collelongo site was supported by the projects EUROFLUX, CANIF, CARBOEUROFLUX, FORCAST, CarboEurope and PRIN-MIUR. Activity and data analysis at the site are currently funded by resources available from the Ministry of University and Research (FOE-2019), under projects CNR DTA.AD003.474 and CNR DBA.AD003.139. The Hyytiälä data collection was supported by the projects EUROFLUX, CARBOEUROFLUX, CarboEurope and CarboExtreme and by the Academy of Finland Centre of Excellence programme, projects 118615, 141135 and 272041. The KROOF data were provided by TU Munich funded through the DFG Sonderforschungsbereich SFB 607 and the DFG KROOF project "Interactions between Norway spruce and European beech under drought" (PR 292/12-1, MA 1763/7-1, MU 831/23-1) as well as by the Bavarian State Ministry for Nutrition, Agriculture and Forestry and the Bavarian State Ministry for Environment and Health and BaySF (Bavarian State Forest Enterprise). The data for Le Bray were kindly provided by INRA funded through the projects EUROFLUX, CARBOEUROFLUX, CarboEurope, CARBO AGE and CarboExtreme. The Peitz data were kindly provided by Eberswalde Forestry Competence Centre. We are grateful to the Northwest German Forest Research Institute, Göttingen, for providing the Solling data.
Solling data from January 2009 to June 2011 were co-funded by LIFE+ and the Regulation (EC) no. 614/2007 of the European Parliament and of the Council, project FutMon (Further Development and Implementation of an EU-level Forest Monitoring System). The Sorø data collection has been funded through the EU projects EUROFLUX, CarboEurope, CarboEurope-IP, NitroEurope, CarboExtreme and Risø-National Laboratory (DK) and the Technical University of Denmark (DTU). This work used eddy covariance data acquired and shared by the FLUXNET community, including these networks: CarboEurope-IP, CARBOITALY and ICOS. The FLUXNET eddy covariance data processing and harmonization were carried out by the ICOS Ecosystem Thematic Center, AmeriFlux Management Project and Fluxdata project of FLUXNET, with the support of CDIAC, and the OzFlux, ChinaFLUX and AsiaFLUX offices. Graham Weedon was supported by the Joint DECC and Defra Integrated Climate Programme, DECC/Defra (GA01101). CPOR and RSG acknowledge support from the German Federal Office for Agriculture and Food (BLE, grant no. 2816ERA06S). Friedrich Bohn acknowledges funding from the project "Inside out" (no. POIR.04.04.00-00-5F85/18-00) funded by the HOMING programme of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund. Hyungjun Kim acknowledges the Grant-in-Aid for Specially Promoted Research 16H06291 and Scientific Research (18KK0117) from the Japan Society for the Promotion of Science. We are also grateful to Kirsten Elger, Robert Gieseke, Katja Henning-Hofmann and Michael Flechsig for their support to make the database open access. We are grateful to the many unmentioned technicians and students for their substantial help to maintain the continuous long-term field observations.
Financial support. The PROFOUND Database has been developed based on work from COST Action FP1304 PROFOUND (Towards Robust Projections of European Forests under Climate Change), supported by COST (European Cooperation in Science and Technology, https://www.cost.eu/, last access: 5 June 2020), the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP, grant no. 01L1201A1) and the I-Maestro project ("Innovative forest MAnagEment STRategies for a resilient bioecOnomy under climate change and disturbances", grant nos. 773324 and 22035418) funded by the ERA-NET Cofund ForestValue, and benefited from discussions in the IUFRO Task Force on Climate Change and Forest Health.
Review statement. This paper was edited by David Carlson and reviewed by two anonymous referees.