BAYWRF: a high-resolution present-day climatological atmospheric dataset for Bavaria

Climate impact assessments require information about climate change at regional and ideally also local scales. In dendroecological studies, this information has traditionally been obtained using statistical methods, which preclude the linkage of local climate changes to large-scale drivers in a process-based way. As part of recent efforts to investigate the impact of climate change on forest ecosystems in Bavaria, Germany, we developed a high-resolution atmospheric modelling dataset, BAYWRF, for this region over the thirty-year period of September 1987 to August 2018. The atmospheric model employed in this study, the Weather Research and Forecasting (WRF) model, was configured with two nested domains of 7.5 and 1.5 km grid spacing centred over Bavaria and forced at the outer lateral boundaries by ERA5 reanalysis data. Using an extensive network of observational data, we evaluate (i) the impact of using grid analysis nudging for a single-year simulation of the period of September 2017 to August 2018 and (ii) the full BAYWRF dataset generated using nudging. The evaluation shows that the model represents variability in near-surface meteorological conditions generally well, although there are both seasonal and spatial biases in the dataset that interested users should take into account. BAYWRF provides a unique and valuable tool for investigating climate change in Bavaria with high interdisciplinary relevance. Data from the finest-resolution WRF domain are available for download at daily temporal resolution from a public repository at the Open Science Framework (Collier, 2020; https://doi.org/10.17605/OSF.IO/AQ58B).


Introduction
The forcing of climate change in modern times is clearly of global nature, and many important scientific problems can be understood at the global scale as well (e.g. Held and Soden, 2006). Climate impact assessments, however, must also understand the effects at regional and even local scales in order to develop appropriate adaptation and mitigation measures. Although local phenomena such as glaciers, lakes, vegetation patterns or stream flow show a strong dependence on large-scale climate dynamics, these proxies experience further variability when the large-scale signal is transferred to their location (e.g. Mölg et al., 2014). In order to contextualize local changes, there is a need to link local climate to the large-scale climate, ideally in a process-based way.
In dendroclimatological studies, the traditional approach is to compute a calibration function between local or regional tree-ring parameters and climatic variables. Typically, such a statistical relationship would try to utilize local station data (which are generally sparse), gridded observations (which tend to be of coarse resolution) or indices of large-scale climate dynamics (which describe coupled atmosphere-ocean modes) as the climatic influence (e.g. Hochreuther et al., 2016). Besides known problems like stationarity (e.g. Frías et al., 2006), statistical approaches also limit the possibilities to explain the influences at the various scales on a processresolving level. Dynamical downscaling with a full numerical atmospheric model provides a physical answer (Giorgi and Mearns, 1991), yet the disadvantage is the high computational cost. Hence, dynamical downscaling at near-kilometre resolution has traditionally been performed on a case-study basis for weather events (e.g. Gohm et al., 2008). Multidecadal simulations, on the other hand, were typically lim-Published by Copernicus Publications. 3098 E. Collier and T. Mölg: BAYWRF: a high-resolution present-day climatological atmospheric dataset ited to resolutions of tens of kilometres (e.g. Di Luca et al., 2016). With the progress of computational resources, dynamical downscaling is becoming a candidate for climate impact studies that require local-scale information, and the first decadal simulations at ∼ 1 km resolution are now available (e.g. Ban et al., 2014;Collier et al., 2018). From the resultant model output, impact studies can utilize information about local meteorological conditions at high spatial and high temporal resolution, and over long, climatologically relevant temporal periods. Moreover, the physically consistent output enables the generation of said process understanding of influences across the various climatic scales.
The management of forests is a classical impact study where adaptation and mitigation measures meet the heterogeneous effects of climate change at local scales (e.g. Lindner et al., 2014). With this background, the project BayTreeNet was started recently under the umbrella of the interdisciplinary climatological research network Bayklif (https:// www.bayklif.de; last access: 1 March 2020) and aims to investigate the response of forest ecosystems to current and future climate dynamics across different growth areas in Bavaria, Germany. The project comprises a network of 10 measurement sites where meteorological and dendroecological data will be monitored and used both for research and for public and educational outreach, which are currently in the process of being established. High temporal (approximately daily) and high spatial resolution data are key components of dendroecological impact studies, since the physiological behaviour of trees, their structural properties and functional wood anatomy, as well as other important parameters such as wood density and mortality risk, are not only influenced by seasonal averages but also by short-term extreme events and weather anomalies (e.g. Bräuning et al., 2016).
Previous regional climate simulations including Bavaria over continuous multi-decadal periods were performed with model resolutions as high as 5-7 km and up to the year 2009 (e.g. Berg et al., 2013;Warscher et al., 2019). However, to the best of our knowledge, such datasets at the kilometre scale and up to the near present do not yet exist, despite previous research highlighting the importance of convectionpermitting resolution in this region (Fosser et al., 2014). We address this data gap by performing simulations with an atmospheric model, configured with a convection-permitting spatial resolution in a nested domain over Bavaria, for the recent climatological period of 1987 to 2018. These data have the potential to find multidisciplinary interest among researchers assessing ecological and human dependencies on the climate for scientific and practical questions.

Atmospheric model
The atmospheric simulations were performed using the advanced research version of the Weather Research and Forecasting (WRF) model version 4.1 (Skamarock and Klemp, 2008) configured with two one-way nested domains of 7.5 and 1.5 km grid spacing situated over Bavaria ( Fig. 1), hereafter referred to as D1 and D2. Terrain data were taken from NASA Shuttle Radar Topographic Mission data re-sampled to 1 km and 500 m grids (Jarvis et al., 2008; https://cgiarcsi.community/ data/srtm-90m-digital-elevation-database-v4-1; last access: 24 May 2020) for D1 and D2, respectively, while land use data were updated based on the European Space Agency Climate Change Initiative Land Cover data at 300 m spatial resolution (http://maps.elie.ucl.ac.be/CCI/viewer/download. php; last access: 18 April 2018). The physics and dynamics options used in the simulations are based on several recent convection-permitting applications of WRF by the authors (e.g. Collier et al., 2019) but were not specifically optimized for these domains due to the computational expense of the simulations. The options are summarized in Table 1 and a sample namelist is provided in Appendix A. As no cumulus parameterization was employed in D2, both deep and shal-low convection are assumed to be explicitly resolved. We note that no additional urban physics were enabled beyond the default parameterization used by the Noah family of land surface models (Liu et al., 2006) and that land use sub-tiling was not enabled.
Forcing data at the lateral boundaries of D1 and bottom boundaries of both domains was taken from the ERA5 reanalysis (Copernicus Climate Change Service (C3S), 2017) at 3-hourly temporal resolution. The 30-year simulation was divided into 30 annual simulations that were run continuously from 15 August of year n − 1 to 31 August of year n. The first 16 d of each simulation were discarded as spin-up time, retaining data from 1 September of year n−1 onwards. Atmospheric carbon dioxide (CO 2 ) was updated in WRF for each simulation year using annually and globally averaged concentrations at the surface from the National Oceanic and Atmospheric Administration Earth System Research Laboratory (Tans and Keeling, 2019). Each simulation employed the CO 2 concentration of year n, which ranged from 351 to 407 ppm between 1988 and 2018. All other parameters and bottom boundary conditions (e.g. vegetation and land use) were held constant for all simulations. Therefore, they do not capture the impact of known land-use changes over the study period (e.g. Fuchs et al., 2013).
Each run required 12 d of wall time with 320 processors on the Meggie compute cluster at the Erlangen Regional Computing Center, for a total of 2.86 million core hours. The model was compiled using Intel 17.0 compilers and run using distributed-memory parallelization. Model output was written at 2-hourly intervals, amounting to more than 55 TB of data, in addition to ∼ 30 TB of pre-processing and input files. We selected this write frequency as a compromise between high temporal resolution and the logistical challenges of storing, analysing and disseminating the data.

Evaluation of forcing strategy
For the period of 00:00 UTC, 1 September 2017, to 00:00 UTC, 1 September 2018, we compared two simulations with different forcing approaches: one excluding and one including grid-analysis nudging to constrain drift in the large-scale circulation (e.g. Bowden et al., 2013). This period was selected due to the higher availability of observational data closer to present day and because the summer of 2018 contained a record heatwave with drought conditions (Beyer, 2018), permitting evaluation of an extreme event. We refer to these simulations as WRF_NO_NUDGE and WRF_NUDGE, respectively. For the WRF_NUDGE simulation, analysis nudging was applied in D1 outside of the planetary boundary layer and above the lowest 10 model levels using the default strength (3.0 × 10 −4 ) for temperature and winds and reduced strength (5.0 × 10 −5 ) for the water vapour mixing ratio (e.g. Otte et al., 2012), consistent with a previous decadal application of WRF (Collier et al., 2018). Given the computational expense of each annual simulation, we did not attempt to optimize the nudging coefficients for our study area and instead evaluate simply whether nudging in this form improves the simulated atmospheric variables or not.

Observational data
For model evaluation, we used data from the German Weather Service (DWD) Climate Data Center for all stations in Bavaria with hourly temporal resolution available, which provide good spatial coverage of our study area (Table 2; Fig. 2). To evaluate the forcing approach, we compared the following near-surface atmospheric variables at the highest temporal resolution available in the simulations, which is 2-hourly: air temperature and relative humidity at 2 m (T and RH), zonal and meridional wind components at 10 m (U and V ), and surface pressure (PS). In addition, we compared with daily total precipitation (PR) without correcting for undercatch. In our comparison with observations, we excluded measurement sites where the observed terrain height differed from the modelled value by more than 100 m (similar to, e.g. Vionnet et al., 2019), corresponding to four sites in total for all variables except for PS (three) and PR (nine). After this exclusion, the average difference between modelled and observed terrain height at all stations was within ±8 m for each dataset. We also excluded any days with missing observational data when computing daily statistics. We did not evaluate radiation variables, as only sunshine hours are available from the DWD in sufficiently large sample sizes. However, for understanding temperature biases in WRF during summer 2018, we used incoming shortwave radiation from the DWD Climate Data Center dataset entitled "Hourly station observations of solar incoming (total/diffuse) and longwave downward radiation for Germany" (Table 2). In total, there were four sites with both incoming shortwave (SW) and T data available in Bavaria between 1 June and 31 August 2018 whose elevation was represented within ±100 m in D2: Nuremberg (id 3668), Weihenstephan-Dürnast (5404), Würzburg (5705), and Fürstenzell (5856).
For statistical analysis, we computed the mean deviation (MD), mean absolute deviation (MAD) and the coefficient of determination (R 2 ) between station data and data from the closest grid point in D2 without spatial interpolation at twohourly and, for precipitation, at daily temporal frequency. The MD, also referred to here as the model bias, and the MAD were computed from observation minus model data. For precipitation, only daily totals were evaluated, and the MD and MAD were computed considering only days with non-zero observed precipitation.
Finally, we also compared night-time land surface temperature (LST) from the MODIS MYD11A1 dataset (Table 2) at 1 km spatial and daily temporal resolution with simulated skin temperature in D2 for the period of 1 June to 31 August 2018. The night view time ranged from 1.2 to 2.8 h in local solar time, with a domain and time averaged value 3100 E. Collier and T. Mölg: BAYWRF: a high-resolution present-day climatological atmospheric dataset   Table 2. Datasets labelled in black are shown by filled black circles, while datasets labelled in pink are shown by open pink circles, illustrating that locations for measurements of air temperature and humidity (a; TT_TU_MN009 and RF_TU_MN009) and of wind speed and direction (b; F_MN003 and D_MN003) were the same. The locations for measurements of surface pressure (P0_MN008) and of precipitation (R1_MN008) are shown in panels (c) and (d), respectively.   In our comparison, we excluded nights when MODIS had more than 50 % missing data over D2, leaving a sample size of 52. For evaluating the full simulation, we performed a similar analysis with the aforementioned station datasets for T , RH and PR (Table 2); however, we averaged or summed the data to daily timescales for comparison with BAYWRF. In addition to comparing with individual stations, we also compared monthly total precipitation in BAYWRF with the gridded dataset REGNIE from the DWD CDC, which is based on interpolated station data and available at 1 km spatial resolution (e.g. Rauthe et al., 2013). For the comparison, REGNIE data were regridded to the WRF grid using patch interpolation and the ESMF regridding toolbox in NCL (https://www.ncl.ucar.edu/Document/Functions/ ESMF/ESMF_regrid.shtml; last access: 10 September 2020) and the centred pattern correlation between the two datasets was computed.

Numerical issues in BAYWRF
We note that unphysically large sub-surface temperatures were simulated at a number of glacierized grid points, primarily during the months of July to September. Considering all of D2, the daily average number of affected grid The mean bias over all stations is shown for each simulation using the same colour assignment, while the lower and upper quartiles of the station biases are shown as a blue polygon and green bars for WRF_NO_NUDGE and WRF_NUDGE data, respectively. cells was 24, compared with 294 glacierized and 122 500 total cells. The maximum number of affected grid points was 274 on 31 August 2017, corresponding to 0.2 % of D2. In addition, over the climatological simulation, only one grid point in Bavaria was affected (J = 71, I = 285; 47.4952 • N, 13.6039 • E). Surface temperature remained physical, since it is limited at the melting point over glacier surfaces, and soil moisture was unaffected, since it is specified to be fully saturated in glacierized grid cells. No other land-use categories were affected, and adjacent grid points were also unaffected, as the land surface model operates as a column model with no lateral communication. To preclude usage of these data, sub-surface temperature was set to missing where it exceeded the melting point at glacierized grid points in BAY-WRF. More information about this numerical issue is available on the model's GitHub repository (https://github.com/ wrf-model/WRF/issues/1185; last access: 24 May 2020).

Evaluation of forcing approach
Averaged over the evaluation year, both WRF simulations capture the magnitude and variability of sub-diurnal nearsurface meteorological conditions at most sites well ( Fig. 3; Table 3). The interquartile range (IQR; range between upper and lower quartile) of MDs is 1 order of magnitude smaller than the observed standard deviation for all variables. As expected, variability is best captured for T and PS, with R 2 values that uniformly exceed 0.87 and 0.96, respectively. Those of RH have a larger range but a lower quartile above ∼ 0.55. Compared with these variables, the model shows less skill in simulating sub-diurnal variability in winds, with lower quartiles of R 2 for U and V of approximately 0.39 and 0.27, respectively.
Shifting to daily timescales, both simulations represent variability in daily total PR surprisingly well, with the upper quartile of MDs below ∼ 1.25 mm and lower quartiles of R 2 exceeding 0.18 and 0.33, depending on the simulation. The MD is positive at the majority of stations, indicating that WRF generally underestimates observed precipitation. The underestimate is likely greater than reported here, since the observations were not corrected for wind-induced undercatch. In addition to underestimating observed daily precipitation events (total sample size of 35 791 for all stations and record lengths), the simulations also produce false daily precipitation events, the vast majority of which are very small in magnitude (the median value in both WRF simulations is less than 0.1 mm d −1 ). Considering wetter days (precipitation exceeding 1 mm d −1 ; Ban et al., 2014), the number of false events is 10 times smaller than the number of observed events (sample sizes of 3096 and 2249 in WRF_NO_NUDGE and WRF_NUDGE, respectively).
Previous studies evaluating WRF over this region have reported root-mean-square deviations (RMSD). For direct comparison, the mean RMSD in WRF_NUDGE for 2-hourly T and RH is 2.67 • C and 13.7 %, respectively, and for daily total precipitation it is 5.27 mm. These values are comparable to previous high-resolution applications of WRF over Bavaria (Warscher et al., 2019).
Examination of model biases on a monthly basis reveals further insights into the model performance (Fig. 4). The amplitude of the annual cycle is overpredicted in WRF, indicating that the good average agreement in T results from compensating biases: there is a cold bias in WRF in winter, a well-known issue with the model over snow-covered surfaces (e.g. Tomasi et al., 2017), and a warm bias in summer (Fig. 4a). The latter bias results in an underprediction of RH during this season (Fig. 4b), suggesting that WRF represents absolute humidity more accurately. The summer temperature bias is also more sustained than the winter one, resulting in the long tails (heads) in the distribution of MDs of T (RH) in Fig. 3. There is also a general underprediction of nearsurface winds from fall to early winter, as exemplified by the results for U in Fig. 4c and the slight positive skewness of the distribution of MDs for both U and V in Fig. 3, consistent with overly stable atmospheric conditions resulting from the cold bias. Finally, the model tends to overestimate precipitation in early spring and underestimate it in summer and fall. The reported seasonal and mean biases in daily precipitation are consistent with a potential underestimate of deep convection and convective precipitation at 1.5 km grid spacing. Although simulated mean precipitation shows a weak grid dependency below a spacing of ∼ 4 km (Langhans et al., 2012), sub-kilometre spatial resolution is required to explic-Figure 6. Scatter plots of (a) air temperature bias vs. incoming shortwave radiation bias and (b) air temperature bias vs. land use category in closest grid cell to the station. The category abbreviations from left to right describe "Urban and Built-Up Land" (10 sites), "Dryland Cropland and Pasture" (4 sites), "Grassland" (72 sites), "Deciduous Broadleaf Forest" (1 site), 'Evergreen Needleleaf Forest' (11 sites), and "Mixed Forest" (3 sites). For both panels, data from WRF_NO_NUDGE and WRF_NUDGE are displayed as blue square and green circle markers, respectively. itly resolve the evolution and characteristics of clouds (e.g. Bryan et al., 2003;Craig and Dörnbrack, 2008;Prein et al., 2015). Figure 5 shows a representative time series of T and SW for the station in Nuremberg in June 2018. The time series illustrates that the positive temperature bias in summer 2018 results from two distinct contributions. First, there is an overestimation of daytime maximum T , coinciding with an overestimation of SW. This relationship is observed both at Nuremberg and at the other three stations for which both datasets are available ( Fig. 6a; see Sect. 2.2). The overestimation suggests there is an underestimation of either daytime cloudiness or its impact on incoming SW at the surface, likely stemming from the microphysics parameterization. Ban et al. (2014) identified similar processes underlying a warm bias in summer in a convection-permitting decadal simulation over central Europe. Second, there is an overestimation of night-time minimum T , suggesting that land surface processes may play a role. Of the 101 stations with T measurements available, the dominant land-use categories of the grid cells containing stations are "Urban" (10 sites), "Dryland Cropland and Pasture" (4 sites), "Grassland" (72 sites), "Deciduous Broadleaf Forest" (1 site), "Evergreen Needleleaf Forest" (11 sites), and "Mixed Forest" (3 sites). The overestimation of night-time T is greatest at stations located in grid cells classified as urban (Fig. 6b), consistent with a previous evaluation of WRF with the Noah-MP land surface model (LSM) for urban and rural stations in summer (Salamanca et al., 2018). The bias amplification in urban grid cells may reflect an incorrect classification of the underlying land surface in WRF, as only the Munich city station (id 3379) is listed as an urban station on the DWD's list for computing heat island effects. It may also result from an overestimation of heat storage when a mosaic approach is not used, and therefore the entire grid cell is treated as urban (Daniel Fenner, personal communication, 2020). The potential role of the land surface specification or properties is reinforced by the comparison with MODIS data (Fig. 7), which shows the largest warm biases over grid cells classified as urban or croplands, while biases are smaller in forested areas. There is also a cold bias along the foothills and at higher elevations in the Alps. The biases are slightly smaller in WRF_NUDGE than in WRF_NO_NUDGE, consistent with the station-based assessment.
In addition to factors internal to WRF, we note that the driving reanalysis data may also contribute to the warm bias, at least at some locations. From the available observations, 60 stations have both valid T data between June and August 2018 and a modelled elevation in ERA5 that is within ±100 m of reality. Averaged over the summer months and all stations, ERA5 has a mean warm bias of 0.37 • C. At 25 of the sites, a warm bias exceeding 0.5 • C is present, with an average value over these sites of 0.92 • C. The inclusion of grid-analysis nudging leads to a small but nearly uniform improvement in agreement between observed and simulated variables. The distribution of MDs is closer to zero for all variables except U and PS, while those of MADs are closer for all variables ( Fig. 3 and Table 3). R 2 values are also uniformly higher when nudging is used, and the lowest lower-quartile value is 0.3 in WRF_NUDGE compared with only 0.18 in WRF_NO_NUDGE. Nudging produces a particularly noticeable improvement in simulated precipitation, halving the MD and nearly doubling the R 2 values (Figs. 3, 4 and Table 3). Its usage also reduces the magnitude of the seasonal temperature biases and the number of extreme occurrences of the warm bias in summer (Figs. 4 and 6). Considering daily timescales, the agreement of WRF_NUDGE with the observations is similar or even improved (Table 4): the mean MD is largely unaffected, but the average MAD decreases and average R 2 increases. Based on these improvements, grid-analysis nudging was adopted for the climatological simulations.

Evaluation of BAYWRF
Averaged over the full simulation period, BAYWRF shows a similar magnitude of agreement with station T and RH data at daily timescales as was found at sub-diurnal timescales for the single evaluation year ( Fig. 8; cf. Fig. 3). For T , the MD has lower and upper quartiles of −0.3 and 0.4 • C, respectively, while the values of R 2 uniformly exceed 0.92. For RH, the MD has lower and upper quartiles of 0.4 % and 4.4 %, while the respective values for R 2 are 0.57 and 0.65. For PR, the upper and lower quartiles of MDs considering days with observed precipitation are −0.1 and 0.1 mm, while for R 2 the values are 0.41 and 0.47. A similar number and magnitude of wet false events are simulated (20 times less than the sample size of observed events). Spatially, BAY-WRF exhibits a positive bias in T and a negative bias in RH in the interior of Bavaria and converse anomalies in the prealpine and alpine areas in the south and along the eastern border of the region (Fig. 8a, c). The mean R 2 values for RH show a clear meridional gradient (Fig. 8d), which suggests that the model has some difficulty capturing processes governing near-surface moisture fluctuations in the southern part of Bavaria. Nonetheless, the highest correlation coeffi- for which data were available at each station into the four quartiles. The largest marker size, which delineates records with more than 75 % valid data points, is therefore not available for PR, as this dataset begins on 1 September 1995. 3108 E. Collier and T. Mölg: BAYWRF: a high-resolution present-day climatological atmospheric dataset cients for observed precipitation events are found in this region (Fig. 8f). In addition, considering monthly precipitation sums, the centred pattern correlation between REGNIE and BAYWRF ranges from a lower quartile of 0.64 to an upper quartile of 0.82. Therefore, the characteristics of precipitation variability in time and space are captured by the dataset.
For BAYWRF, we note that in addition to the potential factors contributing to temperature biases discussed in Sect. 3.1, evaluation of the climatological simulation is also affected by discontinuities in station location and instrumentation. One example is Nuremberg, which moved on 4 December 1995 from (49.4947 • N, 11.0806 • E) to (49.5030 • N, 11.0549 • E). The older station position is shifted one grid cell to the south and one grid cell to the west compared with its current location, corresponding to a shift in land use from urban (old position) to grasslands (new). Any discontinuities in location and underlying surface type are not captured since the most recent station positions are used for extracting meteorological data from D2. This potential source of discrepancies should be taken into consideration for climatological analyses (e.g. comparing observed and simulated trends).

Data availability
Data from BAYWRF are available for download on the Open Science Framework (OSF; Collier, 2020; https://doi.org/10.17605/OSF.IO/AQ58B). Due to the size of the simulations, we have only provided daily mean data from the finest WRF domain (D2; 1.5 km grid spacing) after cropping close to the extent of Bavaria and removing vertical levels above ∼ 200 hPa, amounting to 450 GB in total. Data are divided into three-and four-dimensional fields by year and month, with respective file sizes of ∼ 150 MB and 1.1 GB. For the four-dimensional data, perturbation and base-state atmospheric pressure and geopotential were combined to generate full model fields, while perturbation potential temperature was converted to atmospheric temperature.