SLOCLIM: A high-resolution daily gridded precipitation and
temperature dataset for Slovenia

Abstract. We present a new publicly available daily gridded dataset of maximum and minimum temperature and precipitation data covering the whole territory of Slovenia from 1950 to 2018. It represents the great variability of climate at the crossroads between the Mediterranean, Alpine and continental climatic regimes with altitudes between 0–2864 m a.s.l. We completely reconstructed (quality control and gap filling) the data for the three variables from 174 observatories (climatological, precipitation and automatic stations) with the original records all over the country. A comprehensive quality control process based on the spatial coherence of the data was applied to the original dataset, and the missing values were estimated for each day and location independently. Using the filled data series, a grid of 1 × 1 km spatial resolution with 20,998 points was created by estimating daily temperatures (minimum and maximum) and precipitation, and their corresponding uncertainties at each grid point. In order to show the potential applications, four daily temperature indices and two on precipitation were calculated to describe the spatial distribution of: 1) the absolute maximum and minimum temperature; 2) the number of frost days; 3) the number of summer days; 4) the intensity of precipitation; and 5) the maximum number of consecutive dry days. The use of all the available information, the complete quality control and the high spatial resolution of the grid allowed for an accurate estimate of precipitation and temperature that represents a precise spatial and temporal distribution of daily temperatures and precipitation in Slovenia. The SLOCLIM dataset is publicly available at https://doi.org/10.5281/zenodo.4108543 and can be cited as Škrk et al. (2020).


experience of previous studies creating daily datasets at the regional level, both for precipitation (Chaney et al., 2014; Hernández et al., 2016; Serrano-Notivoli et al., 2017; Yatagai et al., 2012) and temperature (Lussana et al., 2018; Serrano-Notivoli et al., 2019), a new high-resolution dataset for the Slovenian territory is suitable considering the high-density network of stations that is available. Serrano-Notivoli et al. (2017) developed a method to reconstruct and create daily gridded datasets using all the available information in three stages: 1) quality control of raw data; 2) estimate of missing values; and 3) prediction of new values over a gridded dataset. This methodology has been proved to work well in dense networks such as the one available for mainland Spain and its island territories (Serrano-Notivoli et al., 2017), but also for specific mountain areas with a large diversity of environments (Cuadrat et al., 2020; Martínez del Castillo et al., 2019).
We present a new high-resolution daily gridded precipitation and temperature dataset for Slovenia (SLOCLIM) with a spatial resolution of 1 × 1 km for the period 1950 to 2018. We used all the available climatic information in the country gathered from stations in all possible environments to create a grid based on thoroughly reconstructed data series in three steps: a) a comprehensive quality control of the original data; b) the estimate of new values corresponding to the missing data; and c) the creation of the gridded dataset using the reconstructed series.

Input data
We used the meteorological records from 174 stations of the Slovenian Meteorological Agency (ARSO) regularly distributed over the territory of Slovenia (Figure 1) to create the gridded datasets. Although some data series begin before 1950, we decided to limit the research to the years 1950 to 2018, when the station network remained stable over time and space. All the stations recorded daily precipitation, while 167 recorded both precipitation and temperature (ARSO, 2020). The altitudinal pattern showed a lower availability of data at the highest elevations, with fewer than five stations over 1,000 m a.s.l. (107) and decreasing thereafter. The early measurements contained more precipitation than temperature data.

Methodology
A complete reconstruction of the original daily data series was performed following the three-step methodology developed in Serrano-Notivoli et al. (2017), which is based on (1) an exhaustive quality control (QC) of the raw series, (2) the estimate of new values for missing records using all the available information and local regression models, and (3) the estimate of new values on a regular grid using the reconstructed series.
The QC detected and removed the most obvious erroneous data in the original dataset based on different criteria designed to preserve the internal and spatial coherence of the data. For precipitation, five criteria were used: (1) suspect data: the observed value was over zero and all its 10 nearest observations were zero; (2) suspect zero: the observed value was zero and all its 10 nearest observations were over zero; (3) suspect outlier: the magnitude of the observed value was 10 times higher or lower than that predicted by its 10 nearest observations; (4) suspect dry day: the observed value was zero, wet probability was over 99%, and predicted magnitude was over 5 mm; and (5) suspect wet day: the observed value was over 5 mm, dry probability was over 99%, and predicted magnitude was under 0.1 mm. For temperature, we also used five criteria: (1) internal coherence; (2) removal of months containing fewer than three days of data; (3) removal of days out of range, considering maximum temperature (TMAX) ≥ 50 °C or TMAX ≤ -30 °C and minimum temperature (TMIN) ≥ 40 °C or TMIN ≤ -35 °C; (4) removal of all days in a month with a standard deviation equal to zero (suspect repeated values in the series); and (5) removal of all days in a month if the sum of the differences between TMAX and TMIN was equal to zero (suspect duplicated values in TMAX and TMIN). These criteria helped us to evaluate, amongst others, situations of internal incoherence such as abnormal values, significant deviations from other values or duplicated values.
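The five precipitation criteria can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `precipitation_flags` and its arguments are hypothetical, the dry probability is assumed to be one minus the wet probability, and the outlier ratio is assumed to be evaluated only when both the observed and the predicted values are positive.

```python
from typing import List, Sequence


def precipitation_flags(
    observed: float,
    neighbours: Sequence[float],
    predicted: float,
    wet_probability: float,
) -> List[str]:
    """Return the names of the QC criteria that one daily observation violates.

    observed        -- daily precipitation at the candidate station (mm)
    neighbours      -- same-day values at the 10 nearest stations (mm)
    predicted       -- magnitude predicted from those neighbours (mm)
    wet_probability -- predicted probability of a wet day, in [0, 1]
    """
    flags = []
    # Assumption: dry probability is the complement of wet probability.
    dry_probability = 1.0 - wet_probability

    # (1) Suspect data: wet observation while every neighbour is dry.
    if observed > 0 and all(v == 0 for v in neighbours):
        flags.append("suspect_data")
    # (2) Suspect zero: dry observation while every neighbour is wet.
    if observed == 0 and all(v > 0 for v in neighbours):
        flags.append("suspect_zero")
    # (3) Suspect outlier: observation 10 times higher or lower than predicted.
    if observed > 0 and predicted > 0:
        ratio = observed / predicted
        if ratio > 10 or ratio < 0.1:
            flags.append("suspect_outlier")
    # (4) Suspect dry day: zero observed, but a confident and sizeable wet prediction.
    if observed == 0 and wet_probability > 0.99 and predicted > 5:
        flags.append("suspect_dry_day")
    # (5) Suspect wet day: over 5 mm observed, but a confident dry prediction.
    if observed > 5 and dry_probability > 0.99 and predicted < 0.1:
        flags.append("suspect_wet_day")
    return flags
```

A single observation may trigger more than one criterion, which is why the sketch returns a list rather than a single label.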
In such cases the data were removed. In order to perform the QC process by applying the above criteria, reference values (RVs) were calculated for each day and location of the original dataset using the information from the 10 nearest stations, whose selection varied according to data availability. The RVs were estimated through the combined use of generalized linear mixed models (GLMMs) and generalized linear models (GLMs). The precipitation or temperature data were used as the dependent variable in each case, and the geographic information of each station (latitude, longitude, altitude and distance from the coast) as the independent variables. The calculated RVs were then compared with their corresponding original data to assess their quality and remove suspect values following the QC criteria.
After QC, the second stage consisted of calculating new RVs based on the cleaned dataset, which were used to replace the missing values. A leave-one-out cross-validation (LOO-CV) was implemented, where RVs were calculated for all days and locations, including those for which an observation exists but without using that observation, for comparison and validation purposes. This whole process produced a serially complete dataset.
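The leave-one-out logic of the reference values can be sketched as follows. The sketch deliberately swaps the GLMMs and GLMs of the methodology for an ordinary least-squares fit, which is only a stand-in; the function name `loo_reference_values`, the planar coordinates used for the neighbour search, and the array layout are assumptions made for illustration.

```python
import numpy as np


def loo_reference_values(coords, covariates, values, k=10):
    """Leave-one-out reference value for every station on a single day.

    coords     -- (n, 2) planar x, y positions used for the neighbour search
    covariates -- (n, p) predictors (e.g. latitude, longitude, altitude,
                  distance from the coast)
    values     -- (n,) observations for the day, NaN where missing
    """
    coords = np.asarray(coords, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    reference = np.full(n, np.nan)
    has_data = ~np.isnan(values)

    for i in range(n):
        # Candidate neighbours: stations with data, never the target itself.
        candidates = np.flatnonzero(has_data & (np.arange(n) != i))
        if candidates.size == 0:
            continue
        distance = np.hypot(*(coords[candidates] - coords[i]).T)
        nearest = candidates[np.argsort(distance)[:k]]

        # Ordinary least squares with an intercept (stand-in for the GLM/GLMM).
        design = np.column_stack([np.ones(nearest.size), covariates[nearest]])
        beta, *_ = np.linalg.lstsq(design, values[nearest], rcond=None)
        reference[i] = np.concatenate([[1.0], covariates[i]]) @ beta
    return reference
```

Because the target observation is always excluded, the same routine yields both a validation estimate for observed days and a fill value for missing days.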
In the third stage, new values at each grid location of the 1 × 1 km gridded dataset were estimated based on the reconstructed point-based dataset, including a measure of uncertainty for each location and day, as calculated from the standard error of the models. The advantage of this methodology is that all of the available data can be used as there are no restrictions imposed due to the length or structural characteristics of the series.
Based on the gridded dataset created from the original data and with the described method, we computed four temperature and two precipitation indices to show the potential applications of the grid: (1) the mean annual maximum value of daily maximum temperature, (2) the mean annual minimum value of daily minimum temperature, (3) the average annual count of days when daily minimum temperature was below 0 °C (frost days), (4) the average annual count of days when daily maximum temperature was over 25 °C (summer days), (5) maximum number of consecutive dry days per year with daily precipitation of less than 1 mm, and (6) daily precipitation intensity (annual precipitation/number of wet days). All the indices were presented with their corresponding uncertainty estimates.
The maps throughout this article were created using ArcGIS® software by Esri. ArcGIS® and ArcMap™ are the intellectual property of Esri and used herein under license. Copyright © Esri. All rights reserved. For more information about Esri® software, please visit www.esri.com.
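For a single grid point and a single year, the count-based and precipitation indices defined above could be computed along these lines. The function names are hypothetical, and the 1 mm wet-day threshold used for the precipitation intensity is an assumption, since the text only states the 1 mm threshold for dry days.

```python
from typing import Sequence


def frost_days(tmin: Sequence[float]) -> int:
    """Number of days with daily minimum temperature below 0 °C."""
    return sum(1 for t in tmin if t < 0.0)


def summer_days(tmax: Sequence[float]) -> int:
    """Number of days with daily maximum temperature over 25 °C."""
    return sum(1 for t in tmax if t > 25.0)


def consecutive_dry_days(precip: Sequence[float], threshold: float = 1.0) -> int:
    """Longest run of days with precipitation below the threshold (mm)."""
    longest = current = 0
    for p in precip:
        current = current + 1 if p < threshold else 0
        longest = max(longest, current)
    return longest


def precipitation_intensity(precip: Sequence[float], wet_threshold: float = 1.0) -> float:
    """Total precipitation divided by the number of wet days.

    Assumption: a wet day has at least `wet_threshold` mm of precipitation.
    """
    wet_days = sum(1 for p in precip if p >= wet_threshold)
    return sum(precip) / wet_days if wet_days else float("nan")
```

Averaging the yearly results over 1950 to 2018 would then give the mean annual values mapped for each grid point.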

Quality control
Precipitation: The number of removed data varied over the years, increasing from 1950 onwards and reaching a peak in 1967, and then decreasing until the end of the period (Figure 3a). The majority of the data was removed due to the Suspect data and Suspect https://doi.org/10.5194/essd-2020-327

Temperature:
The quality control process removed 1.28% of the original TMAX data and 0.86% of the original TMIN data (Figure 3b).
According to the monthly criteria, 1,090 (0.06%) values were removed for TMAX and 2,139 (0.13%) for TMIN, and according to the daily criteria, 20,491 (0.21%) values were removed for TMAX and 12,283 (0.73%) for TMIN.
The number of observed zero-precipitation days (dry days) in Slovenia was 1,006,891, and the number of estimated ones (only for days with measurements) was 1,011,337, a difference of +0.44% in favour of the estimates. Taking wet days as positive (observed precipitation P > 0) and dry days as negative (observed P = 0), the true negative rate (RV = 0 & P = 0) was over 92%, and the true positive rate (RV > 0 & P > 0) over 85% (Table 1). To a large extent, the false negatives (RV = 0 & P > 0) and false positives (RV > 0 & P = 0) occurred on days with low precipitation amounts. In events with very low amounts of precipitation, the estimated probability of occurrence was likely to indicate a dry day, even though the station could register a minimal quantity of rain (usually under 1 or 2 mm). This causes only a small difference in amounts, but it is more visible in the dry/wet accuracy assessment.
The highest Pearson correlations (> 0.94) were found in the comparison by days between the original dataset and corresponding estimates, rather than in the comparison by stations (Figure 4). This indicates the complexity of the geographical variability (elevation, orientation, landscape configuration, etc.) between stations. The highest correlation between the measurements and estimates by stations was found for medians without zeros (only considering days with P > 0 in measurements and estimations). The comparison showed the lowest correlation in the 95th percentiles of wet days when comparing by both stations and days.
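The dry/wet accuracy assessment can be reproduced with a small confusion-matrix sketch. The function name `wet_dry_rates` is hypothetical, and the rates are assumed to follow the conventional definitions (true positives over all observed wet days, true negatives over all observed dry days), which the text does not spell out.

```python
from typing import Dict, Sequence


def wet_dry_rates(observed: Sequence[float], estimated: Sequence[float]) -> Dict[str, float]:
    """True positive and true negative rates of wet-day detection.

    Wet days are positives (P > 0) and dry days are negatives (P = 0),
    for paired observed values and reference values (RVs).
    """
    tp = tn = fp = fn = 0
    for p, rv in zip(observed, estimated):
        if p > 0 and rv > 0:
            tp += 1  # wet day correctly estimated as wet
        elif p == 0 and rv == 0:
            tn += 1  # dry day correctly estimated as dry
        elif p == 0 and rv > 0:
            fp += 1  # dry day estimated as wet
        else:
            fn += 1  # wet day estimated as dry
    return {
        "true_positive_rate": tp / (tp + fn) if tp + fn else float("nan"),
        "true_negative_rate": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

Light rain that is estimated as zero ends up in the false-negative count, which is the effect described for days with very low precipitation.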

The histogram of estimated and observed precipitation showed a good agreement (Figure 5). However, values below 1 mm were slightly underestimated in some cases. The agreement between the histograms was very high above 1 mm. and 900 m a.s.l. (Table 2). Overestimation was higher above 1100 m a.s.l. with the exception of 1500–2000 m a.s.l., where the estimation was lower than the measurements. The ratio of the standard deviation (RSD) also showed differences between estimated and observed values, especially between 1100 and 1300 m a.s.l. The mean absolute error (MAE) increased with altitude, which can be attributed to the decreasing number of available stations with altitude.
The ratio of means (RM) indicated that the monthly aggregates of daily precipitation in all months were slightly underestimated. The ratio of standard deviation (RSD) was very close to 1 through all months (Table 3). In December, however, the RSD was below 0.95, indicating a bias in the variance estimate. In the other months there are no substantial biases in the variance estimation. Overall, all monthly statistics are relatively homogeneous over the year due to the evenly distributed precipitation over the year.
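The agreement statistics used in the tables can be written compactly as below. This is a sketch under stated assumptions: the ratios are taken as estimated over observed (so values below 1 indicate underestimation), the sample standard deviation is used, and the function name `agreement_statistics` is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def agreement_statistics(observed: Sequence[float], estimated: Sequence[float]) -> Dict[str, float]:
    """Summary statistics comparing estimates with paired observations."""
    errors = [e - o for o, e in zip(observed, estimated)]
    return {
        # Ratio of means: below 1 indicates underestimation on average.
        "RM": mean(estimated) / mean(observed),
        # Ratio of standard deviations: below 1 indicates too little variance.
        "RSD": stdev(estimated) / stdev(observed),
        # Mean absolute error: average magnitude of the differences.
        "MAE": mean(abs(err) for err in errors),
        # Mean error: average signed difference (bias).
        "ME": mean(errors),
    }
```

Applied per altitude band or per month, these four numbers give the kind of breakdown summarised in the comparison tables.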

Temperature:
The frequency of the observed temperatures and their estimates (Figure 7) showed very good general agreement. There was a slight underestimation of TMIN (in the temperature intervals 0 to 5 °C and 10 to 15 °C) and TMAX (interval 5 to 10 °C). Overestimation was found for TMAX and TMIN for the interval -5 to 0 °C.
Most of the stations in Slovenia are located at lower altitudes, 60% of them between 100 and 500 m a.s.l., which leads to a slight underestimation of TMIN values at 900–1100 and 1500–2000 m a.s.l. due to having less data. Also, an overestimation of TMIN was observed in the 1300–1500 m a.s.l. range, but the similarity between the observed and estimated temperatures was generally higher for TMAX (Table 4).
The ratio of means (RM) showed that the differences in the means between observed and estimated temperatures were lower for TMAX, with all the values very close to 1 (Table 5). TMIN showed, overall, a higher underestimation, especially in March. Moreover, the ratio of standard deviation (RSD) was higher in TMAX, while TMIN showed values under 0.95 from April to September, indicating a bias in variance estimation in these months. ME values were close to 0 in all cases, and Pearson correlations were higher than 0.98 in TMAX in all months, ranging from 0.93 to 0.98 in TMIN.

Spatial distribution and uncertainty in daily precipitation
Mean precipitation on wet days (SDII) showed that the highest values occurred in the central to northwestern part of the country, and also at some individual locations in the south and north, with amounts up to 27 mm (Figure 8a). The lowest amounts were in the northeastern part of Slovenia (less than 10 mm). The average maximum length of dry spells (CDD) reached the highest values in the northeastern part of the country, with 70 days, while this number decreased towards the southeast and northwest to values lower than 35 days (Figure 8b). The uncertainty of SDII was very low (<10%) all over the country (Figure 9a), while the uncertainty of CDD averaged between 16 and 20% in most of it, with higher values (21-25%) in the northeastern part (Figure 9b).
The highest values of the mean annual absolute daily maximum temperature (TXx) were found in the southwest and east of Slovenia, with temperatures up to 36 °C (Figure 10a), while values below 20 °C were more frequent at higher elevations. As expected, the mean annual absolute daily minimum temperature (TNn) showed a similar pattern to TXx (Figure 10b), with the lowest values in the northern and northwestern parts of Slovenia (-22 °C) and much higher values in the southwest, where the influence of the sea affects temperatures. However, in the eastern half of the territory there was a lower spatial variability compared to that seen for TXx. The mean annual number of summer days (SU) was calculated from the annual count of days when daily TMAX was over 25 °C. Similar to TXx, the highest values were found in the southwestern and also eastern part of the country (Figure 10c). In the mountain areas, however, only a small number of summer days were found. The mean annual number of frost days (FD) was calculated from the annual count of days when the minimum daily temperature was below 0 °C. The number of frost days varied from 36 in the coastal area to 240 in the highest elevations, where orography had a significant influence (Figure 10d).
The highest uncertainty values of TXx and SU were found in the northwestern part of the country (Figure 11a

Discussion
SLOCLIM is the first climatic reconstruction for Slovenia at a high spatial resolution, providing daily data of TMAX, TMIN and the amount of precipitation from 1950 to 2018. This new daily gridded dataset contributes significantly to the climate description of the country, and it is expected to be of high value for multiple potential applications, especially in areas where observed climate records are not available, such as agricultural land, forests and mountains.
There were no climatic reconstructions available for Slovenia until the work of Dolinar (2016), who established a grid on a 1 km scale for the monthly temperature and precipitation from 1961 onwards, using explanatory variables such as longitude, latitude, altitude and their second degree polynomial terms to improve the reconstruction. Besides this, there have been other activities aimed at the homogenization of climatic data, such as the project "Climate Change and Variability in Slovenia" (Vertačnik et al., 2015), which was based on homogenized monthly data with the goal of obtaining high quality meteorological data from 1961 onwards.
In order to fill the gap, the SLOCLIM database was created with a methodology that is suitable for a country like Slovenia with large climatic and orographic differences, as it provides a reliable independent assessment of climate parameters, considering the local character of precipitation and temperature at each particular location. The period 1950 to 2018 represents the longest possible time-span for data reconstruction at the given resolution, which is mainly determined by the number, location and quality of meteorological stations. The method makes use of all available information and includes a comprehensive iterative quality control process that checks the spatial and temporal consistency of the data until no suspect values are detected.
The presented results show very high correlations between the estimated values and the measurements for both temperatures and precipitation, indicating the ability to reproduce realistic climatic situations. The comparison by days showed higher Pearson correlations between the original precipitation dataset and the corresponding estimates than the comparison by stations. The reason for this is the complexity of the high geographical variability between stations, which could, to a certain extent, also influence the underestimation of minimum temperatures at higher altitudes. The dataset used in Dolinar (2016) showed slightly higher correlation coefficients between predicted and measured values than found in our study, but it was constructed on a monthly basis using a completely different approach. Furthermore, Dolinar (2016) obtained lower correlations between predicted and measured values for precipitation than for temperature, which is consistent with our results and is attributed to the generally higher precipitation variability both in space and time.
The agreement between the original dataset and the corresponding estimates in our study was high due to the good quality of the original data, mainly because ARSO regularly undertakes a validation control of the climatological data collected by meteorological stations, resulting in consolidated data (Bertalanič et al., 2006). Although the spatial coverage with meteorological stations decreases over time, it is still relatively high compared to other countries.
In Slovenia, precipitation is highest in the northwest (in the Julian Alps) and southwest in the Dinaric Mountains, and lowest in the northeast (ARSO, 2016b), as confirmed also by our reconstruction. The highest uncertainties in precipitation were found in the northeast, in the area with the lowest amount and intensity of precipitation. Therefore, the uncertainty is higher because it is more difficult to make solid predictions in areas where precipitation is very low and the number of consecutive dry days is higher. The highest temperatures are reached in the southwestern part of the country, and the lowest in the northwest (in the Alpine region), as shown by ARSO (2016a) and confirmed by our reconstruction. The presented indices of mean annual absolute daily maximum and minimum temperatures are in agreement with the observations of ARSO (2016a). The highest uncertainty levels for the high temperature indices (TXx, SU) were found for the northwest, and for the lowest temperatures (TNn, FD) in the southwest, possibly due to the high orographic effect and the distance to the sea. In the southwest, lower temperatures occur less frequently than in other parts of the country, which possibly explains why the uncertainty is greater.
In this regard, the uncertainty of the number of summer days (SU) was particularly high in the areas at the highest altitudes.
The geographical distribution of the temperature indices is consistent with the climatic regions defined by Kozjek et al. (2017).
The SLOCLIM database opens up many possibilities for applications in areas which need data with high spatial and temporal resolution. It has the potential to address a number of relevant climate-related topics, including recognition of climatic trends and extremes as well as their mitigation and planning of prevention measures. SLOCLIM is expected to support ongoing and future research activities which are focused on basic climatic questions in Slovenia and surrounding regions, such as trend analyses and projections (Dolinar et al., 2018; de Luis et al., 2014; Milošević et al., 2017), as well as reconstruction of past climate and extreme events based on tree-ring parameters (Cook et al., 2015; Čufar et al., 2008; Hafner et al., 2014). Research where climate data are needed also includes that focused on the effects of climate change on the well-being and health of the population (Ciuha et al., 2019; Pogačar et al., 2019, 2020), as well as agroclimatic shifts and the food supply (Ceglar et al., 2012, 2019a, 2019b). In Slovenia, a densely forested country, climate affects the survival and productivity of forest trees due to shifts in leaf phenology (Čufar et al., 2012, 2015), and the production of crucial tree tissues, like wood and phloem (Delpierre et al., 2019; Martinez del Castillo et al., 2018; Prislan et al., 2013). While forest trees can be damaged due to extreme events like ice storms (Decuyper et al., 2020; Klopčič et al., 2020), the effects of climate differ along various gradients (Čater and Levanič, 2019; Jevšenak et al., 2020) and at the microsite level (Diaci et al., 2020; Kermavnar et al., 2019). In most cases the use of precise daily climatic parameters can help to improve recognition of climate-growth relationships (Jevšenak, 2019; Jevšenak and Levanič, 2018).

Data availability
The SLOCLIM dataset is freely available in the web repository Zenodo. It can be accessed through https://doi.org/10.5281/zenodo.4108543 and cited as Škrk et al. (2020). The data are arranged in six files (daily maximum and minimum temperature, daily amount of precipitation and their uncertainties).

Conclusions
The result of the presented study is a high-resolution daily gridded precipitation and temperature dataset for Slovenia with a spatial resolution of 1 × 1 km for the period 1950 to 2018 (SLOCLIM). Advantages of SLOCLIM compared to other existing datasets in Slovenia are the use of all available data and the provision of reconstructed daily data over a dense grid. It delivers the daily amount of precipitation as well as maximum and minimum temperature. The available measurement data with regard to latitude, longitude, altitude, and distance from the coast were used to calculate the dataset. All the data were quality controlled by using cross-validation. Missing values were calculated using the 10 nearest observations. All results showed high correlation between estimated and observed values, and six climatic indices were also calculated: precipitation intensity, length of dry period, mean annual absolute maximum and minimum temperatures, mean annual number of frost days and mean annual number of summer days. The indices correlated well with altitudes and distance from the coast.
SLOCLIM represents a novel publicly available climatic database for Slovenia with multiple potential applications for recognition and mitigation of climate-related events directly affecting the environment, well-being and health of the population, agriculture and food supply, as well as forestry. High-resolution gridded daily data are expected to facilitate research activities in numerous scientific disciplines studying climatic trends and projections, reconstruction of past climate and prediction of
Climatic data were provided by the Environmental Agency of the Republic of Slovenia (ARSO) within the Ministry of the Environment and Spatial Planning. We thank Gregor Senegačnik from the Slovenia Forest Service for help with ArcMap. We thank Zorko Vičar for his great help with the original climatic data. We thank Paul Steed for language editing.

Financial support
The study was supported by the Slovenian Research Agency ARRS (programs P4-0015 and P4-0085 and young researchers' program). RSN and MdL were partially supported by the Government of Aragón through the "Program of research groups" (group H38, "Clima, Agua, Cambio Global y Sistemas Naturales").