Iberia01: a new gridded dataset of daily precipitation and temperatures over Iberia

This work presents a new observational gridded dataset (referred to as Iberia01) of daily precipitation and temperatures, produced using a dense network of thousands of stations over the Iberian Peninsula for the period 1971–2015 at 0.1° regular (and 0.11° CORDEX-compliant rotated) resolutions. We analyze mean and extreme indices and compare the results with the E-OBS v17 dataset (using both the standard and ensemble versions, at 0.25° and 0.1° resolutions, respectively) in order to assess observational uncertainty in this region. We show that Iberia01 produces more realistic precipitation patterns than E-OBS for the mean and extreme indices considered, whereas both are comparable for temperatures. To assess the differences between these datasets, a new probabilistic intercomparison analysis was performed, using the E-OBS ensemble (v17e) to characterize observational uncertainty and testing whether Iberia01 falls within the observational uncertainty range provided by E-OBS. In general, uncertainty values are large over the whole territory, except for a number of kernels of small uncertainty corresponding to the stations used to build the E-OBS grid. For precipitation, significant differences (at the 10 % level) between both datasets were found for fewer than 25 % of days over the Iberian Peninsula. For temperature, a very inhomogeneous spatial pattern was obtained, with either a small (in most regions) or a large fraction of significantly different days, thus indicating regions that are sensitive to observational uncertainty. Iberia01 is publicly available (Herrera et al., 2019a, https://doi.org/10.20350/digitalCSIC/8641).

Copyright statement. The Iberia01 gridded dataset is made available under the Open Database License. Any rights in individual contents of the database are licensed under the Database

means at a 5° latitude by 10° longitude grid. Later, this grid was extended to cover the entire globe at a higher spatial resolution (0.5°; Jones et al., 1986a, b), and it currently includes several variables covering the Earth's land areas for the period starting in 1901 (CRU TS4.0; Harris et al., 2014; Trenberth et al., 2014). However, this resolution is too coarse for regional analysis, which typically requires datasets with spatial resolutions of tens of kilometres and daily to sub-daily temporal resolution in order to differentiate climatic sub-regions and extreme events. In Europe, the first high-resolution continental observational gridded dataset was produced within the framework of the ENSEMBLES project (E-OBS dataset) for daily precipitation and maximum, minimum and mean temperatures (Klein Tank et al., 2002; Haylock et al., 2008; Klok and Klein Tank, 2009) and sea level pressure (van den Besselaar et al., 2011). With more than 1490 citations in Scopus (as of August 2019), this is the most widely used climate reference for European climate studies. Yet, in some regions, E-OBS relies on a sparse observational network, which limits its ability to correctly represent not only mean values but also variance and extremes, particularly over complex topography (Klok and Klein Tank, 2009). The influence of station density on the quality of gridded products has been analysed over the last decades by several authors: Rudolf et al. (1994) were able to significantly reduce the precipitation error, from a maximum of 40% to 20%, by doubling the number of stations within a 2.5° grid box; Prein and Gobiet (2017) found that in regions with sparse data the uncertainties associated with mean seasonal precipitation could reach 60%; Beguería et al. (2016) found that, in a high-resolution observationally based gridded dataset, the density of the underlying observations determines its spatial variance and thus strongly influences climate variability; Hofstra et al. (2010) concluded, by randomly changing the number of stations in each grid box, that a reduction in station density decreases the variability of both precipitation and temperature, with large implications for the representation of extremes. Moreover, large temporal changes in the number of stations within each grid box add another source of uncertainty, since they can alter the trends of the time series (Hofstra et al., 2009; Frei, 2014; Beguería et al., 2016). Finally, in an analysis of the sources of uncertainty in observationally based gridded datasets, Herrera et al. (2019b) highlighted that station density is the major factor of variability, irrespective of the interpolation method. The authors analysed several grids for Spain (complex topography) and Poland (smooth topography) and concluded that the influence of station density is more pronounced in Spain than in Poland, owing to the large spatial variability and complex orography of the former.
The quality of the station observations is an additional source of observational uncertainty for gridded products. These uncertainties may be reduced by applying quality control procedures and homogenising the time series (Herrera et al., 2012). Precipitation time series also commonly suffer from undercatch associated with windy conditions, which usually results in an underestimation of the true precipitation (Frei et al., 2003). Yet, in complex topography, the use of these types of corrections may itself increase uncertainty (Adam et al., 2006). The areal representativeness of a particular station also poses a challenge. Again, in regions with steep terrain gradients, such as mountains or coastal areas, surface temperatures are affected by local circulations such as sea breezes, upslope/downslope breezes associated with nighttime radiative cooling in the valleys, and the differential warming/cooling of the slopes at sunrise/sunset (Whiteman, 1982; Whiteman and McKee, 1982; Whiteman, 1990). Frei (2014) proposed a new interpolation method to tackle the latter, in which the vertical thermal profile of the area surrounding each station is considered. Yet, Frei (2014) also acknowledges that the best way to reduce this type of uncertainty is through high station density.
In the Iberian Peninsula, Herrera (2011) and Herrera et al. (2012) built a regular precipitation grid for continental Spain and the Balearic Islands based on 2756 stations (Spain02), following the methodology used in E-OBS. The same methodology was also applied by Belo-Pereira et al. (2011) for continental Portugal using more than 400 stations (PT02). Both datasets had consistent grids (0.2° resolution) and time periods, and were combined to build a gridded precipitation dataset of opportunity for the Iberian Peninsula (IB02). However, this is not a homogeneous product for the Iberian Peninsula, owing to the discrepancies between the two datasets near the border, particularly in the northern mountains. Recently, Herrera et al. (2015) updated the Spanish grid to include precipitation and temperatures (daily maximum, mean and minimum), increasing the spatial resolution to 0.1° (regular); moreover, they also provided results on a 0.11° rotated grid (CORDEX compliant) for the purpose of Regional Climate Model (RCM) evaluation. The lack of temperature data in PT02 (and IB02) hinders a comprehensive analysis of the large inter-annual and spatial variability characteristic of the Iberian climate (Esteban-Parra et al., 1998; Muñoz-Díaz and Rodrigo, 2004; Cardoso et al., 2013). These problems could be solved by building an Iberian grid using observational station data from both countries.
In this paper, we develop an Iberia-wide daily regular grid at 0.1° resolution for precipitation and temperatures (maximum, mean and minimum), as well as a 0.11° rotated grid (EURO-CORDEX compliant) suitable for model evaluation purposes. This grid is based on a high-density network of stations across continental Portugal, continental Spain and the Balearic Islands, with a reasonably stable number of stations for the period 1971–2015. This represents the first gridded dataset of daily precipitation and temperatures for Iberia as a whole, and can be considered an update of the PT02 and IB02 datasets. Here, we also introduce elevation as a covariate in the interpolation process, which was missing in the original PT02 and Spain02. The resulting dataset is compared against the most recent version of E-OBS (v17.0, referred to as v17), which includes a new ensemble version (v17e) to assess observational uncertainty and allows for a new probabilistic intercomparison of these datasets.
The paper is structured as follows. Section 2 describes the data and methods considered in this work. The main results are presented in Section 3. Finally, the main conclusions of the analysis are detailed in Section 4.

Observation Network and Quality Control
The present work is based on a dense network of thousands of stations from the Spanish Meteorological Agency (AEMET), the Portuguese Institute for Sea and Atmosphere (IPMA) and the Portuguese Environmental Agency (APA). To keep consistency with previous datasets, the final network was obtained by applying the same quality control used to build Spain02 (see Herrera, 2011; Herrera et al., 2012, for a detailed description), which requires stations to have at least 15 (40) years in the period 1951–2015 with less than 10% missing precipitation (temperature) data per year. The resulting observational network includes 3486 and 275 stations for precipitation and temperature, respectively, as shown in Figure 1(a–b). Detailed metadata for each station, including geographical and data availability information, are provided as part of the dataset in the same repository. Figure 1(c) shows the yearly data availability, which exhibits a clear decline in the number of stations over the last two decades, mainly for precipitation. Therefore, the resulting gridded product is not suitable for historical trend analysis, since the changing number of stations could bias the results. Moreover, during the period 2009–2014 there are very few precipitation stations in Portugal, so results for this period should be interpreted with caution. For these reasons, we recommend using the reference climate period 1971–2000 for this dataset. Overall, the spatial distribution of the stations is quite homogeneous over the Iberian Peninsula, with a good representation of the orographic gradients, especially for precipitation (see the first column of Fig. 1).

Therefore, elevation was included as a covariate in the interpolation process (at a monthly scale) to model and reflect these gradients (see Sec. 2.4).

Figure 1 (partial caption): … per year for different thresholds of annual missing data for precipitation (black) and temperature (red).
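The station-selection rule described above (at least 15 years for precipitation, or 40 for temperature, in 1951–2015 with less than 10% missing days per year) can be sketched as follows. This is a minimal illustration, not the Spain02 quality-control code: it assumes each station record is a mapping from year to a list of daily values with `None` marking missing days, and the names `valid_years`, `keep_station` and the `"pr"`/`"tas"` labels are hypothetical.

```python
import calendar
from typing import Dict, List, Optional

# Hypothetical record layout: year -> list of daily values, None marks a missing day.
StationRecord = Dict[int, List[Optional[float]]]

# Minimum number of valid years in 1951-2015, as stated in the text.
MIN_YEARS = {"pr": 15, "tas": 40}


def valid_years(record: StationRecord, first: int = 1951, last: int = 2015,
                max_missing: float = 0.10) -> int:
    """Count years in [first, last] with less than `max_missing` of days missing.

    Years absent from the record count as entirely missing.
    """
    count = 0
    for year in range(first, last + 1):
        n_days = 366 if calendar.isleap(year) else 365
        n_present = sum(v is not None for v in record.get(year, []))
        if 1.0 - n_present / n_days < max_missing:
            count += 1
    return count


def keep_station(record: StationRecord, variable: str) -> bool:
    """Apply the selection rule for precipitation ('pr') or temperature ('tas')."""
    return valid_years(record) >= MIN_YEARS[variable]
```

For example, a station with 15 complete years passes the precipitation criterion but not the temperature one, and a year with 73 of 365 days missing (20%) does not count as valid.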

E-OBS Gridded Datasets (v17 and v17e)
E-OBS (Haylock et al., 2008) is the reference gridded dataset of daily precipitation and temperatures in Europe and has previously been used to analyze observational uncertainty in the context of regional climate model evaluation (see, e.g., Kotlarski et al., 2019). In this study, we use both the standard (v17, 0.25° resolution) and the ensemble (v17e, 0.1° resolution; Cornes et al., 2018) versions of E-OBS v17 as benchmarks for comparison. In addition to the estimated daily value for each grid box, the ensemble grid also provides a measure of daily uncertainty, characterized by the standard deviation of the ensemble. E-OBS v17 is used for comparison with Iberia01 (see Fig. 2), whereas E-OBS v17e (the ensemble) is used to assess the observational uncertainty provided by this dataset and to test, day by day and at a given confidence level (90%), whether Iberia01 differs significantly from the ensemble (i.e. whether it falls outside the observational uncertainty range). For this purpose, the E-OBS ensemble mean (µ) and spread (σ) are used to define a normal distribution N(µ, σ) characterizing observational uncertainty for each grid box and day, and the corresponding Iberia01 value is classified as either inside or outside (i.e. beyond the 5th–95th percentile range, P5–P95) the uncertainty range. Values falling outside this range indicate significant differences between both datasets (as characterized by the E-OBS ensemble).
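Because the P5–P95 range of N(µ, σ) is µ ± z₀.₉₅σ with z₀.₉₅ ≈ 1.645, the day-by-day test reduces to comparing |x − µ| with 1.645σ. A minimal sketch, assuming the Iberia01 values and the E-OBS ensemble mean and spread are already available as aligned daily series for one grid box; the function names are illustrative and not part of any E-OBS or climate4R API.

```python
from statistics import NormalDist
from typing import List, Sequence

# The P5-P95 range of N(mu, sigma) is mu +/- Z95 * sigma, where Z95 is the
# 95th percentile of the standard normal distribution (about 1.645).
Z95 = NormalDist().inv_cdf(0.95)


def outside_uncertainty(value: float, mu: float, sigma: float) -> bool:
    """True if an Iberia01 value lies outside the P5-P95 range of N(mu, sigma).

    mu and sigma are the E-OBS ensemble mean and spread for one grid box and day.
    With sigma == 0 the range collapses to mu, so any departure counts as outside.
    """
    if sigma < 0:
        raise ValueError("ensemble spread must be non-negative")
    return abs(value - mu) > Z95 * sigma


def classify_days(values: Sequence[float], mu: Sequence[float],
                  sigma: Sequence[float]) -> List[bool]:
    """Apply the test day by day to aligned daily series at one grid box."""
    if not len(values) == len(mu) == len(sigma):
        raise ValueError("series must be aligned day by day")
    return [outside_uncertainty(x, m, s) for x, m, s in zip(values, mu, sigma)]
```

For instance, with µ = 10 and σ = 1, a value of 11 is inside the range while 12 is outside.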

Weather Indices
To analyze the mean and extreme regimes of precipitation and temperature, we use the indicators shown in Table 1. In particular, the 50-year return value for each grid box was used to characterize the extreme regimes (for the period 1971–2015); it is obtained by fitting a Generalized Extreme Value (GEV) distribution to the series of annual maxima of daily values (see Herrera et al., 2015, for a detailed description). In the case of precipitation, both wet-day frequency and rainfall intensity have been considered to properly characterize the mean regime.
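The indices in this paragraph can be sketched for a single grid-box series as follows. The 1 mm wet-day threshold is an assumption (Table 1 gives the definitions actually used), and the GEV parameters are taken as already fitted to the annual maxima, e.g. by maximum likelihood, which is not shown. The shape convention below has ξ > 0 for heavy tails; note that `scipy.stats.genextreme` uses the opposite sign for its shape parameter `c`.

```python
import math
from typing import Dict, Sequence

WET_THRESHOLD_MM = 1.0  # assumed wet-day threshold (mm/day)


def precip_indices(pr: Sequence[float], wet: float = WET_THRESHOLD_MM) -> Dict[str, float]:
    """Mean precipitation, wet-day frequency and mean wet-day intensity of a daily series."""
    wet_days = [p for p in pr if p >= wet]
    return {
        "mean": sum(pr) / len(pr),
        "wet_day_frequency": len(wet_days) / len(pr),
        "intensity": sum(wet_days) / len(wet_days) if wet_days else float("nan"),
    }


def gev_return_level(loc: float, scale: float, shape: float, period: float = 50.0) -> float:
    """T-year return level of a GEV fitted to annual maxima.

    Convention: F(x) = exp(-[1 + shape * (x - loc) / scale] ** (-1 / shape)),
    so shape > 0 is a heavy tail and shape -> 0 gives the Gumbel limit.
    The return level solves F(x) = 1 - 1/T.
    """
    y = -math.log(1.0 - 1.0 / period)
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)  # Gumbel case
    return loc + scale / shape * (y ** (-shape) - 1.0)
```

With a standard Gumbel distribution (loc 0, scale 1, shape 0), the 50-year return level is about 3.90; a positive shape parameter yields a larger value, reflecting a heavier upper tail.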

first, the 3D-TPS is applied to the monthly values, using the elevation given by the Global Digital Elevation Model (GTOPO30, see the section on code and data availability) as a covariate; second, the daily anomalies are interpolated by applying OK; finally, the interpolated daily anomalies and monthly values are combined to obtain the interpolated daily values. To ensure the area-averaged representativeness of the final values, the initial interpolation is done over an auxiliary 0.01° resolution grid, and the interpolated results are then upscaled (averaged) to the target resolutions: in our case, a regular version at 0.10° spatial resolution (approximately 10 km) and a rotated version matching the grids considered in the EURO-CORDEX project (0.11° and 0.44°). For simplicity, in this work we only describe the regular 0.10° version, although the other datasets are also provided (these datasets will be used in a future paper to evaluate the performance of EURO-CORDEX models over Iberia).
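The paragraph above outlines the interpolation scheme only briefly. A minimal sketch of two of its ingredients follows, under stated assumptions: ordinary kriging (OK) of daily anomalies with an exponential covariance model whose range is an illustrative value (coordinates assumed projected, in km), and upscaling of the auxiliary 0.01° grid to 0.1° by averaging 10 × 10 blocks. The thin-plate-spline step for the monthly values is not shown, the helper names are hypothetical, and this is not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # projected coordinates, assumed in km


def exp_cov(h: float, range_km: float = 100.0) -> float:
    """Exponential covariance C(h) = exp(-h / range_km); the range is illustrative."""
    return math.exp(-h / range_km)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(stations: Sequence[Point], values: Sequence[float],
                     target: Point) -> float:
    """OK estimate at `target`: weights minimize error variance and sum to 1."""
    n = len(stations)
    # Kriging system [C 1; 1^T 0] [w; lambda] = [c0; 1]
    a = [[exp_cov(math.dist(p, q)) for q in stations] + [1.0] for p in stations]
    a.append([1.0] * n + [0.0])
    b = [exp_cov(math.dist(p, target)) for p in stations] + [1.0]
    w = _solve(a, b)[:n]  # drop the Lagrange multiplier
    return sum(wi * vi for wi, vi in zip(w, values))


def block_average(fine: List[List[float]], factor: int = 10) -> List[List[float]]:
    """Upscale a fine grid (e.g. 0.01 deg) by averaging factor x factor blocks (e.g. to 0.1 deg)."""
    ny, nx = len(fine) // factor, len(fine[0]) // factor
    return [[sum(fine[j * factor + dj][i * factor + di]
                 for dj in range(factor) for di in range(factor)) / factor ** 2
             for i in range(nx)] for j in range(ny)]
```

Without a nugget term, this kriging sketch reproduces station values exactly, and because its weights sum to one it returns a constant field unchanged. The block averaging is what turns each 0.1° value into an area mean rather than a point estimate.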
Given the two-step interpolation procedure followed to build this area-average representative gridded dataset, a natural question arises as to whether the auxiliary very high-resolution grid (0.01°, approximately 1 km) could be used to build a grid at a resolution finer than 10 km.
For instance, the new convection-permitting RCM simulations performed in the framework of the CORDEX Flagship Pilot Studies (FPS) reach resolutions of 2–3 km and, thus, high-resolution grids are needed for their evaluation (Giorgi et al., 2009; Jacob et al., 2014). Some previous works warn against the development of high-resolution grids with few (or no) stations per grid box (Herrera et al., 2019b). Therefore, we limited the resolution of Iberia01 to 0.10° and do not provide higher-resolution intermediate products.
To illustrate the effective resolution of the datasets, we consider a high-resolution convective extreme precipitation event that occurred on 4–5 November 1997 and was characterized by heavy precipitation over most of the Iberian Peninsula. This event had great socioeconomic impacts in Portugal (Ramos and Reis, 2002) and Spain (Lorente et al., 2008), and was ranked as the second most extreme precipitation event in the Iberian Peninsula (Ramos et al., 2015). We use this event as an illustrative case study of the effective resolution of a potential 3 km gridded version of Iberia01, as compared with the standard 10 km resolution.

Table 1

In the case of precipitation, E-OBS reproduces neither the mean spatial pattern (pr) nor the intensity of the 50-year return value (RV50Yp). E-OBS underestimates mean precipitation by 15% (mean relative bias for both E-OBS v17 and v17e, as obtained from the spatial mean values in Figure 2), particularly in the Central System range of the Iberian Peninsula, and 50-year return values by 40% and 44% (mean relative bias for E-OBS v17 and v17e, respectively), with very large biases in some southern and Mediterranean regions. The case of wet-day frequency is different, since all datasets show a clear overestimation, with Iberia01 showing a more orographic pattern than E-OBS.
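For clarity, the mean relative bias quoted here can be written as a short function. This is a sketch that assumes Iberia01 is the reference and that the spatial mean is an unweighted average over land grid boxes (no area weighting), which may differ in detail from the computation behind Figure 2. The numbers in the example are made up.

```python
from typing import Sequence


def spatial_mean(field: Sequence[float]) -> float:
    """Unweighted mean over the land grid boxes of a field."""
    return sum(field) / len(field)


def mean_relative_bias(candidate: Sequence[float], reference: Sequence[float]) -> float:
    """Relative bias (%) of the candidate spatial mean with respect to the reference spatial mean."""
    ref = spatial_mean(reference)
    return 100.0 * (spatial_mean(candidate) - ref) / ref
```

A candidate field averaging 0.85 against a reference averaging 1.0 gives a bias of about −15%, i.e. an underestimation.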

In this case, the higher resolution of E-OBS v17e provides further spatial detail compared with the standard v17, which is not evident for precipitation intensity. Moreover, the E-OBS observational uncertainty is of similar magnitude to the mean value (again with kernels of small uncertainty corresponding to stations), reflecting a large uncertainty for this variable. Figure 3 shows the percentage of significantly different days for each grid box, variable and season. For precipitation (first row), only Iberia01 wet days were used, in order to minimize the effect of the different wet-day frequencies. The differences for this variable exhibit a homogeneous spatial pattern over the Peninsula, with values of around 10% in general; this is due to the large daily spread of the E-OBS ensemble (see Fig. 2). Regarding temperatures, most of the spatial pattern presents values close to zero, reflecting the similarity between both datasets for these variables. However, some local differences are found, particularly for mean (second row) and maximum (third row) temperatures, with the greatest values reached in the Pyrenean and Central ranges and along the south coast of the Iberian Peninsula, in agreement with the differences shown in Figure 2. In this case, the ensemble uncertainty is consistent with the differences between the two datasets found in Figure 2.
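The percentages mapped in Figure 3 can be sketched for one grid box as below. The sketch assumes the daily "outside the P5–P95 range" flags have already been computed, and it uses a 1 mm wet-day threshold as an assumption; the function name is hypothetical. Seasonal values follow by passing only the days of each season.

```python
import math
from typing import Optional, Sequence


def percent_significant_days(outside: Sequence[bool],
                             iberia_pr: Optional[Sequence[float]] = None,
                             wet_threshold: float = 1.0) -> float:
    """Percentage of days flagged as significantly different at one grid box.

    For precipitation, pass the Iberia01 daily values so that only Iberia01 wet
    days (>= wet_threshold mm, an assumed threshold) are counted; for temperature,
    leave iberia_pr as None to use all days. Returns NaN if no day qualifies.
    """
    if iberia_pr is None:
        days = list(outside)
    else:
        if len(iberia_pr) != len(outside):
            raise ValueError("series must be aligned day by day")
        days = [flag for flag, p in zip(outside, iberia_pr) if p >= wet_threshold]
    if not days:
        return math.nan
    return 100.0 * sum(days) / len(days)
```

Restricting the count to Iberia01 wet days keeps dry days, on which both datasets trivially agree or disagree depending on their wet-day frequency, from dominating the precipitation percentages.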
Finally, we consider the illustrative extreme event of 4–5 November 1997 to illustrate the effective resolution of the different datasets (including potential new higher-resolution products). To this end, we compare the values of the 0.1° grid with those of a higher-resolution 0.03° grid developed using the auxiliary 0.01° grid generated in the interpolation process. Figure 4 shows the results obtained for this event, indicating that increasing the Iberia01 resolution beyond 10 km has no clear impact on the effective resolution of the precipitation pattern. In particular, despite the apparent improvement of both versions of Iberia01 with respect to E-OBS v17e, there are only slight differences between the two versions of Iberia01 when compared with observations.

Conclusions and Discussion
In this work, a new gridded dataset for the Iberian Peninsula and the Balearic Islands based on a dense, quality-controlled station network has been described and compared with E-OBS v17, considering both the standard and the ensemble versions of this product, in order to reflect and analyze the observational uncertainty related to both datasets.

It is shown that Iberia01 is able to reproduce the spatial pattern and intensity of both the mean and extreme regimes of precipitation and temperature, in terms of the weather indices defined in Table 1, including extreme events such as the illustrative case study of 4–5 November 1997 shown in Figure 4. In contrast, E-OBS v17 tends to underestimate the extremes and smooth the spatial pattern of precipitation, in agreement with previous studies (Herrera et al., 2012). In the case of temperature, both datasets exhibit similar spatial patterns, with the main differences appearing in the Guadalquivir and Guadiana basins and the Pyrenean range.

We have also used the ensemble version of E-OBS (E-OBS v17e) to define observational uncertainty, and analyzed whether Iberia01 differs significantly from the ensemble (i.e. whether it falls outside the observational uncertainty range). In this case, we conclude that both datasets are significantly different. In general, uncertainty values are large across Iberia, except for a number of kernels where the uncertainty is small, corresponding to the stations used to build the E-OBS grid. For precipitation, significant differences (at the 10% level) between both datasets were found for fewer than 25% of days over the Iberian Peninsula. For temperature, a very inhomogeneous spatial pattern was obtained, with either a small (in most regions) or a large fraction of significantly different days, thus indicating regions that are sensitive to observational uncertainty.
The complex orography and the influence of both the Atlantic Ocean and the Mediterranean Sea modulate precipitation over the Iberian Peninsula. This leads to particular regimes, such as cut-off lows (the so-called cold drop) on the east coast, which a continental adjustment of the interpolation model is unable to reproduce, particularly when a low-density observational network is used. In this sense, the much larger number of rain gauges in Iberia01, compared with E-OBS, gives rise to a much better precipitation rendering. In the case of temperature, although the observational network is similar in both cases, the pattern tends to be more orographic in E-OBS v17, owing to the continental adjustment of the interpolation method, which overweights this component and masks regional behaviours. In addition, the observational network used in France also has a clear effect on the interpolated values over the Pyrenees and the northeast of the Iberian Peninsula.

Note that the interpolation method, independently of the target resolution, is calibrated to reproduce the spatial dependence of the mean field of the target variable, whose characteristic scale is usually larger than the grid resolution (approximately 1° in this case). Therefore, the effective resolution of purely interpolated gridded products is limited by this spatial scale, which defines the size of the kernels used in the interpolation process.
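The spatial dependence referred to here is commonly summarized by a semivariogram, whose range sets the distance over which stations inform each other. A minimal sketch of the classical (Matheron) empirical estimator, γ(h) = 1/(2N(h)) Σ (zᵢ − zⱼ)² over the N(h) station pairs in each distance bin, is shown below. It assumes projected coordinates, and it is illustrative rather than a description of how the interpolation in this work was calibrated.

```python
import math
from typing import List, Sequence, Tuple


def empirical_semivariogram(coords: Sequence[Tuple[float, float]],
                            values: Sequence[float],
                            bin_width: float, n_bins: int) -> List[Tuple[float, float, int]]:
    """Matheron estimator of the semivariogram in equal-width distance bins.

    Returns (bin centre, gamma, number of pairs) for every non-empty bin;
    pairs farther apart than n_bins * bin_width are ignored.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(math.dist(coords[i], coords[j]) // bin_width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * bin_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k]]
```

Fitting a model (e.g. exponential) to these binned values yields the range parameter. This range, rather than the target grid spacing, bounds the effective resolution of a purely interpolated product.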
As a result, in order to properly evaluate the convection-permitting CORDEX simulations, other approaches, such as regional reanalysis (e.g. Häggmark et al., 2000) or methods combining interpolation and analysis such as those proposed by Quintana-Seguí et al. (2017) and Peral et al.

Code and data availability
All the datasets used in this work are publicly available. On the one hand, the Iberia01 dataset is publicly available through the DIGITAL.CSIC open science service (Herrera et al., 2019a, DOI: http://dx.doi.org/10.20350/digitalCSIC/8641). Moreover, remote THREDDS access to this dataset is available from the Santander Climate Data Service via the User Data Gateway (instructions at http://meteo.unican.es/udg-wiki).
On the other hand, the E-OBS v17 dataset is remotely available through KNMI's THREDDS server (http://opendap.knmi.nl/knmi/thredds/e-obs/e-obs-catalog.html), and the ensemble version E-OBS v17e is available through the Copernicus Climate Change Service (http://surfobs.climate.copernicus.eu). Elevation data are taken from the Global 30 Arc-Second Elevation (GTOPO30) dataset (DOI: 10.5066/F7DF6PQS). The R code needed to partially reproduce the results of this paper (for the remotely accessible datasets Iberia01 and E-OBS v17) is publicly available at https://github.com/SantanderMetGroup/notebooks, building on the remote data services described above and on the climate4R R framework (Iturbide et al., 2019).
Author contributions. Herrera S., Gutiérrez J.M. and Soares P.M. conceived the study; Gutiérrez J.M., Soares P.M., Cardoso R.M., Espíritu-Santo F. and Viterbo P. obtained and processed the Spanish and Portuguese observational datasets; Herrera S. implemented the code for the interpolation and the analysis, and built the dataset and the figures of the paper; Herrera S., Soares P.M., Gutiérrez J.M. and Cardoso R.M. wrote the manuscript, and all the authors revised the results.
Competing interests. The authors declare that they have no competing interests.