Ten-year return levels of sub-daily extreme precipitation over Europe

Information on the frequency and intensity of extreme precipitation is required by public authorities, civil security departments, and engineers for the design of buildings and the dimensioning of water management and drainage schemes. Especially for sub-daily resolutions, at which many extreme precipitation events occur, the observational data are sparse in space and time, distributed heterogeneously over Europe, and often not publicly available. We therefore consider it necessary to provide an impact-orientated data set of 10-year rainfall return levels over Europe based on climate model simulations and evaluate its quality. Hence, to standardize procedures and provide comparable results, we apply a high-resolution single-model large ensemble (SMILE) of the Canadian Regional Climate Model version 5 (CRCM5) with 50 members in order to assess the frequency of heavy-precipitation events over Europe between 1980 and 2009. The application of a SMILE enables a robust estimation of extreme-rainfall return levels with the 50 members of 30-year climate simulations providing 1500 years of rainfall data. As the 50 members only differ due to the internal variability in the climate system, the impact of internal variability on the return level values can be quantified. We present 10-year rainfall return levels of hourly to 24 h durations with a spatial resolution of 0.11 (12.5 km), which are compared to a large data set of observation-based rainfall return levels of 16 European countries. This observation-based data set was newly compiled and homogenized for this study from 32 different sources. The rainfall return levels of the CRCM5 are able to reproduce the general spatial pattern of extreme precipitation for all sub-daily durations with Spearman’s rank correlation coefficients > 0.76 for the area covered by observations. Also, the rainfall intensity of the observational data set is in the range of the climate-model-generated intensities in 60 % (77 %, 78 %, 83 %, 78 %) of the area for hourly (3, 6, 12, 24 h) durations. This results in biases between −16.3 % (hourly) to +8.2 % (24 h) averaged over the study area. The range, which is introduced by the application of 50 members, shows a spread of −15 % to +18 % around the median. We conclude that our data set shows good agreement with the observations for 3 to 24 h durations in large parts of the study area. However, for an hourly duration and topographically complex regions such as the Alps and Norway, we argue that higher-resolution climate model simulations are needed to improve the results. The 10-year return level data are publicly available (Poschlod, 2020; https://doi.org/10.5281/zenodo.3878887). Published by Copernicus Publications. 984 B. Poschlod et al.: Ten-year return levels of sub-daily extreme precipitation over Europe


Introduction
has already been chosen by Nissen and Ulbrich (2017) based on legislation and stakeholder interviews. Also, the recent study of Berg et al. (2019) calculates this return level for nine selected regional climate models of the EURO-CORDEX multi model ensemble. 65 The durations between one hour and 24 hours cover a variety of rainfall generating mechanisms such as convection, advection, and orographic precipitation. The complexity of these processes inducing extreme precipitation, its inherent intermittency properties and its variability are still not well understood and matter of recent climate and weather research (Trenberth et al., 2017;Das et al., 2020). Hence, the comparison to observational data is also relevant for the evaluation of the process knowledge within the regional climate model and the applied parametrization schemes. 70 2 Data and methods

The Canadian Regional Climate Model Version 5 Large Ensemble (CRCM5-LE)
The global climate for this study is based on a large ensemble of global climate model (GCM) simulations, which was performed with the Canadian Earth System Model version 2 (CanESM2) at the rather broad spatial resolution of 2. 8° (Arora et al., 2011;Fyfe et al., 2017). The CanESM2 was run for 1000 years forced by constant preindustrial conditions. After 75 applying small random atmospheric perturbations, five runs with differing initial conditions were set up starting in January 1850 (Leduc et al., 2019). On 1 January 1950, ten new random atmospheric perturbations were applied to each of the five runs resulting in an ensemble of 50 members in sum. These 50 simulations were forced with estimations of historical CO2 and non- Implementing slight atmospheric perturbations in 1850 and 1950 results in different climate realizations, though neither the atmospheric forcing nor the model dynamics, physics or structure were changed (Arora et al., 2011). The climate projections only differ due to the internal variability of the climate system, which is caused by non-linear dynamical processes intrinsic to the atmosphere (Deser et al., 2012;Hawkins and Sutton, 2009;von Trentini et al., 2019).
The framework for the design of the single-model large ensemble (SMILE) of the regional climate model (RCM) as well as 85 the simulations of the CRCM5-LE were then carried out within the ClimEx project (Climate change and hydrological extreme eventsrisks and perspectives for water management in Bavaria and Québec). Each of the 50 CanESM2 simulations were dynamically downscaled with the CRCM5 applying the EURO-CORDEX grid specifications (0.11° horizontal resolution equalling around 12.5 km).
The precipitation related physical parameterization schemes in the CRCM5 setup include the following modules (Bresson et 90 al., 2017;Martynov et al., 2012;2013): subgrid-scale orographic gravity-wave drag by McFarlane (1987) is implemented and low-level orographic blocking is parametrized via Zadra et al. (2003). The planetary boundary layer scheme (Benoit et al., 1989;Delage, 1997;Delage & Girard, 1992) was used in a modified version by McTaggart-Cowan and Zadra (2015) in order to introduce hysteresis effects. The Sundquist (1978; scheme is applied as condensation scheme to diagnose large-scale precipitation. Shallow convection is parameterized with the Kuo-transient scheme (Bélair et al., 2005;Kuo, 1965) and deep 95 convection is described with the Kain and Fritsch (1990) scheme. Land surface processes are simulated by the Canadian Land Surface Scheme, version 3.5 (CLASS3.5; Verseghy, 1991Verseghy, , 2009 and lakes are modeled with the one-dimensional freshwater lake model (FLake; Martynov et al., 2012;2013). For the details of the whole CRCM5 setup the reader may be referred to Martynov et al. (2012) or Hernández-Díaz et al. (2012).
RCM SMILEs are relatively rare due to the high demands on computing power. In addition to the CRCM5-LE only the 21-100 member CESM-COSMO-CLM SMILE with a horizontal spatial resolution of 0.44° (Addor & Fischer, 2015) and the 16member EC-EARTH-RACMO2 SMILE with a horizontal spatial resolution of 0.11° (Aalbers et al., 2018) are available for a European domain. Although newer model versions are already available, such as CanESM5 (Swart et al., 2019), the existing CRCM5-LE provides a unique database with the highest number of members, largest domain, and highest spatial resolution available. 105 In this study, we focus on the precipitation during the time period of 1980 to 2009, which is simulated by the CRCM5 and stored in hourly resolution. Hence, for the calculation of return periods, 1500 years of hourly precipitation under conditions of this climate period are available. Leduc et al. (2019) evaluate mean precipitation during 1980 to 2012 by comparing the annual rainfall with E-OBS data over the whole European domain. Generally, the CRCM5-LE shows a wet bias in mean precipitation of up to 2 mm d -1 during the winter and less than 1 mm d -1 for the summer, spring and fall periods. Regions with higher biases 110 are located at the west coasts of Spain, Portugal, Ireland, UK, Norway, Croatia, Albania, and Greece and in the topographically complex areas of the Alps, Carpathians, and Pyrenees (Leduc et al., 2019). These precipitation biases are in the range of the EURO-CORDEX models as well (Kotlarski et al., 2014).

Calculation of rainfall return periods 115
In climate science, extreme precipitation is mostly assessed via the analysis of high quantiles, such as the 99.7 % quantile, which equals the occurrence probability of an event happening once per year (Santos et al., 2015;Hennemuth et al., 2013).
Risk analysis, engineering guidelines and also legislative thresholds are often expressed as return levels. Applying Extreme Value Theory (EVT), return periods can be calculated by fitting extreme value distributions to a selection of independent and identically distributed samples of extreme events (Coles, 2001). EVT consists of the two fundamentally different sampling 120 strategies block maxima (BM) and peak-over-threshold (POT). By choosing annual block maxima as sampling strategy, we ensure that the extreme samples are independent from each other. Still, sampling only one event per year may result in a loss of information compared to the POT approach. Also, lower-intensity observations, which are not extreme, but still the maximum value of the year, may be included due to the application of the BM strategy.
Due to the hourly resolution of the CRCM5-LE data, the hourly maxima are constrained to the fixed window at the full hour 125 (e.g. 6:00 to 7:00). For all other durations (3-hourly, 6-hourly, 12-hourly and 24-hourly, respectively) we allow hourly moving windows for the selection of maxima.
We applied a Mann-Kendall-test with p = 0.05 (0.01) on the 50 series of 30 annual maxima and five different durations revealing a trend for less than 6 % (1 %) of all grid cells over all durations. The affected grid cells vary in location within the 50 climate model simulations and we therefore do not apply any de-trending methods. 130 Following the Fisher-Tippett theorem, the distribution of block maxima samples converges to the Generalized Extreme Value (GEV) distribution Eq. (1): where µ, σ and ξ represent the location, scale and shape parameters of the distribution. The shape parameter ξ governs the tail behaviour of the GEV distribution. According to the value of ξ, the GEV corresponds to the Gumbel (ξ = 0), Fréchet (ξ > 0) 135 or Weibull (ξ < 0) distribution (Coles, 2001). We fit the location, scale, and shape parameters separately for each of the 50 differing 30-year block maxima via the method of L-moments (Hosking et al., 1985) using the software package by Gilleland and Katz (2016). The method of L-moments has proven to deliver stable results for small sample sizes (Delicado & Goria, 2008;Hosking et al., 1985;Kharin & Zwiers, 2000). We have also applied Maximum Likelihood Estimation (MLE). There, the median return levels are almost equal to L-moments, but the variability within the 50 members is slightly larger due to 140 more unstable results at the edges of the ensemble. MLE is recommended by Delicado and Goria (2008) for sample sizes of n ≥ 50, which is why we keep the fits based on the method of L-moments. The goodness-of-fit is assessed applying the Anderson-Darling test with 5 % significance level following Chen and Balakrishnan (1995). The goodness-of-fit test with 5 % significance level at 280 × 280 grid cells for 50 members would yield 196000 locations on average, where the null hypothesis is erroneously rejected, also called typ I error or false positives (Ventura et al., 2004). Hence, we apply the false discovery rate 145 (FDR) (Benjamini and Hochberg, 1995) following the approach of Wilks (2016), which adjusts the critical p-value for statistical testing at many locations. Less than 0.1 % of all fits are rejected for all durations. The median values of µ, σ and ξ, as well as the respective standard deviation over the 50 members are shown within the Supplementary Materials. The 10-year return periods are calculated inverting Eq. (1). For the most robust estimation at each grid cell, the median of the 50 return periods is chosen. 150

Observational rainfall return periods
The observational data are combined from many different national precipitation data sets. This special effort had to be made, since reanalysis products underrepresent extreme events due to the interpolation methods, especially in regions with low measuring network density and for short-duration events at local scales (Hofstra et al., 2008). In order to compare the national observational data to the climate model output, areal precipitation data sets of observations are linearly rescaled to the 0.11° 155 CRCM5 grid. Point observations are spatially interpolated via ordinary kriging and aggregated to the 0.11° grid of the CRCM5-LE. We describe the data processing for each national data set in the following. Table 1 provides an overview of the applied observational data and how they were accessed.

Germany 160
The German weather service provides an estimation of rainfall return periods based on 5-minute resolution gauge observations between 1951 and 2010 (DWD, 2020; publicly available). The documentation of the data processing is given in Malitz and Ertel (2015). A Peak-over-Threshold (POT) approach was chosen to sample extreme events. Events between May and September were de-clustered to guarantee for independent samples. After fitting an exponential distribution, the calculated return periods are spatially interpolated over Germany resulting in grid cells of approximately 8 km. We rescale these grids 165 linearly to the 0.11° specifications of the CRCM5-LE.

Austria
The Austrian data set is publicly available for single grid cells on a web-portal by the ministry of agriculture, regions and tourism (BMLRT, 2020). For the generation of the return periods, the rain gauge data are supplemented by a convective weather model in order to improve the density of observations (Kainz et al., 2007). Similar to the German dataset, a POT 170 approach was applied. Details are reported in BMLRT (2006). We linearly interpolate the Austrian data to the 0.11° grid.

Belgium
Return periods of extreme precipitation in Belgium were calculated by van de Vyver (2012). Therefore, a spatial GEV model was applied considering multisite data. The GEV parameters are related to geographical and climatological covariates through a regression relationship. The data are provided by the Belgian Royal Meteorological Institute (RMI, 2020; publicly available) 175 for each commune. We interpolate the communal point data on the CRCM5-LE grid via ordinary kriging.

Switzerland
In Switzerland, return periods of hourly, 3-hourly, 6-hourly and 12-hourly precipitation are available at 67 rain gauges for the time period 1981 to 2019 (MeteoSwiss, 2020). They were calculated by fitting a GEV to seasonal maxima via Bayesian estimation. As 24-hourly return periods are not provided, we use the estimates for daily extreme precipitation, which cover a 185 time period from 1966 to 2015 at 337 sites. We apply an adjustment factor of 1.14 to transfer the daily fixed window return levels to 24-hourly moving window levels as this relation has been found to be rather stable (Barbero et al., 2019;Boughton & Jakob, 2008). Within the CRCM5-LE, we find a factor of 1.13 between daily and 24-hourly return periods, which confirms this relationship. We interpolate the pointwise estimations of the return levels to the CRCM5 grid applying ordinary kriging. 190 Dyrrdal et al. (2015) generate a spatially coherent map of extreme hourly precipitation return levels in Norway. They link GEV distributions with latent Gaussian fields in a Bayesian hierarchical model to overcome the sparse observational network. The precipitation gauges only operate during an extended summer season, whereas the highest 12-hourly and 24-hourly rainfall sums occur during fall and winter in western Norway. Due to this limitation, the data have to be classified as experimental.

Norway
Hence, for 24-hourly return levels, we use the daily gridded precipitation data set seNorge2 at 1 km resolution (Lussana & 195 Tveito, 2017; publicly available; Lussana et al., 2018), which covers the time period from 1957 to 2019. We fit the GEV to the annual maxima of each 1 km grid cell and apply the adjustment factor of 1.14 to transfer the daily estimates to moving windows of 24 hours. The resulting return levels are then linearly interpolated to the 0.11° grid.

Slovenia
The Slovenian Environment Agency provides rainfall return periods at 63 gauges (SEA 2020; publicly available), which they 200 derived by fitting a Gumbel distribution (see Eq. (1) with ξ = 0). The time periods differ for each site. We interpolate the return levels to the 0.11° grid via ordinary kriging.

United Kingdom (without Northern Ireland)
For the United Kingdom, we use the gridded estimates of hourly areal rainfall for Great Britain (CEH-GEAR1hr; Lewis et al., 2019b; publicly available), which covers a time period of 1990 to 2014 in 1 km spatial resolution. For every grid cell and 205 duration, we sample the annual maxima, fit the GEV and calculate the return levels. Then, we aggregate the areal rainfall return levels to the 0.11° grid.

Denmark
For the Danish climate, the rainfall return levels are assumed to be almost constant with very low variability across the whole country (Madsen et al., 2017). They used data of 83 rain gauges with minute-resolution covering the period 1979 to 2012 with 210 more than 10 years of observations. For the extreme value analysis, a partial duration series model is applied to estimate the intensity duration frequency relationships of extreme precipitation. We use their average values of 24.9 mm (33.3 mm, 40.2 mm, 46.7 mm, 55.3 mm) as 10-year return levels for hourly (3-hourly, 6-hourly, 12-hourly, 24-hourly) durations.

Netherlands
As for Denmark, the return levels show very low variability in the Netherlands, which is why the KNMI provides single values 215 for the whole country (Beersma et al., 2018). The 10-year return levels amount to 31 mm h -1 , 39.8 mm (3 h) -1 , 46.0 mm (6 h) -1 and 52.9 mm (12 h) -1 . As no estimation for 24-hourly return levels is provided, we use daily precipitation sums of the 1 km resolution gridded data set between 1951 and 2010 (KNMI, 2020; publicly available). The data is based on 300 measurement stations and interpolated via ordinary kriging. After extracting the annual maxima, we fit the GEV and rescale the resulting return level of daily precipitation to the 0.11° grid. Furthermore, we apply the adjustment factor of 1.14 to transfer the return 220 level to a 24-hourly estimate.

Sweden
In Sweden, the variability of return periods of extreme precipitation is also assumed to be very low. Olsson et al. (2018) provide tables of hourly, 3-hourly, 6-hourly, and 12-hourly return levels for four regions of Sweden. The estimations are based on over 120 rain gauges covering the period 1996 to 2017. For each of the four regions, all rain gauge data were concatenated to one 225 single time series. A POT approach was carried out and the Generalized Pareto Distribution (GPD) was fitted via maximum likelihood estimation. The domain of the CRCM5-LE covers only the middle, south-eastern and south-western Swedish subregions. The 24-hourly duration is not available and we therefore apply an extrapolated value for the three regions, which is adapted to the values of the neighbouring countries Finland and Denmark.

Finland 230
Within a project about short-duration rainfall extremes in urban areas, radar measurements over whole Finland between 2000 and 2005 have been used to estimate the hourly return level of 10-year rainfall (Aaltonen et al., 2008). The radar measurements of the whole country were pooled to enlarge the database for extreme value analysis. The hourly 10-year return level amounts to 22.9 mm h -1 for the whole country. For longer durations of 6 and 24 hours, Venäläinen et al. (2007) have calculated return levels for different sites in Finland. As for Denmark, we take one average value for the whole country from the stations, which 235 are covered by the CRCM5-LE domain. For 3-hourly and 12-hourly estimate we interpolate according to the values of the neighbouring countries Sweden and Denmark. The final countrywide return levels amount to 22.9 mm (27.0 mm, 34.0 mm, 44.0 mm, 53.1 mm) as 10-year return levels for hourly (3-hourly, 6-hourly, 12-hourly, 24-hourly) durations.

Italy
In Italy, meteorological observations are the responsibility of the provincial administration. The data availability, the data 240 format and the available products differ within all 21 regional authorities. A good overview of this issue is given in Libertino et al. (2018), who also analyse the combined product "Italian Rainfall Extremes Database". Though, the authors are not allowed to pass on this database, unless the permission of all individual provincial administrations has been obtained. We therefore focus on data, which are available, and gathered information for 14 provinces. Annual maxima for rain gauges are provided for Basilicata (Manfreda et al., 2015), Calabria (ARPACAL, 2020; personal registration needed), Friuli Venezia Giulia 245 (ARPAFVG, 2020), Marche (PCRM, 2020; user account necessary), Piemonte (ARPAP, 2020; Java application of database has to be downloaded and run), Toscana (RT, 2020), Trento (Meteotrentino, 2020), Umbria (Morbidelli et al., 2016) and Valle d'Aosta (CFRAVA, 2020). We fitted the GEV and calculated the 10-year return levels. Rainfall return levels are directly available for stations in Lazio (CFRRL, 2020), Liguria (ARPAL, 2013) and Veneto (ARPAV, 2020). Fitted parameters for the LSPP model (linea segnalatrice di probabilità pluviometrica) are given for rain gauges in Lombardia by de Michele et al. 250 (2005), which can be used to derive rainfall intensities for the 10-year return period. For the stations in the region of Molise, fitted parameters for an exponential model are provided (RM, 2001). All derived point data of return levels were interpolated applying ordinary kriging.

Spain
For Spain, we have only gathered information about daily rainfall return levels. Herrera et al. (2012) have developed a gridded 255 data set of daily precipitation sums based on 2756 measurement stations for the period 1950 to 2003. They used a two-stage kriging approach to interpolate the data. Due to the dense station network, extreme precipitation events are accurately reproduced in opposite to typical reanalysis data sets. The data are publicly available (SMG, 2020). We extracted the annual maxima, fitted the GEV and applied the adjustment factor of 1.14 to transfer the daily data to 24-hour moving window estimations. Then we rescaled the gridded data to the specifications of the 0.11° CRCM5 grid. 260

Portugal
Following the same approach as the Spanish data set, Belo-Pereira et al. (2011) have created grid data of daily precipitation.
The data set is based on 806 stations and therefore the dense station network again ensures an accurate reproduction of extremes after the interpolation process. Data are available at IPMA (2020; publicly available). We used the same process as for the Spanish data to estimate 24-hour return levels. . We used the same process as for the Spanish data to estimate 24-hour return levels.

Post-processing for the comparison to areal data
Most of the observational data sets are based on point measurements, whereas the climate model simulates areal estimates of precipitation. In order to improve the comparability of these two kinds of data, Areal Reduction Factors (ARF) are often applied on the point measurements (Wilson, 1990). ARF are empirically derived factors and are dependent on the temporal 275 and spatial resolution as well as the local climate (Sunyer et al., 2016). In addition to the difference in space, we need to apply a correction to the hourly data, as the observations are based on hourly maxima with moving windows, whereas the hourly data of the climate model is constrained to the fixed window between full hours.
In Austria and Slovenia, we use the ARF by Breinl et al. (2020), which amount to 1.30 (1.20, 1.13, 1.09, 1.06) for hourly (3hourly, 6-hourly, 12-hourly, 24-hourly, respectively) duration. In the Italian provinces, the reduction factors by Mineo et al. (2018) are applied. These show a stronger reduction for shorter durations (ARF1h = 1.52, ARF3h = 1.22, ARF6h = 1.07). For 285 12-hourly and 24-hourly duration, Mineo et al. (2018) do not propose any reduction. As the areal correction is already implemented within the SHYPRE process chain of the French data, we only apply temporal correction factors of 1.03, 1.02 and 1.01 for hourly, 3-hourly and 6-hourly durations following Berg et al. (2019). These temporal correction factors are also added to the ARF of Wilson (1990), Breinl et al. (2020) and Mineo et al. (2018). An overview of the applied ARF is given in Table 2. 290 For the combination of the overlapping national data sets, the mean of the two overlapping data sets is calculated.

Results
The median at each grid point of the 10-year return levels of hourly, 3-hourly, 6-hourly, 12-hourly, and 24-hourly precipitation of the 50 CRCM5-LE members is generated and stored as comma separated textfiles (Poschlod 2020). For each duration we store one file with five columns containing the return level based on the median of the 50-member CRCM5-LE, the 5 %-295 quantile and the 95 %-quantile of the ensemble at each grid cell as well as the geographical coordinates. We use this format because of a possible application within a non-scientific environment, whereas within climate science, the netcdf-format is widely used. Figure 1 shows the rainfall intensity for hourly and 12-hourly precipitation return levels for the European domain based on the median of the 50-member CRCM5-LE. Though covering the whole year, Figure 1 can be compared to the 10year return levels of nine RCM setups of the EURO-CORDEX ensemble, which were calculated for summer-time precipitation 300 only (Berg et al., 2019). We chose the same colour scaling for a better comparability. The median return levels of the CRCM5-LE show a more homogeneous regional distribution with less scattering than the single RCM members in Berg et al. (2019).
Also, single members of the CRCM5-LE show this smooth regional distribution, but the use of the median of 50 SMILE members enhances this behaviour, as it filters out the internal variability of the climate system within individual 30-year periods. For the hourly return levels, the combination of CanESM2 and CRCM5 shows relatively high intensities such as the 305 two most intense model setups HIRHAM5--ECEARTH-r03 and REMO2009--MPI-ESM-RL in Berg et al. (2019). Though, the spatial pattern differs, as the CRCM5-LE produces lower hourly rainfall intensities in the eastern part of the study area and shows a higher sensitivity to the topography of the Alps. In the central Alpine areas, the CRCM5-LE simulates very low hourly rainfall intensities of 6 to 15 mm h -1 . The highest rainfall intensities are simulated south of the Alps and at the Adriatic coast.
For the 12-hourly duration, these areas also show the highest median rainfall intensities, with the Norwegian west coast and 310 the Atlantic coast of northern Portugal and Spain also exhibiting high values. The lowest 12-hourly return levels are produced for the southwest and the north of the study area (northern Africa, UK, Scandinavia, and north-eastern Europe). The 12-hourly 10-year return levels based on the median of the CRCM5-LE are similar to all nine RCM-GCM combinations of Berg et al. (2019) in terms of spatial patterns as well as rainfall intensities. Hence, we argue that the differences between the parametrization of convection induces the big deviations within the hourly return levels, as for this duration convection is the 315 main driver of extreme precipitation in large parts of Europe (Berg et al., 2013;Coppola et al., 2018;Lenderink & Meijgaard, 2008;Kendon et al., 2014).
In order to compare the return levels of the CRCM5-LE to observational data, we present both data sets as well as the percentage deviation in Fig. 2, 3 and 4 for all durations.
The combined observational datasets (see Fig. 2, 3 & 4) show quite smooth transitions between most of the different data 320 sources and methods. The biggest deviation is found at the border of Norway and Sweden for hourly to 12-hourly durations (see Fig. 2 & 3), as the estimate of the rainfall return level for western Sweden by Olsson et al. (2018) is a lot higher than the estimate by Dyrrdal et al. (2015) for eastern Norway. This is due to the sparse sampling of observations and differing approaches to derive return levels (see Sect. 3.1). We also find slight deviations for the Netherlands, where the return levels by Beersma et al. (2018) are higher than the surrounding levels for northern Belgium and western Germany. For the shorter 325 durations of hourly and 3-hourly return levels (see Fig. 2), deviations occur at the border between Italy and France as well as between Italy and Switzerland. This is due to the higher ARF applied in Italy (see Section 3.2). These deviations emphasize the need for homogeneous data sets of extreme precipitation.
As the 50 members of the CRCM5-LE also provide a range of equally probable estimations of return levels, we hatch areas, where the observations are not within the range of the regional climate model ensemble. The rainfall intensity of the 330 observational data set is within the range of the climate model generated intensities in 60 % (77 %, 78 %, 83 %, 78 %)of the area for hourly (3-hourly, 6-hourly, 12-hourly, 24-hourly) durations (see Fig. 2, 3 & 4). This fraction of areas is gradually increasing between hourly and 12-hourly durations, whereas it slightly decreases for the 24-hourly duration. For the 24-hourly return period, data for the Iberian Peninsula and Poland was added, whereby no data for these countries was available for the hourly to 12-hourly evaluation. Without these additional data sets, the fraction of areas, where 24-hourly observational return 335 levels are within the CRCM5-LE return levels, would amount to 80 %. In addition, in the Netherlands, Switzerland and Norway, different data bases are used for the estimations of the return levels of hourly to 12-hourly durations and 24-hourly duration (see Section 3).

The hourly intensities are generally underestimated by the CRCM5-LE except for England and Wales, northern Italy, northern
Austria and the northern part of Norway, resulting in an areal average bias of -16.3 % (see Fig. 2). There is also an area-wide 340 underestimation in the Mediterranean as well as Scandinavia in all 50 members of the large ensemble, which is why the observations are not in the range of the CRCM5-LE for large parts of these areas (see Fig. 2). For durations of three to twelve hours, the biases over the whole area decrease to -1.0 %, -0.5 % and +0.1 % (see Fig. 2 & 3).The high intensities of southern France, southern Switzerland and parts of Italy are underestimated (see Fig. 2 & 3). Also in Sweden and Finland the observational data sets report higher rainfall intensities. For the 24-hourly aggregation, the bias amounts to +8.2 % (see Fig.  345 4). The CRCM5 overestimates 24-hourly rainfall intensities in western Norway and at the Atlantic coast of the northern Iberian Peninsula, which is why the observations are not in the range of the 50 CRCM-LE members (see Fig. 4).
We calculate the Spearman's rank correlation coefficient ρ as a measure to compare the spatial patterns. For the median of the return levels of the CRCM5-LE and the observational data the coefficient amounts to 0.83 (0.81, 0.76, 0.78, 0.83, respectively) of the area for hourly (3-hourly, 6-hourly, 12-hourly, 24-hourly, respectively) durations. These values confirm the visual 350 impression of a high spatial pattern correlation when comparing both data sets (see Fig. 2, 3 & 4).

Discussion
Generally, the overall low bias of the return levels based on climate model data as well as the high spatial correlation between the observational and modelled return levels prove that the CRCM5-LE is able to capture the features of the heterogeneous set of drivers which govern the European climate of heavy and extreme precipitation. 355 Especially for countries without any sub-daily precipitation measurement, the data set based on climate model simulations can provide valuable estimations. But also for countries offering return levels of extreme sub-daily precipitation, our results show that the sparse observational network can be supported by climate model simulations. Accordingly, the Austrian return level data (Sect. 3.1.) are supplemented by a convective permitting weather model (Kainz et al., 2007).

Uncertainties of observational data 360
Due to differing methods, temporal resolution of the rain gauges, available time periods and areal coverage, we do not regard the combined observational data set as "truth", but as the largest possible comparison product, which is directly based on hourly observations. The uncertainties within these data are caused by different sources. First, the rain gauge measurements systematically underestimate precipitation due to splashing raindrops, wetting of the funnel surface, evaporation from the funnel and wind effects (Førland et al., 1996;Richter, 1995;Westra et al., 2014). The choice of the sampling approach as well 365 as the choice of the extreme value distribution leads to differing estimations of return levels (Lazoglou & Anagnostopoulou, 2017). Also, the fitting of the parameters of the respective extreme value distribution to the extreme precipitation samples induces additional uncertainty (Muller et al., 2009). As described in Sect. 3.1, the applied EVT approaches differ for the national data sets. Lazoglou and Anagnostopoulou (2017) have shown that the estimations of 50-year return levels of daily precipitation at ten different mediterranean stations differ between 5 % and 15 % according to the application of GEV or GPD 370 and three different fitting methods.
The national data sets of Norway and Germany do not refer to all seasons, but cover only summer-time events (Dyrrdal et al., 2015;Malitz & Ertel, 2015). The available time periods of observations differ for all countries, but also differ within the countries, as new rain gauges were installed over time and other measurement stations were discarded. Short time periods increase the uncertainties of the parameter fits of the extreme values distribution (Cai & Hames, 2011). Additionally, extreme 375 precipitation, especially when caused by convectional processes, is spatially highly variable (Zolina et al., 2014). Therefore, the representativeness of single point observations is limited.
Transferring these rather uncertain point-scale observation-based data to areal estimates can be carried out with various spatial interpolation methods such as inverse distance weighting, multivariate splines, machine learning approaches, or different versions of kriging, where auxiliary geographical or climatological covariates can be added via regression fields (e.g. Malitz 380 & Ertel, 2015;van de Vyver, 2012). In combination with low spatial coverage of the rain gauges (Isotta et al., 2014), this step induces additional methodical uncertainties. The regionalization of extreme precipitation is subject to a wide field of research, where many sophisticated methods are applied, which show different interpolation results (Hu et al., 2019). As for most countries only the return level itself and not the time series of rainfall is provided, we applied ordinary kriging to regionalize the observational point data. 385 The linear scaling to the 0.11° CRCM5-LE grid was applied to the national data, which are provided as areal estimates with different spatial resolution. The aggregation and linear scaling to the spatial resolution of 0.11° smooths extreme values of single grid cells.
The last step to make observation data and climate model data comparable features the application of the areal reduction factor (ARF). The ARFs are derived empirically and therefore differ between different studies, which also causes uncertainty (Berg 390 et al., 2019;Sunyer et al., 2016;Wilson, 1990). Junghänel et al. (2017) estimate a tolerance range of ±15 % for 10-year return levels of the German national data (Sect. 3.1.1).
Even though the combined observational data set is subject to different limitations and uncertainties, it is a necessary approach to evaluate the return levels of climate models not only locally or countrywide, but to perform a validation at (almost) continental scale. To our knowledge, such an assessment has not been carried out before. The confidence level in this validation 395 varies by country depending on the underlying rainfall database and the procedure of the return level calculation, which has been described in section 3. The obvious deviations in our homogenized observational return level product at the country borders between Norway and Sweden, between Italy and France and Switzerland as well as between the Netherlands and Germany and Belgium (as described in section 4), show that the validation in these regions is subject to major uncertainties for hourly to 12-hourly durations. On the other hand, the good fit and the preservation of topographic features at the borders 400 of Germany, Denmark, Belgium, France, Austria, Switzerland, and Slovenia support the confidence level in the validation for these regions. For the 24-hourly duration we find no major deviations along the country borders, which increases the confidence in this return level duration.

Natural variability within the CRCM5-LE 405
Extreme precipitation events show a high variability due to the natural variability of the climate system (Aalbers et al., 2018).
By the application of a 50-member SMILE, we assume the range of natural variability of extreme rainfall to be represented by the ensemble (Deser et al., 2012;Hawkins and Sutton, 2009;von Trentini et al., 2019). In consequence, while all 50 members are forced by the same emissions and are simulated by the same model structure and physics, the resulting return levels differ from each other. 410 In order to visualize this range, we show the standard deviation, as well as the 5 % and 95 %-quantiles of all 50 members at each grid cell representing the 10-year return level of 3-hourly precipitation (Fig. 5). The standard deviation amounts to 3.33 mm as areal average. The 5 % and 95 %-quantile return levels differ by -4.7 mm and +5.8 mm from the median, respectively, which equals a percentage range of -14 % to +17 %. This range is quite stable for other durations as well (-15 % to +18 % for hourly, -15 % to +14 % for 6-hourly, -14 % to +17 % for 12-hourly and -13 % to +17 % for 24-hourly durations). Thereby, 415 the overall variability is mainly caused by natural variability of the rainfall intensity. The spatial patterns of the minimum and maximum estimates show high agreement with a Spearman's rank correlation coefficient of ρ = 0.91. Hence, we conclude that the application of annual maxima of 30-year periods and EVT can filter out the spatial variability of single extreme events, but the estimates of 10-year return levels are still governed by the natural variability within the 30-year periods.
For a local scale visualization, we provide the range of the CRCM5-LE return levels as well as the observational return levels 420 for all considered durations at six different European cities (see Fig. 6). Oslo, London, Brussels, Paris, Munich and Rome show different climates and distances to the ocean. We also include the city of Rome as example, where the observational data are not within the range of the climate model simulations. We find that the absolute range of natural variability is dependent on the intensity of rainfall, which is also visible in the standard deviation in Fig. 5. We argue that convective processes are more affected by natural variability and, therefore, the return levels in Rome and Paris show greater variability than in Oslo or 425 London.
Due to the application of a SMILE, natural variability of return levels of extreme rainfall can now be quantitatively assessed at local, regional, and continental scales. Before, it has been included within uncertainty estimations of observational return levels as additional source of uncertainty (Junghänel et al., 2017), but was only estimated rather arbitrarily.

Limitations of the CRCM5-LE
In general, the results of the CRCM5-LE are governed by model uncertainty, as the ensemble only features one combination of GCM and RCM. Different model combinations or even modifications of the dynamics, physics and structure of the same climate models would yield different return level estimates. The results of the study by Berg et al. (2019) suggest that the 435 influence on the return level estimates of the RCM is significantly greater than that of the atmospheric forcing by the GCM.
The return levels simulated by the CRCM5-LE are limited by the spatial resolution of the model setup, by the temporal resolution of the stored precipitation output and by the non-explicit calculation of convectional precipitation using parametrization schemes. Short-duration rainfall extremes over Europe are mainly governed by convectional processes, which can only be resolved by regional climate models with explicit convection schemes, i.e. spatial resolutions of 4 km or less 440 (Tabari et al., 2016). Prein et al. (2015) have shown that improved spatial resolution also leads to better reproduction of convectional rainfall. Several studies have reported that the application of convection-permitting models (CPMs) improves the reproduction of heavy rainfall events over Europe (Berthou et al., 2018;Kendon et al., 2014;Poschlod et al., 2018). In addition to the benefit of explicitly calculating convection, complex topography is better resolved with a better spatial resolution. The 0.11° resolution of the CRCM5-LE equals around 12.5 km, which leads to systematic shifts of the location of 445 high orographic precipitation. This phenomenon is visible for steep mountainous slopes with a westward exposition, such as the Black Forest in south-western Germany, or the Appenine in central Italy (see Fig. 3). The CRCM5-LE simulates the areas of high intensity orographically enhanced precipitation one to two grid cells further in the west than the observational data set.
These deviations are not affecting the bias as quality measure, as the areal average intensity is reproduced, but the location is not correctly estimated. However, the centred Pearson product-moment coefficient includes these local deviations. We argue 450 that a higher spatial resolution would be able to lower these errors.
Generally, the CRCM5-LE setup shows a high sensitivity to orographic features, as the return levels at the central Alpine areas are simulated with lower intensities than the selection of EURO-CORDEX RCMs by Berg et al. (2019). Observations also show an intense gradient from high-intensity rainfall at the Alpine slopes and low-intensity precipitation in the inner Alps.
Though, the area of low-intensity rainfall is smaller than simulated by the CRCM5 (see Fig. 2, 3 & 4). 455 The already existing 30-member CPM multi model ensemble (Coppola et al., 2018) has provided promising results for convective events over complex topography in Europe. Though, the inter-model spread is governed by model uncertainty as well as natural variability. We conclude that a convection permitting version of SMILE is needed to improve the reproduction of sub-daily convectional extreme rainfall, to resolve complex topography over Europe and to disentangle natural variability and model uncertainty. As even the simulation of the 50-member SMILE with 0.11° spatial resolution was very cost-intensive 460 in terms of computing power and data storage, a CPM SMILE would place high demands on high performance computing centres.

Summary and conclusion
We provide a data set of 10-year return levels of hourly to 24-hourly rainfall over Europe. The results are compared to an observation-based return level product, which was combined by several national or even sub-national data sets. The CRCM5 setup has shown good agreement to the observational data for large parts of the study area in terms of bias and spatial 470 correlation, with highest agreement for 3-hourly to 24-hourly durations. The application of a SMILE has enabled to assess the impact of natural variability on the estimation of sub-daily rainfall return periods. The range of natural variability has to be added as uncertainty range to any observational data set, as the observed weather can be interpreted as only one possible realization of the climate within the ranges of natural variability of the climate system.
The provided data are a good source of information for countries with low observational coverage of sub-daily rainfall. 475 Although, we do not necessarily recommend to apply the data for the planning and design of infrastructure, as the model results are governed by the limitations of temporal and spatial resolution and parametrization of convection, we conclude that the study delivers a homogenized data set of sub-daily heavy precipitation across most of Europe and supports an improved description and understanding of precipitation dynamics in high resolution. Given the very promising findings of our study and acknowledging its observable limitations, we argue that a convection-permitting single-model initial-condition large 480 ensemble would be very valuable to further improve the analysis of extreme precipitation and its natural variability.
We conclude with the serious demand that sub-daily observational data should be homogeneously processed, registered, and stored centrally with public access, at least for scientific applications. Even the national data sets, which are publicly available already, are very difficult to find and access due to the restrictions reported in Sect. 3. We hope that the Global Sub-Daily Rainfall Dataset (GSDR; Lewis et al., 2019a) can start to bridge these gaps and we encourage all meteorological offices to 485 provide their data.

Author contribution
BP, JS and RL designed the concept of the study. BP carried out the data analysis and the visualization. BP prepared the manuscript with contributions from both co-authors.

Competing interests 490
The authors declare that they have no conflict of interest.