A new dataset of river flood hazard maps for Europe and the Mediterranean Basin region

Continental-scale hazard maps for riverine floods have grown in importance in recent years. They are now used for a variety of research and commercial activities, such as evaluating present and future risk scenarios and adaptation strategies, and in support of national and local flood risk management plans. Here, we present a new set of high-resolution (100 m) hazard maps for river flooding that covers most of geographical Europe and all the river basins entering the Mediterranean and Black Seas in the Caucasus, Middle East and Northern Africa countries. The maps represent inundation along 329,000 km of river network for six different flood return periods, expanding the previous datasets available in the region. The input river flow data are produced by the hydrological model LISFLOOD using new calibration and meteorological data, while inundation simulations are performed with the hydrodynamic model LISFLOOD-FP. In addition, we present a detailed validation exercise using official hazard maps for Hungary, Italy, Norway, Spain and the United Kingdom, which provides a more detailed evaluation of the new dataset than previous works in the region. We find that the modelled maps can identify on average two-thirds of the reference flood extent; however, they also overestimate flood-prone areas for flood probabilities below 1-in-100-year, while for return periods equal to or above 500 years the maps can correctly identify more than half of the flooded areas. Further verification in North African and Eastern Mediterranean regions is needed to better understand the performance of the flood maps in arid areas outside Europe. We attribute the observed skill to a number of shortcomings of the modelling framework, such as the absence of flood protections and of rivers with upstream area below 500 km², and the limitations in representing river channels and the topography of lowland areas. In addition, the different design of reference maps (e.g. extent of areas included) affects the correct identification of the areas for the validation, thus penalizing scores. However, modelled maps achieve comparable results to existing large-scale flood models when using similar parameters for the validation. We conclude that recently released high-resolution elevation datasets, combined with reliable data on river channel geometry, may greatly contribute to improving future versions of continental-scale flood hazard maps. The database is available for download at http://data.europa.eu/89h/1d128b6c-a4ee-4858-9e34-6210707f3c81 (Dottori et al., 2020a).

The input hydrographs necessary for the flood simulations are derived from the LISFLOOD streamflow dataset described in Section 2.1, following the approach proposed by Alfieri et al. (2014). Streamflow data are available for the EFAS river network at 5 km grid spacing for rivers with upstream drainage areas larger than 500 km². For each pixel of the river network we selected annual maxima over the period 1990-2016 and used the L-moments approach to fit a Gumbel distribution and calculate peak flow values for reference return periods of 10, 20, 50, 100, 200 and 500 years. Note that we also calculated the 30- and 1000-year return periods in limited parts of the model domain to allow validation against official hazard maps (see Section 2.3).
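As an illustration of the peak flow estimation described above, the following is a minimal sketch of a Gumbel fit by L-moments for a single river pixel. It is not the code used to produce the dataset: the function names `gumbel_lmoments` and `gumbel_quantile` and the annual maxima values are hypothetical, and the estimator is the standard textbook one based on sample probability-weighted moments.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_lmoments(annual_maxima):
    """Estimate Gumbel location and scale from the first two sample L-moments."""
    x = sorted(annual_maxima)
    n = len(x)
    if n < 2:
        raise ValueError("at least two annual maxima are required")
    # Unbiased probability-weighted moments b0 and b1 (ranks start at 0 here).
    b0 = sum(x) / n
    b1 = sum(rank * value for rank, value in enumerate(x)) / (n * (n - 1))
    l1 = b0             # first L-moment (mean)
    l2 = 2.0 * b1 - b0  # second L-moment (L-scale)
    scale = l2 / math.log(2.0)
    location = l1 - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(location, scale, return_period):
    """Peak flow with annual non-exceedance probability 1 - 1/T."""
    p = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(p))


# Hypothetical annual maximum discharges (m3/s) for one pixel, illustration only.
sample_maxima = [410.0, 385.0, 520.0, 610.0, 455.0, 390.0, 700.0, 480.0, 530.0, 445.0]
loc, scl = gumbel_lmoments(sample_maxima)
peak_flows = {T: gumbel_quantile(loc, scl, T) for T in (10, 20, 50, 100, 200, 500)}
```

In this sketch the fit is repeated independently for every pixel of the river network, and the resulting `peak_flows` values play the role of the T-year peak discharges used to scale the design hydrographs.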

Hydrological input of flood simulations
Subsequently, we calculate a Flow Duration Curve (FDC) from the long-term simulation. The FDC is obtained by sorting all the daily discharges in decreasing order, thus providing annual maximum values $Q_D$ for any duration $i$ between 1 and 365 days. Annual maximum values are then averaged over the entire period of data and used to calculate the ratios $\varepsilon_i$ between each average maximum discharge for the $i$-th duration $Q_{D(i)}$ and the average annual peak flow (i.e. $Q_D$ with $D$ = 1 day). Design flood hydrographs are derived using daily time steps. The peak value is given by the peak discharge for the selected T-year return period $Q_T$, while the other values $Q_i$ are derived by multiplying $Q_T$ by the ratio $\varepsilon_i$. The hydrograph peak $Q_T$ is placed in the centre of the hydrograph, while the other values $Q_i$ are placed alternately on either side of it to produce a triangular hydrograph shape, as shown in Figure 2.

high-resolution river network at the same resolution. Along this river network we identify reference sections every 5 km along the stream-wise direction, and we link each section to the closest upstream section (pixel) of the EFAS 5 km river network, using a partially automated procedure to ensure a correct linkage near confluences. In this way, the hydrological variables necessary to build the flood hydrographs can be transferred from the 5 km to the 100 m river network. Figure 3 describes how the 5 km and 100 m river sections are linked using a conceptual scheme.

Then, for every 100 m river section we run flood simulations using the 2D hydrodynamic model

generally available online for consultation on Web-GIS services, only few countries and river basin authorities make the maps available for download in a format that allows comparison with geospatial data. Table 1 presents the list of flood hazard maps that could be retrieved and used for the validation exercise, while Figure 1 shows their geographical distribution. Note that the relevant links to access these maps are provided in the Data Availability section.

Even though more official maps are likely to become available in the near future, the maps considered here offer an acceptable overview of the different climatic zones and floodplain characteristics of the European continent. Conversely, we could not retrieve national or regional flood hazard maps outside Europe, meaning the skill of the modelled maps could not be tested
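Returning to the design hydrograph construction described earlier in this section, the steps can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the function names `fdc_ratios` and `design_hydrograph` are hypothetical, the choice to place the second-largest value to the right of the peak is an assumption (the text only states that values alternate around the centre), and the toy three-day "years" stand in for full daily series.

```python
from collections import deque


def fdc_ratios(daily_flows_by_year, n_durations):
    """Ratios eps_i between the mean i-th largest daily flow and the mean annual peak."""
    # Sort each year's daily discharges in decreasing order (the yearly FDC).
    ranked = [sorted(year, reverse=True)[:n_durations] for year in daily_flows_by_year]
    n_years = len(ranked)
    # Average the i-th ranked discharge over all years.
    mean_by_rank = [sum(year[i] for year in ranked) / n_years for i in range(n_durations)]
    mean_peak = mean_by_rank[0]
    return [value / mean_peak for value in mean_by_rank]


def design_hydrograph(q_peak, ratios):
    """Daily hydrograph with Q_T in the centre and Q_i = Q_T * eps_i on alternating sides."""
    values = [q_peak * eps for eps in ratios]  # ratios[0] is 1, so values[0] is Q_T
    hydrograph = deque([values[0]])
    for k, value in enumerate(values[1:]):
        if k % 2 == 0:
            hydrograph.append(value)      # assumption: falling limb is filled first
        else:
            hydrograph.appendleft(value)  # then the rising limb
    return list(hydrograph)


# Toy input: two hypothetical "years" of three daily discharges each.
toy_ratios = fdc_ratios([[1.0, 4.0, 2.0], [8.0, 2.0, 6.0]], n_durations=3)
toy_hydrograph = design_hydrograph(100.0, [1.0, 0.75, 0.5, 0.25, 0.125])
```

Because the ratios decrease with duration, alternating the values around the peak yields a rising limb followed by a falling limb, which is the triangular shape referred to in the text.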

Performance metrics and validation procedure
The national flood hazard maps listed in Table 1

We evaluate the performance of simulated flood maps against reference maps using a number of performance metrics. The hit ratio HR is defined as:

$$HR = \frac{F_m \cap F_o}{F_o}$$

where $F_m \cap F_o$ is the area correctly predicted as flooded by the model, and $F_o$ indicates the total observed flooded area. HR scores range from 0 to 1, with a score of 1 indicating that all wet cells in the benchmark data are wet in the model data. The formulation of the hit ratio does not penalize overprediction, which can instead be quantified using the false alarm ratio FAR:

$$FAR = \frac{F_m \setminus F_o}{F_m}$$

where $F_m \setminus F_o$ is the area wrongly predicted as flooded by the model. FAR scores range from 0 (no false alarms) to 1 (all false alarms). Finally, a more comprehensive measure of the agreement between simulations and observations is given by the critical success index CSI, defined as:

$$CSI = \frac{F_m \cap F_o}{F_m \cup F_o}$$

where $F_m \cup F_o$ is the union of observed and simulated flooded areas. CSI scores range from 0 (no match between model and benchmark) to 1 (perfect match between benchmark and model).

It is well recognized that the quality of flood hazard maps strongly depends on the accuracy of the elevation data used for modelling (Yamazaki et al., 2017). This is especially crucial for continental-scale maps, since the quality of available elevation datasets is rarely commensurate with the accuracy required for modelling flood processes (Wing et al., 2017). Moreover, high-resolution and accurate elevation data such as LIDAR-based DEMs cannot be used for reasons of consistency, given that these data are only available for few areas and countries.

3) Results and discussion

We present the outcomes of the validation exercise by describing first the general results at country and regional scale in Section 3.1. Then, we discuss in the main text the outcomes for

Additional tests
England, Hungary and Spain (Section 3.2), while the Norway and Po river basin case studies are presented in Appendix C. We also complement the analysis with additional validation over major river basins in England and Spain. In

3.1 Validation of modelled maps at national and regional scale

Table 3 presents the results of the validation for each testing area and return period. The performance metrics are calculated using the total extent of the reference and modelled maps with the same return period. The first visible outcome is the low scores for the comparisons with reference maps with a high probability of flooding, i.e. low flood return periods (<30 years). Performance improves markedly with increasing return period, owing to the decrease in the false alarm ratio (FAR), while the hit ratio (HR) does not vary significantly. In particular, critical success index (CSI) values approach 0.5 for the low-probability flood maps, i.e. for return periods equal to or above 500 years. Considering that most of the reference flood maps include the effect of flood defences (contrary to the modelled maps), these results suggest that the majority of rivers in the study areas may be protected for flood return periods around 100 years or lower, as indeed reported by available flood defence databases (Scussolini et al., 2016). Differences between simulated and reference hydrological input are likely to influence the skill of modelled flood maps. However, further analyses are difficult because we have no specific information on the hydrological input used for the reference flood maps (e.g. peak flows, hydrograph shape). In

agreement between modelled and observed hydrological regime, but this does not necessarily translate to extreme values.
High-probability floods are also sensitive to the method used to reproduce river channels, and the simplified approach used in this study might underestimate the conveyance capacity of channels (see Section 3.2.2 for an example). Finally, the better performance for low-probability floods may also depend on floodplain morphology, where valley sides create a morphological limit to flood extent.

3.2 Discussion of results at national and regional scale

The results in Table 3

Besides these results, visual inspection of the reference maps suggests that the underestimation is partly caused by the higher density of the mapped river network in the reference maps with respect to the modelled maps. Indeed, the modelling framework excludes river basins with an upstream basin area below 500 km², meaning that EFAS maps only cover main river stems but miss out several smaller tributaries. This is clearly visible over the Severn and in the upper Thames basins

Hungary
The results in Table 3

Spain
The performance of the modelled maps in Spain shows a fairly stable HR value and decreasing FAR values with increasing return periods, similarly to what was observed for England and Hungary. The analysis of the results for the major river basins of the Iberian Peninsula, reported in Table 5

Table 6 and 7. For our framework, we calculated each index in Table 6 using the overall modelled and reference flood extent available for each return period (e.g. the value for the 100-year maps includes reference and modelled maps for England, Spain and Norway). As such, each area is weighted according to the extent of the corresponding flood map.

As can be seen in Table 6

However, this might result in a reduction of true false alarms, because part of the overestimated flood areas can go undetected. To verify this hypothesis, we recalculated the performance indices against the 100-year reference map in Spain using a 1 km buffer instead of the 5 km previously applied to constrain the validation area. As a result the false alarm ratio dropped

reduction of true false alarms, especially in river basins with continuous map coverage such as

Sampson et al. (2015). Metrics for the latter study are calculated removing all channels with upstream areas of less than 500 km².
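The pooling described above, where each index is computed from the overall modelled and reference extent so that larger maps weigh more, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: flood maps are represented as sets of equal-area cell identifiers, the names `CellCounts`, `count_cells`, `pool` and `scores` are hypothetical, and the toy regions are made up. The formulas follow the usual definitions of hit ratio, false alarm ratio and critical success index.

```python
from dataclasses import dataclass


@dataclass
class CellCounts:
    hits: int          # cells flooded in both modelled and reference maps
    false_alarms: int  # cells flooded in the modelled map only
    misses: int        # cells flooded in the reference map only


def count_cells(modelled, reference):
    """Compare two flood maps given as collections of flooded cell ids."""
    modelled, reference = set(modelled), set(reference)
    return CellCounts(
        hits=len(modelled & reference),
        false_alarms=len(modelled - reference),
        misses=len(reference - modelled),
    )


def pool(counts):
    """Sum counts over regions, so each region weighs by its flooded extent."""
    counts = list(counts)
    return CellCounts(
        hits=sum(c.hits for c in counts),
        false_alarms=sum(c.false_alarms for c in counts),
        misses=sum(c.misses for c in counts),
    )


def scores(c):
    """Return (HR, FAR, CSI); assumes both maps contain at least one flooded cell."""
    hr = c.hits / (c.hits + c.misses)
    far = c.false_alarms / (c.hits + c.false_alarms)
    csi = c.hits / (c.hits + c.false_alarms + c.misses)
    return hr, far, csi


# Two made-up regions for one return period.
region_a = count_cells({1, 2, 3, 4}, {3, 4, 5, 6})
region_b = count_cells({1}, {1})
hr, far, csi = scores(pool([region_a, region_b]))
```

Summing the counts before taking ratios is what distinguishes this pooled score from a plain average of regional scores: here the small, perfectly matched `region_b` raises the pooled values only in proportion to its extent.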

The different masking applied to reference flood maps may explain some of the differences:

Table 3. As can be seen, differences are generally small across the different areas and return periods. Version 1 of the flood maps produced slightly better results in Hungary for the 100- and 1000-year return periods (increased CSI and HR, lower FAR), while version 2 performs somewhat better in England, mainly driven by higher HR.

These outcomes may be interpreted by considering the changes in input data between the two versions, and the structure of the modelling approach and of the input data, which in turn has not changed substantially. The main difference between the two map versions lies in the hydrological input, with version 2 using the latest calibrated version of the LISFLOOD model. For the 100-year return period, peak flow values of version 2 are on average 35% lower than version 1 in Hungary, and 16% lower in England. However, similar decreases are also observed for the 1-in-2-year peak discharge, which determines bankfull discharge. The resulting reduction in channel hydraulic conveyance with respect to version 1 is likely to offset the decrease in peak flood volumes, which explains the small difference in overall flood extent given by the F2/F1 parameter in Table 7. This result confirms that knowledge of river channel geometry is crucial to correctly model the actual channel conveyance and thus improve inundation modelling. Other differences in input data are minor changes in Manning's parameters and in the EFAS river network, which might contribute to the observed differences.

3.5 Influence of elevation data

Awareness System (EFAS), and will be used to perform operational flood impact forecasting in EFAS (Dottori et al., 2017).

We performed a detailed validation of the modelled flood maps in several European countries against official flood hazard maps.
The resulting validation exercise is, to the best of our knowledge, the most complete undertaken so far for Europe, and provided a comprehensive overview of

flood maps outside Europe did not allow any validation in the arid regions in North Africa and Eastern Mediterranean. In these areas, further research will be needed to better understand the performance of the flood mapping procedure proposed here. Modelled maps generally achieve low scores for high and medium probability of flooding. For the 1-in-100-year flood probability, the modelled maps can identify on average two-thirds of the reference flood extent; however, they also largely overestimate flood-prone areas in many regions, thus hampering the overall performance. Performance improves markedly with increasing return period, mostly due to the decrease in the false alarm ratio. In particular, critical success index (CSI) values approach and in some cases exceed 0.5 for return periods equal to or above 500 years, meaning that the maps can correctly identify more than half of the flooded areas in the main river stems and tributaries of different river basins.

It is important to note that the validation was affected by problems in identifying the correct areas for a fair comparison, because of the different density of the mapped river network in reference and modelled maps. In our study we opted for a conservative approach using large buffers to constrain comparison areas, which possibly penalized the model performance, e.g. due to spurious false alarms in areas not considered by official maps. However, we observed that the proposed maps achieve comparable results to other large-scale flood models when using similar parameters for the validation.
The low skill of modelled maps for high and medium probability of flooding, with large overestimations observed in different lowland areas, is mostly explained by the non-inclusion of flood defences in the modelling framework and the simplified representation of channel hydraulic conveyance, due to the absence of datasets at European scale describing river channels and defence structures (i.e. design standards and location of dyke systems). Such information, combined with high-resolution DEMs enriched with local-scale information (artificial and defence structures), is crucial to improve the performance of large-scale flood models and apply more realistic flood modelling tools, as also observed by Wing et al. (2017, 2019b). On this point, we found that the modelling approach has limited sensitivity to changes in the hydrological input, because channel conveyance is linked to streamflow characteristics. This finding highlights the need for independent data on river channel width, shape and depth to better reproduce streamflow and flooding processes. Moreover, the improved results offered by the

The official flood hazard maps used for the validation exercise are freely accessible at the following websites:
• Spain: https://www.miteco.gob.es/es/cartografia-y-sig/ide/descargas/agua/zi-lamina.aspx (in Spanish)
• Norway: https://www.nve.no/flaum-og-skred/kartlegging/flaum/ (in Norwegian)
• England: https://data.gov.uk/dataset/bed63fc1-dd26-4685-b143-2941088923b3/flood-map-for-planning-rivers-and-sea-flood-zone-3 ; https://data.gov.uk/dataset/cf494c44-05cd-4060-a029-35937970c9c6/flood-map-for-planning-rivers-and-sea-flood-zone-2

Figure B1 suggest an acceptable hydrological skill of the LISFLOOD calibration in Norway, with a majority of gauge stations scoring KGE values above 0.5. In the areas with lower scores, the model performance for low-probability flood events might be influenced by an incorrect estimation of peak