INGe: Intensity-ground motion dataset for Italy

In this paper we present an updated and homogeneous earthquake data set for Italy compiled by joining the Italian Macroseismic Database DBMI15 and the Engineering Strong-Motion (ESM) accelerometric data bank. The database has been compiled through an extensive procedure of selection and revision based on two main steps: 1) the removal of several earthquakes in DBMI15 because their data source was considered largely unreliable and 2) the extraction of all the localities reporting intensity data that are located within 3 km of the accelerograph stations that recorded the data. The final data set includes 323 recordings from 65 earthquakes and 227 stations in the time span 1972-2016. The events are characterized by magnitudes in the range 4.0-6.9 and depths in the range 0.3-45.0 km. Here, we illustrate the data collection and the properties of the database in terms of recording, event and station distributions as well as Mercalli-Cancani-Sieberg (MCS) macroseismic intensity points. Furthermore, we discuss the most relevant features of engineering interest, showing several statistics with reference to the most significant metadata (such as moment magnitude, several distance metrics, style of faulting, etc.). The data set can be downloaded from the Zenodo data repository at https://doi.org/10.13127/inge.1 (Oliveti et al., 2020).


Compilation of the data set
The ESM flatfile (Lanzano et al., 2018) is a parametric table that contains strong motion data and associated reliable metadata of manually processed waveforms from the ESM database. This flatfile was built within the Thematic Core Service for Seismology of the project EPOS (European Plate Observing System, .epos-eu.org). The data set used to compile the flatfile includes 2,179 earthquakes recorded by 2,080 stations in Europe and the Middle East during 1969-2016, originated in different tectonic environments, with magnitudes ranging from 4.0 to 8.0. The magnitude is computed from moment tensor solutions provided by different sources [e.g., the Regional Centroid Moment Tensor, RCMT, and the Time Domain Moment Tensor, TDMT, catalogues (Pondrelli et al., 2006; Scognamiglio et al., 2009)], selected according to a hierarchical scheme that first favours earthquake-specific studies, then the Moment Tensor solution (Ekström et al., 2012) and, finally, regional or international bulletins, such as that of the International Seismological Centre (ISC). Strong motion intensity measures consist of peak and integral parameters and the duration of each waveform. The periods at which the spectral amplitudes of the (5% damping) acceleration and displacement response are computed range from 0.01 to 10 s, whereas the amplitudes of the Fourier spectrum are provided for the frequency range 0.04-50 Hz. The site classification is based on the average shear wave velocity in the uppermost 30 meters (V_S30), according to the Eurocode 8 (Code, 2005) categorization scheme. V_S30 values are obtained from in situ geophysical measurements, where available, or derived from geology maps. In addition, an estimate of V_S30 is provided using the empirical correlation with the topographic slope by Wald and Allen (2007). Furthermore, the flatfile includes the epicentral distance for all the records and, when the fault geometry is available, the Joyner-Boore distance.
More details on the structure and organisation of the flatfile are discussed by Lanzano et al. (2019). To compile our database, we have extracted from the ESM flatfile the event and station information, magnitude estimates, distance measurements, style of faulting, and the larger of the two horizontal components of peak ground acceleration (PGA), peak ground velocity (PGV), and peak response spectral acceleration amplitudes (at 0.3, 1.0, and 3.0 s). The choice of the periods is based on those used in ShakeMap (Wald et al., 1999; Worden et al., 2020).
The latest version of the Italian Macroseismic Database, DBMI15 (Locati et al., 2019), includes 122,701 Macroseismic Data Points (MDPs) related to 3,212 earthquakes. DBMI uses a specific and continually updated gazetteer covering the whole Italian territory. In the gazetteer each record is associated with a locality, with a place name, an identifier, and other useful information. As explained in Locati et al. (2019), the term "locality" refers equally to region, province, or county capitals and to variously sized hamlets, towns, or cities. The gazetteer ensures the correspondence between the place name of a locality and a pair of geographical coordinates, so that the average macroseismic intensity value of a more or less large area is assigned to a point. Sometimes the available data are not detailed enough to evaluate the intensity with a high degree of validity, and this uncertainty is represented with a range (e.g. 6.0-7.0, 7.0-8.0); in this case, we adopted the average value of the range. In other cases, however, the uncertainty is considered so high that DBMI15 adopts one of the available non-conventional descriptive values (e.g. "HD", "D", or "F"). In particular, "F" corresponds to class 4.0, "HF" to class 5.0, "SD" to class 5.5, "D" to class 6.5, and "HD" to class 8.5, as described in Locati et al. (2019).
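The conversion of DBMI15 intensity entries to single numeric values described above can be sketched as follows. This is only an illustrative sketch: the function name `intensity_to_number` and the assumption that intensities arrive as strings such as "6.0-7.0" or "HD" are ours and do not reflect the actual DBMI15 file layout; the numeric classes are those quoted in the text.

```python
# Descriptive DBMI15 codes and the numeric classes quoted in the text.
DESCRIPTIVE_CLASSES = {"F": 4.0, "HF": 5.0, "SD": 5.5, "D": 6.5, "HD": 8.5}


def intensity_to_number(value: str) -> float:
    """Convert an intensity entry to a single numeric value.

    Assumed (hypothetical) input formats:
      - a descriptive code, e.g. "HD"
      - an uncertain range, e.g. "6.0-7.0" (the average is returned)
      - a plain number, e.g. "7" or "7.5"
    """
    text = value.strip().upper()
    if text in DESCRIPTIVE_CLASSES:
        return DESCRIPTIVE_CLASSES[text]
    if "-" in text:
        low, high = (float(part) for part in text.split("-", 1))
        return (low + high) / 2.0
    return float(text)
```

For instance, under these assumptions the entry "6.0-7.0" is mapped to 6.5 and the entry "HD" to 8.5.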
MDPs collected and organized in DBMI15 come from studies of different authors and institutions, such as the Macroseismic Bulletin, online databases (e.g., CFTI4Med; Guidoboni et al., 2007) and reports specifically prepared for updating the data set content. To generate our dataset, we have extracted all the MDPs from DBMI15 that correspond to earthquakes listed in the ESM flatfile and that are not listed in the Macroseismic Bulletin, since this latter data source has proved to be largely unreliable. Then, we have cross-matched the ESM and DBMI15 data sets. In order to pair intensity and peak ground motion (PGM) values, we have chosen a distance criterion, common to most of the studies in this field (e.g., Caprio et al., 2015; Locati et al., 2017), whereas, recently, Gomez-Capera et al. (2020), to warrant similarity in terms of site response, correlated these observations keeping a threshold distance of about 3 km and also checking that geological and topographic conditions match. In our work we have selected the localities reporting intensity data that are located within 3 km of the strong motion stations that recorded the data. In some cases we have found that the same recording station could be paired to different intensity points and, according to our selection criteria, we have chosen the closest one. Notably, and in addition to the previous data sets compiled by Faenza and Michelini (2010, 2011), this newly assembled data set includes intensity-PGM pairs at intensity levels larger than 8.0, thanks to the inclusion of recent earthquake data.
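The pairing rule described above (keep only localities within 3 km of a station and, among those, the closest one) can be illustrated with a minimal sketch. The great-circle (haversine) distance on a spherical Earth, the dictionary keys `lat` and `lon`, and the function names are assumptions made for illustration; the text does not state how the distances were actually computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_locality(station, localities, max_km=3.0):
    """Return (locality, distance_km) for the nearest locality within max_km.

    Returns None when no locality satisfies the distance criterion.
    `station` and each locality are dicts with hypothetical keys "lat", "lon".
    """
    best = None
    for loc in localities:
        d = haversine_km(station["lat"], station["lon"], loc["lat"], loc["lon"])
        if d <= max_km and (best is None or d < best[1]):
            best = (loc, d)
    return best
```

Applying `closest_locality` to every station that recorded a given earthquake, using only the MDPs of that same earthquake, would yield at most one intensity-PGM pair per recording.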

Data and metadata
The information included to characterise each data point in the dataset can be summarised as follows:
- Earthquake source parameters: primary ESM and INGV event ids, date and time of occurrence (origin time), hypocentral coordinates (geographical coordinates and depth), style of faulting (SoF), magnitude (moment, Mw, or local, ML). The moment magnitude is available for 88% of the data. Local magnitude ML is used when Mw is not provided. In the following, we will refer to a generic "Magnitude" or "M" matching either Mw or ML according to the above described procedure;
- Station information: network and station code, and location of the receiver; EC8 class, measured and calculated V_S30 from the ESM flatfile, and V_S30 extracted from the V_S30 grid adopted by ShakeMap;
- Distance measurements: epicentral distance, R_EPI, azimuth, the finite-source distance measure related to fault geometry, R_JB, and the distance between the selected MCS points and the strong motion stations. The Joyner-Boore distance is available for 60% of the data. Epicentral distance is used when R_JB is not provided. In the following, we will refer to a generic "Distance" matching either R_JB or R_EPI according to the above described procedure;
- Macroseismic data: MCS values located within 3 km of the stations (referred to a locality, with place identifier and name and a pair of geographical coordinates) and V_S30 extracted from the V_S30 grid adopted by ShakeMap.
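The fallback rules behind the generic "Magnitude" and "Distance" fields listed above can be written compactly. The sketch below is illustrative only: the function names and the use of `None` to mark a missing value are assumptions, not the conventions of the published data files.

```python
def generic_magnitude(mw, ml):
    """Return (value, type): Mw when available, otherwise ML."""
    if mw is not None:
        return mw, "Mw"
    return ml, "ML"


def generic_distance(r_jb, r_epi):
    """Return (value_km, type): R_JB when available, otherwise R_EPI."""
    if r_jb is not None:
        return r_jb, "R_JB"
    return r_epi, "R_EPI"
```

Keeping the type label next to the value lets a user of the data set later separate, for example, the records that rely on the epicentral distance from those with a Joyner-Boore distance.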
Moving to the description of the data taken from DBMI15, first of all we note that our data set, which contains 323 macro-
Our dataset includes 65 earthquakes (Fig. 3) that occurred between June 1972 and October 2016. The earthquakes in the data set were recorded by 227 stations (Fig. 4) located at distances within 300 km from the earthquakes. The epicentral locations shown in Fig. 4 indicate that the geographical distribution of the events reflects the pattern of Italian seismicity.

The distribution of MDPs as a function of site-to-MDP distance (km) and V_S30 is illustrated in Fig. 5, where the site-to-MDP distance is defined as the closest distance between the MDPs and the strong motion stations that recorded the data. As shown, the site-to-MDP distances range from 0.013 km to 2.938 km. The V_S30 values are extracted from the V_S30 grid adopted by ShakeMap. This figure shows that most intensity-PGM pairs are separated by no more than 1 km and are rather homogeneously distributed in terms of V_S30.

The relative position between earthquakes and recording stations is expressed through the event-to-station azimuth (degrees) and the distance (km). The distribution of the events in distance and azimuth is reported in Fig. 6. The data set does not contain earthquake signals arriving at the receivers from all azimuths; there are gaps in the azimuthal coverage along the north-east south-west axis at larger distances. This is mainly due to the geographical setting of the Italian peninsula. The distribution of the source-to-site distance in km is given in Fig. 7. We observe that most of the recordings were acquired within 80 km of the epicenter. In Fig. 7, R_JB is used if available, otherwise R_EPI.
(Fig. 9a). A good part of the data are also in the magnitude range 6.0-7.0, due to the contribution of the previously mentioned events. The following features of the events are also considered: focal depth and focal mechanism. The distribution of earthquake focal depths (Fig. ??b) indicates that seismicity is concentrated in the upper 30 km of the crust, corresponding to about 94% of the total events. Looking at the comparison in terms of focal mechanisms in Fig. 9c, Normal Faulting (NF) earthquakes are prevalent (40%). Conversely, 20% and 17%
The magnitude-distance distribution of our data set is given in Fig. 10, grouped by style of faulting. The Joyner-Boore distance (R_JB) is relevant only for events with M > 5.5 and is available for 194 records. Data are quite well sampled for distances between 10 and 100 km. Looking at the distribution of focal mechanisms in Fig. 10, the normal faulting style is predominant for strong events with magnitudes between 6.0 and 6.9.
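As an illustration of the event-to-station azimuth used in this section, the sketch below computes the initial bearing from the epicenter to the station on a spherical Earth, measured clockwise from north. The spherical approximation and the function name `event_to_station_azimuth` are assumptions made here; the text does not specify how the azimuth values in the data set were derived.

```python
import math


def event_to_station_azimuth(ev_lat, ev_lon, st_lat, st_lon):
    """Initial bearing (degrees, clockwise from north, in [0, 360))
    from the event epicenter to the station; inputs in degrees."""
    phi1, phi2 = math.radians(ev_lat), math.radians(st_lat)
    dlmb = math.radians(st_lon - ev_lon)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0
```

A station due east of the epicenter gives an azimuth close to 90 degrees and one due south close to 180 degrees, so gaps in azimuthal coverage appear as empty angular sectors when these values are plotted against distance.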

The histograms of the data points with reference to the EC8 site class (Code, 2005) derived from different V_S30 sources at all stations are shown in Fig. 11. Direct average shear wave velocity V_S30 values obtained from geophysical investigations are available for 89 stations, corresponding to 39% of the recording sites (dark gray bars in Fig. 11). In the other cases, the site category was estimated on the basis of the V_S30 calculated from topographic slope (light gray bars in Fig. 11) according to Wald and Allen (2007) or on the basis of the V_S30 extracted from the V_S30 grid adopted by ShakeMap (white-smoke bars in Fig. 11). Apart from the unreliable site classification determined from the incomplete set of measured V_S30 values, the majority of the recording stations are classified as EC8-B in both remaining cases (about 58% and 76%, respectively). The second most numerous class is EC8-A (31%) for calculated V_S30 and EC8-C (14%) for extracted V_S30. No stations are classified as EC8-D or EC8-E (Fig. 11). Fig. 12 shows the distribution of the strong motion and MCS intensity data versus distance, grouped by style of faulting.
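The assignment of an EC8 site class from a V_S30 value, as used for the histograms of Fig. 11, can be sketched as follows. The class boundaries (800, 360 and 180 m/s) are the commonly quoted Eurocode 8 ground-type limits and are not stated in the text, so they should be read as an assumption here; the function name is hypothetical. Class E depends on the soil profile rather than on V_S30 alone and is therefore not returned by this sketch.

```python
def ec8_class_from_vs30(vs30_m_s: float) -> str:
    """Map V_S30 (m/s) to an EC8 ground type A-D.

    Assumed boundaries: A > 800, B 360-800, C 180-360, D < 180 m/s.
    Class E cannot be assigned from V_S30 alone.
    """
    if vs30_m_s <= 0:
        raise ValueError("V_S30 must be positive")
    if vs30_m_s > 800.0:
        return "A"
    if vs30_m_s >= 360.0:
        return "B"
    if vs30_m_s >= 180.0:
        return "C"
    return "D"
```

Applying the same function to the measured, slope-derived and grid-extracted V_S30 values of each station would give three classifications that can be compared directly.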

Overall, the database is quite well distributed, although we note that only two data points are related to stations at distances > 200 km and that there are few intensity data at short distances for small intensity values (i.e. in the range 3 ≤ MCS ≤ 3.5).
This follows from the DBMI15 data being compiled for damaging events (i.e. medium-large magnitude earthquakes producing macroseismic damage; e.g. Allen and Wald, 2009). The removal of several earthquakes whose data source has proved to be largely unreliable also reduces the number of intensity data at very short distances. However, Fig. 12 also illustrates the considerable number of MDPs with moderate intensities, in particular between 4 and 5. The increase, in comparison to the previous DBMI releases, is due to the inclusion of many moderate energy earthquakes, in particular after the 19th century (Locati et al., 2019). Looking at the distribution of focal mechanisms in Fig. 12, the normal faulting style is predominant for high peak ground motion values.

A new Italian dataset has been compiled comprising 65 events of magnitude 4.2-6.9 that occurred from the year 1972 through 2016 for a total of 323 pairs of macroseismic and ground motion parameters. The data set can be used as reference to benchmark studies seeking correlations between ground motion parameters and MCS macroseismic intensities.
Much effort has been invested in its compilation to identify the earthquakes to be included, with the goal of providing an updated and more reliable version of the data set compiled initially by Faenza and Michelini (2010, 2011). The work required (i) the intersection of two different sources, the DBMI15 intensity database (Locati et al., 2019) and the ESM accelerometric data bank (Lanzano et al., 2018), (ii) the removal of several earthquakes because the data source was considered unreliable and (iii) the selection of only the closest localities reporting intensity data that are located within 3 km of the recording stations. In addition to the PGM and macroseismic intensity data pairs, each datum includes earthquake information (e.g. origin time, depth, magnitude, magnitude type, focal mechanism, etc.), recording station information (e.g. station code and location of the receiver, EC8 site class attribution, V_S30 values), and distance measurements.
The data collected can be used for the development and testing of Ground Motion Intensity Conversion Equations (GMICE) and Intensity Prediction Equations (IPE). Both are important for seismic hazard studies and for the calculation of ShakeMaps.
Overall, the publication of this data set is expected to promote the adoption of best practices and to accelerate research progress.