27 May 2021

Review status: a revised version of this preprint is currently under review for the journal ESSD.

INSTANCE – the Italian seismic dataset for machine learning

Alberto Michelini (1), Spina Cianetti (2), Sonja Gaviano (4,2), Carlo Giunchi (2), Dario Jozinovic (1,3), and Valentino Lauciani (1)
  • (1) Istituto Nazionale di Geofisica e Vulcanologia, via di Vigna Murata, 605, 00143 Rome, Italy
  • (2) Istituto Nazionale di Geofisica e Vulcanologia, via Cesare Battisti, 53, Pisa, Italy
  • (3) Università degli Studi Roma Tre, Largo San Leonardo Murialdo 1, Rome, Italy
  • (4) Università degli Studi di Firenze, Via La Pira 4, Firenze, Italy

Abstract. Italian earthquake waveform data are collected here in a dataset suited for machine learning (ML) applications. The dataset consists of nearly 1.2 million three-component (3C) waveform traces from about 50,000 earthquakes and more than 130,000 3C noise waveform traces, for a total of about 43,000 hours of data and an average of 21 3C traces per event. The earthquake list is based on the Italian seismic bulletin of the "Istituto Nazionale di Geofisica e Vulcanologia" between January 2005 and January 2020, and it includes events in the magnitude range between 0.0 and 6.5. The waveform data were recorded primarily by the Italian National Seismic Network (network code IV) and include both weak-motion (HH, EH channels) and strong-motion (HN channels) recordings. All waveform traces have a length of 120 s, are sampled at 100 Hz, and are provided both in counts and in ground motion units after deconvolution of the instrument transfer functions. The waveform dataset is accompanied by metadata consisting of more than 100 parameters that provide comprehensive information on the earthquake source, the recording stations, the trace features, and other derived quantities. This rich set of metadata allows users to tailor the data selection to their own purposes. Many of these metadata can be used as labels in ML analysis or for other studies. The dataset, assembled in HDF5 format, is available from Michelini et al. (2021).
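The trace dimensions and the total data volume follow from the figures quoted in the abstract. The short sketch below reproduces that arithmetic. The trace counts are the rounded values from the text, and the (components, samples) axis order is an assumption made for illustration, not a statement about how the HDF5 file is laid out.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
TRACE_LENGTH_S = 120      # trace duration in seconds, from the abstract
SAMPLING_RATE_HZ = 100    # sampling rate in Hz, from the abstract
N_COMPONENTS = 3          # three-component (3C) recordings

# Rounded counts from the abstract; the exact counts differ slightly.
N_EARTHQUAKE_TRACES = 1_200_000
N_NOISE_TRACES = 130_000

# Number of samples in each component of a single trace.
samples_per_component = TRACE_LENGTH_S * SAMPLING_RATE_HZ

# Shape of one 3C trace as an array; the axis order is an assumption.
trace_shape = (N_COMPONENTS, samples_per_component)

# Total recording time across all 3C traces, in hours.
total_traces = N_EARTHQUAKE_TRACES + N_NOISE_TRACES
total_hours = total_traces * TRACE_LENGTH_S / 3600

print("samples per component:", samples_per_component)
print("trace shape:", trace_shape)
print("total hours (rounded counts):", round(total_hours))
```

With these rounded counts the total comes out slightly above the roughly 43,000 hours stated in the abstract; the gap presumably reflects rounding of the trace counts.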

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on essd-2021-164', Martijn van den Ende, 03 Jul 2021
    • AC1: 'Reply on RC1', Alberto Michelini, 06 Jul 2021
  • RC2: 'Comment on essd-2021-164', John Clinton, 26 Jul 2021
    • AC2: 'Reply on RC2', Alberto Michelini, 16 Aug 2021

Data sets

INSTANCE The Italian Seismic Dataset For Machine Learning Michelini, A., Cianetti, S., Gaviano, S., Giunchi, C., Jozinovic, D., Lauciani, V.

Total article views: 653 (including HTML, PDF, and XML)
  • HTML: 442
  • PDF: 181
  • XML: 30
  • Total: 653
  • BibTeX: 8
  • EndNote: 6

Viewed (geographical distribution)

Total article views: 562 (including HTML, PDF, and XML) Thereof 562 with geography defined and 0 with unknown origin.
Latest update: 19 Sep 2021
Short summary
We present a dataset of seismic waveforms and associated metadata intended primarily for seismologically oriented machine learning (ML) studies. The dataset includes about 1.3 million three-component seismograms of fixed 120 s length, sampled at 100 Hz and recorded by more than 600 stations in Italy. The seismograms are divided into those from earthquakes (~1.2 million) and those from seismic noise (~130,000). The ~54,000 earthquakes occurred between 2005 and 2020 and range in magnitude from 0 to 6.5.
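As an illustration of how the earthquake/noise subdivision and the magnitude range might be used to select a subset, the sketch below filters a handful of made-up metadata records. All field names (`trace_id`, `kind`, `magnitude`, `channel`) and all values are hypothetical and chosen only for this example; they are not the column names or contents of the actual metadata.

```python
# Hypothetical metadata records. Field names and values are illustrative
# only and do not reproduce the dataset's actual metadata columns.
records = [
    {"trace_id": "a", "kind": "earthquake", "magnitude": 0.8, "channel": "HH"},
    {"trace_id": "b", "kind": "earthquake", "magnitude": 4.2, "channel": "HN"},
    {"trace_id": "c", "kind": "noise", "magnitude": None, "channel": "EH"},
    {"trace_id": "d", "kind": "earthquake", "magnitude": 6.5, "channel": "HH"},
]


def select_events(rows, min_mag, max_mag):
    """Return earthquake records with magnitude in [min_mag, max_mag]."""
    return [
        r
        for r in rows
        # The kind check runs first, so noise records (magnitude None)
        # are never compared against the magnitude bounds.
        if r["kind"] == "earthquake" and min_mag <= r["magnitude"] <= max_mag
    ]


# Earthquake traces in the upper part of the magnitude range.
strong = select_events(records, 4.0, 6.5)

# Noise traces, e.g. as negative examples for a detection model.
noise = [r for r in records if r["kind"] == "noise"]

print([r["trace_id"] for r in strong])
print([r["trace_id"] for r in noise])
```

The same pattern extends to any other metadata field, such as the channel type, once the real column names are known.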