Enriching the GEOFON seismic catalog with automatic energy magnitude estimations

. We present a seismic catalog (Bindi et al.,


Introduction
Several magnitude scales have been defined to characterize the size of an earthquake.We can, however, divide magnitude scales in two groups, with one including magnitudes based on the amplitudes and periods of different seismic phases measured on band-limited signals (e.g., the body and surface wave magnitudes; Gutenberg, 1945a, b) and the other including magnitude scales related to estimations of macroscopic physical parameters of the earthquake source.The latter comprise the moment (M w ; Kanamori, 1977;Hanks and Kanamori, 1979) and the energy (M e ; Boatwright and Choy, 1986) magnitudes, which are based on seismic moment (Aki, 1966) and radiated seismic energy (Haskell, 1964), respectively.These two magnitude scales are somewhat complementary because, although both represent an estimation of earthquake-related energy, they are determined by different parts of the source spectrum.The seismic moment extrapolated from the low-frequency end and represents the release of elastic energy stored in the Earth's crust or mantle being proportional to the integrated slip across the fault surface.The radiated seismic energy describes the fraction of the total energy released being radiated as seismic waves across all frequencies; i.e., it depends on the earthquake dynamics such D. Bindi et al.: GEOFON energy magnitude catalog as rupture velocity but also stress drop.M e estimates have been shown to play an important role when used in conjunction with M w to better characterize the tsunami and shaking potential of an earthquake (Newman and Okal, 1998;Di Giacomo et al., 2010).
M w is routinely computed from long period signals of broadband recordings, and it has become a robust and reliable source parameter for large and moderate earthquakes worldwide (Di Giacomo et al., 2021).On the other hand, the computation of M e is hindered by the necessity of integrating the velocity power spectra over a wide frequency range, while using signals in a limited bandwidth and taking into account propagation effects at high frequencies.
Aiming at validating and testing the procedures for operational purposes, we present a seismic catalog of M e computed, following the methodology proposed by Di Giacomo et al. (2008) and Di Giacomo et al. (2010) for the rapid assessment of energy magnitude (i.e., without requiring additional source information other than the hypocentral location).The approach is based on the analysis of spectra computed for teleseismic vertical component P waveforms.Teleseismic P waves are commonly used to compute M e for global earthquakes, as their energy loss during propagation can be more reliably modeled compared to S waves.We further present a detailed analysis of the residuals in a reduced high-quality catalog for events with M w ≥ 5.8 with respect to the M w available in the GEOFON (GEOFOrschungsNetz) catalog and the M e values computed by IRIS.

Single-station estimation
We implement the methodology proposed by Di Giacomo et al. (2008) andDi Giacomo et al. (2010) to compute M e .Teleseismic vertical component P waveforms (BHZ channels, where B stands for broadband, H for high-gain seismometer, and Z for vertical component) are analyzed in the distance range from 20°to 98°and for earthquakes shallower than 80 km.Standard teleseismic range usually starts at 30°, but we use 20°to allow closer stations to be used for rapid response purposes.Shorter distances, however, are difficult to include for global earthquakes, as regional effects are not well accounted for with a 1-D model.Propagation effects are accounted for by frequency-dependent amplitude decay functions that are computed numerically (Wang, 1999) for the ak135Q model (Kennett et al., 1995;Montagner and Kennett, 1996) in the frequency range 0.012-1 Hz.
An estimate of radiated seismic energy E s is obtained for each single station from the integral of the power spectra of the vertical component P waveform, which is corrected for propagation effects (Haskell, 1964): where α, β, and ρ are the P-wave velocity, S-wave velocity, and the density at the source, respectively.f is the frequency, and f 1 = 0.012 Hz and f 2 = 1 Hz are the lower and upper limits of the considered spectral bandwidth.u(f ) is the Pwave velocity spectrum.G(f ) is the median value of Green's function spectrum for displacement, computed from multiple combinations of focal mechanisms and varying strike, dip, and rake over a regular grid (Di Giacomo et al., 2008).We used analysis windows starting 10 s before the P arrival and with lengths of 90 s for M w ≤ 7.5, 120 s for 7.5 < M w ≤ 8.5, and 180 s for M w > 8.5.The energy magnitude M e estimate for a single-event station pair is in turn computed as M e = 2/3(log 10 E s − 4.4), with E s given in joules (Bormann et al., 2002).The procedure provides M e estimates at each recording station that can be averaged to minimize path-specific deviations not accounted for by the theoretical model (e.g., directivity and focal mechanism effects and regional variations in attenuation).

Open-source tool for computing M e
The above procedure is implemented in the package me-compute (Zaccarelli, 2023).The program uses stream2segment (Zaccarelli et al., 2019;Zaccarelli, 2018) to download events, station metadata, and waveforms from International Federation of Digital Seismograph Networks (FDSN)-compliant repositories in a structured query language (SQL) database.
In our application, the download is configured to fetch events from the GEOFON (Quinteros et al., 2021) event web service, selecting events with computed M w in the time span 2011-2023.Waveforms are download from EIDA (Strollo et al., 2021) and IRIS (https://service.iris.edu/,last access: March 2024) data centers.The processing routine is implemented in a Python module, which computes the station energy magnitude for each downloaded waveform segment, as summarized in Sect.2.1, and then calculates the event energy magnitude M e as the mean of all station magnitudes within the 5th-95th percentile range.
The final output consists of the following files: a tabular file in a hierarchical data format (HDF), where each row represents the metadata and measurements, specifically also the station energy magnitude estimate, for a single waveform; a tabular file in a comma-separated value (CSV) format aggregating the results of the previous file, where each row represents a seismic event, reporting the event data end metadata and including the M e estimate for the event; an HTML (HyperText Markup Language) file visualizing the selected content reported in the CSV file, where the information for each event can be visualized on an interactive map; and one file per processed event in the Quake Markup Language (QuakeML) format, where we also included the M e value.

Catalog compilation
We use me-compute to compute M e for M w ≥ 5 earthquakes since 2011 in the GEOFON catalog.Table 1 summarizes the steps followed to compile the disseminated M e catalog.The catalog reports the single-waveform energy magnitude M eij estimated at station j for earthquake i.The energy magnitude M e for each considered event i is then computed as the median of M eij over the set of recording stations, without considering station static corrections.The starting dataset D0 consists of more than 1 000 000 waveforms (channels BHZ) generated by 6963 earthquakes recorded by 7765 stations belonging to 246 different networks.Only recordings with an average signal-to-noise ratio (SNR) for the amplitude greater than 3 within the frequency range of interest are included in D0.Several integrity and quality checks are applied to remove outliers and faulty signals.Dataset D1 is obtained by analyzing the median residual at the network level, discarding 14 networks characterized by median residuals outside the 2.5 and 97.5 percentile range (Fig. 1a).Dataset D2 is then generated by analyzing the station median values and excluding 382 stations with residuals outside the 2.5-97.5 percentile range (Fig. 1b).Most of the networks and stations removed will have instrumental problems or faulty metadata regarding instrument responses, although in some cases stations with very strong site effects might also be excluded.
An anomaly score is computed to further refine the dataset by flagging anomalous amplitudes using the software sdaas (Zaccarelli, 2022).The software, developed from the work of Zaccarelli et al. (2021), is based on a machine learning algorithm specifically designed for outlier detection (Isolation Forest) which computes an anomaly score in [0, 1], representing the degree of belief of a waveform to be an outlier.The score can be used to assign robustness weights or to define thresholds above which data can be discarded.After inspecting the distribution of the anomaly scores, we set the threshold to 0.62 for M w < 7.5 and to 0.80 for M w ≥ 7.5.
The spatial distribution of events and stations generating dataset D3 (Bindi et al., 2024) are shown in Fig. 2a, b.The corresponding M e residuals are shown in Fig. 3 against distance and M w .The largest positive residuals correspond mostly to earthquakes with M w < 6 recorded at distances > 60°, where the implemented methodology is expected to generate biased station M e estimates due to the limitations in the analyzed bandwidth and low signal-to-noise ratio (Di Giacomo et al., 2008, 2010).The overall residual distribution is unbiased and does not show trends of the mean value with distance and magnitude.
Therefore, we further limit the dataset by only considering events with M w ≥ 5.8 and at least 10 single-station measurements; we further exclude stations with fewer than 10 recordings in total.We added a column in the disseminated D3 dataset to flag lines corresponding to D6.It consists of ∼ 750 000 waveforms for 1671 earthquakes and 7135 stations.The event and station locations of D6 are shown in Fig. 2c and d.

Quality assessment via residual analysis
We perform residual analysis to validate the D6 catalog.The relationship between M e and M w is analyzed by performing the following mixed-effects regression (Bates et al., 2015;Stafford, 2014): where M eij is the single-waveform energy magnitude estimate at station j for earthquake i, M wi is the moment magnitude of earthquake i, intercept c 1 and slope c 2 parameters define the median model, and δS i and δE j are terms that capture station-specific and earthquake-specific adjustments, respectively.ij accounts for the leftover effects (i.e., residuals that are specific to a particular path/waveform).The random effects δS, δE, and are zero mean normal distributions by construction.In particular, δS j (inter-station residual) can represent site effects or instrumental gain corrections, with most of the latter probably removed by the outlier filtering stages described above.The interevent residual δE i is an event-specific deviation from the M e expected for a given M w from the linear regression term.Finally, ij can be thought of as a noise term for individual measurements, which can be either related to path-specific heterogeneity in attenuation with respect to the 1-D reference model or the influence of ambient noise on the actual measurement.
The interevent and inter-station term distributions are shown in Fig. 4, which are described by standard deviations of τ = 0.246 and φ S = 0.188 m.u., respectively; the standard deviation of the is φ 0 = 0.232 m.u.When combining the interevent variability τ with the intra-event variability equal to φ = φ 2 0 + φ 2 S , we obtain the total standard deviation σ = τ 2 + φ 2 = 0.407, which represents the variability in the single-station M eij residuals with respect to the average M e computed per event.It is worth noting that the δS j values can be used as station corrections to compute the energy magnitude of new events.In this case, the inter-station contribution https://doi.org/10.5194/essd-16-1733-2024 Earth Syst.Sci.Data, 16, 1733-1745, 2024   1), respectively.Panels (c) and (d) show event and station locations for dataset D6 (Table 1), respectively.
to the total variability is removed, and the expected variability in the M eij distribution is reduced to τ 2 + φ 2 0 = 0.338.Finally, the linear regression model is defined by the coefficients c 1 = (0.77 ± 0.09) m.u. and c 2 = (0.92 ± 0.01).Considering the simplicity of the linear model in Eq. ( 2) and the large dataset analyzed, the uncertainty in the median model (sometimes referred to as σ µ ; Atik and Youngs, 2014) is very low, increasing from 0.007 for M w = 6 to 0.039 for M w = 9.
We show the spatial distribution of δS in Fig. 5. Since M eij is computed considering spectral values below 1 Hz and using teleseismic recordings for distances above 20°, δS capture station-specific effects connected to large-scale geological and tectonic crustal features, as exemplified in Fig. 5b for stations located in Europe.Positive δS (i.e., M eij larger than  the median) are observed for stations located in basins like in the Po plain, in the Moesian region, in the Netherlands, and in the East Anatolian Fault region.Negative values δS (i.e., M eij lower than the median) are observed for stations located in mountain ranges such as the Pyrenees, the Alps, or in Harz Highlands and also tectonically highly active regions that are as cratonic as the East African Rift.The station terms can represent both site amplification, e.g., for stations in sedimentary basins, and anomalously high or low attenuation in the crust and or mantle surrounding the station.The station-specific residuals are disseminated along with the catalog to allow the computation of M e for future earthquakes, while taking into account static magnitude corrections to reduce variability.
The spatial distribution of the interevent variability, δE, is shown in Fig. 6 for the smallest and largest values.
Considering depths shallower than 30 km (Fig. 6a and b), continental Asia, the Philippines, and Indonesia, as well as the Aleutian Islands, show positive values.California, Mexico, central America, and the Mid-Atlantic Ridge are characterized mostly by negative values.Considering deeper events (Fig. 6c and d), Japan and the Philippines have mostly positive values, while Mexico and central America have mostly negative values.The event-specific residuals are also disseminated, along with the catalog to increase the usefulness of the product from the event point of view, to allow the user to perform further refinements.
Path-specific residuals are shown in Fig. 7 for three selected receiving areas in Europe, California, and Australia.Since in the partition of the residuals the leftover distribution represents the component not related to systematic station and event effects, they are mostly connected to lateral variability in attenuation in the Earth's interior with respect to the used global 1-D model and amplitude variation related to P-wave radiation patterns for different focal mechanisms.
Finally, the M eij versus M w scaling defined by the linear regression coefficients c 1 and c 2 of Eq. ( 2) is shown in Fig. 8.

Catalog validation: comparison with IRIS
The energy magnitude computed in this study is compared to the values disseminated by IRIS through the SPUD service IRIS DMC (2013).The methodology implemented by IRIS is described by Convers and Newman (2011) and based on the analysis of Boatwright and Choy (1986) and Newman and Okal (1998).Similar to our approach, the energy flux is computed from the P-wave group (P + pP + sP) in the frequency domain.The single-station estimations are corrected for frequency-dependent anelastic attenuation effects and converted back to the energy radiated by the source by applying corrections for geometrical spreading, depth, and mechanism-dependent effects for P waves and considering a theoretical partition of the energy between P and S waves.The energy is computed considering the frequency https://doi.org/10.5194/essd-16-1733-2024 Earth Syst.Sci.Data, 16, 1733-1745, 2024  For the magnitude range from 6 to 8, this results in biases of 0.06 m.u. for M e vs. M e (HF) and varying from 0.17 to −0.04 m.u. for M e vs. M e (BB); i.e., our estimates are nearly unbiased relative to M e (HF) and tend to slightly overestimate M e (BB) at the lower end of the applicability range.

Catalog validation: role of style of faulting
The faulting style is classified into normal, reverse, and strike slip categories, based on the plunge of the P, T, and N axes (Frohlich and Apperson, 1992), as extracted from the GEO-FON moment tensor solutions.There is a normal fault (NF) if the plunge (P ) ≥ 60°, a strike slip (SS) if the plunge (N ) ≥ 60°, and a thrust fault (TF) if the plunge T ≥ 50°.In the other cases, the earthquake is labeled with OF (other faulting styles).To investigate the role of the style of faulting (SOF), we separate the event term into a fixed offset for each SOF class and a perturbation term for each event.If we indicate the classes of the SOF grouping factor (corresponding to NF, SS, TF, and OF) with k = 1, 2, 3, 4 and the class of event i with k i , then the equation for the extended mixed-effects model is where δSOF are the terms characterizing the average effects of the different SOF and δE SOF are accounting for interevent differences within each SOF class (nested random effects).The standard deviations of the δS, δSOF, δE SOF , and distributions are φ S = 0.190, τ SOF = 0.095, τ = 0.236, and φ 0 = 0.232, respectively, generating a total standard deviation σ = 0.393.The SOF terms are δSOF 1 = 0.098 (NF), δSOF 2 = −0.108(SS), δSOF 3 = −0.045(TF), and δSOF 4 = 0.055 (OF) (Fig. 10).The largest difference is between SS and NF, a total of 0.206 m.u.There is a systematic impact of the SOF on the intercept of the model, but the associated variability is smaller compared to the interevent variability τ (in other words, SOF effects are statistically significant, but the distributions of interevent terms separated according to faulting style are strongly overlapping).
The SOF effects might arise due to physical differences (on average) between the different faulting types, e.g., due to systematically different stress drops, differences in the maturity of faults or typical environments (intra-plate vs. interplate), where different faulting types occur most often, or they might be artifacts due to the fact that the (Di Giacomo et al., 2008) method used here does not account for radiation pattern effects, and the teleseismic arrivals utilized here preferentially sample certain parts of the focal sphere.Therefore, we also investigate the role of the SOF in the relationship behttps://doi.org/10.5194/essd-16-1733-2024 Earth Syst.Sci.Data, 16, 1733-1745, 2024  tween M e derived in this study and the M e (HF) and M e (BB) values disseminated by IRIS.We recall that the methodology implemented by IRIS accounts for radiation pattern effects, which are related to the SOF.For this analysis, the regression model is the following: where M iris is either M e (HF) or M e (BB).Results shown in Fig. 11 confirm that the largest intercept difference is between normal and strike slip events, and the differences in terms of magnitude units (m.u.) are also similar between the other SOF.This suggests that a large part of the SOF term is influenced by radiation pattern effects, and interpretations of these differences in terms of geodynamics or hazard potential should be done very cautiously.March 2024), which occurred on 7 December 2021 10:28:00.3UTC (M e 5.7 and M w 5.5).The scmert add-on is available at https://github.com/SeisComP/scmert(last access: March 2024).The add-on has been configured in GEOFON to trigger the calculation for each origin created by the automatic processing with magnitude ≥ 5.5 and to compute station magnitudes M eij for all stations according to the definition of M e in the distance 20-98°.The scmert procedure is applied with the settings used by the GEOFON Earthquake Information Service, using stations available in real time from the GEOFON Extended Virtual Network (https://geofon.gfz-potsdam.de/eqinfo/gevn/, last access: March 2024), including station selection and the distribution trimming of 25 %.The workflow for M e computations is as follows: as soon as an automatically detected event reaches the magnitude threshold, scmert is triggered and starts to compute M eij upon receiving data from stations beyond 20°.The process continues until the selected window length (determined by the actual preliminary magnitude) of the last station at 98°is acquired.The first esti- mate of the magnitude M e is released shortly after collecting 20 M eij estimates from individual station, usually within a few minutes of the earthquake's origin time.SeisComP modules continue to refine the estimate until no further updates are required (this includes manual release at later stages).The computed station magnitudes M eij are fully integrated also into the SeisComP Origin Locator View Graphical User Interface (scolv GUI; Fig. 12), with station magnitudes and residuals displayed in a dedicated energy magnitude tab.
The energy magnitude values from both modules are compared in Fig. 13.We used scmert with the same settings as the GEOFON earthquake monitoring service, including station selection and trimming of the distributions.The values are in good agreement, and the best-fit model is M e = 0.057 + 0.987M e (GEO), with a standard deviation of 0.118.The average difference computed for magnitudes between 6 and 8 is −0.028.
All values for M e that have been calculated since the start of the routine processing with scmert can be accessed via the fdsnws-event web service running at GEOFON by specifying M e as the magnitude type (i.e., https://geofon.gfz-potsdam.de/fdsnws/event/1/query?starttime=2021-12-07&magnitudetype=Me& includeallmagnitudes=true&nodata=404, last access: March 2024).These values are also disseminated to other agencies (e.g., ISC and EMSC) via the usual downstream channels, including a real-time push service.

Code and data availability
Code used for computing the energy magnitude is available from offline computations in me-compute https://doi.org/10.

Conclusive remarks
We computed the energy magnitude M e for 6349 events in the moment magnitude catalog disseminated by GEOFON.
When combined with M w , M e allows a better characterization of the tsunami and shaking potential of an earthquake.The procedure used to compile the dataset, which includes 1 031 396 M e values for each recording station, is described in detail.Residuals are evaluated using a mixed-effects regression, which partitions the overall residuals into eventspecific and station-specific contributions.These random effects are included in the distributed catalog, enabling the computation of M e for future events and using inter-station residuals as station corrections to reduce the uncertainty in M e .They also enable the assessment of energy magnitude adjustments for specific regions or faulting mechanisms, using interevent residuals and locating propagation anomalies with respect to the global model used to compute Green's functions using the leftover residuals.The methodology employed for computing M e (Di Giacomo et al., 2008) is suitable for the rapid assessment of M e (Di Giacomo et al., 2010).Therefore, it has been implemented as a module for SeisComP, allowing the automatic computation of M e in real time and keeping the M e catalog up to date.Competing interests.The contact author has declared that none of the authors has any competing interests.
Disclaimer.Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper.While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
Acknowledgements.We thank all network operators providing data via EIDA-ORFEUS and IRIS, as well as all real-time data providers contributing to the GEOFON virtual network.We thank Kirsten Elger for her help in preparing the dissemination of the products of this study.Finally, we appreciated the valuable feedback and suggestions provided by two anonymous reviewers and by the editor Andrea Rovida.

Figure 1 .
Figure 1.Median network residuals (circles) for dataset D0 (a) and median station residuals for dataset D1 (b).Red lines correspond to the 2.5 and 97.5 percentiles of the distributions.For each network (a) and station (b), the horizontal bars correspond to the interval median (circle) ±1 median absolute deviation (MAD).A few values falling outside the range considered for the horizontal axis are not shown.

Figure 3 .
Figure 3. Energy magnitude residuals versus distance (a) and moment magnitude (b) for dataset D3.Blue dots indicate residuals also included in D6.The horizontal red lines bound the 90 % confidence interval [−0.43, 0.50] of the residual distribution.The error bars indicate the mean ±1 standard deviation of the residuals computed over different distance (20°wide) and magnitude (1 m.u.wide) intervals.

Figure 4 .
Figure 4. Cumulative distribution functions for event δE (a), station δS (b), and leftover distributions (circles) determined according to the mixed-effects regression in Eq. (2) applied to dataset D6.Dotted lines correspond to standard deviations ±1τ (a), ±1φ S (b), and ±1φ 0 (c).Horizontal red lines in panels (a) and (b) are the standard errors in the random effects.In panel (c), values of exceeding ±1.2 in absolute value are not shown.

Figure 6 .
Figure 6.Extreme values for event specific residuals δE for the M eij versus M w mixed-effects model of Eq. (2).Only values below the 10th percentile (panels b and d) and above the 90th percentile (panels a and c) of the distribution are shown (the percentiles are about ±0.3).In panels (a) and (b), earthquakes with hypocentral depths shallower than 30 km are selected.In panels (c) and (d), events deeper than 30 km are considered.The distribution of δE versus depth for all events is shown in panel (e).

Figure 7 .
Figure 7. (a) Residual distribution of Eq. (2) for three different receiving areas, showing only | | > 0.30.(b) As in panel (a) but considering only the European receiving area.(c) As in panel (a) but considering only the receiving area in California.(d) As in panel (a) but considering only the receiving area in Australia.Circles indicate the earthquake locations.

Figure 8 .
Figure 8. M eij versus M w scaling.Gray circles are the station M eij estimates, and filled circles represent event M e values calculated as medians of all station estimates for that event.Color indicates how many stations contributed to each estimate.The best-fit line in green is derived from the mixed-effects regression (Eq.2), considering ±1 interevent standard deviation τ (red lines).The faint black line shows equality for reference.

Figure 9 .
Figure 9.Comparison with energy magnitude disseminated by IRIS considering (a) M e (HF) and (b) M e (BB) (717 common events).The red line shows the linear regression fit, and the dotted lines show 1 standard deviation of the M e residuals.The blue line shows line of equality for reference.

Figure 10 .
Figure 10.M e versus M w categorized with SOF.

Figure 11 .
Figure 11.M e versus M e (BB) and M e (HF), as categorized with SOF.

Figure 12 .
Figure 12.Screenshot of the SeisComP Origin Locator View (scolv) interactive tool for the M w 7.7 Republic of Türkiye earthquake that occurred on 6 February 2023, 01:17 UTC along the East Anatolian Fault.The obtained network magnitude value of M e is 7.8.Stations used are color-coded according to M e magnitude residuals (top-left frame).Stations in gray are excluded from the network magnitude, do not match the distance range definition, or were trimmed while computing the average magnitude because they are within the ±12.5 % range.The top-right scatterplot shows M e residuals by distance (in red those that contributed to actual M e network magnitude).The topography shown in the map is generated using the ETOPO1 global relief model (Amante and Eakins, 2009).

Figure 13 .
Figure 13.Comparison between M e computed in real time by GE-OFON, with the scmert add-on for SeisComP (x axis) and offline estimation using me-compute (y axis), considering 153 common events.
Author contributions.DB, AS, and DDG conceptualized the study.RZ developed the Python code used to compile the disseminated catalog.AH developed the add-on for SeisComP.DB developed the quality checks.PE, AH, and AS organized the publication of M e by GEOFON via web services.All authors participated to the finalization of the article.

Financial support.
he article processing charges for this openaccess publication were covered by the Helmholtz Centre Potsdam -GFZ German Research Centre for Geosciences through the finacial support of the Horizon Europe project Geo-INQUIRE, funded by the European Commission (HORIZON-INFRA-2021-SERV-01, project no.101058518).The article processing charges for this open-access publication were covered by the Helmholtz Centre Potsdam -GFZ German Research Centre for Geosciences.https://doi.org/10.5194/essd-16-1733-2024Earth Syst.Sci.Data, 16, 1733-1745, 2024