A multi-sensor satellite-based archive of the largest SO2 volcanic eruptions since 2006

We present a multi-sensor archive collecting spatial and temporal information about volcanic SO2 clouds generated by the eleven largest eruptions of this century. The detection and monitoring of volcanic clouds is an 15 important topic for aviation management, climate issues and weather forecast. Several papers have been published focusing on single events, but not any archive is available at the moment to be used as background for future studies. We archived and collocated the SO2 vertical column density estimations from three different instruments (AIRS, IASI and GOME-2), the atmospheric parameters vertical profiles from the Global Navigation Satellite Systems (GNSS) Radio Occultations (RO) and the vertical backscatter from the Cloud-Aerosol LIdar with Orthogonal Polarization 20 (CALIOP). We additionally provide information about the cloud top height from three different algorithms and the atmospheric anomaly due to the presence of the cloud. The dataset consists of 223 days monitored with SO2 clouds, collocated with 56675 backscatter profiles and 70126 radio occultation profiles. The modular structure of the archive allows an easy collocation of the different datasets according to the users’ needs and the cross-comparison of the datasets shows the high consistency of the parameters estimated with different sensors and algorithms. The data 25 described here will be published with a DOI after final acceptance of this manuscript (Tournigand et al., 2020, http://doi.org/10.5880/fidgeo.2020.016). During the discussion period, the data are accessible via this temporary link: http://pmd.gfzpotsdam.de/panmetaworks/review/0f85d699707efcdc567765bd0dafaaadf94b6df5a531f310167f7e974ea803bf.


Introduction
The above-mentioned GVP provides the most extensive catalogue of historical eruptions with information related to both volcanoes and their eruptions. This catalogue is a firsthand source of information when starting an investigation of a given volcano or a given style of eruption. Similarly, the Large Magnitude Explosive Volcanic Eruptions (LaMEVE) dataset includes data such as the magnitude of the event, the bulk volume, the tephra fallout volume, column height of Quaternary (from 2.58 Ma ago to present) eruptions with VEI ≥ 4 (Brown et al., 2014). Other datasets 70 are available including data on geochemical composition (e.g. Turner et al., 2009), or acoustic acquisitions (e.g. Fee et al., 2014), even though these types of database are generally limited to a specific volcano or a specific time window (e.g. de Moor et al., 2017).
Previous papers focusing on single or a few eruptions are based on personal data collections or project collaborations and this makes it difficult for data comparison and studies with new techniques or algorithms. The eruptions of Okmok 75 and Kasatochi in 2008 were the focus of a special issue (JGR Atmospheres, 2018) collecting 27 papers studying the cloud with all the available remote sensing platforms and algorithms. Sarychev Peak 2009 volcanic cloud was also well studied (e.g., Carn and Lopez, 2011;Kravitz et al., 2011;Rybin et al., 2011;Doeringer et al., 2012). The Eyjafjallajokull 2010 eruption affected the economy and social life of Europe and beyond, changing the rules of air traffic management and the volcanic cloud was subject of a number of studies (e.g., Flentje et al., 2010;Marenco et 80 al., 2011;Prata and Prata, 2012;Stohl et al., 2012). Grimsvotn 2011 was a quite interesting eruption from a scientific point of view because the SO2 cloud was separated from the ash cloud (Moxnes et al., 2014) and different researchers studied it (e.g., Flemming and Inness, 2013;Marzano et al., 2013;Cook et al., 2014;Prata et al., 2017). The Nabro 2011 eruption was the subject of an interesting discussion regarding the direct intrusion to the stratosphere of the volcanic cloud (e.g., Bourassa et al., 2012;Clarisse et al., 2014;Fromm et al., 2014;Biondi et al., 2017) and the 85 Puyehue Cordon Caulle, erupting in the same period, was of interest because the cloud moved around the globe (e.g., Bignami et al., 2014;Griessbach et al., 2014;Theys et al., 2014;Biondi et al., 2017). However, other volcanic clouds, such as the ones produced by Merapi 2010 (Picquout et al., 2013), Tolbachik 2012 , Kelut 2014 (Kristiansen et al., 2015;Vernier et al., 2016) and Calbuco 2015 (Marzano et al., 2018;Lopes et al., 2019) were not studied in depth.

90
Considering SO2 emissions, several datasets and inventories are available and updated in the course of the years. In recent times, Ge et al. (2016) (2019). The dataset provides "SO2 mass loadings for all significant global volcanic eruptions detected from space since October 1978" to present (Carn, 2019). The MSVOLSO2L4 includes ancillary information about the volcanoes, such as the name and location of the volcano, as well as information about the eruptions, for example, start and end date, and VEI. This information is retrieved from the GVP database. The dataset also reports the observed plume altitude (in kilometres) where known. Otherwise, an estimated plume altitude 100 above vent depending on eruption type, and the measured SO2 mass in kilotons (= 1000 metric tons) is provided Carn, 2019).
In this study, we present the first database predicated on satellite data, reporting: 1. SO2 retrievals from Atmospheric Infrared Sounder (AIRS), Infrared Atmospheric Sounding Interferometer (IASI), and the Global Ozone Monitoring Experiment 2 (GOME-2). The data from the three sensors provide 105 horizontal and temporal information on SO2 concentrations; 2. SO2 cloud altitude estimations from IASI, the Cloud-Aerosol LIdar with Orthogonal Polarization (CALIOP) backscatter and the Global Navigation Satellite System (GNSS) Radio Occultation (RO); 3. the cloud aerosol subtype information from CALIOP; 4. atmospheric properties such as temperature, pressure, humidity, from GNSS RO profiles.

110
The information is provided for eruptions classified by GVP as VEI 4 or larger, with an SO2 total mass loading larger than 0.05 Tg and that occurred from 2006 to 2016 since not any eruption in the periods 2016-2019 was yet classified as VEI 4 at the time of the archive preparation. The selected volcanoes and relative eruptions are (Table 1) To the best of our knowledge, there is no current unique database collecting SO2 maps and volcanic cloud altitude estimations from several instruments, cloud aerosols subtypes and atmospheric properties for explosive eruptions. Accurate knowledge on volcanic SO2 clouds concentration and altitude as well as, their spatial and temporal evolution, is of crucial importance in the investigation of an eruption climatic impact. Thus, we believe that the database 120 presented here will help current and future investigations as well as support the development of more accurate retrievals methodologies.

Instrument description
A summary of the instruments' characteristics together with parameters provided in this work and references to the algorithms are reported in Table 2. In this archive, each sensor measuring SO2 amounts measures the partial column 125 density, due to their own limitations (Brenot et al., 2014). We here use the terms vertical column density (VCD) to refer to this partial column density.

AIRS
The Atmospheric Infrared Sounder (AIRS) is a cross-track hyperspectral instrument onboard the polar-orbiting satellite Aqua launched in June 2002 with ascending equator crossing local time at 13:30. AIRS completely covers 130 the full globe two times per day with a swath of 1650 km and spatial resolution of 13.5 x 13.5 km 2 at nadir and 41 x 21 km 2 at high latitudes (Susskind et al., 2003). The SO2 pixels are identified using infrared channels centered at the 7.3 μm absorption peak relying on the correlation between the measured spectrum and a reference spectral shape. The amount of SO2 in each pixel is computed by a least-squares procedure based on an off-line radiative transfer model. This technique performs well for SO2 reaching high tropospheric altitudes or the stratosphere where the water vapor 135 content is negligible. Comparisons with other techniques Carn et al., 2017) show an agreement within 10-30% and a typical retrieval error for a single AIRS pixel of about 6 Dobson Unit (DU) (Prata and Bernardo, 2007). Preprint. Discussion started: 16 July 2020 c Author(s) 2020. CC BY 4.0 License.

IASI
The Infrared Atmospheric Sounding Interferometer (IASI) is a Fourier transform instrument onboard the near-polar 140 sun-synchronous orbiting satellites Metop-A and Metop-B, respectively, launched in October 2006 and September 2012 with ascending equator crossing local time at 9:30. IASI completely covers the full globe two times per day with a swath of 2200 km and a spatial resolution of 12 km at nadir (Clerbaux et al., 2009). The SO2 retrieval is based on a brightness temperature difference in the SO2 ν3 band (Clarisse et al., 2012) which is converted to SO2 concentration integrated along the vertical axis the Vertical Column Density (VCD) using look-up tables and operational profiles of 145 pressure, temperature and humidity. The retrieval of VCD assumes that all SO2 is located at particular atmospheric layers (5, 7, 10, 13, 16, 19, 25 or 30 km above sea level) providing different estimations at different altitudes. Then a second algorithm  is applied to compute the SO2 cloud altitude with an accuracy of about 2 km for plumes below 20 km.

150
The Global Ozone Monitoring Experiment 2 (GOME-2) is an ultraviolet-visible spectrometer, on board of the Metop-A and Metop-B satellites, measuring solar light backscattered by the atmosphere or reflected by the Earth in nadirviewing geometry with a swath of 1920 km and spatial resolution of 40x80 km at nadir (Munro et al., 2006). The SO2 VCD retrieval  is based on the strong SO2 absorption between 240 and 400 nm and uses a differential optical absorption spectroscopy technique (Platt and Stutz, 2008). All measurements in the wavelength ranging from 155 315 to 326 nm are fitted to laboratory absorption data of SO2 and converted to VCD with an air mass factor from radiative transfer models assuming hypothetical atmospheric layers representative of different scenarios of emissions. The SO2 VCD is provided for 3 atmospheric layers representative of different scenarios of emissions: low troposphere (~2.5 km above the surface), upper troposphere (∼6 km) and lower stratosphere (∼15 km).

160
The

GNSS RO
The Global Navigation Satellite Systems (GNSS) Radio Occultation (RO) is an active limb sounding technique which uses the signal transmitted by a GNSS satellite and received by a Low Earth Orbit satellite. The signal, travelling 170 through the atmospheric layers, is bent according to the different refractivity of each layer and thus provides information about the vertical structure of the troposphere and stratosphere (Kursinski et al., 1997). The vertical resolution of the RO ranges between 100 m in the upper troposphere to about 500 m in the lower stratosphere at low/mid-latitudes (Zeng et al., 2019), while the horizontal resolution can range from about 50 km in the troposphere to 200-300 km in the stratosphere (Kursinski et al., 1997). We use for this archive the RO bending angle, refractivity, 175 temperature, pressure and specific humidity profiles processed by the Wegener Center for Climate and Global Change (WEGC) with the Occultation Processing System (OPS) version 5.6 (EOPAC Team, 2019). We also provide the bending angle anomaly which is proven to be an efficient parameter to understand the impact of the VC on the atmospheric structure (Biondi et al., 2017;Cigala et al., 2019;Stocker et al., 2019). The way this anomaly is computed is detailed in section 4.1. The RO profiles are obtained using a combination of geometric optics and wave optics , and the EUMETSAT/METOP missions (Luntama et al., 2008). The accuracy of the GNSS RO is 0.2 K in terms of temperature and 0.1% in terms of refractivity and the data from the different mission are very consistent (Scherllin-Pirscher et al., 2011), so there is no need of inter-calibration or homogenization (Foelsche et al., 2011).

Data description
The archive (Tournigand et al., 2020) consists of two sets of files, the daily files and the eruption files, i.e., one file

190
per eruption including all collected information. Thus, for each eruption listed in Table 1 the user can choose to access one single day or to the whole eruptive period depending on the user's demand. The amount of days covered by the archive for each eruption depends on the SO2 detection availability from AIRS, IASI and GOME-2. Each file is in NetCDF-4 format and file names are self-explanatory with daily files following the format volcanoname_year_month_day and the eruption files following the format volcanoname. As an example, a user who 195 wishes to access all available data corresponding to the Sarychev volcano on 12 of June 2009 will have to look for the file sarychev_2009_06_12.nc. The organization of both file types is described hereafter for each instrument.

IASI
IASI data are organized in the same way for both file types and are composed of 5 variables, IASI_lat, IASI_lon, IASI_date, IASI_SO2 and IASI_height respectively containing the latitude (°N), longitude (°E), date and time (POSIX time), SO2 (DU) and cloud altitude (m). The date variable consists of a line vector with elements corresponding to 210 each granule. Similarly, the other variables are matrices with columns corresponding to the different granules and lines to the data points of the given granule having an SO2 content higher than 0 DU.

GOME-2
GOME-2 data's organization is identical in both file types. GOME-2 is composed of 6 variables, GOME_lat, GOME_lon, GOME_date, GOME_SO2_1, GOME_SO2_2, GOME_SO2_3 respectively corresponding to the latitude 215 (°N), longitude (°E), date and time (POSIX time), low troposphere SO2 vertical column density (DU), mid-troposphere SO2 vertical column density (DU) and low stratosphere SO2 vertical column density (DU). As for AIRS and IASI, the date variable corresponds to a line vector providing each granule's date. The rest of the variables correspond to matrices with granules also separated in columns and the data points of those granules distributed in lines. In the case of GOME-2, only data points having their three SO2 vertical columns contents higher than 0 DU were included. Pixel 220 with high Solar Zenith Angle (SZA) has also been filtered.

GNSS RO
GNSS RO data are organized in different ways in the two file types. In the daily files, the RO data are separated in different variables according to the instrument they are collocated with (AIRS, IASI or GOME-2). For each set of RO data collocated with a given instrument 10 variables are provided, latitude (°N), longitude (°E), date (POSIX time),

240
bending angle (rad), bending angle anomaly (%), temperature (K), pressure (Pa), refractivity (N-unit), specific humidity (kg.kg -1 ) and volcanic cloud top altitude (m). Each variable is a matrix with each column corresponding to a RO profile and lines to the latitude dimension. Only the variable containing volcanic cloud altitude is a line vector with each element corresponding to a different RO profiles.
In the files containing the whole eruptive period, the RO data are not separated according to the instrument they are 245 collocated with but compiled all together. Thus, the same 10 variables are provided as in daily files, each containing the totality of the RO profiles.

RO data
The RO profiles included in this archive are collocated spatially at ±0.2° and temporally at ±12h with data points from 250 the volcanic aerosol maps provided by AIRS, IASI, and GOME-2 acquisitions.

Climatology
The RO reference climatology for each area of interest is calculated based on 5° latitude bands using our dataset of RO profiles covering a period from 2001 to 2017. The averaging of all available RO profiles present within each latitude band provides the RO reference climatology.

Anomaly calculation
The bending angle (BA) anomaly integrated into this archive is calculated by subtracting the RO reference climatology profile from the individual RO BA profile and normalized with respect to the reference climatology profile (Eq 1), following the methodology described in Biondi et al. (2017). to identify related atmospheric features. The presence of volcanic clouds in the atmosphere generates a prominent peak in the BA anomaly profile.
With the bending angle anomaly, BA the individual bending angle profile and BAclim the BA climatology profile.

Peak detection 265
The peak detection of bending angle anomaly profiles was automatically done using a customized Matlab algorithm. This algorithm, further developed after Cigala et al. (2019), identifies all the peaks displaying a variation larger than 4.5% in the bending angle anomaly profile with respect to local minimums. Only the peaks having their maximum value between 10 and 22 km of altitude are kept. Peaks vertically spreading over more than 8 km have been removed. Finally, amongst the remaining peaks, the lowest altitude one is selected as a cloud top altitude.

Cloud top automatic detection
For each L1 532 nm version 4.10 CALIOP backscatter product collocated with RO profiles, an automatic cloud top detection was performed using a customized Matlab algorithm. The collocation thresholds were kept at ±0.2° and 275 ±12h between RO profiles and CALIOP backscatter products to provide cloud top altitudes consistent between the instruments. The first step of the cloud top detection procedure consists in cropping the CALIOP backscatter image according to the collocated RO profile position at +/-14° in latitude and +/-80° in longitude. The objective of this first step is to remove unnecessary information from the CALIOP image and to focus the processing on the zone of interest in order to save computational time. These latitude and longitude ranges are based on a series of tests performed on 280 backscatter images and correspond to the best compromise between image size reduction and loss of volcanic cloud information. Threshold backscatter values are then implemented to remove the noise outside of the range 3x10 -2 -7x10 -4 km -1 .sr -1 to which volcanic clouds correspond. One median filter (4x3 pixels) and two Wiener filters (4x3 and 2x2 pixels) are then successively applied to the resulting backscatter image to reduce the noise within the threshold range. Below 10 km altitude, the RO bending angle anomaly is quite noisy due to the presence of moisture. We thus 285 focus only on volcanic clouds with a top altitude above 10 km and removed image information below this altitude. The next step of CALIOP data processing is to identify remaining groups of pixels (or clusters) within the image. The Matlab connected components finder is set in this customized algorithm to keep only the clusters combining more than 300 pixels. Amongst these selected clusters the nine biggest ones are kept for the final processing stage. This final stage consists of distinguishing clusters corresponding to volcanic features from the ones corresponding to 290 meteorological clouds. To do so, the aspect ratio of a series of volcanic and meteorological clouds were measured and a threshold value of 0.09 was set as the higher limit for volcanic clouds. Finally, the remaining clusters' top altitudes are measured and an average value calculated and saved in the archive as an estimate of the volcanic cloud top altitude.

Cloud type detection
The cloud type detection was performed using the L2 Vertical Feature Mask (VFM) CALIOP product. These VFM 295 products were collocated at ±0.2° and ±1h with AIRS, IASI and GOME-2 for the purpose of such data is to confirm the presence of certain aerosols types simultaneously with SO2. Among the different cloud types available in VFM products the types 2, 6, 9 and 10 were of particular interest for this archive since they respectively correspond to dust, elevated smoke, volcanic ash and sulfate/other. For each CALIOP file of the archive, the aerosol subtype was extracted using a customized Matlab routine. This Matlab algorithm reads the VFM data, detects the matching latitude/longitude 300 points of all the CALIOP track collocated with AIRS, IASI and GOME-2 and subsets the latitude and longitude array data based on the chosen spatial window of 2°. The algorithm then extracts the feature sub-type of interest as a function https://doi.org/10.5194/essd-2020-109

305
This archive combines five different approaches of volcanic cloud detection and each type of measurement/instrument has its own uncertainties. For AIRS, IASI and GOME-2 the uncertainty depends on many parameters (e.g. thickness of the volcanic cloud, amount of aerosols), one of the most important is the unknown volcanic cloud altitude. Thus, the error is case dependent and a general value of measurement uncertainty cannot be provided. Furthermore, the measurement noise of instruments increases over time due to instrument degradation (Lang et al., 2009;Dikty et al., 310 2010). However, error budgets of AIRS and IASI can be respectively found in the studies by Prata and Bernardo (2007) and Clarisse et al. (2012), while an uncertainty analysis of GOME-2 is provided by Rix et al. 2012 in the case of the 2010 Eyjafjallajökull's volcanic eruption.

Data cross-comparison
Over the past decades, satellite data have proven efficient in volcanic clouds detection through a variety of techniques.

315
Those data are essential in the study of the spreading of such clouds on a global scale but are scattered between the different agencies in charge to process them. This archive gathers satellite data covering 11 VEI 4 eruptions from 2008 to 2016 for a total of 223 days of data coverage (Table 3).
This archive is organized in different sections ( Figure 1) with each instrument estimation separated from each other. Several parameters are measured using different instruments, such as SO2 VCD and cloud top altitude, to allow cross-320 correlation between the different retrieval algorithms. The database allows the quick visualization of AIRS, IASI, GOME-2, CALIOP and RO data at a given date and time as well as the collocation of any instrument data points with another one. In order to illustrate the use of this archive, we extracted two test cases ( Figure 2). The first case ( Figure  2a) is the 2008 eruption of Kasatochi volcano for which we selected the 9 th of August as reference. The second case (Figure 2b) is the 2009 Sarychev Peak eruption for which we selected the 12 th of June as reference. In both cases we 325 considered SO2 values larger than 3 DU from AIRS, IASI and GOME-2 for 24 hours, RO profiles collocated within ±0.2° and ±12h and CALIOP tracks collocated within ±12h.
In the case of Kasatochi, we selected 4178 AIRS, 1241 IASI and 56 GOME-2 data points with SO2 VCD larger than 3 DU, 379 CALIOP profiles from 7 different tracks (Figure 2a, blue circles) within ±12h and 100 RO profiles ( Figure  2a, red circles) collocated within ±0.2° and ±12h. In the case of Sarychev, 1070 AIRS, 209 IASI and 41 GOME-2 data 330 points, 261 CALIOP profiles from 3 different tracks and 54 RO profiles have been selected with the same criteria. Due to the modular archive structure reported in Figure 1, the user can easily select different time frames, different SO2 thresholds and collocation period range to be adapted to any purpose.
We have manually verified the correct functioning of the algorithm which collocates the different instruments. We randomly selected several days from different eruptions and compared date, time and coordinates of the acquisitions, 335 then compared the results with those ones automatically provided by the algorithm. We have additionally used a visual validation method for all the samples plotting the SO2 cloud superimposed to the CALIOP tracks and the RO tangent points. As for the cloud top height, we have collocated the RO and CALIOP estimations with the closest IASI pixel and compared the corresponding values. In Table 4

345
The raw CALIOP data can be found at https://urs.earthdata.nasa.gov/. The archive consists of daily files and "eruption" files. For each eruption, we provide access to single daily files or to one file for the whole eruptive period. The files of any eruption are compressed (.zip) NetCDF-4 format (including the daily and whole eruptive period) together with two pdfs (Supplement) describing the file structure. The file names are self-explanatory with daily files following the format volcanoname_year_month_day.nc and the eruption files 350 following the format volcanoname.nc. As an example, a user who wishes to access the data corresponding to Kasatochi volcano on 11 th of August 2008 will have to look for the file Kasatochi_2008_08_11.nc. In case the user wants all the available data for the Kasatochi eruption, they will have to look for the file Kasatochi.nc. The data structure of daily files and volcano files is reported in the Supplement. The data described here will be published with a DOI after final acceptance of this manuscript (Tournigand et al., 2020, http

Summary and conclusions
This paper presents the first comprehensive archive with information on large SO2 volcanic clouds since 2006. We collected three different datasets of volcanic SO2 detection from AIRS, IASI and GOME-2 instruments and co-located 360 the detected pixels with the CALIOP and the GNSS RO products to get information about the cloud vertical structure.
The archive provides information about the SO2 detection (with 3 different algorithms), the cloud top height (with 3 different algorithms), the cloud aerosol type (CALIOP vertical mask feature reference), the atmospheric parameters (bending angle, refractivity, temperature, pressure and specific humidity) and the atmospheric change due to the presence of the volcanic cloud (bending angle anomaly). Up to date, there are no public archives of volcanic clouds 365 which can be used as a reference for further studies and all the information is scattered in different locations and available under different conditions. The aim of this archive and this paper is to provide the users with a complete set of state-of-the-art data. The interest in volcanic clouds detection and monitoring is high and there are still some challenges like the accurate determination of the cloud top height and cloud density to be faced. This archive will make available to the scientific community a relevant number of cases to develop and test new algorithms contributing 370 to improving the accuracy on the estimation of fundamental volcanic clouds parameters. The modular structure of the archive can be easily extended in the future to smaller eruptions (VEI<4) and to other SO2 estimations, facilitating the inter/cross-comparison between algorithms, allowing to reconstruct the cloud structure and dynamical characteristics and supporting the development of cloud dispersion models.