Design and description of the MUSICA IASI full retrieval product

Abstract. IASI (Infrared Atmospheric Sounding Interferometer) is the core instrument of the currently three Metop (Meteorological operational) satellites of EUMETSAT (European Organization for the Exploitation of Meteorological Satellites). The MUSICA IASI processing has been developed in the framework of the European Research Council project MUSICA (MUlti-platform remote Sensing of Isotopologues for investigating the Cycle of Atmospheric water). The processor performs an optimal estimation of the vertical distributions of water vapour (H₂O), the ratio between two water vapour isotopologues (the HDO/H₂O ratio), nitrous oxide (N₂O), methane (CH₄), and nitric acid (HNO₃), and works with IASI radiances measured under cloud-free conditions in the spectral window between 1190 and 1400 cm⁻¹. The retrieval of the trace gas profiles is performed on a logarithmic scale, which allows the constraint and the analytic treatment of ln[HDO] − ln[H₂O] as a proxy for the HDO/H₂O ratio. Currently, the MUSICA IASI processing has been applied to all IASI measurements available between October 2014 and April 2020 [...]. The output files describe the representativeness of each retrieved profile in the form of averaging kernels, and we discuss data filtering options and give examples of the horizontal and continuous temporal coverage (years of data with global daily coverage). The apparent drawback of large data files and a large data volume is counterbalanced by multiple possibilities for data reuse, which are briefly discussed. In addition, an extended output data file is made available. It contains the same variables as the standard output data files together with Jacobians (and spectral responses) for many different uncertainty sources and gain matrices (because of these additional variables it is called the extended output). It is limited to 74 observations over a polar, a mid-latitudinal, and a tropical site.
We use these additional Jacobian and gain data for assessing the typical impact of different uncertainty sources (such as surface emissivity or spectroscopic parameters) and of different cloud types on the retrieval results. We offer two data packages with DOI for free download via the repository RADAR4KIT. The first data package has a data volume of about 17.5 GB and is linked to https://doi.org/10.35097/408. It contains example standard output data files for all MUSICA IASI retrievals made for a single day (more than 0.6 million). Furthermore, it includes a ReadMe.pdf file with a description of how to access the total data set (the 25 TB) or parts of it. This data package is for users interested in the typical global daily data coverage and in information about how to download the large data volumes of global daily data for longer periods. The second data package is linked to https://doi.org/10.35097/412 and contains the extended output data file. Because it provides data for only 74 example observations (over a polar, a mid-latitudinal, and a tropical site), its data volume is only 73 MB, and it is thus recommended to users who want a quick look at the data. Detailed information such as DOIs and data set references is given in the data availability section.

Figure 1. Outline of the MUSICA IASI processing chain. This paper focuses on the processing steps indicated by the red frame. The green symbols indicate different products. The supply of detailed information on retrieval settings and product characteristics offers many different possibilities for a posteriori processing and data reuse (as indicated at the bottom of the schematic and discussed in Sect. 8; S5P here means the main sensor on the Sentinel-5 Precursor satellite).

A summary and an outlook are provided in Sect. 9. For readers who are not experts in the field of remote sensing retrievals, Appendix A provides a short compilation of the theoretical basics and the most important equations to which we refer throughout this paper. Appendix B shows that for the MUSICA IASI retrieval product we can assume moderate non-linearity (according to Chapter 5 of ?), which is important for many data reuse options. Appendix C explains how the data can be used in the form of a total or partial column product.

The IASI instruments on Metop satellites
IASI is a Fourier-transform spectrometer and measures in the infrared part of the electromagnetic spectrum between 645 cm⁻¹ and 2760 cm⁻¹ (15.5 µm and 3.63 µm). After apodisation (L1c spectra) the spectral resolution is 0.5 cm⁻¹ (full width at half maximum, FWHM). The main purpose of IASI is the support of numerical weather prediction. However, due to its high signal-to-noise ratio and high spectral resolution, the IASI measurements offer very interesting possibilities for atmospheric trace gas observations (e.g., ?). The IASI instruments are carried by the Metop satellites, which are Europe's first polar-orbiting satellites dedicated to operational meteorology. The Metop programme has been planned as a series of three satellites to be launched sequentially over [...]. Until the beginning of 2020 the Metop-A, -B, and -C overflights generally took place within about 45 minutes. Table 1 gives an overview of the major specifications of the Metop/IASI mission.
The number of individual observations made by the three currently orbiting IASI instruments is tremendous. During a single orbit 91800 observations are made. In 24 h the three satellites complete in total about 42 orbits, which means more than 3.85 million individual IASI spectra per day and more than 1.4 billion per year.
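The daily and yearly totals quoted above follow directly from the per-orbit count. The short Python check below reproduces that arithmetic; the constant and function names are illustrative only, and a 365-day year is assumed.

```python
# Figures quoted in the text.
OBS_PER_ORBIT = 91_800   # observations per orbit
ORBITS_PER_DAY = 42      # all three satellites together


def spectra_per_day(obs_per_orbit=OBS_PER_ORBIT, orbits_per_day=ORBITS_PER_DAY):
    """Number of individual IASI spectra recorded in 24 h."""
    return obs_per_orbit * orbits_per_day


def spectra_per_year(days=365):
    """Number of individual IASI spectra recorded in a year (365 days assumed)."""
    return spectra_per_day() * days


per_day = spectra_per_day()
per_year = spectra_per_year()
print(f"{per_day:,} spectra per day")
print(f"{per_year:,} spectra per year")
```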
IASI-like observations are guaranteed for several decades. The first observations were made in 2006, and in the context of the Metop Second Generation (Metop-SG) satellite programme IASI Next Generation instruments will perform measurements until the 2040s. The IASI programme thus offers unique possibilities for studying the long-term evolution of the atmospheric composition.

MUSICA IASI data format
In this section we discuss the format of the MUSICA IASI full product data files and the nomenclature of the data variables.

Data files
The MUSICA IASI full product data are provided as netcdf files compliant with version 1.7 of the CF (Climate and Forecast) metadata convention (cfconvention.org). The data files contain all information needed for reproducing the retrievals and for optimally reusing the data. Because the MUSICA IASI retrieval builds upon the EUMETSAT L2 cloud filter and uses the EUMETSAT L2 atmospheric temperature as the a priori atmospheric temperature, the output files contain some EUMETSAT retrieval data as well as the MUSICA retrieval data. In addition, they contain the EUMETSAT L1C spectral radiances (and the simulated radiances) as well as auxiliary data needed for the retrieval (such as surface emissivity from other sources, ???). We provide standard output files comprising all processed IASI observations and one extended output file with detailed calculations of Jacobians (and spectral responses for surface emissivity, spectroscopic parameters, and cloud coverage) and gain matrices for a few selected observations.
The standard output is provided in the files IASI [...]. The typical size of a tar file with the orbit-wise netcdf files of a single day is 15 GB. This number is for the typical 28 orbits per day of two satellites (for three satellites there are typically 42 orbits per day). The standard output data files are linked to a DOI (?).

The extended file represents 74 observations over polar, mid-latitudinal and tropical GRUAN stations (GRUAN stands for Global Climate Observing System Reference Upper Air Network, www.gruan.org). More details on the time periods and locations represented by these retrievals are given in ?. The file provides the same output as the standard files and in addition detailed information on Jacobians (and spectral responses) and gain matrices. The Jacobian matrices collect the derivatives of the radiances as measured by the satellite sensors with respect to a parameter (e.g., atmospheric temperature, instrumental conditions).

The spectral response matrices inform about the change of the radiances due to changes in the surface emissivities, the spectroscopic parameters, and the cloud coverage. The gain matrices are the derivatives of the retrieved atmospheric state with respect to the radiances. The name of this extended output file is IASIAB_MUSICA_030201_L2_AllTargetProductsExtended_exam[...]; its size is 70 MB, and it is linked to an extra DOI (?).
Here we report on the MUSICA IASI processing version 3.2.1 (applied for IASI observations until the end of June 2019).

Variables
There are three different categories of variables. The first category consists of variables that contain information resulting from the EUMETSAT L2 PPF (product processing facility) retrieval. For variables that refer to a specific retrieval product, a corresponding syllable is embedded into the respective variable names: _wv_ and _wvp_ stand for water vapour isotopologues and water vapour isotopologue proxies, respectively, _ghg_ for the greenhouse gases N₂O and CH₄, _hno3_ for HNO₃, and _at_ for the atmospheric temperature.
The water vapour and greenhouse gas variables (_wv_, _wvp_, and _ghg_) contain information on two species, which can be identified by the value of the dimension musica_species_id. For _wv_ these are the species H₂O and HDO, for _wvp_ the water vapour proxy species (see Sect. 4.4.2), and for _ghg_ the species N₂O and CH₄, respectively.
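The naming scheme described above can be illustrated with a small Python sketch that looks up the product syllable in a variable name. The helper names are hypothetical, and the 1-based species order (H₂O before HDO, N₂O before CH₄) is assumed to match the species index convention of the output files.

```python
# Syllables named in the text and the products they stand for.
PRODUCT_SYLLABLES = {
    "wv": "water vapour isotopologues",
    "wvp": "water vapour isotopologue proxies",
    "ghg": "greenhouse gases",
    "hno3": "nitric acid",
    "at": "atmospheric temperature",
}

# Assumed species order for the two-species products (1-based index).
SPECIES = {"wv": ("H2O", "HDO"), "ghg": ("N2O", "CH4")}


def product_syllable(variable_name):
    """Return the product syllable embedded in a variable name, or None."""
    for token in variable_name.split("_"):
        if token in PRODUCT_SYLLABLES:
            return token
    return None


def species_name(variable_name, species_id):
    """Map a 1-based species index to a species name, or None if not applicable."""
    names = SPECIES.get(product_syllable(variable_name))
    if names is None:
        return None
    return names[species_id - 1]


print(product_syllable("musica_wv_apriori"))
print(species_name("musica_ghg", 2))
```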

MUSICA IASI retrieval set up
In this section the principal setup of the MUSICA IASI retrieval is presented. We discuss our filtering before processing, the retrieval algorithm used, the measurement state (spectral region), the atmospheric state that is retrieved in an optimal estimation sense, the a priori information used, and the applied constraints. A detailed explanation of these settings ensures the full reproducibility of the data and is also important in the context of data reuse (see examples given in Sect. 8).

Data selection prior to processing
We focus on the processing of IASI data for which EUMETSAT L2 data files of PPF version 6.0 or later are available. For earlier data versions not all of the subsequently discussed L2 PPF variables are available. Furthermore, we found that several modifications made within versions 4 and 5 significantly affect the stability of our MUSICA IASI retrieval output (see discussion in ?). EUMETSAT L2 PPF version 6 data are available from October 2014 onward, so we focus our processing on IASI observations made from October 2014 onward. In addition, the MUSICA IASI retrievals are currently restricted to cloud-free scenarios. The selection of cloud-free conditions is made by means of the EUMETSAT L2 PPF cloudiness assessment summary flag variable (called flag_cldnes in the EUMETSAT L2 netcdf data files). We only process IASI observations for which this flag has the value 1 (the IASI Instrumental Field Of View, IFOV, is clear) or 2 (the IASI IFOV is processed as cloud-free but small cloud contamination is possible). This requirement for cloud-free scenarios removes more than 2/3 of all available IASI observations. Furthermore, we require that the EUMETSAT L2 PPF temperature profiles are generated by the EUMETSAT L2 PPF optimal estimation retrieval scheme. For this purpose we use the EUMETSAT L2 PPF variable flag_itconv. We only process data for which this flag has the value 3 (the minimisation did not converge, sounding accepted) or 5 (the minimisation converged, sounding accepted). Figure 2 gives a climatological overview of the amount of IASI data that remain after the aforementioned preselection. The maps largely reflect the cloud cover conditions. A very large amount of IASI data passed our selection criteria in the subtropical regions, where cloud-free conditions generally prevail.
In the North Atlantic storm track region, the South American and South African tropics, and the Southern Polar Oceans the sky is generally cloudy in February, leading to a low number of IASI observations that passed our selection criteria. In August we can clearly identify the Asian and West African monsoon region as an area with increased cloud coverage and consequently less MUSICA IASI processed data.
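The preselection described in this section amounts to two flag tests per observation. The following Python sketch illustrates the logic under the assumption that the two EUMETSAT L2 flags have already been read into plain integers; the function name and the example records are hypothetical.

```python
# Accepted flag values as stated in the text.
CLEAR_SKY_FLAGS = {1, 2}        # flag_cldnes: clear, or processed as cloud-free
TEMPERATURE_OE_FLAGS = {3, 5}   # flag_itconv: sounding accepted


def passes_preselection(flag_cldnes, flag_itconv):
    """True if an observation fulfils both preselection criteria."""
    return flag_cldnes in CLEAR_SKY_FLAGS and flag_itconv in TEMPERATURE_OE_FLAGS


# Hypothetical observations, for illustration only.
observations = [
    {"id": "a", "flag_cldnes": 1, "flag_itconv": 5},
    {"id": "b", "flag_cldnes": 2, "flag_itconv": 3},
    {"id": "c", "flag_cldnes": 3, "flag_itconv": 5},  # cloudy
    {"id": "d", "flag_cldnes": 1, "flag_itconv": 4},  # temperature flag rejected
]

selected = [
    obs["id"]
    for obs in observations
    if passes_preselection(obs["flag_cldnes"], obs["flag_itconv"])
]
print(selected)
```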

The retrieval algorithm
We use the thermal nadir retrieval algorithm PROFFIT-nadir (??). It is an extension of the PROFFIT algorithm (PROFile Fit, ?) that has been used for many years by the ground-based infrared remote sensing community (??). This extension has been made in support [...]. For all trace gases we use a Voigt line shape model and the spectroscopic line parameters according to the HITRAN2016 molecular spectroscopic database (?). However, we increase the line intensity parameter for all HDO lines by 10% in order to correct for the bias observed between MUSICA IASI δD retrievals and respective aircraft-based in-situ profile data (?).
For the inversion calculations PROFFIT-nadir offers options that are essential for water vapour isotopologue retrievals. These are the options for logarithmic scale retrievals and for setting up a cross constraint between different atmospheric species (see also Sect. 4.4.2). The theoretical basics for atmospheric trace gas retrievals are provided in Appendix A.

The analysed spectral region
The retrieval works with the radiances measured in the spectral region between 1190 cm⁻¹ and 1400 cm⁻¹. The respective radiance values are the elements of the MUSICA IASI measurement state vector referred to as y in Appendix A. Figure 4 depicts measured and simulated radiances as well as a large variety of different spectral responses (Jacobians multiplied by parameter changes) for a typical mid-latitudinal summer observation over land. Please note the different radiance scales for measurement and simulation, on the one hand, and residuals and spectral responses, on the other hand.
We show trace gas spectral responses for a uniform increase of the trace gases throughout the whole atmosphere: 100% for H₂O and HDO, 10% for N₂O and CH₄, and 50% for HNO₃. The respective values are reasonable approximations to the typical atmospheric variabilities of these trace gases. We see that the measured radiances are most strongly affected by the water isotopologues. The variations of N₂O and CH₄ are also recognizable (larger than the spectral residuals, i.e. the difference between measured and simulated radiances). The spectral responses of HNO₃ are very close to the noise level. The atmospheric temperature spectral responses are depicted for a uniform 2 K temperature increase over three different layers: surface to 2 km a.s.l., 2 to 6 km a.s.l., and 6 to 12 km a.s.l. (a.s.l. means above sea level). In the analysed spectral region (1190-1400 cm⁻¹), the atmospheric temperature variations close to the surface affect mainly the radiances below 1300 cm⁻¹ and variations at higher altitudes mainly the radiances above 1300 cm⁻¹. In Fig. 4 we depict the spectral responses for 2 K, because this is a reasonable approximation of the uncertainty in the EUMETSAT L2 PPF temperatures (?).
The spectral responses for surface emissivity and temperature reveal that surface properties hardly affect the radiances above 1250 cm⁻¹, but have a strong impact below 1250 cm⁻¹. We calculate the emissivity Jacobians for a −2% change of the emissivity independently above and below 1270 cm⁻¹, which is a typical uncertainty of emissivity judging from its dependency on viewing angle and wind speed over ocean (?) and small scale inhomogeneities; however, this uncertainty might be significantly higher over arid areas (?).
Concerning spectroscopy, the spectral responses calculated for the typical uncertainty range of spectroscopic parameters are relatively small. In Fig. 4 we show the spectral responses for consistent +5% changes of the line intensity and pressure broadening parameters of all water vapour isotopologues, which is in reasonable agreement with the uncertainty values given by HITRAN (?). The water continuum spectral responses shown are for a water continuum that is 10% larger than the continuum according to the model MT_CKD v2.5.2 (???).
The bottom panel of Fig. 4 [...] (OPAC, Optical Properties of Aerosols and Clouds; ??). For cirrus clouds we assume the particle composition as given by OPAC's "Cirrus 3" ice cloud example (see Table 1b in ?) and for mineral dust clouds a particle composition according to OPAC's "Desert" aerosol composition example (see Table 4 in ?). The shown spectral responses are for 10% cumulus cloud coverage with the cloud top at 3 km, a homogeneous dust cloud between 2 and 4 km, and 25% cirrus cloud coverage between 10 and 11 km. These are relatively weak clouds, and we assume that they might occasionally not be correctly identified by the EUMETSAT L2 cloud screening algorithm. Because the respective spectral responses are significantly above the noise level, these unrecognised clouds can have an important impact on the retrieval.
A comprehensive set of different spectral responses is provided with the extended output data file for the 74 exemplary observations at an Arctic, a mid-latitudinal, and a tropical site.

The state vector
In this Section we discuss the MUSICA IASI state vector, which is referred to as x in Appendix A.

Components of the state vector
We retrieve vertical profiles of the trace gases H₂O, HDO, N₂O, CH₄, and HNO₃, and of atmospheric temperature. For all these profile retrievals we use constraints (for more details see Sect. 4.6). In addition, we fit the surface skin temperature and the spectral frequency scale without any constraint. We discretise the profiles on atmospheric levels between the surface and the top of the atmosphere (which we set at 56 km). The grid spacing is relatively fine in the lower troposphere (≈ 400 m) and increases [...]. For a surface altitude at sea level (0 m a.s.l.) the number of atmospheric levels is nal = 28, and for a surface altitude of 4000 m a.s.l. it is nal = 21. Consequently, the state vector for an observation with surface altitude at sea level has a length of 6 × 28 + 2 = 170.

Water vapour isotopologue proxies
[...] basis transformation can be achieved by operator P:

$$P = \begin{pmatrix} \tfrac{1}{2} I & \tfrac{1}{2} I \\ -I & I \end{pmatrix}$$

Here the four matrix blocks have the dimension (nal × nal), I stands for an identity matrix, and the state vectors x and x′ are related by:

$$x' = P x$$

Similarly, logarithmic scale covariance matrices can be expressed in the two basis systems, and the respective matrices S and S′ are related by:

$$S' = P S P^{T}$$

and the respective averaging kernel matrices A and A′ are related by:

$$A' = P A P^{-1}$$

For more details on the utility of the water vapour isotopologue proxy state please refer to ? and ?.
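The change of basis between the {ln[H₂O], ln[HDO]} state and the proxy state can be sketched level by level in plain Python. The sketch assumes that the operator P has the block form [[½I, ½I], [−I, I]], i.e. the humidity proxy is the mean of the two logarithms and the ratio proxy is ln[HDO] − ln[H₂O]; the ½ weights and all function names are assumptions made for illustration, and the numbers are hypothetical.

```python
import math


def to_proxy_state(ln_h2o, ln_hdo):
    """Apply the assumed P = [[0.5 I, 0.5 I], [-I, I]] level by level."""
    humidity_proxy = [0.5 * (a + b) for a, b in zip(ln_h2o, ln_hdo)]
    ratio_proxy = [b - a for a, b in zip(ln_h2o, ln_hdo)]
    return humidity_proxy, ratio_proxy


def from_proxy_state(humidity_proxy, ratio_proxy):
    """Apply the inverse operator P^-1 = [[I, -0.5 I], [I, 0.5 I]]."""
    ln_h2o = [h - 0.5 * r for h, r in zip(humidity_proxy, ratio_proxy)]
    ln_hdo = [h + 0.5 * r for h, r in zip(humidity_proxy, ratio_proxy)]
    return ln_h2o, ln_hdo


# Hypothetical two-level profile in ppmv (HDO normalised to natural abundance).
h2o = [10000.0, 1000.0]
hdo = [9000.0, 800.0]
ln_h2o = [math.log(v) for v in h2o]
ln_hdo = [math.log(v) for v in hdo]

humidity_proxy, ratio_proxy = to_proxy_state(ln_h2o, ln_hdo)
back_h2o, back_hdo = from_proxy_state(humidity_proxy, ratio_proxy)

# exp(ratio proxy) recovers the HDO/H2O ratio at each level.
ratios = [math.exp(r) for r in ratio_proxy]
print(ratios)
```

Because the transformation is linear and invertible, no information is lost when switching between the two representations.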

Summary
The atmospheric state variables that are independently constrained during the MUSICA IASI processing are the vertical profiles of the water vapour isotopologue proxies H₂O and δD, and the vertical profiles of N₂O, CH₄, HNO₃, and atmospheric temperature. For all the trace gases (not only for the water vapour isotopologues) the retrieval works with the state variables on a logarithmic scale. For atmospheric temperature a linear scale is used. Surface skin temperature and the spectral frequency shift are also components of the state vector; however, they are not constrained during the retrieval procedure. The variables musica_wv_apriori and musica_wv provide the a priori assumed and the retrieved values of H₂O and HDO, respectively (see also Sects. 4.5 and 5.1). The output is given in ppmv and normalised with respect to the naturally occurring isotopologue abundance. In this context, δD is calculated from the content of these variables as δD = 1000 × (HDO/H₂O − 1). Information about H₂O and δD related to differentials (constraints, averaging kernels, kernel metrics, or uncertainties) is generally provided in the proxy states (variables with the syllable _wvp_).
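As a minimal illustration of the δD relation given above, the following Python function computes δD from H₂O and HDO values that are both given in ppmv and already normalised to the natural isotopologue abundance. The function name and the example numbers are hypothetical.

```python
def delta_d(h2o_ppmv, hdo_ppmv):
    """delta-D from normalised H2O and HDO mixing ratios: 1000 * (HDO/H2O - 1)."""
    return 1000.0 * (hdo_ppmv / h2o_ppmv - 1.0)


# Hypothetical values: HDO depleted by 10% relative to H2O.
example = delta_d(10000.0, 9000.0)
print(round(example, 6))
```

Equal normalised H₂O and HDO values give δD = 0, and a relative HDO depletion gives negative values.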

A priori states
The reference for the a priori data used for the MUSICA IASI trace gas retrievals is the CESM1/WACCM (Community Earth [...]). For the MUSICA IASI a priori profiles of H₂O, N₂O, CH₄, and HNO₃ we consider a mean latitudinal dependence, seasonal cycles, and long-term evolution. Therefore, the a priori data are constructed by means of a low dimensional multi-regression fit on the CESM1/WACCM data independently for each vertical grid level. We fit an annual cycle with the two frequencies 1/year and 2/year, and for the long-term baseline we fit a second-order polynomial. The fits are performed individually for 15 equidistant latitudinal bands between 90° S and 90° N. In order to capture the yearly anomalies in N₂O and CH₄ a priori data, we use the Mauna Loa Global Atmospheric Watch yearly mean data records for a correction of the WACCM parameterized time series (for more details on this correction procedure see ?). We also use the temperature lapse rate tropopause, according to the definition of the World Meteorological Organisation, from WACCM and construct a latitudinally dependent tropopause altitude by fitting a seasonal cycle and a constant baseline (no long-term dependency), and we assume a transition zone between troposphere and stratosphere with a vertical extension of 12.5 km. The MUSICA IASI δD a priori profiles between [...]. The a priori trace gas profiles are provided in the variables musica_wv_apriori (H₂O and HDO with species index 1 and 2, respectively), musica_ghg_apriori (N₂O and CH₄ with species index 1 and 2, respectively), and musica_hno3_apriori (HNO₃). The unit is ppmv.
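The multi-regression fit described above (annual cycle with frequencies 1/year and 2/year plus a second-order polynomial baseline) can be sketched as an ordinary least-squares problem. The Python code below is an illustrative sketch only: the parameterisation with cosine and sine terms, the function names, and the synthetic time series are assumptions, not the actual processing code.

```python
import math


def design_row(t_years):
    """Basis functions at time t (years since a reference date)."""
    w = 2.0 * math.pi * t_years
    return [1.0, t_years, t_years ** 2,
            math.cos(w), math.sin(w),               # frequency 1/year
            math.cos(2.0 * w), math.sin(2.0 * w)]   # frequency 2/year


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(times, values):
    """Least-squares fit via the normal equations."""
    rows = [design_row(t) for t in times]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(n)]
    return solve(ata, atb)


def evaluate(coeffs, t_years):
    """Evaluate the fitted model at time t."""
    return sum(c * f for c, f in zip(coeffs, design_row(t_years)))


# Synthetic monthly series for one latitude band and one grid level.
true_coeffs = [1.80, 0.006, 0.0002, 0.010, -0.005, 0.002, 0.001]
times = [k / 12.0 for k in range(120)]
values = [evaluate(true_coeffs, t) for t in times]
fitted = fit(times, values)
```

In such a scheme one set of seven coefficients per latitude band and grid level would suffice to evaluate the a priori value for any date.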
As a priori for the atmospheric and the surface temperatures we use the EUMETSAT L2 PPF atmospheric temperature output. These data are provided in Kelvin in the variables musica_at_apriori and musica_st_apriori, for atmospheric temperature and surface temperature, respectively. We set up simplified a priori covariance matrices by means of two parameters. The first parameter is the set of altitude-dependent amplitudes of the variability (v_amp,i, with i indexing the ith altitude level). For the trace gases we work with the relative variability, i.e. with the variability on the logarithmic scale. For atmospheric temperatures the variability is given in Kelvin.

A priori covariances and constraints
The second parameter is the set of altitude-dependent vertical correlation lengths (σ_cl,i, for considering correlated variations between [...]), with z_i being the altitude at the ith altitude level. The values v_amp,i and σ_cl,i are oriented to the typical covariances of in-situ observations made from the ground (e.g., ??), aircraft (e.g., ??), or balloon (e.g., ??) and are also aligned to the vertical dependency of the monthly mean covariances we obtain from the WACCM simulations. For v_amp,i of δD we use in addition the isotopologue-enabled version of the Laboratoire de Météorologie Dynamique (LMD) general circulation model as a reference (??). For atmospheric temperature we use the uncertainty in the EUMETSAT L2 atmospheric temperature as reference (?). Generally, we classify three different altitude regions with specific vertical dependencies in the values of v_amp,i and σ_cl,i: the troposphere (below the climatological tropopause altitude as depicted in Fig. 5), the stratosphere (starting 12.5 km above the climatological tropopause altitude), and the transition region between troposphere and stratosphere.

The values of v_amp,i are specific for each trace gas and for the atmospheric temperature, and they are provided in the MUSICA IASI standard output files in the variables having the suffix _apriori_amp. As a simplification we use the same values of σ_cl,i for all trace gases and for the atmospheric temperature. These values are provided in the MUSICA IASI output files as the variable musica_apriori_cl.
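To illustrate how a covariance matrix can be built from the two parameters v_amp,i and σ_cl,i, the Python sketch below uses a Gaussian-shaped vertical correlation, S_a(i,j) = v_amp,i v_amp,j exp(−(z_i − z_j)² / (2 σ_cl,i σ_cl,j)). This functional form is an assumption made for illustration (the exact expression should be taken from the original equation), and the altitude grid and parameter values are hypothetical.

```python
import math


def apriori_covariance(z, v_amp, sigma_cl):
    """Covariance matrix with an assumed Gaussian-shaped vertical correlation."""
    n = len(z)
    return [
        [
            v_amp[i] * v_amp[j]
            * math.exp(-(z[i] - z[j]) ** 2 / (2.0 * sigma_cl[i] * sigma_cl[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical four-level example (altitudes and correlation lengths in km).
z = [0.0, 1.0, 2.0, 4.0]
v_amp = [1.0, 0.8, 0.6, 0.5]       # variability amplitudes (logarithmic scale)
sigma_cl = [2.5, 2.5, 2.5, 5.0]    # vertical correlation lengths

s_a = apriori_covariance(z, v_amp, sigma_cl)
for row in s_a:
    print([round(v, 3) for v in row])
```

With this form the diagonal holds the variances v_amp,i², and the off-diagonal elements decay with increasing altitude separation.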
As the constraint of the retrieval we use an approximation of the inverse of the covariance matrix. For this purpose the constraint matrix R is constructed as a sum of a diagonal constraint, and first and second order Tikhonov-type regularisation matrices (?):

$$R = (\alpha_0 L_0)^T \alpha_0 L_0 + (\alpha_1 L_1)^T \alpha_1 L_1 + (\alpha_2 L_2)^T \alpha_2 L_2 \qquad (8)$$

with: [...] and of the first and the second vertical derivatives of the profiles. These values can be calculated from the elements of the a priori matrix (S_a) as follows: [...]

Starting the retrievals with the constraint matrix R ≈ S_a⁻¹ optimises the computational efficiency of the retrieval processes, because according to Eqs. (A4) and (A5) the retrieval calculations work with S_a⁻¹. Furthermore, calculating the inverse of S_a approximately as the sum of a diagonal constraint and first and second order Tikhonov-type regularisation matrices offers the possibility of tuning the constraint according to specific user requirements with respect to smoothness or absolute deviations (e.g., ??).
For the greenhouse gases (N₂O and CH₄) and HNO₃ we constrain with respect to the absolute values of the profiles and the first derivative of the profile, i.e. we do not consider the term (α₂L₂)ᵀα₂L₂ of Eq. (8). In the case of the water vapour isotopologue proxies and the atmospheric temperature we additionally constrain with respect to the second derivative of the profile, i.e. we consider all terms of Eq. (8). Please note that for the trace gases the constraints work on the logarithmic scale and for the atmospheric temperature on the linear scale.
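The structure of such a constraint matrix can be sketched in plain Python. The sketch assumes that L₀ is the identity and that L₁ and L₂ are simple first and second difference operators on the level grid (without division by the level spacing), and that the diagonal matrices α₀, α₁, α₂ are given as vectors of length n, n − 1, and n − 2. These choices, the function names, and the numbers are illustrative assumptions rather than the processor's actual implementation.

```python
def difference_operator(n, order):
    """Return the order-th difference operator (order 0, 1 or 2) as a list of rows."""
    stencil = {0: [1.0], 1: [-1.0, 1.0], 2: [1.0, -2.0, 1.0]}[order]
    rows = []
    for i in range(n - order):
        row = [0.0] * n
        for k, c in enumerate(stencil):
            row[i + k] = c
        rows.append(row)
    return rows


def gram(alpha, op):
    """(alpha L)^T (alpha L) for a diagonal alpha given as a vector."""
    n = len(op[0])
    scaled = [[a * x for x in row] for a, row in zip(alpha, op)]
    return [[sum(r[i] * r[j] for r in scaled) for j in range(n)] for i in range(n)]


def constraint_matrix(n, alpha0, alpha1, alpha2=None):
    """Sum of the diagonal, first-order and (optionally) second-order terms."""
    terms = [gram(alpha0, difference_operator(n, 0)),
             gram(alpha1, difference_operator(n, 1))]
    if alpha2 is not None:
        terms.append(gram(alpha2, difference_operator(n, 2)))
    return [[sum(t[i][j] for t in terms) for j in range(n)] for i in range(n)]


# Greenhouse-gas-like case: absolute and first-derivative constraint only.
r_two_terms = constraint_matrix(4, [1.0] * 4, [2.0] * 3)
# Water-vapour-proxy-like case: all three terms.
r_three_terms = constraint_matrix(4, [1.0] * 4, [2.0] * 3, [1.0] * 2)
```

Omitting `alpha2` corresponds to dropping the second-derivative term, as described for N₂O, CH₄, and HNO₃.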
Because HNO₃ has only very weak spectroscopic signatures in the analysed spectral region (see Fig. 4), we loosen the absolute constraint and at the same time strengthen the constraint with respect to the first vertical derivative: α₀ and α₁ are calculated from an S_a constructed with the values of v_amp,i increased by a factor of 1.5 and with the values of σ_cl,i increased by a factor of 2. Similarly, and in order to avoid a negative impact of an underconstrained retrieval of the temperature profile on the trace gas products (e.g., artificial oscillatory features), we strengthen the atmospheric temperature constraint: α₀, α₁, and α₂ are calculated from an S_a constructed with the values of v_amp,i multiplied by a factor of 0.5.
The diagonal entries of the diagonal matrices α₀, α₁, and α₂ contain all information about the actual constraints used by the retrieval. They are provided in the MUSICA output files for each individual retrieval and for the different trace gases and the atmospheric temperature as the variables with the suffix _reg. For the trace gases these vector elements are depicted in Fig. 6 for a northern hemispheric summer in the tropics, mid-latitudes, and polar regions. The dotted lines indicate the climatological tropopause and the altitude 12.5 km above this tropopause (transition zone between troposphere and stratosphere).

MUSICA IASI retrieval output
In this section we describe the variables that inform about the retrieval target products (vertical trace gas profiles) and the characteristics of these products (averaging kernels and errors). A detailed explanation of these data supports their interoperability and is also important in the context of data reuse (see examples given in Sect. 8).

Trace gas profiles and temperatures
The retrieved trace gas profiles are provided in the variables musica_wv (H₂O and HDO with species index 1 and 2, respectively), musica_ghg (N₂O and CH₄ with species index 1 and 2, respectively), and musica_hno3 (HNO₃). The unit is ppmv. The retrieved atmospheric temperature is provided in the variable musica_at and the retrieved surface temperature in the variable musica_st. The unit is Kelvin.
In order to provide a brief insight into the data diversity, Figure 7 gives examples of a priori and retrieved trace gas profiles for an observation on 30 August 2008 over Lindenberg (53° N). The profile data represent 28 altitude levels and are provided with detailed information on their sensitivity, vertical representativeness, and errors (see following subsections).

Characteristics of retrieved products
For a limited number of retrievals we provide an extended netcdf output file (see Sect. 3.1). The extended output file contains the same variables as the standard output files and in addition the full averaging kernels and a large set of Jacobians (and spectral responses for surface emissivity, spectroscopic parameters, and cloud coverage) together with gain matrices. The latter allow the calculation of full error covariances for a large variety of different uncertainty sources. In the standard output files we provide neither the full averaging kernels (that would consider all the cross correlations between the different retrieval products) nor the full error covariances. The reason is that providing the full kernels and/or the full error covariances would strongly increase the storage needs for the data output (?). [...] that are not retrieved but that affect the retrieval (spectroscopy, different cloud types, and surface emissivity). Using the gain matrices and the Jacobians, the full averaging kernels and the full error covariances can be calculated as indicated by Fig. 8.
The full averaging kernel for the trace gas products is marked at the right side by the thick black frame (an example for these kernels is plotted in Fig. 9). The full error covariances are indicated by the yellow frame (examples of the root-mean-square values of the diagonals of these error covariances are plotted in Fig. 12).
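The calculation of averaging kernels and error covariances from gain matrices and Jacobians can be sketched with the standard optimal estimation relations A = G K and S_ε = G K_p S_p K_pᵀ Gᵀ, where K_p is the Jacobian with respect to a non-retrieved parameter and S_p the covariance of its uncertainty. The Python code below assumes these textbook relations; all matrices are tiny hypothetical examples and the function names are illustrative.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Matrix transpose."""
    return [list(col) for col in zip(*a)]


def averaging_kernel(gain, jacobian):
    """A = G K."""
    return matmul(gain, jacobian)


def error_covariance(gain, jacobian_p, cov_p):
    """S_eps = (G K_p) S_p (G K_p)^T for one uncertainty source."""
    gk = matmul(gain, jacobian_p)
    return matmul(matmul(gk, cov_p), transpose(gk))


# Hypothetical toy dimensions: 2 state elements, 3 spectral bins, 1 parameter.
gain = [[0.5, 0.0, 0.0],
        [0.0, 0.25, 0.0]]
jacobian = [[1.0, 0.0],
            [0.0, 2.0],
            [0.0, 0.0]]
jacobian_p = [[1.0], [1.0], [0.0]]
cov_p = [[4.0]]

kernel = averaging_kernel(gain, jacobian)
s_eps = error_covariance(gain, jacobian_p, cov_p)
print(kernel)
print(s_eps)
```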

The parts of this full matrix that are provided by the standard output files for all individual retrievals are indicated as the matrix blocks filled with green and red colour. Green represents the individual averaging kernels of the water vapour isotopologues, the greenhouse gases, HNO₃, and the atmospheric temperature. Red marks the cross kernels of the trace gas products with respect to atmospheric temperature (i.e., they inform how errors in the EUMETSAT L2 PPF atmospheric temperatures, used as MUSICA IASI a priori temperatures, affect the retrieved trace gas products). These temperature cross kernels allow the [...]. We provide differentials or derivatives (covariances, averaging kernels, gain matrices, and Jacobian matrices) related to the trace gas products always on the logarithmic scale. Logarithmic scale kernels are the same as the fractional kernels used in ?.

Furthermore, we strongly recommend the use of the logarithmic scale kernels for analytic calculation. Because the MUSICA IASI trace gas retrievals are made on the logarithmic scale, the assumption of a moderately non-linear case according to ? can be made on logarithmic scale (i.e. requires the use of logarithmic scale kernels), but has limited validity on the linear scale.
More details on the validity of the assumption of a moderately non-linear problem are given in Appendix B. Similarly, we also provide in all standard files all four block kernels describing the greenhouse gases (kernels A33, A34, A43, and A44 in Fig. 9). Although the respective cross kernel values are rather small, their availability supports the precise characterisation of a combined CH4/N2O product, which has a higher precision than the individual N2O and CH4 products (see discussion in ?). Because HNO3 has only weak spectroscopic signatures in the analysed spectral window, the respective kernel (A55 in Fig. 9) reveals a pronounced maximum, which is limited to the lower/middle stratosphere. By tuning the constraint (see discussion at the end of Sect. 4.6), we obtain DOFS values generally close to 1.0. We also provide atmospheric temperature profile kernels (not shown in Fig. 9), for which we typically obtain a DOFS value of about 2.0.
Figure 9. Example of an averaging kernel for the full atmospheric composition state vector that is optimally estimated by MUSICA IASI.
Because we want to provide averaging kernels for each individual observation, we developed a compression procedure, which is necessary for keeping the size of the data files in an acceptable range. Section 5.2.4 describes the compression method, the format, and the variables in which the averaging kernels are provided.

Metrics for sensitivity and resolution
Table 2 gives an overview of metrics that can be calculated from the averaging kernel elements. In the previous section the DOFS metric has been introduced as the trace of the averaging kernel matrix. Figure   i.e. the signals retrieved at different altitudes reflect all the signals of the same real atmospheric altitude region.
The resolving length informs about the vertical resolution at the centre altitude, i.e. how broad the atmospheric altitude layer is by which the retrieved value is significantly affected. As briefly discussed in ?, resolving length is not a satisfactory definition of resolution for slowly decaying averaging kernels or for averaging kernels that have strong sidelobes, for instance the MUSICA IASI kernels for H2O (see top left panel of Fig. 9). The variables with the suffix _resolution provide the vertical information displacement and resolution metrics for each individual observation. As parameters 1 and 2 these variables provide the centre value (C) and the resolving length (RL), respectively, and as parameter 3 the layer width per DOFS value (LWpD).
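The DOFS metric mentioned above (the trace of the averaging kernel matrix) can be illustrated with a minimal sketch. The toy kernel is invented for illustration, the function names are hypothetical, and the row-sum definition of the measurement response is an assumption borrowed from common optimal estimation usage rather than taken from this section.

```python
import numpy as np


def dofs(avk: np.ndarray) -> float:
    """Degrees of freedom for signal: trace of the averaging kernel matrix."""
    return float(np.trace(avk))


def measurement_response(avk: np.ndarray) -> np.ndarray:
    """Row sums of the averaging kernel (assumed definition, see lead-in)."""
    return avk.sum(axis=1)


# Hypothetical 3-level kernel; the values are invented for illustration only.
A = np.array([[0.6, 0.2, 0.0],
              [0.2, 0.5, 0.2],
              [0.0, 0.2, 0.4]])

print("DOFS:", dofs(A))
print("measurement response per level:", measurement_response(A))
```

For real data the kernel would first have to be reconstructed from its compressed form (see Sect. 5.2.4).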

Errors
For the 74 observations provided in the extended output file (see Sect. 3.1) calculations of a large variety of Jacobians (and spectral responses for surface emissivity, spectroscopic parameters, and cloud coverage) and full gain matrices are available for a polar, mid-latitudinal, and tropical site (?). Figure 12 presents the errors calculated for a mid-latitudinal summer observation using the gain matrices and Jacobians (or spectral responses) according to Eqs. (A10) and (A11). The uncertainty assumptions Δb and S_b used for these calculations are summarised in Table 3. The measurement noise error is calculated according to Eq. (A12) with S_y,noise being a diagonal matrix with diagonal values set to the mean-square value calculated from the spectral residuals (measured minus simulated spectra).
We organise the errors in three categories: random errors (measurement noise, uncertainties of emissivity and atmospheric temperature, and interferences from atmospheric humidity and δD variations), spectroscopic errors (uncertainties in the water continuum modelling and uncertainties in the intensity and pressure broadening parameters of all target trace gases), and errors due to unrecognised clouds. Concerning random errors, we find that atmospheric temperature uncertainties dominate the error budget for all retrieval products except for δD (because temperature uncertainties have similar impacts on H2O and HDO, they cancel out in their ratio). Measurement noise is the second most important error contributor (and the dominating error source for δD).
Estimations of the dominating temperature error (assuming atmospheric temperature uncertainty covariances in line with ?) and the measurement noise error are provided in the standard files in the variables with suffix _error, for all trace gas products (for the water vapour isotopologues in the proxy state basis) and for atmospheric temperature. By providing the cross averaging kernels with respect to atmospheric temperature (see matrix blocks filled with red colour at the right side of the schematics of Fig. 8) we can calculate the propagation of any assumed temperature profile uncertainty ΔT individually for all observations in the standard files, according to Eq. (A10):
$$\Delta\hat{x} = G K_T \Delta T = A_T \Delta T,$$
with G being the gain matrix, K_T being the Jacobians for atmospheric temperature, and A_T being the temperature cross kernel provided for all observations in the standard data file.
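The propagation of a temperature uncertainty through the temperature cross kernel can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the array shapes, and all numbers are hypothetical, and the covariance form A_T S_T A_T^T is the standard linear error propagation rule, assumed here rather than quoted from the paper.

```python
import numpy as np


def propagate_temperature_error(a_t, delta_t=None, s_t=None):
    """Map a temperature uncertainty into the trace gas state space.

    a_t     : temperature cross kernel, shape (n_state, n_temp)
    delta_t : assumed temperature profile error, shape (n_temp,)
    s_t     : assumed temperature error covariance, shape (n_temp, n_temp)
    Returns (delta_x, s_x); an entry is None if its input was not given.
    """
    delta_x = a_t @ delta_t if delta_t is not None else None
    s_x = a_t @ s_t @ a_t.T if s_t is not None else None
    return delta_x, s_x


# Invented toy cross kernel (logarithmic-scale state per K).
A_T = np.array([[0.01, 0.02, 0.00],
                [0.00, 0.01, 0.03]])
dT = np.array([1.0, 1.0, 1.0])   # 1 K error at every temperature level
S_T = np.eye(3)                  # uncorrelated 1 K^2 variances

dx, S_x = propagate_temperature_error(A_T, dT, S_T)
print(dx)
print(np.sqrt(np.diag(S_x)))     # 1-sigma error per state element
```

Because the kernels are given on the logarithmic scale, the resulting errors can be read as fractional errors of the trace gas amounts.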

We can also reconstruct for all observations the full error covariance matrix S_x̂,noise due to the spectral noise used for constraining the solution state. For the MUSICA IASI processing we use as the spectral noise covariance S_y,noise a diagonal matrix with the mean-square values of the spectral residual (difference between simulated and measured spectrum). According to Eqs. (A5) to (A8) and (A12) we can write:
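How such a reconstruction can work is sketched below. Since Eqs. (A5) to (A8) are not reproduced in this section, the sketch assumes the standard optimal estimation relations Ŝ = (K^T S_y^-1 K + R)^-1, G = Ŝ K^T S_y^-1, and A = G K. These imply Ŝ = (I − A) R^-1 and S_x̂,noise = G S_y G^T = A Ŝ, so that only the averaging kernel A and the constraint matrix R are needed. All names and numbers are hypothetical, and the form of the equation in the paper may differ.

```python
import numpy as np


def reconstruct_covariances(avk, r):
    """Rebuild a posteriori and noise covariances from kernel and constraint.

    Assumes S_hat = (I - A) R^-1 and S_noise = A S_hat (see lead-in).
    """
    n = avk.shape[0]
    s_hat = (np.eye(n) - avk) @ np.linalg.inv(r)
    s_noise = avk @ s_hat
    return s_hat, s_noise


# Synthetic linear test problem (invented numbers).
rng = np.random.default_rng(0)
K = rng.normal(size=(6, 3))           # Jacobian
S_y = 0.1 * np.eye(6)                 # white spectral noise covariance
R = np.diag([1.0, 2.0, 4.0])          # constraint matrix

S_hat_ref = np.linalg.inv(K.T @ np.linalg.inv(S_y) @ K + R)
G = S_hat_ref @ K.T @ np.linalg.inv(S_y)   # gain matrix
A = G @ K                                   # averaging kernel

S_hat, S_noise = reconstruct_covariances(A, R)
print(np.allclose(S_noise, G @ S_y @ G.T))
```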

H2O interferences from atmospheric δD and δD interferences from atmospheric H2O are also significant (blue and cyan lines in the random error plots of Fig. 12). For this reason we provide in the standard files the four blocks of the water vapour isotopologue averaging kernels, which enables us to estimate these interferences for each individual observation. The error covariance due to interference of δD on H2O can be calculated by
$$S_{\hat{x},\mathrm{H_2O}\leftarrow\delta\mathrm{D}} = A_{12}\, S_{a,\delta\mathrm{D}}\, A_{12}^T,$$
and the error covariance due to interference of H2O on δD by
$$S_{\hat{x},\delta\mathrm{D}\leftarrow\mathrm{H_2O}} = A_{21}\, S_{a,\mathrm{H_2O}}\, A_{21}^T.$$
Here S_a,δD and S_a,H2O are covariances of the δD and H2O proxy states, respectively, and A12 and A21 are the cross kernels of the proxy states. Please note that the water vapour isotopologue kernels provided in the standard files are for the proxy state basis.
Spectroscopic uncertainties cause mainly systematic errors. The assumed uncertainties of line intensity ΔS and pressure broadening Δγ (see Table 3) are in reasonable agreement with the values reported in ?. Respective error estimations can be performed for the 74 exemplary observations provided in the extended data file over a polar, mid-latitudinal, and tropical site.
As shown in Fig. 12 they are typically within 5%, except for HNO3, where we estimate errors in the lower stratosphere due to spectroscopic uncertainties of up to 12% (mainly reflecting the larger uncertainty budget allowed for the band intensity).
The uncertainties of the spectroscopic parameters of line intensity and pressure broadening mainly affect the retrieval of the trace gas for which the parameters are assumed to be uncertain. Cross impacts are largest for uncertainties in water vapour parameters and there mostly for the water continuum (to a lesser extent for line intensity and pressure broadening). For this reason we plot the effect of the water continuum uncertainty for all trace gases, whereas the effects of the line intensity and pressure broadening parameters are only plotted for the trace gas whose parameters are assumed to be uncertain.
MUSICA IASI retrievals are only executed when the EUMETSAT L2 PPF flag flag_cldnes is set to 1 (the IASI Instrumental Field Of View, IFOV, is clear) or 2 (the IASI IFOV is processed as cloud-free but small cloud contamination is possible).
This means that, in particular for MUSICA IASI retrievals made with a cloud flag value of 2, clouds can have an impact, which should be examined. For this reason we calculated a variety of different cloud spectral responses for our 74 exemplary observations over polar, mid-latitudinal, and tropical sites and provide them in the extended data files. Examples of the obtained errors are depicted on the right of Fig. 12. We find that clouds with the properties described in Table 3 have a significant effect on the retrievals. The impact of a cirrus cloud is particularly strong, and the H2O and HNO3 data products seem to be the most affected. However, in this context we also have to consider the natural variability of the different trace gas products. Because the natural variability of δD, N2O, and CH4 is very small, uncertainties due to clouds of 1% can already be a large problem. In summary, this estimation of errors due to unrecognised clouds indicates that we should be careful when using MUSICA IASI data products corresponding to an EUMETSAT L2 PPF cloud flag value of 2 (see also discussion in Sects. 6 and 7).

Matrix compression
In order to reduce the storage needs of the output files we compress the averaging kernel matrices. For this compression we perform a singular value decomposition of the original averaging kernel and The suffixes _xavkat_rank, _xavkat_val, _xavkat_lvec, and _xavkat_rvec identify the respective variables needed for the reconstruction of the temperature cross averaging kernels. In this case the right eigenvectors have the length of the atmospheric temperature state vector, which is different from the length of the atmospheric state vector in the case of the water vapour isotopologue and the greenhouse gas product (i.e. for the water vapour isotopologue and the greenhouse gas temperature cross averaging kernels the left and right eigenvectors have different sizes).
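The idea of compressing a kernel by singular value decomposition and reconstructing it by simple matrix products can be sketched as follows. This is an illustration only: the relative tolerance used to choose the rank is a hypothetical criterion, the function names are invented, and the returned quantities merely mirror the roles suggested by the suffixes _rank, _val, _lvec, and _rvec; the actual storage layout in the files may differ.

```python
import numpy as np


def compress_kernel(avk, rel_tol=1e-3):
    """Truncated SVD of a (possibly rectangular) kernel.

    Keeps the singular values larger than rel_tol times the largest one
    (hypothetical criterion). Returns rank, values, left and right vectors.
    """
    u, s, vt = np.linalg.svd(avk, full_matrices=False)
    rank = max(int(np.sum(s > rel_tol * s[0])), 1)
    return rank, s[:rank], u[:, :rank], vt[:rank, :].T


def reconstruct_kernel(rank, val, lvec, rvec):
    """Rebuild the kernel as lvec * diag(val) * rvec^T."""
    return (lvec[:, :rank] * val[:rank]) @ rvec[:, :rank].T


# Invented rank-2 cross kernel with different row and column lengths,
# as for a temperature cross kernel.
avk = (np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, 1.0, 0.0])
       + 0.5 * np.outer([1.0, -1.0, 1.0, -1.0, 1.0], [0.0, 1.0, 0.0, 1.0]))

rank, val, lvec, rvec = compress_kernel(avk)
avk_rebuilt = reconstruct_kernel(rank, val, lvec, rvec)
print(rank, lvec.shape, rvec.shape)
```

In this toy case 5 x 4 = 20 matrix entries are replaced by 2 singular values plus 2 x (5 + 4) vector entries, and the left and right vectors have different lengths.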

Data filtering options
The MUSICA IASI retrieval data are provided with detailed information on the retrieval quality, the retrieval products' characteristics, and errors, as well as variables summarising cloud conditions and main aspects of sensitivity, vertical resolution, and errors. In this section we discuss the variables providing this information and recommend possibilities for data filtering.

Clouds
The EUMETSAT L2 PPF flag flag_cldnes is written in the MUSICA IASI variable eumetsat_cloud_summary_flag.
As discussed in Sect. 5.2.3 there is some risk that the MUSICA IASI product retrieved for eumetsat_cloud_summary_flag set to 2 has significant errors due to clouds. In order to exclude this risk we can filter out these data, i.e. we can use a very stringent cloud filtering criterion by only using observations where the variable eumetsat_cloud_summary_flag is set to '1'.
Another, less stringent option is to use in addition the EUMETSAT L2 fractional cloud cover, which is written in the MUSICA IASI variable eumetsat_cloud_area_fraction. If eumetsat_cloud_summary_flag is set to '2' we require in addition that the determination of a cloud area fraction has not been successful, i.e. we require that eumetsat_cloud_area_fraction is set to 'NaN'. No clear determination of a value for fractional cloud cover means that the cloud signals are rather weak (the contrast between cloud and surface signals is smaller than the instrument noise).
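The two cloud filtering options described above can be written as boolean masks. The sketch below assumes that the two variables have already been read into NumPy arrays; the function name and the sample values are hypothetical.

```python
import numpy as np


def cloud_filter(summary_flag, area_fraction, stringent=False):
    """Boolean mask of observations that pass the cloud filter.

    stringent=True : keep only eumetsat_cloud_summary_flag == 1.
    stringent=False: additionally keep flag == 2 when
                     eumetsat_cloud_area_fraction is NaN (weak cloud signal).
    """
    summary_flag = np.asarray(summary_flag)
    area_fraction = np.asarray(area_fraction, dtype=float)
    clear = summary_flag == 1
    if stringent:
        return clear
    weak_cloud = (summary_flag == 2) & np.isnan(area_fraction)
    return clear | weak_cloud


# Invented sample values for four observations.
flags = np.array([1, 1, 2, 2])
fractions = np.array([np.nan, 0.0, np.nan, 12.0])

print(cloud_filter(flags, fractions, stringent=True))
print(cloud_filter(flags, fractions))
```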

Quality of the spectral fit
The spectral noise level considered in the cost function (A2) during the MUSICA IASI processing is the root-mean-square (RMS) of the spectral fit residual (difference between simulated and measured spectrum). With this retrieval setting we consider a noise level that is generally larger than the pure instrumental noise level, because it is the sum of the instrumental noise and the signatures that are not understood by the forward model. In the MUSICA IASI retrieval this RMS value is treated as white noise, i.e. for S_y,noise of the cost function (A2) we use a diagonal matrix filled with the mean-square values according to the spectral residuals.

As long as the residual is close to white noise, this kind of processing ensures a correct weighting of the measured spectra, on the one hand, and the a priori information, on the other hand. However, occasionally the measured spectra are very poorly simulated by the forward model and the residuals cannot be described as white noise; instead, they show systematic signatures. This happens, for instance, if incorrect surface emissivities are used or if the retrieval is made for an observation that is affected by a cloud. In order to identify the systematic part of the residuals we smooth the residuals using a ±2 cm⁻¹ running mean. The smoothed residuals are the systematic residuals, and the difference between the original residuals and the smoothed residuals can then be interpreted as the random (or white noise) residuals. Residuals, systematic residuals, and random residuals are provided in the standard files for each observation in the variable musica_fit_quality.
In order to facilitate the filtering of data corresponding to a poor spectral fit quality we set up a flag (provided as variable musica_fit_quality_flag) that works with the RMS values of the systematic residuals and the random residuals. The flag is set to 0 (poor quality) if the systematic residuals have an RMS value larger than 40 nW/(cm² sr cm⁻¹). For all other observations we analyse the ratio between the RMS of the systematic residuals and the RMS of the random residuals. If this ratio is larger than 1.0 the flag is set to 1 (restricted quality), if it is between 0.5 and 1.0 the flag is set to 2 (fair quality), and if it is smaller than or equal to 0.5 the flag is set to 3 (good quality). Figure 13 depicts residuals corresponding to different values of this fit quality flag. All observations are made during the same orbit, at nearby locations (Northern Africa), and for very similar surface temperatures. It is very likely that the poor spectral fit quality is due to incorrect surface emissivity values used for the respective retrievals (over arid areas like Northern Africa, surface emissivity data have an increased uncertainty, ?). Our recommendation is to use data that belong to the quality groups fair and good.
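The flag logic described above can be sketched in a few lines. This is not the MUSICA IASI implementation: the function name is hypothetical, and the uniform weighting of the running mean, the shortened window at the spectral edges, and the handling of a vanishing random residual are assumptions made for the illustration.

```python
import numpy as np


def fit_quality_flag(wavenumber, residual, half_width=2.0, rms_limit=40.0):
    """Return 0 (poor), 1 (restricted), 2 (fair), or 3 (good).

    wavenumber : spectral grid in cm^-1
    residual   : fit residual in nW/(cm^2 sr cm^-1)
    """
    wavenumber = np.asarray(wavenumber, dtype=float)
    residual = np.asarray(residual, dtype=float)
    # Running mean over +/- half_width cm^-1 gives the systematic residual.
    systematic = np.array([
        residual[np.abs(wavenumber - w) <= half_width].mean()
        for w in wavenumber
    ])
    random = residual - systematic
    rms_sys = np.sqrt(np.mean(systematic ** 2))
    rms_rand = np.sqrt(np.mean(random ** 2))
    if rms_sys > rms_limit:
        return 0
    # Assumption: a residual without any random part counts as systematic.
    ratio = rms_sys / rms_rand if rms_rand > 0 else np.inf
    if ratio > 1.0:
        return 1
    if ratio > 0.5:
        return 2
    return 3


# Invented test spectra on a 0.25 cm^-1 grid.
wn = 1190.0 + 0.25 * np.arange(80)
offset_large = np.full(80, 100.0)                        # strong bias
offset_small = np.full(80, 10.0)                         # weak bias, no noise
alternating = np.where(np.arange(80) % 2 == 0, 1.0, -1.0)  # noise-like

for res in (offset_large, offset_small, alternating):
    print(fit_quality_flag(wn, res))
```

Following the recommendation above, a user would then keep observations with flag values 2 and 3.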

Errors
The standard files provide for all observations and all trace gas products estimations of the errors dominating the random error budget: errors due to noise in the spectra and errors due to uncertainties in the atmospheric temperature a priori data (the EUMETSAT L2 PPF temperatures). The noise error and estimations of the atmospheric temperature error are given in the error variable (variable with suffix _error, see Sect. 5.2.3) for all trace gas products and can be used for filtering out data with anomalously high errors.
Incorrect spectroscopic parameters (line intensity, pressure broadening coefficients, or water continuum modelling) can be responsible for large errors. Although these uncertainty sources are systematic, the errors they cause depend on the sensitivity of the remote sensing system, which in turn is affected by the geometry of the observation. To first order the optical path of the measured radiances depends on the platform zenith angle (PZA, provided as the variable platform_zenith_angle). In order to prevent systematic uncertainties in the spectroscopic parameters from causing artificial signals we can set threshold values for PZA and limit the PZA to angles close to nadir (e.g., by requiring PZA ≤ 30°).

Sensitivity and resolution
The standard files provide the averaging kernels in a compressed format for all observations (see Sect. 5.2.4) as well as metrics that capture the most important aspects of sensitivity and vertical resolution (see Sect. 5.2.2). These metrics are provided in the variables with the suffixes _response and _resolution and allow analyses of sensitivity and vertical resolution for each individual observation without the need for reconstructing the averaging kernels. We can use the metrics for filtering out data where the response to the real atmospheric variability is low or where the vertical representativeness is irregular.
In order to ensure a good sensitivity (retrieval product being mainly affected by the real atmosphere and not by the a priori assumption) the measurement response (MR) should be close to unity. Layer width per DOFS (LWpD), centre altitude displacement (C − Altitude), and resolving length (RL) can be used to filter out data that do not fulfill the requirements in
On average about 30% of all measurements are made under cloud-free conditions (EUMETSAT L2 PPF cloudiness assessment summary flag set to '1' or '2', see also Sect. 4.1). This gives about 25000 individual retrievals per orbit/output file. In the following we present examples of this large amount of data. We select example altitudes where the respective products generally have a good sensitivity and reasonable vertical representativeness. According to Figs. 9 and 11 a good altitude choice is 4.2 km for H2O and δD and 10.9 km for N2O and CH4. For HNO3 the MUSICA IASI processor does not provide profile information; instead the kernels for all altitudes show a similar vertical dependence and reveal retrieval sensitivity for a broad lower stratospheric layer. For this reason we aggregate the HNO3 data in the form of partial column averaged mixing ratios for the layer between 10 and 35 km. Details on this resampling are given in Appendix C.
Table 4. Filters applied for the time series and global daily maps of different species and for different altitudes (A) as shown in Figs. 14 and 15: EUMETSAT L2 PPF cloudiness assessment summary flag, MUSICA spectral fit quality flag, platform zenith angle, MUSICA noise and temperature error, and MUSICA averaging kernel metrics (measurement response: MR; sum of the partial column averaging kernel entries: Σ_i A*_i; diagonal value of the kernel: A*_i,i; layer width per DOFS: LWpD; vertical information displacement: C − A).

Table 4 columns: Species, Cloud, Fit quality.
a: only if variable eumetsat_cloud_area_fraction is set to 'NaN'
b: for dry air mixing ratios averaged for the partial column 10-35 km a.s.l.
c: here the bottom thresholds are set to be below the lowest actually occurring positive value

Filtering
We filter the data according to the settings and threshold values of Table 4. For all data we require 'fair' or 'good' MUSICA IASI spectral fit quality (flag variable musica_fit_quality_flag is required to be set to '2' or '3') and we filter the data using the EUMETSAT L2 PPF cloudiness assessment flag (provided as the variable eumetsat_cloud_summary_flag).

For the N2O and CH4 data we apply a more stringent cloud filter and further inspect data where the EUMETSAT L2 PPF cloudiness assessment summary flag indicates a possibility of small cloud contamination. For the respective data we require that the EUMETSAT L2 processing cannot clearly attribute a value for fractional cloud cover, which means that the cloud signals are rather weak (see Sect. 6.1). We use this more stringent cloud filtering for N2O and CH4, because both species have relatively weak atmospheric variabilities that are very similar to the errors estimated for a small cloud coverage (10% coverage with opaque cumulus clouds or 25% coverage with cirrus clouds).
Furthermore, we filter according to retrieval fit noise and estimated atmospheric temperature errors. The respective errors are provided for each observation in the MUSICA IASI standard file output variable with suffix _error. For HNO3 we calculate the retrieval fit noise and the estimated temperature errors for the 10-35 km partial column averaged mixing ratios according to Eq. (C7), whereby we reconstruct the noise covariance matrix for HNO3 according to Eq. (16) and generate the atmospheric temperature covariance according to Eq. (7) using the MUSICA IASI standard file output variables musica_at_apriori_amp and musica_apriori_cl for setting up the values of v_amp,i and σ_cl,i, respectively.
In order to ensure that the time series signals or horizontal patterns are not significantly affected by varying sensitivity and vertical resolution we filter H2O, δD, N2O, and CH4 data according to the averaging kernel metrics MR, LWpD, and C − Altitude. This filters out data with anomalous vertical sensitivities. For HNO3 we calculate the 10-35 km partial column averaged mixing ratio averaging kernels according to Eq. (C6) and filter for good sensitivity by requiring a diagonal entry close to unity.
Figure 14. Time series of MUSICA IASI trace gas products at selected altitudes above Karlsruhe (48-49° N; 8-9° E): 4.2 km a.s.l. for H2O and δD, 10.9 km a.s.l. for N2O and CH4, and 10-35 km a.s.l. for HNO3. Data have been filtered according to the settings given in Table 4.
The filter threshold values of LWpD and C − Altitude are defined relative to the a priori assumed vertical correlation length (provided in the variable musica_apriori_cl). By using this ratio we can work with the same threshold value for all altitudes, because, as seen in Fig. 11, LWpD and C − Altitude have a similar vertical dependency as the vertical correlation length.

Continuous time series
In this section we give an example of the temporal continuity of the data. Figure 14 depicts a time series of MUSICA IASI trace gas retrieval products at the mid-latitudinal site of Karlsruhe, Germany, between October 2014 and June 2021. data filtering. Concerning δD, there is a reduced data amount in winter mainly due to filtering out data with reduced sensitivity (measurement response below 0.8). It is worth noting that for the {H2O,δD} pair optimal estimation product (generated a posteriori according to ?) we achieve a significantly better measurement response. We

Daily global maps
In this section we give an example of the good daily global coverage achieved by high quality MUSICA IASI products. Figure 15 depicts the data retained during 24 h when using the filter settings listed in Table 4 (the same as for Fig. 14).
The global maps of the HNO3 10-35 km partial column averaged mixing ratios show very low values in the tropics and highest values in polar regions. However, in Antarctic winter low values are also found, because at very low temperatures (< 195 K) polar stratospheric clouds (PSCs) are formed, on which HNO3 condenses. In the Arctic winter temperatures are generally not that low, and PSCs and consequently low HNO3 values are mainly restricted to areas with mountain lee wave occurrence.
The MUSICA IASI full retrieval product provides for each individual observation detailed information on retrieval settings (a priori and constraints) and retrieval characteristics (error covariances and averaging kernels). This comprehensive set of information ensures ultimate interoperability and offers possibilities for a variety of data reuse applications, in particular because the MUSICA IASI inversion problem is a moderately non-linear problem (see Appendix B). In the following we briefly list some data reuse possibilities.
For interoperability (the common use of different data sets or their inter-comparison) the impact of different a priori data should be assessed or eliminated. Assume that the MUSICA IASI data (generated using the a priori state x_a) are to be used jointly with (or inter-compared to) another remote sensing data set whose retrieval processor used the a priori state x_a,m. We can then calculate the MUSICA IASI retrieval state that would result from using x_a,m as a priori according to Eq. (B1). For these calculations we need, from the side of the MUSICA IASI data, the originally retrieved state, the a priori state, and the averaging kernels, which are all provided by the MUSICA IASI full retrieval product.
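A minimal sketch of such an a priori exchange is given below. Eq. (B1) is not reproduced in this section, so the sketch assumes the relation that follows from a linear retrieval x̂ = x_a + A (x − x_a), namely x̂_m = x̂ + (A − I)(x_a − x_a,m). For MUSICA IASI trace gases all states would be on the logarithmic scale. The function name and all numbers are hypothetical.

```python
import numpy as np


def exchange_apriori(x_hat, avk, x_a, x_a_new):
    """Retrieved state that would result from a different a priori state.

    Assumes a linear retrieval with unchanged averaging kernel:
    x_hat_new = x_hat + (A - I) (x_a - x_a_new).
    """
    n = avk.shape[0]
    return x_hat + (avk - np.eye(n)) @ (x_a - x_a_new)


# Invented toy problem on a logarithmic scale.
A = np.array([[0.7, 0.2],
              [0.1, 0.5]])
x_true = np.array([0.3, -0.1])
x_a = np.array([0.0, 0.0])
x_a_m = np.array([0.2, 0.1])

x_hat = x_a + A @ (x_true - x_a)            # original retrieval
x_hat_m = exchange_apriori(x_hat, A, x_a, x_a_m)
print(x_hat_m)
```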

For comparisons to atmospheric model simulation or for data assimilation applications a remote sensing product has to be made available together with full information about its error covariances and measurement operator. This is the case for the MUSICA IASI full retrieval product data set. For each individual observation the averaging kernels are made available and the full a posteriori covariances and the error covariances due to the fit residuals can be reconstructed from the provided constraint and the averaging kernel matrices according to Eqs. (A7) and (16), respectively.

As shown in Sect. 7 and Appendix C the MUSICA IASI trace gas profiles can be easily resampled according to user-specific needs in the form of partial column averaged mixing ratios with corresponding averaging kernels and error covariances. This is possible because the data set provides the full information on pressure profiles, constraints (for reconstructing the error covariances due to the corresponding fit residuals, see Eq. (16)), temperature cross kernels A_T (in order to calculate the error covariances due to atmospheric temperature uncertainties, see Eq. (15)), and averaging kernels.

? and ? discussed the advantages of a CH4/N2O ratio product. ? showed that this ratio product has a theoretically higher precision than the individual N2O and CH4 products. Because N2O is chemically more stable in the troposphere than CH4, it is also more homogeneously distributed than CH4. ? argued that by combining CH4/N2O ratio observations with a model of the N2O climatology, it should be possible to determine tropospheric CH4 concentrations with relatively high precision. The MUSICA IASI full retrieval product provides information on constraints and the averaging kernels (including the cross averaging kernels between N2O and CH4); thus it offers everything needed for calculating the CH4/N2O ratio product as well as the corresponding averaging kernels and error covariances.
Another interesting data reuse possibility is that the retrievals' a priori data or the retrievals' constraints can be modified a posteriori in accordance with particular user requirements. According to Eq. (18) of ? we can calculate the retrieval result (x_m) for a modified constraint (R_m) by: Here x_a, A, S_x̂,noise, and x̂ are the a priori state, the averaging kernel, the error covariance due to retrieval fit noise, and the originally retrieved state, respectively. All this information is made available in (or can be reconstructed from the information provided by) the MUSICA IASI full retrieval product. ? presents an optimal estimation {H2O,δD} pair product, which among others makes use of such a posteriori constraint modification.
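One way in which such an a posteriori constraint modification can work is sketched below. The sketch treats the originally retrieved state as a pseudo-measurement with "Jacobian" A and noise covariance S_x̂,noise and applies a new constraint R_m, i.e. x_m = x_a + (A^T S_x̂,noise^-1 A + R_m)^-1 A^T S_x̂,noise^-1 (x̂ − x_a). This form is derived here under the assumption of a linear problem and standard optimal estimation relations; it uses the quantities listed above but is not necessarily identical to the equation of the cited work. Function name and numbers are hypothetical.

```python
import numpy as np


def modify_constraint(x_hat, x_a, avk, s_noise, r_new):
    """Retrieval result for a modified constraint r_new (linear assumption)."""
    w = avk.T @ np.linalg.inv(s_noise)            # A^T S_noise^-1
    return x_a + np.linalg.solve(w @ avk + r_new, w @ (x_hat - x_a))


# Synthetic linear test problem (invented numbers).
rng = np.random.default_rng(1)
K = rng.normal(size=(6, 3))
S_y_inv = np.eye(6) / 0.1
R = np.diag([1.0, 2.0, 4.0])          # original constraint
R_m = np.diag([0.5, 0.5, 0.5])        # modified constraint
x_a = np.zeros(3)
y = K @ np.array([0.3, -0.2, 0.1])    # noise-free synthetic measurement

F = K.T @ S_y_inv @ K
G = np.linalg.solve(F + R, K.T @ S_y_inv)     # original gain matrix
A = G @ K
S_noise = G @ (0.1 * np.eye(6)) @ G.T
x_hat = x_a + G @ (y - K @ x_a)               # original retrieval

x_m = modify_constraint(x_hat, x_a, A, S_noise, R_m)
x_m_direct = x_a + np.linalg.solve(F + R_m, K.T @ S_y_inv @ (y - K @ x_a))
print(np.allclose(x_m, x_m_direct))
```

The comparison with the direct retrieval using R_m illustrates that, in the linear case, no new retrieval run is needed.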

? presents another possibility for MUSICA IASI data reuse. They apply the extensive information provided in the MUSICA IASI full retrieval product for optimally combining MUSICA IASI CH4 data with the total column XCH4 retrieval products
$$\hat{x}_c = \hat{x} + \hat{S} a_n \left( a_n^T \hat{S} a_n + S_{\hat{x}_n,\mathrm{noise}} \right)^{-1} \left[ \hat{x}_n - x_a - \left( a_n^T \hat{x} - a_n^T \mathbf{x}_a \right) \right].$$
Here the vector $\hat{x}_c$ is the optimally combined state, the row vector $a_n^T$ is the column averaging kernel of the TROPOMI XCH4 observation, the scalar $x_a$ is the a priori XCH4 data, and the vector $\mathbf{x}_a$ is the a priori CH4 profile. $\hat{S}$ is the a posteriori covariance of the MUSICA IASI data, which can be reconstructed from the available averaging kernel and constraint matrices according to Eq. (A7). The scalar $S_{\hat{x}_n,\mathrm{noise}}$ is the measurement noise error variance of the TROPOMI XCH4 product. Optimal means here that the uncertainties and sensitivities of the MUSICA IASI CH4 product and the TROPOMI XCH4 product are correctly taken into account.
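The combination formula above translates directly into a few lines of array code. The sketch below is illustrative only: the function and argument names are hypothetical, all numbers are invented, and real applications would work with the actual state representation of the two products.

```python
import numpy as np


def combine_with_column(x_hat, s_hat, x_a_profile, a_n, x_n, x_a_column,
                        s_n_noise):
    """Optimally combine a retrieved profile with a total column observation.

    x_hat       : retrieved profile state (vector)
    s_hat       : a posteriori covariance of the profile product
    x_a_profile : a priori profile (vector)
    a_n         : column averaging kernel of the column product (vector)
    x_n         : retrieved column value (scalar)
    x_a_column  : a priori column value (scalar)
    s_n_noise   : noise error variance of the column product (scalar)
    """
    innovation = x_n - x_a_column - (a_n @ x_hat - a_n @ x_a_profile)
    gain = s_hat @ a_n / (a_n @ s_hat @ a_n + s_n_noise)
    return x_hat + gain * innovation


# Invented two-level toy example.
x_hat = np.array([1.0, 2.0])
S_hat = np.eye(2)
x_a_prof = np.array([1.0, 2.0])
a_n = np.array([0.5, 0.5])

x_c = combine_with_column(x_hat, S_hat, x_a_prof, a_n,
                          x_n=2.0, x_a_column=1.5, s_n_noise=0.5)
print(x_c)
```

A large noise variance of the column product drives the gain towards zero, so the combined state then stays close to the original profile.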

Summary and outlook
Measurements of the IASI instruments on the three satellites Metop-A, -B, and -C have been processed by the MUSICA IASI processor. The processing has been made globally for all measurements that are declared as likely cloud-free by the EUMETSAT L2 PPF cloud detection procedure. Here we report on the full retrieval product of the MUSICA IASI processing The full retrieval product is the comprehensive output of the main MUSICA IASI processing chain. It contains the simulated and the residual radiances (the difference between measured and simulated radiances), some flags and retrieval outputs provided by the EUMETSAT L2 PPF processing, full information on the MUSICA IASI retrieval settings, and the full MUSICA IASI retrieval output. For each observation we provide information on the MUSICA IASI a priori settings and constraints, so that the data are very easily reproducible. The retrieval outputs are the trace gas profiles of H2O, HDO, N2O, CH4, and HNO3 as well as the atmospheric temperature profiles. Concerning H2O and HDO the retrieval is optimised for H2O and the ratio of HDO/H2O. All products are provided with a very extensive characterisation. For each individual retrieval the leading errors are made available together with the averaging kernels. In order to reduce the data volume the kernels are provided in a compressed data format and can be reconstructed by simple matrix calculations. In addition we provide variables with averaging kernel metrics that capture the most important characteristics of the vertical representativeness (sensitivity and vertical resolution). These variables can be used for identifying data with an acceptable vertical representativeness without the need for reconstructing the averaging kernels. We give some suggestions on how to use different flags, error information, and averaging kernel metrics for data filtering, as recommended for the study of global distribution maps or time series.
The output of a priori states and averaging kernels for each individual observation guarantees ultimate interoperability (the common use of different data sets or their inter-comparison). Furthermore, the additional supply of constraint matrices for each individual observation together with the averaging kernels enables us to reconstruct the a posteriori covariances and the retrieval fit noise error covariance. Having all this information available offers excellent data reuse possibilities.
We can a posteriori adjust the a priori or the constraints to specific user needs or optimally combine the MUSICA IASI products with other remote sensing products without the need for running new retrievals. HNO3, which are accounted for during the postprocessing step. In version 3.3.0 these inconsistencies are already addressed before running the retrievals. This is the only difference between the two processing versions and it is actually not noticeable by the data user. The report on version 3.2.1 data provided here is equally valid for version 3.3.0 data. MUSICA IASI data for observations after June 2019 (processed using version 3.3.0) will soon be made available to the public in the same format as the data presented here.

Data availability
The MUSICA IASI data can be freely downloaded at http://www.imk-asf.kit.edu/english/musica-data.php. We offer two data packages with DOI. The first data package has a data volume of about 17.5 GB and is linked to https://doi.org/10.35097/408 (?). It contains example standard output data files for all MUSICA IASI retrievals made for a single day (more than 0.6 million) and a description of how to access the total dataset (2014-2019, data volume 25 TB) or parts of it. This data package is for users interested in the typical global daily data coverage and in information about how to download the large data volumes of global daily data for longer periods. The second data package contains the extended output data file, has only about 73 MB, and is linked to https://doi.org/10.35097/412 (?). It contains retrieval products for only 74 observations made at a polar, mid-latitudinal, and tropical location. It provides the same variables as the standard output files and in addition the variables with the prefixes musica_jac_ and musica_gain_, which are Jacobians (or spectral responses) for many different uncertainty sources and gain matrices (due to these additional variables it is called the extended output file). Because this data package is rather small, it is recommended to potential reviewers and to users who want a quick look at the data.
MUSICA IASI data processing is ongoing. For IASI observations after June 2019 the MUSICA IASI processing version 3.3.0 is used instead of 3.2.1 (the differences between the two processing versions are of a technical nature and not noticeable by the data user). Data representing observations after 2019 will soon be made publicly available in the same format as the data presented here (such data are already depicted in Fig. 14).

Appendix A: Basics of retrieval theory and notations
This appendix gives an overview of the theoretical basics and notations of optimal estimation remote sensing retrieval methods. It is meant as a compilation of the most important equations related to the discussions provided in this paper. Although it is similar to Section 2.1 of ?, we think it is a helpful support for readers who are not experts in the field. Further details on remote sensing retrievals can be found in ?. For a general introduction to vector and matrix algebra we recommend dedicated textbooks.
Atmospheric remote sensing means that the atmospheric state is retrieved from the radiation measured after having interacted with the atmosphere. This interaction of radiation with the atmosphere is modeled by a radiative transfer model (also called the forward model, F), which enables relating the measurement vector and the atmospheric state vector by:
$$y = F(x, b). \tag{A1}$$
We measure y (the measurement vector, e.g., a thermal nadir spectrum in the case of IASI) and are interested in x (the atmospheric state vector). Vector b represents auxiliary parameters (like surface emissivity) or instrumental characteristics (like the instrumental line shape), which are not part of the retrieval state vector. However, a direct inversion of Eq. (A1) is generally not possible, because there are many atmospheric states x that can explain one and the same measurement y.
For solving this ill-posed problem a cost function J is set up that combines the information provided by the measurement with a priori known characteristics of the atmospheric state:
$$J = [y - F(x, b)]^T S_{y,\mathrm{noise}}^{-1} [y - F(x, b)] + [x - x_a]^T R\, [x - x_a]. \tag{A2}$$
Here, the first term is a measure of the difference between the measured spectrum (represented by y) and the spectrum simulated for a given atmospheric state (represented by x), while taking into account the actual measurement noise ($S_{y,\mathrm{noise}}$ is the measurement noise covariance matrix). The second term of the cost function (A2) constrains the atmospheric solution state (x) towards an a priori most likely state ($x_a$), whereby the kind and strength of the constraint are defined by the constraint matrix R, for which we use an approximate inversion of the a priori covariance matrix $S_a$ (for more details see Sect. 4.6):
$$R \approx S_a^{-1}. \tag{A3}$$
The constrained solution is reached at the minimum of the cost function (A2). Due to the non-linear behavior of F(x, b), the minimisation is generally achieved iteratively. For the (i + 1)th iteration it is:
$$x_{i+1} = x_a + G_i \left[ y - F(x_i, b) + K_i (x_i - x_a) \right]. \tag{A4}$$
K is the Jacobian matrix (derivatives that capture how the measurement vector will change for changes in the atmospheric state x). G is the gain matrix (derivatives that capture how the retrieved state vector will change for changes in the measurement vector y). G can be calculated from K, $S_{y,\mathrm{noise}}$ and R as:
$$G = (K^T S_{y,\mathrm{noise}}^{-1} K + R)^{-1} K^T S_{y,\mathrm{noise}}^{-1}, \tag{A5}$$
with the a posteriori covariance matrix ($\hat{S}$) given by:
$$\hat{S} = (K^T S_{y,\mathrm{noise}}^{-1} K + R)^{-1}, \tag{A6}$$
which can also be written as:
$$\hat{S} = (I - A) R^{-1}, \tag{A7}$$
where I is the identity operator and A the averaging kernel matrix.
The averaging kernel is an important component of a remote sensing retrieval and it is calculated as:
$$A = G K. \tag{A8}$$
The averaging kernel A reveals how a small change of the real atmospheric state vector x affects the retrieved atmospheric state vector $\hat{x}$:
$$\hat{x} - x_a = A (x - x_a). \tag{A9}$$
The propagation of errors due to parameter uncertainties ∆b can be estimated analytically with the help of the parameter Jacobian matrix $K_b$ (derivatives that capture how the measurement vector will change for changes in the parameter b). According to Eq. (A4), using the parameter b + ∆b (instead of the correct parameter b) for the forward model calculations will result in an error in the atmospheric state vector of:
$$\Delta\hat{x} = -G K_b \Delta b. \tag{A10}$$
The respective error covariance matrix $S_{\hat{x},b}$ is:
$$S_{\hat{x},b} = G K_b S_b K_b^T G^T, \tag{A11}$$
where $S_b$ is the covariance matrix of the uncertainties ∆b.
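The matrix relations of this appendix can be illustrated with a small numerical sketch. It is not the MUSICA IASI processor: the dimensions, the random Jacobians and the diagonal covariances are toy assumptions, and the formulas in the comments are the standard optimal estimation expressions assumed for G, the a posteriori covariance, A and the parameter error covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_meas, n_state, n_par = 8, 4, 2           # toy dimensions (assumption)

K = rng.normal(size=(n_meas, n_state))     # Jacobian dF/dx (random toy values)
K_b = rng.normal(size=(n_meas, n_par))     # parameter Jacobian dF/db (toy values)
S_noise = 0.1**2 * np.eye(n_meas)          # measurement noise covariance (toy)
S_a = np.eye(n_state)                      # a priori covariance (toy)
R = np.linalg.inv(S_a)                     # constraint matrix, R ~ S_a^-1
S_b = 0.05**2 * np.eye(n_par)              # covariance of parameter uncertainties

S_noise_inv = np.linalg.inv(S_noise)
# A posteriori covariance: (K^T S_noise^-1 K + R)^-1
S_hat = np.linalg.inv(K.T @ S_noise_inv @ K + R)
# Gain matrix: S_hat K^T S_noise^-1
G = S_hat @ K.T @ S_noise_inv
# Averaging kernel: A = G K
A = G @ K
# Error covariance due to parameter uncertainties: G K_b S_b K_b^T G^T
S_x_b = G @ K_b @ S_b @ K_b.T @ G.T
# Degrees of freedom for signal (trace of the averaging kernel)
dofs = np.trace(A)
```

In this toy setup the two expressions for the a posteriori covariance agree to numerical precision, which is a convenient consistency check when kernels and constraint matrices are read from files.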
Noise on the measured radiances also affects the retrievals. The error covariance matrix for noise can be analytically calculated as:
$$S_{\hat{x},\mathrm{noise}} = G S_{y,\mathrm{noise}} G^T, \tag{A12}$$
where $S_{y,\mathrm{noise}}$ is the covariance matrix for noise on the measured radiances y.
Note that Eqs. (A5) to (A12) are only valid for a moderately non-linear inversion problem (see Chapter 5 of ?). In Appendix B we show that our inversion problem is of such kind.
Figure B1. Setup of the linearity test using modified a priori data. The left panel shows the location of the footprints of the exemplary orbit used and the right panels depict latitudinal cross sections documenting the modification of the H2O and CH4 a priori data (modified minus original; the thick solid violet line indicates the tropopause altitude).

Appendix B: Linearity
As outlined in Sect. 4 the MUSICA IASI processor uses a logarithmic scale for constraining the trace gas retrievals. We strongly recommend working on the logarithmic scale for the analytic treatment of the trace gas states. This is very obvious  (18) and it can also be used for modifying the retrieval settings without the need of performing new computationally expensive retrieval calculations (see Chapter 10 of ?). However, a requirement for the analytic treatment is that the problem is moderately non-linear (linearisation is adequate for the analytic treatment, not for finding the solution, see Chapter 5 of ?). In this Appendix we demonstrate that our problem is indeed moderately non-linear as long as we perform the calculations on the logarithmic scale.

B1 Setup of the linearity test
We test the validity of assuming linearity for the analytic treatment by performing retrievals with different a priori settings. The standard setting is described in Sect. 4.5. It has a dependence on latitude as well as on seasonal and interannual time scales. For the test we perform additional retrievals with a priori data that have no latitudinal dependence, i.e. we use for all latitudes a latitudinal mean a priori profile. The additional retrievals are made for the Metop-A orbit #51267, whose footprints are depicted on the left of Fig. B1. We choose this orbit because it has a good global representativeness: the first part consists of observations over land and covers many different latitudes (Western Asia to South Africa) and the second part of observations over sea from pole to pole (Pacific Ocean). The right panels of Fig. B1 show the differences between the modified latitudinal mean a priori profile and the a priori profiles used for the standard retrieval (a priori from Sect. 4.5). We investigate here retrievals of H2O and CH4. For H2O the standard a priori profiles have a large latitudinal dependence, and the difference to the latitudinal mean a priori profile is occasionally even outside ±200%. For CH4 there is also a clear latitudinal dependence in the standard a priori profiles, which is, however, much smaller than for H2O: below the stratosphere the difference with respect to the latitudinal mean CH4 profile is within ±10%.
Figure B2. Results of the individually performed linearity tests for H2O and CH4. Shown are the following three latitudinal cross sections: (1) RMS values between the original retrieval (retrieval results using the original a priori data) and the modified retrieval (retrieval results using the modified a priori data), (2) RMS values between the modified retrieval and the original retrieval after performing the a posteriori adjustment according to Eq. (B1) on logarithmic scale, (3) same as (2), but for the a posteriori adjustment performed on linear scale. The thick solid violet line indicates the tropopause altitude.

According to Eq. (A9) we can also simulate the retrieval for the modified a priori by:
$$\hat{x}_m = \hat{x} + (I - A)(x_{a,m} - x_a). \tag{B1}$$
Here $\hat{x}_m$ is the retrieval result that would be obtained using the modified a priori, $\hat{x}$ is the original retrieval result, I is the identity matrix, A the averaging kernel matrix, $x_{a,m}$ the modified a priori, and $x_a$ the original a priori.
The linearity test consists of comparing the results obtained by the full retrieval using the modified a priori data and the results obtained by using the analytic treatment according to Eq. (B1).
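The logic of this test can be sketched for a strictly linear toy problem, where the analytic a priori adjustment reproduces the full retrieval exactly. Everything in the sketch is an assumption made for illustration (random Jacobian, identity constraint matrix, state vector interpreted as logarithms of mixing ratios, the helper `retrieve`); the real test uses full non-linear MUSICA IASI retrievals, for which the agreement is only approximate.

```python
import numpy as np

rng = np.random.default_rng(1)
n_meas, n_state = 10, 5                     # toy dimensions (assumption)

K = rng.normal(size=(n_meas, n_state))      # toy Jacobian of a linear forward model
S_noise_inv = np.eye(n_meas) / 0.1**2       # inverse noise covariance (toy)
R = np.eye(n_state)                         # constraint matrix (toy)

G = np.linalg.inv(K.T @ S_noise_inv @ K + R) @ K.T @ S_noise_inv
A = G @ K                                   # averaging kernel
I = np.eye(n_state)


def retrieve(y, x_a):
    """Hypothetical linear retrieval: x_hat = x_a + G (y - K x_a)."""
    return x_a + G @ (y - K @ x_a)


x_true = rng.normal(size=n_state)           # "true" state, e.g. ln(mixing ratio)
y = K @ x_true + 0.1 * rng.normal(size=n_meas)

x_a = np.zeros(n_state)                     # original a priori
x_a_mod = np.full(n_state, 0.3)             # modified a priori

x_hat = retrieve(y, x_a)                    # original retrieval
x_hat_mod_full = retrieve(y, x_a_mod)       # full retrieval with modified a priori
# Analytic adjustment: x_hat_m = x_hat + (I - A)(x_a_mod - x_a)
x_hat_mod_analytic = x_hat + (I - A) @ (x_a_mod - x_a)

max_abs_diff = np.max(np.abs(x_hat_mod_full - x_hat_mod_analytic))
```

For a moderately non-linear problem, `max_abs_diff` would not vanish but should stay small compared to the change caused by the modified a priori, which is what the RMS cross sections of the linearity test quantify.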

B2 Test results for logarithmic and linear scale
The results of the linearity test are shown in Fig. B2. We demonstrate the impact of the modified a priori by calculating the differences between the original retrieval and the additional retrieval using the modified a priori profiles. We make a latitude-dependent characterisation of these differences by calculating root-mean-square (RMS) values of the differences within 5° latitude bands. Latitudinal cross sections of these RMS differences are depicted on the left of Fig. B2 and reveal that the impact of the modified a priori on the retrieval is largest at the winter polar regions (high southern latitudes). This is where we find large differences between the original and the modified a priori (see Fig. B1) and where at the same time the retrieval sensitivity is relatively low (see DOFS maps in Fig. 10).
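The RMS statistic within latitude bands can be computed along the following lines. This is a minimal sketch with hypothetical names (`rms_by_latitude_band`, `lat`, `diff`) and made-up numbers; the array layout (observations by altitude levels) is an assumption and not the layout of the MUSICA IASI files.

```python
import numpy as np


def rms_by_latitude_band(lat, diff, band_width=5.0):
    """Return band centres and the RMS of `diff` per latitude band and level.

    lat:  (n_obs,) latitudes in degrees north
    diff: (n_obs, n_levels) differences between two retrievals
    """
    edges = np.arange(-90.0, 90.0 + band_width, band_width)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Band index per observation; clip so that lat = +90 falls into the last band.
    idx = np.clip(np.digitize(lat, edges) - 1, 0, len(centres) - 1)
    rms = np.full((len(centres), diff.shape[1]), np.nan)  # NaN marks empty bands
    for b in range(len(centres)):
        sel = idx == b
        if sel.any():
            rms[b] = np.sqrt(np.mean(diff[sel] ** 2, axis=0))
    return centres, rms


# Made-up example: five observations, two altitude levels.
lat = np.array([-88.0, -87.0, 2.0, 3.0, 4.0])
diff = np.array([[3.0, 0.0], [4.0, 0.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
centres, rms = rms_by_latitude_band(lat, diff)
```

Plotting `rms` against `centres` and altitude would give a latitudinal cross section of the kind described above.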
The centre and right columns of Fig. B2 show the 5° latitude band RMS values for differences between the additional retrieval using the modified a priori profiles and the modification according to the analytic calculations of Eq. (B1). The centre column shows the results when performing the calculations of Eq. (B1) on the logarithmic scale. We observe that with the analytic calculations we can achieve almost the same results as with the full retrieval calculations. This indicates that the assumption of linearity for such analytic calculations is indeed valid. [...] Fig. B1 and the corresponding discussion): for the test the modification of the CH4 a priori is weak, but for H2O the modification is rather strong.
In summary, the test shows that the assumption of linearity needed for an analytic treatment of the MUSICA IASI trace gas data is valid. Nevertheless, we have to be careful. Because the retrievals are performed on the logarithmic scale, the analytic calculations that use the averaging kernels, gain matrices, or constraint matrices should also be performed on the logarithmic scale. On this scale the linearity assumption is valid. On the linear scale, in contrast, the linearity assumption is not valid, meaning that an analytic treatment on the linear scale can lead to large errors.
Appendix C: Partial column averaged mixing ratios
For converting mixing ratio profiles into amount profiles we set up a pressure weighting operator Z, as a diagonal matrix with the following entries:
$$Z_{i,i} = \frac{\Delta p_i}{g_i\, m_{\mathrm{air}} \left( 1 + \frac{m_{\mathrm{H2O}}}{m_{\mathrm{air}}}\, \hat{x}_i^{\mathrm{H2O}} \right)}. \tag{C1}$$
Using the pressure $p_i$ at atmospheric grid level i we set $\Delta p_1 = \frac{p_2 - p_1}{2} - p_1$, $\Delta p_{nal} = p_{nal} - \frac{p_{nal} - p_{nal-1}}{2}$, and $\Delta p_i = \frac{p_{i+1} - p_i}{2} - \frac{p_i - p_{i-1}}{2}$ for 1 < i < nal. Furthermore, $g_i$ is the gravitational acceleration at level i, $m_{\mathrm{air}}$ and $m_{\mathrm{H2O}}$ the molecular mass of dry air and water vapour, respectively, and $\hat{x}_i^{\mathrm{H2O}}$ the retrieved water vapour mixing ratio at level i.

(C2)
We can combine the operators Z and $W^T$ and calculate a pressure weighted resampling operator by:
$$W^{*T} = (W^T Z W)^{-1} W^T Z. \tag{C3}$$
This operator resamples linear scale mixing ratio profiles into linear scale partial column averaged mixing ratio profiles.
With operator $W^{*T}$ we can calculate a coarse gridded partial column averaged state $\hat{x}^*$ from the fine gridded linear mixing ratio state $\hat{x}$ by:
$$\hat{x}^* = W^{*T} \hat{x}. \tag{C4}$$
Furthermore, we introduce an operator M for transferring the differentials from logarithmic mixing ratio scale to differentials in linear mixing ratio scale. It is a diagonal matrix having the elements of the linear scale atmospheric mixing ratio state as the diagonal elements:
$$M = \mathrm{diag}(\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{nal}). \tag{C5}$$
The averaging kernel matrix for the partial column averaged mixing ratio state can then be calculated from the fine gridded logarithmic scale kernel matrix (A) as:
$$A^* \approx W^{*T} M A M^{-1} W. \tag{C6}$$
This kernel describes how a change in the partial column averaged mixing ratios affects the retrieved partial column averaged mixing ratios. It is an approximation, because on the right side the diagonal values of M should be the actual mixing ratios instead of those retrieved. The matrix W is an interpolation matrix that resamples the retrieved coarse gridded partial column averaged mixing ratio profiles as a fine gridded mixing ratio profile without modifying the [...]
Similarly, the covariances of the partial column averaged mixing ratio state can be calculated from the corresponding covariance matrices of the fine gridded logarithmic scale (S) by:
$$S^* \approx W^{*T} M S M^T W^{*}. \tag{C7}$$
Here the approximation is because ∆x ≈ x ∆ln x.
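The chain of operators in this appendix can be sketched numerically as follows. All inputs are toy assumptions: W is taken to be a matrix of ones and zeros that assigns each fine grid level to one coarse layer, the diagonal of Z holds arbitrary positive weights, the resampling operator is assumed to have the form (W^T Z W)^-1 W^T Z, and the kernel and covariance matrices are made up.

```python
import numpy as np

n_fine, n_coarse = 6, 2                      # toy grid sizes (assumption)
rng = np.random.default_rng(2)

# Assumption: each fine level belongs to exactly one coarse layer (blocks of ones).
W = np.zeros((n_fine, n_coarse))
W[:3, 0] = 1.0
W[3:, 1] = 1.0

# Assumption: arbitrary positive pressure weights on the diagonal of Z.
Z = np.diag(np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5]))

# Pressure weighted resampling operator (assumed form).
W_star_T = np.linalg.inv(W.T @ Z @ W) @ W.T @ Z

x_hat = np.array([1.8, 1.8, 1.7, 1.5, 1.2, 0.9])   # linear scale mixing ratios (toy)
x_star = W_star_T @ x_hat                          # partial column averaged state

M = np.diag(x_hat)                                 # maps d(ln x) to dx, since dx ~ x d(ln x)
A = 0.5 * np.eye(n_fine) + 0.05 * rng.normal(size=(n_fine, n_fine))  # toy log-scale kernel
S = 0.01 * np.eye(n_fine)                          # toy log-scale covariance

# Kernel and covariance of the partial column averaged state (approximations).
A_star = W_star_T @ M @ A @ np.linalg.inv(M) @ W
S_star = W_star_T @ M @ S @ M.T @ W_star_T.T
```

With these assumptions a vertically constant mixing ratio profile is mapped onto the same constant value in every coarse layer, which is the behaviour expected from a column averaging operator.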
Author contributions. Matthias Schneider set up the MUSICA IASI retrieval, designed the netcdf CF conform MUSICA IASI output files, made the calculations in context of the extended output file, developed and performed the compression of the averaging kernel output, and