Overview and update of the SPARC Data Initiative: comparison of stratospheric composition measurements from satellite limb sounders

The Stratosphere-troposphere Processes and their Role in Climate (SPARC) Data Initiative (SPARC, 2017) performed the first comprehensive assessment of currently available stratospheric composition measurements obtained from an international suite of space-based limb sounders. The initiative’s main objectives were (1) to assess the state of data availability, (2) to compile time series of vertically resolved, zonal monthly mean trace gas and aerosol fields, and (3) to perform a detailed intercomparison of these time series, summarizing useful information and highlighting differences among datasets. The datasets extend over the region from the upper troposphere to the lower mesosphere (300–0.1 hPa) and are provided on a common latitude–pressure grid. They cover 26 different atmospheric constituents, including the stratospheric trace gases of primary interest, ozone (O3) and water vapor (H2O), major long-lived trace gases (SF6, N2O, HF, CCl3F, CCl2F2, NOy), trace gases with intermediate lifetimes (HCl, CH4, CO, HNO3), and shorter-lived trace gases important to stratospheric chemistry, including nitrogen-containing species (NO, NO2, NOx, N2O5, HNO4), halogens (BrO, ClO, ClONO2, HOCl), and other minor species (OH, HO2, CH2O, CH3CN), and aerosol. This overview of the SPARC Data Initiative introduces the updated versions of the SPARC Data Initiative time series for the extended time period 1979–2018 and provides information on the satellite instruments included in the assessment: LIMS, SAGE I/II/III, HALOE, UARS-MLS, POAM II/III, OSIRIS, SMR, MIPAS, GOMOS, SCIAMACHY, ACE-FTS, ACE-MAESTRO, Aura-MLS, HIRDLS, SMILES, and OMPS-LP.
It describes the Data Initiative’s top-down climatological validation approach to compare stratospheric composition measurements based on zonal monthly mean fields, which provides upper bounds on relative inter-instrument biases and an assessment of how well the instruments are able to capture geophysical features of the stratosphere. An update to previously published evaluations of O3 and H2O monthly mean time series is provided. In addition, example trace gas evaluations of methane (CH4), carbon monoxide (CO), a set of nitrogen species (NO, NO2, and HNO3), the reactive nitrogen family (NOy), and hydroperoxyl (HO2) are presented. The results highlight the quality, strengths and weaknesses, and representativeness of the different datasets. As a summary, the current state of our knowledge of stratospheric composition and variability is provided based on the overall consistency between the datasets.


Introduction
The past four decades, starting in the late 1970s, represent a "golden age" of stratospheric composition measurements from satellite limb sounders, which capture the spatiotemporal structure of stratospheric composition with a vertical resolution of approximately 1 to 5 km. These limb observations have been used extensively to monitor the state of the stratospheric ozone layer that protects human and ecosystem health (e.g., Harris et al., 2015; WMO, 2011, 2014, 2018) and to study the processes leading to anthropogenic ozone depletion (e.g., Manney et al., 1994; Dessler et al., 1995; Santee et al., 2008). Such research provided the crucial scientific basis that underpinned actions taken under the Montreal Protocol and its amendments for the protection of the ozone layer, which is considered to be the most successful international treaty on an environmental issue to date. Limb observations, and merged products thereof, are also becoming increasingly important for the detection and attribution of climate change and potential feedback mechanisms, including the role of trends and variability in stratospheric water vapor and aerosol in the radiative forcing of climate (e.g., Solomon et al., , 2011; Gilford et al., 2016; Schmidt et al., 2018).
More generally, limb observations are used for the study of stratospheric dynamics and transport (e.g., Gray and Pyle, 1986; Solomon et al., 1986; Holton and Choi, 1988; Funke et al., 2005a; Manney et al., 2009), empirical studies of stratospheric climate and variability (e.g., Randel et al., 2006, 2010; Randel and Thompson, 2011; Manney et al., 2008; Bourassa et al., 2010; Stiller et al., 2012; Gille et al., 2014), data merging, and trend evaluation activities (e.g., Randel and Wu, 1999; Hegglin et al., 2014; Shepherd et al., 2014; Froidevaux et al., 2015; Harris et al., 2015; Davis et al., 2016; Arosio et al., 2019; SPARC, 2019), with merged datasets also being used as forcing databases in climate models (e.g., Cionni et al., 2011, for ozone; Thomason et al., 2018, for aerosol) and for the validation of the representation of transport and chemistry in numerical models (e.g., Eyring et al., 2006; Gettelman et al., 2010; Hegglin et al., 2010; Strahan et al., 2011; Kolonjari et al., 2018; Froidevaux et al., 2019).
The validity of any data and trend analysis, however, strongly depends on the understanding of the observational uncertainty and overall quality of the datasets used, which hitherto was deemed unsatisfactory (SPARC, 2010). Uncertainty and bias estimates are particularly important to inform chemical data assimilation systems (Inness et al., 2013; Errera et al., 2016) and to develop observational metrics for the evaluation of model performance (Douglass et al., 1999; Waugh and Eyring, 2008). In response to this need, the Stratosphere-troposphere Processes and their Role in Climate (SPARC) core project of the World Climate Research Programme (WCRP) initiated the SPARC Data Initiative with the aim of coordinating a comprehensive assessment of available vertically resolved chemical trace gas and aerosol observations obtained from an international suite of satellite limb sounders. The SPARC Data Initiative's main objectives were (1) to assess the availability of datasets, (2) to compile time series of vertically resolved, zonal monthly mean trace gas and aerosol fields, and (3) to perform a detailed intercomparison of these time series, summarizing useful information and highlighting differences among datasets. The SPARC Data Initiative thereby complements other SPARC activities that have focused on the assessment of stratospheric ozone (e.g., Harris et al., 2015; SPARC, 2019), water vapor (SPARC, 2000; Khosrawi et al., 2018; Lossow et al., 2019), and aerosol (SPARC, 2006; Kremser et al., 2016). The provision of error estimates for atmospheric temperature and composition measurements from space following a unified methodological approach, which the SPARC Data Initiative (SPARC, 2017) identified as a missing component of its analysis, is now the focus of the SPARC Towards Unified Error Reporting (TUNER) Initiative.
A first application of the SPARC Data Initiative zonal monthly mean time series is the evaluation of stratospheric ozone and water vapor in global reanalyses as part of the SPARC Reanalysis Intercomparison Project (S-RIP). SPARC Data Initiative gridded datasets have also been contributed to the annual State of the Climate reports in the Bulletin of the American Meteorological Society.
Here, we present an update of the SPARC Data Initiative (SPARC, 2017), which focused on composition measurements from 1979 to 2010, extending its evaluation of the gridded data time series up to the end of 2018 (see Fig. 1). The update features gridded datasets based on more recent retrieval versions and adds the observations of OMPS-LP (on Suomi-NPP) and SAGE III/ISS to the original list of satellite limb sounders presented in SPARC (2017) (LIMS, SAGE I/II/III, HALOE, UARS-MLS, POAM II/III, OSIRIS, SMR, MIPAS, GOMOS, SCIAMACHY, ACE-FTS, ACE-MAESTRO, Aura-MLS, HIRDLS, and SMILES; see Sect. 2 for the full definitions of these acronyms). The gridded datasets include the stratospheric trace gases of primary interest (O3 and H2O), major long-lived trace gases (SF6, N2O, HF, CCl3F, CCl2F2, NOy), trace gases with intermediate lifetimes (HCl, CH4, CO, HNO3), and shorter-lived trace gases important to stratospheric chemistry, including nitrogen-containing species (NO, NO2, NOx, N2O5, HNO4), halogens (BrO, ClO, ClONO2, HOCl), and other minor species (OH, HO2, CH2O, CH3CN), and aerosol. The observations considered have been compiled on a common latitude–pressure grid, covering the region from the upper troposphere to the lower mesosphere (300–0.1 hPa) with a latitudinal resolution of 5°. A summary of the available trace gas and aerosol gridded datasets from each instrument is given in Fig. 2. Almost half of these are based on newer data versions than those used in SPARC (2017) (highlighted in Fig. 2, with details provided in Tables 1 and 2). The data are published via Zenodo (https://doi.org/10.5281/zenodo.4265393, Hegglin et al., 2020). Note that early data versions of chemical trace gases (i.e., research products) are not included (except for the SAGE III/ISS H2O product) and that many more species could potentially be made available.
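As a rough illustration of what compiling profiles onto a common latitude–pressure grid involves, the sketch below averages one month of profiles from a single instrument into 5° latitude bins. It is a minimal sketch under stated assumptions: the profiles are taken to be already interpolated to common pressure levels, the level set and the helper name `zonal_monthly_mean` are hypothetical, and missing values are marked with NaN. It is not the SPARC Data Initiative processing code.

```python
import numpy as np

# Illustrative common pressure levels (hPa) spanning 300-0.1 hPa; these are
# assumed values, not the level set used by the SPARC Data Initiative.
PRESSURE_HPA = np.array([300, 250, 200, 150, 100, 70, 50, 30, 20, 10,
                         7, 5, 3, 2, 1, 0.5, 0.3, 0.1])

# 5-degree latitude bins from 90 S to 90 N (36 bins).
LAT_EDGES = np.arange(-90.0, 95.0, 5.0)
LAT_CENTERS = 0.5 * (LAT_EDGES[:-1] + LAT_EDGES[1:])


def zonal_monthly_mean(lats, profiles):
    """Hypothetical helper: average one month of profiles in 5-degree bins.

    lats     -- profile latitudes in degrees, shape (n_profiles,)
    profiles -- values already interpolated to PRESSURE_HPA,
                shape (n_profiles, n_levels); NaN marks missing data
    Returns (mean, count), each of shape (n_lat_bins, n_levels).
    """
    lats = np.asarray(lats, dtype=float)
    profiles = np.asarray(profiles, dtype=float)
    n_bins = LAT_CENTERS.size
    mean = np.full((n_bins, profiles.shape[1]), np.nan)
    count = np.zeros((n_bins, profiles.shape[1]), dtype=int)
    # digitize gives i with LAT_EDGES[i-1] <= lat < LAT_EDGES[i];
    # clip so that a latitude of exactly 90 falls in the last bin.
    bin_idx = np.clip(np.digitize(lats, LAT_EDGES) - 1, 0, n_bins - 1)
    for b in range(n_bins):
        in_bin = profiles[bin_idx == b]
        if in_bin.size == 0:
            continue  # no measurements: leave NaN with a count of 0
        count[b] = np.sum(~np.isnan(in_bin), axis=0)
        sums = np.nansum(in_bin, axis=0)
        mean[b] = np.where(count[b] > 0, sums / np.maximum(count[b], 1), np.nan)
    return mean, count
```

Returning the number of contributing profiles alongside each bin mean is a deliberate choice in this sketch, since sparsely sampled bins can then be flagged or screened before any intercomparison.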
A handful of early satellite limb sounders, such as the Stratospheric and Mesospheric Sounder (SAMS) on Nimbus 7 (Jones et al., 1986), the Improved SAMS (ISAMS) (Taylor et al., 1993) and the Cryogenic Limb Array Etalon Spectrometer (CLAES) (Roche et al., 1993) on UARS, the Atmospheric Trace Molecule Spectroscopy (ATMOS) (Gunson et al., 1996) and the Millimeter-Wave Atmospheric Sounder (MAS) (Hartmann et al., 1996) on the Atlas Space Shuttle missions, and the Improved Limb Atmospheric Spectrometer (ILAS) on the Advanced Earth Observing Satellite (ADEOS) (Sasano et al., 1999), could not be evaluated in this assessment because of a lack of resources and because their time series are generally shorter than those of the other datasets.
The paper is organized as follows. Section 2 provides information on the participating satellite instruments, which vary in terms of measurement method, geographical coverage, spatial and temporal sampling and resolution, time period, and retrieval algorithm. The methodology used to create and compare the trace gas and aerosol time series is described in Sect. 3. The SPARC Data Initiative introduced a top-down climatological validation approach to the evaluation of stratospheric composition measurements (Tegtmeier et al., 2013, 2016; SPARC, 2017), based on the comparison of gridded trace gas and aerosol datasets. This top-down approach complements (but does not replace) the more traditional validation approach that uses coincident profile measurements and sometimes focuses on bottom-up error budgets to characterize measurement uncertainty. The top-down climatological validation approach has the advantages that it is consistent between all instruments, avoids sensitivity to arbitrary coincidence criteria, and generally produces larger sample sizes, which reduces the random part of the measurement error (in other words, random fluctuations largely average out). Because averaging suppresses the noise of single measurements, the SPARC Data Initiative approach allows us to obtain upper bounds on systematic biases between instruments. Importantly, it also enables an assessment of the latitude dependence of these systematic biases. This work also provides unique information on how well the different instruments are capable of capturing distinct chemical and geophysical features in stratospheric composition, with the consistency among the instruments constraining our current knowledge of the state of the stratosphere.
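The effect of averaging on random error can be illustrated with synthetic data. In the sketch below, two hypothetical instruments observe the same constant value with independent, normally distributed noise, and one of them carries a fixed systematic offset. As the number of profiles N grows, the difference of the two means approaches the offset while its standard error shrinks roughly as 1/√N. The noise level, the offset, and the function `mean_difference` are illustrative assumptions; real instruments also differ in their sampling, which can introduce sampling biases that averaging alone does not remove.

```python
import numpy as np


def mean_difference(n_profiles, true_value=5.0, bias=0.2, noise=1.0, seed=0):
    """Return (mean_diff, std_error) between two synthetic instruments.

    Instrument A measures true_value + noise; instrument B additionally
    carries a constant systematic bias. All values are illustrative and
    not taken from any real dataset.
    """
    rng = np.random.default_rng(seed)
    a = true_value + rng.normal(0.0, noise, n_profiles)
    b = true_value + bias + rng.normal(0.0, noise, n_profiles)
    diff = b.mean() - a.mean()
    # Standard error of the difference of two independent sample means.
    std_err = np.sqrt(a.var(ddof=1) / n_profiles + b.var(ddof=1) / n_profiles)
    return diff, std_err


for n in (10, 100, 10_000):
    d, se = mean_difference(n)
    print(f"N={n:>6}: mean difference = {d:+.3f} +/- {se:.3f} (true bias 0.200)")
```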
Section 4 includes example trace gas evaluations of the longer-lived trace gases ozone (Sect. 4.1), water vapor (Sect. 4.2), and methane (CH4; Sect. 4.3), and of the medium- to shorter-lived trace gases carbon monoxide (CO; Sect. 4.4), nitrogen-containing species (NO, NO2, NOx, HNO3, and NOy; Sect. 4.5), and hydroperoxyl (HO2; Sect. 4.6). These evaluations all use updated versions of the datasets used in SPARC (2017), with differences from the old versions highlighted. A summary and conclusions of the updated and evaluated SPARC Data Initiative data, including an overview of our knowledge of the mean state of atmospheric trace gas distributions, are given in Sects. 5 and 6. Note that, because aerosol extinction measurements are wavelength dependent, the aerosol evaluations are based on a modified comparison approach, which will be presented in a follow-on publication. In addition to this paper, a special issue in the Journal of Geophysical Research (JGR): Atmospheres on the SPARC Data Initiative has presented the evaluations of water vapor, ozone, the comparison of ozone from limb sounders with the nadir-viewing Aura Tropospheric Emission Spectrometer (Aura-TES) instrument (Neu et al., 2014), an assessment of the impact of instrument-specific sampling patterns on measurement bias, and a single-instrument study of SMILES observations (Kreyling et al., 2013). A comparison featuring SPARC Data Initiative datasets of the long-lived species CFC-11, CFC-12, HF, and SF6 can be found in Tegtmeier et al. (2016), and an analysis of how the standard error of the mean depends on sample size for profiles obtained with a non-random sampling pattern can be found in Toohey and von Clarmann (2013). The reader is also referred to the WCRP SPARC Data Initiative Report (SPARC, 2017), which is accessible online and offers the complete assessment of all the original atmospheric trace gas and aerosol observations.

Satellite instruments
The SPARC Data Initiative (SPARC, 2017) originally evaluated observations from 18 different satellite limb sounders and, additionally, the nadir sounder Aura-TES (Beer, 2006; Beer et al., 2001). The latter instrument was used for comparisons in the upper troposphere and lower stratosphere (UTLS) only, focusing on the comparability between limb (with high-vertical-resolution measurements) and nadir sounders (with high-horizontal-resolution measurements) applying observation operators (Neu et al., 2014). In this update, TES is no longer included, but the instruments SAGE III on the ISS (hereafter SAGE III/ISS) and OMPS-LP on Suomi-NPP are added for evaluations including trace gas datasets between 2011 and 2018.
The instruments considered here all use passive remote sensing techniques, which are based on the detection of natural radiation emitted from the Sun or stars, or from the atmosphere itself (unlike active sounders such as lidars). The different instruments can be classified according to their observation geometry (limb emission, solar or stellar occultation, limb scattering, or nadir emission) and the wavelengths they are measuring at, as compiled in Table 3. In the following, we provide a short description of each instrument, with the most important instrument characteristics summarized in Tables 4 and 5, and the representative sampling patterns provided in Fig. 3. Note that the vertical range observed can depend on the retrieved species. Further information on the instrument and retrieval algorithms can be found in the SPARC Data Initiative Report (SPARC, 2017).

LIMS on Nimbus 7
The Limb Infrared Monitor of the Stratosphere (LIMS) instrument was launched aboard the Nimbus 7 satellite in October 1978 (Gille and Russell, 1984). The spacecraft occupied a Sun-synchronous orbit, crossing the ascending node at ∼ 13:00 local time (LT) and the descending node at ∼ 23:00 LT, taking observations from 64° S to 84° N latitude. LIMS used broadband radiometry to observe infrared limb emission, with two radiometer channels for sensing temperature (atmospheric CO2) centered near 15 µm, and four further channels for sensing trace gases: 6–7 µm for H2O and NO2, 9–10 µm for O3, and 11–12 µm for HNO3 (Remsberg et al., 2004). LIMS obtained radiance profiles at every ∼ 0.8° latitude along its orbital tangent-point tracks, yielding ∼ 260 profiles per orbit with ∼ 14 orbits per day. LIMS operated successfully from launch until May 1979, when the cryogen supply for cooling its detectors was finally depleted.

1979 (McCormick et al., 1979). The spacecraft was in a ∼ 600 km orbit with an inclination of 56° that allowed for solar occultation measure-

Figure 3. Representative sampling patterns for the instruments are shown in time–latitude space for solar occultation sounders to reflect annual sampling patterns (upper two rows) and in longitude–latitude space for emission/scattering and stellar occultation sounders to reflect daily sampling patterns (lower three rows). Different years or days are chosen to give a sense of change in the observed sampling patterns over time. See also Fig. 1 in  for the resulting measurement density in latitude–time space for the original SPARC Data Initiative instruments. Note that the sampling patterns of ACE-MAESTRO and POAM III are the same as those of ACE-FTS and POAM II, respectively, and thus are not shown here. The sampling pattern of SAGE I is very similar to that of SAGE II and HALOE.
The gap in the OMPS and SCIAMACHY sampling over South America results from the South Atlantic Anomaly, a dip in Earth's magnetic field that allows charged particles to penetrate to lower altitudes and consequently causes irregularities in the spectral signals recorded by these instruments.

(Mauldin III et al., 1985; McCormick et al., 1989). The spacecraft occupied a 57° inclined orbit at an altitude of ∼ 610 km that allowed for observations from 80° S to 80° N. The SAGE II instrument was a broadband spectrometer that operated in the spectral range of ∼ 375–1030 nm for aerosol and trace gas observations (Mauldin et al., 1985). SAGE II made 15 sunrise and 15 sunset measurements each day; each set covered a narrow latitude band, with consecutive events separated by ∼ 24° in longitude. After late 2000, an azimuthal pointing problem resulted in the instrument operating at a half-duty cycle. The ERBS mission was decommissioned in October 2005.
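The ∼ 24° longitude separation between consecutive SAGE II events follows from the orbital period T: each orbit yields one sunrise and one sunset event, and during one period the Earth rotates by about 360° × T / (24 h) relative to the Sun. The sketch below estimates T from Kepler's third law for a circular orbit at ∼ 610 km. It is a simplification that neglects Earth's oblateness and the precession of the orbit plane, and the physical constants are standard values rather than numbers taken from this paper.

```python
import math

MU_EARTH_KM3_S2 = 398_600.4418  # Earth's gravitational parameter (km^3 s^-2)
R_EARTH_KM = 6_371.0            # mean Earth radius (km)
SOLAR_DAY_S = 86_400.0


def orbital_period_s(altitude_km):
    """Period of a circular Keplerian orbit at the given altitude (seconds)."""
    a = R_EARTH_KM + altitude_km
    return 2.0 * math.pi * math.sqrt(a**3 / MU_EARTH_KM3_S2)


def event_longitude_spacing_deg(altitude_km):
    """Longitude shift between consecutive sunrise (or sunset) events.

    There is one event of each type per orbit, and the Earth turns 360 deg
    per solar day relative to the Sun. Orbit-plane precession is ignored.
    """
    return 360.0 * orbital_period_s(altitude_km) / SOLAR_DAY_S


period = orbital_period_s(610.0)
print(f"period: {period / 60:.1f} min")                              # about 96.7
print(f"orbits per day: {SOLAR_DAY_S / period:.1f}")                 # about 14.9
print(f"event spacing: {event_longitude_spacing_deg(610.0):.1f} deg")  # about 24.2
```

The resulting ∼ 15 orbits per day is also why solar occultation instruments of this kind record about 15 sunrise and 15 sunset events daily.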

SAGE
The Stratospheric Aerosol and Gas Experiment III (SAGE III/M3M) was launched aboard the Russian Meteor-3M (M3M) spacecraft in December 2001 (Mauldin et al., 1998). The spacecraft was placed in a Sun-synchronous orbit, with an altitude of ∼ 1020 km, an inclination of 99.50°, and an equatorial crossing time (ascending node) of 09:15 local time (LT).
SAGE III/M3M provided both solar and lunar measurements, with satellite sunrise events at 60 to 30° S and satellite sunset events at 45 to 80° N. Lunar events varied from pole to pole. The SAGE III instrument used a grating spectrometer that operated in the spectral range of ∼ 295–1025 nm and a single photodiode near 1550 nm for aerosol and trace gas observations (Mauldin et al., 1998). The M3M spacecraft ceased functioning in January 2006.
The Stratospheric Aerosol and Gas Experiment III on the International Space Station (SAGE III/ISS) is the second instrument from the SAGE III project (Mauldin et al., 1998). It was launched on a SpaceX Falcon 9 rocket in February 2017. Unlike the first SAGE III instrument on the Meteor-3M spacecraft (SAGE III/M3M), SAGE III/ISS is in a mid-inclination orbit (51.6°). The solar observations can provide near-global (70° S–70° N) measurements on a monthly basis with sampling similar to that of the SAGE II measurements. SAGE III/ISS uses a grating spectrometer operating between ∼ 280 and ∼ 1035 nm as well as a single photodiode covering 1542 ± 15 nm to retrieve aerosol and other trace gases (SAGE III ATBD, 2002). It can provide vertical profiles of O3, H2O, NO2, and aerosol extinction at multiple wavelengths through the solar occultation technique. The lunar occultation measurements can augment the sampling of solar observations with measurements of O3 and NO2, as well as NO3 and ClO2. The sampling pattern and resulting monthly and annual sampling density of SAGE III/ISS are shown in Fig. 4, equivalent to what is shown for the other instruments in chap. 2 of the SPARC Data Initiative Report (SPARC, 2017).

Table 4. Satellite instrument characteristics including satellite platform, observation period, spatial coverage, vertical range, vertical resolution, data density, local time (LT) at the Equator, LT of measurement, inclination, and instrument references. Note that the vertical range is generally species dependent and the vertical resolution is often species and altitude dependent; see Tables 9–16 for details. For instruments providing data on a pressure grid, the vertical range is also given in kilometers. n/a: not applicable.

Table 5. Table 4 continued. Note that the low (high) vertical resolution along with the high (low) data density entries for MIPAS refer to the two different measurement modes MIPAS used before and after 2004, referred to as MIPAS(1) and MIPAS(2) in later tables, respectively. Also note that the ascending part of the SCIAMACHY orbit lies mostly in darkness, resulting in only a few measurements, which are not included in the SPARC Data Initiative gridded datasets (thus, the ascending measurement LT is not listed). n/a: not applicable.

The OMPS sampling gap over South America results from a filter that removes measurements affected by the South Atlantic Anomaly, which causes increased noise in measured radiances from transient particle strikes on the instrument detector.

HALOE on UARS
The Halogen Occultation Experiment (HALOE) was launched aboard the Upper Atmosphere Research Satellite (UARS) in September 1991 (Russell et al., 1993). The spacecraft occupied a 57 • inclined orbit at an altitude of ∼ 585 km that allowed for observations from 80 • S to 80 • N. The HALOE instrument used a combination of broadband radiometry and gas filter correlation techniques to observe several trace gas species in the spectral range of ∼ 2.4-10.4 µm. HALOE measured 15 sunrise and 15 sunset events per day and achieved near-global coverage in approximately a month. The daily measurement spacing was equal in longitude and varied seasonally in latitude. The UARS mission was decommissioned in December 2005.

MLS on UARS
The Upper Atmosphere Research Satellite Microwave Limb Sounder (UARS-MLS) was also launched aboard UARS in September 1991 (Barath et al., 1993; Waters et al., 1993, 1999). The spacecraft's orbit (see Sect. 2.3) allowed for MLS observations in two sets of latitude bands, alternating roughly every 36 d (as governed by spacecraft yaw maneuvers) between mostly Northern Hemisphere and mostly Southern Hemisphere latitudes, with full coverage of low latitudes at all times. UARS-MLS performed microwave thermal emission measurements using antenna scans of the Earth's limb and three radiometers to detect spectral line and continuum signals (at 1.45, 1.63, and 4.76 mm wavelengths) and to retrieve profiles of upper atmospheric temperature and trace gases, as well as upper tropospheric (UT) H2O and cloud ice water content. UARS-MLS provided more than 1300 profiles (per species) along the sub-orbital track every day, during both daytime and nighttime. The UARS-MLS measurements became increasingly sparse after 1994 in order to preserve the antenna scanning mechanism and as a result of UARS battery power limitations. The last UARS-MLS profiles were obtained in 2001, before UARS was officially decommissioned in December 2005.

POAM II/III on SPOT-3/4
The Polar Ozone and Aerosol Measurement II (POAM II) was launched aboard the SPOT-3 spacecraft in September 1993 (Glaccum et al., 1996). The spacecraft occupied a Sun-synchronous orbit, crossing the descending node at 10:30 LT, that allowed for observations in two latitude bands at 88 to 62° S and 65 to 71° N. The POAM II instrument used broadband radiometry to observe trace gases and aerosols in the spectral range of ∼ 350–1070 nm. POAM II used the solar occultation technique and made 14 measurements per day in each hemisphere, equally spaced in longitude around a circle of approximately constant latitude. Satellite sunrise measurements were made in the Northern Hemisphere (55–71° N) and sunsets in the Southern Hemisphere (63–88° S). The latitude coverage changes slowly with season and is exactly periodic from year to year. The SPOT-3 spacecraft ceased functioning in November 1996. POAM II produced a 3-year dataset (1993–1996) of polar stratospheric O3, NO2, and aerosols.

OSIRIS on Odin
The Optical Spectrograph and InfraRed Imaging System (OSIRIS) was launched aboard the Odin satellite in February 2001 (Murtagh et al., 2002; Llewellyn et al., 2004). The spacecraft occupies a 97.8° inclined, Sun-synchronous orbit, crossing the ascending node near 18:00 LT, which allows for near-global observations between 82° S and 82° N. The OSIRIS spectrograph has a single line of sight that vertically scans the Earth's limb, measuring the spectral radiance of scattered sunlight in the spectral range of 290–810 nm. OSIRIS provides approximately 500 profiles of aerosol and trace gases per day along the orbital track during daytime. Tropical latitudes are sampled throughout the year, but owing to the seasonally changing solar illumination conditions at the tangent point of the observation, the coverage of midlatitudes and high latitudes is limited to the sunlit (summer) hemisphere around the solstices, with near-global coverage for about 1 month around each equinox. The latitude coverage changes slowly with the degradation of the orbit but follows essentially the same pattern each year. OSIRIS reached 20 years in orbit in February 2021 and continues operation at the time of writing.

SMR on Odin
The Sub-Millimetre Radiometer (SMR) was also launched aboard the Odin satellite in February 2001 (Murtagh et al., 2002). See Sect. 2.6 for details on the satellite's orbit. The SMR has a single line of sight that vertically scans the Earth's limb measuring the thermal emission of the atmosphere in the 0.55 mm wavelength region. SMR provides approximately 900 profiles of trace gases per day along the orbital track. Not all gases can be measured simultaneously, but rather the tuning of the instrument is varied on a daily basis to optimize the various science goals. Thus, while some species such as O 3 are measured on a close-to-daily basis, others are only measured a few times per month. SMR reached 20 years in orbit in February 2021 and continues operation at the time of writing.

GOMOS on Envisat
The Global Ozone Monitoring by Occultation of Stars (GOMOS) instrument was launched aboard the Envisat spacecraft in March 2002. The spacecraft occupied a 98.55° inclined, Sun-synchronous polar orbit, crossing the descending node at 10:00 LT, which allowed GOMOS to make global nighttime observations. The GOMOS instrument used a grating spectrometer to observe the trace gases O3, NO2, and NO3, as well as aerosols, in the spectral range of 248–690 nm. GOMOS used the stellar occultation method and made 100–200 nighttime occultations per day. The latitude coverage of GOMOS was global, except for the summertime polar regions. The Envisat spacecraft ceased functioning in April 2012.

MIPAS on Envisat
The Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) was also launched aboard Envisat in March 2002 (Fischer et al., 2008). See Sect. 2.8 for details on the satellite's orbit, which allowed MIPAS to attain global daytime and nighttime limb emission measurements. MIPAS was a Fourier transform spectrometer that operated in the 4.3–15 µm wavelength region for trace gas, temperature, and aerosol observations (Fischer et al., 2008). From 2002 to 2004, MIPAS recorded one limb scan of spectra every 510 km and provided about 1000 vertical profiles per day. From 2005 to 2012, the along-track horizontal spacing was 410 km, with slightly degraded spectral but improved vertical resolution. In this paper, data produced with the processor developed and operated by the Institut für Meteorologie und Klimaforschung (IMK) in cooperation with the Instituto de Astrofísica de Andalucía are used. Several other MIPAS retrieval products are available (see Lossow et al., 2019); however, they were not contributed to the SPARC Data Initiative in the required format. Note that the IMK processor also provides more species than these other processors.
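The quoted profile numbers can be checked against a simple along-track estimate: the tangent-point track covers roughly one Earth circumference per orbit, so the number of limb scans per day is at most about (orbits per day × circumference) / (along-track spacing). The sketch assumes ∼ 14.3 orbits per day for Envisat, a value not given in this paper, and ignores calibration and other non-measurement time, so its result is an upper bound rather than the actual count.

```python
EARTH_CIRCUMFERENCE_KM = 40_075.0  # equatorial circumference (km)
ENVISAT_ORBITS_PER_DAY = 14.3      # assumption: approximate value for Envisat


def max_profiles_per_day(spacing_km, orbits_per_day=ENVISAT_ORBITS_PER_DAY):
    """Upper bound on limb scans per day for a given along-track spacing."""
    return orbits_per_day * EARTH_CIRCUMFERENCE_KM / spacing_km


for label, spacing in (("2002-2004", 510.0), ("2005-2012", 410.0)):
    print(f"{label}: at most ~{max_profiles_per_day(spacing):.0f} scans per day")
```

For the 510 km spacing this gives roughly 1120 scans per day, compatible with the "about 1000" profiles quoted above; the 410 km spacing raises the bound to roughly 1400.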

SCIAMACHY on Envisat
The Scanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY) (Burrows et al., 1995; Bovensmann et al., 1999) was also launched aboard Envisat in March 2002. See Sect. 2.8 for details on the satellite's orbit, which allowed SCIAMACHY to attain observations between 85° S and 85° N (65° in the winter hemisphere). The SCIAMACHY instrument was an eight-channel passive imaging grating spectrometer that observed aerosol and trace gases in the spectral range of ∼ 214–2386 nm. SCIAMACHY used the limb scattering, nadir backscattering, and solar occultation techniques, although only the results from limb scattering measurements are used in this study. In the limb scattering mode, SCIAMACHY made over 1000 measurements per day. Cross-track scans, each consisting of four measurements, were equally spaced in latitude and longitude. The latitude coverage changes slowly with season and is exactly periodic from year to year. Measurements performed within the South Atlantic Anomaly were rejected. From its limb-viewing measurements, SCIAMACHY produced an almost 10-year dataset (2002–2012) of stratospheric O3, H2O, NO2, BrO, and aerosols.

ACE-FTS on SciSat-1
The Atmospheric Chemistry Experiment-Fourier Transform Spectrometer (ACE-FTS) was launched aboard the SciSat-1 spacecraft in August 2003 (Bernath, 2006). The spacecraft occupies a drifting orbit at an inclination of 74° that allows for observations from 85° S to 85° N. The ACE-FTS instrument is a high-resolution (0.02 cm⁻¹) FTS that covers the full spectral range between 750 and 4400 cm⁻¹ to measure the chemical composition of the atmosphere (Bernath et al., 2005). The ACE-FTS uses the solar occultation technique to measure approximately 15 sunrise and 15 sunset occultations per day and achieves global latitude coverage over a period of 3 months (i.e., one season). The latitude coverage is almost exactly periodic from year to year. At the time of writing, measurements from the ACE-FTS are ongoing.
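For readers more used to wavelength units, the ACE-FTS range of 750–4400 cm⁻¹ corresponds to about 13.3–2.3 µm via λ [µm] = 10⁴ / ν̃ [cm⁻¹], and a fixed 0.02 cm⁻¹ resolution implies a resolving power ν̃/Δν̃ of roughly 37 500 at the long-wavelength end and 220 000 at the short-wavelength end. The small helpers below (hypothetical names) carry out these conversions.

```python
def wavenumber_to_wavelength_um(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres (1e4 / nu)."""
    return 1.0e4 / wavenumber_cm1


def resolving_power(wavenumber_cm1, resolution_cm1=0.02):
    """Resolving power nu / delta-nu for a fixed wavenumber resolution."""
    return wavenumber_cm1 / resolution_cm1


for nu in (750.0, 4400.0):
    print(f"{nu:6.0f} cm^-1 -> {wavenumber_to_wavelength_um(nu):5.2f} um, "
          f"R = {resolving_power(nu):,.0f}")
```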

ACE-MAESTRO on SciSat-1
The ACE Measurement of Aerosol Extinction in the Stratosphere and Troposphere Retrieved by Occultation (ACE-MAESTRO) was launched together with the ACE-FTS aboard the SciSat-1 spacecraft in August 2003 (McElroy et al., 2007). See Sect. 2.11 for details on the satellite's orbit. The ACE-MAESTRO instrument is a dual grating spectrometer that observes trace gases and aerosols in the spectral range of ∼ 280–1030 nm. ACE-MAESTRO uses the solar occultation technique and makes 15 sunrise and 15 sunset measurements per day, equally spaced in longitude. The two ACE instruments take simultaneous measurements of the same air mass using a common Sun-tracking mirror that is located within the ACE-FTS. At the time of writing, measurements from ACE-MAESTRO are ongoing.

MLS on Aura
The Aura Microwave Limb Sounder (Aura-MLS) was launched aboard the Aura satellite in July 2004 (Waters et al., 1999, 2006). The spacecraft occupies a 98° inclined, near-polar, Sun-synchronous orbit with a 13:45 LT ascending-node Equator-crossing time that allows for observations from about 80° S to 80° N on a daily basis. Aura-MLS, similar to its UARS predecessor (see Sect. 2.4), performs microwave thermal emission measurements using antenna scans of the Earth's limb and five radiometers to detect spectral line and continuum signals (at 0.47, 1.25, 1.58, and 2.54 mm wavelengths, along with measurements of OH at 0.12 mm) and to retrieve profiles of upper atmospheric temperature and many trace gases, as well as cloud ice water content. Aura-MLS provides about 3500 profiles (per species) along the sub-orbital track every day, during both daytime and nighttime. At the time of writing, measurements from two of the Aura instruments (MLS and the Ozone Monitoring Instrument; OMI) are ongoing.
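Microwave limb sounders are often described by frequency rather than wavelength. Applying f = c/λ to the Aura-MLS wavelengths quoted above gives approximately 118, 190, 240, and 640 GHz, plus about 2.5 THz for the OH measurement; the same relation converts the UARS-MLS and SMR wavelengths given in Sects. 2.4 and 2.7. The helper name in the sketch is illustrative.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def wavelength_mm_to_ghz(wavelength_mm):
    """Frequency in GHz for a free-space wavelength in millimetres (f = c / lambda)."""
    return SPEED_OF_LIGHT_M_S / (wavelength_mm * 1.0e-3) / 1.0e9


AURA_MLS_WAVELENGTHS_MM = (2.54, 1.58, 1.25, 0.47, 0.12)  # as quoted in the text
for wl in AURA_MLS_WAVELENGTHS_MM:
    print(f"{wl:4.2f} mm -> {wavelength_mm_to_ghz(wl):7.1f} GHz")
```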

HIRDLS on Aura
The High Resolution Dynamics Limb Sounder (HIRDLS) instrument was also launched aboard Aura in July 2004 (Gille and Barnett, 1992). See Sect. 2.13 for details on the satellite's orbit. Unfortunately, during launch, a plastic film became detached and blocked the path between the scan mirror and the aperture of HIRDLS (Gille et al., 2008), reducing coverage to latitudes from about 63° S to 80° N on a daily basis, with observing times at 15:00 and 00:00 LT. HIRDLS was a limb-scanning infrared radiometer that observed temperature, 10 trace gases, and aerosols in 21 broad spectral channels at wavelengths between 6.12 and 17.76 µm. HIRDLS observed approximately 6400 profiles each day, spaced approximately every 100 km along the orbit track. HIRDLS stopped acquiring data on 17 March 2008 due to a chopper failure; useful HIRDLS data span January 2005 to December 2007.

SMILES on the ISS
The Superconducting Submillimeter-Wave Limb Emission Sounder (SMILES) was installed on the International Space Station (ISS) in September 2009 (Kikuchi et al., 2010). As mentioned in Sect. 2.2, the ISS is in a circular, mid-inclination orbit (51.6°). With the SMILES antenna mounted so that its field of view is 45° to the left of the orbital plane, the observed latitude region was extended to between 38° S and 65° N. Three times during the observation period (in late November, mid-February, and early April), the ISS turned 180° about its yaw axis, so that the field of view pointed southward, resulting in inverted hemispheric observation ranges (65° S-38° N). SMILES was a demonstration of ultrasensitive submillimeter limb emission observations with a 4 K-cooled receiver system. A total of 1630 observation points were obtained per day. The non-Sun-synchronous orbit of the ISS allowed the instrument to observe the diurnal variation of minor short-lived species. The instrument was in operation between October 2009 and April 2010.

OMPS-LP on Suomi-NPP
The advanced Ozone Mapping and Profiler Suite (OMPS) was launched aboard the Suomi National Polar-orbiting Partnership (NPP) spacecraft in 2011 (Jaross et al., 2014). The spacecraft occupies a Sun-synchronous orbit with a 13:30 LT ascending node Equator-crossing time that allows for observations between 81.5° S and 81.5° N (limited to 60° in the winter hemisphere). OMPS consists of three spectrometers: a downward-looking nadir mapper, a downward-looking nadir profiler, and a limb profiler. Only the measurements from the latter instrument (OMPS-LP) are used in this study. The OMPS-LP instrument is equipped with a 2-D imaging prism spectrometer and observes aerosol and trace gases in the spectral range of ∼280-1000 nm through three entrance slits separated horizontally. It should be noted that the two OMPS-LP ozone datasets used in the SPARC Data Initiative are based on different retrieval algorithms: the Institute of Environmental Physics (IUP)-OMPS (Arosio et al., 2018) and the University of Saskatchewan (USask)-OMPS products. The main difference between these two products is that USask uses a 2-D tomographic retrieval algorithm, whereas IUP uses a standard 1-D algorithm. Furthermore, the spectral information and associated tangent height ranges are used differently. NASA also produces a stratospheric ozone product from OMPS-LP (Rault and Loughman, 2013), which is not included in the SPARC Data Initiative.

Gridded dataset construction and evaluation methodology
In the following, a short summary of the method used to compile the SPARC Data Initiative zonal monthly mean time series is provided. More detailed information on the instrument-specific data preparation and handling can be found in the SPARC Data Initiative report (SPARC, 2017, chap. 3, pp. 30-36).

Gridded dataset construction and uncertainty
Zonal monthly mean time series of each trace gas species (as volume mixing ratio; VMR) and aerosol (as extinction ratio) have been calculated for each instrument on the SPARC Data Initiative dataset grid, using 5° latitude bins (with midpoints at 87.5° S, 82.5° S, 77.5° S, ..., 87.5° N) and 28 pressure levels (300, 250, 200, 170, 150, 130, 115, 100, 90, 80, 70, 50, 30, 20, 15, 10, 7, 5, 3, 2, 1.5, 1, 0.7, 0.5, 0.3, 0.2, 0.15, and 0.1 hPa). To this end, profile data have been carefully screened before binning, and a hybrid log-linear interpolation has been performed in the vertical (i.e., the VMR is interpolated linearly in log pressure). For instruments that provide data on an altitude grid, altitude is converted to pressure using retrieved temperature/pressure profiles (as is the case for MIPAS, ACE-FTS, and ACE-MAESTRO) or meteorological analyses (ECMWF for OSIRIS, GOMOS, and SCIAMACHY; National Centers for Environmental Prediction (NCEP) for SAGE I and III/M3M; MERRA for SAGE II; MERRA-2 for SAGE III/ISS; the UK Met Office (UKMO) for POAM II/III; and GMAO/GEOS-5 for OMPS-LP). Similarly, this information is used to convert retrieved number densities into VMR where needed. It should be noted that using different ancillary data for the grid and unit conversions introduces an additional source of uncertainty, which has not been quantified here (see also the discussion in Hubert et al., 2016). Any known problems in the ancillary temperature/pressure data used to convert measured species from their native units to VMR and pressure grids, however, have been fixed by an updated retrieval algorithm or minimized with empirical corrections. For example, problems in the older SAGE II (v6.2) temperature or pressure auxiliary files, mainly in the tropics above 2 hPa, were empirically corrected  before being incorporated in the SPARC Data Initiative gridded dataset (SPARC, 2017).
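A minimal Python sketch of the dataset grid and of the log-pressure interpolation described above, together with the ideal-gas conversion from number density to VMR. The function names, the −999.0 fill for target levels outside the profile, and the choice not to extrapolate are assumptions for illustration; this is not the Data Initiative's processing code, and the profile screening step is not shown.

```python
import math
from bisect import bisect_left

# SPARC Data Initiative pressure levels (hPa), as listed in the text
DI_PRESSURE_HPA = [300, 250, 200, 170, 150, 130, 115, 100, 90, 80, 70, 50,
                   30, 20, 15, 10, 7, 5, 3, 2, 1.5, 1, 0.7, 0.5, 0.3, 0.2,
                   0.15, 0.1]
# 5-degree latitude bins with midpoints at -87.5, -82.5, ..., 87.5
DI_LAT_MID = [-87.5 + 5.0 * k for k in range(36)]

FILL = -999.0          # hypothetical fill for levels outside the profile
K_B = 1.380649e-23     # Boltzmann constant (J/K)


def interp_log_p(p_prof, vmr_prof, p_target):
    """Interpolate VMR linearly in log(pressure).

    p_prof must be strictly decreasing (bottom to top). Targets outside
    the profile's pressure range get FILL; no extrapolation (assumption).
    """
    # use -ln(p) so the abscissa increases with height, as bisect needs
    x = [-math.log(p) for p in p_prof]
    out = []
    for pt in p_target:
        xt = -math.log(pt)
        if xt < x[0] or xt > x[-1]:
            out.append(FILL)
            continue
        j = bisect_left(x, xt)
        if x[j] == xt:
            out.append(vmr_prof[j])
            continue
        w = (xt - x[j - 1]) / (x[j] - x[j - 1])
        out.append(vmr_prof[j - 1] + w * (vmr_prof[j] - vmr_prof[j - 1]))
    return out


def number_density_to_vmr(n_m3, p_hpa, t_k):
    """VMR = n / n_air, with n_air = p / (k_B T) from the ideal gas law."""
    n_air = (p_hpa * 100.0) / (K_B * t_k)
    return n_m3 / n_air
```

For example, `interp_log_p([300, 100, 10, 1], [1.0, 2.0, 3.0, 4.0], DI_PRESSURE_HPA)` maps a coarse profile onto all 28 levels, returning the fill value above 1 hPa.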
The anomalous temperature problem in SAGE II (v6.2) has been fixed in the latest v7.0 retrieval, which is used in the updated SPARC Data Initiative gridded dataset and in this paper. Both SAGE III/ISS (v5.1) and SAGE II (v7.0) data were also updated to remove or minimize the effects of altitude registration errors in the auxiliary temperature profiles (Wang et al., 2020). Along with the zonal monthly mean value, the standard deviation and the number of averaged data values are given for each grid point, as well as the average day of month and the minimum, mean, and maximum local solar times of these values (see Fig. 5 and Table 6 for an illustration and summary of the variables included in each SPARC Data Initiative dataset file). Note that the methodology for calculating the ACE-FTS gridded datasets has changed since SPARC (2017). Whereas for the older gridded datasets the data were binned into layers bounded by the midpoints between adjacent Data Initiative pressure levels, the data are now interpolated to these levels (matching what has been done in Koo et al., 2017). The ACE-MAESTRO datasets are calculated in the same way as those for ACE-FTS. For the new SAGE III/ISS gridded datasets, the same approach was followed as for the other SAGE instruments. OMPS observations are handled in exactly the same way as those from SCIAMACHY, with the exception of the rejection of measurements within the South Atlantic Anomaly (SAA) region: whereas a fixed latitude-longitude range is used for SCIAMACHY, the SAA flags from the level-1 product are used for OMPS.
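The per-grid-cell quantities listed above can be sketched as follows for a single pressure level and month, assuming the profiles have already been interpolated to the Data Initiative levels. The record layout, the function names, the −999.0 fill for empty bins, and the use of the sample standard deviation are illustrative assumptions; the plain arithmetic mean of local solar time also ignores wrap-around at midnight.

```python
import statistics

FILL = -999.0  # fill value for empty grid cells (assumed convention)


def lat_bin_index(lat):
    """Index (0..35) of the 5-degree bin; midpoints are -87.5 ... 87.5."""
    return min(int((lat + 90.0) // 5.0), 35)


def grid_month(records):
    """Zonal monthly mean statistics at one pressure level.

    records: iterable of (lat, value, day_of_month, local_solar_time_h),
    a hypothetical layout. Returns a dict bin_index -> summary dict.
    """
    bins = {k: [] for k in range(36)}
    for lat, val, day, lst in records:
        bins[lat_bin_index(lat)].append((val, day, lst))
    out = {}
    for k, rows in bins.items():
        if not rows:
            out[k] = {"mean": FILL, "std": FILL, "n": 0, "day": FILL,
                      "lst_min": FILL, "lst_mean": FILL, "lst_max": FILL}
            continue
        vals = [r[0] for r in rows]
        lsts = [r[2] for r in rows]
        out[k] = {
            "mean": statistics.fmean(vals),
            # sample standard deviation; 0.0 for a single value (assumption)
            "std": statistics.stdev(vals) if len(vals) > 1 else 0.0,
            "n": len(vals),
            "day": statistics.fmean(r[1] for r in rows),
            "lst_min": min(lsts),
            # naive mean: does not handle times straddling midnight
            "lst_mean": statistics.fmean(lsts),
            "lst_max": max(lsts),
        }
    return out
```

Storing the count alongside the mean and standard deviation is what later allows the standard error of each cell to be computed.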
Interpretation of the differences between the individual trace gas and aerosol datasets needs to take into account several sources of uncertainty, including systematic errors of both the measurements and the dataset construction. Random measurement errors have little impact on the zonal monthly means; however, measurement biases (e.g., related to retrieval errors) introduce systematic differences between an individual instrument's mean field and the truth. Differences of the mean fields from the truth also arise from sampling biases (see  for the SPARC Data Initiative; Sofieva et al., 2014, Millán et al., 2016, and Kloss et al., 2019 for other related studies) and from differences in the averaging technique used to produce the gridded datasets. Since the overall uncertainty of the gridded data cannot be obtained consistently from bottom-up estimates for all of the instruments included in the SPARC Data Initiative (a task now being addressed by SPARC TUNER), we use the standard error of the mean (SEM) as an approximate measure of the uncertainty in each zonal monthly mean field:

$$\mathrm{SEM} = \frac{\sigma}{\sqrt{n}},$$

where σ is the standard deviation of the measurements and n the number of measurements at each grid point. The range of twice the SEM can be roughly interpreted as the 95 % confidence interval of the monthly mean under the assumption of Gaussian statistics and independent errors. Although sampling patterns and densities differ greatly between instruments, the SEM has been shown to generally produce a conservative estimate of the true random error in the mean for both solar occultation and dense sampling patterns (Toohey and von Clarmann, 2013). This is because sampling by satellite instruments is roughly uniform with respect to longitude. It should be noted, however, that the SEM does not reflect the potential influence of irregular or incomplete sampling of the month and latitude band, which can produce sampling biases in the mean fields.
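As a quick illustration of the SEM and the approximate ±2 SEM interval, the sketch below assumes that σ is the sample standard deviation (the text does not specify the estimator); the helper names are hypothetical.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean, sigma / sqrt(n), with the sample std."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(n)


def approx_95ci(values):
    """Mean +/- 2 SEM; meaningful only for Gaussian, independent errors."""
    m = statistics.fmean(values)
    s = sem(values)
    return m - 2.0 * s, m + 2.0 * s
```

For five values 1 to 5, the SEM is √0.5 ≈ 0.71, so the interval is roughly 3 ± 1.41. The interval says nothing about sampling bias, which is why the text treats that separately.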

Climatological validation approach
The SPARC Data Initiative introduces a complementary approach to testing data quality: rather than relying on profile-to-profile evaluations based on measurement coincidences, which have been carried out extensively in the literature, it compares vertically resolved, zonal monthly mean gridded datasets of trace gas observations. We refer to this methodology as the "climatological validation approach", where "climatological" in this context does not refer to a time-averaged climate state (which should be reproduced by free-running models, averaged over many years) but to year-by-year values (which free-running models would not be expected to match). The climatological approach was chosen because multiple measurements can in principle be averaged to reduce random measurement errors, leaving the systematic error (or bias; note that here this bias is defined relative to the multi-instrument mean and not to an absolute truth). Comparing these gridded datasets has the advantage of removing much of the natural variability inherent to trace gas observations from both in situ sensors and measurements from space (Hegglin et al., 2008; Tegtmeier et al., 2013) and yields information on the behavior of the retrievals resolved in latitude and height. In addition, monthly mean comparisons allow for testing how well the instruments' measurement characteristics resolve geophysical features (e.g., interannual variability, seasonality, or periodicities). The climatological validation approach is applied to all evaluations, and its advantages and disadvantages are discussed where appropriate. When using this approach, however, some general guidelines should be followed.
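The reasoning that averaging suppresses random error but leaves bias intact can be illustrated with a toy simulation. Everything here is synthetic: the truth, bias, and noise level are invented numbers, and the bias is measured against a known truth rather than against the multi-instrument mean.

```python
import random
import statistics


def monthly_mean_error(bias, noise_sd, n, truth=5.0, seed=0):
    """Error of an n-sample mean for a synthetic instrument with fixed bias.

    Each measurement = truth + bias + Gaussian noise. The random part of
    the mean's error shrinks like noise_sd / sqrt(n); the bias does not.
    """
    rng = random.Random(seed)
    obs = [truth + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(obs) - truth
```

With `n` in the tens of thousands, `monthly_mean_error(0.2, 1.0, n)` converges to about 0.2, the bias alone, while for small `n` the noise still dominates.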
As highlighted in the sampling study by  as an integral part of the SPARC Data Initiative, sampling biases in the gridded datasets may contribute to the derived biases; this requires careful consideration in the interpretation of the results, as was attempted throughout SPARC (2017) and, at least qualitatively, in this update.  investigated the impact of the sampling patterns of 15 of the instruments presented here (not including LIMS, SMR, OMPS, or SAGE III/ISS) on the gridded datasets, using chemical fields from a chemistry climate model as idealized truth. The evaluation found sampling biases of up to 10 % for O₃ monthly means and up to 20 % for annual means for some instruments, generally in atmospheric regions with high natural variability such as the high-latitude stratosphere or the UTLS. Longer-lived species with lower variability such as H₂O show smaller sampling biases (except in the UTLS). Non-uniform sampling in both space and time contributes to these sampling biases. While  have characterized the sampling biases for ozone and water vapor only, the resulting bias patterns can be taken as guidelines for trace gases with similar source and sink characteristics (or lifetimes). Similar findings have been highlighted by Damadeo et al. (2018), particularly for datasets with very sparse sampling patterns.
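The model-as-truth idea behind such sampling studies can be sketched in a few lines: evaluate a known field on every day of the month, then only on the days an instrument would sample, and compare the two means. The sinusoidal field below is a hypothetical stand-in for chemistry climate model output, and the sampled days are arbitrary.

```python
import math
import statistics


def true_field(day):
    """Hypothetical daily zonal-mean field with a within-month trend."""
    return 100.0 + 20.0 * math.sin(2.0 * math.pi * day / 60.0)


def sampling_bias_pct(sampled_days, days_in_month=30):
    """Relative bias (%) of the sampled-day mean vs. the full-month mean."""
    full = statistics.fmean(true_field(d) for d in range(1, days_in_month + 1))
    sub = statistics.fmean(true_field(d) for d in sampled_days)
    return 100.0 * (sub - full) / full
```

Sampling every day gives zero bias, whereas sampling only days 1 to 5 of this rising field yields a mean roughly 6 % too low, even though each individual value is exact.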
Another important aspect of our approach is that trace gas time series are compared without any modification, such as the application of averaging kernels, to account for different vertical resolutions. We consider this simplified approach justified because in most cases the vertical resolutions of the limb sounders are quite similar, and the influence of a priori information on the retrieved profiles is usually limited. Exceptions are discussed where they appear.
Furthermore, highly structured and transient features, which can, for example, arise from different modes of natural variability such as the Quasi-Biennial Oscillation (QBO), the El Niño-Southern Oscillation (ENSO) (e.g., Diallo et al., 2019), or sudden stratospheric warmings (SSWs) (e.g., Manney et al., 2009), and which may not be resolved by some instruments, will most likely average out in the zonal monthly mean fields. Nonetheless, it is best to compare zonal mean fields averaged over exactly the same years and over the maximum time period for which all instruments overlap (ideally more than 4-5 years). When this is not possible, as many years as possible should be included, keeping in mind a potential trade-off with underlying trends in a given trace gas over the time period considered. For most species,  (2017) concluded that expected trends are generally smaller than inter-instrument differences.

Table 6. Description of the content included in each of the SPARC Data Initiative data files, here using N₂O as an example variable. Each file includes the time series of zonal monthly mean data for 1 year. Not-a-number values are filled in with "−999.0". See Fig. 4 for an example. Note that while much effort has been put into applying a consistent file format across the different instruments, some files may still differ from the description here.
Where the instruments' temporal coverage allows, inter-instrument differences should be tested for different time periods to gauge the influence of temporal inconsistencies on the comparison. Again, SPARC (2017) concluded that the general structure of the instruments' biases relative to one another did not change significantly. There are, however, exceptions. The differences between SAGE II and HALOE, in particular, changed over time, which was indicative of a drift in one of the instruments or of an influence of volcanic aerosol that could not be fully accounted for in the retrieval. Note that in this case the change in the biases was not attributable to sampling, since the instruments were compared over the same time periods (see SPARC, 2017).
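Testing inter-instrument differences over separate sub-periods can be sketched as follows, assuming annual means at a single grid point stored in hypothetical year-to-value dictionaries. A shift between periods flags a possible drift or an unaccounted-for influence, but by itself it cannot say which instrument is responsible.

```python
import statistics


def period_mean_rel_diff(inst_a, inst_b, years):
    """Mean relative difference (%) of instrument A vs. B over given years.

    inst_a, inst_b: dicts year -> annual-mean value at one grid point
    (hypothetical layout). Only years present in both are used.
    """
    common = [y for y in years if y in inst_a and y in inst_b]
    if not common:
        raise ValueError("no overlapping years")
    return statistics.fmean(100.0 * (inst_a[y] - inst_b[y]) / inst_b[y]
                            for y in common)
```

Comparing, for example, `period_mean_rel_diff(a, b, range(1990, 1996))` with the same call for a later range shows whether the relative difference is stable in time.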
Finally, within the SPARC Data Initiative, agreement between instruments is defined using the terminology specified in Table 7. All levels of agreement are stated with respect to the multi-instrument mean (MIM; see Sect. 3.2.3); hence, two instruments that each show excellent agreement (±2.5 %) with the MIM could differ from each other by as much as 5 %.

Evaluation diagnostics
A set of standard diagnostics is used to investigate the differences between the time series obtained from the different instruments. The diagnostics include comparisons of annual or zonal monthly mean trace gas fields, vertical and meridional mean profiles, seasonal cycles for a single year or averaged over multiple years, and multi-annual averages of the latitude-month evolution. Additional evaluations of interannual variability and known tracer-specific features that test the physical consistency of the datasets (such as the tape-recorder signal in water vapor or the QBO signal in ozone) were also carried out; those not presented here can be found in SPARC (2017). The evaluation methods for the trace gas time series, along with more examples, are described in more detail in Hegglin et al. (2013), Tegtmeier et al. (2013), and SPARC (2017).
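One of the diagnostics listed above, the seasonal cycle averaged over multiple years, can be sketched as follows. The input layout and function name are hypothetical, and skipping −999.0 entries follows the fill convention described for the data files.

```python
import statistics


def mean_seasonal_cycle(monthly, fill=-999.0):
    """Average a monthly series into a 12-value climatological cycle.

    monthly: list of (year, month, value); fill values are skipped.
    Returns 12 means, with None where a calendar month has no data.
    """
    by_month = {m: [] for m in range(1, 13)}
    for _year, month, value in monthly:
        if value != fill:
            by_month[month].append(value)
    return [statistics.fmean(v) if v else None
            for _m, v in sorted(by_month.items())]
```

Applying this to each instrument over the same years, and then differencing against the MIM cycle, gives the multi-year seasonal-cycle comparison.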

Multi-instrument mean reference
The SPARC Data Initiative's approach is to use the MIM as a reference to which all instruments are compared. The MIM is calculated by taking the annual or monthly mean of all available instrument datasets within a given time period of interest, aiming at maximum spatial and temporal data coverage for each instrument in order to limit the impact of sampling bias. Note that the MIM does not represent the best estimate of the atmospheric state; rather, it is motivated by the need for a reference that does not favor any particular instrument. Most datasets are included in its calculation regardless of their quality and without any weighting applied to them. In particular, datasets from instruments with sparse sampling have the same weight in the calculation of the MIM as datasets from instruments with much denser sampling. Datasets are excluded only if measurements from a particular instrument are deemed unrealistic (i.e., outside the ±3σ range) or if another version of a specific trace gas data product is available from the same instrument. The relative percentage differences between the trace gas mixing ratio of an instrument (χ_i) and the MIM (χ_MIM) are then given by

$$\frac{\chi_i - \chi_{\mathrm{MIM}}}{\chi_{\mathrm{MIM}}} \times 100\,\%.$$

When interpreting relative differences with respect to the MIM, one must always keep in mind that the set of instruments from which the MIM was calculated may have changed between time periods. Hence, changes in derived differences are not to be interpreted as changes in the performance (or drifts) of an individual instrument. Also, if one instrument shows unphysical behavior, the MIM, and thus the differences of the other instruments with respect to the MIM, will almost certainly reflect this behavior as well, although we have tried to eliminate the largest outliers. Finally, if one instrument does not have global coverage for every month, some sampling biases may be introduced into the MIM (see discussion in Sect. 3.2.1).
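The MIM and the relative differences can be sketched per grid point as below. The unweighted mean follows the text; the leave-one-out reading of the ±3σ exclusion (each instrument tested against the mean and standard deviation of the others) is an assumption, since the text does not spell out how the test is applied, and the function names are hypothetical.

```python
import statistics


def multi_instrument_mean(values, n_sigma=3.0):
    """Unweighted multi-instrument mean (MIM) at one grid point.

    values: dict instrument name -> zonal mean (one product per
    instrument). An instrument is dropped if it lies more than n_sigma
    standard deviations from the mean of the *other* instruments; this
    leave-one-out reading of the 3-sigma rule is an assumption.
    """
    kept = {}
    for name, v in values.items():
        others = [x for k, x in values.items() if k != name]
        if len(others) >= 2:
            mu = statistics.fmean(others)
            sd = statistics.stdev(others)
            if sd > 0 and abs(v - mu) > n_sigma * sd:
                continue  # treated as an unrealistic outlier
        kept[name] = v
    if not kept:
        raise ValueError("all instruments were rejected")
    # no weighting: sparse and dense samplers count equally
    return statistics.fmean(kept.values()), sorted(kept)


def rel_diff_pct(chi_i, chi_mim):
    """Relative difference (%) of one instrument with respect to the MIM."""
    return 100.0 * (chi_i - chi_mim) / chi_mim
```

Because the MIM depends on which instruments are present, recomputing it for a different period can shift every instrument's relative difference, which is the caveat raised in the text.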
Due to its changing nature, the MIM is thus not made available via the Zenodo data archive.

Summary evaluation
Finally, a summary evaluation (seen in Figs. 15-17 and discussed in Sect. 5) is presented, which provides an estimate of the uncertainty in our knowledge of the atmospheric mean state of a given trace gas. This uncertainty is expressed as the relative standard deviation (i.e., calculated relative to the MIM) over all instrument values at a given latitude-pressure grid point or, in other words, the spread between the datasets around the MIM.
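The summary metric can be sketched per grid point as the standard deviation across instruments divided by their mean (the MIM), in percent. Whether the sample or the population standard deviation is used is not stated, so the sample version here is an assumption, as is the function name.

```python
import statistics


def relative_spread_pct(values):
    """1-sigma spread of instrument values relative to their mean (%).

    values: zonal means from the instruments at one latitude-pressure
    grid point; uses the sample standard deviation (assumption).
    """
    vals = list(values)
    if len(vals) < 2:
        raise ValueError("need at least two instruments")
    mim = statistics.fmean(vals)
    return 100.0 * statistics.stdev(vals) / mim
```

For instrument values of 9, 10, and 11 (arbitrary units), the spread is ±10 %, which in the document's terminology would correspond to good agreement.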

Examples of SPARC Data Initiative trace gas evaluations
The approach of the SPARC Data Initiative for evaluating chemical trace gas datasets from stratospheric limb sounders is illustrated in the following by providing updates to the ozone  and water vapor evaluations , and by presenting additional examples based on CH₄, CO, different nitrogen-containing species (NO, NO₂, HNO₃, and NOy), and HO₂ measurements. These species were chosen to highlight particular differences in the evaluation approach that were necessary to account for the wide range of average lifetimes in the lower stratosphere among the species considered (e.g., 8 years for CH₄, 3 months for CO, and seconds for HO₂). Note that the definitions and abbreviations of the different altitude regions in the atmosphere used throughout this study are given in Table 8.

Ozone (O₃)
Ozone is one of the most important trace species in the stratosphere because of its absorption of biologically harmful ultraviolet radiation and its role in determining the temperature structure of the atmosphere. A systematic comparison of the SPARC Data Initiative ozone datasets has been provided in Tegtmeier et al. (2013) and SPARC (2017), revealing that the uncertainty in our knowledge of the O₃ mean state is smallest in the tropical MS and midlatitude LS and MS (see Table 8 for abbreviations). Notable differences between the datasets, on the other hand, exist in the tropical LS and at high latitudes. Here, the multi-instrument spread increases to ±30 % at the tropical tropopause (indicating considerable disagreement between the instruments) and to ±15 % at polar latitudes (reasonably good agreement), which is partly related to inter-instrument differences in vertical resolution and geographical sampling.
It should be noted that diurnal ozone variations are ∼10 % below 1 hPa and grow with increasing altitude to more than 100 % at upper mesospheric levels (e.g., Wang et al., 1996; Schneider et al., 2005). In addition, temperature uncertainties in the conversion from altitude to pressure during the gridded dataset production may cause additional errors that are particularly pronounced in the LM. Rather than correcting the mesospheric ozone observations (as was done for the nitrogen-containing species; see Sect. 4.5), we therefore present the ozone evaluations up to 1 hPa only.
An update of Fig. 2 from Tegtmeier et al. (2013) is given in Fig. 6, including new versions of the SAGE II, SMR, OSIRIS, MIPAS, GOMOS, SCIAMACHY, ACE-FTS, ACE-MAESTRO, Aura-MLS, and HIRDLS ozone datasets. Note that MIPAS measured in a high-spectral-resolution mode between 2002 and 2004 (hereafter called MIPAS(1)) and switched to a low-spectral-resolution mode after 2004 (hereafter called MIPAS(2)); the latter allowed measurements at a higher vertical resolution. In addition, new datasets obtained from OMPS-LP and SAGE III/ISS have been added. Tables 1 and 9 provide detailed information on the time period, vertical range, vertical resolution, and other characteristics of the different data versions evaluated here. Overall, the updated datasets agree better, with notably smaller differences found for SMR, SCIAMACHY, ACE-FTS, GOMOS, and MIPAS.
For SAGE II, the updated data version (v7.0) shows structures in the relative differences to the MIM that are very similar to those of version v6.2 used in Tegtmeier et al. (2013), albeit tending toward more negative values throughout the atmosphere. Some of the rapid transitions between positive and negative values result from the combination of seasonal and diurnal sampling biases during the last few years of the mission (as evaluated here), when sampling became sparser.
For SMR, a new data product (v3.1) is evaluated here, based on frequency mode 2, which monitors the band 544.102-544.902 GHz. This product has been improved from earlier versions (not included in SPARC, 2017) by adjusting the line broadening constant and removing the pointing offset (Murtagh et al., 2020). Compared to the SMR frequency mode 1, version 2.1 ozone product (included in SPARC, 2017), the negative bias of 10 %-20 % in the upper stratosphere has been reduced to 2.5 %-10 % (Fig. 6), now showing very good to good agreement with the other instruments. The updated MIPAS(2) ozone (v224) benefits from better temperature data in the mesosphere and optimization of spectroscopic data for some spectral regions. In comparison to the old MIPAS(2) ozone (v220), differences in the upper stratosphere are now reduced to 2.5 %-5 %, which is about half of their original amount. SCIAMACHY provides an updated data version (V3-5) based on a new retrieval algorithm (Jia et al., 2015), which improved the retrievals considerably compared to the previously evaluated version (V2.5), with the positive bias in the MS and US now reduced from 10 %-20 % to 2.5 %-10 %. Updated ACE-FTS ozone (v3.6) in the MS and US shows considerably smaller differences to the MIM (mostly up to 5 %, Fig. 6) than the old dataset (v2.2), which had a low bias in the MS of up to 10 % and a high bias in the US of up to 10 %-20 %. Interpolation of mixing ratios to the SPARC Data Initiative grid in log pressure, data filtering based on quality flag information (Sheese et al., 2015), and reduced non-physical oscillations in the updated pressure and temperature retrievals all contribute to the improved performance (Koo et al., 2017; Waymark et al., 2014). The ACE-MAESTRO dataset (v3.13), on the other hand, has larger biases than the previously evaluated version (v2.1; Tegtmeier et al., 2013).
In particular, the low bias in the LS and the high bias in the US increased from 2.5 %-5 % to 10 %-20 % (Fig. 6; see also Bognar et al., 2019). Both MAESTRO versions use ACE-FTS temperature profiles in the retrieval, which requires information on the relative time difference between the measurements. For v3.13, this time difference is determined from the MAESTRO O₂ slant column and the ACE-FTS air mass slant column instead of using a constant value based on the best match between the ozone profiles. However, it is not clear whether these changes cause the larger biases or whether the biases are related to other issues in the v3.13 processing. This is under investigation.
The GOMOS O₃ dataset (v5.0) used previously showed a substantial positive bias in the LS (30 %) and UT (80 %)  due to the high sensitivity of the retrieval algorithms to the aerosol extinction model. The new GOMOS datasets (ALGOM2s; Sofieva et al., 2017) are based on a new O₃ profile inversion algorithm, which is optimized by enhancing the spectral inversion at visible wavelengths for the UTLS, thus decreasing the impact of the aerosol model. As a result, GOMOS performs much better, with excellent agreement in the LS (Fig. 6). In the UT, GOMOS retrieves lower ozone values than the other instruments, with differences to the MIM of 20 % to 50 %.
New O₃ data products from IUP-OMPS (Arosio et al., 2018) and USask-OMPS (based on a 2-D retrieval) agree very well with the other datasets in the middle and upper stratosphere (Fig. 6). Although based on different retrieval algorithms, the two data products show very similar structures, with positive differences of 2.5 %-5 % in the MS and US, increasing up to 10 %-20 % at SH high latitudes, and larger deviations of up to 50 % in the tropical UTLS. The new O₃ data product from SAGE III/ISS (v5.1) also agrees well with the other datasets, showing reasonably good agreement up to the US. For this work, the "AO3" product was used because it has reduced noise compared to the "MLR" product, particularly in the UT and US (see Wang et al., 2020 for details).
In summary, the updated O₃ datasets show improved agreement in most regions of the atmosphere (Fig. 15). In particular, the 1σ multi-instrument spread in the UT decreased significantly at all latitudes, from ±45 % on average to ±25 %, owing in part to the improved GOMOS performance. The region of very good agreement (1σ of ±5 %), previously restricted to below 3 hPa, now extends further up into the US, reaching the 1 hPa level. In the LM, agreement also improves, with maximum deviations of ±30 %, because POAM III is not included in the updated evaluations. At polar latitudes, however, deviations are still large, with maximum values of ±30 % found in the Antarctic LS, indicating considerable disagreement between the datasets.

Water vapor (H₂O)
H₂O is the single most important natural greenhouse gas and provides a positive feedback to climate change driven by anthropogenic emissions of carbon dioxide and other greenhouse gases. H₂O is also a key constituent in atmospheric chemistry as the source gas of the hydroxyl (OH) radical, which controls the lifetime of atmospheric pollutants, ozone, and greenhouse gases.
A comprehensive assessment of the SPARC Data Initiative H₂O gridded datasets has been provided by  and SPARC (2017). These evaluations revealed that our knowledge of the H₂O mean state is best in the LS and MS, with a relative uncertainty of only ±2 %-6 %. However, substantial biases were found between the datasets in the LM (±15 %), the polar regions (±10 %-15 %), and the UTLS below 100 hPa (±30 %-50 %), where sampling issues add uncertainty due to large gradients and high natural variability. Once these biases are removed, however, the instruments show very good agreement in the magnitude and structure of interannual variability. Figure 7 shows an update of Fig. 5 from  including new data versions for ACE-FTS, Aura-MLS, MIPAS(1), MIPAS(2), SAGE II, and SCIAMACHY, and adding new datasets obtained from HIRDLS, ACE-MAESTRO, and SAGE III/ISS. Tables 1 and 10 provide detailed information on the time period, vertical range, vertical resolution, and other characteristics of the different data versions evaluated here. LIMS and UARS-MLS are also added for comparison, although they measured during an earlier period. All other datasets remain the same. Notable changes in the difference patterns arising from the updated data versions are identified in the following.
SAGE II (v7.0) shows large changes compared to SAGE II (v6.2) used in Hegglin et al. (2013) and SPARC (2017), with positive differences replacing negative differences over large parts of the stratosphere. In the MS, the differences to the MIM have decreased from between −5 % and −10 % (v6.2) to values mostly within ±2.5 % (v7.0) (Fig. 7), now indicating excellent agreement with the other datasets. Much smaller differences compared to the MIM (±5 %), indicating very good agreement, are also found in the UTLS, where large negative biases (> 10 %-20 %) existed in the previous version (v6.2). This overall improvement is a consequence of modifying a spectral filter channel correction in the SAGE II retrieval (Thomason et al., 2004), using SAGE III/M3M as the basis for comparison in v7.0 instead of HALOE in v6.2 (Damadeo et al., 2013; see also Hegglin et al., 2014). In the US, on the other hand, differences from the MIM have increased from near zero to 5 % and higher.
The new MIPAS(1) (V3o_H2O_21) and MIPAS(2) (V5r_H2O_224) data versions show differences to the MIM that are generally very similar to those of the earlier data versions (V3o_H2O_13 and V5r_H2O_220, respectively) used in Hegglin et al. (2013). MIPAS(2) exhibits some improvements in the tropical US, where differences to the MIM decreased from around 10 % to 5 % in the newer version. MIPAS(1) improved in the LM, where differences to the MIM decreased from > 10 % in V3o_H2O_13, which was evaluated in Hegglin et al. (2013), to smaller or even slightly negative values (between 2.5 % and −5 %). As a consequence, the new data versions of MIPAS(1) and MIPAS(2) appear more similar in character throughout the stratosphere and LM, except in the UTLS, where MIPAS(2) generally shows positive differences compared to the MIM (> 10 %), while MIPAS(1) shows both positive (> 5 %) and negative (< −5 %) differences compared to the MIM, depending on the region. Applying averaging kernels would likely improve the comparison, which should be tested in future work.
The new ACE-FTS (v3.6) and Aura-MLS (v4.2) data versions both show slight improvements in the UTLS, and Aura-MLS also has slightly smaller positive differences to the MIM in the US. The negative bias in Aura-MLS around 200 hPa seen in the evaluation of Hegglin et al. (2013), which extended the findings of Vömel et al. (2007) based on balloon soundings to all latitudes, is, however, still apparent. ACE-MAESTRO (v31), a new instrument in the comparison, shows rather large positive differences to the MIM (mostly > 10 %-20 %) across its measurement range in the UTLS (except between approximately 300 and 205 hPa in the extratropics, where negative differences of a similar magnitude are found).
The wet bias in the tropical LS is a known issue for this version of ACE-MAESTRO (Lossow et al., 2019).
SCIAMACHY's negative bias with respect to the MIM in the NH LS, around 10 % for data version v3.0 in Hegglin et al. (2013), has decreased slightly to 5 % in the version evaluated here (v4.0); its positive bias with respect to the MIM in the tropical UTLS has also decreased (from 20 % to 10 %). HIRDLS (v7.0) exhibits a negative bias of > 10 % with respect to the MIM extending across the MS, and SMR (v2.0) shows an even larger negative bias of > 20 %.
While LIMS (v6.0) and UARS-MLS (v6) are not directly comparable to the other instruments because of the time periods in which they measured, the very different character of their differences still highlights that trends in H2O are of minor importance compared to inter-instrument differences. UARS-MLS shows a very uniform negative bias with respect to the MIM of −10 %, whereas LIMS exhibits a positive bias in the extratropical LS and MS, and a more negative bias across the US and in the tropical LS. Part of the negative H2O bias of LIMS in the US may be due to the increase in CH4, and its conversion to H2O, during the intervening years.
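The CH4 argument for the LIMS bias can be made semi-quantitative with the commonly used approximation that each oxidized CH4 molecule eventually yields about two H2O molecules. The relation below is a sketch under that assumption only: the oxidized fraction f at a given level is a symbol introduced here for illustration and is not quantified in this assessment, and changes in the H2O entering the stratosphere are ignored.

```latex
\[
[\mathrm{H_2O}] + 2\,[\mathrm{CH_4}] \approx \text{const. (for a given air parcel)}
\quad\Longrightarrow\quad
\Delta[\mathrm{H_2O}]_{\mathrm{US}} \approx 2\, f\, \Delta[\mathrm{CH_4}]_{\mathrm{entry}}
\]
```

Under this approximation, a positive change in CH4 at stratospheric entry between the LIMS period and later years raises later US H2O, which would make the earlier LIMS values appear low relative to the MIM.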
The new dataset (v5.1) of the SAGE III/ISS instrument shows excellent agreement with the MIM across the MS, US, and into the LM (with relative differences within ±2.5 %), although the data are less reliable at altitudes above the 0.5 hPa level, where strong positive relative differences from the MIM (> 20 %) are found. This feature persists even when SAGE III/ISS is compared only with datasets from instruments measuring during the same years (2017-2018), including ACE-FTS and Aura-MLS (not shown), and is likely due to some residual profiles that "keel over" to very high values in the USLM, potentially biasing the mean field high. These profiles will be filtered out and/or corrected in future versions. In the UT and LS, SAGE III/ISS (v5.1) shows generally negative differences to the MIM, improving from < −20 % at 300 hPa to −5 % around 30 hPa.
Overall, the update in the H2O datasets has led to only small improvements, and only in some regions of the atmosphere (see Fig. 15). In the NH LS, the 1σ multi-instrument spread decreased from ±10 % to ±5 %, and in the tropical UTLS from ±20 % to ±10 %, partly due to the improved performance of SCIAMACHY and SAGE II. In the US, on the other hand, the multi-instrument spread increased slightly from ±10 % to ±12.5 %, most likely due to the changes in the new data version of SAGE II.

Methane (CH4)
CH4 is the most abundant hydrocarbon in the atmosphere and, with a lifetime of around 8 years (Lelieveld et al., 1998), it is considered long lived. It is a very effective greenhouse gas and, after CO2, the second-largest contributor to anthropogenic radiative forcing since pre-industrial times. CH4 is a source gas for stratospheric water vapor (resulting in a positive climate feedback) and affects stratospheric ozone chemistry; in the troposphere it acts to reduce the atmosphere's oxidizing capacity.
The earliest CH4 measurements from space were obtained from SAMS on Nimbus 7 between 1979 and 1981 (Taylor, 1987), followed by measurements from ATMOS starting in the mid-1980s (Gunson et al., 1996), and from ISAMS (Taylor et al., 1993) and CLAES (Roche et al., 1993) on UARS (along with HALOE). As mentioned above, these datasets were not considered in the SPARC Data Initiative. The first vertically resolved satellite CH4 dataset available to the SPARC Data Initiative was obtained by HALOE, starting in 1991. MIPAS started measuring CH4 in 2002, providing about 4 years of overlap with HALOE (although with a major gap in 2004). From 2004 onwards, ACE-FTS measurements are also available for comparison. Tables 1 and 11 provide information on the availability of CH4 measurements, including data version, time period, height range, vertical resolution, and references relevant for the data product. Figure 8 shows meridional profiles of CH4 at different pressure levels for August averaged over 1998-2008. These comparisons provide information on the latitudinal distribution of CH4 and on the latitude-height dependence of the differences between the instruments. At 50 hPa, the instruments agree very well with each other, mostly within ±5 %. The same is largely true at the 10 hPa level. At both levels, ACE-FTS (v3.6) and HALOE (v19) agree best with each other, while MIPAS(2) (v224) appears to show somewhat larger differences from the MIM than the other instruments, with differences that also vary more with latitude. At 5 hPa, however, the differences of the instruments with respect to the MIM increase to ±20 %. Here, HALOE is closest to the MIM, ACE-FTS shows the largest negative values, and MIPAS(1) (v21) the largest positive values. The deterioration in the agreement between the instruments with height is qualitatively consistent with the results of SPARC (2017).
However, the new data versions used here agree quantitatively much better with each other, particularly at the 50 and 10 hPa levels.
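The relative differences to the MIM used throughout this section can be sketched as follows. The sketch assumes that the MIM at each grid point is the plain average of the instrument climatologies available there, optionally leaving some instruments out (as done for SMR and MIPAS(1) in the CO evaluation); the function names and the toy values are hypothetical and only illustrate the arithmetic.

```python
import numpy as np


def multi_instrument_mean(fields, exclude=()):
    """Multi-instrument mean (MIM) of climatologies on a common grid.

    fields: dict mapping instrument name -> array (e.g. latitude x pressure),
            with NaN where an instrument provides no data.
    exclude: instrument names left out of the MIM (they can still be compared to it).
    """
    stack = np.stack([f for name, f in fields.items() if name not in exclude])
    # nanmean averages each grid point over the instruments reporting there
    return np.nanmean(stack, axis=0)


def relative_difference_percent(field, mim):
    """Relative difference (instrument - MIM) / MIM, in percent."""
    return 100.0 * (field - mim) / mim


# Toy example with hypothetical values on a two-point grid
fields = {
    "A": np.array([1.00, 2.0]),
    "B": np.array([1.10, 2.2]),
    "C": np.array([0.90, np.nan]),
}
mim = multi_instrument_mean(fields, exclude=("C",))
diff_c = relative_difference_percent(fields["C"], mim)
```

Excluding an instrument only removes it from the reference field; its own difference to the MIM can still be computed, as for instrument C here.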
We now turn to an example that can be used to test the physical consistency of the available datasets. To this end, the latitude-time evolution of CH4 for the different instruments at 2 hPa is shown in Fig. 9. The ACE-FTS and HALOE fields have been constructed using linear interpolation to fill in data gaps that arise from their sparse latitude-time sampling patterns. Figure 9 reveals local maxima located in the tropics just off the Equator in the respective summer hemisphere, distinct features that were found in earlier studies (e.g., Jones and Pyle, 1984; Ruth et al., 1997) and attributed to the equatorial semi-annual oscillation (Choi and Holton, 1991). The maxima in the trace gas coincide with maxima in upwelling, which brings younger air (less depleted in CH4) to higher altitudes. Tropical CH4 should thus show a semi-annual cycle. Photochemistry, on the other hand, causes minima at high latitudes during summer and autumn, with CH4 lifetimes decreasing to 4 months at these altitudes (Solomon et al., 1986; Randel et al., 1998). HALOE captures the tropical semi-annual oscillation well and also indicates the high-latitude minima during the summer months. MIPAS shows very similar features but, owing to its better spatiotemporal sampling, extends further into the polar regions, revealing the full extent and timing of these features. The tropical maxima in both MIPAS(1) and MIPAS(2) are stronger than those in HALOE. ACE-FTS exhibits a much noisier field due to its limited sampling and hence shows sharp maxima and edges, especially in the tropics, where the instrument scans through only once a season. While datasets in equivalent latitude would help to reduce the noise, this quantity was not available to the SPARC Data Initiative. Knowledge of the representativeness of ACE-FTS in geographical latitude is, however, still valuable for model-measurement comparisons.
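The gap filling mentioned for the ACE-FTS and HALOE fields can be sketched as below. The text only states that linear interpolation was used; interpolating along time separately for each latitude bin, and leaving gaps before the first and after the last valid month unfilled, are assumptions made here for illustration, as is the function name.

```python
import numpy as np


def fill_gaps_in_time(field):
    """Fill NaN gaps in a (time, latitude) field by linear interpolation in time.

    Each latitude column is treated independently. Months before the first
    or after the last valid value stay NaN (no extrapolation).
    """
    filled = field.copy()
    t = np.arange(field.shape[0])
    for j in range(field.shape[1]):
        col = field[:, j]
        ok = ~np.isnan(col)
        if ok.sum() < 2:
            continue  # too few valid months to interpolate
        inside = (t >= t[ok][0]) & (t <= t[ok][-1])
        filled[inside, j] = np.interp(t[inside], t[ok], col[ok])
    return filled


# Hypothetical 4-month x 2-latitude field with sampling gaps
field = np.array([
    [1.0, np.nan],
    [np.nan, 2.0],
    [3.0, np.nan],
    [np.nan, 4.0],
])
filled = fill_gaps_in_time(field)
```

Leaving leading and trailing gaps empty avoids inventing values outside the sampled period, at the cost of some uncovered edges in the latitude-time plot.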
The difference plots indicate a low bias in HALOE and ACE-FTS versus a known high bias in MIPAS(1). MIPAS(2), despite exhibiting a somewhat patchier difference field, provides supporting evidence for a high bias in MIPAS(1) at this pressure level. Compared to the data versions used in SPARC (2017), the new data versions used here generally agree better, even at this level. While HALOE's difference field to the MIM remains the same (no new data version is available), ACE-FTS shows somewhat less noise and now tends towards more negative values, MIPAS(1) shows smaller differences, especially in the tropical and midlatitude regions, and MIPAS(2) shows slightly increased differences across the time-latitude domain. Importantly, CH4 showed only small trends in the troposphere over the period 1998-2008; thus, a trend in this trace gas is not expected to contribute significantly to the inter-instrument differences. An evaluation limited to the year 2005 (during which all instruments were reporting data) mostly confirms the results described here (not shown).
The overall impact of the updated data versions on our knowledge of the mean state of the atmosphere in terms of CH4 is shown in Fig. 15. Compared to SPARC (2017), the new data versions have led to a decrease in the 1σ multi-instrument spread across the UTLS and MS from ±10 % to ±5 %. A decrease of around 5 % in the 1σ multi-instrument spread is also found across the USLM, although the values are much more variable in this region.

Carbon monoxide comparisons
Carbon monoxide (CO) has a lifetime of approximately 3 months in the UT and LS. In the troposphere, CO impacts air quality and has an indirect radiative forcing effect, since it scavenges OH that would otherwise react with (and deplete) the greenhouse gases methane and ozone (Daniel and Solomon, 1998). Due to its intermediate lifetime, it is often used as a tracer to identify troposphere-stratosphere exchange (e.g., Hoor et al., 2004). In the lower stratosphere, CO reaches a background value of between 8 and 15 ppbv (Flocke et al., 1999), determined by the equilibrium between its production (from methane oxidation) and loss (from CO oxidation).
Only a few limb sounders provide CO measurements, with the zonal monthly mean fields from SMR, MIPAS, ACE-FTS, and Aura-MLS contributing to the SPARC Data Initiative. The earliest dataset that would offer CO, but which is not included in the comparisons here, can be obtained from SAMS on Nimbus 7 (although with a very high noise level; Taylor, 1987). Other useful CO measurements were obtained by ATMOS on the Space Shuttle (Gunson et al., 1996) and from ISAMS on UARS (Taylor et al., 1993). Tables 1 and 12 compile information on the availability of CO measurements, including time period, height range, vertical resolution, and references relevant for the data product used in this report.
Figure 10. … -2009 (upper half). Also shown are the relative differences between the individual instruments and the MIM (lower half). Note that SMR and MIPAS(1) are excluded from the calculation of the MIM.
For CO, we focus first on the zonal annual mean evaluation shown in Fig. 10, which is one of the standard evaluations in the SPARC Data Initiative (see also Figs. 6 and 7). ACE-FTS and Aura-MLS are averaged over the period 2004-2009, MIPAS(2) over …, MIPAS(1) over 2002-2004, and SMR over 2003-2004. The figure reveals large differences in the structure and values of CO as measured by the different instruments. Nevertheless, common features are minimum values of around 15 ppbv in the LS and MS, and strongly increasing values towards the USLM, with maxima in the polar regions. These large values stem from the photodissociation of CO2 in the mesosphere and subsequent downward transport (Solomon et al., 1985). Increasing values can also be seen towards the UT (with tropospheric CO coming mostly from anthropogenic sources). The mid-infrared sensors MIPAS(1) (v20), MIPAS(2) (v222), and ACE-FTS (v3.6) agree best. SMR (v2.1) CO exhibits a fair amount of noise, which stems from the fact that CO was retrieved on only about 2 days per month and only during a limited period from October 2003 to October 2004. SMR does not reproduce the low background values of 8-15 ppbv expected in the LS to MS. Aura-MLS (v4.2), on the other hand, shows stratospheric CO values below 10 ppbv, somewhat lower than those observed by MIPAS and ACE-FTS (see also Pumphrey et al., 2007). Aura-MLS also shows a local minimum in CO in the tropical LM (around 0.2 hPa), which is not seen in the other datasets. Neither Aura-MLS nor SMR reproduces the downward- and poleward-sloping trace gas isopleths in the LS seen by MIPAS and ACE-FTS, a typical feature of long-lived trace gases resulting from transport and mixing within the Brewer-Dobson circulation (Tung, 1982).
In comparison with the CO evaluations in SPARC (2017), significant improvements are found for the new data versions of MIPAS(1) and ACE-FTS. For these instruments, the relative biases with respect to the MIM in the tropical MS have decreased from 10 %-20 % to ±5 %. The shortcomings of Aura-MLS were already pointed out in SPARC (2017); its relative biases with respect to the MIM in the LM (around ±10 %) are now, however, much closer to those of ACE-FTS and MIPAS(2). Positive biases of more than 50 % in the UTLS in the earlier Aura-MLS version (v3) have also decreased to 20 %, although the isopleths are still relatively flat compared to those found by the other instruments.
In addition, Fig. 11 shows deseasonalized anomalies of CO at three different pressure levels in either the tropics or the extratropics. The instruments all capture the interannual variability well. Despite its limited tropical sampling, ACE-FTS appears to capture the interannual variability in the tropical UT at 200 hPa well, and notably better than data version 2.6 used for SPARC (2017). It is also noteworthy that the shortcomings of Aura-MLS in reproducing the zonal annual mean do not hamper the ability of the retrieval to capture the interannual variability in these time series, which underlines the usefulness of the Aura-MLS product for such evaluations.
Overall, our knowledge of the mean state of CO, as expressed by the 1σ multi-instrument spread (see Fig. 15), has improved across the USLM by about 5 %. In the UTLS and MS, however, the 1σ spread remains similar, at above ±30 %. At least in the LS, this is largely due to the persisting problems in the CO distribution obtained from Aura-MLS. Note that SMR and MIPAS(1) were not included in Fig. 15, so as to remain comparable with the summary evaluation of CO in the SPARC Data Initiative Report (SPARC, 2017).

Nitrogen species (NO, NO2, NOx, HNO3, and NOy) comparisons
Total reactive nitrogen (NOy) is the sum of all atmospheric reactive nitrogen species, NOy = NO + NO2 + NO3 + HONO + HNO3 + HNO4 + peroxyacetyl nitrate (PAN) + RONO2 + ClONO2 + 2×N2O5 + BrONO2 + organic nitrate + particulate nitrate (NRC, 1984, p. 34), with the largest contributions from nitric oxide (NO), nitrogen dioxide (NO2), and nitric acid (HNO3). While HNO3 and NOx constitute 80 %-100 % of NOy in the LS, PAN can constitute as much as 20 %-50 % in the tropical UT and extratropical UTLS (i.e., below the 200 hPa level) (Kondo et al., 1997; Fadnavis et al., 2014). Tropospheric NOy originates mostly from sources of NO and NO2 (together known as the nitrogen oxide family, NOx) released from fossil fuel burning, lightning, chemical processes in soils, and biomass burning. In the stratosphere, NOy is primarily produced from the oxidation of N2O, which originates from soil and ocean emissions, biomass and fossil fuel burning, livestock manure, and agricultural fertilization. Another important source is the production of NOx in the upper atmosphere by ionizing energetic particle precipitation (Solomon et al., 1982), followed by downward transport of NOx inside the polar vortex (Funke et al., 2005a). Reactive nitrogen species play an important role in stratospheric ozone chemistry through different mechanisms, including the catalytic NOx cycle (Crutzen, 1970), the role of HNO3 in polar stratospheric cloud formation (Fahey et al., 2001), and the NO2-driven conversion of halogens into reservoir substances. Stratospheric nitrogen will remain a future research focus, as emissions of N2O, which are not regulated, are expected to become the most important contribution to ozone-depleting substances in the atmosphere during the 21st century (Ravishankara et al., 2009).
Sunlight-driven conversion between stratospheric NO and NO2 causes a strong diurnal cycle in both species, with large NO abundances during daytime, large NO2 abundances during nighttime, and steep gradients in both species at sunrise and sunset. A direct comparison of satellite-based NO and NO2 measurements, which correspond to different local solar times (LST), is therefore not possible unless the dependence on the solar zenith angle (SZA) is taken into account. Solar occultation measurements made at SZA = 90° (NO from HALOE and ACE-FTS; NO2 from SAGE II, HALOE, POAM II, POAM III, SAGE III/M3M, and ACE-FTS) can be compared amongst themselves if separated into local sunrise and sunset. Limb scattering and emission measurements (NO from MIPAS and SMR; NO2 from LIMS, OSIRIS, SCIAMACHY, MIPAS, and HIRDLS) and stellar occultation measurements (NO2 from GOMOS) correspond to different SZAs and need to be scaled to a common LST. We scale the NO measurements from ACE-FTS and SMR, as well as the NO2 measurements from OSIRIS, SCIAMACHY, and ACE-FTS, with a chemical box model (McLinden et al., 2010) to the local times of the MIPAS measurements, 10:00 and 22:00 LST. NO2 from HIRDLS (June 2005 to May 2006) has been scaled to 10:00 and 22:00 LST with the Whole Atmosphere Community Climate Model with specified dynamics (SD-WACCM) version 3 (Garcia et al., 2007). Tables 13-15 summarize information on the availability of NO, NO2, and HNO3 measurements, including data version, time period, height range, vertical resolution, and references relevant for the data product used in this study. For these species, updated data versions are available from ACE-FTS (v3.6), GOMOS (v6.01), HIRDLS (v7.0), MIPAS(1) (v20), SAGE II (v7.0), and SCIAMACHY (v4-0). NOx shows only a weak diurnal cycle in the LS to MS and is available from HALOE, ACE-FTS, and MIPAS based on the sum of NO and NO2.
OSIRIS and SCIAMACHY measure NO2 but not NO, and their NOx datasets are compiled with the help of a chemical box model (McLinden et al., 2010). In the following evaluations, the NOx datasets from ACE-FTS, HIRDLS, OSIRIS, and SCIAMACHY are scaled to 10:00 and 22:00 LT.
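The box-model scaling to a common LST can be illustrated with a simple ratio: the measurement sets the absolute value, and the model supplies the diurnal change between the measurement time and the target time. This is a sketch under the assumption that the scaling is multiplicative in this way; it is not the implementation of McLinden et al. (2010), and the model values in the example are hypothetical placeholders for box-model output.

```python
import numpy as np


def scale_to_target_lst(measured, model_at_measurement, model_at_target):
    """Scale measured mixing ratios to a target local solar time (LST).

    All inputs are arrays on the same grid (e.g. a vertical profile).
    The scaling factor is the ratio of box-model values at the target LST
    and at the measurement LST; grid points where the model value at the
    measurement time is zero are returned as NaN.
    """
    measured = np.asarray(measured, dtype=float)
    model_at_measurement = np.asarray(model_at_measurement, dtype=float)
    model_at_target = np.asarray(model_at_target, dtype=float)
    ratio = np.full_like(measured, np.nan)
    ok = model_at_measurement != 0.0
    ratio[ok] = model_at_target[ok] / model_at_measurement[ok]
    return measured * ratio


# Hypothetical NO2 profile (ppbv) measured at the instrument's local time
measured = [5.0, 3.0]
model_meas = [4.0, 0.0]   # box model at the measurement time
model_2200 = [6.0, 2.0]   # box model at 22:00 LST
scaled = scale_to_target_lst(measured, model_meas, model_2200)
```

Near sunrise and sunset, where the modeled values change steeply, such a ratio can become sensitive to small timing errors, which is one way the scaling itself may introduce errors into the comparison.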
The nitrogen species HNO3 (from LIMS, UARS-MLS, SMR, MIPAS, ACE-FTS, Aura-MLS, and HIRDLS) and the reactive nitrogen family NOy (from ACE-FTS, MIPAS, and a combination of the Odin measurements of OSIRIS and SMR) are long lived, apart from some diurnal variations of HNO3 in the LM. Note that not all reactive nitrogen species that make up NOy are measured by the stratospheric limb sounders presented here. The NOy datasets from ACE-FTS (based on the methodology of Jones et al., 2011, except for the vertical binning) and MIPAS are compiled from NO, NO2, HNO3, HNO4, 2×N2O5, and ClONO2 (six-species datasets), all directly measured by the instruments. The Odin NOy dataset (Brohede et al., 2008) is based on NO2 from OSIRIS, HNO3 from SMR, and NO, 2×N2O5, and ClONO2 taken from scan-based chemical box model simulations (McLinden et al., 2010), while HNO4 is not included (five-species dataset). Note that the ACE-FTS and Odin NOy products are daytime datasets and, unlike MIPAS, do not include polar night data. In all figures, the instrument names carry subscripts giving the number of species used to compile the dataset, e.g., Odin5 for the Odin five-species dataset. It should also be noted that, although PAN is available from both MIPAS and ACE-FTS, none of the NOy datasets presented here includes PAN, which can make a significant contribution to NOy at the lower end of the altitude range shown.
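At its core, the compilation of the six- and five-species NOy datasets is a weighted sum in which N2O5 counts twice because it carries two nitrogen atoms. The sketch below assumes all species are already on a common grid, local time, and unit; the function and dictionary names are hypothetical, and the mixing ratios in the example are made-up numbers.

```python
# Nitrogen atoms per molecule for the species used in the NOy compilations
NOY_WEIGHTS = {"NO": 1, "NO2": 1, "HNO3": 1, "HNO4": 1, "N2O5": 2, "ClONO2": 1}


def compile_noy(species):
    """Sum the available species into NOy.

    species: dict mapping species name -> mixing ratio (same units for all).
    Returns (noy, n_species), where n_species is the count that would appear
    as the subscript on the instrument name (e.g. 5 for a dataset without HNO4).
    """
    used = [name for name in NOY_WEIGHTS if name in species]
    noy = sum(NOY_WEIGHTS[name] * species[name] for name in used)
    return noy, len(used)


# Made-up mixing ratios (ppbv) for a five-species set without HNO4
odin_like = {"NO": 1.0, "NO2": 2.0, "HNO3": 5.0, "N2O5": 0.5, "ClONO2": 1.0}
noy, n = compile_noy(odin_like)
```

The species count is a reminder that NOy values from differently compiled datasets are not strictly like for like: any member left out (HNO4 for Odin, PAN for all three datasets) biases the sum low.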
We present here the evaluation of the seasonal cycle of the nitrogen species NO, NO2, NOx, HNO3, and NOy in the midlatitudes (30-60° S and 30-60° N) and tropics (20° S-20° N) at 10 hPa (Fig. 12). The latitude bands and pressure level have been chosen to include as many species and instruments as possible. While the NO maximum is found around 1 hPa, the HNO3 maximum is situated much lower in the atmosphere, at around 30 hPa. Evaluating at the 10 hPa level in the MS thus ensures that both species are abundant.
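Band averages such as the 30-60° and 20° S-20° N means used for the seasonal cycles require combining latitude bins of the zonal monthly mean fields. The text does not state how this is done; the sketch below assumes an area-weighted mean with weights proportional to the cosine of latitude, skipping empty (NaN) bins. The function name and values are illustrative only.

```python
import numpy as np


def band_mean(lat, values, lat_min, lat_max):
    """Cosine-of-latitude weighted mean of zonal-mean values in a latitude band.

    lat: bin-centre latitudes in degrees; values: zonal means on those bins,
    NaN for empty bins. Bins with lat_min <= lat <= lat_max are used.
    """
    lat = np.asarray(lat, dtype=float)
    values = np.asarray(values, dtype=float)
    sel = (lat >= lat_min) & (lat <= lat_max) & ~np.isnan(values)
    weights = np.cos(np.deg2rad(lat[sel]))
    return np.sum(weights * values[sel]) / np.sum(weights)


# Two bins at 0 and 60 degrees: the 60-degree bin gets half the weight
example = band_mean([0.0, 60.0], [1.0, 3.0], 0.0, 60.0)
```

With cosine weighting, high-latitude bins count less than an unweighted average would give them, which matters most for the 30-60° bands, where the weights drop from about 0.87 to 0.5.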
For NO, ACE-FTS shows good agreement with MIPAS except for NH midlatitudes during boreal winter when ACE-FTS can be up to 25 % lower. In particular in the SH midlatitudes, the new data version of ACE-FTS (v3.6) has led to clear improvements and the consistently too-low NO values (ACE-FTS v2.2) are now much closer to MIPAS. Scaled SMR data agree well with the other two datasets in the US to LM but show large deviations in the MS and are thus omitted from the comparison in Fig. 12.
The NO2 comparison (Fig. 12, second row) shows very good agreement of all datasets in the midlatitudes, except for ACE-FTS and HIRDLS during boreal winter. The seasonal cycle of NO2 from ACE-FTS and HIRDLS in the NH, and to some degree also in the SH, has a larger amplitude than the one derived from the other three instruments. In the tropics, all instruments agree on a very weak seasonal signal, except for HIRDLS, which displays an annual cycle with an amplitude of 50 %. Over the whole measurement range (LS to US), the datasets from MIPAS, OSIRIS, and SCIAMACHY agree better with each other than with ACE-FTS or HIRDLS. Compared to the old data versions (SPARC, 2017), the largest improvement is found for the updated ACE-FTS (v3.6) in the SH midlatitudes, where the negative bias has been removed, consistent with the NO evaluations.
The NOx seasonal cycles of all datasets agree well in phase but show some deviations in amplitude (Fig. 12, third row). In the midlatitudes, ACE-FTS NOx values are considerably lower than those of the other instruments during the respective winter season, consistent with the findings of the NO and NO2 evaluations. This low winter bias also causes a larger amplitude of the ACE-FTS seasonal cycle in both midlatitude bands. In the tropics, the datasets agree well, with a relatively weak seasonal cycle that is most pronounced in MIPAS. Again, the largest improvement is found for ACE-FTS (v3.6) in the SH midlatitudes and in the NH midlatitudes during winter.
The comparison of the HNO3 seasonal cycle (Fig. 12, fourth row) also includes the SMR and Aura-MLS datasets in addition to ACE-FTS, HIRDLS, and MIPAS. All datasets …
Finally, evaluations of the NOy seasonal cycle (Fig. 12, fifth row) show some severe differences (in the amplitude, though not necessarily in the mean value), most notably in the SH midlatitudes, where the seasonal cycle from Odin is completely opposite to those from ACE-FTS and MIPAS. These deviations can be understood from the OSIRIS NO2 and NOx as well as the SMR HNO3 seasonal cycles in the SH, which show a smaller amplitude than the respective MIPAS and ACE-FTS datasets. In general, we expect increasing NOy values during the dynamically quiescent spring and summer, and this is observed by ACE-FTS and MIPAS. In the NH, the NOy maximum is observed in boreal autumn by all three instruments. In the SH spring, Odin shows a secondary maximum and an apparently opposite seasonality to the other datasets. For ACE-FTS, the too-low NOx values in the SH and in the NH during boreal winter are compensated by too-high HNO3 values, resulting in overall good NOy agreement with MIPAS. The overall annual mean state of NOy is well known, and the three datasets show excellent agreement (Fig. 16), with differences smaller than ±5 %. However, deviations can be larger for individual months (up to ±10 %; Fig. 12) and cancel out in the annual mean.

Apart from the climatological and seasonal differences between the datasets, it is of interest to evaluate how well the instruments detect signals of interannual variability. Figure 13 shows the time series of NO2 mean values (upper panels) and deseasonalized anomalies (lower panels) for the tropical latitude band (20° S-20° N) at 10 hPa. We focus on the NO2 interannual anomalies from the longer time series of SAGE II and HALOE in comparison with the interannual variability of ACE-FTS, MIPAS, OSIRIS, SCIAMACHY, GOMOS, and HIRDLS. Anomalies calculated in an additive sense, by subtracting the multi-year mean value for each month, may also contain a diurnal-cycle signal and are therefore not suitable evaluation tools for unscaled datasets. Anomalies calculated in a multiplicative sense, as percentage deviations from the monthly multi-year mean values, are less affected by diurnal variations. Since no scaled versions of SAGE II and HALOE data are available, the comparison focuses on multiplicative anomalies of the sunset/nighttime NO2 datasets: the SAGE II, HALOE, and ACE-FTS local sunset datasets, the MIPAS, OSIRIS, SCIAMACHY, and GOMOS 22:00 LT datasets, and the HIRDLS nighttime dataset.
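The advantage of multiplicative anomalies for unscaled data can be illustrated under an idealized assumption: if the difference between two sampling times (e.g., local sunset versus 22:00 LT) acts as a calendar-month-dependent factor that repeats every year, that factor cancels in percentage anomalies but not in additive ones. The sketch below checks this with synthetic numbers; the functions, the 2-year record, and the factor values are hypothetical.

```python
import numpy as np


def _calendar_month(n, start_month=1):
    """Calendar month index (0 = January) for n consecutive monthly values."""
    return (np.arange(n) + start_month - 1) % 12


def additive_anomalies(series, start_month=1):
    """Deviation of each month from its multi-year calendar-month mean."""
    months = _calendar_month(series.size, start_month)
    out = np.full(series.size, np.nan)
    for m in range(12):
        sel = months == m
        if np.any(~np.isnan(series[sel])):
            out[sel] = series[sel] - np.nanmean(series[sel])
    return out


def multiplicative_anomalies(series, start_month=1):
    """Percentage deviation of each month from its multi-year calendar-month mean."""
    months = _calendar_month(series.size, start_month)
    out = np.full(series.size, np.nan)
    for m in range(12):
        sel = months == m
        if np.any(~np.isnan(series[sel])):
            clim = np.nanmean(series[sel])
            out[sel] = 100.0 * (series[sel] - clim) / clim
    return out


# Synthetic 2-year record and a month-dependent diurnal factor (hypothetical)
base = np.concatenate([np.full(12, 4.0), np.full(12, 6.0)])
factor = np.tile(np.linspace(0.8, 1.2, 12), 2)
sampled_elsewhere = base * factor

same_mult = np.allclose(multiplicative_anomalies(base),
                        multiplicative_anomalies(sampled_elsewhere))
same_add = np.allclose(additive_anomalies(base),
                       additive_anomalies(sampled_elsewhere))
```

Here the percentage anomalies of both records are identical, while the additive anomalies differ by the factor; real diurnal effects are not purely multiplicative, which is why the text says multiplicative anomalies are less affected rather than unaffected.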
The comparison of the mean values (upper panel) shows very good agreement of the MIPAS, GOMOS, and scaled SCIAMACHY measurements. Scaled OSIRIS data are somewhat lower than those in the other three datasets. Diurnal NO2 variations between 22:00 LT and local sunset at the 10 hPa level are so small that SAGE II, HALOE, and ACE-FTS data taken at local sunset mostly agree with the other datasets for the overlap period (2003-2005). From 2003 onwards, the multiplicative anomalies of all datasets display the expected QBO signal, with the best agreement between MIPAS, OSIRIS, GOMOS, and SCIAMACHY. The 3 years of HIRDLS measurements display a larger amplitude of the QBO signal and also larger month-to-month fluctuations, possibly due to its higher vertical resolution (which should be tested in future work). Interannual anomalies from ACE-FTS agree with the other datasets for some months but show large deviations for others. Due to the sparse sampling, it is not possible to diagnose a QBO signal in the ACE-FTS time series. Local sunset evaluations from SAGE II and HALOE also show large month-to-month variations but agree reasonably well on their interannual variability and display the QBO signal over the whole time period. The same is not true, however, for the local sunrise evaluations of the two instruments, where HALOE shows only a weak QBO signal and SAGE II shows no clear indication of one (SPARC, 2017).
Figure 13. … Datasets correspond to local sunset or to 22:00 LST as described in the text. The s10pm denotes zonal monthly mean fields scaled to 22:00 LT, the ss zonal monthly mean fields from sunset measurements.
The overall knowledge of the atmospheric mean state of the trace gases treated in this section, as expressed by the 1σ multi-instrument spread, is shown in Fig. 16. In comparison to earlier evaluations (SPARC, 2017), the updated nitrogen datasets show slightly improved agreement. In particular, the scaled ACE-FTS datasets agree better with the other time series in terms of absolute bias and seasonal cycle.

Hydroperoxyl (HO2) comparisons
Hydroperoxyl (HO2), together with atomic hydrogen (H) and hydroxyl (OH), forms the HOx family. HO2 is formed in the reaction of a hydrogen atom (H) with molecular oxygen (O2), or of ozone (O3) with OH. OH affects stratospheric ozone chemistry through its role in the HOx catalytic reaction cycle that destroys ozone. The HOx cycle was the first catalytic reaction cycle to be identified (Bates and Nicolet, 1950). HOx chemistry dominates ozone destruction above 40 km, while NOx dominates ozone destruction in the MS (Salawitch et al., 2005). In the troposphere, HO2 is generated as an intermediate product of the oxidation of many hydrocarbons.
Measurements of HO2 are available from instruments that measure in the sub-mm/microwave wavelength bands, namely SMILES, SMR, and Aura-MLS. Other available HO2 datasets are restricted to balloon campaigns, such as those from the Far Infrared Spectrometer (FIRS-2) (Johnson et al., 1995; Jucks et al., 1998). There is no temporal overlap between the three satellite instruments, since SMR currently only provides HO2 data as a research product for 1 year (October 2003-October 2004) … Only daytime datasets are compared. Tables 2 and 16 compile information on the availability of HOx measurements, including data version, time period, height range, vertical resolution, and references relevant for the data product used in this study. Figure 14 shows the zonal monthly mean evaluation between Aura-MLS and SMILES for November 2009 and February 2010. SMR is not shown due to its very limited temporal and spatial coverage (see Fig. 4.23.2 in SPARC, 2017). Mixing ratios are similar in both months in the tropics (where SZAs do not vary much with season), indicating only a weak seasonal cycle in the daytime zonal monthly mean field. The lowest mixing ratios are found in the polar region of the winter hemisphere (under high-SZA conditions), indicating a somewhat more pronounced seasonal cycle in these regions of the atmosphere. The differences to the MIM indicate very good (up to ±5 %) to excellent (up to ±2.5 %) agreement between SMILES and Aura-MLS, except in the lower part of the measurement range (around 20 hPa), where differences compared to the MIM increase to ±10 % and more. The results presented here are comparable to (if not somewhat better than) those found in SPARC (2017), where multi-year monthly mean HO2 fields were used for the comparison.
Figure 15. Synopsis of the uncertainty in the annual zonal mean state of the longer-lived species evaluated within the SPARC Data Initiative.
The relative standard deviation over all instruments' multi-annual zonal mean datasets is presented for different chemical trace gas species (color contours). The relative standard deviations are calculated by dividing the absolute standard deviations by the MIM. The black contour lines in each panel represent the MIM trace gas distribution for each species. The number of instruments included is given by the right-hand grey bar. Note that the time periods used depend on the availability of the instruments included in the assessment and hence differ from trace gas to trace gas.

Summary evaluations
The SPARC Data Initiative provides an estimate of the systematic uncertainty in our knowledge of the mean state of the measured fields, derived from the inter-instrument spread defined as ±1σ. Figure 15 shows these fields for the long-lived trace gases. Note that we adopt the same vocabulary (see Table 7) for the summary comparisons (based on relative standard deviations) as used earlier for instrument-specific evaluations (based on relative differences). For CH4, the uncertainty is smallest in the tropical and midlatitude MS and LS and larger towards the UTLS, US, and LM. The same has been found for other long-lived trace gases such as O3, H2O, N2O, and HF. In contrast, the trace gases CFC-11 (CCl3F), CFC-12 (CCl2F2), and SF6 show the best agreement in the UTLS and larger deviations in the MS. Nearly all trace gases show larger deviations in the polar regions than at lower latitudes, which is at least partially due to larger sampling biases at higher latitudes. The datasets of CO, a trace gas with an intermediate lifetime, are characterized by large relative differences throughout most of the measurement range. The large CO differences in the annual zonal mean structure (±30 % in the LS) should be addressed further in forthcoming retrieval revisions. Overall, the ±1σ multi-instrument spread has decreased for all long-lived trace gas species by up to 10 % since SPARC (2017), except possibly for CO, indicating a more consolidated knowledge of the state of the atmosphere resulting from improvements in the retrievals of these species.
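The ±1σ spread in the synopsis figures is described as the standard deviation across instruments divided by the MIM. A minimal sketch, assuming the sample standard deviation (ddof=1) over the instruments available at each grid point (the text does not say which estimator is used), and mapping values onto only the two vocabulary levels quoted in this section (excellent up to ±2.5 %, very good up to ±5 %); the remaining Table 7 categories are not reproduced, and the function names and numbers are hypothetical.

```python
import numpy as np


def relative_spread_percent(fields):
    """1-sigma inter-instrument spread relative to the MIM, in percent.

    fields: array of shape (n_instruments, ...) holding multi-annual zonal
    means on a common grid, NaN where an instrument has no data.
    """
    fields = np.asarray(fields, dtype=float)
    mim = np.nanmean(fields, axis=0)
    sigma = np.nanstd(fields, axis=0, ddof=1)
    return 100.0 * sigma / mim


def describe(spread_percent):
    """Map a spread onto the two vocabulary levels quoted in the text."""
    value = abs(spread_percent)
    if value <= 2.5:
        return "excellent"
    if value <= 5.0:
        return "very good"
    return None  # larger spreads: see Table 7 for the remaining categories


# Three hypothetical instruments at two grid points
spread = relative_spread_percent([[1.00, 2.00], [1.02, 2.20], [0.98, 1.80]])
labels = [describe(s) for s in spread]
```

Because the spread is normalized by the MIM, the same absolute disagreement yields a larger relative spread where mixing ratios are small, consistent with the larger relative differences reported for regions of low abundance.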
The agreement among the datasets of the nitrogen species NO, NO2, and HNO3, as derived from their relative deviations, depends strongly on the atmospheric distribution of the respective gas, with larger relative differences in regions of smaller mixing ratios (Fig. 16). While NO and NOx agree very well in the tropical and subtropical MS and US, NO2 and HNO3 have larger deviations in the US and show the best agreement in the tropical and midlatitude MS and, for HNO3, also in the LS. All datasets (except for HNO3 and NOy in the Northern Hemisphere) have considerably larger deviations in the polar regions, at least in part again because of sampling issues and the large atmospheric variability there, which is less well sampled by the measurements that go into the monthly mean datasets (see …). Finally, the NOy datasets show excellent agreement throughout most of the measurement range except in the polar LM. Overall, the ±1σ multi-instrument spread in the nitrogen species has decreased only slightly (by 5 %) compared to SPARC (2017).
Figure 16. Same as Fig. 14 but for nitrogen-containing species. The assessment of the uncertainty in the annual mean state of NO, NOx, and NO2 is based on gridded datasets corresponding to 10:00 and 22:00 local solar time (LST), and for the latter also on datasets corresponding to local sunrise (sr) and local sunset (ss). Note that some of the included datasets have been derived by scaling the individual measurements with a chemical box model to 10:00 and 22:00 LST. See SPARC (2017) for more detailed information. Nitrogen oxides (NOx) are here defined as NO + NO2. The odd (reactive) nitrogen family (NOy) is defined as NOx + HNO3 + 2×N2O5 + ClONO2 + HNO4. For N2O5, ClONO2, and HNO4, no assessment of the uncertainty in the annual mean field can be provided since no data products at the same local solar time are available.
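The NOy definition given in the Figure 16 caption is a sum of component mixing ratios weighted by the number of nitrogen atoms per molecule; N2O5 enters with a factor of 2 because each molecule contains two nitrogen atoms. A minimal Python sketch with a hypothetical helper name follows. It assumes all component fields are in the same units and refer to the same local solar time, which is why such a sum cannot be formed from datasets sampled at different local times.

```python
import numpy as np

def noy_from_components(no, no2, hno3, n2o5, clono2, hno4):
    """Return NOy = NOx + HNO3 + 2*N2O5 + ClONO2 + HNO4 (hypothetical helper).

    Inputs are mixing ratios as scalars or NumPy arrays on the same grid, in the
    same units and at the same local solar time. NaN in any input propagates.
    """
    no, no2, hno3, n2o5, clono2, hno4 = (
        np.asarray(v, dtype=float) for v in (no, no2, hno3, n2o5, clono2, hno4)
    )
    nox = no + no2  # nitrogen oxides, NOx = NO + NO2
    # Each N2O5 molecule carries two nitrogen atoms, hence the factor of 2.
    return nox + hno3 + 2.0 * n2o5 + clono2 + hno4

noy = noy_from_components(1.0, 2.0, 3.0, 0.5, 0.25, 0.25)  # 7.5, in the input units
```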
The agreement between datasets of chlorine compounds (Fig. 17) and other shorter-lived species depends strongly on the lifetime of the trace gas considered. HCl, which is longer lived, exhibits very good agreement, and the daytime datasets of the shorter-lived ClO show good to reasonable agreement in the MS and US, where its mixing ratios are highest. The short-lived HOCl shows mostly reasonable agreement in the US during nighttime. HO2 is available from only a small number of instruments and is thus not included in the synopsis plots, although the HO2 comparisons show promising results, with mostly good agreement throughout the MS, US, and LM. The large deviations between the datasets of shorter-lived species stem partially from the difficulty of accounting for the strong diurnal cycles these trace gases exhibit. Scaling the data to a common daytime or nighttime local solar time using a chemical box model helped improve the comparisons in some cases. However, it remains a challenge to estimate how much of these deviations is related to errors introduced by the scaling procedures and how much corresponds to direct measurement differences. Overall, compared to SPARC (2017), the ±1σ multi-instrument spread in the chlorine-containing species has decreased for HCl but has remained very similar for ClO and HOCl.
Figure 17. Same as Fig. 14 but for chlorine-containing species. The assessment of the uncertainty in the annual mean state is based on ClO daytime and HOCl nighttime datasets. Note that the ClO assessment includes the SMR dataset, which has been derived by scaling the individual measurements with a chemical box model to 13:30 LST. See SPARC (2017) for more detailed information.
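The box-model scaling described in the text can be sketched as a ratio method: the measured value is multiplied by the ratio of the modeled mixing ratio at the target local solar time to that at the measurement time. The Python illustration below rests on stated assumptions (a single precomputed diurnal cycle from a box model, treated as periodic over 24 h, and linear interpolation between model output times); the function and its inputs are hypothetical and are not the scaling code used for any SPARC dataset.

```python
import numpy as np

def scale_to_local_time(x_meas, t_meas, t_target, model_hours, model_vmr):
    """Scale a measurement from local solar time t_meas to t_target (hours).

    model_hours, model_vmr: diurnal cycle simulated by a chemical box model for the
    same location and season (hypothetical input), treated as periodic over 24 h.
    """
    m_meas = np.interp(t_meas, model_hours, model_vmr, period=24.0)
    m_target = np.interp(t_target, model_hours, model_vmr, period=24.0)
    # Ratio method: keep the observed magnitude, borrow the modeled diurnal shape.
    return x_meas * m_target / m_meas

# Hypothetical diurnal cycle sampled every 6 h.
hours = np.array([0.0, 6.0, 12.0, 18.0])
vmr = np.array([1.0, 2.0, 4.0, 2.0])
print(scale_to_local_time(3.0, 6.0, 12.0, hours, vmr))  # 6.0
```

The ratio becomes ill-conditioned when the modeled mixing ratio at the measurement time is close to zero (for example, for species that nearly vanish at night), which is one way errors from the scaling procedure can enter the comparisons.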

Data availability
All SPARC Data Initiative zonal monthly mean datasets can be found in the Zenodo data archive (https://doi.org/10.5281/zenodo.4265393).

Conclusions
This paper presents an overview and update of the evaluations performed within the WCRP SPARC Data Initiative as published in the SPARC Data Initiative Report (SPARC, 2017). To date, the SPARC Data Initiative represents the most comprehensive assessment of stratospheric composition measurements obtained from an international suite of limb sounders operated by various space agencies and other national institutions. It offers the first systematic assessment of the availability of chemical trace gas and aerosol observations from satellite limb sounders, provides these observations in a common and easy-to-handle data format (zonal monthly means), and presents a detailed comparison between these datasets, importantly covering different generations of satellite limb instruments and contrasting the products of different agencies around the world. Here, we extended the SPARC (2017) evaluations, which covered the period 1978–2010, to the end of 2018 and used the most recent data versions that have since become available. New observations from OMPS-LP (on Suomi-NPP) and SAGE III/ISS have also been added to the original list presented in SPARC (2017), which included LIMS, SAGE I/II, SAGE III/M3M, HALOE, UARS-MLS, POAM II/III, OSIRIS, SMR, MIPAS, GOMOS, SCIAMACHY, ACE-FTS, ACE-MAESTRO, Aura-MLS, HIRDLS, and SMILES. (Note that aerosol evaluations and the corresponding zonal monthly mean time series will be presented in a follow-on study.)
The SPARC Data Initiative comparisons are based on vertically resolved zonal monthly mean datasets of 26 different atmospheric constituents, including the stratospheric trace gases of primary interest (O3 and H2O), major long-lived trace gases (SF6, N2O, HF, CCl3F, CCl2F2, NOy), trace gases with intermediate lifetimes (HCl, CH4, CO, HNO3), shorter-lived trace gases important to stratospheric chemistry, including nitrogen-containing species (NO, NO2, NOx, N2O5, HNO4), halogens (BrO, ClO, ClONO2, HOCl), and other minor species (OH, HO2, CH2O, CH3CN), and aerosol. The observations have been compiled on a common latitude–pressure grid covering the region from the upper troposphere to the lower mesosphere (300–0.1 hPa) with a latitudinal resolution of 5°. The zonal monthly mean time series are available from the Zenodo data archive (https://doi.org/10.5281/zenodo.4265393). A consistent file format was designed and is used across the different composition measurements and instruments to allow for easy handling by the user (see Popp et al., 2020, for a discussion of the importance of a consistent data format in the provision of observational datasets).
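To illustrate what compiling profiles onto the common 5° latitude grid involves, the Python sketch below averages one month of profiles, already interpolated to the common pressure levels, into zonal mean latitude bins. The bin layout (5°-wide bins centered at -87.5° to 87.5°), the function name, and the simple unweighted mean are assumptions; the actual SPARC gridding, including any quality screening or weighting, may differ.

```python
import numpy as np

# Assumed latitude grid: 36 bins, 5° wide, centered at -87.5°, -82.5°, ..., 87.5°.
LAT_EDGES = np.arange(-90.0, 90.0 + 5.0, 5.0)
LAT_CENTERS = 0.5 * (LAT_EDGES[:-1] + LAT_EDGES[1:])

def zonal_monthly_mean(lats, profiles):
    """Bin one month of profiles (n_profiles, n_levels) into 5° latitude bins.

    Profiles are assumed to be on the common pressure levels already; NaN marks
    missing values. Returns (mean, count), each of shape (n_levels, n_lat).
    """
    lats = np.asarray(lats, dtype=float)
    profiles = np.asarray(profiles, dtype=float)
    n_lat = LAT_CENTERS.size
    n_lev = profiles.shape[1]
    # Bin index per profile; clip so that exactly 90° falls into the last bin.
    idx = np.clip(np.digitize(lats, LAT_EDGES) - 1, 0, n_lat - 1)
    valid = np.isfinite(profiles)
    total = np.zeros((n_lev, n_lat))
    count = np.zeros((n_lev, n_lat), dtype=int)
    for k in range(n_lat):
        in_bin = idx == k
        total[:, k] = np.where(valid[in_bin], profiles[in_bin], 0.0).sum(axis=0)
        count[:, k] = valid[in_bin].sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(count > 0, total / count, np.nan)
    return mean, count

# Four hypothetical profiles on two pressure levels.
lats = [-89.0, -88.0, 0.5, 2.0]
profiles = [[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [7.0, 8.0]]
mean, count = zonal_monthly_mean(lats, profiles)
```

Keeping the per-bin counts alongside the means lets a user judge how well each grid box is sampled, which matters for the sampling-bias caveats discussed in the text.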
The trace gas time series were then evaluated with a common approach, comparing multi-year annual or monthly mean fields to allow for maximum overlap between different instruments. By evaluating zonal monthly mean averages, the SPARC Data Initiative has taken a "climatological" approach to data validation (Hegglin et al., 2008; Tegtmeier et al., 2013; SPARC, 2017), in contrast to the more common approach of using coincident profile measurements. The climatological comparison method averages over many measurements, thereby reducing both the instrument noise and the geophysical variability that affect single-profile comparisons, and offers a top-down rather than a bottom-up assessment of the (systematic) biases between different measurements. Importantly, the climatological validation approach resolves these biases in the full latitude–height space. The method thus has the advantages that it is consistent for all instrument comparisons, avoids sensitivity to the chosen limits defining coincident measurements, and produces larger sample sizes, which should in theory minimize the random part of the measurement error. However, it has the disadvantage that climatological means can be biased by non-uniform sampling or by potential long-term trends in the trace gases. The extent to which the monthly and annual zonal mean datasets are representative of the true mean has been evaluated as part of the SPARC Data Initiative for two trace gases (O3 and H2O) in a separate paper (…). That study yields information on the potential sampling bias in the zonal monthly mean fields of these tracers and instruments and provides an approximate measure of the sampling bias, also for trace gases with similar lifetimes, to users who examine variability and trends or perform comparisons with free-running models.
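One way to quantify the sampling bias discussed in the text, assumed here for illustration and not necessarily the method of the separate O3 and H2O study, is to subsample a densely sampled reference field (for example, model output) at the times an instrument actually measured and to compare that mean with the full-sampling mean. The Python sketch below does this for a single grid box with an idealized within-month trend; the helper name and the reference data are hypothetical.

```python
import numpy as np

def sampling_bias(reference, sampled_idx):
    """Mean of the reference at the sampled times minus its mean over all times."""
    reference = np.asarray(reference, dtype=float)
    return reference[np.asarray(sampled_idx)].mean() - reference.mean()

# Hypothetical daily reference values for one grid box, increasing through the month.
ref = np.linspace(1.0, 2.0, 31)
early_only = sampling_bias(ref, range(10))             # instrument sampled days 0-9 only
every_other_day = sampling_bias(ref, range(0, 31, 2))  # evenly spread sampling
```

In this linear example, evenly spread sampling gives no bias even though it uses only about half of the days, whereas sampling clustered early in the month underestimates the monthly mean by 0.35. In practice the sampling pattern differs between instruments and latitude bins, so such a bias has to be estimated per instrument and grid box.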
The findings of the trace gas dataset comparisons presented here are generally consistent with the results of previous validation efforts based on the classical validation approach using profile coincidences (where available). Instruments with sparser sampling show noisier zonal means, and retrievals with broad averaging kernels do not resolve sharp structures such as those found across the tropopause region. However, the climatological approach generally yields more comprehensive information on measurement uncertainty in terms of the latitude–pressure range covered. The comparisons have in many cases improved our knowledge of the systematic biases between the available data products. Although not shown here, the comparison results generally do not change substantially when changing the number of years that go into an averaged field or, in the case of the longer-lived species, when calculating instrument differences for a single month instead of a year. It follows that the comparisons shown yield relatively robust conclusions about instrument/retrieval performance (see SPARC, 2017, for detailed examples).
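The effect of broad averaging kernels mentioned in the text can be illustrated with the linear retrieval relation x_hat = x_a + A (x_true - x_a), where A is the averaging-kernel matrix and x_a the a priori profile, with measurement noise ignored. The Python sketch below uses idealized Gaussian kernel rows of a chosen full width at half maximum (FWHM) and an idealized step in mixing ratio at a hypothetical 15 km tropopause; all profiles and kernel widths are assumptions chosen only to show how a sharp gradient is smoothed.

```python
import numpy as np

def gaussian_averaging_kernel(z, fwhm):
    """Idealized averaging-kernel matrix: Gaussian rows of given FWHM (km), each summing to 1."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    a = np.exp(-0.5 * ((z[:, None] - z[None, :]) / sigma) ** 2)
    return a / a.sum(axis=1, keepdims=True)

def retrieve(x_true, x_apriori, kernel):
    """Noise-free linear retrieval: x_hat = x_a + A (x_true - x_a)."""
    return x_apriori + kernel @ (x_true - x_apriori)

z = np.arange(5.0, 25.0, 0.5)           # hypothetical altitude grid (km)
x_true = np.where(z < 15.0, 0.1, 1.0)   # idealized step at a 15 km "tropopause"
x_a = np.full_like(z, 0.5)              # hypothetical flat a priori profile
x_hat = retrieve(x_true, x_a, gaussian_averaging_kernel(z, fwhm=3.0))
# The sharp 0.9 jump is spread over several kilometers in x_hat.
```

Because these idealized kernel rows sum to one and the a priori is constant, the a priori drops out here (x_hat = A x_true); real averaging kernels generally have areas different from one, so the a priori also leaks into the retrieved profile.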
The conclusions from the SPARC Data Initiative highlight the value (or necessity) of observations from multiple instruments for characterizing retrieval behavior and overall observation quality as a function of latitude and pressure (or altitude). The small number of stratospheric limb sounders currently remaining in space (most of them long past their expected lifetimes) and the even smaller number of planned future missions will likely have serious implications. These may affect not only our ability to perform a robust assessment of the quality of stratospheric composition measurements but, more importantly, our ability to derive stratospheric composition changes from these measurements. Knowledge of these changes is needed to better understand the state of the ozone layer, which protects life on Earth, and its response to (as well as feedbacks on) climate change (e.g., Hegglin and Shepherd, 2009). As such, the gridded trace gas datasets from the SPARC Data Initiative may serve well into the future as an atlas and reference of the stratospheric composition mean state and variability during the "golden age" of limb satellite sounding of the atmosphere.
Author contributions. MIH and ST designed and co-led the SPARC Data Initiative, performed all the evaluations, and wrote the text. The instrument PIs and their research staff compiled the SPARC Data Initiative datasets to their best current knowledge and contributed to the writing and interpretation of the evaluation results.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. While the SPARC Data Initiative has been driven from a user perspective, the measurement partners have been critical to its success. These partners, to whom the SPARC Data Initiative extends its thanks, include the relevant instrument teams, the various space agencies (CSA, ESA, NASA, JAXA, SNSA, and other national agencies), and organizations such as CEOS-ACC and IGACO. We thank the World Climate Research Programme (WCRP) for travel funding through the SPARC office to support our activities. The SPARC Data Initiative also thanks the International Space Science Institute in Bern (ISSI), which supported the activity through its ISSI International Team activity program and facilitated two successful team meetings in Bern. -2007-1-226224). Work at the Jet Propulsion Laboratory, California Institute of Technology, was funded by the National Aeronautics and Space Administration (NASA). The Atmospheric Chemistry Experiment is a Canadian-led mission mainly supported by the CSA. Development of the ACE-FTS gridded datasets was supported by grants from the Canadian Foundation for Climate and Atmospheric Sciences and the CSA. MIPAS data analysis and validation was supported by the German Federal Ministry for Economic Affairs and Energy (grant no. 50EE1547) and by the ESA Ozone Climate Change Initiative. Bernd Funke acknowledges support by the Spanish MCINN (grant no. ESP2017-87143-R and PID2019-110689RB-I00) and EC FEDER funds. Development of the SCIAMACHY and IUP-OMPS gridded datasets at the University of Bremen was funded in part by the German Research Foundation (DFG) Research Units SHARP (grant no. FOR1095) and VolImpact (grant no. FOR2820), the German Aerospace Center (DLR) SADOS project, ESA SQWG and Ozone CCI projects, EU/ECMWF C3S project, and the University and State of Bremen. Alexei Rozanov and Carlo Arosio also acknowledge the German HLRN (High-Performance Computer Center North) and the thread-safe FORTRAN library GALAHAD.
In addition, Carlo Arosio acknowledges support from the PRIME program of the German Academic Exchange Service (DAAD) and ESA's Living Planet Fellowship SOLVE. Work on HIRDLS was supported in the US by the National Aeronautics and Space Administration (NASA) and in the UK by the Natural Environment Research Council (NERC). Development of the Odin/SMR gridded datasets was supported by the Swedish National Space Agency (SNSA).
Review statement. This paper was edited by David Carlson and reviewed by Sean Davis and two anonymous referees.