An updated version of a gap-free monthly mean zonal mean ozone database

An updated and improved version of a global, vertically resolved, monthly mean zonal mean ozone database has been calculated, hereafter referred to as the BSVertOzone database. Like its predecessor, it combines measurements from several satellite-based instruments and ozone profile measurements from the global ozonesonde network. Monthly mean zonal mean ozone concentrations in mixing ratio and number density are provided in 5° latitude bins, spanning 70 altitude levels (1 to 70 km), or 70 pressure levels that are approximately 1 km apart (878.4 hPa to 0.046 hPa). Different data sets or "Tiers" are provided: "Tier 0" is based only on the available measurements and therefore does not completely cover the whole globe or the full vertical range uniformly; the "Tier 0.5" monthly mean zonal means are calculated as a filled version of the Tier 0 database, where missing monthly mean zonal mean values are estimated from correlations against a total column ozone database. The Tier 0.5 data set includes the full range of measurement variability and is created as an intermediate step for the calculation of the "Tier 1" data, where a least squares regression model is used to attribute variability to various known forcing factors for ozone. Regression model fit coefficients

other data sources used in the BDBP v1.1.0.6 database, the large data quantity and high data quality of the Microwave Limb Sounder (MLS) make it an attractive data source for incorporation into the BSVertOzone database.
The usefulness of MLS ozone data has already been shown through its use in several other combined ozone databases, e.g. GOZCARDS (Froidevaux et al., 2008) and SWOOSH (Davis et al., 2016). Given this pedigree, MLS measurements were included in the creation of the BSVertOzone database.

The MLS instrument is on NASA's Aura satellite, which was launched in mid-July 2004 and remains operational to date (see Fig. 1). MLS measures microwave emission from the limb of the Earth's atmosphere to provide measurements of a multitude of atmospheric trace gases, including ozone, as well as temperature. MLS retrieved ozone profiles cover the region from the upper troposphere to the mesosphere, i.e. from 261 hPa to about 0.02 hPa. The vertical resolution of the ozone profiles is 2.5 km to 3 km in the stratosphere but coarsens to up to 8 km in the mesosphere. MLS provides about 3500 ozone profiles every day, with near-global coverage from 82° S to 82° N (Waters et al., 2006). Version 4.2 MLS data were incorporated into the monthly mean zonal means in BSVertOzone. More information about the MLS instrument can be found in Waters et al. (2006), and more information about the MLS ozone validation in Froidevaux et al. (2008) and Hubert et al. (2016).
The screening of the MLS ozone measurements is based on the official MLS v4.2 data description document provided by JPL (http://mls.jpl.nasa.gov/data/v4-2_data_quality_document.pdf) and is similar to the ozone screening described in Froidevaux et al. (2008). Uncertainties for each ozone measurement at each pressure level, which are more detailed than the uncertainty information included in the MLS v4.2 data description document, were provided by the MLS team (Lucien Froidevaux, personal communication, April 2017).
One of the main foci of BSVertOzone is tracing all sources of uncertainty from the individual measurements through to the final monthly mean zonal mean ozone values. The uncertainty estimates provided with the individual measurements obtained from each satellite instrument shown in Fig. 1 are used as a starting point for the uncertainty treatment throughout the creation of the different BSVertOzone data sets (see following sections). These uncertainties normally include, amongst others, calibration errors, spectroscopy uncertainties, and uncertainties introduced by the use of an a priori profile.
Ozonesondes remain the only source of ozone measurements throughout the troposphere (see Fig. 1) and were obtained from four different data archives, viz. the World Ozone and Ultraviolet Radiation Data Centre (WOUDC), the Network for the Detection of Atmospheric Composition Change (NDACC), the Southern Hemisphere Additional Ozonesondes network (SHADOZ), and the National Oceanic and Atmospheric Administration (NOAA). While the goal is that each ozonesonde site provides profiles along with uncertainty estimates (Ozonesonde Data Quality Assessment, O3S-DQA; Tarasick et al., 2016; Deshler et al., 2017; Witte et al., 2017), most data files obtained for use in the generation of BSVertOzone did not yet include uncertainty estimates. We therefore continue to use the uncertainty estimates described in Hassler et al. (2008). To extend BSVertOzone through to the end of 2016, and to ensure that the most recently available ozonesonde data are used, an updated set of ozonesonde data was obtained from these archives.

Database grid
The monthly mean zonal means comprising the BSVertOzone database are provided in 5° latitude bands on a pressure grid and on a geopotential height grid. Ozone concentrations on these grids are provided both in number density and mixing ratio.
To provide the input to these four data sets (two vertical grids, each with two concentration units), all individual measurements are converted from their original vertical grid and concentration unit to the vertical grids and concentration units prescribed in the BSVertOzone database. These conversions require the temperature and pressure at each measurement, which were available in the data files of the individual data sources. The BSVertOzone levels of the vertical altitude grid range from 1 km to 70 km with a spacing of 1 km.
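The unit conversion needs temperature and pressure because the number density of air follows from the ideal gas law. The sketch below assumes ozone mixing ratio is a dimensionless volume mixing ratio and number density is in molecules cm⁻³; the function names are hypothetical and not part of any BSVertOzone software.

```python
# Sketch: convert between ozone volume mixing ratio (VMR) and number density
# using the ideal gas law, n_O3 = chi * p / (k_B * T).
# Function names are hypothetical, not from the BSVertOzone processing code.
K_B = 1.380649e-23  # Boltzmann constant, J/K


def vmr_to_number_density(vmr, pressure_hpa, temperature_k):
    """Ozone number density (molecules cm^-3) from a dimensionless VMR."""
    pressure_pa = pressure_hpa * 100.0                  # hPa -> Pa
    n_air_per_m3 = pressure_pa / (K_B * temperature_k)  # air molecules m^-3
    return vmr * n_air_per_m3 * 1e-6                    # m^-3 -> cm^-3


def number_density_to_vmr(n_o3_per_cm3, pressure_hpa, temperature_k):
    """Inverse conversion: dimensionless VMR from number density (cm^-3)."""
    n_air_per_cm3 = pressure_hpa * 100.0 / (K_B * temperature_k) * 1e-6
    return n_o3_per_cm3 / n_air_per_cm3


# Example: 5 ppmv at 30 hPa and 230 K
n_o3 = vmr_to_number_density(5e-6, 30.0, 230.0)
```

For these inputs the result is about 4.7 × 10¹² molecules cm⁻³.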
The levels of the BSVertOzone vertical pressure grid range from about 878.4 hPa to 0.046 hPa and are approximately 1 km apart (see Hassler et al. (2008) for more detail). Only a few satellite instruments provide data on such a grid; in most cases, therefore, the measurements were interpolated onto the two predefined vertical grids. The uncertainties provided with each measurement are also linearly interpolated onto the predefined vertical grids, using log pressure when interpolating between pressure levels. For simplicity, a specific geopotential height or pressure level is hereafter referred to as a "level".
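Interpolating linearly in log pressure can be sketched with NumPy as below. The function name is hypothetical, and returning NaN for target levels outside the measured profile (rather than extrapolating) is our assumption; the same call applies to a profile of values or to its uncertainties.

```python
import numpy as np


def interp_log_pressure(p_src_hpa, values, p_target_hpa):
    """Linearly interpolate a profile (or its uncertainties) in ln(p).

    np.interp needs increasing x values, but pressure decreases with
    altitude, so the source profile is sorted by ln(p) first. Target
    levels outside the measured range become NaN (no extrapolation).
    """
    x_src = np.log(np.asarray(p_src_hpa, dtype=float))
    v = np.asarray(values, dtype=float)
    order = np.argsort(x_src)
    x_src, v = x_src[order], v[order]
    x_tgt = np.log(np.asarray(p_target_hpa, dtype=float))
    return np.interp(x_tgt, x_src, v, left=np.nan, right=np.nan)
```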
3 Construction of the unfilled monthly mean zonal mean database (Tier 0)

Quantifying offsets and drifts between different measurement systems can be made far more robust by using an independent data source, especially when temporal and spatial coincidences between the two measurement systems are sparse. If the independent data source has high spatial and temporal sampling and covers the combined range of the two measurement systems to be homogenized, it can be used as a transfer standard. The independent data source does not need to be quantitatively this study, we remove the bias at those levels by first fitting a regression model that includes an offset term (whole time period) and a step function (Heaviside function) after 1998 to the ozone data (Dhomse et al., 2011). We then subtract the contribution of the step basis function from the model data, obtaining corrected CTM data from which the discontinuities have been removed.
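The step-removal idea can be sketched as an ordinary least-squares fit with only the two basis functions named above, an offset and a Heaviside step, followed by subtraction of the fitted step. Placing the step at the start of `step_year` is an assumption, and the function name is hypothetical; the regression actually used may contain further terms.

```python
import numpy as np


def remove_step(time_years, ozone, step_year=1998.0):
    """Fit ozone = a + b * H(t - step_year) and subtract b * H.

    H is the Heaviside step (0 before step_year, 1 from step_year on).
    Returns the corrected series and the fitted step size b.
    """
    t = np.asarray(time_years, dtype=float)
    y = np.asarray(ozone, dtype=float)
    step = (t >= step_year).astype(float)               # Heaviside basis
    design = np.column_stack([np.ones_like(t), step])   # offset + step
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - coeffs[1] * step, coeffs[1]
```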
For more details about the SLIMCAT model see Chipperfield (2006) and Chipperfield et al. (2015, 2017). Using CTM output as an evaluation and adjustment tool for coarsely distributed global ozone measurements is not a novel idea. In Sofieva et al. (2014), the gap-free ozone fields of a CTM run with high temporal and spatial resolution were used to characterize sampling biases of coarse satellite samplers when their measurements were used to calculate monthly mean zonal mean ozone values. Hegglin et al. (2014) used a CCM nudged to ERA-Interim reanalysis to correct for offsets and drifts between stratospheric water vapor measurements from multiple satellite instruments, an approach very similar to the one used in our study. While Hegglin et al. (2014) used the CCM output to adjust monthly mean values of the different instruments, here the SLIMCAT output is used to adjust individual measurements with a correction that is based on zonal mean comparisons (see Sect. 3.2).

Homogenization
As shown in several recent studies (e.g. Tegtmeier et al., 2013), satellite instruments can experience drifts over their lifetime, and coincident measurements of the same constituent from different satellite-based and ground-based instruments can differ. Such potential differences must be accounted for when combining measurements from different measurement platforms into a single product such as a monthly mean zonal mean. Two such approaches have been used recently within the community to combine measurements from different platforms. In both cases a standard is selected to which the measurements from other sources are adjusted. Preferably, this initial standard has sufficient global and temporal coverage to allow robust estimates of the biases and drifts of all other measurement sets. Here we follow the quality assessment of Hubert et al. (2016) and the suggested ozone standard of Davis et al. (2016) and, for levels above 15 km, chose measurements from SAGE II as the standard. Recognizing the higher data quality of the ozonesonde measurements below 15 km, ozonesonde data are used as the standard at 15 km and below. As each ozonesonde is individually prepared and calibrated, and most soundings are vertically not available for every measurement from the data set to be adjusted. For the second approach, reanalysis data, as shown, for example, by Foelsche et al. (2011), or temporally and spatially highly resolved output from a CTM, as shown, for example, by Toohey et al. (2013) and Sofieva et al. (2014), can be used as a transfer standard, capturing small-scale ozone variability.
Here we chose the second approach for generating a homogeneous data set, since measurements from some data sources are The homogenization of the satellite-based measurements that contribute to BSVertOzone is a sequential process in which a selected satellite instrument is adjusted with respect to the standard, hereafter referred to as the "inter-satellite bias correction" (ISBC; see Fig. 2). After the measurements have been adjusted to the standard, they are merged with it to produce a new standard with greater spatial and temporal coverage, to which the measurements from another satellite are then adjusted and merged. This process is repeated until all satellite-based and ozonesonde measurements have been adjusted and merged with the standard. The order in which satellites are chosen for ISBC is determined by temporal overlap: the satellite with the largest temporal overlap with the current standard is processed next. In our case, above 15 km, where SAGE II is the standard, measurements from HALOE are adjusted and merged first, as HALOE provides the largest temporal overlap with SAGE II measurements.
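The ordering rule for this sequential merge can be sketched with sets of covered months. Coverage is represented as hypothetical `(year, month)` tuples and the adjustment itself is not shown; only the "largest overlap with the growing standard goes next" logic is illustrated.

```python
def isbc_order(standard_months, candidates):
    """Order in which data sources would be adjusted and merged.

    standard_months: set of (year, month) tuples covered by the standard.
    candidates: dict mapping a source name to its set of (year, month).
    At each step the source with the largest temporal overlap with the
    current standard is chosen; after its (not shown) adjustment it is
    merged, so the standard's coverage becomes the union of both.
    """
    coverage = set(standard_months)
    remaining = dict(candidates)
    order = []
    while remaining:
        name = max(remaining, key=lambda k: len(remaining[k] & coverage))
        order.append(name)
        coverage |= remaining.pop(name)   # merged standard grows
    return order
```

Because the standard grows after each merge, a source with no initial overlap can still be bridged in later, once an intermediate source has extended the coverage.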

Below and at 15 km, measurements from SAGE II are first adjusted and merged with ozonesondes. To adjust measurements from a different source to the standard, the following steps are taken at each level individually:
1. Calculate differences between individual ozone measurements from the standard and the CTM simulated ozone values.
2. Calculate an error-weighted, latitude-weighted (based on 1° latitude bands) monthly mean zonal mean of those differences.
3. Fit a linear regression model to the calculated monthly mean zonal mean differences (hereafter referred to as "modeled differences") to obtain an analytical representation of the difference field, $(\mathrm{standard}-\mathrm{CTM})_{\mathrm{modeled}}(\phi, z, t)$, that can be evaluated at any latitude $\phi$, level $z$ and time $t$. The regression model comprises an offset and a trend term, where the fit coefficients are expanded in Fourier series and Legendre polynomials to account for seasonal and latitudinal structure, respectively, in the monthly mean zonal mean differences. While the known discontinuities in ozone at the upper levels of the CTM data were corrected for to the extent possible, some small inconsistencies may remain; the trend term was therefore excluded from the regression model when modeling the difference fields for levels above 47 km (where the CTM data were corrected).
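4. Repeat steps 1 to 3 using the measurements from the new data source that requires bias correction, to obtain an analytical representation of its difference field, $(\mathrm{satellite}-\mathrm{CTM})_{\mathrm{modeled}}(\phi, z, t)$.
5. Calculate an adjusted measurement at the location and time of a given satellite measurement using

$$O_{3,\mathrm{adj}}(\theta,\phi,z,t) = O_3(\theta,\phi,z,t) + O_{3,\mathrm{ISBC}}(\phi,z,t),$$
$$O_{3,\mathrm{ISBC}}(\phi,z,t) = (\mathrm{standard}-\mathrm{CTM})_{\mathrm{modeled}}(\phi,z,t) - (\mathrm{satellite}-\mathrm{CTM})_{\mathrm{modeled}}(\phi,z,t),$$

where $\theta$ is the longitude, $O_{3,\mathrm{adj}}(\theta,\phi,z,t)$ is the homogenized (adjusted) measurement and $O_{3,\mathrm{ISBC}}(\phi,z,t)$ is the applied adjustment.
6. Merge the adjusted measurements $O_{3,\mathrm{adj}}(\theta,\phi,z,t)$ with the standard measurements to create a new homogenized standard that includes more measurements, and likely has larger spatial and temporal coverage, for the next iteration.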
7. Repeat steps 1 to 6 for all data sources to be included in the BSVertOzone database.
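Steps 3 to 5 can be sketched for a single latitude band and level, starting from monthly mean differences (step 2 is assumed already done). This is a simplified sketch: the Legendre expansion in latitude is dropped, only the offset carries annual Fourier terms, and the adjustment is taken as the modeled (standard − CTM) field minus the modeled (satellite − CTM) field, which is our reading of step 5. All function names are hypothetical; `with_trend=False` mimics the exclusion of the trend term above 47 km.

```python
import numpy as np


def design_matrix(t_years, n_harmonics=1, with_trend=True, t_ref=2000.0):
    """Offset + annual Fourier terms (+ optional linear trend).

    Simplified: one latitude band, so no Legendre expansion in latitude.
    """
    t = np.atleast_1d(np.asarray(t_years, dtype=float))
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(2 * np.pi * k * t), np.sin(2 * np.pi * k * t)]
    if with_trend:
        cols.append(t - t_ref)  # fixed reference year keeps evaluation consistent
    return np.column_stack(cols)


def fit_difference_model(t_years, diff, sigma=None, **kw):
    """Error-weighted least-squares fit; returns a callable model(t)."""
    diff = np.asarray(diff, dtype=float)
    X = design_matrix(t_years, **kw)
    w = np.ones_like(diff) if sigma is None else 1.0 / np.asarray(sigma, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X * w[:, None], diff * w, rcond=None)
    return lambda t: design_matrix(t, **kw) @ coeffs


def isbc_adjust(ozone, t_meas, std_minus_ctm, sat_minus_ctm):
    """O3_adj = O3 + [(standard - CTM)_modeled - (satellite - CTM)_modeled]."""
    correction = std_minus_ctm(t_meas) - sat_minus_ctm(t_meas)
    return np.asarray(ozone, dtype=float) + correction
```

If a satellite reads systematically higher than the standard relative to the CTM, its modeled difference is larger and the correction is negative, pulling the measurement down towards the standard.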
The number of Fourier and Legendre expansion terms used to model the monthly mean zonal mean differences depends on the individual differences provided as input to the regression model, as each satellite instrument provides measurements The Akaike Information Criterion (AIC; deLeeuw, 1992; Bozdogan, 1987) was used to provide a relative assessment of the quality of the candidate models. The candidate model that minimizes the AIC is selected, as it represents the model that minimizes the amount of information lost. The process is repeated until a set of expansions is found that does not result in overfitting of the model.
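AIC-based selection of the number of expansion terms can be sketched for the Fourier part alone. The sketch assumes Gaussian errors, for which AIC reduces (up to an additive constant) to n·ln(RSS/n) + 2k; the Legendre expansion is omitted and the function names are hypothetical.

```python
import numpy as np


def fourier_design(t_years, n_harmonics):
    """Offset plus n_harmonics annual Fourier pairs (cos, sin)."""
    t = np.asarray(t_years, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(2 * np.pi * k * t), np.sin(2 * np.pi * k * t)]
    return np.column_stack(cols)


def gaussian_aic(y, X):
    """AIC of a least-squares fit with Gaussian errors (up to a constant)."""
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coeffs) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k


def select_n_harmonics(t_years, y, max_harmonics=4):
    """Return the number of harmonics with the lowest AIC, plus all scores."""
    y = np.asarray(y, dtype=float)
    scores = {h: gaussian_aic(y, fourier_design(t_years, h))
              for h in range(max_harmonics + 1)}
    return min(scores, key=scores.get), scores
```

The 2k penalty is what guards against overfitting: an extra Fourier pair is only kept if it reduces ln(RSS) enough to pay for its two added coefficients.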
In the generation of a homogenized data set, in addition to adjusting measurements from different measurement systems to account for biases and drifts, the uncertainties on those measurements also need to be revised, since applying these adjustments introduces additional uncertainty. Following error propagation rules, and assuming independent errors, the uncertainty on the adjusted measurements is

$$\sigma_{\mathrm{adj}} = \sqrt{\sigma^2 + \sigma_{\mathrm{ISBC}}^2}, \qquad \sigma_{\mathrm{ISBC}}^2 = \sigma^2_{(\mathrm{standard}-\mathrm{CTM})_{\mathrm{modeled}}} + \sigma^2_{(\mathrm{satellite}-\mathrm{CTM})_{\mathrm{modeled}}},$$

where $\sigma$ is the uncertainty on the original measurement and $\sigma_{\mathrm{ISBC}}$ reflects the uncertainties on the regression-modeled fields $(\mathrm{standard}-\mathrm{CTM})_{\mathrm{modeled}}$ and $(\mathrm{satellite}-\mathrm{CTM})_{\mathrm{modeled}}$. While this re-evaluation of the measurement uncertainties does not include the effects of uncertainties in the CTM output, the effects of CTM uncertainties on the adjustment of the satellite data are minor, since the CTM data are only used as a transfer standard and, as can be seen from Eq. 4, cancel out if the CTM bias is consistent at both measurement locations.
A bootstrap method (Efron and Tibshirani, 1986) is used to estimate the uncertainties on the modeled differences, $\sigma^2_{(\mathrm{standard}-\mathrm{CTM})_{\mathrm{modeled}}}$ and $\sigma^2_{(\mathrm{satellite}-\mathrm{CTM})_{\mathrm{modeled}}}$. For example, to calculate $\sigma^2_{(\mathrm{standard}-\mathrm{CTM})_{\mathrm{modeled}}}$, the following steps are executed:
1. Fit the regression model (as described above) to the monthly mean zonal mean differences between standard measurements and CTM data.
2. Subtract the regression model fit from the data to obtain the residuals.
Such a bootstrap approach encapsulates two sources of uncertainty which are present in the modeled field: the uncertainty from the fact that the chosen model is imperfect, and the uncertainty in the measurements. The standard deviation of the difference fields, derived from the ensemble of fields, is the uncertainty on the modeled difference field.
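Only the first two bootstrap steps survive in this text, so the sketch below assumes a standard residual bootstrap for the rest: resample the residuals with replacement, add them back to the fitted values, refit, and take the standard deviation across the ensemble of modeled fields. It also combines the resulting model uncertainties with the measurement uncertainty in quadrature, assuming independent errors. Function names are hypothetical.

```python
import numpy as np


def bootstrap_model_sigma(X, y, X_eval, n_boot=500, seed=0):
    """Residual-bootstrap spread of a least-squares model evaluated at X_eval."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)   # step 1: fit
    fitted = X @ coeffs
    resid = y - fitted                                 # step 2: residuals
    fields = np.empty((n_boot, X_eval.shape[0]))
    for b in range(n_boot):
        # assumed steps: resample residuals, rebuild data, refit, store field
        y_b = fitted + rng.choice(resid, size=resid.size, replace=True)
        c_b, *_ = np.linalg.lstsq(X, y_b, rcond=None)
        fields[b] = X_eval @ c_b
    return fields.std(axis=0, ddof=1)                  # spread of the ensemble


def adjusted_uncertainty(sigma_meas, sigma_std_model, sigma_sat_model):
    """Quadrature sum of measurement and both modeled-difference uncertainties,
    assuming the three error sources are independent."""
    return np.sqrt(np.square(sigma_meas) + np.square(sigma_std_model)
                   + np.square(sigma_sat_model))
```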

Calculating the individual monthly mean zonal mean values
To create a homogeneous database, each measurement and its uncertainty, on a specific level and in a specific latitude band, is adjusted using the ISBC method described above. The uncertainties on the applied corrections are included in the total uncertainty for each individual data point. For the final calculation of monthly mean zonal mean values, additional data filtering was applied, similar to that described in Bodeker et al. (2013); e.g. SAGE II data were not used below 10 km (below 242.8 hPa on the pressure grid), and SAGE I data were not used below 18 km (77.44 hPa).
Monthly mean zonal mean ozone values obtained from the different data sources before and after homogenization, for an example level and latitude band, are shown in Fig. 3. The different data sources complement one another, and the homogenization process reduces the spread between them, making them more consistent with each other. The average adjustment applied to measurements from the different data sources ranges from −14.4 % for MLS to 37.7 % for HALOE at 182 hPa between 25° N and 30° N (the level and latitude band shown in Fig. 3), i.e. MLS measurements have a positive bias compared to the standard, while HALOE measurements have a negative bias.

The time series of ozone in Fig. 3 are shown for illustrative purposes only, since the method used to calculate the monthly mean zonal means does not rely on pre-calculated monthly mean zonal means for each individual data source; rather, it calculates the monthly mean zonal mean values from all available individual measurements at once. All available measurements $x_i$ ($i = 1..N$) and their uncertainties $\sigma_i$ for each latitude band and level are used to calculate the error-weighted monthly mean:

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i/\sigma_{i,\mathrm{new}}^2}{\sum_{i=1}^{N} 1/\sigma_{i,\mathrm{new}}^2},$$

where $\sigma_{i,\mathrm{new}}$ represents not only the measurement uncertainty but also the confidence we have that each measurement can be seen as an estimator of the monthly mean, and $\sigma_{i,\mathrm{new}}$ is given by: with the expectation value $x_{i,\mathrm{exp}}$ being the unweighted monthly mean zonal mean. The uncertainty on the monthly mean zonal mean value $\bar{x}$ is then calculated using: with $N$ being the number of measurements available for the chosen latitude band and level. The uncertainty on the monthly mean zonal mean calculated using Eq. 7 is sensitive not only to the magnitude of the uncertainties on each measurement but also to the variance of the measurements, which is not the case for the equations for the uncertainty on monthly means provided in the current literature. As the uncertainty on the monthly mean zonal mean takes into account how many measurements were available to calculate the mean value, the uncertainty will be larger for mean values based on fewer measurements than for mean values based on more measurements. However, a single measurement per latitude band is not sufficient to calculate a monthly mean and its uncertainty. The requirement here is therefore that at least six measurements per latitude band and level are available to calculate a monthly mean zonal mean; if fewer measurements are available, no monthly mean zonal mean is calculated.
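The error-weighted mean and the six-measurement threshold can be sketched as below. The exact forms of σ_new and of the mean's uncertainty are not reproduced in this text, so both are stand-ins labeled in the code: σ_new adds the deviation from the unweighted mean to the measurement uncertainty in quadrature, and the returned uncertainty is the textbook standard error of an inverse-variance weighted mean. The function name is hypothetical.

```python
import numpy as np

MIN_MEASUREMENTS = 6  # from the text: at least six measurements are required


def monthly_mean(x, sigma):
    """Error-weighted monthly mean for one latitude band and level.

    ASSUMPTIONS (stand-ins, not necessarily the equations of the text):
      sigma_new_i = sqrt(sigma_i^2 + (x_i - x_exp)^2), x_exp = unweighted mean
      uncertainty = sqrt(1 / sum(1 / sigma_new_i^2))
    Returns None when fewer than MIN_MEASUREMENTS values are available.
    """
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    if x.size < MIN_MEASUREMENTS:
        return None
    x_exp = x.mean()                                # unweighted expectation value
    sigma_new = np.sqrt(sigma ** 2 + (x - x_exp) ** 2)
    w = 1.0 / sigma_new ** 2                        # inverse-variance weights
    mean = np.sum(w * x) / np.sum(w)
    return mean, np.sqrt(1.0 / np.sum(w))
```

With this stand-in for σ_new, outlying measurements receive less weight, and the spread of the measurements, not only their stated uncertainties, feeds into the uncertainty of the mean.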

The calculated monthly mean zonal mean ozone time series, combining all measurements from the different data sources, are shown in Fig. 4 as solid orange lines (unadjusted) and blue lines (adjusted), together with their uncertainties. Despite the data gaps, the annual cycle and the interannual variability in the monthly mean zonal mean time series at the chosen level and latitude band are apparent (Figs. 3 and 4). Without homogenization and the resulting adjustment of the measurements towards the standard, the monthly mean zonal mean time series would mostly represent the mean of the data sources that are available at high spatial and temporal measurement density in a given month and latitude band, introducing a bias into the monthly mean zonal mean time series. The uncertainties on the monthly mean zonal mean ozone values are significantly smaller once MLS measurements are added to the merged data product, as thousands of MLS measurements are available per day.
The homogeneous database of monthly mean zonal means constitutes "Tier 0" of BSVertOzone.

The first step in creating the Tier 0.5 data set is to regress the monthly mean zonal mean ozone at 20 km/58.2 hPa against monthly mean zonal mean total column ozone (TCO), i.e.:

$$O_3(m,\phi) = \alpha(m,\phi) + \beta(m,\phi)\,\mathrm{TCO}(m,\phi) + R(m,\phi),$$

where $m$ is the month, $\phi$ is the latitude, $\alpha$ and $\beta$ are the two regression model fit coefficients, each expanded in Fourier and Legendre series, and $R$ are the residuals (see Bodeker et al. (2013) for details).

The TCO database used here is described in detail in Bodeker et al. (2018). Equation 9 is then applied from level 22 upwards to level 70, and then downwards from level 19 (now using levels 20 and 21 as the predictors for ozone at level 19) to level 1.
The result is a pre-filled ozone data set in which filled values include some indication of the true month-to-month variability, as suggested by the TCO month-to-month variability. Figure 5 shows monthly mean zonal mean ozone values at 20 km after homogenization of the data sources (Fig. 5a), together with the pre-filled data set.
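The TCO-based filling can be sketched for a single latitude band as below. This is a deliberate simplification of the regression described above: α and β are treated as constants rather than expanded in Fourier and Legendre series, residuals are not added back, and the function name is hypothetical. Missing ozone values are marked with NaN.

```python
import numpy as np


def fill_from_tco(ozone, tco):
    """Fill NaN monthly ozone values at one level from TCO.

    Simplified model for one latitude band: ozone = alpha + beta * TCO + R,
    with constant alpha and beta (no Fourier/Legendre expansion).
    """
    ozone = np.asarray(ozone, dtype=float)
    tco = np.asarray(tco, dtype=float)
    ok = ~np.isnan(ozone) & ~np.isnan(tco)          # months usable for the fit
    X = np.column_stack([np.ones(ok.sum()), tco[ok]])
    (alpha, beta), *_ = np.linalg.lstsq(X, ozone[ok], rcond=None)
    filled = ozone.copy()
    gaps = np.isnan(ozone) & ~np.isnan(tco)         # fillable months
    filled[gaps] = alpha + beta * tco[gaps]
    return filled, alpha, beta
```

Because the filled values follow TCO month by month, they inherit some of the observed month-to-month variability rather than a smooth climatology.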
This Tier 0.5 data set was then used as input to a least squares regression model to generate the Tier 1.1 to Tier 1.4 data sets described in the next section. The Tier 0.5 data set describes the full natural variability and is therefore particularly useful for CCM evaluation studies in which the model is run with prescribed dynamics.
5 Creating global, filled data sets with only forced variability (Tier 1.x ozone data sets)

The methodology used to generate the Tier 1.1 to Tier 1.4 data sets largely follows the approach described for the previous version of the database (BDBP v1.1.0.6) in Bodeker et al. (2013). The same regression model is applied, but rather than using it both to fill the data gaps and to conduct the attribution, it is now used only to attribute variability to the various factors affecting ozone. This provides different data sets for very specific purposes with regard to CCM evaluation. For an in-depth description of the approach see Sect. 4 of Bodeker et al. (2013). The most important features of the approach are briefly outlined below.
The least squares regression model that was applied to the Tier 0.5 data consists of eight basis functions, viz.:
1. A constant offset that is expanded in a Fourier series to represent the mean annual cycle,
2. An EESC (equivalent effective stratospheric chlorine) term that differs with age of air,
3. A linear trend term,
4. A quasi-biennial oscillation (QBO) basis function that was specified as the monthly mean 50 hPa Singapore zonal wind.

functions used in the regression model. For example, in 2011 unprecedented depletion of ozone occurred in the Arctic winter (Manney et al., 2011), which is detectable in Tier 0.5 as lower ozone than in earlier years at pressure levels between 10 hPa and 1 hPa, but which is not apparent in Tier 1.4 (Fig. 7).

The lower panel of Fig. 9 shows some small differences at around 70 hPa over the tropics, where ozone concentrations from SWOOSH are occasionally higher than those from BSVertOzone. These small differences most likely result from the different methodologies used to combine the measurements from different data sources. Additionally, the selection of data sources could explain some of the differences seen in Fig. 9: SWOOSH does not include ozonesonde, SAGE I, POAM-II/III or ILAS-I/II measurements, whereas BSVertOzone does not include any measurements from SAGE III or UARS MLS. In the middle to upper stratosphere, owing to the good coverage of satellite data, selecting different data sources will not affect the combined data product. However, in the lower stratosphere and upper troposphere, where satellite measurements are sparse and more uncertain and ozonesonde measurements are more robust, not including ozonesonde measurements will lead to differences, as can be seen in the comparison between SWOOSH and BSVertOzone (Fig. 9).

transfer standard (see Sect. 3.2) and a regression model to adjust the measurements from different sources to a given standard.

A second QBO basis function
For levels above 15 km, measurements from SAGE II are the chosen standard, while ozonesonde data are used as the standard at 15 km and below (recognizing the higher data quality of the ozonesonde measurements below 15 km). While ozone concentrations from BSVertOzone compare well with those from SWOOSH in most latitude bins, some discrepancies between the two data sets remain. These can be explained by differences in the methodology used to combine measurements from different data sources. The applied homogenization results in an improvement of BSVertOzone compared to the earlier version BDBP v1.1.0.6, i.e. a more realistic representation of the ozone variability in the atmosphere.
As for the BDBP, BSVertOzone provides different Tier data sets (Bodeker et al., 2013):
- Tier 0 contains the monthly mean zonal mean values that are directly calculated from the individual (adjusted) data sources, with data gaps where no measurements were available.

There are several improvements that could be implemented in preparing the measurements and in the homogenization method used. In the current version (v1.0) of BSVertOzone, the global troposphere is covered only by ozonesonde profile measurements. These profiles are available for many decades (see Sect. 2.1), but they cover only a limited portion of the globe. As a result, estimating the long-term global tropospheric ozone distribution from these measurements alone using the

Besides including more ozone measurements from different instruments, there are some improvements in the processing of the measurements that are planned for the future. Firstly, as the CTM output used here as a transfer standard to homogenize the satellite and ozonesonde measurements has a temperature bias due to the underlying ERA-Interim reanalysis meteorology (see Sect. 3.1), we plan to use CTM output forced with the most recent reanalysis data set, ERA5. This will most likely remove the remaining inconsistencies in the ozone concentrations simulated by SLIMCAT at the upper levels.
All available measurements for each latitude band, level and month are most likely not evenly distributed in space and time, which can result in a skewed (non-representative) monthly mean value and an underestimation of the monthly mean uncertainty. The individual ozone measurements should therefore undergo a spatial and temporal bias correction before monthly mean zonal means are calculated, so that the monthly distribution is represented correctly. Additionally, it might be necessary to consider possible spatial and temporal autocorrelations between individual data points. As mentioned in Sect. 5, a two-term autocorrelation model was used in the regression model generating the Tier 1 data sets. This takes into account only the temporal dependence of the already calculated monthly mean zonal mean values; it does not account for the temporal dependence of measurements within one month, or for any spatial dependence of measurements within the chosen latitude band.

While this consideration would not affect the calculated monthly mean zonal means, it would change the number of available independent measurements, and therefore would change the uncertainties on the calculated means.
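How autocorrelation reduces the number of independent measurements can be illustrated with a common AR(1) approximation, N_eff = N(1 − r₁)/(1 + r₁), where r₁ is the lag-1 autocorrelation of the time-ordered measurements. This is not a method used in BSVertOzone v1.0; it is a hypothetical sketch of the effect described above, with hypothetical function names.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a time-ordered series."""
    x = np.asarray(x, dtype=float)
    a = x - x.mean()
    return float(np.sum(a[:-1] * a[1:]) / np.sum(a * a))


def effective_sample_size(x):
    """N_eff = N (1 - r1) / (1 + r1): a common AR(1) approximation."""
    r1 = lag1_autocorrelation(x)
    return len(x) * (1.0 - r1) / (1.0 + r1)


def mean_uncertainty(x, sigma):
    """Scale a per-measurement uncertainty by sqrt(N_eff) instead of sqrt(N)."""
    return sigma / np.sqrt(effective_sample_size(x))
```

The mean itself is unchanged, but positively correlated measurements give N_eff < N and hence a larger uncertainty on the mean, which is the effect the paragraph above anticipates.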
In the upper stratosphere and mesosphere, ozone formation and destruction happen so fast that ozone follows the availability of sunlight. As a result, diurnal variations in ozone concentrations are observable in the upper stratosphere and lower mesosphere (Schanz et al., 2014). Therefore, differences in ozone concentrations measured in the upper stratosphere/lower mesosphere could result from satellite-based instruments measuring ozone at different solar zenith angles, i.e. at different local times of day (e.g. Pallister and Tuck, 1983; Schanz et al., 2014; Studer et al., 2014). In the current version of BSVertOzone, potential differences in ozone measurements caused by the diurnal cycle are ignored, as their effect on the monthly mean zonal mean ozone values is expected to be small. However, to test this expectation, and for an update of BSVertOzone, we plan to implement methods to account for diurnal cycle effects at the upper levels of the ozone database.

BSVertOzone v1.0 described in this paper is archived and publicly available at Zenodo (a research data repository created by OpenAIRE and the European Organization for Nuclear Research (CERN)) with the DOI http://doi.org/10.5281/zenodo.1217184. Additionally, it is available at the Bodeker Scientific website (http://www.bodekerscientific.com/da in its most recent and updated version.