Articles | Volume 15, issue 7
Data description paper
14 Jul 2023

A 16-year global climate data record of total column water vapour generated from OMI observations in the visible blue spectral range

Christian Borger, Steffen Beirle, and Thomas Wagner

We present a long-term data set of 1° × 1° monthly mean total column water vapour (TCWV) based on global measurements of the Ozone Monitoring Instrument (OMI) covering the time range from January 2005 to December 2020.

In comparison to the retrieval algorithm of Borger et al. (2020), several modifications and filters have been applied accounting for instrumental issues (such as OMI's “row anomaly”) or the inferior quality of solar reference spectra. For instance, to overcome issues related to low-quality reference spectra, the daily solar irradiance spectrum is replaced by an annually varying mean earthshine radiance obtained in December over Antarctica. For the TCWV data set, we only consider measurements with an effective cloud fraction less than 20 %, an air mass factor (AMF) greater than 0.1, a snow- and ice-free ground pixel, and an OMI row that is not affected by the row anomaly over the complete time range of the data set. The individual TCWV measurements are then gridded to a regular 1° × 1° lattice, from which the monthly means are calculated.

The investigation of sampling errors in the OMI TCWV data set shows that these are dominated by the clear-sky bias and cause on average deviations of around 10 %, which is consistent with the findings of previous studies. However, the spatiotemporal sampling errors and those due to the row-anomaly filter are negligible.

In a comprehensive intercomparison study, we demonstrate that the OMI TCWV data set is in good agreement with the global reference data sets of ERA5 (fifth-generation ECMWF atmospheric reanalysis), RSS SSM/I (Remote Sensing Systems Special Sensor Microwave Imager), and CM SAF/CCI TCWV-global (COMBI): over ocean the orthogonal distance regressions indicate slopes close to unity with very small offsets and high coefficients of determination of around 0.96. However, over land, distinctive positive deviations of more than +10 kg m−2 are obtained for high TCWV values. These overestimations are mainly due to extreme overestimations of high TCWV values in the tropics, likely caused by uncertainties in the retrieval input data (surface albedo, cloud information) due to frequent cloud contamination in these regions. Similar results are found from intercomparisons with in situ radiosonde measurements from version 2 of the Integrated Global Radiosonde Archive (IGRA2) data set. Nevertheless, for TCWV values smaller than 25 kg m−2, the OMI TCWV data set shows very good agreement with the global reference data sets. Furthermore, a temporal stability analysis proves that the OMI TCWV data set is consistent with the temporal changes in the reference data sets and shows no significant deviation trends.

As the TCWV retrieval can be easily applied to further satellite missions, additional TCWV data sets can be created from past missions, such as the Global Ozone Monitoring Experiment-1 (GOME-1) or the SCanning Imaging Absorption spectroMeter for Atmospheric CartograpHY (SCIAMACHY); under consideration of systematic differences (e.g. due to different observation times), these data sets can be combined with the OMI TCWV data set in order to create a data record that would cover a time span from 1995 to the present. Moreover, the TCWV retrieval will also work for all missions dedicated to NO2 in the future, such as Sentinel-5 on MetOp-SG.

The Max Planck Institute for Chemistry (MPIC) OMI total column water vapour (TCWV) climate data record (CDR) is available via Borger et al. (2023).

1 Introduction

Water vapour is the most important natural greenhouse gas in Earth's atmosphere: it alters the Earth's energy balance by playing a dominant role in the atmospheric thermal opacity and has a major amplifying influence on several factors of anthropogenic climate change through various feedback mechanisms (Kiehl and Trenberth, 1997; Randall et al., 2007; Trenberth et al., 2009). Although water vapour is of great importance for processes at a global and climate scale, the complex interactions between the components of the hydrological cycle (including water vapour) and the atmosphere are still one of the major challenges of climate modelling and for a better understanding of the Earth's climate system in general (Stevens and Bony, 2013). Moreover, the amount and distribution of water vapour are highly variable; thus, for global observations, these must also be measured with high spatiotemporal resolution. Considering that changes in water vapour are closely linked to changes in temperature via the Clausius–Clapeyron equation (i.e. for typical atmospheric conditions, a temperature increase of 1 K yields an increase in the water vapour concentration of approximately 6 %–7 %; Held and Soden, 2000), it is essential to accurately monitor the variability and change in the amount and distribution of water vapour on a global scale.

To observe the water vapour distribution on a global scale, satellite measurements provide invaluable information. Due to its spectroscopic absorption properties, water vapour can be retrieved from satellite spectra in various spectral ranges, ranging from the radio (e.g. Kursinski et al., 1997), microwave (e.g. Rosenkranz, 2001), thermal infrared (e.g. Susskind et al., 2003; Schlüssel et al., 2005; Schneider and Hase, 2011), and short-wave and near-infrared (e.g. Bennartz and Fischer, 2001; Gao and Kaufman, 2003; Schrijver et al., 2009; Dupuy et al., 2016; Schneider et al., 2020) to the visible spectral region (e.g. Noël et al., 1999; Lang et al., 2003; Wagner et al., 2003; Grossi et al., 2015).

Within the past decade, substantial progress has been made to retrieve total column water vapour (TCWV) within the visible blue spectral range (e.g. Wagner et al., 2013; Wang et al., 2019; Borger et al., 2020; Chan et al., 2020), enabling the use of measurements from satellite instruments like the TROPOspheric Monitoring Instrument (TROPOMI; Veefkind et al., 2012) and even the Global Ozone Monitoring Experiment-2 (GOME-2; Munro et al., 2016) for which so far only retrievals in the visible red and near-infrared spectral range have been available. In comparison to these spectral ranges, TCWV retrievals in the visible “blue” have several advantages, such as similar sensitivity for the near-surface layers over land and ocean due to a more homogeneous surface albedo distribution than at longer wavelengths (Koelemeijer et al., 2003; Wagner et al., 2013; Tilstra et al., 2017). Moreover, any satellite mission dedicated to NO2 monitoring covers this spectral range.

For investigations of climate change or global warming, the Ozone Monitoring Instrument (OMI; Levelt et al., 2006, 2018) onboard NASA's Aura satellite is particularly interesting; launched in July 2004, OMI offers an almost continuous measurement data record of more than 16 years up until today. In this study, we make use of this long-term data record and retrieve total column water vapour (TCWV) from OMI's measurements in the visible blue spectral range in order to generate a climate data set.

The paper is structured as follows: in Sect. 2, we describe the data set generation and briefly explain the retrieval methodology and the applied modifications in comparison to the TCWV retrieval from Borger et al. (2020); in Sect. 3, we investigate potential sampling errors and how the limitation to clear-sky satellite observations influences the representativeness of the TCWV values of the data set; in Sect. 4 and Sect. 5, we characterize the data set via an intercomparison with the various global reference TCWV data sets and with IGRA2 radiosonde observations, respectively; in Sect. 6, we analyse the temporal stability of the OMI TCWV data set; and, finally, we briefly summarize our results and draw conclusions in Sect. 7.

2 The Max Planck Institute for Chemistry (MPIC) OMI TCWV data set

2.1 Ozone Monitoring Instrument (OMI)

OMI (Levelt et al., 2006, 2018), onboard NASA's Aura satellite, is a nadir-looking UV–Vis push-broom spectrometer that measures the Earth's radiance spectrum from 270 to 500 nm with a spectral resolution of approximately 0.5 nm following a Sun-synchronous orbit with an Equator crossing time of around 13:30 LT. The instrument employs a 2D charge-coupled device (CCD) consisting of 60 across-track rows that cover a total swath width of approximately 2600 km with a spatial resolution of 24 km × 13 km at nadir increasing to 24 km × 160 km towards the edges of the swath. Launched in July 2004, OMI provides an almost continuous measurement record until today with more than 100 000 orbits.

However, since July 2007 OMI has suffered from the so-called “row anomaly” (RA), a dynamic artefact causing abnormally low radiance readings in the across-track rows, i.e. several rows of the CCD detector receive less light from the Earth, whereas some other rows appear to receive sunlight scattered off a peeling piece of spacecraft insulation. One plausible explanation for these effects is a partial obscuration of the entrance port by insulating layer material that may have come loose on the outside of the instrument (Schenkeveld et al., 2017; Boersma et al., 2018). Thus, in this study, the affected measurements are excluded for the entire period of the data set.

2.2 Methodology and modifications of the spectral analysis

To retrieve total column water vapour (TCWV) from UV–Vis spectra from OMI, we apply the TCWV retrieval of Borger et al. (2020) developed for TROPOMI onboard Sentinel-5P. The retrieval is based on the principles of differential optical absorption spectroscopy (DOAS; Platt and Stutz, 2008) with a fit window between 430 and 450 nm, and it consists of the common two-step DOAS approach. In the first step, the absorption along the light path is calculated as follows:

$$ \ln\left(\frac{I}{I_0}\right) = -\sum_i \sigma_i(\lambda)\,\mathrm{SCD}_i + \Psi + \Phi. \tag{1} $$

Here, $I_0$ and $I$ represent the solar irradiance and the radiance backscattered from Earth, respectively; $i$ denotes the index of a trace gas of interest; $\sigma_i(\lambda)$ is the molecular absorption cross-section of that trace gas; $\mathrm{SCD}_i = \int_s c_i\,\mathrm{d}s$ is its concentration $c_i$ integrated along the light path $s$ (the so-called “slant column density”); $\Psi$ summarizes terms accounting for the Ring effect and additional pseudo-absorbers; and $\Phi$ is a closure polynomial accounting for Mie and Rayleigh scattering as well as parts of the low-frequency contributions of the trace gas cross-sections.

In the second step, to convert the slant column density (SCD) to a vertical column density (VCD), we apply the so-called air mass factor (AMF):

$$ \mathrm{VCD} = \frac{\mathrm{SCD}}{\mathrm{AMF}}. \tag{2} $$

The AMF accounts for the non-trivial effects of atmospheric radiative transfer and depends on the conditions of the retrieval scenario (i.e. aerosol and cloud effects, viewing geometry, and surface properties) as well as the profile shape of the trace gas of interest. The algorithm of Borger et al. (2020) makes use of the relation between the H2O VCD and the profile shape, and it iteratively finds the optimal VCD by assuming an exponential water vapour profile shape.
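The two-step approach of Eqs. (1) and (2) can be illustrated with a minimal Python sketch. It is not the MPIC retrieval: it assumes a single absorber, a synthetic cross-section, no Ring or pseudo-absorber term (Ψ = 0), a second-order closure polynomial, and a fixed AMF instead of the iterative profile-shape scheme. The function names `fit_scd` and `scd_to_vcd` and all numerical values are hypothetical.

```python
import numpy as np

def fit_scd(wavelength, log_ratio, cross_section, poly_order=2):
    """Linear least-squares fit of ln(I/I0) = -sigma * SCD + polynomial.

    Simplification: one absorber and no Ring/pseudo-absorber term (Psi = 0).
    Returns the fitted slant column density (SCD).
    """
    # Rescale the cross-section so that all design-matrix columns are of
    # comparable magnitude; otherwise the tiny column would be treated as zero.
    scale = np.abs(cross_section).max()
    # Centre the wavelength axis to keep the polynomial well conditioned.
    x = wavelength - wavelength.mean()
    columns = [-cross_section / scale] + [x**k for k in range(poly_order + 1)]
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, log_ratio, rcond=None)
    return coeffs[0] / scale

def scd_to_vcd(scd, amf):
    """Eq. (2): convert a slant column to a vertical column."""
    return scd / amf

# Synthetic example in the 430-450 nm fit window (all values are made up).
wl = np.linspace(430.0, 450.0, 81)
sigma = 1e-26 * (1.0 + np.sin(2.0 * np.pi * (wl - 430.0) / 4.0))  # cm^2 per molecule
true_scd = 2.0e23                                                  # molecules per cm^2
broadband = 0.1 - 0.002 * (wl - 440.0)                             # smooth closure term
log_ratio = -sigma * true_scd + broadband

scd = fit_scd(wl, log_ratio, sigma)
vcd = scd_to_vcd(scd, amf=1.6)
```

Because the synthetic signal lies exactly in the span of the fitted basis, the sketch recovers the prescribed SCD up to numerical precision; with real spectra, the fit residual and the additional terms in Ψ would determine the SCD fit error.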

For the application of the algorithm to OMI measurements, several modifications had to be applied to the algorithm of Borger et al. (2020). For climate studies such as trend analyses, it is necessary to provide a consistent data record. Thus, all rows that have ever been affected by the so-called row anomaly are excluded from the data set for the complete time series, which corresponds to approximately half of the OMI swath. Furthermore, instead of a daily solar irradiance, an earthshine radiance is used as the reference spectrum within the DOAS analysis. The rationale for using an earthshine radiance over a solar irradiance is as follows:

  • The daily OMI solar irradiance spectra (OML1BIRR version 3) are very noisy and have several gaps, causing high H2O SCD fit errors and, thus, leading to an overall poor quality of the H2O VCD data set.

  • By using an annual mean solar irradiance spectrum from the year 2005 (also used during the QA4ECV project; Boersma et al., 2018), a good fit quality can be obtained; however, OMI is also suffering from degradation effects (Schenkeveld et al., 2017). Thus, for the case of climate trend analyses, it will be almost impossible to disentangle if a trend signal originates from the spectral degradation of OMI or indeed from a geophysical trend (see also Fig. A1). By using an earthshine radiance as the reference spectrum, these degradation effects will largely cancel out.

  • When using an earthshine radiance as the reference spectrum, the across-track biases within the OMI swath are also strongly reduced (see Fig. 1c); consequently, no destriping is necessary during post-processing (see also Anand et al., 2015).

  • However, a disadvantage of the use of earthshine spectra is that the retrieved H2O slant columns do not represent absolute slant columns, as the earthshine reference spectra also contain H2O absorptions. Hence, a slant column representative of the chosen reference sector has to be added to the retrieved values.

For the creation of annual earthshine reference spectra, we selected the Antarctic continent as the reference sector (high surface albedo due to snow and ice cover) and the time period of December (i.e. during austral summer), yielding a relatively high signal-to-noise ratio for our radiance measurements despite large solar zenith angles. Furthermore, only pixels above an altitude of 2000 m above sea level were selected; as the air temperatures are very low in this altitude range, the water vapour concentrations are very low as well, thereby representing a reference atmosphere that is as dry as possible (i.e. the reference SCD or rather the absolute value of its uncertainty has to be as low as possible). Moreover, to avoid the inclusion of noisy measurements (in particular from the descending part of the OMI orbit), only pixels with a solar zenith angle (SZA) below 80° are considered. From these measurements, we calculate the monthly mean radiance for December for each year for every OMI row and then use the resulting reference spectra for the retrievals of the upcoming year.
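The selection and averaging described above can be sketched as follows. The sketch assumes that the input already contains only Antarctic pixels from one December; the function name `earthshine_reference`, the array layout, and the toy values are hypothetical and only illustrate the filter logic (altitude above 2000 m, SZA below 80°, one mean spectrum per row).

```python
import numpy as np

def earthshine_reference(radiance, row, altitude_m, sza_deg, n_rows=60,
                         min_altitude_m=2000.0, max_sza_deg=80.0):
    """Mean radiance spectrum per across-track row from the selected pixels.

    radiance: array (n_pixels, n_wavelengths)
    row, altitude_m, sza_deg: arrays (n_pixels,)
    Rows without any valid pixel are returned as NaN.
    """
    valid = (altitude_m > min_altitude_m) & (sza_deg < max_sza_deg)
    reference = np.full((n_rows, radiance.shape[1]), np.nan)
    for r in range(n_rows):
        selected = valid & (row == r)
        if selected.any():
            reference[r] = radiance[selected].mean(axis=0)
    return reference

# Toy input: four pixels, two spectral channels, two rows (values are made up).
radiance = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
row = np.array([0, 0, 0, 1])
altitude_m = np.array([2500.0, 3000.0, 1500.0, 2800.0])  # third pixel is too low
sza_deg = np.array([70.0, 75.0, 60.0, 85.0])             # fourth pixel: SZA too large

reference = earthshine_reference(radiance, row, altitude_m, sza_deg, n_rows=2)
```

In this toy case, row 0 is averaged from its first two pixels only, and row 1 has no valid pixel and therefore stays undefined.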

Figure 1 illustrates the effect of different reference spectra on the H2O SCD distribution for an exemplary orbit. In particular, distinctive stripe patterns are prominent when using the daily solar irradiance as the reference spectrum (Fig. 1a). Although the usage of the annual mean solar irradiance (Fig. 1b) can reduce the strength of the stripes, they are still clearly visible. In contrast, no across-track stripes are detectable for the case of the earthshine reference, and the SCDs are also lower overall due to the H2O absorption in the earthshine reference (Fig. 1c).

Further details about destriping in general and a comparison of the temporal behaviour of the irradiance-based and earthshine SCD are available in Appendix A.

Figure 1. Exemplary orbit (Orbit 34382, 1 January 2011) showing the impact of different reference spectra on the OMI H2O SCD distribution: (a) daily solar irradiance, (b) annual mean solar irradiance, and (c) monthly mean earthshine reference.

2.3 VCD conversion and data set generation

To account for the potential water vapour contamination within the earthshine reference spectra, the SCDs based on the earthshine reference have to be corrected for the corresponding offset. In this study, we determine this offset, ΔSCD, for each row based on the difference between the earthshine-based SCDs and solar-irradiance-based SCDs for the first 5 years of OMI operation (see Appendix A). Equation (2) can then be rewritten as follows:

$$ \mathrm{VCD} = \frac{\mathrm{eSCD} + \Delta\mathrm{SCD}}{\mathrm{AMF}}, \tag{3} $$

where eSCD denotes the SCD derived using the earthshine reference.
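As an illustration of Eq. (3), the following sketch derives a per-row offset and applies it. Two assumptions are made that the text does not specify: the offset is taken as the plain mean of (irradiance-based SCD minus earthshine-based SCD) per row, so that adding it restores absolute slant columns, and all numbers are invented. The function names `row_offsets` and `earthshine_vcd` are hypothetical.

```python
import numpy as np

def row_offsets(irradiance_scd, earthshine_scd, row, n_rows):
    """Per-row offset dSCD, assumed here to be the mean SCD difference per row."""
    diff = irradiance_scd - earthshine_scd
    offsets = np.full(n_rows, np.nan)
    for r in range(n_rows):
        selected = row == r
        if selected.any():
            offsets[r] = diff[selected].mean()
    return offsets

def earthshine_vcd(escd, offsets, row, amf):
    """Eq. (3): VCD = (eSCD + dSCD[row]) / AMF, element-wise over pixels."""
    return (escd + offsets[row]) / amf

# Three pixels in two rows (molecules per cm^2, values are made up).
irr_scd = np.array([1.2e23, 1.4e23, 2.0e23])
escd = np.array([1.0e23, 1.2e23, 1.7e23])
row = np.array([0, 0, 1])
amf = np.array([1.2, 1.4, 2.0])

offsets = row_offsets(irr_scd, escd, row, n_rows=2)   # about [2e22, 3e22]
vcd = earthshine_vcd(escd, offsets, row, amf)         # about 1e23 for every pixel
```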

The AMFs are calculated as described in Borger et al. (2020). For the determination of the AMF, additional information about the retrieval scenario, like cloud cover and surface properties, is necessary. We use the cloud information from the OMI L2 NO2 product (OMNO2; Lamsal et al., 2021) and the modified OMI surface albedo version of Kleipool et al. (2008) as described in Borger et al. (2020). We also tested the surface albedo information from the OMNO2 product; however, within the framework of a trend analysis study (Borger et al., 2022), we observed spatial artefacts in the surface albedo trends that likely arise from the use of an older version of the MODIS data for the albedo calculation (Lok Lamsal, personal communication, 2021). The distribution of TCWV trends is mainly determined by the trends in the SCD. The albedo or AMF trends usually only determine whether the trend signal becomes stronger or weaker, but this only affects trends over land, as an albedo climatology from Kleipool et al. (2008) is used over ocean. As the ice flags from the OMI processor sometimes indicate snow/ice-free surfaces over Antarctica or Greenland, we additionally use the monthly mean sea-ice cover information from ERA5 (fifth-generation ECMWF atmospheric reanalysis; Hersbach et al., 2020) and the annual mean land cover information from MODIS Aqua (Sulla-Menashe et al., 2019).

To create the OMI TCWV data set, we have chosen the time range from January 2005 to December 2020 and only include observations with an effective cloud fraction <20 % and AMF >0.1. Furthermore, the pixels have to be free of snow and ice and must not be affected by the row anomaly. Hence, while about 50 % of the orbit is missing because of the RA filter, the remaining data still cover an “effective” swath of about 1300 km; this is larger than the swaths of GOME-1, the SCanning Imaging Absorption spectroMeter for Atmospheric CartograpHY (SCIAMACHY), or GOME-2A (all about 960 km) and of the order of the Special Sensor Microwave/Imager (SSM/I; about 1394 km). Thus, OMI still achieves complete coverage of the Earth about every 2 to 3 calendar days, which should provide enough observational data for good representativeness in the case of a monthly mean (see also Appendix C and the good agreement with the reference data in Sect. 4). In total, this leaves about 30 % of TCWV data from an RA-filtered orbit and about 12 % of data from a complete orbit. The results of every orbit are then gridded to a 1° × 1° lattice for every day. From these daily grids, the monthly mean H2O VCD distributions are then calculated, ensuring that a continuous TCWV time series is available for as many grid cells as possible.

Figure 2. Global mean OMI H2O VCD distribution from 2005 to 2020 based on the OMI analysis using earthshine reference spectra and corrected for the H2O SCD bias. Areas with no valid values are coloured grey.
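The filtering and gridding chain can be sketched in a few lines. The sketch makes simplifying assumptions that are not taken from the paper: each pixel is assigned to the grid cell containing its centre (no footprint weighting), and the monthly mean is an unweighted mean of the daily grids. The function names `keep_pixel`, `grid_mean`, and `mean_of_grids` and all values are hypothetical.

```python
import numpy as np

def keep_pixel(cloud_fraction, amf, snow_ice, row_anomaly):
    """Selection criteria from the text, applied element-wise to pixel arrays."""
    snow_ice = np.asarray(snow_ice, dtype=bool)
    row_anomaly = np.asarray(row_anomaly, dtype=bool)
    return ((np.asarray(cloud_fraction) < 0.20) & (np.asarray(amf) > 0.1)
            & ~snow_ice & ~row_anomaly)

def grid_mean(lat, lon, values, resolution=1.0):
    """Average pixel values on a regular lat/lon grid; empty cells are NaN."""
    n_lat, n_lon = int(180 / resolution), int(360 / resolution)
    i = np.clip(((lat + 90.0) / resolution).astype(int), 0, n_lat - 1)
    j = np.clip(((lon + 180.0) / resolution).astype(int), 0, n_lon - 1)
    total = np.zeros((n_lat, n_lon))
    count = np.zeros((n_lat, n_lon))
    np.add.at(total, (i, j), values)   # unbuffered add handles repeated cells
    np.add.at(count, (i, j), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

def mean_of_grids(grids):
    """Mean over a list of daily grids, ignoring cells without data."""
    stacked = np.stack(grids)
    n_valid = np.sum(~np.isnan(stacked), axis=0)
    total = np.nansum(stacked, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_valid > 0, total / n_valid, np.nan)

# Two hypothetical days with pixels in the cell 10-11 N, 20-21 E.
lat = np.array([10.2, 10.7, -45.5])
lon = np.array([20.1, 20.9, 100.5])
vcd = np.array([20.0, 30.0, 8.0])   # kg m-2; the third pixel is too cloudy
keep = keep_pixel([0.05, 0.10, 0.50], [1.0, 1.2, 0.9], [False] * 3, [False] * 3)
day1 = grid_mean(lat[keep], lon[keep], vcd[keep])
day2 = grid_mean(np.array([10.5]), np.array([20.5]), np.array([35.0]))
month = mean_of_grids([day1, day2])
```

Averaging daily grids rather than all individual pixels gives every day with valid data the same weight in the monthly mean, regardless of how many pixels fell into the cell on that day.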

Figure 2 shows the global mean OMI H2O VCD averaged over the complete time range of the TCWV data set. The resulting distribution demonstrates that the retrieval is capable of capturing the macroscale water vapour patterns, like high VCD values in the tropics (in particular over the Maritime Continent) and low values towards the polar regions, but also characteristic regional patterns, like the South Pacific convergence zone.

3 Sampling errors and clear-sky bias

Although satellite observations enable the analysis of trace gas concentrations on a global scale, a fundamental problem is that a satellite measurement is typically only taken once a day for one location. Furthermore, satellite measurements are usually only available under cloud-free conditions, especially in the visible or infrared spectral range, and thus no continuous time series is guaranteed. Consequently, they cannot provide a complete picture of geophysical variability, which leads to sampling errors in the calculation of averaged values (e.g. monthly means).

Moreover, the following question arises: to what extent does the limitation to cloud-free pixels influence the monthly averages determined from the OMI satellite measurements (i.e. whether a so-called “clear-sky bias” exists in the OMI TCWV data set)? Gaffen and Elliott (1993) investigated this bias using radiosonde ascents and found that the TCWV is about 0 %–15 % lower under cloud-free conditions than under cloudy conditions. Similarly, Sohn and Bennartz (2008) found a clear-sky bias of about 10 % between the Medium Resolution Imaging Spectrometer (MERIS) and the Advanced Microwave Scanning Radiometer for EOS (AMSR-E).

To estimate the sampling errors, we follow the methods of Xue et al. (2019) and Gleisner et al. (2020): we choose hourly resolved ERA5 data with a spatial resolution of 0.25° × 0.25° as the reference data and collocate the ERA5 data with OMI overpass times. These data are then resampled to the 1° × 1° resolution of the OMI TCWV data set and the monthly averages are calculated ($\mathrm{TCWV}_\mathrm{sampled}$). We then take the complete, original ERA5 data, resample them to the same spatial resolution, and calculate monthly means from these data ($\mathrm{TCWV}_\mathrm{true}$). The difference between the two data sets then represents the sampling error:

$$ \varepsilon_\mathrm{sampling} = \mathrm{TCWV}_\mathrm{sampled} - \mathrm{TCWV}_\mathrm{true}. \tag{4} $$

With this definition, the sampling error summarizes the uncertainties due to gaps in the swath, temporal differences, or missing data (e.g. due to clouds) (Xue et al., 2019).

Figure 3. Global distributions of the mean sampling errors derived from monthly mean sampling differences for the time range from January 2005 to December 2020. Panel (a) depicts absolute sampling error (i.e. $\varepsilon_\mathrm{sampling}$) and panel (b) shows relative sampling error (i.e. $\varepsilon_\mathrm{sampling}/\mathrm{TCWV}_\mathrm{true}$). Grid cells for which no data are available are coloured grey.

Figure 3 shows the mean absolute and relative sampling errors for the complete time range of the OMI TCWV data set (January 2005 to December 2020). Overall, it can be seen that most deviations are negative, i.e. the actual TCWV is underestimated. Regarding the absolute deviations, the strongest deviations can be seen in the area of storm tracks in the mid-latitudes (e.g. North Atlantic) and the polar regions, with values of around 5 kg m−2. The smallest deviations are found in the quasi-permanent cloud-free regions in the subtropics. As expected, the relative differences increase from the Equator towards the poles due to the decreasing TCWV values and reach values stronger than 30 %.

To investigate the extent to which these deviations are related to the clear-sky bias, we proceed similarly to the calculation of the sampling error: we collocate the ERA5 data to the OMI overpass time and once apply a cloud filter (effective cloud fraction < 20 %) and once not. We then resample both data sets to 1° × 1° and calculate monthly means. The difference between both data sets then represents the clear-sky bias:

$$ \varepsilon_\mathrm{clear} = \mathrm{TCWV}_\mathrm{clear} - \mathrm{TCWV}_\mathrm{all}. \tag{5} $$

Figure 4. Global distributions of the absolute differences ($\varepsilon_\mathrm{clear}$; a, c, e, g) and relative differences ($\varepsilon_\mathrm{clear}/\mathrm{TCWV}_\mathrm{all}$; b, d, f, h) of the mean differences between clear-sky and all-sky ERA5 based on the OMI cloud information for winter (DJF; a, b), spring (MAM; c, d), summer (JJA; e, f), and autumn (SON; g, h) for the time range from January 2005 to December 2020. Grid cells for which no data are available are coloured grey.
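Equations (4) and (5) share the same structure: both are differences of time means taken over different subsets of the same reference data. The sketch below illustrates this with boolean masks for a single grid cell. It is a simplification under stated assumptions: the spatial resampling is omitted, swath gaps are not modelled separately, and the function names `time_mean`, `sampling_error`, and `clear_sky_bias` as well as all values are hypothetical.

```python
import numpy as np

def time_mean(tcwv, mask):
    """Mean over the time axis (axis 0) using only samples where `mask` is True."""
    count = mask.sum(axis=0)
    total = np.where(mask, tcwv, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

def sampling_error(tcwv, overpass, clear):
    """Eq. (4): mean of the samples OMI would keep minus mean of all data."""
    everything = np.ones(tcwv.shape, dtype=bool)
    return time_mean(tcwv, overpass & clear) - time_mean(tcwv, everything)

def clear_sky_bias(tcwv, overpass, clear):
    """Eq. (5): clear-sky overpass mean minus all-sky overpass mean."""
    return time_mean(tcwv, overpass & clear) - time_mean(tcwv, overpass)

# One grid cell, six time steps (values in kg m-2 are made up).
tcwv = np.array([[10.0], [20.0], [30.0], [40.0], [50.0], [60.0]])
overpass = np.array([[False], [True], [False], [True], [False], [True]])
clear = np.array([[True], [True], [True], [False], [True], [False]])

eps_sampling = sampling_error(tcwv, overpass, clear)   # 20 - 35 = -15
eps_clear = clear_sky_bias(tcwv, overpass, clear)      # 20 - 40 = -20
```

The relative errors follow by dividing each difference by the mean it is subtracted from, i.e. the all-data mean for Eq. (4) and the all-sky overpass mean for Eq. (5).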

To determine seasonal structures, the global distributions of the absolute and relative clear-sky bias for the different seasons were determined from the monthly differences (see Fig. 4). Overall, the distributions of the clear-sky bias correspond very closely to the distributions of the sampling error, with respect to both strength and pattern. Moreover, the absolute and relative deviations show only slight changes between the different seasons.

Figure 5. Distributions of the absolute differences ($\varepsilon_\mathrm{sampling}$; a) and relative differences ($\varepsilon_\mathrm{sampling}/\mathrm{TCWV}_\mathrm{true}$; b) of the monthly mean differences between clear-sky and all-sky ERA5 data based on the OMI cloud information. The solid and dashed orange lines indicate the mean and the median of the distributions, respectively.


Figure 6. Distributions of the absolute differences ($\varepsilon_\mathrm{clear}$; a) and relative differences ($\varepsilon_\mathrm{clear}/\mathrm{TCWV}_\mathrm{all}$; b) of the monthly mean differences between clear-sky and all-sky ERA5 data based on the OMI cloud information. The solid and dashed orange lines indicate the mean and the median of the distributions, respectively.


Figures 5 and 6 summarize the sampling error and clear-sky bias distributions, respectively. For the sampling error, we obtain a mean absolute deviation of 1.6 kg m−2 (median of 1.4 kg m−2) and a mean relative deviation of 9.5 % (median of 6.2 %); for the clear-sky bias, we get a mean absolute deviation of 1.7 kg m−2 (median of 1.3 kg m−2) and a mean relative deviation of 10.0 % (median of 5.9 %). However, the distributions of the absolute and relative deviations for the sampling error and the clear-sky bias are highly left-skewed; thus, in particular, the mean value is influenced by the long tails of the distributions. Nevertheless, for the clear-sky bias, the obtained values agree well with the findings of Gaffen and Elliott (1993) and Sohn and Bennartz (2008).

As the effect of the clear-sky bias is already included in the sampling error and the results for both errors are very similar, it can be assumed that the spatial and temporal sampling errors play only a minor or negligible role in comparison to the clear-sky bias.

In addition to the sampling error and the clear-sky bias, we also examined the extent to which the monthly means would change if no RA filter was applied, i.e. if all of the data of the complete OMI swath were available (see Appendix C). We found that, although deviations arise due to the RA filter, these deviations are almost an order of magnitude smaller than those of the clear-sky bias, and the global distribution of the deviations is mostly noisy. Due to this small influence of the RA filter, we conclude that the filtered OMI TCWV data are a good representation of the actual TCWV values.

4 Intercomparison with existing water vapour climate data records

To evaluate the overall quality of the OMI TCWV data set, we conducted an intercomparison study with various reference data sets of monthly mean TCWV products. For this purpose, we use the merged 1° total precipitable water (TPW) data set version 7 from Remote Sensing Systems (RSS) (Mears et al., 2015; Wentz, 2015), TCWV data from the ERA5 reanalysis model (Hersbach et al., 2019, 2020), and the CM SAF/CCI TCWV-global (COMBI) data set (Schröder et al., 2023) from the European Space Agency (ESA) Climate Change Initiative (CCI) as reference.

The RSS data set consists of merged geophysical ocean products, the values of which are retrieved from various passive satellite microwave radiometers. These microwave radiometers have been intercalibrated at the brightness temperature level, and the ocean products have been produced using a consistent processing methodology for all sensors (more details in Wentz, 2015; Mears et al., 2015). The major advantages of microwave TCWV retrievals are their high precision and accuracy and that they are insensitive to clouds; therefore, TCWV values can also be retrieved under cloudy-sky conditions. A disadvantage, however, is that these retrievals are (mostly) only available over the ocean surface.

Thus, we also compare the OMI TCWV data to the CM SAF/CCI TCWV-global (COMBI) data set provided by ESA WV_CCI (Schröder et al., 2023). The climate data record (CDR) combines microwave and near-infrared imager based TCWV over the ice-free ocean as well as over land, coastal ocean, and sea ice. The data record relies on microwave observations from the SSM/I, the Special Sensor Microwave Imager/Sounder (SSMIS), the AMSR-E, and the Tropical Rainfall Measuring Mission Microwave Imager (TMI), partly based on a fundamental climate data record (Fennig et al., 2020) and on near-infrared observations from MERIS, MODIS Terra, and the Ocean and Land Colour Instrument (OLCI) (Danne et al., 2022). Hence, it is one of the few (satellite) measurement data sets that provide global coverage over ocean and land surface. Moreover, the data set has been extensively validated with respect to global reference data sets (e.g. ERA5), satellite products, GPS measurements from SuomiNet, and radiosonde observations from the Global Climate Observing System Reference Upper-Air Network (GRUAN) (more details in the validation report of Schröder et al., 2023).

Within comparisons between different satellite data sets, a major drawback is the influence of sampling errors due to different observation times, pixel footprint sizes, or orbit patterns. To minimize this source of error, data from reanalysis models are useful. ERA5 is the fifth-generation ECMWF reanalysis (Hersbach et al., 2020) and combines model data with in situ and remote sensing observations from various measurement platforms. For our purpose, we use the “monthly averaged reanalysis by hour of day” from the Copernicus Climate Data Store on a 1° × 1° grid. To account for OMI's observation time (around 13:30 LT), we first calculate the local time for each longitude in the ERA5 data set, select the TCWV data for the time period between 13:00 and 14:00 LT, and finally merge the selected data. For the intercomparison, it is also important to consider that the reference data sets are neither perfect nor error-free; thus, we perform an orthogonal distance regression (ODR; Cantrell, 2008).

In the case of the ODR, it is necessary to use reasonable ratios of the relative errors of the compared data sets, instead of using absolute errors, in order to obtain meaningful results. In a comprehensive uncertainty analysis, Wentz (1997) determined a typical error of 1.22 kg m−2 for SSM/I observations. Mears et al. (2015) found that the uncertainty of daily microwave TCWV observations for TCWV = 10 kg m−2 was around 1 kg m−2 with respect to Global Navigation Satellite System (GNSS) measurements, whereas this value for TCWV = 60 kg m−2 was around 2–4 kg m−2. Hence, we assume that the uncertainty of the RSS data set is 5 % or at least 1 kg m−2. For ERA5 and COMBI, we can assume similar uncertainties over ocean, as the TCWV values there are also mainly based on microwave observations. Unfortunately, no uncertainties are provided for TCWV over land. Thus, for the sake of simplicity, we assume that the relative errors in the reference data sets over land are twice as high as over ocean, i.e. 10 % or at least 2 kg m−2. For the OMI TCWV data set, we assume an uncertainty of 20 % (Borger et al., 2020) but at least 2 kg m−2. We also tested other error assumptions and found that the exact choice of errors has a negligible effect on the regression results as long as the ratio of uncertainties remains similar.
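The local-time selection can be sketched as follows, assuming a field indexed by UTC hour of day and approximating local time by mean solar time (UTC plus longitude divided by 15° per hour). The half-open window from 13:00 to 14:00 LT, the function name `select_local_time`, and the array layout are assumptions made for this illustration.

```python
import numpy as np

def select_local_time(tcwv_by_utc_hour, lon_deg, lt_start=13.0):
    """Pick, per longitude, the UTC hour whose local time is in [lt_start, lt_start + 1).

    tcwv_by_utc_hour: array (24, n_lat, n_lon), index 0 = 00 UTC
    lon_deg: array (n_lon,) of longitudes in degrees east
    Returns an array (n_lat, n_lon).
    """
    data = np.asarray(tcwv_by_utc_hour)
    lon = np.asarray(lon_deg, dtype=float)
    hours = np.arange(24)[:, None]                        # (24, 1)
    local_time = (hours + lon[None, :] / 15.0) % 24.0     # (24, n_lon)
    in_window = (local_time >= lt_start) & (local_time < lt_start + 1.0)
    utc_index = in_window.argmax(axis=0)                  # one hour per longitude
    # Advanced indexing returns (n_lon, n_lat); transpose to (n_lat, n_lon).
    return data[utc_index, :, np.arange(lon.size)].T

# Toy field whose value equals the UTC hour, on a 2 x 3 grid.
field = np.broadcast_to(np.arange(24)[:, None, None], (24, 2, 3))
selected = select_local_time(field, lon_deg=[0.0, 90.0, -90.0])
```

For the three toy longitudes, the selected UTC hours are 13, 7, and 19, i.e. the hours at which the local time is 13:00 at 0°, 90° E, and 90° W, respectively.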
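The error assumptions above (a relative error with a lower bound) and the role of the error ratio can be illustrated with a short sketch. Note that the regression shown here is Deming regression, a closed-form special case of a straight-line fit with errors in both variables that assumes a constant ratio of error variances; it is swapped in for brevity and is not the ODR implementation used for the results in this paper. The function names `assumed_uncertainty` and `deming_fit` and the data values are hypothetical.

```python
import numpy as np

def assumed_uncertainty(tcwv, relative, floor):
    """Relative uncertainty with a lower bound, e.g. 5 % but at least 1 kg m-2."""
    return np.maximum(relative * np.asarray(tcwv, dtype=float), floor)

def deming_fit(x, y, variance_ratio):
    """Straight-line fit with errors in both variables (Deming regression).

    variance_ratio = var(errors in y) / var(errors in x), assumed constant.
    Returns (slope, intercept).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.mean((x - x.mean()) ** 2)
    syy = np.mean((y - y.mean()) ** 2)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    d = variance_ratio
    slope = (syy - d * sxx + np.sqrt((syy - d * sxx) ** 2 + 4.0 * d * sxy**2)) / (2.0 * sxy)
    return slope, y.mean() - slope * x.mean()

# Made-up monthly means over ocean (kg m-2).
reference = np.array([5.0, 20.0, 40.0, 60.0])
omi = 1.01 * reference + 0.5

sigma_ref = assumed_uncertainty(reference, relative=0.05, floor=1.0)  # [1, 1, 2, 3]
sigma_omi = assumed_uncertainty(omi, relative=0.20, floor=2.0)
# Assumption: a single representative variance ratio for the whole sample.
ratio = np.mean(sigma_omi**2) / np.mean(sigma_ref**2)
slope, intercept = deming_fit(reference, omi, ratio)
```

Only the ratio of the error variances enters the closed-form slope, which is in line with the observation that the regression results are insensitive to the exact error values as long as the ratio of uncertainties remains similar.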

First comparisons with the reference data over land indicated that the OMI data set shows different levels of agreement for low and high TCWV values, with high deviations being particularly prominent for high TCWV values. To be able to estimate the quality of the OMI data set for low and high TCWV, a piecewise linear regression (PWLR) is additionally performed for data over land. For the PWLR, a function of the form

$$ f(x) = \begin{cases} a_0 x + b_0, & x < x_0, \\ a_1 x + b_1, & x > x_0 \end{cases} \tag{6} $$

is assumed, whereby the function parameters (including x0) are determined via a non-linear least-squares fit.
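A piecewise fit of the form of Eq. (6) can be sketched without a non-linear solver by scanning candidate breakpoints. This grid search is a simplification chosen for illustration and replaces the non-linear least-squares fit named in the text; the function name `piecewise_linear_fit`, the candidate list, and the synthetic data are hypothetical.

```python
import numpy as np

def piecewise_linear_fit(x, y, candidates):
    """Fit two independent straight lines separated at a breakpoint x0.

    For every candidate x0, both segments are fitted by ordinary least squares;
    the x0 with the smallest total squared residual is returned.
    Points with x == x0 are assigned to the upper segment.
    Returns (x0, (a0, b0), (a1, b1)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for x0 in candidates:
        low, high = x < x0, x >= x0
        if low.sum() < 2 or high.sum() < 2:
            continue  # at least two points are needed per segment
        p0 = np.polyfit(x[low], y[low], 1)
        p1 = np.polyfit(x[high], y[high], 1)
        sse = (np.sum((np.polyval(p0, x[low]) - y[low]) ** 2)
               + np.sum((np.polyval(p1, x[high]) - y[high]) ** 2))
        if best is None or sse < best[0]:
            best = (sse, x0, tuple(p0), tuple(p1))
    if best is None:
        raise ValueError("no candidate breakpoint leaves two points per segment")
    return best[1], best[2], best[3]

# Synthetic data: slope 1 below 30 kg m-2 and slope 1.5 with an offset above.
x = np.arange(0.0, 61.0, 5.0)
y = np.where(x < 30.0, x, 1.5 * x - 10.0)
x0, (a0, b0), (a1, b1) = piecewise_linear_fit(x, y, np.arange(10.0, 55.0, 5.0))
```

The grid search only finds breakpoints among the supplied candidates, so its resolution is limited by the candidate spacing, whereas a non-linear fit treats x0 as a continuous parameter.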

4.1 Intercomparison with RSS SSM/I

The results of the intercomparison between OMI and the RSS TCWV data set are summarized in Fig. 7. Figure 7a depicts the 2D histogram from the comparison between the monthly mean values from RSS and the OMI TCWV data set. The data are distributed closely along the 1-to-1 diagonal (dashed black line), and the results of the orthogonal distance regression (ODR, solid red line) indicate an overall very good agreement with slopes of around 1.01 and a coefficient of determination of RODR2=0.96. If only the TCWV anomalies are compared (i.e. the seasonal cycle is removed), we obtain correlations of R2=0.50.
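The anomaly comparison can be sketched for a single grid cell as follows. The sketch assumes that the seasonal cycle is removed by subtracting the mean of each calendar month and that the correlation is the squared Pearson coefficient; neither detail is specified in the text. The function names `deseasonalize` and `r_squared` and the synthetic series are hypothetical.

```python
import numpy as np

def deseasonalize(monthly_series):
    """Subtract the mean annual cycle from a monthly series starting in January."""
    series = np.asarray(monthly_series, dtype=float)
    n_years = series.size // 12
    by_year = series[: n_years * 12].reshape(n_years, 12)
    climatology = np.nanmean(by_year, axis=0)   # mean of each calendar month
    return (by_year - climatology).ravel()

def r_squared(x, y):
    """Squared Pearson correlation of the pairs where both values are valid."""
    valid = ~np.isnan(x) & ~np.isnan(y)
    return np.corrcoef(x[valid], y[valid])[0, 1] ** 2

# Two synthetic years: an annual cycle plus a step of 1 kg m-2 in the second year.
seasonal = 20.0 + 10.0 * np.sin(2.0 * np.pi * np.arange(12) / 12.0)
reference = np.tile(seasonal, 2) + np.repeat([0.0, 1.0], 12)
omi = reference + 2.0                            # constant offset only

ref_anomaly = deseasonalize(reference)           # -0.5 in year 1, +0.5 in year 2
omi_anomaly = deseasonalize(omi)
r2 = r_squared(ref_anomaly, omi_anomaly)
```

Removing the seasonal cycle discards the variance shared by construction between the two data sets, so a lower correlation for anomalies than for the full monthly means is expected.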

Figure 7. Intercomparison between monthly mean TCWV from the OMI and RSS merged SSM/I data set for data over ocean. Panel (a) illustrates a 2D histogram in which the colour indicates the count density; the solid red line represents the results of the orthogonal distance regression (ODR). The results of the respective fits are given in the bottom right box and the correlation coefficient is presented in the top left corner. The dashed black line indicates the 1-to-1 diagonal. Panel (b) depicts the TCWV difference of OMI minus RSS within the latitude–time space; reddish colours indicate an overestimation of the OMI TCWV data set, whereas blueish colours denote an underestimation.


Figure 7b illustrates the zonally averaged monthly mean difference of OMI minus RSS TCWV within the latitude–time space. In general, the deviations between OMI and RSS are quite low with a positive bias of +1.0 ± 1.5 kg m−2. Within the tropics (i.e. between 20° S and 20° N), we obtain a mean deviation of +2.0 ± 1.6 kg m−2, whereas we find values of +0.6 ± 1.3 kg m−2 in the extratropics. However, within the tropics, distinctive periodic patterns of positive deviations are also observable.

Figure 8. Global mean TCWV difference of OMI minus RSS SSM/I for the time range from January 2005 to December 2020. Areas with no valid values are coloured grey.

Figure 8 shows the global mean TCWV difference between OMI and RSS SSM/I over the complete time period of the OMI TCWV data set. Consistent with the findings from Fig. 7, the highest positive deviations can be found in the tropical Pacific Ocean and near the coastlines of South America, Africa, and Indonesia, whereas the strongest negative deviations are obtained around the South Pacific convergence zone and the East Siberian Sea. In the tropical Pacific Ocean, the distribution of the systematic positive deviations closely matches the regions of cold water (the so-called “cold tongue”), which are frequently affected by low clouds. As the highest water vapour concentrations occur in the lower troposphere, small deviations of a few hundred metres in cloud top height can have relatively large effects on the AMF (and thus on the retrieved TCWV). In the case of Central America or the Atlantic Ocean, an overly low albedo due to additional absorption by phytoplankton (Kleipool et al., 2008) could explain the systematic positive deviations.

Additional comparisons considering only valid grid cells according to the “common mask” from the COMBI data set are presented in Appendix B. This mask filters regions where no continuous time series of data is available or where the data are affected by high uncertainties, e.g. due to frequent cloud cover. Therefore only high-quality measurements are compared to each other. However, as mainly regions over land surface are affected, the comparisons with the filtered data are almost identical to the unfiltered data.

4.2 Intercomparison with ERA5

The results of the intercomparison with ERA5 are depicted in Fig. 9. To investigate potential dependencies on the surface type, we separated the data into data over ocean (Fig. 9a, b) and data over land (Fig. 9c, d). The intercomparison for data over ocean reveals similar results to the intercomparison between OMI and RSS: the ODR results indicate a slight overestimation (slopes of around 1.03) and a coefficient of determination close to unity ($R^2_{\mathrm{ODR}}$ of around 0.96). Moreover, the periodic pattern of positive deviations in the tropics occurs again. Overall, a small positive bias of +1.7 ± 1.7 kg m−2 is observed, which increases to +3.4 ± 1.7 kg m−2 in the tropics (20° S to 20° N) but is around +1.1 ± 1.3 kg m−2 in the extratropics. In addition, the correlation of the anomalies is approximately $R^2 = 0.49$.

Figure 9. Same as Fig. 7 but with ERA5 data over ocean (a, b) and over land (c, d). In panel (c), the solid red line represents the results of the orthogonal distance regression (ODR) and the solid black line presents the results of the piecewise linear regression (PWLR).


For data over land, the picture is different: although the ODR gives similar results for the slope as for data over ocean, the distribution in the 2D histogram (Fig. 9c) shows particularly strong positive deviations of approximately +10 kg m−2 at high TCWV values and an overall systematic offset of around +1.43 kg m−2. Within the PWLR analysis, we find a good agreement with the reference data for TCWV values up to about 25 kg m−2 (which represents approximately 74 % of all data points) with slopes of around 0.96. However, for higher TCWV values, we find distinctive overestimations of up to 24 %. Moreover, even for low TCWV values, a systematic offset of approximately +2.52 kg m−2 is obtained. Furthermore, the correlation of the TCWV anomalies is only around $R^2 = 0.40$.

According to the corresponding latitude–time difference plot (Fig. 9d), the systematic positive deviation in the tropics is now much stronger with values of around +6.2 ± 3.4 kg m−2 (for latitudes within 20° of the Equator); however, in the extratropics, the positive deviation is around +1.7 ± 1.2 kg m−2 on average and thus of similar magnitude as for the ocean comparisons.

Figure 10. Same as Fig. 8 but for ERA5.

Closer inspection of the mean TCWV difference between OMI and ERA5 (see Fig. 10) reveals that the strong deviations over the tropical landmasses mainly occur in the regions that are affected by frequent cloud cover, such as the Amazon Basin, Central Africa, and the Maritime Continent. In part, these overestimations are further amplified by sampling errors due to complex topography or high mountains (which are also associated with high snow/ice cover and, thus, fewer valid observations).

Hence, the distinctive positive deviations with respect to ERA5 may arise from different causes. For the OMI TCWV retrieval, two main uncertainty sources may cause the strong, systematic positive deviations. First, the land surface albedo used from Borger et al. (2020) may be too low, leading to an underestimation of the AMF and, consequently, to an overestimation of the H2O VCD. However, Borger et al. (2020) also showed that their modified albedo map led to overall better results for the TROPOMI TCWV retrieval. Second, there may also be uncertainties in the cloud information taken from the L2 NO2 product as retrieval input; for example, if the surface albedo is underestimated in the input of the cloud algorithm, this leads to an overestimation of the cloud top height and, thus, to an underestimation of the AMF and, finally, to an overestimation of the H2O VCD. For ERA5, the frequent cloud cover can also be a major source of uncertainty, as only a few satellite measurements (or none at all in the thermal infrared) are available due to frequent cloud contamination. This might lead to clear-sky dry biases in the cloud-affected regions and increased uncertainties within the assimilation process due to the complex radiative transfer in cloudy scenarios (e.g. Li et al., 2016). Likewise, these remote regions are affected by an overall sparseness in the observation density of in situ measurements; therefore, the ERA5 TCWV values are likely to be based mainly on modelled data. Thus, overall, the strong positive deviation of the OMI TCWV data set likely results from the combination of an overestimation of the OMI TCWV retrieval and an underestimation of the ERA5 data.
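The link between an AMF error and a VCD error invoked above follows from the standard DOAS relation between the slant column density (SCD), the AMF, and the vertical column density (VCD). As a brief worked illustration, with V, S, and A as shorthand introduced here rather than notation from the paper:

```latex
V = \frac{S}{A}
\quad\Longrightarrow\quad
\frac{\Delta V}{V} \approx -\frac{\Delta A}{A}
\quad \text{for fixed } S \text{ and small } \Delta A .
```

For example, an AMF that is 10 % too low would yield a VCD that is about 11 % too high, since 1/0.9 ≈ 1.11.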

One way to address these errors would be to develop an independent albedo and cloud product, but this is far beyond the scope of this paper. Moreover, considering the demise of OMI in the near future (probably in 1–2 years) and the ongoing reprocessing of L1 data, the development of such an algorithm would not be worthwhile at the moment of writing this paper.

Hence, considering these large uncertainties in the OMI retrieval and that the uncertainties in ERA5 for data over tropical landmasses are not negligible anymore, we conclude that the OMI TCWV data set can represent the global distribution of the atmospheric water vapour content well, at least over ocean. Over land, however, the data set should be treated with caution due to the systematic positive deviations from the reference data sets, especially in areas of high TCWV values (i.e. above 25 kg m−2).

An additional comparison in which particularly critical regions were filtered using the common mask from the COMBI data set (see Fig. B1) is given in Appendix B. When this mask is applied, only high-quality measurements are taken into account for the intercomparison. As a result, the extreme overestimations are filtered out and the distribution in the 2D histogram for the comparison over land improves considerably (see Fig. B6a). The slope of the ODR is now around 0.96, which is closer to the results of the PWLR for TCWV < 25 kg m−2.

4.3 Intercomparison with COMBI

For the intercomparison with the COMBI data set, we resampled the CDR from its native spatial resolution (0.5° × 0.5°) to the lattice of the OMI TCWV data set. Furthermore, although COMBI covers a time span from July 2002 to December 2017, we focus on the time period from January 2005 to March 2016, as the CDR's difference relative to ERA5 over land is only stable over the MERIS and MODIS period, i.e. from July 2002 to March 2016 if looking at clear-sky data. For the sake of completeness, the results for the comparison over the complete time range are depicted in Figs. E1 and E2 in the Appendix.
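The text does not state how the resampling was carried out. One plausible approach, sketched here purely as an assumption, is to average each 2 × 2 block of 0.5° cells into one 1° cell while skipping missing values (represented as `None`). The function name is hypothetical, and the sketch ignores the small variation in cell area within a block.

```python
def block_average(grid, factor=2):
    """Coarsen a 2D grid by averaging factor x factor blocks, skipping None."""
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    coarse = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            values = [
                grid[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            valid = [v for v in values if v is not None]
            # A coarse cell is missing only if all of its fine cells are missing.
            row.append(sum(valid) / len(valid) if valid else None)
        coarse.append(row)
    return coarse
```

This assumes that the fine grid is aligned with the coarse one, i.e. that each 1° cell is covered by exactly four 0.5° cells.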

Figure 11. Same as Fig. 7 but with COMBI data for data over ocean (a, b) and for data over land (c, d). In panel (c), the solid red line represents the results of the ODR and the solid black line presents the results of the PWLR.


Figure 11 summarizes the results of the intercomparison. Not surprisingly, the results for data over ocean (Fig. 11a) are similar to the findings of the RSS SSM/I and ERA5 comparison, as measurements from the same (or similar) sensors have been considered: the ODR results indicate slight overestimations of around 2 % with a coefficient of determination of around 0.95, and the latitude–time diagram indicates an average deviation of +1.3 ± 1.7 kg m−2 (+2.5 ± 2.0 kg m−2 in the tropics and +0.8 ± 1.4 kg m−2 in the extratropics). However, the correlation of the TCWV anomalies is slightly lower compared to the other data sets, with values of around $R^2 = 0.45$ over ocean.

Similar to the intercomparison with ERA5, the intercomparison over land (Fig. 11c) shows roughly similar ODR fit results to those over ocean, but we also find striking positive deviations for high TCWV values and an overall positive offset of 2.23 kg m−2. Again, when applying a PWLR analysis, we obtain good agreement, with slopes of around 0.95 for TCWV values up to about 25 kg m−2, but still a distinctive positive offset of 3.51 kg m−2 for low TCWV values and distinctive overestimations of up to 33 % for higher TCWV values, which is even higher than for the comparison to ERA5. Consequently, the systematic deviations are also much stronger (see Fig. 11d) and reach values of around +7.2 ± 3.6 kg m−2 in the tropics, around +2.7 ± 1.4 kg m−2 in the extratropics, and a global average of +4.1 ± 3.1 kg m−2. These even higher deviations compared with the analysis with ERA5 could be due to the different observation times of the data sets: MERIS on Envisat and MODIS on Terra have overpass times of 10:00 and 10:30 LT, respectively, and follow a descending orbit, whereas OMI measures at 13:30 LT in an ascending orbit. This might also explain the worse correlation of anomalies ($R^2 = 0.32$) for data over land.

Figure 12. Same as Fig. 8 but for COMBI.

Overall, similar to the comparison to ERA5, the strongest positive deviations again occur over the tropical landmasses that are mostly affected by frequent cloud cover (see Fig. 12). Likewise, further overestimations appear in areas with complex high topography (e.g. Indonesia, the Andes, and the Himalayas), suggesting sampling errors when merging the spatial resolutions of the data sets and missing observations due to snow/ice cover.

In Appendix B, we present a comparison in which critical regions were filtered using the common mask from the COMBI data record. When this mask is applied, there are clear improvements for the comparison over land: the prominent overestimates at high TCWV values are filtered out and the distribution is now closer to the 1-to-1 diagonal (see Fig. B6b). For the ODR, the slope is around 0.98, which agrees quite well with the slopes obtained for the PWLR for TCWV < 25 kg m−2.

5 Intercomparison with IGRA2 radiosonde observations

For further comparisons besides reanalysis and satellite data, in situ measurements from radiosondes are invaluable, as these measurements can provide information on the vertical water vapour distribution with high accuracy (Dirksen et al., 2014). Here, the Integrated Global Radiosonde Archive (IGRA) is particularly well suited for global intercomparisons: IGRA is a collection of historical and near-real-time radiosonde and pilot balloon observations from around the globe (Durre et al., 2006, 2018) provided by the National Centers for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA). For IGRA version 2 (IGRA2; Durre et al., 2016, 2018), 40 data sources were converted into a common data format and merged into one coherent data set which then went through a quality-assurance system. While, to our knowledge, no explicit uncertainty estimates have been conducted for water vapour measurements, the IGRA2 humidity measurements are subject to rigorous quality control (Durre et al., 2018) and the completeness of the IGRA2 humidity observations has also been checked by Ferreira et al. (2019).

Figure 13. Global distribution of radiosonde stations used for the TCWV intercomparison to the MPIC OMI data set. Colours indicate the number of data pairs used for the intercomparison.

Although IGRA2 also provides TCWV data, these are calculated from the surface up to only 500 hPa. Typically, this pressure level is at about 5 km above mean sea level; therefore, if one assumes a typical scale height of the water vapour of 2.1 km (Weaver and Ramanathan, 1995), a low bias of 10 % could be introduced. Thus, to ensure a consistent calculation of the TCWV monthly means from the IGRA2 data, the following criteria were applied to the individual radiosonde ascents:

  1. Only radiosonde ascents that have reached a pressure level of 300 hPa or lower were considered for the calculation of the TCWV. This pressure level corresponds to a typical geometric altitude of around 9 km. This ensures that the radiosondes covered a large part of the troposphere and, thus, captured the majority of the TCWV without introducing non-negligible low biases.

  2. For the calculation of the monthly means, valid radiosonde ascents of at least 10 different days in the month must have taken place in order to achieve a good temporal coverage of the month.

  3. Only stations with at least 12 valid data pairs between the monthly means of IGRA2 and the Max Planck Institute for Chemistry (MPIC) OMI TCWV data set were considered for the statistical analysis.
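The three criteria above, together with the scale-height argument for the 500 hPa cut-off, can be sketched as follows. This is an illustration only: the function names and the tuple layout of an ascent are hypothetical, the water vapour profile is assumed to be exponential, and the monthly mean is assumed to be a plain average over all valid ascents, which the text does not specify.

```python
import math
from collections import defaultdict


def fraction_above(altitude_km, scale_height_km=2.1):
    """Fraction of the column above an altitude for an exponential profile."""
    return math.exp(-altitude_km / scale_height_km)


def monthly_means(ascents, max_top_pressure_hpa=300.0, min_days=10):
    """Monthly mean TCWV for one station.

    ascents: iterable of (year, month, day, top_pressure_hpa, tcwv).
    """
    by_month = defaultdict(list)
    for year, month, day, top_pressure, tcwv in ascents:
        if top_pressure <= max_top_pressure_hpa:  # criterion 1
            by_month[(year, month)].append((day, tcwv))
    means = {}
    for key, samples in by_month.items():
        if len({day for day, _ in samples}) >= min_days:  # criterion 2
            means[key] = sum(t for _, t in samples) / len(samples)
    return means


def station_is_usable(sonde_means, omi_means, min_pairs=12):
    """Criterion 3: enough months with both a radiosonde and an OMI mean."""
    return len(set(sonde_means) & set(omi_means)) >= min_pairs
```

With the assumed scale height of 2.1 km, `fraction_above(5.0)` evaluates to roughly 0.09, in line with the low bias of about 10 % quoted above, whereas at 9 km the missing fraction drops to between 1 % and 2 %.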

Figure 13 shows the global distribution of the locations of the radio sounding stations as well as the numbers of valid data pairs used for the comparison. Altogether, 731 different radiosonde stations are considered for this comparison study. In addition to a high density of measurement stations, there is generally good temporal coverage in the northern mid-latitudes (especially in North America and Europe) and, thus, good temporal collocation between MPIC OMI and IGRA2 data. For the other parts of the world, however, the measurement network is much less dense; hence, the number of temporal collocations of the two data sets does not reach the values from the northern mid-latitudes. Thus, due to the limited sample size at many stations, the median of the deviation is used instead of the mean deviation for the comparisons.

Figure 14. Global distribution of the median TCWV difference between the monthly means of the MPIC OMI TCWV data set and those derived from the IGRA2 radiosoundings.

The distribution of these median deviations is given in Fig. 14. Overall, the results are consistent with the findings from the previous comparisons with the global satellite and reanalysis data sets (see Sect. 4), in which a good to very good agreement was found for the extratropics and an overestimation was observed for the tropics. On average, the median deviation is about +1.6 ± 3.4 kg m−2, with about +0.9 ± 2.0 kg m−2 in the extratropics and +4.3 ± 5.5 kg m−2 in the tropics.

Nevertheless, this comparison to radiosonde measurements demonstrates that the MPIC OMI TCWV data set is also in good to very good agreement with in situ reference data sets but tends to be systematically overestimated in the tropical landmass regions; this is in line with the previous findings from the comparisons with reanalysis and satellite data (Sect. 4).

6 Temporal stability

In addition to a good agreement with existing reference data sets, the temporal stability is an important property of a climate data record. As the COMBI data set only covers the time range up to December 2017, we focus on the comparison to the RSS SSM/I and ERA5 data sets, as these two cover the complete time range of the OMI TCWV data set. For the sake of completeness, however, we also show the results for COMBI.

To assess the stability of the OMI TCWV data set, the global mean relative deviation ϵ is first derived for every time step:

(7) $\epsilon = \dfrac{\mathrm{TCWV}_{\mathrm{OMI}} - \mathrm{TCWV}_{\mathrm{ref}}}{\mathrm{TCWV}_{\mathrm{ref}}}$.

For the calculation of global means, only data points or grid cells are considered for which data are available in both the OMI TCWV and the reference data sets for every time step (i.e. no gaps in the time series). In the case of the COMBI data set, a common mask has been provided (see also Fig. B1).
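A minimal sketch of this masking and averaging step is given below. It assumes that each data set is stored as a list of time steps, each holding a flat list of grid-cell values with `None` for missing data, that Eq. (7) is evaluated per grid cell, and that the global mean is an unweighted average; none of these details are stated in the text, and the function names are hypothetical.

```python
def complete_cells(omi, ref):
    """Indices of cells with valid data at every time step in both data sets."""
    n_cells = len(omi[0])
    return [
        c
        for c in range(n_cells)
        if all(o[c] is not None and r[c] is not None for o, r in zip(omi, ref))
    ]


def global_mean_relative_deviation(omi, ref):
    """Eq. (7) per grid cell, averaged over the gap-free cells per time step."""
    cells = complete_cells(omi, ref)
    if not cells:
        raise ValueError("no grid cell has a gap-free time series")
    series = []
    for o, r in zip(omi, ref):
        deviations = [(o[c] - r[c]) / r[c] for c in cells]
        series.append(sum(deviations) / len(deviations))
    return series
```

An area-weighted mean (for instance with the cosine of latitude as weight) would be a natural refinement of the unweighted average used here.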

The temporal linear trends of these deviations are then calculated using a generalized least-squares (GLS) regression for the fit function:

(8) $Y_t = m + b X_t + N_t = \mathbf{M}_t \mathbf{x} + N_t$

with the intercept $m$, the trend $b$, and the increasing time index $X_t$ (in months); the regressors are summarized in the matrix $\mathbf{M}_t$ and the fit parameters $m$ and $b$ in the vector $\mathbf{x}$. The term $N_t$ stands for the fit residuals with respect to the time series. To account for the temporal autocorrelation of the fit residuals $N_t$ of the GLS, the Prais–Winsten transformation (Prais and Winsten, 1954; Greene, 2019) is used, assuming that the residuals follow an autoregressive (AR) process. For this purpose, the autocorrelation function (ACF) is estimated using the Gaussian-kernel-based cross-correlation function algorithm, as described in Rehfeld et al. (2011). To estimate the order of the AR model, we use the partial autocorrelation function (PACF) and determine the lag after which all values of the PACF lie within a confidence interval $\pm\delta$. Assuming that the PACF values for high lags follow white noise, the confidence interval is defined by the Z score (in our case, for a confidence level of 95 %) and the length of the time series $L$ according to the following formula (Box et al., 2015):

(9) $\delta = \dfrac{Z}{\sqrt{L}}$.

An AR model can then be created from the determined AR order, which is then used to transform the GLS using the transformation matrix P:

(10) $\mathbf{P} Y_t = Y_t^{*} = \mathbf{P} \left( \mathbf{M}_t \mathbf{x} + N_t \right) = \mathbf{M}_t^{*} \mathbf{x} + \varepsilon_t$.

For details about the construction of the transformation matrix, we refer to Weatherhead et al. (1998), Mieruch et al. (2008), and Borger et al. (2022). The trends are then determined from the transformed system in Eq. (10) by simple linear algebra. The results and their uncertainties then already include the effect of the temporal autocorrelation.
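The procedure can be illustrated with a strongly simplified sketch. It assumes an AR(1) process with a known coefficient `rho`, whereas the analysis in this paper uses higher-order AR models with a transformation matrix constructed as described in the references above; the function names are hypothetical. The first function evaluates Eq. (9); the second applies the classic Prais–Winsten transformation to the response and to both regressors (intercept and time index) and then solves the transformed system by ordinary least squares.

```python
import math


def pacf_threshold(length, z=1.96):
    """Eq. (9): half-width of the white-noise confidence band for the PACF."""
    return z / math.sqrt(length)


def prais_winsten_ar1(y, rho):
    """Fit Y_t = m + b * t assuming AR(1) residuals with coefficient rho.

    Returns (m, b), with b in units of y per time step.
    """
    n = len(y)
    t = list(range(n))
    scale = math.sqrt(1.0 - rho ** 2)
    # Transformed response and design-matrix columns (intercept, time).
    y_star = [scale * y[0]] + [y[i] - rho * y[i - 1] for i in range(1, n)]
    c_star = [scale] + [1.0 - rho] * (n - 1)
    t_star = [scale * t[0]] + [t[i] - rho * t[i - 1] for i in range(1, n)]
    # Solve the 2 x 2 normal equations of the transformed system.
    s_cc = sum(c * c for c in c_star)
    s_ct = sum(c * x for c, x in zip(c_star, t_star))
    s_tt = sum(x * x for x in t_star)
    s_cy = sum(c * v for c, v in zip(c_star, y_star))
    s_ty = sum(x * v for x, v in zip(t_star, y_star))
    det = s_cc * s_tt - s_ct ** 2
    m = (s_tt * s_cy - s_ct * s_ty) / det
    b = (s_cc * s_ty - s_ct * s_cy) / det
    return m, b
```

For a monthly series of fractional deviations, multiplying `b` by 120 and by 100 converts the trend to percent per decade. For the 192 months from January 2005 to December 2020, `pacf_threshold(192)` is approximately 0.14.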

Figure 15. Stability analyses of the global mean relative deviations between the OMI TCWV data set with respect to (a) RSS SSM/I, (b) ERA5, and (c) COMBI. The red line presents the global mean relative deviation, the blue line shows the results of the transformed GLS regression, the dotted black line denotes the respective 25th and 75th percentiles, the dashed lines represent data for the time range from January 2005 to March 2016, and the solid lines represent data for the time range from January 2005 to the end of the respective data set. The bias and root-mean-square (RMS) values provided in the legends correspond to the time series of the global mean deviation for the respective time range.


Figure 15 illustrates the temporal variability of the relative differences between the OMI TCWV data set and RSS SSM/I, ERA5, and COMBI for the time range from January 2005 to March 2016 (dashed blue lines) and from January 2005 to the end of the respective data set (solid blue lines). For all three data sets and all time ranges, the PACF analyses showed that an AR(3) model is the most appropriate choice. For the time series until March 2016, we find trends of +0.21 ± 1.20 % per decade for the comparison to RSS SSM/I, +0.21 ± 1.28 % per decade for the comparison to ERA5, and 1.74 ± 1.50 % per decade for the comparison to the COMBI data.

For the time series until the end of the reference data set, one finds trends of +0.01 ± 0.70 % per decade for the comparison to RSS SSM/I and 0.09 ± 0.71 % per decade for the comparison to ERA5. Moreover, the statistical analyses reveal that these trends are not significantly different from 0 % per decade. For the comparison to the COMBI data, there is a stronger trend (around 0.43 ± 1.16 % per decade) than for the other two data sets; however, the time range is also much shorter and does not cover the complete time range of the OMI TCWV data set. Altogether, the obtained trends of the relative deviations are in line with typical stability requirements for climate data products of ±1 % per decade (see e.g. Beirle et al., 2018, and references therein, or the ESA WV_cci user requirements; last access: 23 May 2023). Moreover, these trends are also in line with the recently published stability requirements for Essential Climate Variables (ECV) according to the Global Climate Observing System (GCOS) implementation plan with stabilities of ±0.1 % per decade as “goal”, ±0.2 % per decade as “breakthrough”, and ±0.5 % per decade as “target” stability (see GCOS-245; last access: 23 May 2023).
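The GCOS thresholds quoted above can be turned into a small lookup. The sketch below is illustrative only: the function name is hypothetical, and it compares just the magnitude of the central trend estimate against the thresholds, ignoring the trend uncertainty, which may differ from how the requirements are formally evaluated.

```python
# Thresholds in % per decade, ordered from strictest to most lenient.
GCOS_LEVELS = [("goal", 0.1), ("breakthrough", 0.2), ("target", 0.5)]


def gcos_level(trend_percent_per_decade):
    """Return the strictest GCOS stability level met by the trend, or None."""
    magnitude = abs(trend_percent_per_decade)
    for name, limit in GCOS_LEVELS:
        if magnitude <= limit:
            return name
    return None
```

Applied to the magnitudes reported above, 0.01 and 0.09 % per decade fall within the “goal” level and 0.43 % per decade within the “target” level.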

To understand the extent to which the temporal stability differs over land and over ocean, the data were separated and analysed. The results of these separate analyses are shown in Fig. 16 (over ocean) and Fig. 17 (over land). The RSS data set was not investigated again, as it is only available over ocean and a re-examination would be redundant. The PACF analyses revealed that an AR(3) model and an AR(2) model are the most appropriate choices over ocean and over land, respectively, for all stability analyses.

Figure 16. Same as Fig. 15 but only for (a) ERA5 and (b) COMBI and only for data over ocean.


Over ocean, the OMI data set also meets the 1 % per decade stability criterion (as well as various GCOS stability criteria) for both the long and short periods with ERA5 as the reference (+0.01 ± 1.17 % and 0.28 ± 0.67 % per decade, respectively). In contrast, for the comparison with the COMBI data set, no stability criterion is fulfilled for either time period (0.87 ± 1.08 % per decade for the longer and 1.78 ± 1.39 % per decade for the shorter time period). This is surprising, as both reference data sets should consist largely of similar measurement data from mainly microwave satellites. ERA5 is possibly better constrained due to its larger volume of assimilated observation data.

Figure 17. Same as Fig. 15 but only for (a) ERA5 and (b) COMBI and only for data over land.


Over land, the situation is even more complicated: whereas the 1 % stability criterion is still met at +0.62 ± 0.96 % per decade for the period from 2005 to 2020 for ERA5, this is no longer the case for the shorter period at +1.24 ± 1.75 % per decade. In the case of the COMBI data set, the stability criterion is not even close to being fulfilled for the period from 2005 to 2017 (+3.36 ± 2.04 % per decade) nor for the period from 2005 to 2016 (0.79 ± 2.39 % per decade).

Considering the obtained results, it seems that the trends over land and over ocean largely cancel each other out. However, one reason for the high relative deviations over land could be that mainly desert-like regions are used in the analysis due to the aforementioned filter criterion. Thus, rather low TCWV values are used in the normalization, which means that extreme relative deviations can occur even with rather small absolute deviations.

7 Data availability

The MPIC OMI total column water vapour (TCWV) climate data record is available at (Borger et al., 2023).

8 Summary

In this study, we present a 16-year data record of total column water vapour (TCWV) retrieved from OMI observations in the visible blue spectral range by means of differential optical absorption spectroscopy. To derive TCWV from OMI measurements, we applied the TCWV retrieval developed for TROPOMI (Borger et al., 2020) and modified the spectral analysis to account for the degradation of OMI's daily solar irradiance. To this end, annual earthshine reference spectra were calculated from radiance measurements over Antarctica during December (austral summer).

The estimation of the sampling errors in the OMI TCWV data set results in average errors of about 10 % (6 % for the median), and the largest deviations occur mainly in the mid-latitude storm tracks and polar regions. Further investigations show that the large deviations of the sampling error correlate well with the deviations of the clear-sky bias. However, the investigation of a seasonal effect of the clear-sky bias did not show any seasonal dependence. Considering the dominant role of the clear-sky bias in the sampling error, we conclude that the spatiotemporal sampling errors are rather negligible.

Within an intercomparison study, the OMI TCWV data set proves to be in good agreement with the reference data sets of RSS SSM/I, ERA5, and the ESA WV_cci CDR-2 COMBI, in particular over ocean surface. However, over land surface, the OMI data set systematically overestimates high TCWV values compared with ERA5 and COMBI by more than 24 %, especially in the tropical regions affected by frequent cloud cover. Similar results are found from intercomparisons with in situ radiosonde measurements of the IGRA2 data set. The reasons for these overestimations are manifold, but they are likely due to an overestimation of the OMI TCWV retrieval owing to uncertainties in the retrieval input data (surface albedo, cloud information) and an underestimation of the reference data caused by missing or uncertain observations. Nevertheless, the validation shows that good agreement with the reference data can be obtained for TCWV < 25 kg m−2 and also for the case when regions of large uncertainty are filtered. Considering the temporal stability analysis, no significant deviation trends could be obtained.

For the cases of ERA5 and RSS SSM/I, the temporal stabilities of less than ±0.1 % per decade meet the “goal” requirements of the latest GCOS report; furthermore, for the case of COMBI, the “target” requirement is still met. This demonstrates that the OMI TCWV data set is well suited for climate studies.

Overall, the OMI TCWV data set provides a promising basis for investigations of climate change: on the one hand, it covers a long time series (more than 16 years and with measurements still in operation); on the other hand, these measurements are based on a single instrument, so that no bias corrections between different sensors need to be taken into account (e.g. in trend analysis studies). Although OMI is affected by degradation effects, we were able to successfully suppress these effects by using earthshine reference spectra. Furthermore, the data set is based on a retrieval in the visible blue spectral range, where a similar sensitivity for the near-surface layers over ocean and land is given and, thus, a consistent global data set can be obtained from measurements of only one sensor.

In the future, we plan to complement the data set with TCWV measurements from TROPOMI to ensure the continuation of the data set after the end of the OMI mission. As the TCWV retrieval can be easily applied to other UV–Vis satellite instruments, additional data sets from other instruments from past and present missions, such as GOME-1/2 and SCIAMACHY, and future instruments, such as Sentinel-5 on MetOp-SG, can be created and eventually combined with the OMI TCWV data set, thereby taking the different instrumental properties (e.g. observation time) into account. This would allow the construction of a data record that extends from 1995 to today. Similarly, a combination of data from low-earth-orbit satellites and geostationary satellite instruments, such as GEMS, TEMPO, or Sentinel-4, could be a promising option to fill temporal gaps in daily observations as well as to investigate (semi-)diurnal cycles of the water vapour distribution.

Appendix A: Irradiance-based vs. earthshine SCD

To reduce the across-track biases of the retrieved H2O SCDs based on a solar reference spectrum, a destriping algorithm can be performed during post-processing. For instance, one way to destripe the swath of an OMI orbit is to

  1. calculate the median SCD for each OMI row along-track,

  2. calculate the across-track median SCD from the along-track median SCDs,

  3. calculate the deviation of the along-track median SCDs from this across-track median SCD,

  4. subtract the deviation from the SCDs of the respective OMI row.
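The four steps above can be sketched as follows, assuming the SCDs of one orbit are stored as a list of along-track scan lines, each holding one value per OMI row. The function name and data layout are hypothetical, and missing values are not handled.

```python
from statistics import median


def destripe(scd):
    """Remove across-track stripes from one orbit.

    scd[i][r] is the slant column density of scan line i and row r.
    """
    n_rows = len(scd[0])
    # Step 1: along-track median for each row.
    row_medians = [median(line[r] for line in scd) for r in range(n_rows)]
    # Step 2: across-track median of the row medians.
    swath_median = median(row_medians)
    # Step 3: deviation of each row median from the across-track median.
    offsets = [m - swath_median for m in row_medians]
    # Step 4: subtract each row's deviation from all of its SCDs.
    return [[value - offsets[r] for r, value in enumerate(line)] for line in scd]
```

Because only a constant offset per row is removed, real along-track variability within each row is preserved.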

For the case of an earthshine reference, this is already implicitly accounted for during the spectral analysis; however, one still has to consider that the earthshine reference spectrum is not perfectly pristine with respect to the trace gas of interest. For example, in our case, although the water vapour concentrations in Antarctica are very low, the earthshine reference might still be contaminated because of the long light path at such high solar zenith angles.

Figure A1 illustrates the time series of the global monthly mean H2O SCDs derived from the annual mean solar irradiance (and destriped following the aforementioned destriping process) and the earthshine reference for SZA < 80°. Until 2009, the offset between both SCDs remains constant at values of around 0.2 × 10²³ molec. cm−2. Between 2009 and 2015, the irradiance-based SCDs first decrease and then increase distinctively compared with the earthshine-based SCDs, and a strong increase in the irradiance-based SCDs can be observed from 2015 onwards. In contrast, the earthshine SCDs show no jumps or steps and remain at the same magnitude after 2015 and over the complete time range in general.

To get an overview of how the SCD difference (i.e. solar-irradiance-based minus earthshine-based SCD) behaves with time over the complete OMI swath, Fig. A2 depicts the monthly mean SCD difference for each OMI row. Between 2005 and 2009, the SCD differences remain quite constant for each row; however, after 2009, artefacts arise first at rows 55–60 and then start to expand to other rows and become even stronger. This clearly illustrates that an OMI TCWV product based on a solar irradiance fit cannot be used for trend analyses.

Figure A1. Globally averaged monthly mean of the destriped H2O SCDs derived from annual mean solar irradiance and H2O SCDs derived using the annual earthshine reference from 2005 to 2020.


Figure A2. Global mean monthly averaged difference between annual mean irradiance and earthshine H2O SCD for each OMI row separately. Only snow- and ice-free observations with a solar zenith angle < 80° are included. Rows affected by the row anomaly (coloured in grey) are excluded for the complete time series.


Appendix B: Intercomparisons considering masks and flags

The intercomparison in Sect. 4 also considers regions for which only a small number of measurements are available, for example due to frequent cloud cover or seasonality of the solar zenith angle. On the one hand, the small sample size leads to a higher statistical uncertainty in the monthly mean; on the other hand, it also leads to a non-continuous time series when data are missing for a complete month. Moreover, the errors in the individual measurements are also significantly larger in these regions. With the help of the common mask of the COMBI data set (see Fig. B1), these regions can be identified and filtered for additional intercomparisons. The common mask only considers grid cells for which valid TCWV values are available over all time steps in the COMBI data set in the time period from July 2002 to March 2016 (Schröder et al., 2023).
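The idea behind such a common mask can be sketched in a few lines. This is only an illustration of the "valid at every time step" criterion described above, not the COMBI implementation; the data layout (nested lists with `None` for missing values) and the function name are assumptions.

```python
from typing import List, Optional

# One 2-D field (lat x lon) per time step; None marks a missing value.
Grid = List[List[Optional[float]]]


def common_mask(fields: List[Grid]) -> List[List[bool]]:
    """Return True for grid cells that hold a valid value at every time step."""
    n_lat = len(fields[0])
    n_lon = len(fields[0][0])
    return [
        [all(field[i][j] is not None for field in fields) for j in range(n_lon)]
        for i in range(n_lat)
    ]


# Tiny 2x2 example with three monthly fields (made-up TCWV values in kg m-2)
monthly_fields = [
    [[10.0, 20.0], [None, 5.0]],
    [[11.0, None], [None, 6.0]],
    [[12.0, 22.0], [3.0, 7.0]],
]
mask = common_mask(monthly_fields)
```

In this toy example only the two cells that are valid in all three months remain in the mask; a single missing month is enough to exclude a cell.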

In addition, two flags were created from the MPIC OMI TCWV data set itself:

  1. A static flag was created for filtering coastlines. For each grid cell, it was checked whether the four corners and the centre of the grid cell were over ocean or over land. If all five coordinates were over land, the cell was declared “land”; if all were over ocean, it was declared “ocean”; otherwise, it was declared coastline (“coast”). The resulting map is shown in Fig. B2.

  2. A dynamic monthly flag was created based on the number of measurements used to calculate the individual monthly means per grid cell. We have chosen to consider a grid cell as valid if the monthly mean was calculated from more than 100 measurements. This represents a good compromise between global coverage and a statistically robust monthly mean. Figure B3 shows the fractional coverage for the complete time range of the data set using this mask. Compared to the COMBI mask (see Fig. B1), similar regions are filtered. However, a major advantage of the MPIC mask is that it considers temporal changes, so that the seasonal variability in the atmosphere (e.g. cloud cover and solar zenith angle) is also taken into account when flagging.
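The two flags described above can be sketched as follows. This is a hedged illustration rather than the code used for the data set: the land/ocean lookup `is_land` is a hypothetical callable that would be backed by a land cover data set in practice, and the half-width of 0.5° assumes the 1°×1° grid of the product.

```python
from typing import Callable


def coast_flag(lat_c: float, lon_c: float,
               is_land: Callable[[float, float], bool],
               half: float = 0.5) -> str:
    """Classify a grid cell from its centre and its four corners."""
    points = [(lat_c, lon_c)] + [
        (lat_c + dy, lon_c + dx) for dy in (-half, half) for dx in (-half, half)
    ]
    hits = [is_land(lat, lon) for lat, lon in points]
    if all(hits):
        return "land"
    if not any(hits):
        return "ocean"
    return "coast"


def count_flag(n_obs: int, min_obs: int = 100) -> bool:
    """A monthly mean is valid if it is based on more than min_obs measurements."""
    return n_obs > min_obs


# Toy land mask (assumption for the example): everything at lon >= 0 is land.
def toy_is_land(lat: float, lon: float) -> bool:
    return lon >= 0.0


flags = [coast_flag(0.5, lon, toy_is_land) for lon in (0.5, -0.5, -1.5)]
```

With the toy mask, the three example cells are classified as land, coast, and ocean, respectively. The count flag is evaluated per month, which is what makes it dynamic in contrast to the static coastline flag.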

Figure B1. The common mask of the COMBI data set. Yellow grid cells indicate data points that are accounted for within a temporal stability analysis. Invalid grid cells are coloured grey.

Figure B2. Global distribution of the coastline flag of the MPIC OMI TCWV data set.

Figure B3. Global distribution of the fractional coverage considering the count flag of the MPIC OMI TCWV data set.

The results of the intercomparisons considering the COMBI mask and the MPIC OMI flags, respectively, are shown in Figs. B4 and B5 for data over ocean and in Figs. B6 and B7 for data over land. Overall, it can be seen that the mask of COMBI and the flags of the OMI data set lead to similar changes in the comparisons to the reference data. For all comparisons, the coefficients of determination for the ODR remain at a similar level (i.e. R² above 0.90) as for the “non-filtered” comparisons. For the comparisons over ocean, hardly any changes are obtained, as the filter is mainly applied over land surfaces. The differences between the comparisons with the different filters result mainly from the fact that the MPIC flags filter measurements in the higher latitudes (especially during the winter months).

However, there is a remarkable improvement for the comparison over land: although the fit results of the ODR change only slightly, the extreme overestimates at high TCWV values are now filtered out and the distributions are now closer to the 1-to-1 diagonal. Overall, the results for the “filtered” comparison over land also agree very well with the results of the PWLR, for which similar slope regression results were found for TCWV < 25 kg m−2.

Figure B4. Correlation analysis of the OMI TCWV data set and RSS SSM/I, ERA5, and COMBI for data over ocean considering only valid grid cells according to the common mask in Fig. B1.

Figure B5. Same as Fig. B4 but only considering valid grid cells according to the coastline and count flag of the MPIC OMI TCWV data set.

Figure B6. Same as Fig. B4 but for data over land. The solid red line represents the ODR results and the solid black line represents the PWLR results.

Figure B7. Same as Fig. B6 but only considering valid grid cells according to the coastline and count flag of the MPIC OMI TCWV data set.


Appendix C: Representativeness of RA-filtered data in comparison to the full swath

Due to the row-anomaly filter, approximately 50 % of the complete satellite swath of OMI is not considered in the TCWV data set. This raises the question of how much the monthly mean values would differ if the data of the complete swath were available. To investigate this, we follow the same scheme as in Sect. 3 and use the same ERA5 data as a reference. We select the ERA5 data to match the OMI overpass, once applying the row-anomaly filter and once not. However, in both cases, the clear-sky filter based on the OMI cloud information is applied (effective cloud fraction < 20 %).

Figure C1. Global distributions of the mean differences between RA-filtered and full-swath ERA5 based on the OMI cloud information for the time range from January 2005 to December 2020. Panel (a) depicts the absolute differences (i.e. RA-filtered minus full swath) and panel (b) presents relative differences (i.e. (RA-filtered minus full swath)/full swath). Grid cells for which no data are available are coloured grey.

Figure C2. Distributions of the absolute differences (RA-filtered minus full swath; a) and relative differences ((RA-filtered minus full swath)/full swath; b) of the monthly mean differences between RA-filtered and full-swath ERA5 data based on the OMI cloud information. The solid and dashed orange lines indicate the mean and the median of the distributions, respectively.


Compared with the clear-sky bias, the deviations are much weaker, and no particular spatial patterns are discernible in the global distributions except in the deep Pacific tropics and parts of Southeast Asia (see Fig. C1). Furthermore, the histograms for the absolute and relative deviations in Fig. C2 show a normal distribution for both cases with mean values of 0.30 kg m−2 and 2.1 % (median of 0.23 kg m−2 and 1.1 %). Considering the much larger uncertainties of the OMI TCWV retrievals of typically 20 % or more and that the clear-sky bias is almost an order of magnitude larger, the obtained deviations are negligible; thus, the monthly means from the RA-filtered data are a good representation compared to the monthly means from the data for a full swath, even though only half of the satellite data are actually used.
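The absolute and relative deviations summarised in Figs. C1 and C2 follow directly from the two sets of monthly means. A minimal sketch, assuming both are given as equally long lists of values for the same grid cells and months; the function name and the numbers are made up for illustration:

```python
from statistics import mean, median
from typing import List, Sequence, Tuple


def sampling_differences(filtered: Sequence[float],
                         full: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Absolute (kg m-2) and relative (%) differences, RA-filtered minus full swath."""
    abs_diff = [a - b for a, b in zip(filtered, full)]
    rel_diff = [100.0 * (a - b) / b for a, b in zip(filtered, full)]
    return abs_diff, rel_diff


# Made-up monthly mean TCWV values (kg m-2) for three grid cells
ra_filtered = [20.5, 31.0, 10.0]
full_swath = [20.0, 30.0, 10.0]

abs_diff, rel_diff = sampling_differences(ra_filtered, full_swath)
summary = {
    "mean_abs": mean(abs_diff),
    "median_abs": median(abs_diff),
    "mean_rel": mean(rel_diff),
    "median_rel": median(rel_diff),
}
```

Reporting both the mean and the median, as done in Fig. C2, helps to show whether a few large deviations dominate the average.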

Appendix D: Temporal stability analysis with respect to IGRA2

In addition to the global data sets, a stability analysis was also carried out with the IGRA2 radiosonde data. Due to the criterion of temporal coverage, only 62 of the more than 700 IGRA2 stations are left for the analysis. As almost all of these are located in the northern mid-latitudes, the stability analysis is not globally representative, but the comparison can provide further important independent information.

The course of the temporal stability and the results of the analysis are depicted in Fig. D1. To calculate the temporal stability, a PACF analysis was conducted which revealed that an AR(2) model is most appropriate. Following the same procedure as in Sect. 6, the transformed GLS regression yielded a stability of +1.33 ± 1.37 % per decade. Although this does not fulfil any stability criterion, these results are considerably better than the findings for the COMBI TCWV data set over land (see Fig. 17). Furthermore, it is difficult to determine whether the trend may come from the radiosondes themselves, as it is not clear how regularly the radiosondes are calibrated (e.g. according to the GRUAN standard; Dirksen et al., 2014).
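To illustrate what a transformed regression with AR(2) errors involves, the sketch below fits a linear trend, estimates the two autoregressive coefficients from the residuals via the Yule-Walker equations, filters all variables with these coefficients, and refits. It is a simplified stand-in and not the procedure of Sect. 6: the first two observations are dropped instead of being rescaled, no uncertainty is computed, and all names and the synthetic series are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def ols_two(x1: Sequence[float], x2: Sequence[float],
            y: Sequence[float]) -> Tuple[float, float]:
    """Least squares for y = b1*x1 + b2*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def autocorr(e: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of e at the given lag."""
    m = sum(e) / len(e)
    d = [v - m for v in e]
    return sum(d[i] * d[i - lag] for i in range(lag, len(d))) / sum(v * v for v in d)


def ar2_gls_trend(t: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (trend, phi1, phi2) from a regression on AR(2)-filtered variables."""
    ones = [1.0] * len(t)
    a, b = ols_two(ones, t, y)                      # step 1: ordinary least squares
    resid = [yi - a - b * ti for ti, yi in zip(t, y)]
    r1, r2 = autocorr(resid, 1), autocorr(resid, 2)
    phi1 = r1 * (1.0 - r2) / (1.0 - r1 * r1)        # step 2: Yule-Walker, order 2
    phi2 = (r2 - r1 * r1) / (1.0 - r1 * r1)

    def filt(v: Sequence[float]) -> List[float]:    # step 3: AR(2) filter
        return [v[i] - phi1 * v[i - 1] - phi2 * v[i - 2] for i in range(2, len(v))]

    _, trend = ols_two(filt(ones), filt(t), filt(y))  # step 4: refit
    return trend, phi1, phi2


# Synthetic 16-year monthly series: trend of 0.1 per year plus a deterministic
# oscillation that stands in for serially correlated deviations.
months = range(192)
t = [i / 12.0 for i in months]
y = [2.0 + 0.1 * ti + 0.5 * math.sin(0.7 * i) for i, ti in zip(months, t)]
slope, phi1, phi2 = ar2_gls_trend(t, y)
```

For the synthetic series, the recovered slope lies close to the prescribed 0.1 per year. In an actual analysis, the standard error of the trend from the transformed regression is what enters the stability assessment.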

Figure D1. Stability analysis of the mean relative deviations of the OMI TCWV data set with respect to IGRA2 radiosonde data for the time range from January 2005 to December 2020. The red line represents the global mean relative deviation, the blue line shows the results of the transformed GLS regression, and the dotted black lines present the respective 25th and 75th percentiles. The bias and RMS values provided in the legends correspond to the time series of the global mean deviation for the respective time range.


Appendix E: Intercomparison with COMBI over the full time period

Figure E1. Same as Fig. 11 but with COMBI data for data over ocean (a, b) and for data over land (c, d) for the complete time range.

Figure E2. Same as Fig. 12 but with COMBI data over the complete time range.

Author contributions

CB performed all calculations for this work and prepared the manuscript; SB and TW contributed to manuscript preparation; and TW supervised the study.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Special issue statement

This article is part of the special issue “Analysis of atmospheric water vapour observations and their uncertainties for climate applications (ACP/AMT/ESSD/HESS inter-journal SI)”. It is not associated with a conference.


Acknowledgements

The combined microwave and near-infrared imager based product COMBI was initiated, funded, and provided by the Water Vapour (WV) project of the ESA Climate Change Initiative, with contributions from Brockmann Consult, Spectral Earth, Deutscher Wetterdienst, and the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) Satellite Application Facility on Climate Monitoring (CM SAF). The combined MW and NIR product will be owned by EUMETSAT CM SAF. In particular, we would like to thank Marc Schröder and the ESA CCI WV team for providing the CDR TCWV and common mask data.

Review statement

This paper was edited by Christof Lorenz and reviewed by two anonymous referees.


References

Anand, J. S., Monks, P. S., and Leigh, R. J.: An improved retrieval of tropospheric NO2 from space over polluted regions using an Earth radiance reference, Atmos. Meas. Tech., 8, 1519–1535, 2015.

Beirle, S., Lampel, J., Wang, Y., Mies, K., Dörner, S., Grossi, M., Loyola, D., Dehn, A., Danielczok, A., Schröder, M., and Wagner, T.: The ESA GOME-Evolution “Climate” water vapor product: a homogenized time series of H2O columns from GOME, SCIAMACHY, and GOME-2, Earth Syst. Sci. Data, 10, 449–468, 2018.

Bennartz, R. and Fischer, J.: Retrieval of columnar water vapour over land from backscattered solar radiation using the Medium Resolution Imaging Spectrometer, Remote Sens. Environ., 78, 274–283, 2001.

Boersma, K. F., Eskes, H. J., Richter, A., De Smedt, I., Lorente, A., Beirle, S., van Geffen, J. H. G. M., Zara, M., Peters, E., Van Roozendael, M., Wagner, T., Maasakkers, J. D., van der A, R. J., Nightingale, J., De Rudder, A., Irie, H., Pinardi, G., Lambert, J.-C., and Compernolle, S. C.: Improving algorithms and uncertainty estimates for satellite NO2 retrievals: results from the quality assurance for the essential climate variables (QA4ECV) project, Atmos. Meas. Tech., 11, 6651–6678, 2018.

Borger, C., Beirle, S., Dörner, S., Sihler, H., and Wagner, T.: Total column water vapour retrieval from S-5P/TROPOMI in the visible blue spectral range, Atmos. Meas. Tech., 13, 2751–2783, 2020.

Borger, C., Beirle, S., and Wagner, T.: Analysis of global trends of total column water vapour from multiple years of OMI observations, Atmos. Chem. Phys., 22, 10603–10621, 2022.

Borger, C., Beirle, S., and Wagner, T.: MPIC OMI Total Column Water Vapour (TCWV) Climate Data Record, Zenodo [data set], 2023.

Box, G. E. P., Jenkins, G. M., Reinsel, G. C., and Ljung, G. M.: Time Series Analysis: Forecasting and Control, Wiley Series in Probability and Statistics, John Wiley & Sons, Incorporated, 5th Edn., ISBN 9781118674925, 2015.

Cantrell, C. A.: Technical Note: Review of methods for linear least-squares fitting of data and application to atmospheric chemistry problems, Atmos. Chem. Phys., 8, 5477–5487, 2008.

Chan, K. L., Valks, P., Slijkhuis, S., Köhler, C., and Loyola, D.: Total column water vapor retrieval for Global Ozone Monitoring Experience-2 (GOME-2) visible blue observations, Atmos. Meas. Tech., 13, 4169–4193, 2020.

Danne, O., Falk, U., Preusker, R., Brockmann, C., Fischer, J., Hegglin, M., and Schröder, M.: ESA Water Vapour Climate Change Initiative (Water_Vapour_cci): Total Column Water Vapour monthly gridded data over land at 0.5 degree resolution, version 3.2, NERC EDS Centre for Environmental Data Analysis [data set], 2022.

Dirksen, R. J., Sommer, M., Immler, F. J., Hurst, D. F., Kivi, R., and Vömel, H.: Reference quality upper-air measurements: GRUAN data processing for the Vaisala RS92 radiosonde, Atmos. Meas. Tech., 7, 4463–4490, 2014.

Dupuy, E., Morino, I., Deutscher, N. M., Yoshida, Y., Uchino, O., Connor, B. J., De Mazière, M., Griffith, D. W. T., Hase, F., Heikkinen, P., Hillyard, P. W., Iraci, L. T., Kawakami, S., Kivi, R., Matsunaga, T., Notholt, J., Petri, C., Podolske, J. R., Pollard, D. F., Rettinger, M., Roehl, C. M., Sherlock, V., Sussmann, R., Toon, G. C., Velazco, V. A., Warneke, T., Wennberg, P. O., Wunch, D., and Yokota, T.: Comparison of XH2O Retrieved from GOSAT Short-Wavelength Infrared Spectra with Observations from the TCCON Network, Remote Sens., 8, 982, 2016.

Durre, I., Vose, R. S., and Wuertz, D. B.: Overview of the Integrated Global Radiosonde Archive, J. Climate, 19, 53–68, 2006.

Durre, I., Xungang, Y., Vose, R. S., Applequist, S., and Arnfield, J.: Integrated Global Radiosonde Archive (IGRA), Version 2, NOAA National Centers for Environmental Information [data set], 2016.

Durre, I., Yin, X., Vose, R. S., Applequist, S., and Arnfield, J.: Enhancing the Data Coverage in the Integrated Global Radiosonde Archive, J. Atmos. Ocean. Tech., 35, 1753–1770, 2018.

Fennig, K., Schröder, M., Andersson, A., and Hollmann, R.: A Fundamental Climate Data Record of SMMR, SSM/I, and SSMIS brightness temperatures, Earth Syst. Sci. Data, 12, 647–681, 2020.

Ferreira, A. P., Nieto, R., and Gimeno, L.: Completeness of radiosonde humidity observations based on the Integrated Global Radiosonde Archive, Earth Syst. Sci. Data, 11, 603–627, 2019.

Gaffen, D. J. and Elliott, W. P.: Column Water Vapor Content in Clear and Cloudy Skies, J. Climate, 6, 2278–2287, 1993.

Gao, B.-C. and Kaufman, Y. J.: Water vapor retrievals using Moderate Resolution Imaging Spectroradiometer (MODIS) near-infrared channels, J. Geophys. Res.-Atmos., 108, 4389, 2003.

Gleisner, H., Lauritsen, K. B., Nielsen, J. K., and Syndergaard, S.: Evaluation of the 15-year ROM SAF monthly mean GPS radio occultation climate data record, Atmos. Meas. Tech., 13, 3081–3098, 2020.

Greene, W. H.: Econometric Analysis, Global Edition, Pearson Higher Education & Professional Group, Harlow, England, 8th Edn., Global Edition, ISBN 9781292231136, 2019.

Grossi, M., Valks, P., Loyola, D., Aberle, B., Slijkhuis, S., Wagner, T., Beirle, S., and Lang, R.: Total column water vapour measurements from GOME-2 MetOp-A and MetOp-B, Atmos. Meas. Tech., 8, 1111–1133, 2015.

Held, I. M. and Soden, B. J.: Water Vapor Feedback and Global Warming, Annu. Rev. Energ. Env., 25, 441–475, 2000.

Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., and Thépaut, J.-N.: ERA5 monthly averaged data on single levels from 1979 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [data set], 2019.

Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.-N.: The ERA5 global reanalysis, Q. J. Roy. Meteor. Soc., 146, 1999–2049, 2020.

Kiehl, J. T. and Trenberth, K. E.: Earth's Annual Global Mean Energy Budget, B. Am. Meteorol. Soc., 78, 197–208, 1997.

Kleipool, Q. L., Dobber, M. R., de Haan, J. F., and Levelt, P. F.: Earth surface reflectance climatology from 3 years of OMI data, J. Geophys. Res.-Atmos., 113, D18308, 2008.

Koelemeijer, R. B. A., de Haan, J. F., and Stammes, P.: A database of spectral surface reflectivity in the range 335–772 nm derived from 5.5 years of GOME observations, J. Geophys. Res.-Atmos., 108, 4070, 2003.

Kursinski, E. R., Hajj, G. A., Schofield, J. T., Linfield, R. P., and Hardy, K. R.: Observing Earth's atmosphere with radio occultation measurements using the Global Positioning System, J. Geophys. Res.-Atmos., 102, 23429–23465, 1997.

Lamsal, L. N., Krotkov, N. A., Vasilkov, A., Marchenko, S., Qin, W., Yang, E.-S., Fasnacht, Z., Joiner, J., Choi, S., Haffner, D., Swartz, W. H., Fisher, B., and Bucsela, E.: Ozone Monitoring Instrument (OMI) Aura nitrogen dioxide standard product version 4.0 with improved surface and cloud treatments, Atmos. Meas. Tech., 14, 455–479, 2021.

Lang, R., Williams, J. E., van der Zande, W. J., and Maurellis, A. N.: Application of the Spectral Structure Parameterization technique: retrieval of total water vapor columns from GOME, Atmos. Chem. Phys., 3, 145–160, 2003.

Levelt, P. F., van den Oord, G. H., Dobber, M. R., Malkki, A., Visser, H., de Vries, J., Stammes, P., Lundell, J. O., and Saari, H.: The ozone monitoring instrument, IEEE T. Geosci. Remote, 44, 1093–1101, 2006.

Levelt, P. F., Joiner, J., Tamminen, J., Veefkind, J. P., Bhartia, P. K., Stein Zweers, D. C., Duncan, B. N., Streets, D. G., Eskes, H., van der A, R., McLinden, C., Fioletov, V., Carn, S., de Laat, J., DeLand, M., Marchenko, S., McPeters, R., Ziemke, J., Fu, D., Liu, X., Pickering, K., Apituley, A., González Abad, G., Arola, A., Boersma, F., Chan Miller, C., Chance, K., de Graaf, M., Hakkarainen, J., Hassinen, S., Ialongo, I., Kleipool, Q., Krotkov, N., Li, C., Lamsal, L., Newman, P., Nowlan, C., Suleiman, R., Tilstra, L. G., Torres, O., Wang, H., and Wargan, K.: The Ozone Monitoring Instrument: overview of 14 years in space, Atmos. Chem. Phys., 18, 5699–5745, 2018.

Li, J., Wang, P., Han, H., Li, J., and Zheng, J.: On the assimilation of satellite sounder data in cloudy skies in numerical weather prediction models, J. Meteorol. Res., 30, 169–182, 2016.

Mears, C. A., Wang, J., Smith, D., and Wentz, F. J.: Intercomparison of total precipitable water measurements made by satellite-borne microwave radiometers and ground-based GPS instruments, J. Geophys. Res.-Atmos., 120, 2492–2504, 2015.

Mieruch, S., Noël, S., Bovensmann, H., and Burrows, J. P.: Analysis of global water vapour trends from satellite measurements in the visible spectral range, Atmos. Chem. Phys., 8, 491–504, 2008.

Munro, R., Lang, R., Klaes, D., Poli, G., Retscher, C., Lindstrot, R., Huckle, R., Lacan, A., Grzegorski, M., Holdak, A., Kokhanovsky, A., Livschitz, J., and Eisinger, M.: The GOME-2 instrument on the Metop series of satellites: instrument design, calibration, and level 1 data processing – an overview, Atmos. Meas. Tech., 9, 1279–1301, 2016.

Noël, S., Buchwitz, M., Bovensmann, H., Hoogen, R., and Burrows, J. P.: Atmospheric water vapor amounts retrieved from GOME satellite data, Geophys. Res. Lett., 26, 1841–1844, 1999.

Platt, U. and Stutz, J.: Differential Optical Absorption Spectroscopy: Principles and Applications, Physics of Earth and Space Environments, Springer Berlin Heidelberg, 2008.

Prais, S. J. and Winsten, C. B.: Trend Estimators and Serial Correlation, Cowles Commission Discussion Paper, 383, (last access: 10 July 2023), 1954.

Randall, D. A., Wood, R. A., Bony, S., Colman, R., Fichefet, T., Fyfe, J., Kattsov, V., Pitman, A., Shukla, J., Srinivasan, J., Stouffer, R. J., Sumi, A., and Taylor, K. E.: Climate models and their evaluation, in: Climate change 2007: The physical science basis. Contribution of Working Group I to the Fourth Assessment Report of the IPCC (FAR), edited by: Solomon, S., Qin, D., Manning, M., Chen, Z., Marquis, M., Averyt, K., Tignor, M., and Miller, H. L., Cambridge University Press, 589–662, ISBN 978-0-521-88009-1, 2007.

Rehfeld, K., Marwan, N., Heitzig, J., and Kurths, J.: Comparison of correlation analysis techniques for irregularly sampled time series, Nonlin. Processes Geophys., 18, 389–404, 2011.

Rosenkranz, P. W.: Retrieval of temperature and moisture profiles from AMSU-A and AMSU-B measurements, IEEE T. Geosci. Remote, 39, 2429–2435, 2001.

Schenkeveld, V. M. E., Jaross, G., Marchenko, S., Haffner, D., Kleipool, Q. L., Rozemeijer, N. C., Veefkind, J. P., and Levelt, P. F.: In-flight performance of the Ozone Monitoring Instrument, Atmos. Meas. Tech., 10, 1957–1986, 2017.

Schlüssel, P., Hultberg, T. H., Phillips, P. L., August, T., and Calbet, X.: The operational IASI Level 2 processor, Adv. Space Res., 36, 982–988, 2005.

Schneider, A., Borsdorff, T., aan de Brugh, J., Aemisegger, F., Feist, D. G., Kivi, R., Hase, F., Schneider, M., and Landgraf, J.: First data set of H2O/HDO columns from the Tropospheric Monitoring Instrument (TROPOMI), Atmos. Meas. Tech., 13, 85–100, 2020.

Schneider, M. and Hase, F.: Optimal estimation of tropospheric H2O and δD with IASI/METOP, Atmos. Chem. Phys., 11, 11207–11220, 2011.

Schrijver, H., Gloudemans, A. M. S., Frankenberg, C., and Aben, I.: Water vapour total columns from SCIAMACHY spectra in the 2.36 μm window, Atmos. Meas. Tech., 2, 561–571, 2009.

Schröder, M., Danne, O., Falk, U., Niedorf, A., Preusker, R., Trent, T., Brockmann, C., Fischer, J., Hegglin, M., Hollmann, R., and Pinnock, S.: A combined high resolution global TCWV product from microwave and near infrared imagers – COMBI, Satellite Application Facility on Climate Monitoring (CM SAF) [data set], 2023.

Sohn, B.-J. and Bennartz, R.: Contribution of water vapor to observational estimates of longwave cloud radiative forcing, J. Geophys. Res.-Atmos., 113, D20107, 2008.

Stevens, B. and Bony, S.: What Are Climate Models Missing?, Science, 340, 1053–1054, 2013.

Sulla-Menashe, D., Gray, J. M., Abercrombie, S. P., and Friedl, M. A.: Hierarchical mapping of annual global land cover 2001 to present: The MODIS Collection 6 Land Cover product, Remote Sens. Environ., 222, 183–194, 2019.

Susskind, J., Barnet, C., and Blaisdell, J.: Retrieval of atmospheric and surface parameters from AIRS/AMSU/HSB data in the presence of clouds, IEEE T. Geosci. Remote, 41, 390–409, 2003.

Tilstra, L. G., Tuinder, O. N. E., Wang, P., and Stammes, P.: Surface reflectivity climatologies from UV to NIR determined from Earth observations by GOME-2 and SCIAMACHY, J. Geophys. Res.-Atmos., 122, 4084–4111, 2017.

Trenberth, K. E., Fasullo, J. T., and Kiehl, J.: Earth's Global Energy Budget, B. Am. Meteorol. Soc., 90, 311–324, 2009.

Veefkind, J., Aben, I., McMullan, K., Förster, H., de Vries, J., Otter, G., Claas, J., Eskes, H., de Haan, J., Kleipool, Q., van Weele, M., Hasekamp, O., Hoogeveen, R., Landgraf, J., Snel, R., Tol, P., Ingmann, P., Voors, R., Kruizinga, B., Vink, R., Visser, H., and Levelt, P.: TROPOMI on the ESA Sentinel-5 Precursor: A GMES mission for global observations of the atmospheric composition for climate, air quality and ozone layer applications, Remote Sens. Environ., 120, 70–83, 2012.

Wagner, T., Heland, J., Zöger, M., and Platt, U.: A fast H2O total column density product from GOME – Validation with in-situ aircraft measurements, Atmos. Chem. Phys., 3, 651–663, 2003.

Wagner, T., Beirle, S., Sihler, H., and Mies, K.: A feasibility study for the retrieval of the total column precipitable water vapour from satellite observations in the blue spectral range, Atmos. Meas. Tech., 6, 2593–2605, 2013.

Wang, H., Souri, A. H., González Abad, G., Liu, X., and Chance, K.: Ozone Monitoring Instrument (OMI) Total Column Water Vapor version 4 validation and applications, Atmos. Meas. Tech., 12, 5183–5199, 2019.

Weatherhead, E. C., Reinsel, G. C., Tiao, G. C., Meng, X.-L., Choi, D., Cheang, W.-K., Keller, T., DeLuisi, J., Wuebbles, D. J., Kerr, J. B., Miller, A. J., Oltmans, S. J., and Frederick, J. E.: Factors affecting the detection of trends: Statistical considerations and applications to environmental data, J. Geophys. Res.-Atmos., 103, 17149–17161, 1998.

Weaver, C. P. and Ramanathan, V.: Deductions from a simple climate model: Factors governing surface temperature and atmospheric thermal structure, J. Geophys. Res.-Atmos., 100, 11585–11591, 1995.

Wentz, F. J.: A well-calibrated ocean algorithm for special sensor microwave/imager, J. Geophys. Res.-Oceans, 102, 8703–8718, 1997.

Wentz, F. J.: A 17-Yr Climate Record of Environmental Parameters Derived from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager, J. Climate, 28, 6882–6902, 2015.

Xue, Y., Li, J., Menzel, W. P., Borbas, E., Ho, S.-P., Li, Z., and Li, J.: Characteristics of Satellite Sampling Errors in Total Precipitable Water from SSMIS, HIRS, and COSMIC Observations, J. Geophys. Res.-Atmos., 124, 6966–6981, 2019.

Short summary
This study presents a long-term data set of monthly mean total column water vapour (TCWV) based on measurements of the Ozone Monitoring Instrument (OMI) covering the time range from January 2005 to December 2020. We describe how the TCWV values are retrieved from UV–Vis satellite spectra and demonstrate that the OMI TCWV data set is in good agreement with various reference data sets. Moreover, we show that it fulfils typical stability requirements for climate data records.