A 16-year global climate data record of total column water vapour generated from OMI observations in the visible blue spectral range

Borger, Christian; Beirle, Steffen; Wagner, Thomas

doi:https://doi.org/10.5194/essd-15-3023-2023

Articles | Volume 15, issue 7

Special issue:

Analysis of atmospheric water vapour observations and their...

Data description paper

14 Jul 2023

Abstract

We present a long-term data set of $1^{\circ} \times 1^{\circ}$ monthly mean total column water vapour (TCWV) based on global measurements of the Ozone Monitoring Instrument (OMI) covering the time range from January 2005 to December 2020.

In comparison to the retrieval algorithm of Borger et al. (2020), several modifications and filters have been applied accounting for instrumental issues (such as OMI's “row anomaly”) or the inferior quality of solar reference spectra. For instance, to overcome issues related to low-quality reference spectra, the daily solar irradiance spectrum is replaced by an annually varying mean earthshine radiance obtained in December over Antarctica. For the TCWV data set, we only consider measurements with an effective cloud fraction less than 20 %, an air mass factor (AMF) greater than 0.1, a snow- and ice-free ground pixel, and an OMI row that is not affected by the row anomaly over the complete time range of the data set. The individual TCWV measurements are then gridded to a regular $1^{\circ} \times 1^{\circ}$ lattice, from which the monthly means are calculated.

The investigation of sampling errors in the OMI TCWV data set shows that these are dominated by the clear-sky bias and cause on average deviations of around −10 %, which is consistent with the findings of previous studies. However, the spatiotemporal sampling errors and those due to the row-anomaly filter are negligible.

In a comprehensive intercomparison study, we demonstrate that the OMI TCWV data set is in good agreement with the global reference data sets of ERA5 (fifth-generation ECMWF atmospheric reanalysis), RSS SSM/I (Remote Sensing Systems Special Sensor Microwave Imager), and CM SAF/CCI TCWV-global (COMBI): over ocean the orthogonal distance regressions indicate slopes close to unity with very small offsets and high coefficients of determination of around 0.96. However, over land, distinctive positive deviations of more than +10 kg m⁻² are obtained for high TCWV values. These overestimations are mainly due to extreme overestimations of high TCWV values in the tropics, likely caused by uncertainties in the retrieval input data (surface albedo, cloud information) due to frequent cloud contamination in these regions. Similar results are found from intercomparisons with in situ radiosonde measurements from version 2 of the Integrated Global Radiosonde Archive (IGRA2) data set. Nevertheless, for TCWV values smaller than 25 kg m⁻², the OMI TCWV data set shows very good agreement with the global reference data sets. Furthermore, a temporal stability analysis proves that the OMI TCWV data set is consistent with the temporal changes in the reference data sets and shows no significant deviation trends.

As the TCWV retrieval can be easily applied to further satellite missions, additional TCWV data sets can be created from past missions, such as the Global Ozone Monitoring Experiment-1 (GOME-1) or the SCanning Imaging Absorption spectroMeter for Atmospheric CartograpHY (SCIAMACHY); under consideration of systematic differences (e.g. due to different observation times), these data sets can be combined with the OMI TCWV data set in order to create a data record that would cover a time span from 1995 to the present. Moreover, the TCWV retrieval will also work for all missions dedicated to NO₂ in the future, such as Sentinel-5 on MetOp-SG.

The Max Planck Institute for Chemistry (MPIC) OMI total column water vapour (TCWV) climate data record (CDR) is available at https://doi.org/10.5281/zenodo.7973889 (Borger et al., 2023).

Received: 22 Sep 2021 – Discussion started: 16 Dec 2021 – Revised: 26 May 2023 – Accepted: 13 Jun 2023 – Published: 14 Jul 2023

1 Introduction

Water vapour is the most important natural greenhouse gas in Earth's atmosphere: it alters the Earth's energy balance by playing a dominant role in the atmospheric thermal opacity and has a major amplifying influence on several factors of anthropogenic climate change through various feedback mechanisms (Kiehl and Trenberth, 1997; Randall et al., 2007; Trenberth et al., 2009). Although water vapour is of great importance for processes at a global and climate scale, the complex interactions between the components of the hydrological cycle (including water vapour) and the atmosphere remain one of the major challenges for climate modelling and for a better understanding of the Earth's climate system in general (Stevens and Bony, 2013). Moreover, the amount and distribution of water vapour are highly variable; thus, for global observations, these must also be measured with high spatiotemporal resolution. Considering that changes in water vapour are closely linked to changes in temperature via the Clausius–Clapeyron equation (i.e. for typical atmospheric conditions, a temperature increase of 1 K yields an increase in the water vapour concentration of approximately 6 %–7 %; Held and Soden, 2000), it is essential to accurately monitor the variability and change in the amount and distribution of water vapour on a global scale.
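As a rough plausibility check of the 6 %–7 % figure, one can evaluate the standard approximate form of the Clausius–Clapeyron equation with a constant latent heat of vaporization. The constants below are common textbook values and the temperature of 288 K is an assumed typical surface value; none of them are taken from this paper.

$$\frac{1}{e_{s}} \frac{\mathrm{d} e_{s}}{\mathrm{d} T} = \frac{L_{v}}{R_{v} T^{2}} \approx \frac{2.5 \times 10^{6}\ \mathrm{J\,kg^{-1}}}{461.5\ \mathrm{J\,kg^{-1}\,K^{-1}} \times (288\ \mathrm{K})^{2}} \approx 0.065\ \mathrm{K^{-1}}$$

Here, $e_{s}$ is the saturation vapour pressure, $L_{v}$ is the latent heat of vaporization, and $R_{v}$ is the specific gas constant of water vapour. The result corresponds to roughly 6.5 % per kelvin, which lies within the quoted range.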

To observe the water vapour distribution on a global scale, satellite measurements provide invaluable information. Due to its spectroscopic absorption properties, water vapour can be retrieved from satellite spectra in various spectral ranges, from the radio (e.g. Kursinski et al., 1997), microwave (e.g. Rosenkranz, 2001), thermal infrared (e.g. Susskind et al., 2003; Schlüssel et al., 2005; Schneider and Hase, 2011), and shortwave and near-infrared (e.g. Bennartz and Fischer, 2001; Gao and Kaufman, 2003; Schrijver et al., 2009; Dupuy et al., 2016; Schneider et al., 2020) to the visible spectral region (e.g. Noël et al., 1999; Lang et al., 2003; Wagner et al., 2003; Grossi et al., 2015).

Within the past decade, substantial progress has been made to retrieve total column water vapour (TCWV) within the visible blue spectral range (e.g. Wagner et al., 2013; Wang et al., 2019; Borger et al., 2020; Chan et al., 2020), enabling the use of measurements from satellite instruments like the TROPOspheric Monitoring Instrument (TROPOMI; Veefkind et al., 2012) and even the Global Ozone Monitoring Experiment-2 (GOME-2; Munro et al., 2016) for which so far only retrievals in the visible red and near-infrared spectral range have been available. Compared with these spectral ranges, TCWV retrievals in the visible “blue” have several advantages, such as similar sensitivity for the near-surface layers over land and ocean due to a more homogeneous surface albedo distribution than at longer wavelengths (Koelemeijer et al., 2003; Wagner et al., 2013; Tilstra et al., 2017). Moreover, any satellite mission dedicated to NO₂ monitoring covers this spectral range.

For investigations of climate change or global warming, the Ozone Monitoring Instrument (OMI; Levelt et al., 2006, 2018) onboard NASA's Aura satellite is particularly interesting; launched in July 2004, OMI offers an almost continuous measurement data record of more than 16 years up until today. In this study, we make use of this long-term data record and retrieve total column water vapour (TCWV) from OMI's measurements in the visible blue spectral range in order to generate a climate data set.

The paper is structured as follows: in Sect. 2, we describe the data set generation and briefly explain the retrieval methodology and the applied modifications in comparison to the TCWV retrieval from Borger et al. (2020); in Sect. 3, we investigate potential sampling errors and how the limitation to clear-sky satellite observations influences the representativeness of the TCWV values of the data set; in Sect. 4 and Sect. 5, we characterize the data set via an intercomparison with the various global reference TCWV data sets and with IGRA2 radiosonde observations, respectively; in Sect. 6, we analyse the temporal stability of the OMI TCWV data set; and, finally, we briefly summarize our results in Sect. 8 and draw conclusions.

2 The Max Planck Institute for Chemistry (MPIC) OMI TCWV data set

2.1 Ozone Monitoring Instrument (OMI)

OMI (Levelt et al., 2006, 2018), onboard NASA's Aura satellite, is a nadir-looking UV–Vis push-broom spectrometer that measures the Earth's radiance spectrum from 270 to 500 nm with a spectral resolution of approximately 0.5 nm following a Sun-synchronous orbit with an Equator crossing time of around 13:30 LT. The instrument employs a 2D charge-coupled device (CCD) consisting of 60 across-track rows that cover a total swath width of approximately 2600 km with a spatial resolution of 24 km×13 km at nadir increasing to 24 km×160 km towards the edges of the swath. Launched in July 2004, OMI provides an almost continuous measurement record until today with more than 100 000 orbits.

However, since July 2007 OMI has suffered from the so-called “row anomaly” (RA), a dynamic artefact causing abnormally low radiance readings in the across-track rows, i.e. several rows of the CCD detector receive less light from the Earth, whereas some other rows appear to receive sunlight scattered off a peeling piece of spacecraft insulation. One plausible explanation for these effects is a partial obscuration of the entrance port by insulating layer material that may have come loose on the outside of the instrument (Schenkeveld et al., 2017; Boersma et al., 2018). Thus, in this study, the affected measurements are excluded for the entire period of the data set.

2.2 Methodology and modifications of the spectral analysis

To retrieve total column water vapour (TCWV) from UV–Vis spectra from OMI, we apply the TCWV retrieval of Borger et al. (2020) developed for TROPOMI onboard Sentinel-5P. The retrieval is based on the principles of differential optical absorption spectroscopy (DOAS; Platt and Stutz, 2008) with a fit window between 430 and 450 nm, and it consists of the common two-step DOAS approach. In the first step, the absorption along the light path is calculated as follows:

$$\ln\left(\frac{I}{I_{0}}\right) \approx - \sum_{i} \sigma_{i}(\lambda) \cdot \mathrm{SCD}_{i} + \Psi + \Phi \qquad (1)$$

Here, I₀ and I represent the solar irradiance and the radiance backscattered from Earth, respectively; i denotes the index of a trace gas of interest; σ_i(λ) is the molecular absorption cross-section of trace gas i; $\mathrm{SCD}_{i} = \int_{s} c_{i} \, \mathrm{d}s$ is the concentration c_i of that trace gas integrated along the light path s (the so-called “slant column density”); Ψ summarizes the terms accounting for the Ring effect and additional pseudo-absorbers; and Φ is a closure polynomial accounting for Mie and Rayleigh scattering as well as parts of the low-frequency contributions of the trace gas cross-sections.
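The structure of Eq. (1) can be illustrated with a minimal linear least-squares sketch. This is not the retrieval code of Borger et al. (2020): it fits a single absorber plus a closure polynomial, omits the Ring effect and pseudo-absorber terms (Ψ), and uses a made-up Gaussian cross-section and synthetic spectrum. The function name `doas_fit` and all numerical values are assumptions chosen for illustration.

```python
import numpy as np


def doas_fit(wavelength_nm, ln_ratio, cross_section, poly_order=2):
    """Fit ln(I/I0) = -sigma * SCD + polynomial by linear least squares.

    Returns the fitted SCD and the closure polynomial coefficients.
    """
    # Normalise the regressors so that the design matrix is well conditioned
    # (absorption cross-sections are tiny numbers in cm^2 per molecule).
    scale = np.abs(cross_section).max()
    x = (wavelength_nm - wavelength_nm.mean()) / np.ptp(wavelength_nm)
    columns = [-cross_section / scale] + [x**k for k in range(poly_order + 1)]
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, ln_ratio, rcond=None)
    return coeffs[0] / scale, coeffs[1:]


# Synthetic spectrum in the 430-450 nm fit window (all values are made up).
wavelength = np.linspace(430.0, 450.0, 101)
sigma = 3e-26 * np.exp(-0.5 * ((wavelength - 442.0) / 1.5) ** 2)
true_scd = 1.0e23  # hypothetical slant column in molecules cm^-2
ln_ratio = -sigma * true_scd + 0.05 - 1e-3 * (wavelength - 440.0)

fitted_scd, closure = doas_fit(wavelength, ln_ratio, sigma)
```

Because the synthetic spectrum is noise-free and exactly follows the model, the fit recovers the prescribed slant column. With real spectra, the fit residual and the additional terms in Ψ determine the SCD uncertainty.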

In the second step, to convert the slant column density (SCD) to a vertical column density (VCD), we apply the so-called air mass factor (AMF):

$$\mathrm{VCD} = \frac{\mathrm{SCD}}{\mathrm{AMF}} \qquad (2)$$

The AMF accounts for the non-trivial effects of atmospheric radiative transfer and depends on the conditions of the retrieval scenario (i.e. aerosol and cloud effects, viewing geometry, and surface properties) as well as the profile shape of the trace gas of interest. The algorithm of Borger et al. (2020) makes use of the relation between the H₂O VCD and the profile shape, and it iteratively finds the optimal VCD by assuming an exponential water vapour profile shape.

For the application of the algorithm to OMI measurements, several modifications had to be applied to the algorithm of Borger et al. (2020). For climate studies such as trend analyses, it is necessary to provide a consistent data record. Thus, all rows that have ever been affected by the so-called row anomaly are excluded from the data set for the complete time series, which corresponds to approximately half of the OMI swath. Furthermore, instead of a daily solar irradiance, an earthshine radiance is used as the reference spectrum within the DOAS analysis. The rationale for using an earthshine radiance over a solar irradiance is as follows:

The daily OMI solar irradiance spectra (OML1BIRR version 3) are very noisy and have several gaps, causing high H₂O SCD fit errors and, thus, leading to an overall poor quality of the H₂O VCD data set.
By using an annual mean solar irradiance spectrum from the year 2005 (also used during the QA4ECV project; Boersma et al., 2018), a good fit quality can be obtained; however, OMI is also suffering from degradation effects (Schenkeveld et al., 2017). Thus, for the case of climate trend analyses, it will be almost impossible to disentangle if a trend signal originates from the spectral degradation of OMI or indeed from a geophysical trend (see also Fig. A1). By using an earthshine radiance as the reference spectrum, these degradation effects will largely cancel out.
When using an earthshine radiance as the reference spectrum, the across-track biases within the OMI swath are also strongly reduced (see Fig. 1c); consequently, no destriping is necessary during post-processing (see also Anand et al., 2015).
However, a disadvantage of the use of earthshine spectra is that the retrieved H₂O slant columns do not represent absolute slant columns, as the earthshine reference spectra also contain H₂O absorptions. Hence, a slant column representative of the chosen reference sector has to be added to the retrieved values.

For the creation of annual earthshine reference spectra, we selected the Antarctic continent as the reference sector (high surface albedo due to snow and ice cover) and the time period of December (i.e. during austral summer), yielding a relatively high signal-to-noise ratio for our radiance measurements despite large solar zenith angles. Furthermore, only pixels above an altitude of 2000 m above sea level were selected; as the air temperatures are very low in this altitude range, the water vapour concentrations are very low as well, thereby representing a reference atmosphere that is as dry as possible (i.e. the reference SCD or rather the absolute value of its uncertainty has to be as low as possible). Moreover, to avoid the inclusion of noisy measurements (in particular from the descending part of the OMI orbit), only pixels with a solar zenith angle (SZA) below $80^{\circ}$ are considered. From these measurements, we calculate the monthly mean radiance for December for each year for every OMI row and then use the resulting reference spectra for the retrievals of the upcoming year.
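The selection described above can be sketched as a simple filter followed by a per-row average. The sketch assumes that the input arrays already contain only December measurements over Antarctica for one year; the function name `earthshine_reference`, the array layout, and the example values are hypothetical and not taken from the actual processing chain.

```python
import numpy as np


def earthshine_reference(radiance, row, altitude_m, sza_deg, n_rows=60,
                         min_altitude_m=2000.0, max_sza_deg=80.0):
    """Mean radiance spectrum per across-track row from selected pixels.

    radiance: (n_pixels, n_wavelengths) array; row, altitude_m and sza_deg
    are 1D arrays of length n_pixels. Rows without any selected pixel are
    returned as NaN.
    """
    # Keep only high-altitude pixels with a sufficiently small SZA.
    keep = (altitude_m > min_altitude_m) & (sza_deg < max_sza_deg)
    reference = np.full((n_rows, radiance.shape[1]), np.nan)
    for r in range(n_rows):
        selected = keep & (row == r)
        if selected.any():
            reference[r] = radiance[selected].mean(axis=0)
    return reference


# Tiny made-up example: four pixels with two spectral channels each.
radiance = np.array([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0], [5.0, 6.0]])
row = np.array([0, 0, 0, 1])
altitude = np.array([2500.0, 3000.0, 1000.0, 2500.0])
sza = np.array([70.0, 75.0, 70.0, 85.0])

reference = earthshine_reference(radiance, row, altitude, sza)
```

In this example, the third pixel is rejected by the altitude criterion and the fourth by the SZA criterion, so only row 0 receives a reference spectrum.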

Figure 1 illustrates the effect of different reference spectra on the H₂O SCD distribution for an exemplary orbit. In particular, distinctive stripe patterns are prominent when using the daily solar irradiance as the reference spectrum (Fig. 1a). Although the usage of the annual mean solar irradiance (Fig. 1b) can reduce the strength of the stripes, they are still clearly visible. In contrast, no across-track stripes are detectable for the case of the earthshine reference, and the SCDs are also lower overall due to the H₂O absorption in the earthshine reference (Fig. 1c).

Further details about destriping in general and a comparison of the temporal behaviour of the irradiance-based and earthshine SCD are available in Appendix A.

Figure 1. Exemplary orbit (orbit 34382, 1 January 2011) showing the impact of different reference spectra on the OMI H₂O SCD distribution: (a) daily solar irradiance, (b) annual mean solar irradiance, and (c) monthly mean earthshine reference.

2.3 VCD conversion and data set generation

To account for the potential water vapour contamination within the earthshine reference spectra, the SCDs based on the earthshine reference have to be corrected for the corresponding offset. In this study, we determine this offset, ΔSCD, for each row based on the difference between the earthshine-based SCDs and solar-irradiance-based SCDs for the first 5 years of OMI operation (see Appendix A). Equation (2) can then be rewritten as follows:

$$\mathrm{VCD} = \frac{\mathrm{eSCD} + \Delta\mathrm{SCD}}{\mathrm{AMF}} \qquad (3)$$

where eSCD denotes the SCD derived using the earthshine reference.
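Equation (3) amounts to a simple element-wise operation, sketched below. The sketch assumes that column densities are given in molecules cm⁻² and converts the result to kg m⁻² using Avogadro's constant and the molar mass of water; the function names and the example numbers are hypothetical.

```python
import numpy as np

# Molecules per cm^2 that correspond to 1 kg m^-2 of water vapour:
# N_A / M_H2O (per kg), divided by 1e4 cm^2 per m^2.
MOLEC_CM2_PER_KG_M2 = 6.02214076e23 / 18.01528e-3 / 1.0e4


def vcd_from_earthshine_scd(escd, delta_scd, amf):
    """Apply Eq. (3): add the reference offset, then divide by the AMF."""
    escd = np.asarray(escd, dtype=float)
    amf = np.asarray(amf, dtype=float)
    return (escd + delta_scd) / amf


def to_kg_per_m2(vcd_molec_cm2):
    """Convert an H2O column from molecules cm^-2 to kg m^-2."""
    return np.asarray(vcd_molec_cm2, dtype=float) / MOLEC_CM2_PER_KG_M2


# Made-up example: one pixel with a row-specific offset and an AMF of 2.
vcd = vcd_from_earthshine_scd([5.0e22], delta_scd=1.0e22, amf=[2.0])
tcwv = to_kg_per_m2(vcd)
```

In practice, ΔSCD would be looked up per across-track row, and the AMF would come from the iterative scheme described in Borger et al. (2020), which is not reproduced here.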

The AMFs are calculated as described in Borger et al. (2020). For the determination of the AMF, additional information about the retrieval scenario, like cloud cover and surface properties, is necessary. We use the cloud information from the OMI L2 NO₂ product (OMNO2; Lamsal et al., 2021) and the modified OMI surface albedo version of Kleipool et al. (2008) as described in Borger et al. (2020). We also tested the surface albedo information from the OMNO2 product; however, within the framework of a trend analysis study (Borger et al., 2022), we observed spatial artefacts in the surface albedo trends that likely arise from the use of an older version of the MODIS data for the albedo calculation (Lok Lamsal, personal communication, 2021). The distribution of TCWV trends is mainly determined by the trends in the SCD. The albedo or AMF trends usually only determine whether the trend signal becomes stronger or weaker, but this only affects trends over land, as an albedo climatology from Kleipool et al. (2008) is used over ocean. As the ice flags from the OMI processor sometimes indicate snow/ice-free surfaces over Antarctica or Greenland, we additionally use the monthly mean sea-ice cover information from ERA5 (fifth-generation ECMWF atmospheric reanalysis; Hersbach et al., 2020) and the annual mean land cover information from MODIS Aqua (Sulla-Menashe et al., 2019).

To create the OMI TCWV data set, we have chosen the time range from January 2005 to December 2020 and only include observations with an effective cloud fraction <20 % and AMF >0.1. Furthermore, the pixels have to be free of snow and ice and must not be affected by the row anomaly. Hence, while about 50 % of the orbit is missing because of the RA filter, the remaining data still cover an “effective” swath of about 1300 km; this is larger than the swaths of GOME-1, the SCanning Imaging Absorption spectroMeter for Atmospheric CartograpHY (SCIAMACHY), or GOME-2A (all about 960 km) and of the order of the Special Sensor Microwave/Imager (SSM/I; about 1394 km). Thus, OMI still achieves complete coverage of the Earth about every 2 to 3 calendar days, which should provide enough observational data for good representativeness in the case of a monthly mean (see also Appendix C and the good agreement with the reference data in Sect. 4). In total, this leaves about 30 % of TCWV data from an RA-filtered orbit and about 12 % of data from a complete orbit. The results of every orbit are then gridded to a $1^{\circ} \times 1^{\circ}$ lattice for every day. From these daily grids, the monthly mean H₂O VCD distributions are then calculated, ensuring that a continuous TCWV time series is available for as many grid cells as possible.
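The filtering and gridding steps can be sketched as follows. The thresholds are those stated above, whereas the simple nearest-cell binning by pixel centre, the unweighted averaging of daily grids, and all function and variable names are assumptions; the actual gridding scheme may differ, for example by accounting for pixel footprints.

```python
import warnings

import numpy as np


def select_valid(cloud_fraction, amf, snow_ice, row_affected):
    """Boolean mask of pixels passing the filters of the data set."""
    return (cloud_fraction < 0.20) & (amf > 0.1) & ~snow_ice & ~row_affected


def grid_daily(lat, lon, tcwv, valid):
    """Average valid pixels onto a 1 x 1 degree grid (NaN where empty)."""
    total = np.zeros((180, 360))
    count = np.zeros((180, 360))
    i = np.clip(np.floor(lat + 90.0).astype(int), 0, 179)
    j = np.clip(np.floor(lon + 180.0).astype(int), 0, 359)
    # np.add.at accumulates correctly when several pixels share a cell.
    np.add.at(total, (i[valid], j[valid]), tcwv[valid])
    np.add.at(count, (i[valid], j[valid]), 1.0)
    grid = np.full((180, 360), np.nan)
    filled = count > 0
    grid[filled] = total[filled] / count[filled]
    return grid


def monthly_mean(daily_grids):
    """Mean over all days with data; cells that are never observed stay NaN."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(np.stack(daily_grids), axis=0)


# Made-up example: three pixels, the third one fails the cloud filter.
lat = np.array([0.5, 0.7, 10.2])
lon = np.array([0.5, 0.3, -20.1])
tcwv = np.array([20.0, 30.0, 40.0])
cloud = np.array([0.1, 0.1, 0.5])
amf = np.array([1.0, 1.0, 1.0])
snow_ice = np.zeros(3, dtype=bool)
row_affected = np.zeros(3, dtype=bool)

valid = select_valid(cloud, amf, snow_ice, row_affected)
day_one = grid_daily(lat, lon, tcwv, valid)
day_two = np.full((180, 360), np.nan)  # a day without valid observations
month = monthly_mean([day_one, day_two])
```

Averaging with `np.nanmean` means that each grid cell is averaged only over the days on which it was observed, which is one plausible reading of the procedure described above.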

Figure 2. Global mean OMI H₂O VCD distribution from 2005 to 2020 based on the OMI analysis using earthshine reference spectra and corrected for the H₂O SCD bias. Areas with no valid values are coloured grey.

Figure 2 shows the global mean OMI H₂O VCD averaged over the complete time range of the TCWV data set. The resulting distribution demonstrates that the retrieval is capable of capturing the macroscale water vapour patterns, like high VCD values in the tropics (in particular over the Maritime Continent) and low values towards the polar regions, but also characteristic regional patterns, like the South Pacific convergence zone.

3 Sampling errors and clear-sky bias

Although satellite observations enable the analysis of trace gas concentrations on a global scale, a fundamental problem is that a satellite measurement is typically only taken once a day for one location. Furthermore, satellite measurements are usually only available under cloud-free conditions, especially in the visible or infrared spectral range, and thus no continuous time series is guaranteed. Consequently, they cannot provide a complete picture of geophysical variability, which leads to sampling errors in the calculation of averaged values (e.g. monthly means).

Moreover, the following question arises: to what extent does the limitation to cloud-free pixels influence the monthly averages determined from the OMI satellite measurements (i.e. whether a so-called “clear-sky bias” exists in the OMI TCWV data set)? Gaffen and Elliott (1993) investigated this bias using radiosonde ascents and found that the TCWV is about 0 %–15 % lower under cloud-free conditions than under cloudy conditions. Similarly, Sohn and Bennartz (2008) found a clear-sky bias of about 10 % between the Medium Resolution Imaging Spectrometer (MERIS) and the Advanced Microwave Scanning Radiometer for EOS (AMSR-E).

To estimate the sampling errors, we follow the methods of Xue et al. (2019) and Gleisner et al. (2020): we choose hourly resolved ERA5 data with a spatial resolution of $0.25^{\circ} \times 0.25^{\circ}$ as the reference data and collocate the ERA5 data with OMI overpass times. These data are then resampled to the $1^{\circ} \times 1^{\circ}$ resolution of the OMI TCWV data set and the monthly averages are calculated (TCWV_sampled). We then take the complete, original ERA5 data, resample them to the same spatial resolution, and calculate monthly means from these data (TCWV_true). The difference between the two data sets then represents the sampling error:

$$\varepsilon_{\mathrm{sampling}} = \mathrm{TCWV}_{\mathrm{sampled}} - \mathrm{TCWV}_{\mathrm{true}} \qquad (4)$$

With this definition, the sampling error summarizes the uncertainties due to gaps in the swath, temporal differences, or missing data (e.g. due to clouds) (Xue et al., 2019).
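A minimal sketch of Eq. (4) for a single month is given below. It assumes that the reference field has already been resampled to the target grid and that a boolean mask marks the time steps and grid cells for which a satellite observation exists; the function name `monthly_sampling_error` and the example numbers are hypothetical.

```python
import warnings

import numpy as np


def monthly_sampling_error(tcwv_hourly, sampled_mask):
    """Absolute and relative sampling error for one month.

    tcwv_hourly: (n_time, n_lat, n_lon) reference field.
    sampled_mask: boolean array of the same shape, True where the
    satellite provides a valid observation.
    """
    tcwv_true = tcwv_hourly.mean(axis=0)
    sampled = np.where(sampled_mask, tcwv_hourly, np.nan)
    with warnings.catch_warnings():
        # Cells that are never sampled yield NaN; silence the warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        tcwv_sampled = np.nanmean(sampled, axis=0)
    absolute = tcwv_sampled - tcwv_true
    relative = absolute / tcwv_true
    return absolute, relative


# Made-up example: four time steps on a grid with 1 x 2 cells.
field = np.array([[[10.0, 5.0]], [[20.0, 5.0]], [[30.0, 5.0]], [[40.0, 5.0]]])
mask = np.zeros(field.shape, dtype=bool)
mask[:2, 0, 0] = True  # only the two driest time steps are observed

eps_abs, eps_rel = monthly_sampling_error(field, mask)
```

In this example, the first cell is sampled only during its driest time steps, so the sampled mean underestimates the true mean, while the second cell is never observed and remains NaN. The same pattern can be used for any pair of subsampled and reference averages.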

Figure 3. Global distributions of the mean sampling errors derived from monthly mean sampling differences for the time range from January 2005 to December 2020. Panel (a) depicts absolute sampling error (i.e. ε_sampling) and panel (b) shows relative sampling error (i.e. $ε_{sampling} / {TCWV}_{true}$ ). Grid cells for which no data are available are coloured grey.

Figure 3 shows the mean absolute and relative sampling errors for the complete time range of the OMI TCWV data set (January 2005 to December 2020). Overall, it can be seen that most deviations are negative, i.e. the actual TCWV is underestimated. Regarding the absolute deviations, the strongest deviations can be seen in the area of storm tracks in the mid-latitudes (e.g. North Atlantic) and the polar regions, with values of around −5 kg m⁻². The smallest deviations are found in the quasi-permanent cloud-free regions in the subtropics. As expected, the relative differences increase from the Equator towards the poles due to the decreasing TCWV values and reach values stronger than −30 %.

To investigate the extent to which these deviations are related to the clear-sky bias, we proceed similarly to the calculation of the sampling error: we collocate the ERA5 data to the OMI overpass time and create two versions of the collocated data, one with a cloud filter applied (effective cloud fraction < 20 %) and one without. We then resample both data sets to $1^{\circ} \times 1^{\circ}$ and calculate monthly means. The difference between both data sets then represents the clear-sky bias:

$$\varepsilon_{\mathrm{clear}} = \mathrm{TCWV}_{\mathrm{clear}} - \mathrm{TCWV}_{\mathrm{all}} \qquad (5)$$

Figure 4. Global distributions of the absolute differences (ε_clear; a, c, e, g) and relative differences ( $ε_{clear} / {TCWV}_{all}$ ; b, d, f, h) of the mean differences between clear-sky and all-sky ERA5 based on the OMI cloud information for winter (DJF; a, b), spring (MAM; c, d), summer (JJA; e, f), and autumn (SON; g, h) for the time range from January 2005 to December 2020. Grid cells for which no data are available are coloured grey.

To determine seasonal structures, the global distributions of the absolute and relative clear-sky bias for the different seasons were determined from the monthly differences (see Fig. 4). Overall, the distributions of the clear-sky bias correspond very closely to the distributions of the sampling error, with respect to both strength and pattern. Moreover, the absolute and relative deviations show only slight changes between the different seasons.

Figure 5. Distributions of the absolute differences (ε_clear; a) and relative differences ( $ε_{clear} / {TCWV}_{all}$ ; b) of the monthly mean differences between clear-sky and all-sky ERA5 data based on the OMI cloud information. The solid and dashed orange lines indicate the mean and the median of the distributions, respectively.
