The GEWEX Water Vapor Assessment archive of water vapour products from satellite observations and reanalyses

The Global Energy and Water cycle Exchanges (GEWEX) Data and

temperature data records covering the period 1988-2009. The G-VAP data archive is referenced under the following digital object identifier (doi): http://dx.doi.org/10.5676/EUM_SAF_CM/GVAP/V001. Within G-VAP, the characterisation of water vapour products is, among other ways, achieved through intercomparisons of the considered data records, both as a whole and grouped into three classes of predominant retrieval condition: clear-sky, cloudy-sky and all-sky. Associated results are shown using the 22 TCWV data records. The standard deviations among the 22 TCWV data records have been analysed and exhibit distinct maxima over central Africa and the tropical warm pool (in absolute terms) as well as over the poles and mountain regions (in relative terms). The variability in TCWV within each class can be large and precludes conclusions on systematic differences in TCWV between the classes.
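To illustrate the difference between spread "in absolute terms" and "in relative terms", the following minimal sketch computes the standard deviation among co-located TCWV data records in kg/m² and as a percentage of the ensemble mean. It assumes the records have already been brought onto a common grid and period and stacked into a NumPy array; the function name, the use of the sample standard deviation (ddof=1) and the example values are assumptions for illustration, not the G-VAP processing chain.

```python
import numpy as np


def inter_record_spread(tcwv_stack):
    """Spread among co-located TCWV data records.

    tcwv_stack: array of shape (n_records, n_lat, n_lon), e.g. multi-year
    mean TCWV in kg/m^2 on a common grid; NaN marks missing values.
    Returns the absolute standard deviation (kg/m^2) and the relative
    standard deviation (percent of the ensemble mean).
    """
    mean = np.nanmean(tcwv_stack, axis=0)
    std = np.nanstd(tcwv_stack, axis=0, ddof=1)  # sample std across records
    with np.errstate(divide="ignore", invalid="ignore"):
        rel = 100.0 * std / mean
    return std, rel


# Hypothetical example: one humid and one dry grid cell, three records.
stack = np.array([[[48.0, 2.0]], [[50.0, 3.0]], [[52.0, 4.0]]])
std, rel = inter_record_spread(stack)  # std -> [[2., 1.]]; rel -> [[4., 33.3]]
```

With these made-up values the humid cell has the larger absolute spread while the dry cell has the larger relative spread, which is why the two measures can peak in different regions.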

Introduction
Water vapour is the most important natural greenhouse gas and the dominant source of infrared opacity in the clear-sky atmosphere (Trenberth et al., 2007). In addition, fast-acting water vapour feedbacks constitute a strong amplification mechanism for anthropogenic climate change (e.g., Held and Soden, 2000), making water vapour also a key parameter for the energy budget of the Earth and consequently for climate change analysis. Furthermore, the interactions of water vapour with other components of the climate system, such as clouds and precipitation, are still not fully understood.
Analysing recent decades of global water vapour distribution and variability is expected to help extend our understanding of how the climate system responds to increasing greenhouse gas concentrations. Owing to their global coverage and observing periods approaching 40 years, satellite-based water vapour data records are adequate sources of information to address the above-mentioned challenges and to analyse the Earth's climate.
To date, a large variety of satellite-based water vapour data records is available. Information on such records is provided by the ECV inventory (http://climatemonitoring.info/ecvinventory; see Table A1 in the Appendix for a list of abbreviations), the Climate Data Guide (https://climatedataguide.ucar.edu/), and the Advancing Reanalysis web portal at http://reanalyses.org/. Without proper background information and understanding of the limitations of available data records, these data may be used incorrectly or misinterpreted. The need for quality assessments of ECV CDRs is part of the GCOS guidelines for the generation of data products. Assessments in general provide an overview of available data records and enable users to judge the quality and fitness for purpose of CDRs by informing them about the strengths and weaknesses of existing and readily available records. With this in mind, GDAP initiated G-VAP, whose major purpose is to quantify the current state of the art in water vapour products being constructed for climate applications and to support the selection process by GDAP. Within G-VAP, efforts started with an inventory of freely available water vapour data records. Overview tables for satellite, reanalysis, in-situ and ground-based data records are provided at http://gewex-vap.org/?page_id=13.
Satellite and reanalysis products with a minimum length of 10 years are basic elements of the G-VAP data archive. This archive forms the basis for scientific analysis in the characterisation effort of satellite and reanalysis data records, and results from such analysis have already been published (Schröder et al., 2016, 2017a). The G-VAP data archive complements the archive produced by the GEWEX Cloud Assessment (Stubenrauch et al., 2012). The Cloud Assessment data archive contains multi-decadal information on a large number of cloud properties from multiple platforms and is available from http://climserv.ipsl.polytechnique.fr/gewexca/index-2.html.
In Sect. 2, an overview of previous and currently available satellite sensors, together with a brief outlook on upcoming missions relevant for the retrieval of water vapour, is presented. The focus of Sect. 3 is a summary of existing satellite and reanalysis data records. The G-VAP data archive, which is based on a subset of the data records introduced in Sect. 3, is introduced in Sect. 4. Sect. 4 starts with a brief introduction of all data records included in the archive. Then, the processing of the data records and the archive structure are explained. The following section includes exemplary results from the intercomparison of the data records from the G-VAP data archive. Here, TCWV data records with temporal coverage from 2003 to 2008 are considered so that all data records of the archive are included in the intercomparison. Conclusions are given in Sect. 7, and Appendix A provides a list of all abbreviations used.

Overview of available satellite sensors
In this section, background information on the wide variety of sensors that measure atmospheric water vapour is provided.
Only sensors that have a record longer than 10 years and that cover (near) global scales are discussed. These are the types of sensors used to create global CDRs of water vapour. Generally, water vapour sensors are deployed on low-Earth orbiting and geostationary satellites. Sensors that provide (near) global coverage typically operate on polar orbiting platforms in a sun-synchronous orbit. Such sensors provide global coverage with one daytime overpass (at a particular local time) and one night-time overpass (12 hours later). Instruments classified as sounders carry several channels distributed about a water vapour absorption line to retrieve the vertical profile of water vapour. Instruments classified as imagers might also have channels clustered about an absorption line, but the primary purpose of an imager is to sense the surface or cloud tops. Imagers are generally restricted to retrieving only TCWV.
The term "profile" usually implies the water vapour amount (mixing ratio) on a given set of pressure levels, such as those measured by radiosondes. Satellite sounding instruments respond to radiation from a great depth of the atmosphere, as described by the instrument weighting function, so atmospheric layers are the natural retrieval unit here. These layers might be interpolated to pressure levels to compare with, for instance, a radiosonde or a model, but users should remain aware of the broad vertical layer over which satellite sounders nominally average. This section approaches the overview of sensors from the standpoint of where and what they sample, and the pros and cons of each sensor from a user perspective. Chronological listings are readily available, for instance in Kämpfer (2012, Fig. 9.1). A recent overview of sensors is also provided by Wulfmeyer et al. (2015). The information provided here is a snapshot in 2017, but radiance records and sensor intercalibration continue to progress, and algorithm improvements can expand the yield and performance of remote retrievals of water vapour. This list is not meant to be exhaustive, but serves to orient the user to the major sensors supporting the water vapour CDR and their pros and cons. Sensors based on limb sounding techniques that focus on the upper atmosphere are not considered in this report. Information on these techniques and associated data records can be found at the SPARC Water Vapour Assessment 2 web portal: http://www.sparcclimate.org/activities/water-vapour/.
There is a wide variety of water vapour sensors currently operating, and for climate research the sensors change and vary through time. Understanding which sensors were operating in any given time period is a major endeavour. The World Meteorological Organization has created an online tool that makes this task much more feasible: the Observing Systems Capability Analysis and Review Tool, maintained at http://www.wmo-sat.info/oscar/.
In Table 1, a summary of the main sensors used for global water vapour CDRs is presented.

Passive microwave sensors
Passive microwave sensors are typically classified as imagers or sounders. Some instrument names indicate the principal mission of the sensor, e.g. SSM/I, its successor SSMIS, or AMSU. Regardless of the classification of the sensor, both imagers and sounders allow water vapour retrievals in clear and cloudy skies, but not in the presence of strong scattering by hydrometeors, such as during heavy precipitation events.
Conical scanning microwave imagers are typically configured at an Earth incidence angle of about 53 degrees. They have the advantage of constant spatial resolution across the scan, and constant sensitivity to the atmosphere via the same geometric path length. Microwave surface emissivity over land and ocean is a function of incidence angle, so, in principle, conical scanners eliminate this variable from atmospheric retrievals. Cross-track scanners have a spatial resolution that varies across the scan: it is highest at near-nadir views and degrades to larger fields-of-view at the outer edge of the scan. They have a minimal atmospheric path length at nadir. 1987 saw the launch of the first SSM/I instrument, a sensor that, while having no official climate mission, has had a profound impact on global water vapour records. The water vapour absorption line at 22 GHz is a key component of these TCWV retrievals, with other window channels compensating for cloud and surface roughness effects. TCWV from passive microwave imagers has historically only been retrieved over the ice-free oceans, and it is commonly, although incorrectly, stated that passive microwave retrievals work over ocean only. The restriction to oceans is due to the complex and variable land surface emissivity, which changes on short timescales with surface wetness, vegetation state, and soil properties. The barrier to passive microwave retrievals over land is beginning to fall, at least for operational weather users, as evidenced for instance by the NOAA Microwave Integrated Retrieval System (Boukabara et al., 2011; Forsythe et al., 2015). Du et al. (2015) demonstrate an AMSR2 algorithm to retrieve TCWV over land. AMSR2 on board the Global Change Observation Mission-Water is a successor of AMSR-E on board Aqua, and has the highest spatial resolution among passive microwave imagers on polar orbiting platforms. Land retrievals require further investigation for climate research.
For the water vapour climate record, the passive microwave TCWV record has not yet been demonstrated over land, but this advance may become possible in the coming years.

Infrared sensors
Infrared sounding sensors constitute the longest type of satellite record for water vapour profiling. A key distinction among infrared sensors for water vapour retrievals is between radiometers (e.g., HIRS, ATSR) and spectrometers (e.g., AIRS, IASI, Satellite System spacecraft continue the hyperspectral sounding record. A third IASI instrument is due for launch at the end of 2018 on board Metop-C, which will extend the IASI mission and the associated sounding products from 2006 to beyond 2023. IASI's successor, IASI-NG, will be on board EPS-SG, providing hyperspectral observations until ~2040.
Beginning with Metop-C and given the availability of IASI, HIRS will not be continued. In February 2016, Sentinel-3 was launched, carrying the SLSTR instrument, a successor of the ATSR instrument series. Infrared-only retrievals of TCWV and water vapour profiles are possible under clear-sky or mostly clear-sky conditions only.
Combination with passive microwave sounders extends the range of sky conditions in which retrievals are possible (e.g. Li et al., 2000; Kahn et al., 2014). The intercalibration of the HIRS record is still ongoing (e.g. Shi and Bates, 2011).
There are intersensor differences in the spectral placement of the 20 HIRS channels; the most impactful is the switch of channel 10 from 8.6 µm to 12.5 µm on the HIRS/3 and HIRS/4 sensors, beginning with NOAA-15 in 1998. While land surface emissivity is much more uniform and less time-varying in the infrared than at microwave wavelengths, infrared land surface emissivity does vary (Seemann et al., 2008) and can be problematic for infrared retrievals, especially over desert surfaces.

Ultraviolet/Visible/Near-Infrared imagers
A TCWV retrieval using two channels at 0.885 µm (window) and 0.9 µm (water vapour absorption) has been demonstrated for the MERIS and MODIS instruments (Diedrich et al., 2015; Lindstrot et al., 2014). The retrieval is limited to the daylight portion of the swath, as differential solar reflectance is the signal for this retrieval. These types of retrievals have the benefit of high spatial resolution (~1 km). MERIS was launched in 2002, and MODIS on board Terra in 1999, complemented by MODIS on board Aqua, launched in 2002. In February 2016, Sentinel-3 was launched carrying the OLCI instrument, a successor of MERIS. MERIS stopped operating in April 2012, while the TCWV time series can be extended with data from MODIS and OLCI. Retrievals from MERIS, MODIS and OLCI complement passive microwave TCWV retrievals because they perform best over land and have reduced quality over oceans.
UV/VIS/NIR spectrometers such as GOME, SCIAMACHY, and GOME-2 yield retrievals of TCWV over land and ocean surfaces under daylight and clear-sky conditions (e.g., Grossi et al., 2015). The spatial resolution is 320 km × 40 km for GOME, 60 km × 30 km for SCIAMACHY, and 80 km × 40 km for GOME-2, and cloud handling is a major challenge. A recent development in remote sensing of water vapour is the retrieval of TCWV from the NASA Orbiting Carbon Observatory (OCO-2) spacecraft (Nelson et al., 2016). This retrieval uses near-infrared reflected sunlight measured by the high-quality OCO-2 grating spectrometer.
Future European missions such as Sentinel-5 Precursor, EPS-SG, and Sentinel-5 will carry similar UV/VIS/NIR spectrometers with improved spatial resolution (<10 km), which will further extend the TCWV records from UV/VIS/NIR spectrometers.

GPS Radio Occultation
GPS-RO measurements profile atmospheric refractivity with globally distributed, all-weather sampling, from which tropospheric water vapour is derived. GPS-RO missions began with the prototype GPS-MET mission from April 1995 -Feb. an "open-loop" tracking capability that enables a far higher percentage of the occultation profiles to extend to within 1 km of the surface.
From the GPS-RO receiver phase measurements during each occultation, the bending of the signal path due to its passage through the atmosphere is determined. The bending angle profile is then inverted to a profile of the index of refraction, which is closely related to the dry gas and water vapour concentrations. The water vapour contribution is typically isolated using one of two approaches. The first is a variational technique used in numerical weather prediction assimilation systems, where estimates of water vapour, temperature and pressure from the forecast and their associated error covariances are combined with the GPS occultation bending angle or refractivity profile and its error covariance to achieve a new, statistically optimal estimate (e.g. Healy and Eyre, 2010). The second approach, referred to as the "Direct" method, estimates the dry part of the refractivity from the analysis temperature, subtracts it from the GPS-measured total refractivity to obtain the wet part of the refractivity, and finally scales the wet part to obtain the water vapour. Both methods rely on analysis temperatures, which combine observations and the numerical weather prediction model. The advantage of the Direct method is that it does not rely on estimates of water vapour from models, forecasts or climatologies. Estimates of the systematic and random uncertainties of water vapour derived from COSMIC observations via the Direct method are given in Kursinski and Gebhardt (2014).
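The separation step of the Direct method can be sketched with the widely used two-term refractivity relation N = 77.6 P/T + 3.73×10⁵ e/T² (P and e in hPa, T in K). This is a minimal sketch under stated assumptions: the coefficients are the common textbook values, the analysis temperature and pressure at the level are simply taken as known (in practice pressure must also be estimated), and the function names and example values are hypothetical. It is not the processing of Kursinski and Gebhardt (2014).

```python
# Assumed coefficients of the two-term refractivity relation
# N = K1 * P / T + K3 * e / T**2   (P, e in hPa; T in K; N in N-units).
K1 = 77.6      # K/hPa, "dry" term
K3 = 3.73e5    # K^2/hPa, "wet" term


def vapour_pressure_direct(n_total, temp_k, pressure_hpa):
    """Water vapour pressure (hPa) from total refractivity (N-units).

    The dry term K1 * P / T is computed from the analysis temperature and
    pressure (assumed known here), subtracted from the measured total
    refractivity, and the remaining wet term K3 * e / T**2 is scaled to e.
    """
    n_dry = K1 * pressure_hpa / temp_k
    n_wet = n_total - n_dry
    return n_wet * temp_k**2 / K3


def specific_humidity(e_hpa, pressure_hpa):
    """Specific humidity (kg/kg) from vapour pressure and total pressure."""
    return 0.622 * e_hpa / (pressure_hpa - 0.378 * e_hpa)


# Hypothetical level: build N from T = 290 K, P = 1000 hPa, e = 15 hPa,
# then recover e with the Direct-style separation.
t, p = 290.0, 1000.0
n = K1 * p / t + K3 * 15.0 / t**2
e = vapour_pressure_direct(n, t, p)   # ~15 hPa
q = specific_humidity(e, p)           # ~0.0094 kg/kg
```

Because the dry term is taken entirely from the analysis temperature, any temperature error maps directly into the wet residual, which is one reason why the uncertainty analysis of the retrieved water vapour matters.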
GPS-RO has very high vertical resolution for a satellite system (100-200 m) and a relatively coarse horizontal resolution of about 100 km (Kursinski et al., 1997, eqn. 13). The altitude range of individual GPS-RO water vapour profiles extends from the surface to about the 240 K temperature level in the troposphere. GPS-RO's ability to routinely penetrate clouds, combined with its insensitivity to surface emissivity, eliminates sampling biases that limit other satellite measurement systems. A caveat with present GPS-RO derived water vapour is that in warm conditions, typically found at low latitudes, GPS-RO water vapour profiles in the boundary layer can be biased low when a sharp water vapour contrast exists between dry free-tropospheric air and moist boundary layer air below, causing so-called super-refraction (e.g., Xie et al., 2006). Xie et al. (2006) developed an inversion method that accounts for super-refraction and produces unbiased water vapour profiles in the boundary layer. Because it has been difficult to determine when super-refraction is occurring in the COSMIC profiles, the Xie et al. method has not yet been widely used. To overcome this challenge, the new GPS-RO receivers on COSMIC 2 are designed to provide much higher signal-to-noise ratios to enable routine detection of super-refraction.
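Super-refraction occurs where the vertical refractivity gradient is steeper than the critical value of about -157 N-units/km, at which a ray's curvature matches the Earth's curvature. The sketch below flags such layers in a refractivity profile that is assumed to be known independently (for example from a radiosonde or a model); it only illustrates the physical condition and is neither the Xie et al. (2006) inversion nor a detection method for RO data, whose retrieved refractivity is itself biased in these layers. The function name and the example profile are hypothetical.

```python
import numpy as np

# Critical refractivity gradient (N-units per km): below this value a ray
# bends more strongly than the Earth's surface curves, i.e. ducting occurs.
CRITICAL_GRADIENT = -157.0


def superrefraction_layers(height_km, refractivity):
    """Return a boolean array, one entry per layer between adjacent levels,
    that is True where dN/dz is below the critical gradient."""
    z = np.asarray(height_km, dtype=float)
    n = np.asarray(refractivity, dtype=float)
    dndz = np.diff(n) / np.diff(z)  # finite-difference gradient per layer
    return dndz < CRITICAL_GRADIENT


# Hypothetical profile with a sharp moist-to-dry transition near 1 km.
flags = superrefraction_layers([0.0, 0.5, 1.0, 1.1, 1.5],
                               [340.0, 320.0, 300.0, 270.0, 260.0])
```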
The COSMIC 2 mission is scheduled to launch six satellites in late 2018 and possibly a second set sometime later. Successors of GRAS will fly on EPS-SG with sustained operation until ~2040.

Reanalyses
Reanalysis systems are typically based on advanced operational atmospheric general circulation models and include data assimilation schemes. Data assimilation merges spatially and temporally discontinuous observations with model state fields to reinitialise atmospheric forecasts and produce spatially and temporally continuous state fields. The data assimilation schemes differ in their degree of complexity, their assimilation strategy (i.e., assimilation of geophysical parameters versus assimilation of radiances) and their data input. All major reanalyses assimilate a large diversity of data from observing systems such as radiosondes, ground-based GPS and GPS-RO, diverse satellites, buoys, ships and more. Reanalyses are performed with fixed model and data assimilation systems; however, the data input to the reanalysis system changes over the years.
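The core idea of merging observations with a model state can be reduced to a single scalar update in which the background (short-range forecast) and the observation are weighted by their error variances. The sketch below is a minimal illustration of that principle under the assumption of unbiased, uncorrelated Gaussian errors; operational reanalyses use 3D-Var or 4D-Var with full error covariance matrices and observation operators, and the function name and numbers here are hypothetical.

```python
def analysis_update(background, obs, var_background, var_obs):
    """Scalar best linear unbiased estimate of a state variable.

    The gain weights the innovation (obs - background) by the relative
    size of the background error variance; the analysis error variance
    is always smaller than the background error variance.
    """
    gain = var_background / (var_background + var_obs)
    analysis = background + gain * (obs - background)
    var_analysis = (1.0 - gain) * var_background
    return analysis, var_analysis


# Hypothetical TCWV value (kg/m^2): equal error variances give equal weights.
x_a, var_a = analysis_update(30.0, 34.0, 4.0, 4.0)  # -> (32.0, 2.0)
```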
Reanalysis products are generated retrospectively, and several reanalysis centres also provide near-real-time services. The major purpose is to provide a product that allows the analysis of weather and climate and how they change over time.
Reanalyses provide global coverage of the full troposphere and stratosphere. An advantage is the large diversity of gap-free geophysical parameters. A disadvantage is that reanalyses rely on atmospheric numerical models, which carry their own systematic biases. Diversity in the assimilated observations in each system, the quality control and the varying number of observations in space and time can affect the reanalysis quality. More information can be found in the references given in Table 2 as well as at http://reanalyses.org and at http://s-rip.ees.hokudai.ac.jp/, the webpage of the SPARC Reanalysis Intercomparison Project.

Overview of data records
Using observations from satellites and data from reanalyses, as introduced in the previous section, a large variety of (global) long-term and freely available water vapour data records have been developed over recent decades. An overview of such data records from satellites and reanalyses is provided in Table 2. The table is based on the information available on the G-VAP website at http://gewex-vap.org/?page_id=13, where tables for operational satellite as well as in-situ and ground-based products are also available. The first column of Table 2 gives the sensor, while the second contains the data record name, which is used subsequently. If no name is available or the name is not unique, the data record is named after the sensor used and the data owner. Besides the covered parameters and key references, the main technical specifications are provided, which might allow a first-order decision on the fitness-for-purpose of a particular data record. Other information sources on water vapour data include the ECV inventory (http://climatemonitoring.info/ecvinventory), the Climate Data Guide (https://climatedataguide.ucar.edu/), and the http://reanalyses.org/ website. Note that this section provides a general overview of available data records and that not all data records mentioned in Table 2 are part of the G-VAP data archive. Further details on the retrieval schemes and other aspects are provided in short paragraphs for each data record of the G-VAP data archive in Sect. 4.1.

G-VAP data archive
In this section, the G-VAP data archive is introduced, including an explanation of why the archive consists of a subset of the data records mentioned in Table 2. GDAP initiated G-VAP, among other purposes, to support the process of selecting suitable water vapour products for GDAP's production of globally consistent water and energy cycle products. Using the products within GDAP activities essentially implies long-term data records, that is, data records that start around 1980 and cover more than three decades. In order to increase the number of available data records, G-VAP considered freely available data records with a temporal coverage of at least ten years. Thus, the assessment considered data records that may not be used as input for GEWEX water and energy cycle data records but which are important for establishing a deep understanding of atmospheric water vapour observations. After August 2016, the filling and updating of the G-VAP data archive ended in order to allow a timely analysis of the data and provision of results to GDAP.
Thus, data records with a temporal coverage of more than 10 years by the end of 2015 are considered; these are marked with an asterisk in Table 2. What follows is a short introduction to the elements of the G-VAP data archive, to the post-processing, which mainly transfers the data records onto a common grid and period, and to the archive structure.
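One common way to transfer a finer regular latitude-longitude field onto a coarser common grid is an area-weighted block average that ignores missing values. The sketch below shows that technique only; the target resolution of the G-VAP common grid is not given in this passage, so the aggregation factor, the function name and the example values are hypothetical, and the actual G-VAP post-processing may use a different remapping.

```python
import numpy as np


def block_average(field, lat, factor):
    """Area-weighted block average of a regular lat-lon field.

    field:  (n_lat, n_lon) array, NaN for missing values
    lat:    (n_lat,) cell-centre latitudes in degrees
    factor: number of fine cells per coarse cell along each axis
    """
    n_lat, n_lon = field.shape
    if n_lat % factor or n_lon % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    # On a regular grid the cell area is proportional to cos(latitude).
    w = np.broadcast_to(np.cos(np.deg2rad(lat))[:, None], field.shape)
    valid = ~np.isnan(field)
    wf = np.where(valid, field * w, 0.0)   # weighted values, 0 where missing
    wv = np.where(valid, w, 0.0)           # weights of valid cells only
    shape = (n_lat // factor, factor, n_lon // factor, factor)
    num = wf.reshape(shape).sum(axis=(1, 3))
    den = wv.reshape(shape).sum(axis=(1, 3))
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)


# Hypothetical 2x2 block at 0 and 60 degrees latitude averaged into one cell.
coarse = block_average(np.array([[2.0, 2.0], [8.0, 8.0]]),
                       np.array([0.0, 60.0]), factor=2)
```

Weighting by cos(latitude) keeps high-latitude cells, which cover less area, from dominating the coarse-cell mean.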

Introduction to individual data records
The elements of the G-VAP data archive are given in Table 2 and are marked with asterisks. In this section, abstracts are provided for each element of the archive. These abstracts briefly introduce the input data, the retrieval scheme and technical specifications, provide basic references to the algorithm and/or the data record, and give access details.

AIRWAVE
The AIRWAVE retrieval scheme exploits the dual-view capability of the ATSR instrument series by using the two thermal infrared channels centred at 10.8 and 12 µm. It works over cloud-free ocean by combining advanced radiative transfer models and a spectral emissivity database. The simultaneous use of ATSR's forward and nadir measurements minimises the impact of limited knowledge of the SST and the atmospheric radiation on the quality of the retrieved TCWV.
Exploiting only the thermal infrared channels of the instrument, the algorithm enables the estimation of TCWV for both day and night observations and the full exploitation of the ATSR instrument series, spanning 1991 to 2012 (see Casadio et al., 2016 and Castelli et al., 2015 for details on the retrieval). The AIRWAVE processor is integrated in the ESA Grid Processing on Demand environment for the bulk processing of the three ATSR missions. The product is available in Level 2 (1 km × 1 km and 0.25°×0.25°). Results from comparisons to satellite and radiosonde products can be found in Casadio et al. (2016) and Papandrea et al. (2018). The AIRWAVE version 1 data record was provided as monthly means at a spatial resolution of 1°×1° via ftp in June 2015.

AMSR-E JAXA
The AMSR-E standard TCWV algorithm uses AMSR-E brightness temperatures of the 18.7, 23.8, and 36.5 GHz channels at V/H polarisation, and uses SST, sea surface wind speed and temperature at 850 hPa from the Japan Meteorological Agency six-hourly global analysis as ancillary data (see Takeuchi, 2002 and Takeuchi et al., 2004 for details on the retrieval). The algorithm is applicable to open ocean regions only and its dynamic range is 0-70 kg/m². The product is available in Level 2 (swath data with about 10 km sampling intervals) and Level 3 (daily, monthly, 0.25°×0.25° grid for ascending and descending orbits) from June 2002 to October 2011. The reprocessing of the AMSR-E TCWV product is underway, along with other AMSR-E geophysical parameter products, by applying the latest AMSR2 algorithm (see Kazumori et al., 2012 and Kazumori, 2013) in order to produce consistent AMSR-E/AMSR2 products covering more than 15 years. The accuracy of the product is described in Kazumori et al. (2012). The AMSR-E JAXA version 2 was downloaded as monthly means from https://gcom-w1.jaxa.jp/auth.html in February 2015.

AMSR-E REMSS
The REMSS AMSR-E water vapour algorithm utilises all AMSR-E channels and is part of a multi-parameter retrieval scheme that simultaneously retrieves TCWV, surface wind speed, SST, cloud liquid water content and rain rate for ocean scenes. The basic algorithm is described in Wentz (1997), with updated models of the microwave absorption, emission, and scattering by the atmosphere and ocean surface (Meissner and Wentz, 2012; Wentz and Meissner, 2017). The algorithm depends most strongly on the 23.8 GHz channel, which lies on the upper wing of the water vapour line at 22.235 GHz, with the other AMSR-E channels providing information about surface roughness and atmospheric scattering and absorption from liquid water. The data are available as twice-per-day near-global gridded maps with 0.25°×0.25° resolution from http://www.remss.com/missions/amsr.html. The TCWV values have been validated via comparison with similar measurements from TMI (Wentz, 2015) and with TCWV measurements made by ground-based GNSS sensors located on small islands (Mears et al., 2015). Monthly means of AMSR-E REMSS V7 were downloaded in March 2015.

ATOVS CM SAF
The ATOVS data record from CM SAF offers 13 years (1999-2011) of satellite-derived global water vapour and temperature products. Several parameters are generated simultaneously: TCWV, mean temperature and vertically integrated water vapour in five layers, as well as specific humidity and temperature at six levels. Also available are the number of valid observations and an uncertainty estimate. The data record was derived from ATOVS on board NOAA-15 to -19 and Metop-A. ATOVS is composed of three instruments: HIRS, AMSU-B/MHS and AMSU-A. After application of a kriging routine, the products are available as daily and monthly means on a cylindrical equal-area projection at a resolution of 90 km × 90 km. Further details on the retrieval, the data record and on validation results can be found in Courcoux and Schröder (2015). Also, the validation report, algorithm theoretical basis document and product user manual can be
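Kriging interpolates scattered retrievals onto a regular grid by weighting observations according to a spatial covariance model. The following minimal ordinary-kriging sketch estimates a value at one target point; the CM SAF kriging configuration (variogram model, parameters, neighbourhood) is not described here, so the exponential covariance, its unit sill, the 300 km length scale, the planar coordinates and all names and values are assumptions for illustration.

```python
import numpy as np


def exp_covariance(h, sill=1.0, length=300.0):
    """Exponential covariance model; sill and length (km) are assumed values."""
    return sill * np.exp(-h / length)


def ordinary_kriging(xy_obs, values, xy_target, sill=1.0, length=300.0):
    """Ordinary-kriging estimate at one target point.

    xy_obs:    (n, 2) observation coordinates in km (planar approximation)
    values:    (n,) observed values
    xy_target: (2,) target coordinates in km
    Returns the estimate and the kriging weights.
    """
    xy_obs = np.asarray(xy_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    d_tgt = np.linalg.norm(xy_obs - np.asarray(xy_target, dtype=float), axis=-1)
    # Kriging system with a Lagrange multiplier enforcing sum(weights) == 1.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = exp_covariance(d_obs, sill, length)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_covariance(d_tgt, sill, length)
    sol = np.linalg.solve(a, b)
    weights = sol[:n]
    return float(weights @ values), weights


# Hypothetical TCWV retrievals (kg/m^2) at three locations (km).
obs_xy = [[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]]
obs_val = [10.0, 20.0, 30.0]
est, w = ordinary_kriging(obs_xy, obs_val, [50.0, 50.0])
```

Without a nugget term, ordinary kriging reproduces the observed value exactly at an observation location, and the weights always sum to one.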

CFSR
The NCEP CFSR was designed and executed as a global, high-resolution coupled atmosphere-ocean-land surface-sea ice system to provide an estimate of the state of these coupled domains over the period from 1979 to the present, and is currently being run as an operational, real-time product. Features of CFSR include 1) coupling of the atmosphere and ocean during the generation of the 6-hour guess field, 2) an interactive sea ice model, and 3) assimilation of satellite radiances by the Gridpoint Statistical Interpolation scheme over the entire period. The CFSR global atmosphere resolution is ~38 km (T382) with 64 levels extending from the surface to 0.26 hPa from 1979-2009, and ~25 km (T574) after that. The CFSR atmospheric model accounts for observed variations in CO2 over the 1979-present period, together with changes in aerosols and other trace gases and solar variations. Most available in-situ and satellite observations were included in the CFSR. Satellite observations were used in radiance form, rather than as retrieved values, and were bias corrected with "spin up" runs at full resolution, taking into account variable CO2 concentrations. CFSR output products are available at an hourly time resolution and a horizontal resolution of 0.5°×0.5°. More details on the methodology and results from evaluation are given in Saha et al. (2010). The CFSR data are distributed by the National Centers for Environmental Information and the National Center for Atmospheric Research. The temperature as well as TCWV and specific humidity profile data records have been accessed via https://rda.ucar.edu/datasets/ds093.2/ in March 2016 and June 2013, respectively.

EMiR
The MWR instrument flown on board the European Remote Sensing satellites 1 and 2 as well as on board the Environmental Satellite has provided a time series of global microwave observations over a period of nearly 21 years between 1991 and 2012. The EMiR data record builds on the MWR time series by applying a one-dimensional variational approach to provide information on TCWV and wet tropospheric correction in clear and cloudy-sky conditions. Significant efforts were invested to ensure a good intercalibration of the three MWR instruments. The EMiR data record covers the entire global ice-free ocean from 11/1992 to 03/2012. It is available for individual orbits at a spatial resolution of typically 20 km as well as in gridded form as monthly mean values at 2°×2° and 3°×3° spatial resolution. Details on the retrieval, the data record and results from validation are given in Bennartz et al. (2017). The EMiR data record and supporting information can be obtained free of charge from http://dx.doi.org/10.5676/DWD_EMIR/V001. The EMiR data record (version 1) was downloaded in May 2016.
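A one-dimensional variational retrieval minimises a cost function that balances departures from a prior (background) state against departures from the observed brightness temperatures. For a linear observation operator and Gaussian errors the minimum has a closed form, sketched below. This is only the linear-Gaussian special case: a real microwave retrieval such as EMiR uses a nonlinear radiative transfer forward model and an iterative minimisation, and the matrices, state vector and function name here are hypothetical.

```python
import numpy as np


def linear_1dvar(x_b, b_cov, y, h, r_cov):
    """Minimiser of the 1D-Var cost function for a linear operator H:

        J(x) = (x - x_b)^T B^-1 (x - x_b) + (y - H x)^T R^-1 (y - H x)

    Returns the analysis state and its error covariance.
    """
    innovation = y - h @ x_b                 # observation minus background
    s = h @ b_cov @ h.T + r_cov              # innovation covariance
    gain = np.linalg.solve(s, h @ b_cov).T   # K = B H^T S^-1 (B, S symmetric)
    x_a = x_b + gain @ innovation
    a_cov = (np.eye(len(x_b)) - gain @ h) @ b_cov
    return x_a, a_cov


# Hypothetical state [TCWV, cloud liquid water path] and three channels.
h = np.array([[1.0, 0.5], [0.8, 0.1], [0.2, 1.0]])
b = np.diag([4.0, 1.0])
r = 0.25 * np.eye(3)
x_b = np.array([18.0, 0.5])
y = h @ np.array([20.0, 1.0])               # synthetic, noise-free observations
x_a, a = linear_1dvar(x_b, b, y, h, r)
```

In the scalar case with H = 1 this reduces to a variance-weighted average of background and observation.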

ERA-Interim
ERA-Interim's atmospheric general circulation model and 4D-Var assimilation system are based on the version of the ECMWF integrated forecast system that was used in the ECMWF operational system between 12 December 2006 and 5 June 2007 (Cy31r2). It is run at a reduced resolution of about 79 km in the horizontal (T255 in spectral space) and 60 levels in the vertical from the surface to 0.1 hPa. ERA-Interim is the predecessor of the ERA5 reanalysis, which is currently in production (Hersbach and Dee, 2016). ERA-Interim incorporates an improved representation of the hydrological cycle, 4D-Var, the implementation of variational bias control for satellite radiances and the assimilation of TCWV satellite retrievals using the 1D+4D-Var approach (Bauer et al., 2006a, b). Over the global oceans, boundary conditions are provided by prescribed estimates of SST and sea-ice cover. ERA-Interim exploits in situ measurements of surface pressure, 2 m temperature and 2 m relative humidity from land stations, ships and drifting buoys, near-surface wind from the latter two, and upper-air temperature, wind and specific humidity from radiosondes, pilot balloons, aircraft and wind profilers. The largest amount of data comes from polar-orbiting and geostationary satellite observations, which are mostly assimilated as brightness temperatures using suitable observation operators. In addition, satellite-derived atmospheric motion vector winds, information on surface wind from scatterometers, ozone retrievals, measurements from GPS radio occultation and ocean wave height from altimeters are ingested. Between 1992 and 2006 there is an artificial reduction in precipitation of about 0.1 mm/day over the global oceans, which can be traced back to a problem in the linearised moist physics in the 1D-Var observation operator of the above-mentioned 1D+4D-Var scheme (Geer et al., 2008).
The ERA-Interim reanalysis provides globally complete atmospheric products from 1979 onwards at a mixed 3-hourly/6-hourly output frequency and is continued with updates available about 3 months behind real time. ERA-Interim and its quality are described in Dee et al. (2011).
Monthly means of temperature as well as TCWV and specific humidity with a spatial resolution of 1°×1° were downloaded from http://apps.ecmwf.int/datasets/ in November 2012 and June 2016, respectively.

ERA-20C
ERA-20C's atmospheric general circulation model and 4D-Var assimilation system are based on the version of the ECMWF

GOME/SCIAMACHY/GOME-2 GlobVapour
The GOME/SCIAMACHY/GOME-2 family of instruments are nadir-looking spectrometers operating in the UV/VIS/NIR wavelength region. A spectral window around the H2O absorption lines near 630 nm is used for the retrieval of TCWV over both ocean and land surfaces under daylight and clear-sky conditions (Wagner et al., 2003). Here, the GOME Data Processor 4.x algorithm (Grossi et al., 2015) is applied to observations from all three sensors. The algorithm has two major steps: Differential Optical Absorption Spectroscopy (DOAS) least-squares fitting of the trace gas slant column, followed by the computation of a suitable air mass factor to convert the slant column into the vertical column density. The air mass factor algorithm is optimised for generating self-consistent long-term climatological data by minimising external inputs. Monthly time series of TCWV derived for each instrument are harmonised using GOME as reference. The product covers the period January 1996 to December 2008 and has a spatial resolution of 0.5°. Further details on the data record and results from validation can be found at http://globvapour.info/documents.html (see Product User Guide and Product Validation Report). The data record is accessible via http://globvapour.info/products.html#daftp; version 1 was downloaded in June 2016.
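The second step, converting the fitted slant column into a vertical column, can be sketched as follows. This is a minimal sketch assuming a purely geometric air mass factor (plane-parallel atmosphere, no scattering), a textbook simplification and not the optimised air mass factor of GDP 4.x; the function names and input values are hypothetical.

```python
import math


def geometric_amf(sza_deg, vza_deg):
    """Geometric air mass factor for solar zenith angle sza_deg and viewing
    zenith angle vza_deg (degrees): path down to the surface and back up to
    the satellite, relative to a vertical path. Scattering is ignored."""
    return 1.0 / math.cos(math.radians(sza_deg)) + 1.0 / math.cos(math.radians(vza_deg))


def vertical_column(slant_column, amf):
    """Convert a DOAS slant column density into a vertical column density."""
    if amf <= 0.0:
        raise ValueError("air mass factor must be positive")
    return slant_column / amf


# Hypothetical input: slant column of 60 kg/m^2 at SZA = 60 deg, nadir view.
amf = geometric_amf(60.0, 0.0)      # about 2 + 1 = 3
tcwv = vertical_column(60.0, amf)   # about 20 kg/m^2
```

The division by the air mass factor is the essential operation; a realistic water vapour air mass factor additionally depends on the vertical humidity distribution, surface albedo and clouds.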

HIRS UWisconsin
The HIRS UWisconsin moisture record retrieves TCWV as well as integrated high-, mid- and low-layer tropospheric humidity from clear-sky radiances measured by HIRS over land and ocean, both day and night. The retrieval is a statistical regression (Seemann et al., 2003) developed from an atmospheric profile database (SeeBor; Borbas et al., 2005) that consists of geographically and seasonally distributed radiosonde, ozonesonde and ECMWF reanalysis data. The AVHRR-based Pathfinder Atmospheres-Extended cloud mask is used to characterise HIRS sub-pixel cloud cover. The HIRS TCWV and integrated high-layer tropospheric humidity products are binned into a global 0.5°×0.5° map for four daily time periods (night before and after midnight, day before and after noon) and compiled into monthly values. The HIRS UWisconsin moisture package (version 2.5R2) was provided via ftp and downloaded in May 2016.
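The binning into a 0.5° map for four daily periods can be illustrated with the sketch below. It assumes that the four periods are 6-hour windows of approximate local solar time (00-06, 06-12, 12-18, 18-24 h, computed as UTC + longitude/15); the actual period boundaries and all names here are hypothetical.

```python
from collections import defaultdict

RES_DEG = 0.5  # target grid spacing in degrees


def cell_index(lat, lon, res=RES_DEG):
    """(row, col) of the res x res cell containing (lat, lon); lon in [-180, 180]."""
    n_rows, n_cols = int(round(180.0 / res)), int(round(360.0 / res))
    row = min(int((lat + 90.0) // res), n_rows - 1)   # lat = +90 goes to last row
    col = min(int((lon + 180.0) // res), n_cols - 1)  # lon = +180 goes to last column
    return row, col


def time_slot(utc_hour, lon):
    """One of four daily periods from approximate local solar time.
    ASSUMPTION: 6-hour windows; the equation of time is ignored."""
    local = (utc_hour + lon / 15.0) % 24.0
    if local < 6.0:
        return "night_after_midnight"
    if local < 12.0:
        return "day_before_noon"
    if local < 18.0:
        return "day_after_noon"
    return "night_before_midnight"


def bin_retrievals(retrievals, res=RES_DEG):
    """Average (lat, lon, utc_hour, tcwv) retrievals per (grid cell, time slot)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, utc_hour, tcwv in retrievals:
        key = (cell_index(lat, lon, res), time_slot(utc_hour, lon))
        sums[key] += tcwv
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

The per-slot averages of a month could then be combined into the monthly values.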

HOAPS
The HOAPS data record is a satellite-based climatology of TCWV, near-surface specific humidity, wind speed, precipitation, evaporation, latent heat flux and freshwater budget (evaporation minus precipitation) over the global ice-free oceans. All variables are derived from SSM/I passive microwave radiometers onboard various DMSP platforms, except for SST, which is taken from AVHRR measurements. The data record generation involves multi-satellite averages, inter-sensor calibration and an efficient sea ice detection procedure. All HOAPS products have global coverage, i.e., within ±180° longitude and ±80° latitude, and are only defined over the ice-free ocean surface. The products are available as monthly averages and 6-hourly composites on a regular latitude/longitude grid with a spatial resolution of 0.5°×0.5°. More details on the water vapour retrieval, the data record and results from validation can be found in Schlüssel and Emery (1990) […]

[…] analysis (4D-Var) and variational bias correction for satellite radiances. The products are available at monthly, 6-hourly and 3-hourly (surface parameters only) temporal resolution on the TL319 quasi-regular Gaussian grid (approximately 55 km resolution) as well as on a regular latitude/longitude grid with a spatial resolution of 1.25°×1.25°. More details on input data, methodology and quality of JRA-55 can be found in Kobayashi et al. (2015). The JRA-55 product users' handbook can be downloaded from http://jra.kishou.go.jp/JRA-55/index_en.html. Monthly means of temperature as well as TCWV and specific humidity were downloaded from http://jra.kishou.go.jp/ in February 2015 and July 2016, respectively.

Merged Microwave REMSS
The REMSS monthly merged TCWV product is assembled by combining measurements from SSM/I, SSMIS, AMSR-E, WindSat and AMSR2. The vapour algorithms for each of these instruments are very similar to the AMSR-E REMSS algorithm described above. The data from each satellite are assembled into monthly 1°×1° gridded maps with valid TCWV over the ice-free ocean. These maps are quality controlled to exclude grid points that are corrupted by ice or land emission. Then small offsets, derived from comparisons with TMI (Wentz, 2015), are applied to satellites with significant overlap with TMI. Finally, the gridded maps are combined into a single data record, using simple averaging for months when two or more satellites were operating. The merged TCWV product is available (at http://www.remss.com/measurements/atmospheric-water-vapor) as 1°×1° gridded maps from January 1988 to the present and is updated on a monthly basis. Validation results can be found in Mears et al. (2015). The TCWV data record (V7) was obtained online from http://www.remss.com/measurements/atmospheric-water-vapor/tpw-1-deg-product.html in April 2013.
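The offset correction and simple averaging described above can be sketched per grid cell as below. This is a minimal sketch assuming additive offsets and an unweighted mean over all satellites with a valid value in a cell; the satellite names and offset values are hypothetical.

```python
def merge_monthly_maps(maps_by_satellite, offsets):
    """Merge per-satellite monthly maps of one month into a single map.

    maps_by_satellite: dict satellite -> list of TCWV values per grid cell
                       (None marks cells rejected by quality control).
    offsets: dict satellite -> additive correction in kg/m^2 (hypothetical;
             satellites without an entry are left uncorrected).
    """
    n_cells = len(next(iter(maps_by_satellite.values())))
    merged = []
    for i in range(n_cells):
        values = [
            grid[i] + offsets.get(sat, 0.0)
            for sat, grid in maps_by_satellite.items()
            if grid[i] is not None
        ]
        # Simple (unweighted) average over all satellites with a valid value.
        merged.append(sum(values) / len(values) if values else None)
    return merged


maps = {"F13": [40.0, None, 10.0, None], "F14": [42.0, 30.0, None, None]}
merged = merge_monthly_maps(maps, {"F14": -1.0})
```

Cells seen by a single satellite keep that satellite's (corrected) value, and cells without any valid value stay undefined.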

MERIS GlobVapour
The high spatial resolution TCWV data record was derived from ESA's 3rd reprocessing of the MERIS L1 archive. The TCWV retrieval is based on an optimal estimation approach applied to swath-based, normalised radiances in the MERIS bands at 865 nm, 885 nm and 900 nm, while the cloud screening procedure utilises the full set of MERIS radiances between 400 and 900 nm. The product also includes information on uncertainties. Further details on the retrieval and results from validation are given in Lindstrot et al. (2012). The data record covers the period January 2003 to March 2012 and is available on a global grid at a spatial resolution of 0.05° over the coastal ocean, open ocean areas affected by sun glint, and cloud-free land surfaces. The data record (version 1) was provided via ftp in July 2015.
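The core idea of optimal estimation, combining a prior estimate with a measurement according to their uncertainties and obtaining a posterior uncertainty at the same time, can be illustrated in the scalar, linear-Gaussian case. This is a generic textbook sketch, not the MERIS algorithm, which is nonlinear and uses several bands; the forward model y = k·x and all numbers are hypothetical.

```python
def optimal_estimate(y, k, x_a, s_a, s_e):
    """Scalar linear-Gaussian optimal estimation.

    y: measurement, k: Jacobian dy/dx of the linear forward model y = k * x,
    x_a, s_a: prior mean and variance, s_e: measurement-error variance.
    Returns the posterior mean and posterior variance.
    """
    gain = s_a * k / (k * k * s_a + s_e)
    x_hat = x_a + gain * (y - k * x_a)
    s_hat = 1.0 / (1.0 / s_a + k * k / s_e)
    return x_hat, s_hat


# Hypothetical numbers: prior TCWV 20 kg/m^2 with variance 25, unit error variance.
x_hat, s_hat = optimal_estimate(y=12.0, k=0.5, x_a=20.0, s_a=25.0, s_e=1.0)
```

The posterior variance is always smaller than the prior variance, which is how an optimal estimation retrieval can report per-pixel uncertainties such as those included in the product.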

MERRA/MERRA-2
MERRA is described in Rienecker et al. (2011). While MERRA had some successes, the changing observing system, particularly sensors sensitive to water vapour, led to spurious jumps in the global TCWV record. To improve on MERRA, MERRA-2 incorporates a mass conservation constraint on dry mass and water vapour assimilation so that the global water vapour analysis increment is negligibly small, which in turn yields a balance between global evaporation and precipitation (Takacs et al., 2016; Gelaro et al., 2017). For total column water, this has led to a global time series that is much more stable than that of MERRA (Bosilovich et al., 2017). While the global water analysis increments are essentially zero, the increments can be non-zero at any given point or region, and the influence of the changing water vapour observations can still be apparent. In addition, over land surfaces, MERRA-2 uses observation-corrected precipitation as the surface source of water, providing another observational constraint on the global water cycle (Reichle et al., 2017). The input data, methodology and quality of MERRA-2 are described in Gelaro et al. (2017). Monthly means of temperature as well as TCWV and specific humidity from MERRA, with a spatial resolution of 0.5°×0.66°, were downloaded from https://goldsmr2.gesdisc.eosdis.nasa.gov/ in November 2012 and March 2013, respectively. MERRA-2 has a spatial resolution of 0.5°×0.625° and was downloaded from https://goldsmr5.gesdisc.eosdis.nasa.gov/ in April 2016.

MODIS/Aqua NASA (MYD08_M3)
The MODIS/Aqua TCWV data record from NASA was obtained from the Collection 6 MODIS near-IR products (called MOD05). The near-IR TCWV is derived from the attenuation of near-IR solar radiation by water vapour. Techniques (Gao and Kaufman, 2003) employing ratios of the water-vapour-absorbing channels 17, 18 and 19 to the atmospheric window channels 2 and 5 are used. The ratios partially remove the effects of the variation of surface reflectance with wavelength and yield atmospheric water-vapour transmittances. Because the algorithm relies on water-vapour attenuation of reflected solar radiation in the near-IR MODIS channels, the product is produced only during daytime and only over areas with a surface that is reflective in the near-IR: over clear land areas of the globe and above clouds over both land and ocean. Over clear ocean areas, water-vapour estimates are provided over the extended glint area. The Level 2 data are generated at the 1 km spatial resolution of the MODIS instrument. There are three MODIS Level 3 gridded atmosphere products (daily, 8-day and monthly), in which the Level 2 atmosphere products are aggregated to a 1°×1° equal-angle global grid (called MOD08; Platnick et al., 2015). The MODIS/Aqua MYD08 monthly mean TCWV data record (MYD08_M3) was downloaded from https://ladsweb.modaps.eosdis.nasa.gov/search/order/2/MYD08_M3--6 in May 2015.
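The channel-ratio idea can be sketched as a three-channel ratio: the window reflectances are linearly interpolated in wavelength to the absorbing channel, which removes a linear spectral trend in surface reflectance, and the absorbing-channel reflectance is divided by this interpolated value. This is a simplified sketch; the nominal centre wavelengths (about 0.865, 1.24 and 0.940 µm for channels 2, 5 and 19) and the reflectances are assumptions, and the conversion of transmittance to TCWV via lookup tables is omitted.

```python
def interpolated_window_reflectance(rho_1, lam_1, rho_2, lam_2, lam_abs):
    """Linearly interpolate the window-channel reflectances (rho_1 at lam_1,
    rho_2 at lam_2) to the wavelength lam_abs of the absorbing channel."""
    c1 = (lam_2 - lam_abs) / (lam_2 - lam_1)
    c2 = (lam_abs - lam_1) / (lam_2 - lam_1)
    return c1 * rho_1 + c2 * rho_2


def water_vapour_transmittance(rho_abs, lam_abs, rho_1, lam_1, rho_2, lam_2):
    """Three-channel ratio: absorbing-channel reflectance divided by the
    window reflectance expected at that wavelength without water vapour."""
    return rho_abs / interpolated_window_reflectance(rho_1, lam_1, rho_2, lam_2, lam_abs)


# Hypothetical reflectances; assumed nominal wavelengths in micrometres:
# window channel 2 (~0.865), window channel 5 (~1.24), absorbing channel 19 (~0.940).
t19 = water_vapour_transmittance(0.16, 0.940, 0.30, 0.865, 0.40, 1.24)
```

Here the interpolated window reflectance is 0.8·0.30 + 0.2·0.40 = 0.32, so the transmittance is 0.5.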

nnHIRS
The nnHIRS global atmospheric temperature-humidity profile data product is one of the products produced by the International Satellite Cloud Climatology Project and is based on new retrievals from re-calibrated HIRS measurements (Shi et al., 2016) with temporal (and some spatial) interpolation to provide global coverage every 3 hours over the period 1980-2015. The new retrievals include the following: (1) a cloud detection algorithm is applied to each individual field of view (pixel) and all clear pixels are processed, (2) the retrieval procedure accounts for variations of CO2 abundance over the record, (3) the retrieval procedure accounts explicitly for variations of surface topography and (4) the retrieval obtains values for near-surface air and skin temperatures separately. Because of cloud cover, the typical coverage of the globe on a given day is about 30%, but only about 10-15% at a given time of day. The product provides global coverage every 3 hours from the new retrieval results by employing time-interpolation procedures, including a specific statistical model of the diurnal variations of temperature in the lower troposphere over land. The humidity profile is extended into the stratosphere by combining the HIRS-based results with those from other satellite measurements of humidity (Davis and Rosenlof, 2016).
The near-surface humidity over oceans is adjusted based on a matched analysis of satellite microwave observations to produce the SeaFlux products (Clayson et al., 2012). The near-surface temperatures over land are adjusted based on the global collection of surface weather observations (Smith et al., 2011). Results from comparisons to various other observations are shown in the ISCCP H-Version C-ATBD (2017). Monthly means of TCWV and specific humidity from the nnHIRS data record were provided via ftp in October 2015, while monthly means of temperature were provided in August 2016. The data record was provided on a 1° equal-area grid together with software to map the data onto a regular 1°×1° longitude/latitude grid.

NVAP-M Climate / NVAP-M Ocean
NVAP-M completely replaces the heritage NVAP data set created in the 1990s and was created from polar-orbiting satellite data along with radiosondes and surface-based Global Positioning System measurements. It contains three data types oriented towards different users: "Climate" strives for maximum temporal consistency (NVAP-M Climate), "Weather" strives for maximum spatial and temporal coverage and "Ocean" is a microwave-only record over the ocean […]

[…] product being averaged to match the lower spatial resolution. Upon request, the product can also be provided at 0.05°×0.05° spatial resolution by oversampling the SSM/I product. The product is available as daily composites and monthly means and covers the period 2003-2008. The retrieval, the data record and results from validation are described in Lindstrot et al. (2014). The data record is doi-referenced (http://dx.doi.org/10.5676/DFE/WV_COMB/FP) and accessible via http://globvapour.info/products.html#daftp. The SSM/I+MERIS data record (version 1) was downloaded in January 2014.

TMI REMSS
Using a consistent processing scheme and a robust radiative transfer model, the TMI data were intercalibrated with other microwave radiometers, brightness temperatures were computed, and the ocean measurement products were then generated for distribution. This carefully applied intercalibration yields consistent products from all microwave radiometer data processed at RSS. Besides TCWV, the data record contains SST, 10 m surface wind speed, cloud liquid water and rain rate. Further details on the retrieval are given in Wentz (1997, 2015) and validation results are given in Wentz (2015). The data record is available at a grid resolution of 0.25°×0.25° and covers the period 7 December 1997 to 31 December 2014. The TCWV data record (V7) was obtained online from http://www.remss.com/missions/tmi.html in March 2015.

Processing
After download of the data records with technical specifications as described in the previous section, and prior to further processing, fill values, missing values and values outside the data-record-specific validity range were assigned a unique undefined value. To ease joint analyses, the data records are mapped onto a common grid. All data records except NVAP-M Climate and Ocean are provided as monthly means. NVAP-M Climate and Ocean contain daily averages; the daily values within a month are arithmetically averaged using all valid observations to compute monthly means. The common grid was defined as the smallest integer multiple applicable to most of the data record grids, which leads to a grid resolution of 2°×2° longitude/latitude. To remove a shift in the spatial grid between the satellite-based products and the reanalysis products, the reanalysis grids are shifted by half a grid box: the CFSR, ERA-Interim, ERA-20C, JRA-55, MERRA and MERRA-2 monthly means are linearly interpolated to a grid with unchanged spatial resolution but shifted centre positions. This approach was also applied to AMSR-E JAXA. Then, all data records are arithmetically averaged onto the common grid by considering all valid observations within a grid cell (see also Schröder et al., 2016). Note that the regridding procedure affects the internal variance of each data record and the characteristics of extremes.
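The masking and averaging steps described above can be sketched as follows: invalid values are replaced by an undefined value, and every coarse cell is the arithmetic mean of all valid fine-grid values inside it. This is a minimal sketch assuming None as the undefined value and fields stored as lists of rows; the function names are hypothetical, the half-grid-box shift of the reanalyses is omitted, and, as in the text, no area weighting is applied.

```python
UNDEFINED = None  # stands in for the archive's unique undefined value (assumption)


def to_undefined(values, fill_values, valid_min, valid_max):
    """Replace fill values, missing values and out-of-range values by UNDEFINED."""
    return [
        v if (v is not None and v not in fill_values and valid_min <= v <= valid_max)
        else UNDEFINED
        for v in values
    ]


def mean_of_valid(values):
    """Arithmetic mean of all valid values, or UNDEFINED if there are none."""
    valid = [v for v in values if v is not UNDEFINED]
    return sum(valid) / len(valid) if valid else UNDEFINED


def block_average(field, factor):
    """Average a 2-D field (list of rows) onto a grid `factor` times coarser,
    using all valid observations in each coarse cell (no area weighting)."""
    n_rows, n_cols = len(field), len(field[0])
    coarse = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            block = [field[ii][jj]
                     for ii in range(i, min(i + factor, n_rows))
                     for jj in range(j, min(j + factor, n_cols))]
            row.append(mean_of_valid(block))
        coarse.append(row)
    return coarse
```

For example, a 0.5° monthly field would be averaged with factor=4 to reach the 2° common grid, and the daily NVAP-M values of one month and one grid cell would be combined with mean_of_valid.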
Some reanalysis products contain valid values below the surface, i.e., at pressure levels exceeding the surface pressure. Thus, a common surface pressure mask was applied to all data records. The common surface pressure mask is computed on a monthly basis from MERRA monthly mean surface pressure and then interpolated onto the common grid. Data at pressure levels exceeding this surface pressure were set to the undefined value. Where surface pressure information is an integral part of the individual data record, both masks were applied. Data are provided at four standard pressure levels: 300 hPa, 500 hPa, 700 hPa and 1000 hPa. No vertical interpolation was required because all profile data records include these four levels. In nnHIRS and some reanalysis products, levels below the surface, e.g., in the presence of mountains, are not present.
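The surface pressure masking of a single profile can be sketched as below: a level is set to undefined when its pressure exceeds the surface pressure, and applying a second mask is equivalent to masking with the lower of the two surface pressures. The function name, the use of None as undefined value and the example profile are hypothetical.

```python
LEVELS_HPA = (300.0, 500.0, 700.0, 1000.0)  # standard levels of the profile records


def mask_below_surface(profile, surface_pressure_hpa, levels=LEVELS_HPA):
    """Set values at levels with pressure above the surface pressure
    (i.e. below ground) to None, which stands in for the undefined value."""
    return [
        value if (value is not None and p <= surface_pressure_hpa) else None
        for value, p in zip(profile, levels)
    ]


# Hypothetical specific humidity profile (g/kg) over elevated terrain.
q = [0.1, 1.0, 4.0, 12.0]
common = mask_below_surface(q, 850.0)     # common (MERRA-based) mask
both = mask_below_surface(common, 650.0)  # record's own surface pressure
```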
All data records are provided as Network Common Data Form (netCDF) files (http://www.unidata.ucar.edu/software/netcdf/). The data files conform to the NetCDF Climate and Forecast (CF) Metadata Conventions version 1.5 (http://cf-pcmdi.llnl.gov/). In addition to the TCWV (kg/m²), specific humidity (g/kg) or temperature (K) data, each file contains the longitude and latitude values, defined at the grid-cell centres. For the profile data records, the pressure vector (in hPa) is included as well.

Archive structure
The G-VAP data archive contains TCWV as well as profiles of specific humidity and temperature. To allow a joint analysis of all data records in the archive, the maximum common period covered by all data records was identified, which is January 2003 to December 2008. To also provide data records that allow analysis and intercomparison in a climate context, a second set of data records was defined which spans the period January 1988 to December 2008, chosen by seeking both a maximum common temporal coverage and a maximum number of available data records. The common period of the specific humidity and temperature profile data records spans January 1988 to December 2009. Thus, the G-VAP data archive consists of four folders: 22 TCWV data records covering the period 2003-2008, 11 TCWV data records covering the period 1988-2008, as well as seven specific humidity and seven temperature data records covering the period 1988-2009. An overview of the folders is given in Table 3. The file names contain information on parameter, key sensor, data record name, data record version, period and G-VAP data archive version (v1.0).
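Identifying the maximum common period amounts to intersecting the coverage periods of all records, and checking how many records span a longer candidate period shows the trade-off between period length and number of records. The sketch below assumes coverages given as (year, month) tuples, which compare chronologically; the record names and periods are hypothetical.

```python
def common_period(coverages):
    """Maximum period covered by all records; coverages maps a record name to
    ((start_year, start_month), (end_year, end_month)). None if empty."""
    start = max(first for first, _ in coverages.values())
    end = min(last for _, last in coverages.values())
    return (start, end) if start <= end else None


def records_covering(coverages, start, end):
    """Names of the records whose coverage spans the whole period start..end."""
    return sorted(name for name, (first, last) in coverages.items()
                  if first <= start and last >= end)


# Hypothetical coverages of three records.
cov = {
    "A": ((1988, 1), (2009, 12)),
    "B": ((2003, 1), (2012, 3)),
    "C": ((1996, 1), (2008, 12)),
}
```

With these inputs, all three records share January 2003 to December 2008, while only record A spans 1988-2008.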
Besides the cautionary note on the impact of regridding on the variability of each data record, the following comments need to be taken into account when working with the G-VAP data archive: the data records were downloaded between 2012 and 2016, and new versions of individual elements of the archive may have become available in the meantime. Moreover, individual data records differ in terms of spatial and temporal sampling, coverage and applied masks. As G-VAP continues, the G-VAP data archive will be updated over the coming years, with a new release planned for approximately 2020. The updated release will include newly generated data records, data records which have not yet been considered and new versions of elements of the current G-VAP data archive. It is planned to keep the overall format and technical specifications of the G-VAP data archive, with one important change: to minimise the gap between the last year covered by the archive and the release date, and to enhance applicability, the data records will be provided with their individual temporal coverage rather than restricted to a common period.

Results from intercomparison
To provide an example of the analysis possible with this archive and to reveal and, in part, explain differences among the data records, results from the intercomparison of TCWV data records over the period January 2003 to December 2008, i.e., using the data records from the folder TCWV/short, are shown in this section. In this way, all elements of the G-VAP data archive are considered in this intercomparison effort (see Table 3). Intercomparison results for long-term data records and for profile data records are given in Schröder et al. (2016, 2017a, 2018). The analysis was carried out on the basis of monthly means on a regular latitude/longitude grid with 2° resolution. The ensemble mean, the differences and the absolute and relative standard deviations were computed at each grid point using all valid observations from all data records of the G-VAP data archive. The relative standard deviation was normalised by the ensemble mean. Figure 1 shows the ensemble mean and the respective absolute and relative standard deviations based on the 22 data records. As not all data records provide global coverage, the number of available data records differs regionally. Standard deviations are generally lowest over ocean areas and larger over land. Regional maxima in standard deviation occur over central Africa, the tropical warm pool and South America. The largest relative standard deviations (>25%) are found in polar and high mountain regions.

The intercomparison is further refined by dividing the 22 data records into three classes: clear-sky, cloudy-sky and all-sky. A data record is assigned to one of these classes according to the predominant condition required for the retrieval to be applicable.
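The per-grid-point statistics used in this intercomparison (ensemble mean, absolute standard deviation, and standard deviation relative to the ensemble mean) can be sketched as below. This is a minimal sketch: it uses only valid (non-None) values and assumes the sample standard deviation (n - 1), since the text does not state which estimator was used; the function name is hypothetical.

```python
import statistics


def ensemble_stats(values):
    """Ensemble statistics at one grid point from all valid (non-None) values.

    Returns (mean, std, relative std in %, number of records). The standard
    deviation is the sample standard deviation (n - 1), an assumption; it is
    None when fewer than two records are valid.
    """
    valid = [v for v in values if v is not None]
    n = len(valid)
    if n == 0:
        return None, None, None, 0
    mean = statistics.mean(valid)
    if n < 2:
        return mean, None, None, n
    std = statistics.stdev(valid)
    rel_std = 100.0 * std / mean if mean != 0 else None
    return mean, std, rel_std, n
```

Returning the number of valid records alongside the statistics reflects the point above that the number of available data records differs regionally.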
Some data records include observations under predominantly clear-sky conditions, while others include observations under predominantly clear-sky and cloudy-sky conditions, or under all-sky conditions, i.e., additionally in the presence of strong precipitation. The clear-sky class is further divided into data records which allow observations on global scales and data records with observations above land only. Note that the monthly mean products have been used without any effort to harmonise the cloud and precipitation screening, and that separating the data records according to the predominant condition of retrieval applicability essentially also separates data records from different sensor types. For simplicity, the terms clear-sky, cloudy-sky and all-sky class are used to indicate the predominant condition of retrieval applicability. […] Microwave REMSS, and TMI REMSS. The data records agree generally well, with relative standard deviation values below 10% over large parts of the ocean, except for the polar (ocean) regions, where standard deviation values larger than 25% are found. The clear-sky class includes data records based on measurements from infrared and ultraviolet/visible/near-infrared sensors that cannot see through clouds and are therefore predominantly limited to clear-sky conditions: AIRWAVE, HIRS UWisconsin, GOME/SCIAMACHY/GOME-2 GlobVapour, MYD08_M3, MERIS GlobVapour, nnHIRS and SSMI+MERIS. HIRS UWisconsin, GOME/SCIAMACHY/GOME-2 GlobVapour, MYD08_M3 and nnHIRS have global coverage, whereas AIRWAVE covers the ocean only and the MERIS-related products cover land only. Results are presented separately for all data records with global coverage (third row in Figure 2) and all data records with coverage over land areas (fourth row in Figure 2). The results reveal large standard deviations between the data records over both land and ocean. Large relative standard deviations are again found over the polar regions, over both land and ocean.
The ITCZ, deserts (e.g., the Sahara) and mountain regions are affected by large differences. Systematic differences might occur in mountain regions due to differences in original spatial resolution. In addition, the treatment of topography differs between the data records. Over land, the ambiguity between surface properties and near-surface atmospheric properties makes retrieving humidity in near-surface layers challenging, and differences in its treatment contribute to the observed features over land.
To investigate in more detail which data records cause the different areas of increased standard deviation, the difference relative to the ensemble mean was calculated for each of the 22 data records. The corresponding maps are shown in Figure 3. All IR-based retrievals (AIRWAVE, HIRS UWisconsin, nnHIRS, MYD08_M3) exhibit large positive differences over the subtropical high-pressure zones. GOME/SCIAMACHY/GOME-2 GlobVapour also exhibits a feature in these areas but shows negative differences, i.e., lower values than the IR-based retrievals. ATOVS CM SAF and MERRA-2 exhibit positive differences over all continents, while nnHIRS shows generally negative differences over land. On regional scales over land, ERA-20C and nnHIRS show the largest negative differences, whereas ATOVS CM SAF, MYD08_M3 and NVAP-M Climate show the largest positive differences. These regions are mainly located in the tropics, i.e., over central Africa, South America and the Sahara. Here the reanalysis data records also differ, […] variability as a possible source of the differences found between the data records and to assess the presence of systematic differences among the three classes. The reanalysis data records (all-sky class) show good agreement over time over the tropical ocean area, with a spread of around 2 kg/m² among the different data records. This is not surprising, as the spatial maps already showed good agreement among the reanalyses over ocean. For the two other classes, however, areas of larger differences were found in Figure 2. Accordingly, relatively large spreads among the data records, of the order of 5 kg/m² and 6 kg/m², were found for the cloudy-sky and clear-sky classes, respectively. Within the cloudy-sky class, the spread is mainly caused by NVAP-M Climate and Ocean, which both show larger values than the other data records, which in turn agree within 2 kg/m².
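A regional mean time series, such as the tropical ocean average discussed above, and the spread among data records can be sketched as below. This sketch assumes cos(latitude) area weighting of the 2° cells and defines the spread as the maximum minus the minimum across records in each month; neither choice is stated in the text, and the function names are hypothetical.

```python
import math


def area_weighted_mean(field, lats, mask):
    """cos(latitude)-weighted mean of the valid cells selected by mask.

    field: list of rows (one per latitude) of TCWV values, None = undefined;
    lats: cell-centre latitude of each row in degrees; mask: same shape as
    field, True for cells inside the region (e.g. tropical ocean).
    """
    num = den = 0.0
    for row, lat, mask_row in zip(field, lats, mask):
        weight = math.cos(math.radians(lat))
        for value, selected in zip(row, mask_row):
            if selected and value is not None:
                num += weight * value
                den += weight
    return num / den if den > 0.0 else None


def monthly_spread(series_by_record):
    """Maximum minus minimum across records for each month (None skipped)."""
    spreads = []
    for values in zip(*series_by_record.values()):
        valid = [v for v in values if v is not None]
        spreads.append(max(valid) - min(valid) if valid else None)
    return spreads
```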
Despite the offsets between the data records, the anomalies agree well between the data records from the cloudy-sky class. Only the EMiR anomalies exhibit individual months with larger differences (~1 kg/m²) in 2005 and 2006 with respect to the other anomaly time series. For the data records from the clear-sky class, an offset between the individual records and differences in the seasonal cycle and its magnitude were found. GOME/SCIAMACHY/GOME-2 GlobVapour and MYD08_M3 show pronounced semi-annual cycles with maxima in spring and autumn. Reasonable agreement is found between GOME/SCIAMACHY/GOME-2 GlobVapour and AIRWAVE, as well as between HIRS UWisconsin, MYD08_M3 and nnHIRS, until the end of 2006. Afterwards, this agreement disappears. This is partly due to a small increase in TCWV for HIRS UWisconsin over the period 2003-2008 and to a decrease in TCWV for GOME/SCIAMACHY/GOME-2 GlobVapour occurring in late 2006 and 2007. The latter change coincides with the launch of Metop-A, carrying the GOME-2 instrument, at the end of 2006.

Frequently observed regional maxima in standard deviation (Figure 1 and Figure 2) and in absolute difference (Figure 3) occur in regions with persistent and large mean cloud cover. Sohn and Bennartz (2008) argue that the specific humidity (and with it TCWV) within clouds is generally larger than in surrounding clear-sky areas. This clear-sky bias is of the order of 10% (Sohn and Bennartz, 2008), which is similar to the difference between HIRS UWisconsin, MYD08_M3 and nnHIRS on the one hand and AMSR-E JAXA, AMSR-E REMSS, ATOVS CM SAF, HOAPS, Merged Microwave REMSS and TMI REMSS on the other (Figure 4).
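The anomaly time series compared above remove each record's own mean seasonal cycle, which is why records with different offsets can still have well-agreeing anomalies. A minimal sketch follows, assuming the climatology is computed over the full series of each record (e.g. 2003-2008); the reference period actually used is not stated here, and the function names are hypothetical.

```python
def monthly_climatology(series, start_month=1):
    """Mean value for each calendar month (index 0 = January) of a monthly
    series starting in calendar month start_month; None values are skipped."""
    sums = [0.0] * 12
    counts = [0] * 12
    for k, value in enumerate(series):
        if value is None:
            continue
        m = (start_month - 1 + k) % 12
        sums[m] += value
        counts[m] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def anomalies(series, start_month=1):
    """Deseasonalised series: each value minus the climatology of its calendar month."""
    clim = monthly_climatology(series, start_month)
    result = []
    for k, value in enumerate(series):
        ref = clim[(start_month - 1 + k) % 12]
        result.append(None if value is None or ref is None else value - ref)
    return result
```

A constant offset added to a whole series cancels in the anomalies, whereas a step change such as the one coinciding with the GOME-2 launch would remain visible.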
However, the variability in TCWV within the cloudy-sky and the clear-sky class is relatively large (Figure 4) and prohibits conclusions on systematic differences between the all-sky and cloudy-sky classes as well as between the cloudy-sky and clear-sky classes. As noted previously, the presented results are based on a separation of the data records according to the predominant retrieval condition and not according to the actual atmospheric condition.

Data availability
The G-VAP data archive was generated in April 2017. The data are available in netCDF format and can be accessed via http://dx.doi.org/10.5676/EUM_SAF_CM/GVAP/V001 (Schröder et al., 2017b).

Conclusions
An overview of satellite sensors capable of retrieving quantitative information on atmospheric water vapour and of available water vapour data records based on satellite observations and reanalyses is given. The summary tables on available data records include information on technical specifications such as coverage and resolution in order to enable a first-order decision on their use. 22 data records remain after restricting the temporal coverage to a minimum of 10 years. These data records are described in more detail in terms of methodology and basic technical specifications and form the basis of the G-VAP data archive. The archive contains profiles of temperature in K (1988-2009) and of specific humidity in g/kg (1988-2009), as well as long-term TCWV (1988-2008) and short-term TCWV (2003-2008), both in kg/m². All data records have been regridded onto a common regular longitude/latitude grid with 2° spatial resolution. The profile data records contain information at four levels: 1000, 700, 500 and 300 hPa. The G-VAP data archive is freely available at http://dx.doi.org/10.5676/EUM_SAF_CM/GVAP/V001 and includes the vast majority of available water vapour data records from satellites and reanalyses with a temporal coverage of at least 10 years. The G-VAP data archive is designed to allow direct intercomparisons using a subset of or the full archive. Except for analyses that require a high level of stability, the archive can be used in the context of climate analysis, climate model evaluation and the analysis of impacts and dynamics associated with large-scale climate-relevant phenomena such as El Niño. To give an example of the analysis possible with this archive and to showcase differences among the data records, TCWV data records over the period January 2003 to December 2008 were intercompared. In this way, all 22 elements of the G-VAP data archive are considered.
The intercomparison revealed regions of distinct differences among the data records, namely central Africa, the tropical warm pool, South America, and polar and mountain regions. To further refine the analysis, the data records were separated into three classes of predominant condition of retrieval applicability: all-sky, cloudy-sky and clear-sky. The spread of the time series within each class can be large and exceeds the differences between the classes. Also, the separation into the three classes essentially also separates data records from different sensor types. Thus, conclusions on systematic differences between the classes cannot be drawn.
G-VAP will continue to work on the characterisation of water vapour data records. In particular, new data records and updated versions of already analysed data records will be implemented into future versions of the G-VAP data archive in order to continue striving for completeness and to describe any changes in quality between the different versions.

Appendix A
This appendix gives definitions for the abbreviations that are used in this paper.

Table 1: Summary of main satellite instruments used for water vapour climate data records. Spatial resolution is typically given at nadir. The last column provides information on the applicability of typical water vapour retrieval schemes under certain conditions and in certain regions.