A Decade of GOSAT Proxy Satellite CH 4 Observations

This work presents the latest release (v9.0) of the University of Leicester GOSAT Proxy XCH 4 dataset. Since the launch of the GOSAT satellite in 2009, this data has been produced by the UK National Centre for Earth Observation (NCEO) as part of the ESA Greenhouse Gas Climate Change Initiative (GHG-CCI) and Copernicus Climate Change Service (C3S) projects. With over a decade of observations now available, we outline the many scientific studies achieved using past versions of this data in order to highlight how this latest version may be used in the future. We describe in detail how the data is generated, providing information and statistics for the entire processing chain from the L1B spectral data through to the final quality-filtered column-averaged dry-air mole fraction (XCH 4 ) data. Of the 19.5 million observations made between April 2009 and December 2019, we determine that 7.3 million (37.6%) are sufficiently cloud-free to process further, and we ultimately obtain 4.6 million (23.5%) high-quality XCH 4 observations. We separate these totals by observation mode (land and ocean sun-glint) and by month, to provide data users with the expected data coverage, and we highlight periods with reduced observations due to instrumental issues.


Introduction
Atmospheric methane (CH 4 ) is the second most important greenhouse gas in terms of anthropogenic climate radiative forcing (Myhre et al., 2013) with a global warming potential on a 100-year time-scale of 28-34 times that of CO 2 (Etminan et al., 2016) on a mass/mass basis. This strong warming potential, when coupled to its short lifetime relative to that of CO 2 (Prather et al., 2012) makes it of particular interest when considering rapid and achievable mitigation strategies (Nisbet et al., 2020).
Scientific debate continues over how to explain the atmospheric CH 4 trend observed over the past couple of decades. Records from surface sites reveal a plateau from 2000 to 2007 and a resumed increase after 2007 (Rigby et al., 2008; Dlugokencky et al., 2009). Amongst the varied surface sources of CH 4 , the largest are natural wetlands, agriculture, livestock, biomass burning, waste and fossil fuel production, whereas the primary sink is the OH radical in the atmosphere (Kirschke et al., 2013; Saunois et al., 2020). Various hypotheses have been offered that attempt to attribute the behaviour in the global growth rate to a particular component or mechanism (Nisbet et al., 2016; Schaefer et al., 2016; Hausmann et al., 2016; McNorton et al., 2016b; Buchwitz et al., 2017a; Worden et al., 2017; Turner et al., 2017; Rigby et al., 2017) but currently there is no consensus within the community. Many of these studies have utilised satellite observations of atmospheric CH 4 and have shown the increasing capability of such measurements to characterise global and regional surface methane fluxes (Jacob et al., 2016).
This work presents the most recent update to the University of Leicester GOSAT Proxy XCH 4 Retrieval. This version (v9.0) now provides over a decade of global total column CH 4 observations, from April 2009 to December 2019. A full reprocessing of the entire time series has been performed to ensure consistency throughout the record and to ensure that results utilising the entire record are as robust as possible.
It is the intention of the authors that this study acts as a reference for everyone making use of the data and as such, we have attempted to provide as much practical detail as possible on the usage of the data. Section 2 describes the GOSAT observations themselves and highlights any instrument anomalies or data gaps. Section 3 is broken down into several sub-sections detailing the usage of previous versions of this data by the scientific community. Section 4 gives an overview of the retrieval method and details the end-to-end data processing chain including statistics on throughput and data availability. Section 5 shows the validation of the data against the Total Carbon Column Observing Network, characterising not only the final Proxy XCH 4 data but also the individual components of the retrieval. Section 6 provides details of the global distribution of the data. Section 7 further characterises the data by performing model comparisons at global and regional scales. Finally, we provide a summary and recommendations for future use in Section 8.

GOSAT TANSO-FTS Observations
GOSAT was launched in January 2009 by the Japan Aerospace Exploration Agency (JAXA), as the first satellite mission dedicated to making greenhouse gas observations (Kuze et al., 2009). GOSAT is nicknamed "Ibuki", meaning breath in Japanese, highlighting that its mission involves monitoring the breathing of the planet, through measurement of the carbon cycle. In order to achieve this, GOSAT is equipped with a high-resolution Fourier Transform Spectrometer (TANSO-FTS: Thermal And Near infrared Sensor for carbon Observations - Fourier Transform Spectrometer). Shortwave infrared bands at 0.76 µm (O 2 ), 1.6 µm (CO 2 and CH 4 ) and 2.0 µm (CO 2 ) all provide near-surface sensitivity while a thermal infrared band between 5.5 and 14.3 µm provides mid-tropospheric sensitivity.
The objective for GOSAT was to provide routine measurements appropriate for regional and continental-scale flux estimates. Kuze et al. (2009) state a target relative accuracy of 2% for CH 4 over 3 month averages at 100-1000 km spatial scales. The ESA GHG-CCI User Requirements Document (URD) specifies goal (G), breakthrough (B) and threshold (T) requirements, with the goal requirements being the most stringent (Buchwitz et al., 2017a). For XCH 4 these values are 9, 17 and 34 ppb respectively for precision and 1, 5 and 10 ppb for relative accuracy. We discuss in Section 5 how we exceed these breakthrough requirements.
As an FTS acquisition is relatively slow (∼4 seconds), the GOSAT sampling strategy is tailored to achieve this goal by measuring with a relatively large footprint of 10.5 km, with soundings spaced ∼263 km apart across-track and ∼283 km apart along-track. This means that while GOSAT does not "image" the surface, it does return to the same location every 3 days, allowing a long time series of comparable measurements to be obtained. As well as nominally measuring over land in nadir mode, GOSAT is also capable of measuring over the ocean, which is normally too dark in the SWIR. This is achieved in the so-called "ocean sun-glint" observation mode, when the sun-satellite angle allows for a sufficiently strong reflected signal from the glint spot.
Discussion of the GOSAT measurement sampling strategy and its evolution over time can be found in Appendix E.

Instrument Anomalies and Data Gaps
Throughout its 10 years of operation, GOSAT has experienced a number of incidents resulting in instrument anomalies (Kuze et al., 2016). These incidents include:
- May 2014 - a solar paddle incident resulting in a temporary instrument shutdown.
- January 2015 - a switch to the secondary pointing mechanism due to degradation of the primary system.
- August 2015 - a cryocooler shutdown and restart.
- May 2018 - a CDMS (Command and Data Management System) incident resulting in GOSAT being inactive for 2 weeks.
- November 2018 - rotation anomaly of the second solar paddle.
The temporary reduction in observations related to these incidents is discussed in Section 4.3 and reflected in Figure 4.

Studies Utilising Proxy XCH 4 Data
The University of Leicester GOSAT Proxy XCH 4 data is produced operationally for the ESA Greenhouse Gas Climate Change Initiative (Buchwitz et al., 2017b) and the Copernicus Climate Change Service (C3S) (Buchwitz et al., 2018) as well as routinely for the UK National Centre for Earth Observation. This work details Version 9.0 of the University of Leicester GOSAT Proxy XCH 4 data but previous versions over the past decade have been used for a wide variety of scientific studies.
This section details some of these past studies in order to highlight the potential applications for this data.

Validation of data
Firstly, before any conclusions can be drawn from analysis of the data, the data itself must be validated to ensure its robustness and reliability. Previous versions of the data have been extensively validated against the TCCON network (Total Carbon Column Observing Network) as part of the ESA Climate Change Initiative (Parker et al., 2011; Dils et al., 2014; Buchwitz et al., 2017a), including extensive validation of the model XCO 2 used in the generation of the data. We have also performed validation of the data against aircraft profile observations over the Amazon, one of the most important and challenging regions for the retrieval.

Comparison to other satellite observations
Although GOSAT was the first satellite mission dedicated to measuring GHGs, successful CH 4 retrievals were performed previously from SCIAMACHY and continue to be performed from new missions such as TROPOMI onboard Sentinel-5 Precursor and the recently launched GOSAT-2. Furthermore, many thermal infrared missions are capable of measuring CH 4 (IASI, AIRS, TES, CrIS), albeit with sensitivity to the mid-troposphere and little sensitivity to the surface. Nevertheless, it is important that these different observations are consistent and their capabilities well-understood if we wish to perform long-term analysis. The ESA Greenhouse Gas Climate Change Initiative (ESA GHG-CCI) (Buchwitz et al., 2017a) made substantial efforts to characterise and validate these different observations (Dils et al., 2014). The ensemble median algorithm (EMMA) (Reuter et al., 2020) homogenises the SCIAMACHY and GOSAT datasets produced via the ESA GHG-CCI project and is intended to be a long time series dataset for climate applications.
Studies such as Cressot et al. (2014) and Alexe et al. (2015) have investigated the consistency between flux inversions utilising SCIAMACHY, GOSAT (and IASI) CH 4 observations and generally found good consistency in derived emissions. Worden et al. (2015) combined the surface sensitivity of GOSAT with the mid-tropospheric sensitivity of the NASA TES instrument to better estimate the lower tropospheric methane (and hence surface) emissions, while Siddans et al. (2017) have compared their IASI CH 4 product to GOSAT observations, finding good consistency between the two.
Perhaps the most important scientific question related to atmospheric CH 4 concentrations is understanding the observed long-term behaviour. The cause of the so-called "hiatus" or plateau in atmospheric CH 4 between 2000 and 2007 remains unresolved, with various studies speculating on the reason. Although the GOSAT record unfortunately only began in 2009, after the end of the plateau period, it can still help to characterise behaviour and understand the processes that may have contributed to the stalling. GOSAT data have been successfully used to infer long-term global fluxes. As well as contributing to the Global Methane Budget assessments (Saunois et al., 2016, 2020), GOSAT data have been used to assess the role of regional wetland emissions (McNorton et al., 2016b; Maasakkers et al., 2019) and the role of OH variability as a potential cause for the stalling in growth rate (McNorton et al., 2016a; Maasakkers et al., 2019).

Regional emissions
GOSAT data have been successfully utilised in regional-scale studies to determine CH 4 fluxes over many different regions.
These types of studies are of particular interest as they can help inform policy-related discussions on validation and verification of regional or country-scale emission targets, such as those relevant to the Paris Agreement (Bergamaschi et al., 2018a). Fraser et al. (2013) performed regional flux inversions and found large changes over Temperate Eurasia and Tropical Asia, with the satellite observations providing a significant error reduction over only using surface data. Wecht et al. (2014) performed continental-scale inversions over North America and produced estimates of Californian CH 4 emissions and found consistent emission estimates over the Los Angeles Basin between the satellite inversion and that from a dedicated aircraft campaign. Turner et al. (2015) extended this work to the entire US and inferred a US anthropogenic CH 4 source over 50% larger than that from EDGAR and EPA bottom-up inventories. Satellite inversion results from Alexe et al. (2015) showed a redistribution of CH 4 emissions in the US from the north-east to south-central. These results are consistent with recent independent studies that suggest that bottom-up estimates of North American fossil fuel emissions (particularly related to natural gas and petroleum production facilities) are systematically underestimated. Ganesan et al. (2017) used GOSAT data to infer India's CH 4 emissions between 2010-2015 and found average emissions of 22.0 Tg yr −1 to be consistent with the emissions reported by India to the UNFCCC with no significant trend over time. Sheng et al. (2018) performed a similar study over the US, Canada and Mexico and found that US emissions increased by 2.5% over the 7-year study period and attributed this to contributions from the oil and gas industry and livestock. In Feng et al. (2017) the individual XCO 2 and XCH 4 components from the Proxy retrieval are used to infer regional CO 2 and CH 4 fluxes simultaneously. Finally, Lunt et al. (2019) inferred CH 4 emissions over tropical Africa and found a linear increase of between 1.5 and 2.1 Tg yr −1 for 2010-2016, attributing much of this to a short-term increase in emissions over the Sudd wetland area in South Sudan.

UoL Proxy XCH 4 Retrieval
The University of Leicester Full-Physics retrieval algorithm (UoL-FP) is based on the original NASA Orbiting Carbon Observatory (OCO) "Full-Physics" retrieval algorithm (Connor et al., 2008; Boesch et al., 2011; O'Dell et al., 2012) which was designed to simultaneously fit the short-wave infrared radiances in the 0.76 µm O 2 -A band and the 1.6 µm and 2 µm CO 2 bands. This algorithm has been adapted for use on GOSAT observations and modified to perform a variety of different retrievals, including the Proxy method described here. The radiative transfer calculations are accelerated using the Low-Stream Interpolation approach (O'Dell, 2010).
The concept behind the Proxy XCH 4 retrieval approach (Frankenberg et al., 2006) is that the majority of atmospheric scattering and instrument effects will be similar for CH 4 and CO 2 mole fraction retrievals performed in a common absorption band (around 1.6 µm, where both CO 2 and CH 4 have absorption features). By taking the ratio of the retrieved XCH 4 /XCO 2 , the CO 2 acts as a "proxy" for the modifications to the light path induced by scattering (Butz et al., 2010) and cancels out those in the CH 4 retrieval. This means that moderate aerosol scattering does not adversely impact upon the retrieval, resulting in a higher number of high-quality XCH 4 observations compared to the full-physics approaches where much stricter post-filtering is often required. This is especially useful over the tropics, where moderate aerosol or cirrus effects can limit the coverage of full-physics methods but affect the proxy approach far less severely.

In order to convert the retrieved XCH 4 /XCO 2 ratio into a final XCH 4 quantity, a model-based estimate of XCO 2 is used, according to the equation:
$$ X_{\mathrm{CH_4}}^{\mathrm{proxy}} = \frac{X_{\mathrm{CH_4}}}{X_{\mathrm{CO_2}}} \times X_{\mathrm{CO_2(model)}} \qquad (1) $$
where the relative variability of CO 2 in the atmosphere is known to be much lower than that of CH 4 . This leads to the primary disadvantage of this method, i.e. that the model-based estimates of XCO 2 may introduce biases in the retrieved CH 4 .

In an attempt to minimise such biases, CO 2 dry-air mole fractions XCO 2(model) used in Eq. (1) in the UoL proxy retrieval scheme are obtained by taking the median of the estimates produced by three atmospheric chemistry transport models which have assimilated surface in-situ data: GEOS-Chem (Feng et al., 2011), NOAA CarbonTracker (Peters et al., 2007), and CAMS (Chevallier et al., 2010).
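The combination of the ensemble median and Eq. (1) can be illustrated with a minimal sketch. The function name and all numerical values below are hypothetical and chosen only for illustration; the averaging-kernel treatment of the model XCO 2 and the final bias offset are deliberately omitted.

```python
from statistics import median

def proxy_xch4(ratio_ppb_per_ppm, model_xco2_ppm):
    """Return proxy XCH4 (ppb) from the retrieved XCH4/XCO2 ratio (ppb/ppm)
    and a collection of model XCO2 estimates (ppm)."""
    # Ensemble median of the model XCO2 estimates.
    xco2_model = median(model_xco2_ppm)
    # Eq. (1): scale the retrieved ratio by the model XCO2.
    return ratio_ppb_per_ppm * xco2_model

# Illustrative numbers only (ASSUMPTION: not taken from the dataset).
retrieved_xch4_ppb = 1800.0
retrieved_xco2_ppm = 400.0
ratio = retrieved_xch4_ppb / retrieved_xco2_ppm  # 4.5 ppb/ppm
models = {"GEOS-Chem": 401.0, "CarbonTracker": 402.5, "CAMS": 401.8}
xch4 = proxy_xch4(ratio, models.values())
```

Using the median rather than the mean means that a single outlying model cannot shift the XCO 2 estimate by itself.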
The advantage of the Proxy retrieval approach compared to the "Full-Physics" retrieval as typically used for CO 2 (Boesch et al., 2011; Cogan et al., 2012), is that Proxy retrievals are less sensitive to instrumental effects, and require less-strict quality filtering (Schepers et al., 2012; Parker et al., 2015), thereby ensuring a better coverage of regions (especially in the tropics), where Full-Physics retrievals are particularly challenging.

Retrieval Inputs and A Priori Generation
In order to prepare all of the necessary inputs to the retrieval, we use the Leicester Retrieval Preparation Toolset (LRPT) software. The latest version of the GOSAT Level 1B files (version 210.210) are acquired directly from the NIES GDAS Data Server and are processed with the LRPT to extract the measured radiances along with all required sounding-specific ancillary information such as the measurement time, location and geometry. These measured radiances have the recommended radiometric calibration and degradation corrections applied as per Yoshida et al. (2013) with an estimate of the spectral noise derived from the standard deviation of the out-of-band signal. We then format the spectral data for input into the UoL-FP retrieval algorithm and generate a list containing all of the ancillary data necessary to create the retrieval a priori information.
Sounding-specific a priori information is generated for all individual soundings present in the sounding-selector list above.
Atmospheric temperature and water vapour profiles are taken from ECMWF ERA-Interim up to August 2019 and ERA-5 thereafter. CO 2 profile information is taken from the 16r1 CAMS atmospheric inversion (Chevallier, 2019) and incremented by the NOAA estimated global growth rate for recent years. CH 4 profiles are taken from a combination of the MACC-II CH 4 inversion (v10-S1NOAA - https://apps.ecmwf.int/datasets/data/macc-ghg-inversions/) for the troposphere and from a dedicated TOMCAT stratospheric chemistry model simulation (Chipperfield, 1999) for the stratospheric component. This ensures that the CH 4 a priori profiles are sufficiently vertically resolved and capture the sharp decrease in concentration around the tropopause (Figure 1 (top)). As the MACC-II data is only available until 2012, the data after this period is repeated each year.
Figure 3. Sankey diagram detailing the retrieval throughput at each step of the processing chain for the GOSAT Proxy XCH4 retrieval. As well as the absolute number of soundings, the percentage relative to the initial total is also given.
All of the atmospheric profiles are then interpolated to a sounding-specific retrieval grid. For each sounding a 20-level pressure-based retrieval grid is generated that ranges from the top of the atmosphere (0.1 hPa) to 20 hPa beneath the surface pressure as estimated by ERA-Interim. This 20 hPa buffer allows the surface pressure to be adjusted during the initial cloud-screening process without leading to unphysical extrapolation of the a priori profiles.
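The construction of a sounding-specific grid and the interpolation of an a priori profile onto it might look roughly as follows. This is only a sketch: the even spacing of the levels, the linear interpolation in pressure, the constant values outside the source range and both function names are assumptions, since the text specifies only the number of levels, the top pressure and the 20 hPa buffer.

```python
import bisect

def retrieval_grid(surface_pressure_hpa, n_levels=20, top_hpa=0.1, buffer_hpa=20.0):
    """Pressure levels (hPa) from the top of the atmosphere to 20 hPa below the surface."""
    bottom = surface_pressure_hpa + buffer_hpa
    # ASSUMPTION: evenly spaced levels; the source does not state the spacing.
    step = (bottom - top_hpa) / (n_levels - 1)
    return [top_hpa + i * step for i in range(n_levels)]

def interpolate_profile(src_p, src_vals, dst_p):
    """Linearly interpolate a profile in pressure (src_p strictly increasing).
    ASSUMPTION: values are held constant outside the source pressure range."""
    out = []
    for p in dst_p:
        if p <= src_p[0]:
            out.append(src_vals[0])
        elif p >= src_p[-1]:
            out.append(src_vals[-1])
        else:
            # Index of the first source level with pressure greater than p.
            j = bisect.bisect_right(src_p, p)
            w = (p - src_p[j - 1]) / (src_p[j] - src_p[j - 1])
            out.append(src_vals[j - 1] + w * (src_vals[j] - src_vals[j - 1]))
    return out

# Illustrative values only: a 1000 hPa surface and a three-level CH4 profile (ppb).
grid = retrieval_grid(1000.0)
prior = interpolate_profile([0.1, 500.0, 1000.0], [400.0, 1700.0, 1850.0], [750.0, 1020.0])
```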
In addition to atmospheric a priori information, we also generate sounding-specific a priori information for the spectral dispersion and surface albedo directly from the GOSAT spectra.
The retrieval algorithm also requires the input of spectroscopic parameters for the species being simulated. We use v4.2.0 of the OCO line lists for CO 2 , H 2 O and O 2 , and take CH 4 parameters from the TCCON line lists (Toon, 2015).

Cloud-Screening
Prior to the XCH 4 and XCO 2 retrievals, cloudy GOSAT soundings are identified and excluded by using the UoL-FP retrieval algorithm to obtain the apparent surface pressure from O 2 A-band spectra and comparing it to the surface pressure provided by the ECMWF reanalysis (Dee et al., 2011). If the absolute difference between the retrieved and the ECMWF surface pressure is larger than 30 hPa, a sounding is flagged as cloudy and excluded from further processing. A loose threshold is used for the surface pressure difference because this procedure only aims to identify and remove soundings which are significantly cloudy. Partially cloudy scenes, or scenes where optically thin clouds are present, are processed by the retrieval algorithm, and are dealt with through a post-retrieval quality filtering scheme described later in this section.
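The cloud test described above reduces to a single threshold comparison, sketched below. The function name, the dictionary layout and the pressure values are hypothetical; only the 30 hPa threshold comes from the text.

```python
CLOUD_THRESHOLD_HPA = 30.0  # threshold quoted in the text

def is_cloudy(retrieved_psurf_hpa, ecmwf_psurf_hpa, threshold_hpa=CLOUD_THRESHOLD_HPA):
    """Flag a sounding as cloudy when the apparent (O2 A-band) surface pressure
    differs from the reanalysis surface pressure by more than the threshold."""
    return abs(retrieved_psurf_hpa - ecmwf_psurf_hpa) > threshold_hpa

# Illustrative soundings (ASSUMPTION: made-up pressures in hPa).
soundings = [
    {"id": "a", "p_ret": 1005.0, "p_ecmwf": 1010.0},  # 5 hPa difference
    {"id": "b", "p_ret": 850.0, "p_ecmwf": 1008.0},   # 158 hPa difference
    {"id": "c", "p_ret": 980.0, "p_ecmwf": 1010.0},   # exactly 30 hPa difference
]
cloud_free = [s["id"] for s in soundings if not is_cloudy(s["p_ret"], s["p_ecmwf"])]
```

A cloud above the surface shortens the light path, so the apparent surface pressure is typically lower than the reanalysis value; taking the absolute difference, as the text states, also catches the less common opposite case.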

XCH 4 and XCO 2 Retrievals
For soundings that pass the cloud-screening procedure described above, retrievals with the UoL-FP algorithm for CO 2 and CH 4 mole fraction profiles are carried out separately. The state vector for these retrievals consists of 20-level profiles for CH 4 and CO 2 mole fractions along with profile scaling factors for H 2 O mole fraction and temperature. Parameters for surface albedo and spectral dispersion are also included, allowing us to explicitly fit the wavelength for each spectrum independently.
A post-retrieval quality filtering is then carried out, by selecting the retrievals that meet the following criteria: (1) goodness-of-fit (χ 2 ) parameter between 0.4 and 1.9 for both CH 4 and CO 2 ; (2) a posteriori error smaller than 20 ppb for CH 4 and 3 ppm for CO 2 ; (3) retrieved XCH 4 larger than 1650 ppb and XCO 2 larger than 350 ppm; and (4) latitude north of 60°S (to exclude Antarctica). Figure 2 shows an example of the spectral fits for one month of data (August 2016) for all 25,274 successful, quality-filtered soundings measured over land. The top panels show the averaged measured radiances in the 1.6 µm CO 2 (left) and 1.65 µm CH 4 (right) retrieval windows, with the bottom panels showing the residual differences to the final simulated spectra. The estimated instrument noise is indicated by the shaded area and the residuals are found to be within the noise.
The retrieved XCH 4 and XCO 2 satisfying the aforementioned quality criteria are then used in Eq. (1), together with the ensemble median model XCO 2 described earlier in this section. Prior to the calculation of XCO 2 to be used in Eq. (1), model CO 2 profiles are convolved with scene-dependent instrument averaging kernels computed as part of the CO 2 retrieval.
Before the final production of the data files, an offset is subtracted from the retrieved XCH 4 to remove a residual mean bias to TCCON (see Section 5). Currently, a single offset value of 9.06 ppb is used. This offset is applied to all analyses presented here and is built into the final delivered data.
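The four post-retrieval quality criteria listed in this section can be expressed as a single predicate. The sketch below uses a hypothetical `Retrieval` record with made-up field names; treating the χ 2 bounds as inclusive is an assumption, as the text does not say whether the limits themselves pass.

```python
from dataclasses import dataclass

@dataclass
class Retrieval:
    # Hypothetical field names, used for illustration only.
    chi2_ch4: float
    chi2_co2: float
    err_ch4_ppb: float   # a posteriori error on XCH4
    err_co2_ppm: float   # a posteriori error on XCO2
    xch4_ppb: float
    xco2_ppm: float
    latitude: float      # degrees north

def passes_quality_filter(r):
    """Apply criteria (1) to (4) from the text."""
    return (
        # (1) goodness of fit (ASSUMPTION: bounds are inclusive)
        0.4 <= r.chi2_ch4 <= 1.9 and 0.4 <= r.chi2_co2 <= 1.9
        # (2) a posteriori errors
        and r.err_ch4_ppb < 20.0 and r.err_co2_ppm < 3.0
        # (3) physically plausible retrieved columns
        and r.xch4_ppb > 1650.0 and r.xco2_ppm > 350.0
        # (4) north of 60 degrees south
        and r.latitude > -60.0
    )

good = Retrieval(1.0, 1.1, 8.0, 1.2, 1820.0, 402.0, 12.5)
antarctic = Retrieval(1.0, 1.1, 8.0, 1.2, 1760.0, 400.0, -72.0)
poor_fit = Retrieval(2.4, 1.1, 8.0, 1.2, 1820.0, 402.0, 12.5)
```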
A summary of the throughput of the whole processing chain described in this section is shown in Fig. 3. This shows that in total, between April 2009 and December 2019 we have 19.5 million individual GOSAT soundings which are ingested into the LRPT software. Of these, 95.4% are successfully preprocessed, with the 4.6% that fail largely due to incomplete or invalid L1B data. A very small number of successfully preprocessed soundings (< 0.1%) cannot be processed further as we are unable to estimate a noise for those spectra. Of the 18.6 million soundings that continue and are attempted for cloud-clearing, 17.7 million are able to be successfully cloud-cleared. Of these, just over 7 million soundings are found to be cloud free, with over 10 million determined to be cloudy. A successful CH 4 retrieval is performed on the majority of these cloud-free soundings, with just 41,597 failing the retrieval. Of the 7.3 million successful CH 4 retrievals, 2.7 million are rejected by our final quality filtering. It should be noted that currently we exclude all retrievals below 60°S (i.e. Antarctica) due to low signal-to-noise and difficulty in distinguishing low cloud from the snow-covered surface. This alone accounts for over 1 million of the 2.7 million rejected retrievals. Finally, we are left with almost 4.6 million successful and quality-controlled XCH 4 retrievals (23.5% of the total measurements performed).
Figure 4 shows the number of successful retrievals broken down by month and also split into land and glint observation modes. This figure is particularly useful as it highlights any systematic differences in data density over time (e.g. from changes in the GOSAT sampling strategy) and also highlights abrupt data gaps (e.g. from instrument anomalies). In particular, it shows that the increase in monthly data from 2014/2015 onwards is largely a result of an increase in the number of glint observations, which is a direct result of the instrument sampling changes that increased the valid glint range. This highlights that some care should be taken when using the data for some applications, as consistent temporal/spatial data coverage cannot be assumed over the whole data record. Large data gaps, such as in January 2015 and December 2018, are also highlighted and indicate where care may need to be taken when analysing over these periods.

Validation Against TCCON
Evaluation against the Total Carbon Column Observing Network (TCCON) is the primary mechanism by which satellite-based measurements of XCO 2 and XCH 4 are validated.
The TCCON network consists of ground-based high-resolution Fourier transform spectrometers, performing direct measurements of solar spectra in the near-infrared. There are currently 27 operational sites located across North America, Europe, Asia and Oceania, including several islands in the southern hemisphere and other remote areas. TCCON sites have become operational at different times (see Table A1) and hence the data record length varies between sites, with Burgos (Philippines) and Nicosia (Cyprus) the most recent to come online in 2017 and 2019 respectively. Also note that the Lauder site (Pollard et al., 2017) has multiple instruments and we have kept these records separate. This work uses the latest available TCCON data, GGG2014. Detailed dataset citations are available for each site in Table A1.
For comparison between TCCON and GOSAT, all GOSAT soundings within ±5° of a TCCON site are taken. For these soundings, the average of the TCCON data within ±2 hours of the GOSAT overpass time is calculated, resulting in GOSAT-TCCON pairs when there is TCCON data available. It is these matched GOSAT-TCCON pairs that are then subsequently analysed. In total we use 22 of these sites within our analysis (see Table A1), omitting some sites with insufficient data coverage or high-altitude sites where the total column may not be well-represented or co-located well to the satellite observations.
[...] in Figure 6 which show that generally the GOSAT-TCCON difference is small (< 5 ppb), the standard deviation (which can be considered to be the single measurement precision of the GOSAT data) is typically between 10-15 ppb, the correlation coefficient is generally high (0.7-0.9) and there are many co-located GOSAT-TCCON measurements, with the distribution changing considerably between TCCON sites. Another important validation metric is the relative accuracy, or inter-station bias. This metric is an indication of any spatio-temporal variability of the bias and is defined in Dils et al. (2014) as the standard deviation of the individual site biases. We obtain a value of 3.89 ppb for this metric, again smaller than the estimated TCCON accuracy of ±4 ppb. This meets the "breakthrough" user requirement for the systematic error of 5 ppb as defined by Buchwitz et al. (2017a).
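The co-location rule (±5° in space, ±2 hours in time) can be sketched as follows. Interpreting "within ±5°" as a latitude/longitude box is an assumption, and the function name, data layout, site coordinates and values are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

def colocate(gosat, site_lat, site_lon, tccon_obs, max_deg=5.0, max_hours=2.0):
    """Return a (GOSAT XCH4, mean TCCON XCH4) pair, or None if there is no match.

    gosat: dict with 'lat', 'lon', 'time' (datetime) and 'xch4' (ppb).
    tccon_obs: list of (datetime, xch4_ppb) tuples for one site.
    ASSUMPTION: the spatial criterion is a lat/lon box around the site.
    """
    if abs(gosat["lat"] - site_lat) > max_deg:
        return None
    # Longitude difference wrapped to the range [0, 180] degrees.
    dlon = abs(gosat["lon"] - site_lon) % 360.0
    if min(dlon, 360.0 - dlon) > max_deg:
        return None
    window = timedelta(hours=max_hours)
    matches = [x for t, x in tccon_obs if abs(t - gosat["time"]) <= window]
    if not matches:
        return None
    return gosat["xch4"], mean(matches)

# Illustrative data only.
sounding = {"lat": 38.0, "lon": -95.0, "time": datetime(2016, 8, 1, 19, 0), "xch4": 1805.0}
tccon = [
    (datetime(2016, 8, 1, 18, 0), 1810.0),   # 1 h before the overpass: used
    (datetime(2016, 8, 1, 20, 30), 1814.0),  # 1.5 h after the overpass: used
    (datetime(2016, 8, 1, 23, 0), 1900.0),   # 4 h after the overpass: ignored
]
pair = colocate(sounding, 36.6, -97.5, tccon)
```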
In total across all TCCON sites we find 88,345 matching GOSAT-TCCON data pairs. The correlation between the GOSAT and TCCON data is shown in Figure 7, presented as a 2-D kernel density estimation (KDE) plot, along with the corresponding marginal 1-D KDE plots on the X and Y axes. An overall difference of 9.06 ppb is removed from the GOSAT data so that, by design, the absolute average difference to TCCON is 0 ppb. The overall standard deviation or single measurement precision is found to be 13.72 ppb with a correlation coefficient of 0.92. The single measurement precision of 13.72 ppb is comfortably better than the precision breakthrough requirement of 17 ppb (Buchwitz et al., 2017a), indicating that it "would result in a significant improvement for the targeted application". Although the data contributing to this plot are from a wide variety of TCCON sites in different locations and at different latitudes, the distribution appears consistent and is tightly aligned to the one-to-one (dashed) line. However, there are signs of a potential hemispheric or latitudinal bias in the data against TCCON, although this is not apparent at all sites; for example Karlsruhe, Lamont, Tsukuba and Lauder all have negligible biases but span a large latitude range. It should also be noted that the uncertainty on the TCCON XCH 4 is approximately 4 ppb and for the majority of sites the GOSAT-TCCON difference is within this uncertainty, so care must be taken to not over-interpret any signals at this scale.
Figure 8. Correlation between the 88,345 matching TCCON XCH4/XCO2 ratios and co-located GOSAT XCH4/XCO2 ratios retrieved as a raw fundamental part of the Proxy XCH4 retrieval (Equation 1). The data is presented as a 2-dimensional kernel density estimation (KDE) plot. The distribution sits along the one-to-one line (grey dashed) with a standard deviation of 0.03 ppb/ppm and a correlation coefficient (r) of 0.89. An overall bias to TCCON of 0.02 ppb/ppm is present in this raw data. The individual KDE plots are shown along the upper and right margins.
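The validation metrics quoted in this section (mean bias, single measurement precision, correlation coefficient and relative accuracy) can be computed from matched pairs along the following lines. This is a sketch with made-up numbers and hypothetical function names; the use of the sample standard deviation is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def validation_stats(pairs_by_site):
    """pairs_by_site maps a site name to a list of (gosat_ppb, tccon_ppb) pairs."""
    all_pairs = [p for pairs in pairs_by_site.values() for p in pairs]
    diffs = [g - t for g, t in all_pairs]
    site_biases = [mean([g - t for g, t in pairs]) for pairs in pairs_by_site.values()]
    return {
        "bias": mean(diffs),            # mean GOSAT-TCCON difference
        "precision": stdev(diffs),      # scatter of the differences
        "r": pearson_r([g for g, _ in all_pairs], [t for _, t in all_pairs]),
        # Relative accuracy: standard deviation of the per-site biases.
        "relative_accuracy": stdev(site_biases),
    }

# Illustrative pairs only (ASSUMPTION: not real GOSAT or TCCON values).
stats = validation_stats({
    "site_A": [(1810.0, 1800.0), (1820.0, 1812.0)],
    "site_B": [(1795.0, 1790.0), (1805.0, 1798.0)],
})
```

Subtracting `stats["bias"]` from every GOSAT value sets the overall mean difference to zero, which is the role played by the single 9.06 ppb offset in the dataset.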

XCH 4 /XCO 2 Ratio Validation
In addition to validation of our final XCH 4 Proxy dataset, TCCON data allow us the opportunity to validate the different components in Equation 1, namely the XCH 4 /XCO 2 ratio and the model-derived XCO 2 . Figure 8 shows the correlation between the retrieved GOSAT XCH 4 /XCO 2 ratio (ppb/ppm) (with no bias correction applied) and the corresponding ratio calculated from TCCON. The correlation is excellent, with a coefficient of 0.89 across the 88,345 matching data points and a standard deviation of just 0.03 ppb/ppm. An average offset between the two datasets of 0.02 ppb/ppm exists and is of a very similar magnitude to the global offset of 9.06 ppb that is removed from the final data. To be clear, the final bias correction which we apply to the Proxy XCH 4 is almost entirely attributed to this bias that we identify here in the XCH 4 /XCO 2 ratio. It should also be noted here that the TCCON data itself has a bias correction applied to the XCO 2 and XCH 4 data. This airmass-independent correction factor derived from airborne calibrations is 1/0.9898 for XCO 2 and 1/0.9765 for XCH 4 (Wunch et al. (2010), Table 5). It is considered that this correction is mainly a result of deficiencies in the spectroscopy, which likely apply to the GOSAT retrievals as well and might go some way to explaining this small difference between TCCON and GOSAT.
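The correspondence between the ratio offset and the XCH 4 offset follows from Equation 1. As a rough worked example, assuming a representative XCO 2 of about 400 ppm over the 2009-2019 period (a value assumed here for illustration, not one quoted in the text):

$$ \Delta X_{\mathrm{CH_4}} \approx \Delta\!\left(\frac{X_{\mathrm{CH_4}}}{X_{\mathrm{CO_2}}}\right) \times X_{\mathrm{CO_2}} \approx 0.02\ \mathrm{ppb/ppm} \times 400\ \mathrm{ppm} = 8\ \mathrm{ppb} $$

Since the ratio offset is quoted to only one significant figure, this is consistent with the 9.06 ppb offset; a ratio offset of about 0.0225 ppb/ppm would reproduce it at 400 ppm.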

Validation of XCO 2 Model
To validate the XCO 2 model data used in the generation of the final Proxy data, we evaluate the model median XCO 2 mixing ratios against TCCON but also evaluate the three individual models, sampled at the time and location of the GOSAT soundings, with the GOSAT sounding-specific averaging kernel applied. These XCO 2 models are all independent of TCCON data but do assimilate NOAA surface site measurements, some of which are nearby to TCCON sites. Figure 9 shows

Global CH 4 Distributions
As discussed in Section 3, one of the primary applications for the GOSAT Proxy XCH 4 data has been as input to global flux inversions. For this reason, it is useful to examine the global spatio-temporal distribution of the data. Figure  The purpose of this paper is to present details of the v9.0 Proxy dataset and provide information to facilitate the future use of the data. As such, it is not the intention that this work performs detailed scientific analysis and interpretation. We do not, for example perform any atmospheric flux inversions using this data as that is a significant study in its own right. However, we do feel that it would be informative to users of the data for us to perform a comparison against existing model XCH 4 simulations to give confidence that the data is of sufficient quality to use in such studies. 15 In this section we compare the GOSAT Proxy XCH 4 data to a simulation of the TM5 global chemistry transport model (Bergamaschi et al., 2013(Bergamaschi et al., , 2018b which has assimilated NOAA surface measurements. We have chosen to compare against model simulations that are both widely used within the community and that have already assimilated NOAA surface measurements. The reasoning for this is that any overall differences as might be seen from free-running model simulation are removed and we can clearly state the consistency of our dataset with the NOAA network. By proving good overall agreement to both  very similar characteristics and is in very good agreement to the GOSAT data. The difference between the two datasets (lower 10 panel) shows that although there is a small offset between the two (with GOSAT on average 6.55 ppb larger than the model), there is very good consistency over time. GOSAT and TM5 seem to agree slightly better during the peak of the seasonal cycle, particularly in the tropics, with the TM5 exhibiting a shallower trough. 
At very high northern latitudes, GOSAT is slightly lower than the model, but this relates to observations over Greenland at high altitude and with a low signal-to-noise ratio for the GOSAT soundings, so care must be taken not to over-interpret this difference.
With Figure 11 providing confidence that the GOSAT data and TM5 simulation are in broad agreement, it is informative to break these comparisons down to a regional scale. Figure 12 shows timeseries of the GOSAT and TM5 data over the 16 different TRANSCOM regions (Gurney et al., 2002). All regions show good agreement between the modelled and observed data. We compute the de-seasonalised XCH 4 over time for each region. The average difference between model and observation ranges from 3.9 ppb (Eurasian Boreal) to 15.4 ppb (Southern Tropical Asia). On average across all regions, the mean difference between the model and observation is 9.8 ppb.
The observed seasonal cycles in each region are very well represented by the model, with an average correlation coefficient of 0.93 (ranging from 0.84 to 0.98 across all regions). The peak-to-peak seasonal cycle timing and magnitude are very well reproduced between the two datasets. For example, the average peak-to-peak seasonal cycle amplitude for Northern Tropical
It is not the purpose of this paper to diagnose or interpret detailed differences between the observations and model, but it is useful to make a few observations relevant for use of the data within future studies. Figure 12 indicates a potential latitude-dependent bias, which is most likely due to model deficiencies in simulating the stratosphere (especially at mid and high latitudes (Patra et al., 2011; Alexe et al., 2015; Saad et al., 2016; Wang et al., 2017)) or inadequate inter-hemispheric mixing, but could also partly indicate some latitudinal bias of the satellite retrievals. Accounting for such a latitudinal dependence through the fitting of a second-order polynomial function (as in Bergamaschi et al. (2013); Turner et al. (2015)) may improve the baseline agreement between model and observation and is an approach that users may wish to explore depending upon their application. Furthermore, these model simulations are constrained by the NOAA background observations. Therefore, differences between TM5-4DVAR and GOSAT may partly reflect deficiencies of the bottom-up inventories used as prior, particularly over strong emission regions (e.g. obvious deficiencies in tropical Africa related to wetlands). When incorporating the GOSAT data into such inversions, this leads to the production of significant increments in the inverted fluxes and better agreement between observation and simulation (as in Alexe et al. (2015) and other studies noted in Section 3.4).
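The second-order polynomial treatment of a latitudinal dependence mentioned above can be illustrated with a minimal sketch. It assumes that the quantity being fitted is the model-minus-observation difference averaged in latitude bins, which is a choice made here for illustration rather than the procedure of the cited studies. The function names and the sample values are hypothetical, and plain Python is used so that the sketch has no dependencies.

```python
def fit_latitudinal_bias(latitudes, differences):
    """Least-squares fit of differences ~ c0 + c1*u + c2*u**2, with u = latitude / 90."""
    if len(latitudes) != len(differences) or len(latitudes) < 3:
        raise ValueError("need at least three (latitude, difference) pairs")
    u = [lat / 90.0 for lat in latitudes]  # scale to [-1, 1] for numerical stability
    # Power sums and moments that make up the 3x3 normal equations
    s = [sum(x ** k for x in u) for k in range(5)]
    t = [sum(d * x ** k for x, d in zip(u, differences)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 4):
                a[row][k] -= factor * a[col][k]
    # Back substitution
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (a[i][3] - tail) / a[i][i]
    return coeffs


def evaluate_bias(coeffs, latitude):
    """Evaluate the fitted polynomial at a latitude given in degrees."""
    u = latitude / 90.0
    return coeffs[0] + coeffs[1] * u + coeffs[2] * u * u


# Hypothetical latitude-bin centres (degrees) and mean differences (ppb)
lat_bins = [-60.0, -30.0, 0.0, 30.0, 60.0]
mean_diff = [12.0, 9.5, 8.0, 9.0, 11.5]
coeffs = fit_latitudinal_bias(lat_bins, mean_diff)
residual = [d - evaluate_bias(coeffs, lat) for lat, d in zip(lat_bins, mean_diff)]
```

Whether such a correction is applied to the model or to the observations, and whether it is fitted once or per time period, depends on the application and is left open here.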

Summary and Outlook
In this work we have presented the latest version of the University of Leicester GOSAT Proxy XCH 4 dataset. This dataset now contains over a decade of global CH 4 observations, sensitive to surface emissions and hence suited to estimating CH 4 fluxes. The capability to estimate global and regional CH 4 emissions is vital to improving our understanding of the global methane budget and how this budget may respond and change with respect to a changing future climate.
We begin this work by highlighting the wide variety of studies that previous versions of this dataset have contributed towards, demonstrating the significant utility of this dataset for examining and understanding the global methane budget.
This work provides a thorough description of the data processing chain, explaining in detail how the data is generated and how the high quality of the dataset is ensured. Extensive validation of the data against the TCCON network is performed, validating not only the final Proxy XCH 4 data but also the separate components (the XCH 4 /XCO 2 ratio and the modelled XCO 2 ) that form the final data product.
We also provide global seasonal maps of the data that demonstrate the global distribution of the data as well as highlighting particular features and regions that may be of interest for more detailed study.
Finally, as the primary usage of the data is expected to be as input into a flux inversion data assimilation framework in conjunction with atmospheric chemistry transport models and observations from surface networks, it is useful to compare the consistency against existing model simulations. We compare zonally and regionally against TM5 simulations that have assimilated observations from the NOAA surface network. We find generally a high level of consistency whilst identifying the additional utility that the satellite observations should introduce to the system.
Despite GOSAT-1 having a planned mission lifetime of 5 years, it continues to successfully perform measurements 11 years after launch. GOSAT-2 was launched in October 2018 (Suto et al., 2020) and will continue the legacy of the GOSAT-1 mission.
GOSAT-2 offers several opportunities for development related to the dataset we describe here. Primarily, it ensures that should GOSAT-1 cease operation, the valuable decade-long timeseries of observations can continue to be extended via GOSAT-2.
With a significant overlap in time between the two missions, consistency between them can be assured, albeit with significant future development work.
In addition, GOSAT-2 has additional capabilities, namely the possibility of measuring carbon monoxide (CO). By measuring CO 2 , CH 4 and CO simultaneously from the same instrument, GOSAT-2 would allow the extension of studies examining biomass burning combustion, leading to constraints on fire emission ratios as have been performed previously for GOSAT-1 (Ross et al., 2013; Parker et al., 2016).
A strong focus of future CH 4 -measuring satellites will be to examine anthropogenic emission sources at very high spatial resolution (e.g. PRISMA (Pignatti et al., 2013); HISUI (Matsunaga et al., 2017); ENMAP (Guanter et al., 2015)), particularly relating to monitoring of the oil and gas industry. However, many scientific challenges and questions remain regarding the long-term CH 4 behaviour and the response to a changing climate. For this reason, a long-term, consistent climate-ready data record as we present here is of continued importance. We expect that this data will be valuable for numerous studies, from regional flux inversions to monitoring long-term trends. With now over a decade of global atmospheric XCH 4 observations, this dataset has helped, and will continue to help, us better understand the global methane budget and investigate how it may respond to a future changing climate.

Data availability
The University of Leicester GOSAT Proxy v9.0 XCH 4 data

Appendix B: Previous Data Versions
Table B1 outlines the history and evolution of the University of Leicester GOSAT Proxy XCH 4 data product. Entries include the version number, the project that the data was generated for, the version of the GOSAT L1B data used, the time period covered by the data, whether ocean sun-glint data was generated, comments relating to changes/updates from previous versions and peer-reviewed publications that we are aware of that used the data. For the ESA GHG-CCI project, we also indicate which versions were officially delivered as part of the Climate Research Data Packages through the project. All Copernicus C3S versions were delivered to the Copernicus Climate Data Store.

Appendix C: Data Contents and Usage Notes
This section provides information on the contents and usage of the netCDF data files that we provide containing the Proxy XCH 4 data. Whilst we recommend that anyone using the data should discuss their specific usage with the author, the following information is useful to note.
Our data is delivered as daily netCDF files, containing n individual GOSAT soundings. We provide everything in the data files that we believe users would require to make use of our data, including our a priori information (ch4_profile_apriori) and averaging kernels (xch4_averaging_kernel) which are provided on m vertical levels (see Appendix D).
In general, users should only use data that passes our quality checks (i.e. xch4_quality_flag == 0). In some specific use cases, the data that has failed our checks may still be of use but additional care should be taken in using this data and we strongly recommend discussing such applications with us to determine if that is suitable for your use.
Table B1. Table showing the evolution of the University of Leicester GOSAT Proxy XCH 4 data product. Entries include the version number, the project that the data was generated for, the version of the GOSAT L1B data used, the time period covered by the data, whether ocean sun-glint data was generated, comments relating to changes/updates from previous versions and peer-reviewed publications that we are aware of that used the data. For the ESA GHG-CCI project, we also indicate which versions were officially delivered as part of the Climate Research Data Packages through the project. All Copernicus C3S versions were delivered to the Copernicus Climate Data Store.
We provide data from both observation modes (nadir land and ocean sun-glint) in the same file. While we do not believe that we have any bias between these different modes, for some use cases, users may wish to exclude either of these modes.
The variable named xch4 refers to the final Proxy XCH 4 as calculated using Equation 1. This is the main data product that we provide. In addition to this, we also provide the other components of Equation 1. raw_xch4 and raw_xco2 refer to the directly retrieved XCH 4 and XCO 2 quantities. These variables should generally not be used but may be useful for certain applications. For example, some users may wish to use the XCH 4 /XCO 2 ratio (i.e. raw_xch4/raw_xco2) but replace the model XCO 2 that we use (model_xco2) with their own modelled XCO 2 which may be more appropriate to their particular application or more consistent with their own model transport.
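As an illustration of the substitution described above, the following minimal sketch recomputes a Proxy XCH 4 value from the retrieved ratio and a user-supplied modelled XCO 2 . It assumes that Equation 1 has the form of the retrieved XCH 4 /XCO 2 ratio multiplied by the modelled XCO 2 , as the description of the variables implies. The function name and the sample values are hypothetical, and reading the variables from the netCDF files is left out.

```python
def recompute_proxy_xch4(raw_xch4, raw_xco2, user_model_xco2):
    """Return (raw_xch4 / raw_xco2) * user_model_xco2 for each sounding.

    raw_xch4 and raw_xco2 are the directly retrieved quantities from the
    data file; user_model_xco2 is the user's own modelled XCO2, sampled at
    the same soundings (an assumption of this sketch).
    """
    if not (len(raw_xch4) == len(raw_xco2) == len(user_model_xco2)):
        raise ValueError("all inputs must have one value per sounding")
    return [
        ch4 / co2 * model
        for ch4, co2, model in zip(raw_xch4, raw_xco2, user_model_xco2)
    ]


# Hypothetical values for two soundings (XCH4 in ppb, XCO2 in ppm)
raw_xch4 = [1800.0, 1818.0]
raw_xco2 = [400.0, 404.0]
my_model_xco2 = [410.0, 408.0]
new_xch4 = recompute_proxy_xch4(raw_xch4, raw_xco2, my_model_xco2)
```

For the result to be comparable with the provided xch4 variable, the user's modelled XCO 2 would presumably need the same treatment as model_xco2 (for example, application of the sounding-specific averaging kernel), which this sketch does not attempt.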
Our retrievals are typically performed on 20 vertical levels, with the first (bottom) level being the surface pressure. However, in a number of instances, especially over high terrain, where the apparent surface pressure from our O 2 -A band cloud screening is above the bottom two levels of our pressure profile, this can result in only 19 active retrieval levels. This data is still valid but variables with a vertical dimension (m) will contain a fill_value of -9999.99 at the first (lowest) level. This value should be checked for and that particular profile should be considered to only have 19 levels, rather than the standard 20.
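The check described above can be sketched as follows. This is a minimal illustration that assumes a profile has already been read into a plain Python sequence; the function name is hypothetical. A small tolerance is used in the comparison because the fill value may not be stored exactly if the file uses single-precision floats, which is an assumption about the file format.

```python
import math

FILL_VALUE = -9999.99  # fill value quoted in the text for inactive levels


def active_levels(profile, fill_value=FILL_VALUE):
    """Return the profile values on active retrieval levels only."""
    return [
        value
        for value in profile
        if not math.isclose(value, fill_value, abs_tol=1e-3)
    ]


# Hypothetical 20-level profile whose lowest level is inactive
profile = [FILL_VALUE] + [1800.0 - 5.0 * i for i in range(19)]
valid = active_levels(profile)
n_levels = len(valid)  # number of active levels for this sounding
```

The same mask would need to be applied consistently to every variable with a vertical dimension for that sounding.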
We identify individual GOSAT soundings by their exposure_id. This may be of use when attempting to match our data to other GOSAT data products. This is a numerical identification that matches the GOSAT L1B file which the sounding was extracted from, appended with an additional 3 digits (0-indexed) to identify the number of the sounding within that L1B file.
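A common use of the a priori profile and averaging kernel variables listed in this appendix is to transform a model CH 4 profile so that it is comparable with a sounding. The sketch below uses the standard column form following Rodgers (2000), in which the modelled XCH 4 is the pressure-weighted sum over levels of the a priori plus the averaging kernel times the model-minus-a-priori difference. That this matches the dataset's own equation term for term is an assumption, and the function name and three-level sample profile are hypothetical (the real files use 20 levels).

```python
def apply_averaging_kernel(model_ch4, apriori, averaging_kernel, pressure_weight):
    """Smooth a model CH4 profile with a sounding's averaging kernel.

    All inputs are per-level sequences on the same pressure grid:
      model_ch4        model CH4 profile, interpolated to the retrieval levels
      apriori          ch4_profile_apriori for the sounding
      averaging_kernel xch4_averaging_kernel for the sounding
      pressure_weight  pressure_weight for the sounding
    Returns sum_i h_i * (x_a,i + a_i * (x_m,i - x_a,i)).
    """
    n = len(model_ch4)
    if not (len(apriori) == len(averaging_kernel) == len(pressure_weight) == n):
        raise ValueError("all profiles must be on the same vertical grid")
    return sum(
        h * (x_a + a * (x_m - x_a))
        for x_m, x_a, a, h in zip(model_ch4, apriori, averaging_kernel, pressure_weight)
    )


# Hypothetical three-level example (ppb), pressure weights summing to one
model = [1900.0, 1850.0, 1800.0]
prior = [1800.0, 1800.0, 1800.0]
kernel = [1.0, 1.0, 1.0]
weights = [0.25, 0.5, 0.25]
model_xch4 = apply_averaging_kernel(model, prior, kernel, weights)
```

With an averaging kernel of one at every level the result reduces to the pressure-weighted model column, and with a kernel of zero it reduces to the a priori column. Levels carrying a fill value would need to be removed from all four inputs before calling the function.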
Equation C1

In order to correctly compare any model simulation to the satellite observations, the model data must be transformed to be consistent with assumptions made within the retrieval. Ultimately, this requires the satellite averaging kernels to be applied to the model data with any influence from the a priori data taken into account. The theory and methodology to do this is described in detail in Rodgers (2000) and we only briefly outline the method below. Equation D1 is the equation which should be applied to any CH 4 model data and details which variables provided in the data files are required to achieve this. It is assumed that any model data has already been interpolated to the same 20-level pressure grid (pressure_levels) as used in the retrieval. It should be noted here that this interpolation should be done with care to try and ensure that the model XCH 4 is conserved via the interpolation process. Once on the same vertical grid as the GOSAT a priori and averaging kernels, Equation D1 can be applied to compute the modelled XCH 4 by using the pressure_weight, xch4_averaging_kernel and ch4_profile_apriori variables provided in the file. It should be noted that we provide these values for each individual GOSAT sounding and that these are all level (i.e. layer boundary) quantities.

This section provides details on how the GOSAT measurement strategy has evolved over time. GOSAT initially operated primarily on a regular 5-point grid, revisiting the same grid point. This changed over time to a 3-point grid in order to reduce pointing uncertainty at the extreme angles when using the primary pointing mechanism. However, one consequence of a regular grid is that this resulted in a limited number of observations over some regions, especially islands, where the regular grid point would fall over the ocean. Thanks to the switch to the secondary pointing mechanism, several changes to the sampling strategy were possible.
Figure E1.
Firstly, GOSAT is now capable of targeting more specifically and "follows" coastlines in a more efficient manner. The example in Figure E1 shows the change in sampling location over Indonesia, contrasting 2011 to 2017. Although the exact same grid location is not revisited in the same way, overall there are both more successful measurements and a better geographic coverage.
Secondly, GOSAT is now capable of a wider pointing range and subsequently, the latitudinal range of ocean sun-glint observations has been extended (as observed in Figure 10). Figure
While we do not believe that these changes are detrimental to the continued consistency of the timeseries of GOSAT observations, we do feel that they are worth noting as they may have an impact (positively or negatively) on specific applications that a user may wish to use the data for, hence the reason for highlighting them here.
Figure E2. Figure contrasting the sampling density pattern over Australia for early in the mission versus recent years. The 5-point sampling grid as used in the early years of the mission was updated to a more stable 3-point grid which continues to be used. The change in pointing mechanism has allowed the ocean sun-glint measurement range to be extended.
Figure E3.

Appendix F: TCCON Data
This section provides details and references for the TCCON data used in this study.