Global GOSAT, OCO-2 and OCO-3 Solar Induced Chlorophyll Fluorescence Datasets

The retrieval of solar induced chlorophyll fluorescence (SIF) from space is a relatively new advance in Earth observation science, having only become feasible within the last decade. Interest in SIF data has grown exponentially, and the retrieval of SIF and the provision of SIF data products have become an important and formal component of spaceborne Earth observation missions. Here, we describe the global Level 2 SIF Lite data products for the Greenhouse Gases Observing Satellite (GOSAT), the Orbiting Carbon Observatory-2 (OCO-2), and OCO-3 platforms, which are provided for each platform in daily netCDF files. We also outline the methods used to retrieve SIF and estimate uncertainty, describe all the data fields, and provide users the background information necessary for the proper use and interpretation of the data, such as considerations of retrieval noise, sun-sensor geometry, the indirect relationship between SIF and photosynthesis, and differences among the three platforms and their respective data products. OCO-2 and OCO-3 have the highest spatial resolution spaceborne SIF retrievals to date, and the target and snapshot area mapping observation modes of OCO-2 and OCO-3 are unique. These modes provide hundreds to thousands of SIF retrievals at biologically diverse global target sites during a single overpass, and provide an opportunity to better inform our understanding of canopy-scale vegetation SIF emission across biomes.


Introduction
Chlorophyll fluorescence is light that is emitted from chlorophyll after the absorption of photosynthetically active radiation (PAR), which covers the spectral range of roughly 400 to 700 nm and corresponds to the range of light visible to the human eye (Müller, 1874). The fluorescence emission occurs in the range of ~650 to 800 nm during the light reaction of photosynthesis, where energy absorbed by leaf pigments is converted into the chemical energy that is needed by the dark reactions for fixing atmospheric carbon dioxide into sugars. The absorption of a photon by chlorophyll excites an electron, and the excitation energy has three main pathways: photochemistry, non-photochemical quenching or heat, and chlorophyll fluorescence. Most of the excitation energy is used for photochemistry when vegetation is not stressed, but at all times only a small fraction (~0.5-2%) is emitted as chlorophyll fluorescence (Porcar-Castell et al., 2014;Maxwell and Johnson, 2000).
Chlorophyll fluorescence has been a research tool for studying photosynthesis for nearly 150 years (Müller, 1874), but only recently have spaceborne retrievals of solar induced chlorophyll fluorescence (SIF) been realized (Guanter et al., 2007;Joiner et al., 2011;Frankenberg et al., 2011b). The number of spaceborne platforms from which SIF can be retrieved continues to grow, and the SIF temporal record continues to lengthen. Spaceborne SIF data has generated much excitement in a plethora of fields within the biological, biogeochemical cycle, climate, and Earth system science communities. Chlorophyll fluorescence has long been a key component of the plant physiological and ecophysiological research communities (Maxwell and Johnson, 2000) and has traditionally been studied in vivo at the subcellular level and in situ using pulse amplitude-modulated (PAM) fluorometry (Schreiber et al., 1986).
Most recently, remote sensing techniques have enabled the canopy and ecosystem-level retrieval of SIF from towers, aircraft, and satellites. The evolution in our ability to retrieve SIF infrequently at the leaf-level to frequent canopy-level retrievals across regional to global scales continues to greatly advance our understanding of plant and ecosystem function and carbon cycling.
Here, we describe, compare, and discuss the Level 2 SIF Lite version 9 (v9) data produced from the Greenhouse Gases Observing Satellite (GOSAT; http://dx.doi.org/10.22002/D1.8771) and Level 2 SIF Lite version 10 (v10) data from the Orbiting Carbon Observatory-2 (OCO-2) and OCO-3 (OCO-2 Science Team et al., 2020;OCO-3 Science Team et al., 2020). Our data description goes beyond previous documentation and publications via our description of the SIF Lite files and our presentation and comparison of the SIF data from the three platforms. Also, our discussions on SIF are intended to help the data user community to access and apply the data for scientific research and prevent misinterpretation.

Satellite platforms
The retrieval of SIF from space requires high spectral resolution and a high signal-to-noise ratio (SNR) as solar Fraunhofer lines are very narrow and SIF is a relatively weak signal (Frankenberg et al., 2011b).
Coincidentally, the spaceborne spectrometers that have been used for retrieving Earth's atmospheric carbon dioxide and methane concentrations include spectral channels covering Fraunhofer lines in the vicinity of the oxygen A-band where atmospheric mass is retrieved with high spectral resolution (< 0.2 nm), enabling SIF retrievals with a mean single measurement precision of ~0.5 W/m²/sr/μm (as fully described in Section 4.2). Thus, the retrieval of SIF from space has been pioneered by the atmospheric science community (Guanter et al., 2007; Joiner et al., 2011; Frankenberg et al., 2011b), and spaceborne SIF retrievals and data products have historically been a by-product of missions that have aimed to monitor Earth's atmospheric trace gases.

OCO-2 and OCO-3
OCO-2 is a NASA satellite that was launched in July 2014, and OCO-3 is a duplicate of the OCO-2 grating spectrometer that was attached to the Japanese Experimental Module Exposed Facility (JEM-EF) on the International Space Station (ISS) in May 2019 (Eldering et al., 2019). Each platform houses a three-channel grating spectrometer with a spectral resolving power of λ/Δλ > 17,000 (Crisp et al., 2017; Eldering et al., 2019) centered around the following wavelengths: an oxygen A-band at 0.765 μm and carbon dioxide bands at 1.61 μm and 2.06 μm. The swath widths are ~10 km with eight measurements across-track. The spatial resolution at nadir is slightly different for OCO-2 and OCO-3, about 1.3 km × 2.25 km and 1.6 km × 2.2 km (across × along track), respectively.
OCO-2 has a 98.8-minute orbit with a 1:36 PM nodal crossing time and a 16-day ground-track repeat cycle (Crisp et al., 2017). The ISS has a precessing low-inclination orbit that allows OCO-3 to view Earth at absolute latitudes less than ~52°. The ISS orbits the Earth ~15.5 times a day and data acquisition is halted during ISS maintenance and docking; thus, overpass times, revisit periods, and data availability are relatively irregular. Validation of the OCO-2 SIF retrievals was conducted by Sun et al. (2017) by comparing OCO-2 SIF to coordinated airborne measurements using the Chlorophyll Fluorescence Imaging Spectrometer.

Observation Modes
GOSAT observation modes are described as Observation Mode 1 Sunshine (OB1D), Observation Mode 2 Sunshine (OB2D), and Specific Observation Mode Sunshine (SPOD). OB1D is the routine observation mode, whereas OB2D is a non-routine mode in which the thermal-infrared observation and pointing mechanism is stopped during low power supply. Over land, SPOD is a target observation mode designed to observe specific sites. The TANSO-FTS sensor has a setting for low, medium, and high gain. The medium gain data is recommended for scenes that are bright, such as deserts. Since the data used for SIF retrievals are filtered to exclude bright scenes due to deserts, ice, snow, and cloud cover, the high gain data is used for SIF retrievals.
Nadir, glint, target, and transition observation modes are common to each OCO platform. The OCO-2 target mode provides repeated spatial sampling of a given target, such as an emission source or tower site. Target mode data for OCO-2 is absent from the v10 SIF Lite files, but will be included in the v11 update.
The OCO-3 target mode is a sequence of adjacent and partially overlapping segments that allow for increased spatial sampling. The target modes for both platforms provide over 10³ soundings. OCO-3 has an additional observation mode using its pointing mirror assembly (PMA), which allows for snapshot area mapping (SAM) of targets of interest. SAMs are a series of scans of a target that are nearly adjacent and can cover an area of ~80 km² in about 2 minutes. The SAMs and their target locations, which include volcanoes, various vegetation land cover types, and point sources of fossil fuel emissions, can be viewed at https://ocov3.jpl.nasa.gov/sams/index.php. Target and SAM mode scans are prioritized and scheduled several days in advance of an overpass of the ISS over the target (Taylor et al., 2020).
The target and SAM observation modes offer unique, spatially resolved acquisition of a target during a single overpass at different sun-sensor geometries as solar illumination is relatively fixed during overpasses and soundings are acquired over a range of viewing angles as the sensors pass over their targets. For SIF applications, these measurements can be averaged to obtain SIF estimates with a reduced standard error or binned by sun-sensor geometries to investigate the effect of observation geometry on the retrieved SIF values, as we demonstrate below.

SIF Lite file structure and content
Level 2 data are ungridded (vector) data that contain geophysical variables of interest and use to the broader scientific community, and are at the same spatial and temporal resolution as the Level 0 and Level 1 data. Level 0 data are data obtained as-is from the sensor. Ancillary information, such as radiometric and geometric calibration coefficients and georeferencing parameters, is appended to Level 0 data to form Level 1 data. Level 3 products refer to gridded (raster) data, which can be found at https://climatesciences.jpl.nasa.gov/sif/download-data/level-3/.
The annual and monthly spatial distribution of the GOSAT and OCO Level 2 data for the globe and the continental United States are presented in Figures 1 and 2 for visualization. These data are produced by the OCO-2 and OCO-3 projects at the Jet Propulsion Laboratory, quality controlled by NASA's Making Earth System Data Records for Use in Research Environments (MEaSUREs) SIF team, and are publicly available on the NASA Goddard Earth Sciences Data and Information Services Center (GES-DISC) website (https://disc.gsfc.nasa.gov/). Recent efforts by the OCO and MEaSUREs teams have focused on harmonizing the processing pipeline, attributes, and file structures of the GOSAT and OCO SIF products (Parazoo et al., 2019). Here, we present a first analysis of these harmonized products and demonstrate for the user community their key commonalities and differences.
The ungridded Level 2 SIF Lite data are provided in netCDF-4 format and contain information for each sounding from which a SIF retrieval was made. For each of the three satellite platforms, there is one file for each day in which there is at least one sounding and each file contains information for all soundings acquired on that day, including all measurement modes (glint, nadir, target). The SIF Lite files can be read by, but are not limited to, MATLAB, Python, R, and Julia using their respective netCDF4 or HDF5 libraries.
Using the filename "oco2_LtSIF_200201_20210129t071949z.nc4" as an example, the filename convention is platform (oco2), data product (LtSIF), date (YYMMDD), and file creation date (YYYYMMDD) and time (tHHMMSS). The SIF Lite netCDF global attributes, dimensions, variables, and variable groups are described below and listed in Table 1.
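The naming convention can be unpacked programmatically. The following Python sketch parses the example filename given above; the function name `parse_sif_lite_name` and the regular expression are illustrative assumptions inferred from that single example, and the trailing "z" is assumed here to denote UTC.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the one example filename in the text (an assumption,
# not an official specification of the naming scheme).
_LITE_NAME = re.compile(
    r"^(?P<platform>[a-z0-9]+)_(?P<product>LtSIF)_(?P<date>\d{6})_"
    r"(?P<created>\d{8}t\d{6})z\.nc4$"
)


def parse_sif_lite_name(filename):
    """Split a SIF Lite filename into platform, product, data date, creation time."""
    match = _LITE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized SIF Lite filename: {filename!r}")
    # Data date is YYMMDD; creation stamp is YYYYMMDD followed by tHHMMSS.
    data_date = datetime.strptime(match["date"], "%y%m%d").date()
    created = datetime.strptime(match["created"], "%Y%m%dt%H%M%S")
    return {
        "platform": match["platform"],
        "product": match["product"],
        "data_date": data_date,
        # Assumption: the trailing "z" marks the creation time as UTC.
        "created": created.replace(tzinfo=timezone.utc),
    }


info = parse_sif_lite_name("oco2_LtSIF_200201_20210129t071949z.nc4")
```

Because file names may be changed after download, the time range in the global attributes remains the more reliable source for the acquisition date.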

Global attributes and dimensions
The global attributes provide file-level metadata information, the most important of which for data users are the citation, contact information, and the time range of the data in the file. The times listed in the global attributes can be used in instances where the file names may have been changed. A netCDF dimension is an integer that specifies the shape of the multi-dimensional variables, and these are also described in Table 1. For the OCO-2 and OCO-3 data, there are dimensions for the footprint vertices (vertex_dim) and across-track footprint (footprint_dim), which are not applicable for GOSAT. The polarization dimension (polarization_dim) is used for GOSAT's P and S polarizations. The only variable dimension is the sounding_dim, which is the number of soundings in the file.

Variables
The primary variables of interest in the SIF Lite files are the SIF, Daily_SIF, and SIF_Uncertainty variables, which are available for SIF retrievals at 757 nm and 771 nm and estimated SIF at 740 nm. The variables for GOSAT differ from those of OCO-2 and OCO-3 in that GOSAT has two polarizations, P and S, and thus retrieval-related variables are provided as a two-dimensional (2D) array.

Variable groups
Most of the variables have been grouped, as listed in Table 1. The ungrouped, root-level variables are those that are most used and some of these variables are duplicated in the Geolocation and Science groups. The Cloud group contains cloud and surface albedo variables from the L2ABP product, which are used in the assignment of the quality flag. The Geolocation group contains variables related to the geolocation of the sounding footprint, sun-sensor geometry, altitude, and acquisition time. GOSAT sounding footprints are circular and have a radius of 5 km, in contrast to the OCO-2 and OCO-3 soundings, which are rhomboidal and are described with coordinates for each of their four vertices. Thus, the GOSAT SIF Lite files do not contain the footprint latitude and longitude vertices, whereas the OCO-2/3 SIF Lite files do.
The Metadata group houses variables with sounding-level metadata information, including build version of the data, unique orbit and sounding identifiers, and measurement mode.
The Meteo group contains meteorological forecast variables, which were obtained from the GEOS-5 FP-IT 3h forecast (Lucchesi, 2015) and are provided as-is without validation. The Offset group is a collection of variables of the bias/offset adjustments and statistics. These include mean, median, and standard deviations of the adjusted and unadjusted SIF values separated by cross-track footprint. These data are reported on a grid of signal level bins with a range of 3.0-229.0 W/m²/sr/μm and follow the SIF bias correction scheme outlined by Frankenberg et al. (2011b).

Quality flag criteria and rationale
The Quality_Flag variable indicates the quality of the data for each sounding as being best (0), good (1), or failed (2). We recommend using a combination of best and good for scientific analysis. The criteria for the best and good quality flags are listed in Table 2, and soundings that do not meet either set of criteria are flagged as failed. The rationale for the criteria is as follows: reduced chi-square (χ²) thresholds exclude fits that do not represent the spectrum well; continuum level radiance thresholds exclude scenes with brightness that is too high or low; solar zenith angle (θ) thresholds exclude retrievals with extreme solar zenith angles, which are more likely affected by rotational Raman scattering; and the O2 and CO2 thresholds exclude most cloudy scenes.
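As a minimal illustration of this recommendation, the Python sketch below selects the soundings flagged best or good. The function name is hypothetical, the flag values follow the text (0, 1, 2), and reading the Quality_Flag variable from a file is left out.

```python
# Flag values as described in the text.
BEST, GOOD, FAILED = 0, 1, 2


def indices_for_analysis(quality_flags):
    """Return the positions of soundings flagged best (0) or good (1)."""
    return [i for i, flag in enumerate(quality_flags) if flag in (BEST, GOOD)]


# Example: five soundings, two of which failed the quality screening.
kept = indices_for_analysis([0, 2, 1, 1, 2])
```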

SIF retrieval
The SIF values provided in the SIF Lite files are based on spectral fits covering Fraunhofer lines, as SIF reduces the fractional depth of the Fraunhofer lines (Plascyk, 1975). The SIF retrieval methodologies are fully explained by Frankenberg et al. (2011b, a) and SIF is retrieved using the identical method for GOSAT and the OCO platforms at 757 nm and 771 nm. In brief, the main retrieval quantity in the retrieval state vector is the fractional contribution of SIF to the continuum level radiance, or relative fluorescence (SIF_Relative_757nm and SIF_Relative_771nm). The absolute SIF values (SIF_757nm and SIF_771nm) are generated during post-processing in W/m²/sr/μm.
It is important to note that although the SIF values have traditionally been loosely labeled as being retrieved at 757 nm and 771 nm, the retrieval fit windows used to produce the SIF Lite data are centered at 758.7 nm and 770.1 nm for OCO-2 and OCO-3, and at 758 nm and 771 nm for GOSAT. However, we retain the 757 and 771 nomenclature to remain consistent with previous publications and to avoid confusion. We estimated SIF at 740 nm for each sounding using both retrieval windows as described in more detail below.

SIF retrieval uncertainty
The determination of single sounding retrieval uncertainty is covered in great detail by Sun et al. (2018) and , and is provided in the SIF Lite files as SIF_Uncertainty_740nm, SIF_Uncertainty_757nm, and SIF_Uncertainty_771nm. Briefly, these values are the 1-sigma (σ) estimated single sounding measurement precision and represent the random component of the retrieval errors. They are derived through standard least-squares fitting by evaluating the error covariance matrix:

$$S_{\epsilon} = (K^{T} S_{0}^{-1} K)^{-1}$$

where $K$ is the Jacobian matrix of the least-squares fit and $S_{0}$ is the measurement error covariance matrix, which characterizes the instrument noise per detector pixel.
For the OCO-2/3 data, the uncertainty for SIF757 usually ranges between 0.3 and 0.5 W/m²/sr/μm, or ~15-50% of the absolute SIF value. Uncertainties for SIF771 are slightly higher due to less fluorescence and a relatively smaller reduction in the fractional depth of the radiance at 771 nm. Uncertainty for SIF740 is calculated using the general formula for error propagation and the partial derivatives for the uncertainties for SIF757 and SIF771:
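In generic form, the error propagation described here can be written as follows. This is a sketch rather than the exact expression used for the SIF Lite files: it assumes that the errors in the two retrieval windows are uncorrelated (so the covariance term vanishes), and it leaves the partial derivatives symbolic because they depend on the coefficients of the SIF 740 relationship. The symbols σ740, σ757, and σ771 are introduced here for the respective 1-sigma uncertainties.

```latex
\sigma_{740} = \sqrt{
  \left( \frac{\partial \mathrm{SIF}_{740}}{\partial \mathrm{SIF}_{757}} \right)^{2} \sigma_{757}^{2}
  +
  \left( \frac{\partial \mathrm{SIF}_{740}}{\partial \mathrm{SIF}_{771}} \right)^{2} \sigma_{771}^{2}
}
```

For a relationship that is linear in SIF757 and SIF771, the partial derivatives are simply the constant coefficients of that relationship.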

SIF 740 nm and intersensor comparisons
The spectral window in which SIF retrievals are made depends on the wavelength bands of the platform.
Assuming the spectral shape of SIF is known and invariant, one can convert SIF to a standard reference wavelength. Here, we use 740 nm as a reference as it corresponds to the second SIF peak and is not as strongly affected by chlorophyll re-absorption as red SIF, thus showing a relatively stable shape at wavelengths above 740 nm (Parazoo et al., 2019). The differences in the retrieval windows complicate the comparison of SIF retrievals from different sensors, thus it is useful to provide SIF at a well-defined reference wavelength.
Although the range of the wavelengths used to retrieve SIF from the various sensors is small (740-771 nm), absolute fluorescence can vary greatly depending on the spectral window used to retrieve SIF (Joiner et al., 2013; Köhler et al., 2018; Sun et al., 2018). However, reference far-red SIF emission spectra at the leaf level indicate that far-red fluorescence spectral shapes are consistent across species.
Thus, we provide an estimate of absolute SIF 740 (SIF_740nm) in the GOSAT and OCO-2/3 SIF Lite files derived from the empirical relationship between SIF at 740 nm and SIF at 758.7 nm and 770.1 nm (denoted as 757 nm and 771 nm; Eq. 1). The rationale for including SIF 740 in the SIF Lite files is to allow for more consistent and robust comparisons of SIF and SIF-based analyses across sensors (Parazoo et al., 2019), and to reduce the retrieval error by a factor of √2 (Sun et al., 2018). We stress that the reported SIF 740 values are not retrieved, but are estimated under the assumption that the spectral shape of SIF is invariant.
The ratios used in Eq. 1 were based on leaf level measurements conducted by ; however, we observed a median ratio of 1.45 from OCO-2 over vegetated areas for 2015-2019 (Figure S1). The reason for this difference has not yet been discerned and requires further analysis, but the small potential bias introduced by the use of the empirical ratio does not detract from the utility of the SIF 740 data.

Bias/offset correction
Biases in retrieved SIF can occur due to uncertainties in the exact instrument line-shape per footprint or slight uncertainties in detector linearity. To correct for biases, we use reference targets that are non-fluorescent surfaces barren of vegetation, similar to the method described by Frankenberg et al. (2011b). In short, the background signal over reference targets is subtracted from all relative SIF values. We calculate the background signal for each day as mean SIF over all barren surfaces within a 31-day window centered on the current day for GOSAT and a three-day window for OCO-2/3. These windows were chosen to obtain a robust background signal given their respective spatial-temporal resolution. Here, we identify barren surfaces using a combination of the MODIS MCD12Q1 land cover data product (Friedl and Sulla-Menashe, 2019) and the Vegetation Photosynthesis Model (VPM) (Xiao et al., 2004; Zhang et al., 2017) from the year 2018. The native spatial resolution of these data sets is 500 m, but we aggregated the data to a global 0.20-degree grid so that the barren surface reference targets had a coarser resolution than the soundings.
We classified barren surfaces as those grid cells which were 100% barren and/or snow and ice by MCD12Q1 and had zero (0) annual gross primary production as estimated by VPM. We also excluded coastal grid cells that overlapped with water using a global coastline shapefile and a buffer.
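The windowed background subtraction can be sketched in Python as follows. This is a simplified illustration, not the operational correction: the function names and the (day, relative SIF) tuple layout are hypothetical, and the sketch ignores the separation by cross-track footprint and signal level bin. A half-window of 15 days corresponds to the 31-day GOSAT window and a half-window of 1 day to the three-day OCO-2/3 window.

```python
from statistics import fmean


def background_offset(barren, day, half_window):
    """Mean relative SIF over barren-surface soundings within +/- half_window days.

    `barren` is a list of (day_index, relative_sif) tuples (hypothetical layout).
    """
    values = [sif for d, sif in barren if abs(d - day) <= half_window]
    if not values:
        raise ValueError("no barren reference soundings inside the window")
    return fmean(values)


def correct_relative_sif(soundings, barren, half_window):
    """Subtract the day-specific background signal from each relative SIF value."""
    return [
        (day, sif - background_offset(barren, day, half_window))
        for day, sif in soundings
    ]


# Three barren soundings fall inside the three-day window around day 1;
# the one on day 10 is outside it and does not contribute.
barren = [(0, 0.002), (1, 0.004), (2, 0.006), (10, 0.1)]
corrected = correct_relative_sif([(1, 0.014)], barren, half_window=1)
```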

Daily average SIF and the daily correction factor
We provide an estimate of daily average SIF (Daily_SIF), which is instantaneous SIF scaled entirely upon the geometry of incoming solar radiation over a day. Instantaneous SIF is the absolute value of SIF for any given sounding and is a strong function of the illumination of the canopy at that instant in time. The differences in the illumination geometry of soundings at different overpass times and latitudes complicate direct comparisons of SIF at different points of Earth's surface and comparisons of SIF to other data that are more temporally coarse, such as daily estimates of GPP.
Downwelling solar radiation scales linearly with cos(θ), where θ is the solar zenith angle, under clear sky conditions when ignoring Rayleigh scattering and gas absorption. As described by Frankenberg et al. (2011b) and Köhler et al. (2018), a first order approximation of daily average SIF ($\overline{SIF}$) can be written as:

$$\overline{SIF} = \frac{SIF_{t_0}}{\cos(\theta(t_0))} \int_{t_0 - 12\,\mathrm{h}}^{t_0 + 12\,\mathrm{h}} \cos(\theta(t)) \, H(\cos(\theta(t))) \, dt$$

where $SIF_{t_0}$ is absolute instantaneous SIF, $\theta(t_0)$ is the solar zenith angle at the time of measurement $t_0$, H is a Heaviside function that zeroes out negative values of cos(θ), and the integral is computed numerically in 10-min time steps ($dt$). In terms of the SIF Lite file variable names, this equation can be written for SIF at any wavelength as Daily_SIF = SIF • daily_correction_factor.
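The approximation described above can be sketched numerically in Python. This is an illustration under stated assumptions, not the code used to produce the SIF Lite files: the function name and its inputs (latitude, solar declination, and the hour angle at the time of measurement) are hypothetical, the solar zenith angle is computed from the standard spherical-astronomy relation, the declination is held fixed over the day, and the integral is expressed as a mean over the 24-hour window so that the result is a dimensionless scaling factor.

```python
import math


def daily_correction_factor(lat_deg, decl_deg, hour_angle0_deg, step_min=10):
    """Ratio of the 24 h mean of max(cos(SZA), 0) to cos(SZA) at measurement time.

    lat_deg: sounding latitude; decl_deg: solar declination (assumed constant
    over the day); hour_angle0_deg: solar hour angle at the measurement time.
    """
    lat, decl = math.radians(lat_deg), math.radians(decl_deg)
    h0 = math.radians(hour_angle0_deg)

    def cos_sza(hour_angle):
        return (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))

    mu0 = cos_sza(h0)
    if mu0 <= 0.0:
        raise ValueError("sun is at or below the horizon at measurement time")

    n_steps = int(round(24 * 60 / step_min))
    total = 0.0
    for k in range(n_steps):
        # Step from t0 - 12 h to t0 + 12 h; the hour angle advances 2*pi per day.
        dt_min = -12 * 60 + k * step_min
        hour_angle = h0 + 2.0 * math.pi * dt_min / (24 * 60)
        # max(..., 0) plays the role of the Heaviside function H.
        total += max(cos_sza(hour_angle), 0.0)
    return total / n_steps / mu0


# Equator at equinox, measured at local solar noon.
factor = daily_correction_factor(0.0, 0.0, 0.0)
daily_sif = 1.2 * factor  # instantaneous SIF of 1.2 scaled to a daily average
```

For this idealized case the factor is close to 1/π, since the mean of the positive half of a cosine over a full cycle is 1/π.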

Scaling of SIF to GPP
We should note that SIF is, to first order, only a proxy for the electron transfer rate in the light reaction of photosystem II. However, SIF is oblivious to the light-independent reactions that fix CO2. Nevertheless, many studies have reported on the linearity of SIF and GPP at bi-weekly or monthly timescales and at coarse spatial resolutions (Verma et al., 2017; Doughty et al., 2019; Yang et al., 2015). The seasonality of SIF and GPP tends to match well at such coarse temporal resolutions because both SIF and GPP are driven by changes in canopy structure, the amount of chlorophyll in the canopy, and the amount of sunlight can sometimes simultaneously downregulate photosynthesis and SIF, as has been seen in evergreen needleleaf ecosystems, but not always.
At the leaf level, GPP saturates before SIF in response to APAR, such that we could see increased SIF without any response in GPP at high levels of APAR (Gu et al., 2019). Conversely, vegetation stress can cause a near or total cessation of GPP via stomatal closure with little or no change in SIF. This decoupling has been seen at the leaf scale during forced stomatal closure of deciduous tree species (Marrs et al., 2020) and a one-month drought experiment with Eastern cottonwood (Populus deltoides) (Helm et al., 2020).
However, these studies and others of deciduous vegetation and croplands have repeatedly found a better correlation between SIF and APAR than between SIF and GPP (Miao et al., 2018). For SIF to be a reliable proxy of APAR, SIF yield (ratio of SIF to APAR) would need to remain constant.

Negative SIF values
Data users are likely to find negative SIF values, which are due to retrieval noise, but these values should generally not be discarded. The one-sigma uncertainty in retrieved SIF values (SIF_Uncertainty) can be substantial, so negative values are plausible in a retrieval sense although not in physical terms (actual SIF emission cannot be negative). Discarding negative values will introduce a high bias when averaging. Nevertheless, extremely negative values may indicate a problem with the retrieval. We recommend the following guidelines for filtering negative SIF values: accept if SIF + 2σ uncertainty ≥ 0; questionable if SIF + 2σ uncertainty < 0 and SIF + 3σ uncertainty ≥ 0; and reject if SIF + 3σ uncertainty < 0. These thresholds have not been incorporated into the Quality_Flag variable of the SIF Lite data.
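Because these thresholds are not part of the Quality_Flag variable, users need to apply them themselves. A minimal Python sketch of the three-way rule follows; the function name and the returned labels are illustrative choices, and the inputs are one SIF value and its 1-sigma uncertainty in the same units.

```python
def classify_sif(sif, sigma):
    """Apply the 2-sigma / 3-sigma guideline to one sounding."""
    if sif + 2.0 * sigma >= 0.0:
        return "accept"
    if sif + 3.0 * sigma >= 0.0:
        return "questionable"
    return "reject"


# With a 1-sigma uncertainty of 0.4, progressively more negative retrievals
# move from accepted to questionable to rejected.
labels = [classify_sif(value, 0.4) for value in (-0.5, -1.0, -1.5)]
```

Positive SIF values always satisfy the first condition, so the rule only ever affects negative retrievals.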

Sun-sensor geometry
Users of SIF data from any source should be aware that sun-sensor geometry plays a role in the absolute values of SIF, in addition to vegetation canopy characteristics (Joiner et al., 2020;Köhler et al., 2018).
Absolute SIF values increase rapidly when the phase angle approaches 0° (when the sun and sensor are aligned), but the effect of sun-sensor geometry has been shown to be small when the phase angle is greater than 20° (Doughty et al., 2019). Thus, retrieved SIF values from target or SAM mode scans during a single overpass can vary greatly despite homogeneous vegetation cover due to changing sun-sensor geometries during data acquisition. Figure 3 illustrates the phase angle and SIF 757 for a SAM acquired over the Amazon rainforest, where the vegetation canopy is very homogeneous. The figure also illustrates how the phase angle changes during an OCO-3 SAM scan and that the sun-sensor geometries for each individual swath are rather distinct from each other (Figure 3a). Mean SIF for each swath is also distinctly different (Figure 3b), even though the canopy was experiencing the same illumination geometry and environmental conditions during the two minutes in which this SAM was acquired. The effect of sun-sensor geometry is also illustrated in Figure 4, which shows the relationship between SIF for individual OCO-2 soundings and phase angle for two target scans in the Amazon. A distinctive change in the absolute values of retrieved SIF was observed due to sun-sensor geometry.
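For users who want to compute a phase angle from the sun and sensor angles of a sounding, the Python sketch below uses the standard spherical relation for the angle between two directions. It is an illustration under assumptions: the function and argument names are hypothetical, both azimuths are taken to be measured from the target toward the sun and toward the sensor in the same convention, and the result is unsigned, whereas some studies use a signed phase angle.

```python
import math


def phase_angle_deg(sza, vza, saa, vaa):
    """Unsigned angle (degrees) between the sun and sensor directions at the target.

    sza/vza: solar and viewing zenith angles; saa/vaa: solar and viewing
    azimuth angles, all in degrees and in the same azimuth convention.
    """
    sza_r, vza_r = math.radians(sza), math.radians(vza)
    rel_az = math.radians(saa - vaa)
    cos_pa = (math.cos(sza_r) * math.cos(vza_r)
              + math.sin(sza_r) * math.sin(vza_r) * math.cos(rel_az))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_pa = max(-1.0, min(1.0, cos_pa))
    return math.degrees(math.acos(cos_pa))


# Sun and sensor aligned: phase angle near 0 degrees.
aligned = phase_angle_deg(30.0, 30.0, 120.0, 120.0)
# Nadir view: the phase angle equals the solar zenith angle.
nadir = phase_angle_deg(40.0, 0.0, 10.0, 200.0)
```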

Averaging over space and time to reduce retrieval uncertainty
There are two main challenges to working with all spaceborne SIF data: 1) the inherently large uncertainties for individual soundings due to retrieval noise, and 2) the effect of differences in sun-sensor geometry on retrieved SIF values. Thus, we advise against using single soundings for analysis. However, averaging soundings across space and time can reduce the retrieval noise by a factor of 1/√n, with n being the number of soundings comprising the average. For platforms with a wide swath, like the TROPOspheric Monitoring Instrument (TROPOMI), the effect of sun-sensor geometry can be accounted for by averaging soundings for a point of interest over the entire repeat cycle (16 days for TROPOMI) as demonstrated by Doughty et al. (2019, 2021). In the case of OCO-2/3, as we demonstrate in Figure 3 and in Braghiere et al. (2021), soundings can be grouped by phase angle and then averaged to reduce retrieval uncertainty. Thus, retrieval uncertainty and sun-sensor geometry effects can be substantially minimized.
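Grouping soundings by phase angle and averaging within each group can be sketched in Python as follows. The function name, the 10° default bin width, and the input layout are illustrative assumptions; the uncertainty of each bin mean is propagated assuming independent retrieval errors, which reduces to σ/√n when all soundings share the same σ.

```python
import math
from collections import defaultdict


def bin_mean_sif(sif, sigma, phase_angle, bin_width=10.0):
    """Average SIF within phase-angle bins.

    Returns {bin_lower_edge: (mean_sif, uncertainty_of_mean, n_soundings)}.
    """
    groups = defaultdict(list)
    for value, err, angle in zip(sif, sigma, phase_angle):
        lower_edge = math.floor(angle / bin_width) * bin_width
        groups[lower_edge].append((value, err))

    result = {}
    for lower_edge, members in sorted(groups.items()):
        n = len(members)
        mean_sif = sum(value for value, _ in members) / n
        # Independent errors: sigma_mean = sqrt(sum(sigma_i^2)) / n.
        mean_err = math.sqrt(sum(err * err for _, err in members)) / n
        result[lower_edge] = (mean_sif, mean_err, n)
    return result


# Four soundings fall in the 20-30 degree bin and one in the 30-40 degree bin.
binned = bin_mean_sif(
    sif=[1.0, 2.0, 3.0, 4.0, 0.5],
    sigma=[0.4, 0.4, 0.4, 0.4, 0.4],
    phase_angle=[21.0, 25.0, 29.0, 22.0, 35.0],
)
```

In this example the four-sounding bin has an uncertainty of 0.4/√4 = 0.2, half that of a single sounding.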
For GOSAT, we recommend averaging SIF retrieved from both the P and S polarizations, as demonstrated in Figure 5.
Users should also keep in mind that when conducting analyses at large spatial scales, gridding the data prior to analysis is largely unnecessary as the ungridded Level 2 data can be used directly (Doughty et al., 2019).
Doing so will allow the users to retain sounding-level information that may aid in the interpretation of the results, which would otherwise be lost when merely gridding the SIF values. For instance, as demonstrated by Doughty et al. (2019), ungridded Level 2 SIF data was used to calculate mean SIF for the entire Amazon Basin at different phase angles to show that the seasonality of SIF in the Amazon Basin was consistent across sun-sensor geometries. Such an analysis would not have been possible with gridded data because after gridding it is impossible to group the data by sounding-level attributes, such as phase angle or cloud fraction.

The use of SIF at 740, 757, and 771 nm
It is important to note that in areas where the SIF signal is near zero, the use of SIF at 757 nm would be more appropriate as the SIF signal is stronger at this wavelength. In areas where vegetation is sparse or SIF yield is low due to vegetation responses to environmental conditions or canopy leaf physiology, SIF at 771 nm could be within the noise range due to its relatively far distance from the far-red peak at 740 nm.
In these cases, we advise the use of SIF at 757 nm. Since SIF at 771 nm is used to compute SIF at 740 nm in the SIF Lite files, diligence should likewise be used when using SIF at 740 nm in analyses.
Here, we present the first comparisons between GOSAT and OCO-2 Level 2 data. Currently, there are not enough coincident soundings for GOSAT and OCO-3 to provide a robust analysis but given that OCO-2 and OCO-3 compare very well, we expect a comparison between GOSAT and OCO-3 to mimic the findings from our GOSAT and OCO-2 comparison.
Although the data records for GOSAT and OCO-2 overlap by six years, only a small percentage of soundings flagged as best quality and cloud free from GOSAT and OCO-2 overlap on the same day (Figure 5a).
Despite this filter, the mean SIF values may differ widely on the same day due to differences in overpass time (and thus solar illumination angle and environmental conditions), viewing geometry, and the number of OCO-2 soundings comprising the mean. We progressively filtered the data as illustrated in Figure 5 to ensure the soundings were of a vegetated land surface, had similar sun-sensor geometries, environmental, and atmospheric conditions, and that the temperature was high enough for photosynthesis to occur as indicated by the temperature_skin variable in the SIF Lite data.
We found that the correlation and slope improved with more conservative filtering of the data, and that the comparison between GOSAT SIF and OCO-2 SIF was reasonable. However, it is important to note that any comparison between GOSAT and OCO data will inevitably be affected by spatial sampling bias, as the swath width for both OCO platforms is smaller than the diameter of the GOSAT sounding footprints (Figure 6; left footprints). Also, it could be the case that only a small portion of the GOSAT footprint is sampled by OCO (Figure 6; right footprints). Our filter of ≥ 10 OCO-2 soundings within a GOSAT footprint aimed to reduce this potential sampling bias in addition to reducing the uncertainty of the OCO-2 SIF retrievals. It must also be remembered that in this comparison, we do not have the luxury of averaging several GOSAT soundings to reduce the uncertainty as we did with OCO-2, so the uncertainty of the GOSAT SIF is much higher than that for OCO-2.
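The footprint-matching step can be sketched in Python as follows. This is a simplified illustration and not the procedure used for the figures: the function names and the (lat, lon, SIF) tuple layout are hypothetical, OCO-2 soundings are matched by their center coordinates using a great-circle distance on a spherical Earth, and the GOSAT footprint is treated as a circle of 5 km radius as described for the footprint geometry. Filters on date, quality, geometry, and temperature are omitted.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius, spherical approximation
GOSAT_FOOTPRINT_RADIUS_KM = 5.0   # circular GOSAT footprint radius from the text
MIN_OCO2_SOUNDINGS = 10           # minimum OCO-2 soundings per GOSAT footprint


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_oco2_sif_in_footprint(gosat_lat, gosat_lon, oco2_soundings,
                               radius_km=GOSAT_FOOTPRINT_RADIUS_KM,
                               min_count=MIN_OCO2_SOUNDINGS):
    """Mean SIF of OCO-2 soundings centered inside a GOSAT footprint.

    `oco2_soundings` is a list of (lat, lon, sif) tuples. Returns None when
    fewer than `min_count` soundings fall inside the footprint.
    """
    inside = [sif for lat, lon, sif in oco2_soundings
              if haversine_km(gosat_lat, gosat_lon, lat, lon) <= radius_km]
    if len(inside) < min_count:
        return None
    return sum(inside) / len(inside)


# Ten soundings within about 4 km of the footprint center, plus one far away.
near = [(0.004 * i, 0.0, 1.0) for i in range(10)]
far = [(1.0, 0.0, 99.0)]
matched_mean = mean_oco2_sif_in_footprint(0.0, 0.0, near + far)
```

Matching on sounding centers does not reveal how much of the GOSAT footprint is actually covered, so the partial-coverage case remains a source of sampling bias.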
Upon a more detailed comparison of GOSAT and OCO-2 SIF at 740 nm, 757 nm, and 771 nm using the strictest filter we applied in Figure 5f, we found SIF 740 from the two platforms to have higher correlations than for SIF 757 and SIF 771 alone (Figure 7). We also noticed that GOSAT and OCO-2 soundings most frequently overlap in the boreal winter, which corresponds to a period of little or no photosynthesis at mid and high latitudes (Figures S2 and S3). Thus, the direct comparison of GOSAT and OCO-2 SIF is severely restricted by the relatively infrequent overlap of the two platforms during the growing season.
In addition to the sounding level comparisons, we found mean annual SIF 757 for GOSAT and OCO-2 to compare reasonably well at the global scale during the boreal summer (Figure 8). The relatively large differences in SIF illustrated at the gridcell level in Figure 8c are due to differences in the spatial and temporal sampling of the two platforms. We presented the comparison here at 4.0-degree spatial resolution to improve the sampling by GOSAT (Fig. 1a).

Collocating Soundings with their Targets
Currently, the target and SAM soundings are not collocated with the target to which they correspond, but variables will be added to upcoming versions (e.g., v11) of the SIF Lite files that will collocate the target and SAM soundings with their intended target site. For OCO-3, some of the target sites are in close proximity to each other, and thus a target site may fall within the scan of another target. For these sites, users may also want to check scans that were intended for target sites adjacent to their target of interest. The OCO-3 targets, the dates of their scans, and scan maps are available at https://ocov3.jpl.nasa.gov/sams/index.php. Lists of target locations for OCO-2 and OCO-3 are available in Table S1 and Table S2, respectively.
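Until such variables are available, users could collocate soundings with target sites themselves. The following Python sketch shows one possible approach under our own assumptions: target coordinates are taken from a user-supplied list (e.g., transcribed from Table S1 or Table S2), each sounding is assigned to its nearest target, and the 50 km search radius is an arbitrary illustrative value rather than a property of the scans. The function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def to_unit_vectors(lat_deg, lon_deg):
    """Convert lat/lon in degrees to 3-D unit vectors, shape (n, 3)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)


def nearest_target(snd_lat, snd_lon, tgt_lat, tgt_lon, max_km=50.0):
    """Assign each sounding to its nearest target site.

    Returns (index, dist_km): `index` holds the nearest target for each
    sounding, or -1 when no target lies within `max_km`; `dist_km` is the
    full (n_soundings, n_targets) great-circle distance matrix.
    """
    s = to_unit_vectors(snd_lat, snd_lon)
    t = to_unit_vectors(tgt_lat, tgt_lon)
    # The angle between unit vectors gives the great-circle distance.
    cos_angle = np.clip(s @ t.T, -1.0, 1.0)
    dist_km = EARTH_RADIUS_KM * np.arccos(cos_angle)
    idx = dist_km.argmin(axis=1)
    nearest_km = dist_km[np.arange(len(idx)), idx]
    return np.where(nearest_km <= max_km, idx, -1), dist_km
```

Because the full distance matrix is returned, `dist_km <= max_km` can be used to find soundings that lie within the search radius of more than one target, which is relevant for the closely spaced OCO-3 sites noted above.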

Conclusions
Here, we have presented and described the Level 2 SIF Lite files for GOSAT, OCO-2, and OCO-3, which have been standardized in the same netCDF format to maximize their interoperability and accessibility for the data user community and to allow for intersensor comparisons. Users of remote sensing data are more accustomed to using Level 3 gridded data for analyses, but we encourage data users to also exploit the Level 2 data we have presented in the SIF Lite files. The OCO-2 and OCO-3 platforms provide the highest spatial resolution spaceborne SIF data, and the target and SAM observation modes are unique to these platforms. The observation scheme for the OCO platforms allows for time series to be constructed for the target locations, and the repeated target and SAM scans allow for the investigation of the directionality and escape of SIF at varying sun-sensor geometries across many biomes in different seasons.
We have demonstrated how users can break target and SAM observations into phase angles for analysis and have described how the effect of sun-sensor geometry and retrieval noise can be mitigated through the averaging of the data. The OCO platforms also provide a rich resource for the validation of radiative transfer models, which is currently underutilized. Upcoming spaceborne platforms with frequent revisits and/or high spatial resolution, such as the European Space Agency's FLuorescence EXplorer (FLEX) and NASA's Geostationary Carbon Cycle Observatory (GeoCarb), are expected to further our understanding of changes in vegetation structure and function (Drusch et al., 2016;Polonsky et al., 2014;Moore et al., 2018).

Author contributions
RD and CF conceived this manuscript. TK prepared and provided the data and RD performed the analysis. RD prepared the manuscript with contributions from all co-authors.

Competing interests
The authors declare that they have no conflict of interest.

Acknowledgements
We thank Lan Dang for helping to process the GOSAT data, Annmarie Eldering for helping coordinate the publication of the SIF Lite files at the GES-DISC, and Yi Yin for publishing the GOSAT data on the Caltech data repository. Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. Reference herein to any specific commercial product, process or service by trade name, trademark, manufacturer or otherwise does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.

… Figure 5f, which were data that had the most conservative filter: best quality and cloud free, vegetation, co-occurring within 1 hour, viewing zenith angle < 5°, number of OCO-2 soundings within a GOSAT footprint ≥ 10, and skin temperature ≥ 5 °C.

Table 2. Criteria for the quality flags best and good for the Level 2 GOSAT, OCO-2, and OCO-3 data. Soundings that do not meet either set of criteria are flagged as failed (2).

Financial support