Global GOSAT, OCO-2 and OCO-3 Solar Induced Chlorophyll Fluorescence Datasets

Abstract. The retrieval of solar induced chlorophyll fluorescence (SIF) from space is a relatively new advance in Earth observation science, having only become feasible within the last decade. Interest in SIF data has grown exponentially, and the retrieval of SIF and the provision of SIF data products have become an important and formal component of spaceborne Earth observation missions. Here, we describe the global Level 2 SIF Lite data products for the Greenhouse Gases Observing Satellite (GOSAT), the Orbiting Carbon Observatory-2 (OCO-2), and OCO-3 platforms, which are provided for each platform in daily netCDF files. We also outline the methods used to retrieve SIF and estimate uncertainty, describe all the data fields, and provide users the background information necessary for the proper use and interpretation of the data, such as considerations of retrieval noise, sun-sensor geometry, the indirect relationship between SIF and photosynthesis, and differences among the three platforms and their respective data products. OCO-2 and OCO-3 have the highest spatial resolution spaceborne SIF retrievals to date, and the target and snapshot area mapping observation modes of OCO-2 and OCO-3 are unique. These modes provide hundreds to thousands of SIF retrievals at biologically diverse global target sites during a single overpass, and provide [...] emission across biomes.


Introduction
Chlorophyll fluorescence is light that is emitted from chlorophyll after the absorption of photosynthetically active radiation (PAR), which covers the spectral range of roughly 400 to 700 nm and corresponds to the range of light visible to the human eye (Müller, 1874). The fluorescence emission occurs in the range of ~650 to 800 nm during the light reactions of photosynthesis, in which energy absorbed by leaf pigments is converted into the chemical energy that is needed by the dark reactions for fixing atmospheric carbon dioxide into sugars. The absorption of a photon by chlorophyll excites an electron, and the excitation energy has three main pathways: photochemistry, non-photochemical quenching or heat, and chlorophyll fluorescence. Most of the excitation energy is used for photochemistry when vegetation is not stressed, but at all times only a small fraction (~0.5-2%) is emitted as chlorophyll fluorescence (Porcar-Castell et al., 2014; Maxwell and Johnson, 2000).

Chlorophyll fluorescence has been a research tool for studying photosynthesis for nearly 150 years (Müller, [...] platforms from which SIF can be retrieved continues to grow, and the SIF temporal record continues to lengthen. Spaceborne SIF data have generated much excitement in a plethora of fields within the biological, biogeochemical cycle, climate, and Earth system science communities. Chlorophyll fluorescence has long been a key component of the plant physiological and ecophysiological research communities (Maxwell and Johnson, 2000) and has traditionally been studied in vivo at the subcellular level and in situ using pulse amplitude-modulated (PAM) fluorometry (Schreiber et al., 1986).

Most recently, remote sensing techniques have enabled the canopy- and ecosystem-level retrieval of SIF from towers, aircraft, and satellites. The evolution from infrequent leaf-level retrievals of SIF to frequent canopy-level retrievals across regional to global scales continues to greatly advance our understanding of plant and ecosystem function and carbon cycling.

[...] Sunshine (OB2D), and Specific Observation Mode Sunshine (SPOD). OB1D is the routine observation mode, whereas OB2D is a non-routine mode in which the thermal-infrared observation and pointing mechanism is stopped during low power supply. Over land, SPOD is a target observation mode designed to observe specific sites. The TANSO-FTS sensor has settings for low, medium, and high gain. The medium gain data are recommended for scenes that are bright, such as deserts. Since the data used for SIF retrievals are filtered to exclude bright scenes due to deserts, ice, snow, and cloud cover, the high gain data are used for SIF retrievals.

Nadir, glint, target, and transition observation modes are common to each OCO platform. The OCO-2 target mode provides repeated spatial sampling of a given target, such as an emission source or tower site. Target mode data for OCO-2 are absent from the v10 SIF Lite files but will be included in the v11 update.

The OCO-3 target mode is a sequence of adjacent and partially overlapping segments that allow for increased spatial sampling. The target modes for both platforms provide over 10³ soundings. OCO-3 has an additional observation mode using its pointing mirror assembly (PMA), which allows for snapshot area mapping (SAM) of targets of interest. SAMs are a series of scans of a target that are nearly adjacent and can cover an area of ~80 km² in about 2 minutes. The SAMs and their target locations, which include volcanoes, various vegetation land cover types, and point sources of fossil fuel emissions, can be viewed at https://ocov3.jpl.nasa.gov/sams/index.php. Target and SAM mode scans are prioritized and scheduled several days in advance of an overpass of the ISS over the target (Taylor et al., 2020).

The target and SAM observation modes offer unique, spatially resolved acquisition of a target during a single overpass at different sun-sensor geometries, as solar illumination is relatively fixed during overpasses and soundings are acquired over a range of viewing angles as the sensors pass over their targets. For SIF applications, these measurements can be averaged to obtain SIF estimates with a reduced standard error or binned by sun-sensor geometries to investigate the effect of observation geometry on the retrieved SIF values, as we demonstrate below.

Data description

SIF Lite file structure and content
Level 2 data are ungridded (vector) data that contain geophysical variables of interest and use to the broader scientific community and are at the same spatial and temporal resolution as the Level 0 and Level 1 data. Level 0 data are obtained as-is from the sensor, and ancillary information, such as radiometric and geometric calibration coefficients and georeferencing parameters, is appended to Level 0 data to form Level 1 data. Level 3 products refer to gridded (raster) data, which can be found at https://climatesciences.jpl.nasa.gov/sif/download-data/level-3/.

The annual and monthly spatial distributions of the GOSAT and OCO Level 2 data for the globe and the continental United States are presented in Figures 1 and 2 [...] have focused on harmonizing the processing pipeline, attributes, and file structures of the GOSAT and OCO SIF products (Parazoo et al., 2019). Here, we present a first analysis of these harmonized products and demonstrate for the user community their key commonalities and differences.

The ungridded Level 2 SIF Lite data are provided in netCDF-4 format and contain information for each sounding from which a SIF retrieval was made. For each of the three satellite platforms, there is one file for each day in which there is at least one sounding, and each file contains information for all soundings acquired on that day, including all measurement modes (glint, nadir, target). The SIF Lite files can be read with, among other tools, MATLAB, Python, R, and Julia using their respective netCDF4 or HDF5 libraries. Using the filename "oco2_LtSIF_200201_20210129t071949z.nc4" as an example, the filename convention is platform (oco2), data product (LtSIF), date (YYMMDD), and file creation date (YYYYMMDD) and time (tHHMMSS). The SIF Lite netCDF global attributes, dimensions, variables, and variable groups are described below and listed in Table 1.
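As an illustration of the filename convention, the following minimal Python sketch splits a SIF Lite filename into its parts. The regular expression is inferred only from the single example filename given above, the function name parse_sif_lite_filename is hypothetical, and reading the trailing "z" as UTC is an assumption; files from other product versions may follow a different pattern.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the one example filename in the text (assumption):
# platform_LtSIF_YYMMDD_YYYYMMDDtHHMMSSz.nc4
_SIF_LITE_NAME = re.compile(
    r"^(?P<platform>[A-Za-z0-9]+)_(?P<product>LtSIF)_(?P<date>\d{6})_"
    r"(?P<created>\d{8})t(?P<ctime>\d{6})z\.nc4$"
)


def parse_sif_lite_filename(name):
    """Split a SIF Lite filename into platform, product, and dates."""
    match = _SIF_LITE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized SIF Lite filename: {name!r}")
    # The sounding date is YYMMDD; the creation stamp is YYYYMMDD + tHHMMSS.
    sounding_date = datetime.strptime(match["date"], "%y%m%d").date()
    created = datetime.strptime(
        match["created"] + match["ctime"], "%Y%m%d%H%M%S"
    )
    return {
        "platform": match["platform"],
        "product": match["product"],
        "date": sounding_date,
        # Trailing "z" is read here as UTC (assumption).
        "created": created.replace(tzinfo=timezone.utc),
    }


info = parse_sif_lite_filename("oco2_LtSIF_200201_20210129t071949z.nc4")
print(info["platform"], info["date"], info["created"])
```

Because file names can be changed by users, the time range stored in the global attributes remains the more reliable source for the dates of the soundings.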

Global attributes and dimensions
The global attributes provide file-level metadata, the most important of which for data users are the citation, contact information, and the time range of the data in the file. The times listed in the global attributes can be used in instances where the file names may have been changed. A netCDF dimension is an integer that specifies the shape of the multi-dimensional variables, and these are also described in Table 1. For the OCO-2 and OCO-3 data, there are dimensions for the footprint vertices (vertex_dim) and across-track footprint (footprint_dim), which are not applicable for GOSAT. The polarization dimension (polarization_dim) is used for GOSAT's P and S polarizations. The only dimension whose length varies among files is sounding_dim, which is the number of soundings in the file.

Variables
The primary variables of interest in the SIF Lite files are the SIF, Daily_SIF, and SIF_Uncertainty variables, which are available for SIF retrievals at 757 nm and 771 nm and estimated SIF at 740 nm. The variables for GOSAT differ from those of OCO-2 and OCO-3 in that GOSAT has two polarizations, P and S, and thus retrieval-related variables are provided as two-dimensional (2D) arrays.

Variable groups
Most of the variables have been grouped, as listed in Table 1.

Quality flag criteria and rationale
The Quality_Flag variable indicates the quality of the data for each sounding as being best (0), good (1), or failed (2). We recommend using a combination of best and good for scientific analysis. The criteria for the best and good quality flags are listed in Table 2, and soundings that do not meet either set of criteria are flagged as failed. The rationale for the criteria is as follows: reduced chi-square (χ²) thresholds exclude fits that do not represent the spectrum well; continuum level radiance excludes scenes with brightness that is too high or low; solar zenith angle excludes retrievals with extreme solar zenith angles, which are [...]

It is important to note that although the SIF values have traditionally been loosely labeled as being retrieved at 757 nm and 771 nm, the retrieval fit windows used to produce the SIF Lite data are centered at 758.7 nm and 770.1 nm for OCO-2 and OCO-3, and at 758 nm and 771 nm for GOSAT. However, we retain the 757 and 771 nomenclature to remain consistent with previous publications and to avoid confusion. We estimated SIF at 740 nm for each sounding using both retrieval windows as described in more detail below.
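The quality flag recommendation can be applied as a simple boolean mask. The sketch below assumes the Quality_Flag values and a SIF variable have already been read from a file into NumPy arrays; the function name usable_mask and the sample values are hypothetical.

```python
import numpy as np


def usable_mask(quality_flag):
    """Return True where a sounding is flagged best (0) or good (1)."""
    flags = np.asarray(quality_flag)
    return (flags == 0) | (flags == 1)


# Made-up flags and SIF values, for illustration only.
flags = np.array([0, 1, 2, 0, 2])
sif = np.array([0.8, 0.5, 3.9, -0.1, 1.2])

# Keep best and good soundings; failed (2) soundings are dropped.
print(sif[usable_mask(flags)])
```

Testing for the two accepted values explicitly, rather than for "less than 2", keeps fill values or unexpected flag codes out of the selection.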

SIF retrieval uncertainty
The determination of single sounding retrieval uncertainty is covered in great detail by Sun et al.
where K is the Jacobian matrix of the least-squares fit, and S_0 is the measurement error covariance matrix, which characterizes the instrument noise per detector pixel.
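For reference, a standard expression for the error covariance of a linear least-squares fit that is consistent with these definitions can be sketched as follows. The symbols K (Jacobian), S_0 (measurement error covariance), and the name of the resulting covariance matrix are assumed notation here, and Sun et al. should be consulted for the exact formulation used in the retrieval.

$$
\hat{\mathbf{S}} = \left( \mathbf{K}^{T} \mathbf{S}_{0}^{-1} \mathbf{K} \right)^{-1},
\qquad
\sigma_{i} = \sqrt{\hat{S}_{ii}}
$$

Under this form, the one-sigma uncertainty of a fitted parameter, such as SIF, is the square root of the corresponding diagonal element of the covariance matrix.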

For the OCO-2/3 data, the uncertainty for SIF757 usually ranges between 0.3 and 0.5 W/m²/sr/μm, or ~15-50% of the absolute SIF value. Uncertainties for SIF771 are slightly higher due to less fluorescence and a relatively smaller reduction in the fractional depth of the radiance at 771 nm. Uncertainty for SIF740 is calculated using the general formula for error propagation, the partial derivatives of SIF740 with respect to SIF757 and SIF771, and their uncertainties:
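In generic form, and only as a sketch, the propagation can be written as below. It assumes that SIF740 is some function f of SIF757 and SIF771 and that the two retrieval errors are uncorrelated, so no covariance term appears; both are assumptions made for illustration.

$$
\sigma_{740} = \sqrt{
\left( \frac{\partial f}{\partial \mathrm{SIF}_{757}} \right)^{2} \sigma_{757}^{2}
+ \left( \frac{\partial f}{\partial \mathrm{SIF}_{771}} \right)^{2} \sigma_{771}^{2}
}
$$

If f is a linear combination, f = a·SIF757 + b·SIF771 with hypothetical weights a and b, the partial derivatives are simply a and b:

$$
\sigma_{740} = \sqrt{ a^{2} \sigma_{757}^{2} + b^{2} \sigma_{771}^{2} }
$$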

SIF 740 nm and intersensor comparisons
The spectral window in which SIF retrievals are made depends on the wavelength bands of the platform. Assuming the spectral shape of SIF is known and invariant, one can convert SIF to a standard reference wavelength. Here, we use 740 nm as a reference as it corresponds to the second SIF peak and is not as strongly affected by chlorophyll re-absorption as red SIF, thus showing a relatively stable shape at wavelengths [...] we observed a median ratio of 1.45 from OCO-2 over vegetated areas for 2015-2019 (Figure S1). The reason for this difference has not yet been discerned and requires further analysis, but the small potential bias introduced by the use of the empirical ratio does not diminish the utility of the SIF740 data.

Bias/offset correction
Biases in retrieved SIF can occur due to uncertainties in the exact instrument line-shape per footprint or slight uncertainties in detector linearity. To correct for biases, we use reference targets that are non-fluorescent surfaces barren of vegetation, similar to the method described by Frankenberg (2011b). In short, the background signal over reference targets is subtracted from all relative SIF values. We calculate the background signal for each day as mean SIF over all barren surfaces within a 31-day window centered on the current day for GOSAT and a three-day window for OCO-2/3. These windows were chosen to obtain a robust background signal given their respective spatial-temporal resolution. Here, we identify barren [...] year 2018. The native spatial resolution of these data sets is 500 m, but we aggregated the data to a global 0.20-degree grid so that the barren surface reference targets had a coarser resolution than the soundings. We classified barren surfaces as those grid cells which were 100% barren and/or snow and ice by MCD12Q1 and had zero (0) annual gross primary production as estimated by VPM. We also excluded coastal grid cells that overlapped with water using a global coastline shapefile and a buffer.
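The centered-window background subtraction described above can be sketched as follows. This is a simplified illustration, not the operational correction: it assumes barren-surface soundings have already been identified, treats days as integer day numbers, and ignores any per-footprint or per-signal-level dependence. The function names and sample values are hypothetical.

```python
import numpy as np


def background_signal(barren_days, barren_sif, day, window_days):
    """Mean SIF over barren soundings in a window centered on `day`.

    window_days would be 31 for GOSAT and 3 for OCO-2/3, per the text.
    """
    barren_days = np.asarray(barren_days)
    barren_sif = np.asarray(barren_sif, dtype=float)
    half_width = window_days // 2
    in_window = np.abs(barren_days - day) <= half_width
    if not in_window.any():
        return float("nan")  # no reference soundings available
    return float(barren_sif[in_window].mean())


def bias_corrected(sif, barren_days, barren_sif, day, window_days):
    """Subtract the daily background signal from the SIF values of `day`."""
    offset = background_signal(barren_days, barren_sif, day, window_days)
    return np.asarray(sif, dtype=float) - offset


# Made-up barren-surface soundings on five consecutive days.
days = np.array([0, 1, 2, 3, 4])
barren = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
print(bias_corrected([1.0, 0.5], days, barren, day=1, window_days=3))
```

The longer window for GOSAT reflects its sparser sampling: more days are needed to collect enough barren-surface soundings for a stable mean.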

Scaling of SIF to GPP
Note that SIF is, to first order, only a proxy for the electron transfer rate in the light reactions of photosystem II and is oblivious to the light-independent reactions that fix CO2. Nevertheless, many studies have reported on the linearity of SIF and GPP at bi-weekly or monthly timescales and at [...]

Negative SIF values
Data users are likely to find negative SIF values, which are due to retrieval noise, but these values should generally not be discarded. The one-sigma uncertainty in retrieved SIF values (SIF_Uncertainty) can be substantial, so negative values are plausible in a retrieval sense although not in physical terms (actual SIF emission cannot be negative). Discarding negative values will introduce a high bias when averaging. Nevertheless, extremely negative values may indicate a problem with the retrieval. We recommend the following guidelines for filtering negative SIF values: accept if SIF + 2-σ uncertainty ≥ 0; questionable if SIF + 2-σ uncertainty < 0 and SIF + 3-σ uncertainty ≥ 0; and reject if SIF + 3-σ uncertainty < 0. These thresholds have not been incorporated into the Quality_Flag variable of the SIF Lite data.
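Since these thresholds are not part of Quality_Flag, users need to apply them themselves. A minimal sketch of the three-way classification, assuming SIF and its one-sigma uncertainty are available as NumPy arrays of the same shape (the function name and sample values are hypothetical):

```python
import numpy as np


def classify_negative_sif(sif, sigma):
    """Label each sounding 'accept', 'questionable', or 'reject'.

    sigma is the one-sigma uncertainty (SIF_Uncertainty), assumed >= 0.
    """
    sif = np.asarray(sif, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    labels = np.full(sif.shape, "reject", dtype=object)
    # Order matters: the looser 3-sigma test is applied first and is
    # then overwritten wherever the stricter 2-sigma test also passes.
    labels[sif + 3 * sigma >= 0] = "questionable"
    labels[sif + 2 * sigma >= 0] = "accept"
    return labels


# Made-up values, for illustration only.
sif = np.array([0.5, -0.5, -1.1, -2.0])
sigma = np.array([0.4, 0.4, 0.4, 0.4])
print(classify_negative_sif(sif, sigma))
```

Note that positive SIF values always fall in the accept class, so the function can be applied to all soundings rather than only to the negative ones.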

Sun-sensor geometry
Users of SIF data from any source should be aware that sun-sensor geometry plays a role in the absolute [...] illustrates how the phase angle changes during an OCO-3 SAM scan and that the sun-sensor geometries for each individual swath are rather distinct from each other (Figure 3a). Mean SIF for each swath is also distinctly different (Figure 3b), even though the canopy was experiencing the same illumination geometry and environmental conditions during the two minutes in which this SAM was acquired. The effect of sun-sensor geometry is also illustrated in Figure 4, which shows the relationship between SIF for individual OCO-2 soundings and phase angle for two target scans in the Amazon. A distinct change in the absolute values of retrieved SIF was observed due to sun-sensor geometry.

Averaging over space and time to reduce retrieval uncertainty
There are two main challenges to working with all spaceborne SIF data: 1) the inherently large uncertainties for individual soundings due to retrieval noise, and 2) the effect of differences in sun-sensor geometry on retrieved SIF values. Thus, we advise against using single soundings for analysis. However [...] Basin at different phase angles to show that the seasonality of SIF in the Amazon Basin was consistent across sun-sensor geometries. Such an analysis would not have been possible with gridded data because after gridding it is impossible to group the data by sounding-level attributes, such as phase angle or cloud fraction.
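The two ideas in this section, averaging soundings and grouping them by a sounding-level attribute, can be sketched together. The example below assumes independent retrieval errors, so that the standard error of the mean of n soundings is the root sum of squared uncertainties divided by n; the function names, the bin edges, and the sample values are hypothetical.

```python
import numpy as np


def mean_and_standard_error(sif, sigma):
    """Mean SIF and its standard error, assuming independent errors."""
    sif = np.asarray(sif, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mean = float(sif.mean())
    standard_error = float(np.sqrt(np.sum(sigma**2)) / sif.size)
    return mean, standard_error


def binned_by_phase_angle(phase_angle, sif, sigma, bin_edges):
    """Average soundings within phase-angle bins [lo, hi)."""
    phase_angle = np.asarray(phase_angle, dtype=float)
    sif = np.asarray(sif, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    index = np.digitize(phase_angle, bin_edges) - 1
    result = {}
    for b in range(len(bin_edges) - 1):
        selected = index == b
        if selected.any():  # skip empty bins
            key = (float(bin_edges[b]), float(bin_edges[b + 1]))
            result[key] = mean_and_standard_error(sif[selected], sigma[selected])
    return result


# Made-up soundings, for illustration only.
phase = np.array([5.0, 15.0, 25.0, 35.0])
sif = np.array([1.0, 3.0, 2.0, 4.0])
sigma = np.array([0.2, 0.2, 0.2, 0.2])
print(binned_by_phase_angle(phase, sif, sigma, [0, 20, 40]))
```

With equal per-sounding uncertainties, the standard error falls as one over the square root of the number of soundings averaged, which is why target and SAM scans with hundreds to thousands of soundings are well suited to this kind of analysis.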

The use of SIF at 740, 757, and 771 nm
It is important to note that in areas where the SIF signal is near zero, the use of SIF at 757 nm would be more appropriate as the SIF signal is stronger at this wavelength. In areas where vegetation is sparse or SIF yield is low due to vegetation responses to environmental conditions or canopy leaf physiology, SIF at 771 nm could be within the noise range due to its greater distance from the far-red peak at 740 nm. In these cases, we advise the use of SIF at 757 nm. Since SIF at 771 nm is used to compute SIF at 740 nm in the SIF Lite files, caution should likewise be exercised when using SIF at 740 nm in analyses.

Comparison of GOSAT, OCO-2, and OCO-3
OCO-3 SIF has been shown to have a very high correlation (r > 0.9) with OCO-2 (Taylor et al., 2020). Here, we present the first comparisons between GOSAT and OCO-2 Level 2 data. Currently, there are not enough coincident soundings for GOSAT and OCO-3 to provide a robust analysis, but given that OCO-2 and OCO-3 compare very well, we expect a comparison between GOSAT and OCO-3 to mirror the findings from our GOSAT and OCO-2 comparison.

Although the data records for GOSAT and OCO-2 overlap by six years, only a small percentage of soundings flagged as best quality and cloud free from GOSAT and OCO-2 overlap on the same day (Figure 5a). Despite this filter, the mean SIF values may differ widely on the same day due to differences in overpass time (and thus solar illumination angle and environmental conditions), viewing geometry, and the number of OCO-2 soundings comprising the mean. We progressively filtered the data as illustrated in Figure 5 to ensure the soundings were of a vegetated land surface, had similar sun-sensor geometries and similar environmental and atmospheric conditions, and that the temperature was high enough for photosynthesis to occur as indicated by the temperature_skin variable in the SIF Lite data.

We found that the correlation and slope improved with more conservative filtering of the data, and that the comparison between GOSAT SIF and OCO-2 SIF was reasonable. However, it is important to note that any comparison between GOSAT and OCO data will inevitably be affected by spatial sampling bias, as the swath width for both OCO platforms is smaller than the diameter of the GOSAT sounding footprints (Figure 6; left footprints). Also, it could be the case that only a small portion of the GOSAT footprint is sampled by OCO (Figure 6; right footprints). Our filter of ≥ 10 OCO-2 soundings within a GOSAT footprint aimed to reduce this potential sampling bias in addition to reducing the uncertainty of the OCO-2 SIF retrievals. It must also be remembered that in this comparison we cannot average several GOSAT soundings to reduce the uncertainty as we did with OCO-2, so the uncertainties of the GOSAT SIF are much higher than those for OCO-2.

Upon a more detailed comparison of GOSAT and OCO-2 SIF at 740 nm, 757 nm, and 771 nm using the strictest filter we applied in Figure 5f, we found SIF740 from the two platforms to have higher correlations than SIF757 and SIF771 alone (Figure 7). We also noticed that GOSAT and OCO-2 soundings most frequently overlap in the boreal winter, which corresponds to a period of little or no photosynthesis at mid and high latitudes (Figures S2 and S3). Thus, the direct comparison of GOSAT and OCO-2 SIF is severely restricted by the relatively infrequent overlap of the two platforms during the growing season.

In addition to the sounding-level comparisons, we found mean annual SIF757 for GOSAT and OCO-2 to compare reasonably well at the global scale during the boreal summer (Figure 8). The relatively large differences in SIF illustrated at the gridcell level in Figure 8c are due to differences in the spatial and temporal sampling of the two platforms. We presented the comparison here at 4.0-degree spatial resolution to improve the sampling by GOSAT (Fig. 1a).

Collocating Soundings with their Targets
Currently, the target and SAM soundings are not collocated with the target to which they correspond, but variables will be added to upcoming versions (e.g., v11) of the SIF Lite files that will collocate the target [...] Table S1 and Table S2, respectively.

Conclusions
Here, we have presented and described the Level 2 SIF Lite files for GOSAT, OCO-2, and OCO-3, which have been standardized in the same netCDF format to maximize their interoperability and accessibility by the data user community and allow for intersensor comparisons. Users of remote sensing data are more accustomed to using Level 3 gridded data for analyses, but we encourage data users to also exploit the Level 2 data we have presented in the SIF Lite files. The OCO-2 and OCO-3 platforms provide the highest spatial resolution spaceborne SIF data, and the target and SAM observation modes are unique to these [...]

[...] presented in main text Figure 5f, which were data that had the most conservative filter: best quality and cloud free, vegetation, co-occurring within 1 hour, viewing zenith angle < 5°, number of OCO-2 soundings within a GOSAT footprint ≥ 10, and skin temperature ≥ 5 °C.

Timestamp (seconds since 1 January 1993)

BuildID
The ID of the Build, including the software version that created this product

CollectionLabel
The Collection Label of the Build, including the software version that created this product