Articles | Volume 13, issue 2
https://doi.org/10.5194/essd-13-281-2021
https://doi.org/10.5194/essd-13-281-2021
Data description paper
 | 
09 Feb 2021
Data description paper |  | 09 Feb 2021

A daily, 250 m and real-time gross primary productivity product (2000–present) covering the contiguous United States

Chongya Jiang, Kaiyu Guan, Genghong Wu, Bin Peng, and Sheng Wang
Abstract

Gross primary productivity (GPP) quantifies the amount of carbon dioxide (CO2) fixed by plants through photosynthesis. Although as a key quantity of terrestrial ecosystems, there is a lack of high-spatial-and-temporal-resolution, real-time and observation-based GPP products. To address this critical gap, here we leverage a state-of-the-art vegetation index, near-infrared reflectance of vegetation (NIRV), along with accurate photosynthetically active radiation (PAR), to produce a SatelLite Only Photosynthesis Estimation (SLOPE) GPP product for the contiguous United States (CONUS). Compared to existing GPP products, the proposed SLOPE product is advanced in its spatial resolution (250 m versus >500 m), temporal resolution (daily versus 8 d), instantaneity (latency of 1 d versus >2 weeks) and quantitative uncertainty (on a per-pixel and daily basis versus no uncertainty information available). These characteristics are achieved because of several technical innovations employed in this study: (1) SLOPE couples machine learning models with MODIS atmosphere and land products to accurately estimate PAR. (2) SLOPE couples highly efficient and pragmatic gap-filling and filtering algorithms with surface reflectance acquired by both Terra and Aqua MODIS satellites to derive a soil-adjusted NIRV (SANIRV) dataset. (3) SLOPE couples a temporal pattern recognition approach with a long-term Cropland Data Layer (CDL) product to predict dynamic C4 crop fraction. Through developing a parsimonious model with only two slope parameters, the proposed SLOPE product explains 85 % of the spatial and temporal variations in GPP acquired from 49 AmeriFlux eddy-covariance sites (324 site years), with a root-mean-square error (RMSE) of 1.63 gC m−2 d−1. The median R2 over C3 and C4 crop sites reaches 0.87 and 0.94, respectively, indicating great potentials for monitoring crops, in particular bioenergy crops, at the field level. With such a satisfactory performance and its distinct characteristics in spatiotemporal resolution and instantaneity, the proposed SLOPE GPP product is promising for biological and environmental research, carbon cycle research, and a broad range of real-time applications at the regional scale. The archived dataset is available at https://doi.org/10.3334/ORNLDAAC/1786 (download page: https://daac.ornl.gov/daacdata/cms/SLOPE_GPP_CONUS/data/, last access: 20 January 2021) (Jiang and Guan, 2020), and the real-time dataset is available upon request.

1 Introduction

Gross primary productivity (GPP) quantifies the amount of carbon dioxide (CO2) fixed by plants through photosynthesis (Beer et al., 2010; Jung et al., 2017). Because GPP is the largest carbon flux and influences other ecosystem processes such as respiration and transpiration, monitoring GPP is crucial for understanding the global carbon budget and terrestrial-ecosystem dynamics (Bonan, 2019; Friedlingstein et al., 2019). In addition, biomass accumulation driven by GPP is the basis for food, feed, wood and fiber production, and therefore monitoring GPP is crucial for human welfare and development (Guan et al., 2016; Ryu et al., 2019).

Over the past 2 decades, a number of GPP products with different spatial and temporal characteristics have been derived using remote sensing approaches (Xiao et al., 2019). However, since GPP cannot be directly observed at large scales, different models have been developed and used in generating GPP products. Process-based models use a series of nonlinear equations to represent the atmosphere–vegetation–soil system and associated fluxes. For example, a publicly available global GPP product using process-based models is the Breathing Earth System Simulator (BESS) (Jiang and Ryu, 2016). Machine-learning models upscale site-observed GPP to a larger scale by establishing non-parametric relationships between the ground truth and gridded explanatory variables. The FLUXCOM GPP product is a typical example of this approach (Jung et al., 2019). Semi-empirical approaches utilize equations with a concise physiological meaning (e.g., light use efficiency) that are parameterized with several empirical constraint functions. The MOD17 (MODIS GPP/NPP – net primary production – Project) GPP product (Running et al., 2004), the Vegetation Photosynthesis Model (VPM) GPP product (Zhang et al., 2017) and the Global LAnd Surface Satellite (GLASS) GPP product (Yuan et al., 2010) belong to this category.

With differing principles, assumptions and complexity, existing remote sensing GPP models heavily rely upon inputs with large uncertainties. First, climate forcing, such as temperature, humidity, precipitation and wind speed, is commonly used in these GPP models. However, these meteorological data are not observed but derived from reanalysis approaches and usually have coarse spatial resolution (e.g., >10 km) and large time lags (e.g., weeks). Second, plant functional types (PFTs) are used to define different parameterization schemes in those models. To date, satellite land cover products are usually characterized by considerably large time lags (>1 year), relatively low accuracy (≤70 %) (Yang et al., 2017) and more uncertainties with regards to year-to-year variations (Cai et al., 2014; Li et al., 2018). Third, high-level remote sensing land products such as the leaf area index (LAI), fraction of absorbed photosynthetically active radiation (FPAR), clumping index (CI), land surface temperature (LST) and soil moisture (SM) are used by some models. These variables are not directly observed but retrieved by complicated algorithms, and their accuracy still needs significant improvement to meet requirements of Earth system models (GCOS, 2011).

Alternative approaches which heavily rely on reliable satellite observations with low dependence on uncertain model structure or parameterization and model inputs are highly required. Solar-induced fluorescence (SIF) emerged in recent years and may provide a new opportunity for GPP estimation (Guanter et al., 2014). Linear relationships have been found between SIF and GPP at various ecosystems (Liu et al., 2017; Magney et al., 2019; Yang et al., 2015). However, satellite SIF data generally have coarse resolution, large spatial gaps, short temporal coverage and limited quality (Bacour et al., 2019; Zhang et al., 2018) and are therefore not suitable for many applications.

Near-infrared reflectance of vegetation (NIRV,Ref), defined as the product of the normalized difference vegetation index (NDVI) and observed NIR reflectance (NIRRef) (Eq. 1), has recently been presented as a proxy of GPP (Badgley et al., 2017). A global monthly 0.5 GPP dataset has been produced from satellite data using the linear relationship between NIRV,Ref and GPP (Badgley et al., 2019), explaining 68 % GPP variations observed by the FLUXNET network. Several field studies have recently found that taking incoming radiation into account further improves the NIRV–GPP relationship (Dechant et al., 2020; Wu et al., 2020). Because MODIS provides long-term and real-time (2000–present) observations of red (RedRef) and NIR (NIRRef) reflectance and atmospheric conditions with high spatial (250 m for reflectance and 1 km for atmosphere) and temporal (daily) resolutions, now there is an unprecedented opportunity to generate an observation-based GPP product.

(1) NIR V , Ref = NDVI × NIR Ref = NIR Ref - Red Ref NIR Ref + Red Ref × NIR Ref

Leveraging the concept of NIRV, here we present a new GPP model and the resultant daily, 250 m and real-time GPP product (2000–present) covering the contiguous United States (CONUS) (Jiang and Guan, 2020). The product is named SatelLite Only Photosynthesis Estimation (SLOPE) because (1) the model only uses satellite data and (2) the model only has two slope parameters. Detailed model design, multi-source satellite data processing and comprehensive evaluation procedures are elucidated below.

2 Production of the SLOPE product

The method we used to estimate GPP using the novel vegetation index NIRV,Ref follows the concept of light use efficiency (LUE) (Monteith, 1972; Monteith and Moss, 1977):

(2) GPP = PAR × FPAR × LUE .

Since NIRV,Ref has been found strongly correlated to FPAR (Badgley et al., 2017) and moderately correlated to LUE (Dechant et al., 2019), it is possible to simplify Eq. (2) as

(3) GPP PAR × a × NIR V , Ref + b ,

where a and b are the slope and intercept, which can be fitted from ground GPP observations. Both PAR (photosynthetically active radiation) and NIRV,Ref can be easily derived from satellite observations with high spatial and temporal resolutions in real time, avoiding complicated but uncertain algorithm or parameterization to quantify FPAR and LUE in Eq. (2). This linear relationship is likely to converge within C3 species (Badgley et al., 2019) but differs between C3 and C4 species (Wu et al., 2019). Accordingly, land cover data with considerably large time lags and relatively low accuracy may not be necessary for the model parameterization. Instead, an in-season C3–C4 species dataset is needed for the accurate calibration of the linear relationship.

Defining the ratio of GPP to PAR as the incident PAR use efficiency (iPUE) gives

(4) iPUE = GPP / PAR = FPAR × LUE a × NIR V , Ref + b .

Here iPUE is a confounding factor of canopy structure and leaf physiology, representing the capacity of plants to use incoming radiation for photosynthesis. When vegetation is absent, iPUE is zero and NIRV,Ref is expected to be zero too. However, this is not true in reality, as >99.9 % soils have positive NIRV,Ref values according to a global soil spectral library (Jiang and Fang, 2019), and the correction of NIRV,Ref for soil is needed for better performance at low vegetation cover (Zeng et al., 2019). To address this issue, we will propose a spatially explicit correction for NIRV,Ref to derive a soil-adjusted index SANIRV (see details in Sect. 2.2). Since SANIRV=0 when iPUE = 0, Eq. (4) becomes

(5) iPUE c × SANIR V ,

where c is the slope coefficient.

Considering the presence of mixed pixels of C3 and C4 species with the 250 m pixels, Eq. (5) can be rewritten as

(6) iPUE c C 4 × f C 4 + c C 3 × 1 - f C 4 × SANIR V ,

where cC4 and cC3 are the coefficients for C4 and C3 species, respectively, and fC4 is the fraction of C4 species in vegetation. Therefore, the SLOPE GPP model is

(7) GPP c C 4 × f C 4 + c C 3 × 1 - f C 4 × PAR × SANIR V .

In the SLOPE model (Eq. 7), PAR, SANIRV and fC4 are remote sensing inputs, whereas cC4 and cC3 are model parameters to be calibrated using ground-truth GPP data (Fig. 1). In the following sections, we will elaborate on the derivation of PAR, SANIRV and fC4, along with their quantitative uncertainties, and the model calibration for parameters cC4 and cC3. With the uncertainty of each term (ΔcC4, ΔcC3, ΔfC4, ΔPAR and ΔSANIRV), the uncertainty of GPP can be estimated in a spatiotemporally explicit manner by

(8) Δ GPP = f C 4 × PAR × SANIR V × Δ c C 4 + 1 - f C 4 × PAR × SANIR V Δ c C 3 + c C 4 - c C 3 × PAR × SANIR V Δ f C 4 + c C 4 × f C 4 + c C 3 × 1 - f C 4 × SANIR V Δ PAR + c C 4 × f C 4 + c C 3 × 1 - f C 4 × PAR Δ SANIR V .
https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f01

Figure 1Framework to produce the SLOPE GPP product. The box with dashed lines is the legend.

Download

2.1 Derivation of PAR

SLOPE adopts several machine learning approaches to compute PAR using forcing data mainly from Terra and Aqua MODIS Atmosphere and Land products (data solely from morning satellite Terra, afternoon satellite Aqua and combination of the two satellites are called MOD, MYD and MCD, respectively, hereinafter). The list of inputs includes aerosol optical depth (AOD) at 3 km resolution from the MOD04_3K (MODIS/Terra Aerosol 5Min L2 Swath 3km) and MYD04_3K (MODIS/Aqua Aerosol 5Min L2 Swath 3km) products (Remer et al., 2013), total column water vapor (TWV) at 1 km resolution from the MOD05_L2 (MODIS/Terra Total Precipitable Water Vapor 5-Min L2 Swath 1km and 5km) and MYD05_L2 (MODIS/Aqua Total Precipitable Water Vapor 5-Min L2 Swath 1km and 5km) products (Chang et al., 2015), cloud optical thickness (COT) at 1 km resolution from the MOD06_L2 (MODIS/Terra Clouds 5-Min L2 Swath 1km and 5km) and MYD06_L2 (MODIS/Aqua Clouds 5-Min L2 Swath 1km and 5km) products (Baum et al., 2012), total column ozone burden (TO3) at 5 km resolution from the MOD07_L2 (MODIS/Terra Temperature and Water Vapor Profiles 5-Min L2 Swath 5km) and MYD07_L2 (MODIS/Aqua Temperature and Water Vapor Profiles 5-Min L2 Swath 5km) products (Borbas et al., 2015), white-sky land surface shortwave albedo (ALB) at 500 m resolution from the MCD43A3 (MODIS/Terra+Aqua Albedo Daily L3 Global 500m SIN Grid) product (Román et al., 2009), and altitude (ALT) at 30 m resolution from the Shuttle Radar Topography Mission Global 1 arc second (SRTMGL1) product (Kobrick and Crippen, 2017).

MODIS atmosphere products are swath data, and swaths vary day by day. To maintain consistency and facilitate further usage, all data are reprojected using the nearest neighbor resampling approach to the Conus Albers projection on the North American Datum of 1983 (NAD83) (European Petroleum Survey Group – EPSG:6350) with 1 km spatial resolution. For swath data, an overlap area exists between two paths. In this case, data with smaller sensor view zenith angles provided by MOD/MYD03_L2 products are chosen. MODIS land products and SRTMGL1 are tile data with finer resolution than 1 km. They are reprojected to the EPSG 6350 spatial reference by aggregating all fine-resolution pixels within each 1 km grid.

Data gaps exist in all MODIS products, and a gap-filling measure is required. For MODIS atmosphere products, gaps in MOD and MYD are first filled by data in the MYD and MOD counterpart on the same day, followed by a multi-year average on that day. Since the multi-year average of COT is always non-zero, directly using it for a gap-filling measure always implies a cloudy condition. Therefore, a CLARA-2 (cLoud, Albedo and surface Radiation) cloud mask at 0.05 acquired from NOAA AVHRR (Advanced Very High Resolution Radiometer) data is employed (Karlsson et al., 2017). Only MODIS data gaps for AVHRR cloudy pixels are filled by a multi-year average of COT, whereas MODIS COT data gaps for AVHRR clear pixels are set to 0. For the MODIS land product, i.e., ALB, a temporally moving window with a 7 d radius is utilized for a specific day, and a Gaussian filter is applied to the time series data within the moving window on a per-pixel basis. The filtered values are used to fill gaps on that specific day.

Machine learning approaches are used to upscale the ground truth to satellite data. The ground truth is from the Surface Radiation Budget (SURFRAD) Network (Augustine et al., 2000), including seven long-term continuous sites across the CONUS. Daily-mean shortwave radiation (SWR) and PAR on the surface are calculated from site observations at 1–3 min intervals from 2000 through 2018. Daily-mean top-of-atmosphere SWR (SWRTOA) is calculated using latitude and day-of-year (DOY) information (Allen et al., 1998). Subsequently, atmospheric transmittance (tSWR) and proportion of PAR in SW (pPAR) are calculated as SWR / SWRTOA and PAR / SWR.

Models are built to estimate tSWR first, followed by pPAR. MOD data representing atmospheric conditions in the morning and MYD for the afternoon are used separately for the estimation, and the two estimates are averaged to account for discrepancies between the morning and afternoon. Clear and cloudy conditions are also treated separately in modeling considering the absence or presence of non-zero COT data. For the estimation of tSWR, ALB, ALT and SWRTOA are used in addition to atmosphere data, whereas for pPAR, ALB, ALT and the estimated tSWR are used. A summary of model inputs is listed in Table 1.

Table 1Summary of machine learning model inputs for the estimation of tSWR and pPAR. Daily estimations from MOD and MYD atmosphere data are averaged.

Download Print Version | Download XLSX

Four different machine learning approaches are employed to estimate tSWR and pPAR. They are the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996), multivariate adaptive regression splines (MARS) (Friedman, 1991), k-nearest neighbor regression (KNN) (Goldberger et al., 2005), and random forest regression (RF) (Liaw and Wiener, 2002). We used scikit-learn, a free software machine learning library for the Python programming language to build the models. All the four algorithms were automatically optimized by tuning their hyperparameters using 5-fold cross validation on their training dataset. All inputs and outputs are the same for the four approaches. Four different PAR estimations are then obtained by Eq. (9), and their ensemble mean and standard deviation are considered as the final estimation and uncertainty, respectively.

(9) PAR = SWR TOA × t SWR × p PAR

2.2 Derivation of SANIRV

SLOPE derives NIRV,Ref (Eq. 1) from MODIS band 1 (red) and band 2 (NIR) surface reflectance (SR) at 250 m resolution from MOD/MYD09GQ products (Vermote et al., 2002). Only pixels with quality control (QC) information defined as “corrected product produced at ideal quality all bands” were used. Since cloud and cloud shadows substantially reduce NIRV,Ref values, SLOPE adopts three strategies to mitigate the cloud contamination.

First, the cloud mask is applied. MOD–MYD COT data processed in Sect. 2.1 are resampled to the same spatial reference with MOD–MYD SR data and used to mask out cloudy pixels. At this point, a morphological dilation operation is used to enlarge the cloud mask to account for cloud edges. However, since COT data have a coarser resolution (1 km) than SR data (250 m), there are still partial clouds and cloud shadows left after this step.

Second, MOD and MYD data are combined. Ideally, on a specific day, MOD and MYD NIRV,Ref should be identical if they are obtained under the same conditions. However, the remaining cloud contamination and sun-target-sensor geometry could cause differences between morning and afternoon observations. Considering that the vegetation index is more sensitive to cloud contamination than the sensor view zenith angle, a simple criterion is applied to combine MOD and MYD observations. If the difference between MOD and MYD NIRV,Ref is greater than or equal to a predefined threshold (0.1 in this study), then the smaller one is likely cloud contaminated, and the larger one is used. Otherwise, the average value of the two is used. However, in many cases, both MOD and MYD data are contaminated, and the sensor view zenith angle may cause unexpected day-to-day variations.

Third, a temporal filter is applied. The filter is based on the assumption that NIRV,Ref should change smoothly within a short time period. Accordingly, a temporally moving window with a 7 d radius is utilized for a specific day. The mean and standard deviation are calculated from the NIRV,Ref time series on a per-pixel basis. Values outside the range of mean ±1.5 standard deviations are considered as outliers and dropped. Subsequently, the mean of the first 3 d and that of the last 7 d are calculated, respectively. If the NIRV,Ref value of the target day is 20 % smaller or larger than both the first 3 d mean and the last 3 d mean, then that NIRV,Ref value is considered as an outlier and dropped.

After the removal of outliers, a large amount of data gaps exist, and a gap-filling measure is required. Similar to ALB in Sect. 2.1, a temporally moving window with a 7 d radius is utilized for a specific day, and a Gaussian filter is applied and used to fill gaps on that day. The rest of the data gaps are filled with the multi-year average of NIRV,Ref. Considering extreme cases for which no data are available on a specific day over all years, the multi-year average of ±3 d is used for the final gap-filling measure.

To minimize the effects of variations in soil brightness on NIRV,Ref, soil background NIRV (NIRV,Soil) is identified from multi-year average NIRV,Ref time series. For a specific pixel, soil background NIRV (NIRV,Soil) is supposed to be (1) smaller than seasonal mean NIRV,Ref, which includes the vegetated period, and (2) smaller than 0.2 indicated by a global soil spectral library (Jiang and Fang, 2019). Therefore, NIRV,Soil is supposed to within a range of [0, min(mean(NIRV,Ref), 0.2)]. The mode of daily NIRV,Ref averaged over 2000–2019 within this value range is considered as NIRV,Soil. An example is given in Fig. S5. Theoretically, NIRV,Soil for evergreen species cannot be obtained from time series NIRV,Ref because of the persistent vegetation cover. Pixels with a NIRV,Soil value larger than 0.1 and seasonal coefficient of variation (CV) of NIRV,Ref smaller than 33 % are empirically considered as evergreen species, and their NIRV,Soil values are set to 0.

Finally, SANIRV is defined as

(10) SANIR V = NIR V , Ref - NIR V , Soil NIR V , Peak - NIR V , Soil × NIR V , Peak ,

where NIRV,Peak is the maximum value of the multi-year average NIRV,Ref time series on a per-pixel basis. SANIRV does not change NIRV,Peak but changes more for low NIRV,Ref values. SANIRV,Ref is set to 0 when NIRV,Ref NIRV,Soil. In general, SANIRV is supposed to be smooth within a short time period; therefore, the standard deviation within the ±3 d temporal window is calculated as uncertainty.

2.3 Derivation of the C4 fraction

A National Land Cover Database (NLCD) along with a crop-specific land cover product Cropland Data Layer (CDL) are used to derive the fraction cover of C4 crop in vegetation (fC4). NLCD is a comprehensive land cover database produced by the United States Geological Survey (USGS). It provides several main thematic classes such as urban, agriculture and forest with high accuracy (Homer et al., 2004). The 30 m nationwide NLCD data are available for 2001, 2004, 2006, 2008, 2011, 2013 and 2016. CDL is an agriculture-oriented product produced by the United States Department of Agriculture (USDA). It provides >100 crop cover types and leverages other land cover types from NLCD (Boryan et al., 2011). Across the CONUS CDL data are available at a 30 m spatial resolution and in a yearly temporal frequency from 2008 through 2019, whereas in some areas annual data are available back to the 1990s.

The fraction of C4 crop in vegetated areas is first derived using existing CDL data. NLCD land cover types are categorized into vegetated areas and non-vegetated areas with 30 m resolution. The fraction of vegetated areas in total area is subsequently calculated for each 250 m pixel. Similarly, CDL crop types are categorized into C4-planted areas and non-C4 areas with 30 m resolution. The fraction of C4 crops in total area is subsequently calculated for each 250 m pixel. The ratio of the fraction of C4 crops in total area to the fraction of vegetated areas in total area is calculated to derive the fraction of C4 crops in vegetated areas at 250 m resolution. Since NLCD data are not available for every year, an assumption is made that 1 year of NLCD data can represent adjacent years. Specifically, NLCD 2001 is used for 2000–2002; NLCD 2004 is used for 2003 and 2004; NLCD 2006 is used for 2005 and 2006; NLCD 2008 is used for 2007–2009; NLCD 2011 is used for 2010 and 2011; NLCD 2013 is used for 2012–2014; and NLCD 2016 is used for 2015–2019.

To predict the fraction of C4 crop in vegetation for region years for which no CDL data are available, crop rotation patterns are identified from historical data. Assuming that C4 crops are planted following three rotation strategies: C4–non-C4, C4–C4–non-C4 and non-C4–non-C4–C4 and assigning 1 to C4 and 0 to non-C4, a total of eight possible time series during the period of 2008–2019 when nationwide CDL data are available are listed in Table 2. On a per-pixel basis, the time series of the fraction of C4 crop in vegetation areas during 2008–2019 is compared with the eight predefined rotation patterns. The Pearson coefficient r is calculated between actual time series and each of the eight patterns, and the pattern yielding the largest r is the identified rotation pattern. Once the pattern is identified, the fraction of C4 crop in vegetated areas in any unknown year can be inferred. If 1 year is inferred as C4, then the multi-year average of the C4 fraction over C4-dominated years is used. Otherwise, the multi-year average over C3-dominated years is used. If the largest r is smaller than 0.497, i.e., p>0.1 for 12 years, then it is considered as no significant pattern, and the multi-year average over all years is used. The RMSE between the predicted and reference CDL C4 fraction is calculated as uncertainty. To account for the land cover change, the predicted C4 crop fraction is set to 0 in years when NLCD data are not classified as cropland. It is worth mentioning that C4 grassland and shrubland are not considered in this study, as no nationwide high-resolution distribution data are available.

Table 2Predefined C4-planting patterns from 2008 through 2019. If the C4 crop dominates in a specific year, 1 is assigned. Otherwise, 0 is assigned.

Download Print Version | Download XLSX

2.4 Calibration for iPUE coefficients

SLOPE was calibrated using the GPP data derived from AmeriFlux site observations. The AmeriFlux network is a community of sites that use eddy-covariance technology to measure landscape-level carbon, water and energy fluxes across the Americas (Baldocchi et al., 2001). A total of 48 sites (324 site years) were involved in this study (Table S3). All of the 43 sites in the FLUXNET2015 Tier 1 dataset (variable name: GPP_DT_VUT_MEAN; quality control: NEE_VUT_REF_QC) in the CONUS were used because this dataset was produced by a standardized data processing pipeline with strict data quality control protocols and is commonly considered the ground truth. Additionally, seven sites were from the AmeriFlux level 4 dataset (variable name: GPP_or_MDS; quality control: NEE_or_fMDSsqc). This dataset was generated more than 10 years ago, and only AmeriFlux Core Sites that are not covered by FLUXNET2015 were used for data quality consideration. For both datasets, only days with the best quality control flags were used in the SLOPE modeling and evaluation procedures.

We used Eq. (5) to conduct model calibration. Although SLOPE considers the iPUE–SANIRV relationship for C3 and C4 species, we also tested other configurations for comparison purposes. Configuration 1 (“all”) is defined as follows: all data were used together to fit a universal iPUE coefficient c. Configuration 2 (“C3–C4”) is defined as follows: data were separated for C3 and C4 species to fit cC3 and cC4, respectively. It is worth mentioning that only C4 crops (six sites) were considered as C4 species, whereas C4 grass and shrubs (three sites: US-SRG, Santa Rita Grassland; US-SRM, Santa Rita Mesquite; and US-Wkg, Walnut Gulch Kendall Grasslands) were still categorized into C3 species because of the lack of nationwide and high-resolution C4 grass and shrub data. Configuration 3 (“PFTs”) is defined as follows: data were separated for different PFTs, evergreen needleleaf forest (ENF; 14 sites), deciduous broadleaf forest and mixed forest (DBF and MF; 8 sites), shrubland and woody savannah (SHR and WSA; 5 sites), grassland (GRA; 8 sites), wetland (WET; 5 sites), C3 cropland (10 sites) and C4 cropland (6 sites), to fit PFT-specific iPUE coefficients. The RMSE between SANIRV-derived and AmeriFlux iPUE for C3 and C4 are calculated as uncertainties of cC3 and cC4, respectively.

3 Evaluation of the SLOPE product

3.1 Performance of PAR

SLOPE PAR demonstrates distinctive and detailed spatial variations in the CONUS because of the large spatial variations of atmospheric conditions (Fig. 2a). As an example, on 10 July 2020, large areas in New Jersey, Wisconsin, Oklahoma, South Dakota and Montana display significantly lower values than other areas, due to dominant impacts of clouds (Fig. S1). Aerosol optical depth (Fig. S2), total water vapor (Fig. S3) and total ozone burden (Fig. S4) also influence the amount of clear-sky PAR to some degree. For example, the southeastern part of the CONUS shows more aerosol and thus lower PAR values than other cloud-free areas. In addition to the total amount of PAR, SLOPE PAR also reveals variations in the ratio of PAR to SWR (Fig. S5). Despite a relatively small range (0.40–0.46), it is negatively correlated with cloud optical thickness and total ozone burden and positively correlates with total water vapor. PAR uncertainties caused by the difference of the four machine learning algorithms are generally small (<5 %; Fig. 2b). Higher uncertainties are mainly distributed in cloudy and desert areas.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f02

Figure 2Spatial distribution of 1 km resolution (a) PAR (MJ m−2 d−1) and (b) PAR uncertainty (MJ m−2 d−1) on 10 July 2020.

To evaluate SLOPE PAR, we used two different site observation datasets which are independent of the PAR derivation procedure. The first dataset is SURFRAD (Table S1). While SURFRAD data from 2000 through 2018 were used for model training, we used data in 2019 for evaluation. The second dataset is FLUXNET2015 (Table S2). A total of 41 sites providing PAR data were used for the evaluation. For both datasets, only days with the best quality control flags were used.

Evaluation results show that SLOPE PAR is in a highly aligned agreement with the ground truth independent from the training procedure (Fig. 3). Across the seven SURFRAD sites in 2019 and the 41 AmeriFlux sites from 2000 to 2014, SLOPE PAR achieves an overall coefficient of determination (R2) of 0.91 and root-mean-square error (RMSE) values of 1.09 and 1.19 MJ m−2 d−1, respectively. In addition, the performance is reasonably stable under different sky conditions, indicated by similar R2 and RMSE values from low to high atmospheric transmittance (Fig. S6).

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f03

Figure 3Comparison between site-observed PAR and SLOPE PAR. (a) Comparison across seven SURFRAD sites in 2019. (b) Comparison across 41 AmeriFlux sites from 2000 to 2014. All site data are independent of the training procedure.

Download

3.2 Performance of SANIRV

SLOPE SANIRV demonstrates detailed and distinctive spatial variations in the CONUS (Fig. 4a). In the peak growing season, remarkably high SANIRV values ( 0.5) from the Corn Belt in the central US are observed. This area is one of the most productive areas on Earth, producing more than 30 % of global corn and soybean (Green et al., 2018). Forested areas in the eastern and western US are characterized by relatively high values (0.3–0.4) and medium values (0.2–0.3), respectively. Low values (<0.2) are mainly observed in grasslands and shrublands in the western US. Uncertainty is associated with SANIRV data on the pixel basis (Fig. 4b). In general, areas with higher SANIRV values also have higher uncertainties. However, this pattern is altered by atmospheric conditions, where areas with higher cloud optical thickness (Fig. S1), aerosol optical depth (Fig. S2) and water vapor (Fig. S3) values tend to have larger uncertainties.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f04

Figure 4Spatial distribution of 250 m resolution (a) SANIRV and (b) SANIRV uncertainty across the CONUS on 10 July 2020.

Figure 5 demonstrates that SLOPE SANIRV is able to capture spatial and temporal variations at a small scale (e.g., within a county). An overall drop in SANIRV due to an extreme event damage can be observed within a short time period, thanks to the high temporal resolution (daily) of the SLOPE product. In addition, differences between plots possibly indicating different varieties, planting density and management can also be observed, thanks to the high spatial resolution (250 m) of the SLOPE product.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f05

Figure 5SANIRV in a 50×75 km2 area at Cedar Rapids, Iowa (red marker in Fig. 4a), on (a) 9 Aug 2020 and (b) 13 Aug 2020. A severe derecho took place from 10–11 August 2020. The maps are shown with the sinusoidal projection.

Download

SLOPE SANIRV shows significantly different seasonality for different PFTs (Fig. 6). The evergreen needleleaf forest site US-Blo (Blodgett Forest) is characterized by a relatively stable SANIRV seasonal cycle in 2019 (Fig. 6a), indicated by a CV of 14.9 % only. The deciduous broadleaf forest site US-Ha1 (Harvard Forest EMS Tower) has a large seasonal variation with a CV of 108.6 % (Fig. 6b). The SANIRV value suddenly rises from 0 to 0.3 in May, reaches 0.4 in June and July and gradually decreases back to 0 in October. The hot desert open shrubland site US-Whs (Walnut Gulch Lucky Hills Shrub) has an overall low SANIRV value (Fig. 6c), with a peak value observed in early October. The grassland site US-AR1 (ARM USDA UNL OSU Woodward Switchgrass 1) shows a distinct double-peak (in June and September) seasonal pattern (Fig. 6d), which is caused by the precipitation seasonality there. The wetland site US-Myb (Mayberry Wetland) is characterized by a long growing season and a flat peak from April to November (Fig. 6e). The cropland site US-Bo1 (Bondville) has corn planted in 2019, and it shows the highest SANIRV peak up to 0.5 among all the shown six sites (Fig. 6f). It is worth mentioning that compared to the two raw satellite-observed NIRV values provided by MOD09GQ and MYD09GQ products, respectively, SLOPE SANIRV successfully removes the soil impact in the non-growing season as the values equal to or close to 0. In addition, SLOPE SANIRV is gap-free and much less contaminated by noises. Furthermore, spatiotemporally explicit uncertainty is associated with each SANIRV value.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f06

Figure 6Comparison between SANIRV and raw NIRV derived from MOD09GQ and MYD09GQ products at six AmeriFlux sites (Table S3) in 2019. (a) US-Blo (evergreen needleleaf forest, ENF). (b) US-Ha1 (deciduous broadleaf forest, DBF). (c) US-Whs (open shrubland, OSH). (d) US-AR1 (grassland, GRA). (e) US-Myb (wetland, WET). (f) US-Bo1 (cropland, CRO). Shaded areas indicate uncertainties of SANIRV.

Download

3.3 Performance of the C4 fraction

SLOPE predicts a reasonable fraction of the C4 crop in vegetation in the CONUS (Fig. 7a). Most of the C4 crops are located in the Corn Belt, especially in Indiana, Illinois, Iowa and Nebraska. A direct comparison between the predicted C4 crop fraction (Fig. 8a) and independent reference CDL data (Fig. 8b) indicates that the SLOPE prediction is able to reconstruct the spatial patterns of the fraction of C4 crop in vegetation at 250 m resolution. It is worth mentioning that the uncertainty metric RMSE is sensitive to extreme values, and it is different from the misclassification rate (0.4 does not mean 40 %). For a pure pixel of a corn–soybean rotation field, the RMSE equals 0.39 if 3 out of 20 years are misclassified, i.e., misclassification rate of 0.15. A further investigation with regard to interannual dynamics shows that the SLOPE predictions can even perform better than CDL reference data (Fig. 9), benchmarked with the ground truth collected in the field. At this point, the CDL land cover could be prone to uncertainties in both satellite observation and classification algorithm, and classification is conducted year by year without an interannual consideration (Lark et al., 2017). SLOPE employs a rotation model to match decadal time series of CDL data, during which procedure noises in CDL data are suppressed. The features for which SLOPE is able to reconstruct spatial and interannual patterns of CDL data enable producing GPP in years when CDL data are unavailable (e.g., 2020 and years before 2008 for most regions). It is worth mentioning that uncertainty is also associated with each pixel (Fig. 7b).

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f07

Figure 7Spatial distribution of 250 m resolution (a) predicted fraction of C4 crop in vegetation in 2000 and (b) C4 crop fraction uncertainty across the CONUS.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f08

Figure 8C4 crop fraction of (a) SLOPE predicted and (b) CDL reference data in a 50×75 km2 area in Keith County, Nebraska (red marker in Fig. 7a), in 2000. Only CDL data during 2008–2019 are used in the modeling procedure, and therefore (b) is independent of (a). The maps are shown with the sinusoidal projection.

Download

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f09

Figure 9Comparison of fraction of C4 crop in vegetation between the field-collected ground truth, 250 m resolution CDL data and 250 m resolution SLOPE predictions at six AmeriFlux sites (Table S3) in the US Corn Belt from 2000 to 2020. (a) US-Ne1 (uncertainty: 0.17; Mead – irrigated continuous maize site). (b) US-Ne2 (uncertainty: 0.40; Mead – irrigated maize–soybean rotation site). (c) US-Ne3 (uncertainty: 0.18; Mead – rainfed maize–soybean rotation site). (d) US-Bo1 (uncertainty: 0; Bondville). (e) US-Ro1 (uncertainty: 0.16; Rosemount – G21). Uncertainty is the RMSE between the predicted and the CDL reference.

Download

3.4 Performance of GPP

SLOPE SANIRV shows a strong linear correlation with iPUE (Fig. 10). When data from all 49 sites (324 site years) are used together, the SANIRV–iPUE relationship has an overall R2 value of 0.73 (Fig. 10a). This is composed of an R2 value of 0.92 from C4 species (Fig. 10b) and 0.70 from C3 species (Fig. 10c). C3 species can be further decomposed into six PFTs (Fig. 10d–i), among which cropland has the highest R2 value up to 0.80 (Fig. 10i), whereas evergreen needleleaf forest has the lowest value of 0.46 (Fig. 10d). This relatively weak iPUE–SANIRV relationship is expected because evergreen needleleaf forest tends to allocate resources for leaf construction and maintenance at large timescales and does not have much flexibility to change canopy structure and leaf color as a response to varying environment at small timescales (Badgley et al., 2019; Chabot and Hicks, 1982). Previous studies found that changes in the xanthophyll cycle instead of chlorophyll concentration or absorbed PAR explained the seasonal variation of photosynthetic capacity in evergreen needleleaf forest (Gamon et al., 2016; Magney et al., 2019). Therefore, SIF was suggested by some studies as a better proxy of photosynthetic capacity in this ecosystem (Smith et al., 2018; Turner et al., 2020), though satellite SIF has coarser spatial resolution, shorter temporal coverage, larger temporal latency and a lower signal-to-noise ratio than SANIRV. In addition, the relatively weak iPUE–SANIRV relationship is also partly because of the small value ranges in both SANIRV and iPUE.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f10

Figure 10Comparison between SANIRV and iPUE over different subsets. The slope value of the SANIRV–iPUE relationship is the model parameter c (Eq. 5). Panel (a) is used by the model configuration 1 (all). Panels (b) and (c) are used by the model configuration 2 (C3–C4), which is actually used by SLOPE. Panels (b) and (d)(i) are used by the model configuration 3 (PFTs).

Download

The overall slope is 3.82 gC MJ−1 for all data (Fig. 10a). A distinct difference is found between C4 (5.18; Fig. 10b) and C3 (3.54; Fig. 10c) species, suggesting the importance of separating C4 from C3 species in modeling. The slope values vary to a limited degree within C3 species (Fig. 10d–i), ranging from 3.26 gC MJ−1 (cropland; Fig. 10i) to 3.80 gC MJ−1 (evergreen needleleaf forest; Fig. 10d), indicating the insignificance of separating different PFTs. It is worth mentioning that the SANIRV–iPUE relationship has a zero intercept because of the successful removal of the soil impact.

A 100-time-repeated 5-fold cross validation reveals the robustness of the SANIRV–iPUE relationships (Fig. 11). Here the repeated cross validation means the whole GPP dataset from all 49 sites (324 site years) is randomly split into five folds, four folds for training and one fold for testing, and the process is repeated 100 times yielding 500 training–testing splits in total. In all subsets, the uncertainties of the iPUE coefficient c (the slope of the SANIRV–iPUE relationship) are less than 1 % (Fig. 11a). When using the three different model configurations, the model performances in simulating the whole training–testing datasets also show little variation (Fig. 11b), in general <0.5 % and <1.5 % for the training and testing datasets, respectively. Moreover, the R2 values between training and testing datasets, and between C3–C4 and PFT-based configurations are almost identical ( 0.76). These results suggest using cC4=5.18 (Fig. 10b) and cC3=3.54 (Fig. 10c) in SLOPE is reasonable. The 95 % confidential intervals of c for C4 and C3 species (Fig. 11a) are used as their uncertainties in SLOPE.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f11

Figure 11Statistics of the SANIRV–iPUE relationship from cross validation. (a) Slopes of the SANIRV–iPUE relationship over different subsets. (b) R2 between AmeriFlux GPP and estimated GPP using different model configurations for the training and testing datasets, respectively. Error bars in both subplots indicate 95 % confidential intervals over 500 experiments.

Download

SLOPE GPP demonstrates detailed and distinctive spatial variations in the CONUS (Fig. 12a). The Corn Belt is the most productive area, largely contributed by the C4 crop corn whose GPP could reach up to 30 gC m−2 d−1 (Fig. 13a). Forested areas in the eastern US show medium GPP values, followed by forests and croplands in the western US. Grasslands and shrublands in the central and western US generally show low productivity. On this example day, the R2 of spatial patterns between GPP and SANIRV, GPP and C4 fraction, and GPP and PAR across the CONUS are 0.89, 0.34 and 0.01, respectively. SANIRV, an integrated vegetation index containing information of both FPAR and LUE (Eq. 4), explains the majority of GPP spatial variation. C4 fraction mainly contributes to the distribution and magnitude of the peak in GPP spatial variation. Although PAR does not influence the nationwide GPP spatial variation, it regulates GPP values at local scale. For example, northeastern Nebraska shows smaller GPP values than southeastern Nebraska in spite of a similar SANIRV (Fig. 4a) and C4 fraction (Fig. 7a) because of smaller PAR values (Fig. 2a). At a small scale (e.g., within a county), the 250 m resolution ( 0.06 km2 per pixel) SLOPE GPP is close to revealing field-level heterogeneity, considering that the mean and median crop field sizes in the CONUS are 0.19 and 0.28 km2, respectively (Yan and Roy, 2016). For example, Fig. 13a shows large contrast in GPP, but Fig. 13b is more homogeneous. This is because corn reaches peak growing season in early July when soybean canopy is still open and sparse. SLOPE GPP with its pixel size much smaller than field area is therefore able to show GPP difference between corn and soybean. In late August, corn turns yellow, while soybean is still green and active, and therefore they have similar GPP values considering corn is C4, while soybean is C3. We suggest that the 250 m resolution makes a big difference compared to existing global GPP products whose spatial resolutions are at least 500 m ( 0.25 km2 per pixel). Quantitative uncertainty is provided for each SLOPE GPP estimate (Eq. 8). The spatial pattern shows that the Corn Belt has the largest uncertainty (Fig. 12b; e.g., 5 gC m−2 d−1) due to the considerable contribution from the uncertainty of the C4 fraction (Fig. 7b).

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f12

Figure 12Spatial distribution of 250 m resolution (a) GPP (gC m−2 d−1) and (b) GPP uncertainty (gC m−2 d−1) across the CONUS on 10 July 2020.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f13

Figure 13GPP (gC m−2 d−1) in a 50 × 75 km2 area in Champaign County, Illinois (red marker in Fig. 12a), on (a) 10 July 2020 and (b) 20 August 2020. The maps are shown with the sinusoidal projection.

Download

SLOPE GPP agrees fairly well with the ground truth from the AmeriFlux sites (Fig. 14). Across all of the 49 sites (328 site years; Fig. 14a), SLOPE GPP achieves an overall R2 value of 0.85, RMSE of 1.63 gC m−2 d−1 and relative error of 37.8 %. For individual sites (Fig. 14b), the median R2 and RMSE values are 0.80 and 1.69 gC m−2 d−1, respectively. C4 cropland generally shows the highest median R2 value (0.94), followed by deciduous broadleaf forest and mixed forest (0.88) and C3 cropland (0.87). The lowest median R2 value (0.69) is observed for evergreen needleleaf forest. With regard to RMSE, smaller median values are found in grassland (1.09 gC m−2 d−1), shrubland and woody savannah (1.48 gC m−2 d−1), and deciduous broadleaf forest and mixed forest (1.48 gC m−2 d−1), whereas C3 (2.15 gC m−2 d−1) and C4 (2.01 gC m−2 d−1) cropland tend to have larger RMSE values.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f14

Figure 14Performance of the SLOPE GPP. (a) Comparison between AmeriFlux GPP and SLOPE GPP across all sites. (b) R2 and RMSE of individual sites. Sites with a C3–C4 rotation are separated into C3 CRO and C4 CRO.

Download

SLOPE GPP generally captures seasonal and interannual variations of AmeriFlux GPP for different PFTs (Fig. 15). At the evergreen needleleaf forest site US-Blo (Fig. 15a), the GPP seasonal cycle is mainly driven by PAR, as the iPUE indicated by SANIRV is fairly stable (Fig. 6a). At the deciduous broadleaf forest site US-Ha1 (Fig. 15b), the start of the season and the end of the season agree well between AmeriFlux GPP and SLOPE GPP. At the open-shrubland site US-Whs (Fig. 15c), the quick rise and drop of GPP in response to the start and end of the wet season are clearly observed in SLOPE GPP. Even the double-peak pattern in 2011 can be observed in SLOPE GPP. At the grassland site US-AR1 (Fig. 15d), the impact of a severe drought in the southern Great Plains in 2011 is distinct in SLOPE GPP, as the GPP values in 2011 are only about half of those in 2010 and 2012. At the cropland site US-Bo1 (Fig. 15f), the rotation-caused year-to-year variation is distinct, indicated by higher values in odd-numbered years with C4 crop corn planted and lower values in even-numbered years with C3 crop soybean planted (Fig. 9d). The lowest GPP peak is observed in 2012 when a severe drought attacked the central US.

https://essd.copernicus.org/articles/13/281/2021/essd-13-281-2021-f15

Figure 15Comparison between AmeriFlux (black dots) and SLOPE (red curves) daily GPP at six AmeriFlux sites (Table S3) from 2000 to 2019. (a) US-Blo (evergreen needleleaf forest, ENF). (b) US-Ha1 (deciduous broadleaf forest, DBF). (c) US-Whs (open shrubland, OSH). (d) US-AR1 (grassland, GRA). (e) US-Myb (wetland, WET). (f) US-Bo1 (cropland, CRO). Shaded areas indicate uncertainties of SLOPE GPP.

Download

4 Data availability and data format

The archived daily 250 m resolution SLOPE GPP data product from 2000 to 2019 is distributed under a Creative Commons Attribution 4.0 License. It is publicly available at NASA's Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) with a DOI of https://doi.org/10.3334/ORNLDAAC/1786 (download page: https://daac.ornl.gov/daacdata/cms/SLOPE_GPP_CONUS/data/, last access: 20 January 2021) (Jiang and Guan, 2020). Data from 2020 are available from the authors upon request. All data are projected in the standard MODIS Land Integerized Sinusoidal tile map projection and are stored in GeoTIFF format files with a data type of signed 16 bit integer. Each processing tile has a size of 4800 pixels by 4800 pixels, representing a land region of approximately 1200 km by 1200 km . In addition to the GPP product, SLOPE PAR, SANIRV and C4 fraction, along with their uncertainties, are also released. These datasets are also stored in the same spatial projection and file format with the GPP dataset. PAR (resampled from 1 km to 250 m to be consistent with GPP) and SANIRV are provided on a daily basis, whereas C4 fraction is provided on an annual basis. A README file is provided along with the SLOPE product, which instructs the usage of the data.

5 Conclusions

This study produces a long-term and real-time (2000–present) GPP product with daily and 250 m spatial and temporal resolutions. The product is based on a remote-sensing-only (SLOPE) model, which uses accurate PAR, soil-adjusted NIRV and dynamic C4 fractions as inputs. Evaluation against AmeriFlux ground-truth GPP shows that the SLOPE GPP product has a reasonable accuracy, with an overall R2 of 0.85 and RMSE of 1.63 gC m−2 d−1. To demonstrate the real-time capacity of the SLOPE GPP product, the latest GPP data on 2 November 2020, 2 d prior to the revision of this paper, is shown in Fig. S7. The spatiotemporal resolution and instantaneity of the SLOPE GPP product are higher than existing global GPP products, such as MOD17, VPM, GLASS, FLUXCOM and BESS. We expect this novel GPP product can significantly contribute to various researchers and stakeholders in fields related to the regional carbon cycle, land surface processes, ecosystem monitoring and management, and agriculture. The approaches used in this study, in particular, the derivation of SANIRV, can also be applied to any other satellite platform with the two most classical bands: red and NIR, for example, SaTellite dAta IntegRation (STAIR) from Landsat–MODIS fusion data, which has daily, 30 m spatiotemporal resolution and can be applied at a large scale (Jiang et al., 2020; Luo et al., 2018); commercial Planet Labs data with a daily interval and spatial resolution up to 3m (Houborg and McCabe, 2016; Kimm et al., 2020); and the Advanced Very High Resolution Radiometer (AVHRR) with a temporal coverage as far back as 1982 (Franch et al., 2017; Jiang et al., 2017). However, caution should be used in the interpretation of GPP seasonal trajectory in evergreen needleleaf forests because of the relatively poor relationship between SANIRV–iPUE and GPP magnitude in southwestern US grasslands because of the ignorance of the fraction of C4 grasslands. Finally, although the SLOPE product has been generated from 2000 to present, caution should also be used in the interpretation of the long-term trend because the SLOPE model, as many other LUE models, does not explicitly consider the CO2 fertilization effects on vegetation productivity.

Supplement

The supplement related to this article is available online at: https://doi.org/10.5194/essd-13-281-2021-supplement.

Author contributions

CJ and KG designed the project and the workflow. CJ and GW developed the SLOPE model. CJ processed the data and generated the GPP product. CJ, BP and SW interpreted the results and refined the experiments. CJ wrote the paper, and KG, GW, BP and SW all contributed to the improvement of the paper.

Competing interests

The authors declare that they have no conflict of interest.

Acknowledgements

Chongya Jiang, Kaiyu Guan, Genghong Wu and Sheng Wang are funded by the DOE Center for Advanced Bioenergy and Bioproducts Innovation (U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research under award no. DE-SC0018420). Any opinions, findings and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the U.S. Department of Energy. Kaiyu Guan and Bin Peng are funded by NASA awards (nos. NNX16AI56G and 80NSSC18K0170). Kaiyu Guan is also funded by an NSF CAREER award (no. 1847334). Chongya Jiang and Kaiyu Guan also acknowledge the support from Blue Waters Professorship from the National Center for Supercomputing Applications of UIUC. This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (award nos. OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. We thank NASA for freely sharing the MODIS products.

Financial support

This research has been supported by the U.S. Department of Energy (grant no. DE-SC0018420), the National Aeronautics and Space Administration (grant nos. NNX16AI56G and 80NSSC18K0170) and the National Science Foundation (grant no. 1847334).

Review statement

This paper was edited by Jens Klump and reviewed by three anonymous referees.

References

Allen, R. G., Pereira, L. S., Raes, D., and Smith, M.: Crop evapotranspiration-Guidelines for computing crop water requirements-FAO Irrigation and drainage paper 56, Italy: Rome, available at: http://www.fao.org/3/x0490E/x0490e00.htm (last access: 20 January 2021), 1998. 

Augustine, J. A., DeLuisi, J. J., and Long, C. N.: SURFRAD – A national surface radiation budget network for atmospheric research, B. Am. Meteorol. Soc., 81, 2341–2357, https://doi.org/10.1175/1520-0477(2000)081<2341:SANSRB>2.3.CO;2, 2000. 

Bacour, C., Maignan, F., Peylin, P., MacBean, N., Bastrikov, V., Joiner, J., Köhler, P., Guanter, L., and Frankenberg, C.: Differences Between OCO-2 and GOME-2 SIF Products From a Model-Data Fusion Perspective, J. Geophys. Res.-Biogeosci., 124, 3143–3157, https://doi.org/10.1029/2018JG004938, 2019. 

Badgley, G., Field, C. B., and Berry, J. A.: Canopy near-infrared reflectance and terrestrial photosynthesis, Sci. Adv., 3, 1–6, https://doi.org/10.1126/sciadv.1602244, 2017. 

Badgley, G., Anderegg, L. D., Berry, J. A., and Field, C. B.: Terrestrial Gross Primary Production: Using NIR V to Scale from Site to Globe, Glob. Chang. Biol., 25, 3731–3740,, https://doi.org/10.1111/gcb.14729, 2019. 

Baldocchi, D., Falge, E., Gu, L., Olson, R., Hollinger, D., Running, S., Anthoni, P., Bernhofer, C., Davis, K., Evans, R., Fuentes, J., Goldstein, A., Katul, G., Law, B., Lee, X., Malhi, Y., Meyers, T., Munger, W., Oechel, W., Paw, U. K. T., Pilegaard, K., Schmid, H. P., Valentini, R., Verma, S., Vesala, T., Wilson, K., and Wofsy, S.: FLUXNET: a new tool to study the temporal and spatial variability of ecosystem-scale carbon dioxide, water vapor, and energy flux densities, B. Am. Meteorol. Soc., 82, 2415–2434, https://doi.org/10.1175/1520-0477(2001)082<2415:FANTTS>2.3.CO;2, 2001. 

Baum, B. A., Menzel, W. P., Frey, R. A., Tobin, D. C., Holz, R. E., Ackerman, S. A., Heidinger, A. K., and Yang, P.: MODIS cloud-top property refinements for collection 6, J. Appl. Meteorol. Climatol., 51, 1145–1163, https://doi.org/10.1175/JAMC-D-11-0203.1, 2012. 

Beer, C., Reichstein, M., Tomelleri, E., Ciais, P., Jung, M., Carvalhais, N., Rödenbeck, C., Arain, M. A., Baldocchi, D., and Bonan, G. B.: Terrestrial gross carbon dioxide uptake: global distribution and covariation with climate, Science, 329, 834–838, 2010. 

Bonan, G.: Climate Change and Terrestrial Ecosystem Modeling, Cambridge University Press, Cambridge, UK, 354 pp., ISBN 978-1-107-04378-7, 2019. 

Borbas, E. E., Seemann, S. W., Kern, A., Moy, L., Li, J., Gumley, L., and Menzel, W. P.: MODIS atmospheric profile retrieval algorithm theoretical basis document, Citeseer, available at: http://modis-atmos.gsfc.nasa.gov/MOD07_L2/atbd.html (last access: January 2021), 2015. 

Boryan, C., Yang, Z., Mueller, R., and Craig, M.: Monitoring US agriculture: The US department of agriculture, national agricultural statistics service, cropland data layer program, Geocarto Int., 26, 341–358, https://doi.org/10.1080/10106049.2011.562309, 2011. 

Cai, S., Liu, D., Sulla-Menashe, D., and Friedl, M. A.: Enhancing MODIS land cover product with a spatial-temporal modeling algorithm, Remote Sens. Environ., 147, 243–255, https://doi.org/10.1016/j.rse.2014.03.012, 2014. 

Chabot, B. F. and Hicks, D. J.: The ecology of leaf life spans, Annu. Rev. Ecol. Syst., 13, 229–259, https://doi.org/10.1146/annurev.es.13.110182.001305, 1982. 

Chang, L., Gao, G., Jin, S., He, X., Xiao, R., and Guo, L.: Calibration and evaluation of precipitable water vapor from Modis infrared observations at night, IEEE Trans. Geosci. Remote Sens., 53, 2612–2620, https://doi.org/10.1109/TGRS.2014.2363089, 2015. 

Dechant, B., Ryu, Y., Badgley, G., Zeng, Y., Berry, J. A., Goulas, Y., Li, Z., Zhang, Q., Kang, M., Li, J., and Moya, I.: Canopy structure explains the relationship between photosynthesis and sun-induced chlorophyll fluorescence in crops, EarthArXiv Prepr., https://doi.org/10.31223/OSF.IO/CBXPQ, 2019. 

Dechant, B., Ryu, Y., Badgley, G., Zeng, Y., Berry, J. A., Zhang, Y., Goulas, Y., Li, Z., Zhang, Q., Kang, M., Li, J., and Moya, I.: Canopy structure explains the relationship between photosynthesis and sun-induced chlorophyll fluorescence in crops, Remote Sens. Environ., 241, 111733, https://doi.org/10.1016/j.rse.2020.111733, 2020. 

Franch, B., Vermote, E. F., Roger, J. C., Murphy, E., Becker-reshef, I., Justice, C., Claverie, M., Nagol, J., Csiszar, I., Meyer, D., Baret, F., Masuoka, E., Wolfe, R., and Devadiga, S.: A 30+ year AVHRR land surface re?ectance climate data record and its application to wheat yield monitoring, Remote Sens., 9, 1–14, https://doi.org/10.3390/rs9030296, 2017. 

Friedlingstein, P., Jones, M. W., O'Sullivan, M., Andrew, R. M., Hauck, J., Peters, G. P., Peters, W., Pongratz, J., Sitch, S., Le Quéré, C., Bakker, D. C. E., Canadell, J. G., Ciais, P., Jackson, R. B., Anthoni, P., Barbero, L., Bastos, A., Bastrikov, V., Becker, M., Bopp, L., Buitenhuis, E., Chandra, N., Chevallier, F., Chini, L. P., Currie, K. I., Feely, R. A., Gehlen, M., Gilfillan, D., Gkritzalis, T., Goll, D. S., Gruber, N., Gutekunst, S., Harris, I., Haverd, V., Houghton, R. A., Hurtt, G., Ilyina, T., Jain, A. K., Joetzjer, E., Kaplan, J. O., Kato, E., Klein Goldewijk, K., Korsbakken, J. I., Landschützer, P., Lauvset, S. K., Lefèvre, N., Lenton, A., Lienert, S., Lombardozzi, D., Marland, G., McGuire, P. C., Melton, J. R., Metzl, N., Munro, D. R., Nabel, J. E. M. S., Nakaoka, S.-I., Neill, C., Omar, A. M., Ono, T., Peregon, A., Pierrot, D., Poulter, B., Rehder, G., Resplandy, L., Robertson, E., Rödenbeck, C., Séférian, R., Schwinger, J., Smith, N., Tans, P. P., Tian, H., Tilbrook, B., Tubiello, F. N., van der Werf, G. R., Wiltshire, A. J., and Zaehle, S.: Global Carbon Budget 2019, Earth Syst. Sci. Data, 11, 1783–1838, https://doi.org/10.5194/essd-11-1783-2019, 2019. 

Friedman, J. H.: Multivariate Adaptive Regression Splines, Ann. Stat., 19, 1–67, https://doi.org/10.1214/aos/1176347963, 1991. 

Gamon, J. A., Huemmrich, K. F., Wong, C. Y. S., Ensminger, I., Garrity, S., Hollinger, D. Y., Noormets, A., and Peñuelas, J.: A remotely sensed pigment index reveals photosynthetic phenology in evergreen conifers, P. Natl. Acad. Sci., 113, 201606162, https://doi.org/10.1073/pnas.1606162113, 2016. 

GCOS: Systematic observation requirements for satellite-based data products for climate, Supplemental details to the satellite-based component of the “Implementation Plan for the Global Observing System for Climate in Support of the UNFCCC,” Reference Number GCOS-154, available at: http://www.wmo.int/pages/prog/gcos/Publications/gcos-154.pdf (last access: 20 January 2021), 2011. 

Goldberger, J., Hinton, G. E., Roweis, S. T., and Salakhutdinov, R. R.: Neighbourhood components analysis, in Advances in neural information processing systems, 513–520, available at: http://www.cs.utoronto.ca/~rsalakhu/papers/ncanips.pdf (last access: 20 January 2021), 2005. 

Green, T. R., Kipka, H., David, O., and McMaster, G. S.: Where is the USA Corn Belt, and how is it changing?, Sci. Total Environ., 618, 1613–1618, https://doi.org/10.1016/j.scitotenv.2017.09.325, 2018. 

Guan, K., Berry, J. A., Zhang, Y., Joiner, J., Guanter, L., Badgley, G., and Lobell, D. B.: Improving the monitoring of crop productivity using spaceborne solar-induced fluorescence, Glob. Chang. Biol., 22, 716–726, https://doi.org/10.1111/gcb.13136, 2016. 

Guanter, L., Zhang, Y., Jung, M., Joiner, J., Voigt, M., Berry, J. a, Frankenberg, C., Huete, A. R., Zarco-Tejada, P., Lee, J.-E., Moran, M. S., Ponce-Campos, G., Beer, C., Camps-Valls, G., Buchmann, N., Gianelle, D., Klumpp, K., Cescatti, A., Baker, J. M., and Griffis, T. J.: Global and time-resolved monitoring of crop photosynthesis with chlorophyll fluorescence, P. Natl. Acad. Sci. USA, 111, E1327-33, https://doi.org/10.1073/pnas.1320008111, 2014. 

Homer, C., Huang, C., Yang, L., Wylie, B., and Coan, M.: Development of a 2001 National Land-Cover Database for the United States, Photogramm. Eng. Remote Sens., 70, 829–840, https://doi.org/10.14358/PERS.70.7.829, 2004. 

Houborg, R. and McCabe, M. F.: High-Resolution NDVI from planet's constellation of earth observing nano-satellites: A new data source for precision agriculture, Remote Sens., 8, 768, https://doi.org/10.3390/rs8090768, 2016. 

Jiang, C. and Fang, H.: GSV: a general model for hyperspectral soil reflectance simulation, Int. J. Appl. Earth Obs. Geoinf., 83, 101932, https://doi.org/10.1016/j.jag.2019.101932, 2019. 

Jiang, C. and Guan, K.: SLOPE Daily 250 m CONUS Gross Primary Productivity (2000–2019), ORNL DAAC, https://doi.org/10.3334/ORNLDAAC/1786, 2020. 

Jiang, C. and Ryu, Y.: Multi-scale evaluation of global gross primary productivity and evapotranspiration products derived from Breathing Earth System Simulator (BESS), Remote Sens. Environ., 186, 528–547, https://doi.org/10.1016/j.rse.2016.08.030, 2016. 

Jiang, C., Ryu, Y., Fang, H., Myneni, R., Claverie, M., and Zhu, Z.: Inconsistencies of interannual variability and trends in long-term satellite leaf area index products, Glob. Chang. Biol., 23, 4133–4146, https://doi.org/10.1111/gcb.13787, 2017. 

Jiang, C., Guan, K., Pan, M., Ryu, Y., Peng, B., and Wang, S.: BESS-STAIR: a framework to estimate daily, 30 m, and all-weather crop evapotranspiration using multi-source satellite data for the US Corn Belt, Hydrol. Earth Syst. Sci., 24, 1251–1273, https://doi.org/10.5194/hess-24-1251-2020, 2020. 

Jung, M., Reichstein, M., Schwalm, C. R., Huntingford, C., Sitch, S., Ahlström, A., Arneth, A., Camps-Valls, G., Ciais, P., Friedlingstein, P., Gans, F., Ichii, K., Jain, A. K., Kato, E., Papale, D., Poulter, B., Raduly, B., Rödenbeck, C., Tramontana, G., Viovy, N., Wang, Y. P., Weber, U., Zaehle, S., and Zeng, N.: Compensatory water effects link yearly global land CO2 sink changes to temperature, Nature, 541, 516–520, https://doi.org/10.1038/nature20780, 2017. 

Jung, M., Koirala, S., Weber, U., Ichii, K., Gans, F., Gustau-Camps-Valls, Papale, D., Schwalm, C., Tramontana, G., and Reichstein, M.: The FLUXCOM ensemble of global land-atmosphere energy fluxes, Sci. Data, 1–14, https://doi.org/10.1038/s41597-019-0076-8, 2019. 

Karlsson, K.-G., Anttila, K., Trentmann, J., Stengel, M., Fokke Meirink, J., Devasthale, A., Hanschmann, T., Kothe, S., Jääskeläinen, E., Sedlar, J., Benas, N., van Zadelhoff, G.-J., Schlundt, C., Stein, D., Finkensieper, S., Håkansson, N., and Hollmann, R.: CLARA-A2: the second edition of the CM SAF cloud and radiation data record from 34 years of global AVHRR data, Atmos. Chem. Phys., 17, 5809–5828, https://doi.org/10.5194/acp-17-5809-2017, 2017. 

Kimm, H., Guan, K., Jiang, C., Peng, B., Gentry, L. F., Wilkin, S. C., Wang, S., Cai, Y., Bernacchi, C. J., Peng, J., and Luo, Y.: Deriving high-spatiotemporal-resolution leaf area index for agroecosystems in the U.S. Corn Belt using Planet Labs CubeSat and STAIR fusion data, Remote Sens. Environ., 239, 111615, https://doi.org/10.1016/j.rse.2019.111615, 2020. 

Kobrick, M. and Crippen, R.: SRTMGL1: NASA Shuttle Radar Topography Mission Global 1 arc second V003, NASA EOSDIS L. Process. DAAC, https://doi.org/10.5067/MEaSUREs/SRTM/SRTMGL1.003, 2017. 

Lark, T. J., Mueller, R. M., Johnson, D. M., and Gibbs, H. K.: Measuring land-use and land-cover change using the U.S. department of agriculture's cropland data layer: Cautions and recommendations, Int. J. Appl. Earth Obs. Geoinf., 62, 224–235, https://doi.org/10.1016/j.jag.2017.06.007, 2017. 

Li, W., MacBean, N., Ciais, P., Defourny, P., Lamarche, C., Bontemps, S., Houghton, R. A., and Peng, S.: Gross and net land cover changes in the main plant functional types derived from the annual ESA CCI land cover maps (1992–2015), Earth Syst. Sci. Data, 10, 219–234, https://doi.org/10.5194/essd-10-219-2018, 2018. 

Liaw, A. and Wiener, M.: Classification and regression by randomForest, R News, 2, 18–22, 2002. 

Liu, L., Guan, L., and Liu, X.: Directly estimating diurnal changes in GPP for C3 and C4 crops using far-red sun-induced chlorophyll fluorescence, Agr. Forest Meteorol., 232, 1–9, https://doi.org/10.1016/j.agrformet.2016.06.014, 2017. 

Luo, Y., Guan, K., and Peng, J.: STAIR: A generic and fully-automated method to fuse multiple sources of optical satellite data to generate a high-resolution, daily and cloud-/gap-free surface reflectance product, Remote Sens. Environ., 214, 87–99, https://doi.org/10.1016/j.rse.2018.04.042, 2018. 

Lyapustin, A., Wang, Y., Laszlo, I., Kahn, R., Korkin, S., Remer, L., Levy, R., and Reid, J. S.: Multiangle implementation of atmospheric correction (MAIAC): 2. Aerosol algorithm, J. Geophys. Res.-Atmos., 116, D03211, https://doi.org/10.1029/2010JD014986, 2011. 

Magney, T. S., Bowling, D. R., Logan, B., Grossmann, K., Stutz, J., and Blanken, P.: Mechanistic evidence for tracking the seasonality of photosynthesis with solar-induced fluorescence, P. Natl. Acad. Sci., 116, 11640–11645, https://doi.org/10.1073/pnas.1900278116, 2019. 

Monteith, J. L.: Solar Radiation and Productivity in Tropical Ecosystems, J. Appl. Ecol., 9, 747, https://doi.org/10.2307/2401901, 1972. 

Monteith, J. L. and Moss, C. J.: Climate and the Efficiency of Crop Production in Britain, Philos. T. Roy. Soc. B Biol. Sci., 281, 277–294, https://doi.org/10.1098/rstb.1977.0140, 1977. 

Remer, L. A., Mattoo, S., Levy, R. C., and Munchak, L. A.: MODIS 3 km aerosol product: algorithm and global perspective, Atmos. Meas. Tech., 6, 1829–1844, https://doi.org/10.5194/amt-6-1829-2013, 2013. 

Román, M. O., Schaaf, C. B., Woodcock, C. E., Strahler, A. H., Yang, X., Braswell, R. H., Curtis, P. S., Davis, K. J., Dragoni, D., Goulden, M. L., Gu, L., Hollinger, D. Y., Kolb, T. E., Meyers, T. P., Munger, J. W., Privette, J. L., Richardson, A. D., Wilson, T. B., and Wofsy, S. C.: The MODIS (Collection V005) BRDF/albedo product: Assessment of spatial representativeness over forested landscapes, Remote Sens. Environ., 113, 2476–2498, https://doi.org/10.1016/j.rse.2009.07.009, 2009. 

Running, S. W., Nemani, R. R., Heinsch, F. A., Zhao, M., Reeves, M., and Hashimoto, H.: A continuous satellite-derived measure of global terrestrial primary production, Bioscience, 54, 547, https://doi.org/10.1641/0006-3568(2004)054[0547:ACSMOG]2.0.CO;2, 2004. 

Ryu, Y., Berry, J. A., and Baldocchi, D. D.: What is global photosynthesis? History, uncertainties and opportunities, Remote Sens. Environ., 223, 95–114, https://doi.org/10.1016/j.rse.2019.01.016, 2019. 

Smith, W. K., Biederman, J. A., Scott, R. L., Moore, D. J. P., He, M., Kimball, J. S., Yan, D., Hudson, A., Barnes, M. L., MacBean, N., Fox, A. M., and Litvak, M. E.: Chlorophyll Fluorescence Better Captures Seasonal and Interannual Gross Primary Productivity Dynamics Across Dryland Ecosystems of Southwestern North America, Geophys. Res. Lett., 45, 748–757, https://doi.org/10.1002/2017GL075922, 2018. 

Tibshirani, R.: Regression Shrinkage and Selection Via the Lasso, J. R. Stat. Soc. Ser. B, 58, 267–288, https://doi.org/10.1111/j.2517-6161.1996.tb02080.x, 1996. 

Turner, A. J., Köhler, P., Magney, T. S., Frankenberg, C., Fung, I., and Cohen, R. C.: A double peak in the seasonality of California's photosynthesis as observed from space, Biogeosciences, 17, 405–422, https://doi.org/10.5194/bg-17-405-2020, 2020.  

Vermote, E. F., El Saleous, N. Z., and Justice, C. O.: Atmospheric correction of MODIS data in the visible to middle infrared: First results, Remote Sens. Environ., 83, 97–111, https://doi.org/10.1016/S0034-4257(02)00089-5, 2002. 

Wu, G., Guan, K., Jiang, C., Peng, B., Kimm, H., Chen, M., Yang, X., Wang, S., Sukyer, A. E., Bernacchi, C., Moore, C. E., Zeng, Y., Berry, J., and Cendrero-Mateo, M. P.: Radiance-based NIRv as a proxy for GPP of corn and soybean, Environ. Res. Lett., 15, 034009, https://doi.org/10.1088/1748-9326/ab65cc, 2019. 

Wu, G., Guan, K., Jiang, C., Peng, B., Kimm, H., Chen, M., Yang, X., Wang, S., Suyker, A. E., Bernacchi, C., Moore, C. E., Zeng, Y., Berry, J., and Cendrero-Mateo, M. P.: Radiance-based NIRv as a proxy for GPP of corn and soybean, Environ. Res. Lett., 15, 034009, https://doi.org/10.1088/1748-9326/ab65cc, 2020. 

Xiao, J., Chevallier, F., Gomez, C., Guanter, L., Hicke, J. A., Huete, A. R., Ichii, K., Ni, W., Pang, Y., Rahman, A. F., Sun, G., Yuan, W., Zhang, L., and Zhang, X.: Remote sensing of the terrestrial carbon cycle: A review of advances over 50 years, Remote Sens. Environ., 233, 111383, https://doi.org/10.1016/j.rse.2019.111383, 2019. 

Yan, L. and Roy, D. P.: Conterminous United States crop field size quantification from multi-temporal Landsat data, Remote Sens. Environ., 172, 67–86, https://doi.org/10.1016/j.rse.2015.10.034, 2016. 

Yang, X., Tang, J., Mustard, J. F., Lee, J., Rossini, M., Joiner, J., Munger, J. W., Kornfeld, A., and Richardson, A. D.: Solar-induced chlorophyll fluorescence that correlates with canopy photosynthesis on diurnal and seasonal scales in a temperate deciduous forest, Geophys. Res. Lett., 42, 2977–2987, https://doi.org/10.1002/2015GL063201, 2015. 

Yang, Y., Xiao, P., Feng, X., and Li, H.: Accuracy assessment of seven global land cover datasets over China, ISPRS J. Photogramm. Remote Sens., 125, 156–173, https://doi.org/10.1016/j.isprsjprs.2017.01.016, 2017. 

Yuan, W., Liu, S., Yu, G., Bonnefond, J.-M. M., Chen, J., Davis, K., Desai, A. R., Goldstein, A. H., Gianelle, D., Rossi, F., Suyker, A. E., and Verma, S. B.: Global estimates of evapotranspiration and gross primary production based on MODIS and global meteorology data, Remote Sens. Environ., 114, 1416–1431, https://doi.org/10.1016/j.rse.2010.01.022, 2010. 

Zeng, Y., Badgley, G., Dechant, B., Ryu, Y., Chen, M., and Berry, J. A.: A practical approach for estimating the escape ratio of near-infrared solar-induced chlorophyll fluorescence, Remote Sens. Environ., 232, 111209, https://doi.org/10.1016/j.rse.2019.05.028, 2019. 

Zhang, Y., Xiao, X., Wu, X., Zhou, S., Zhang, G., Qin, Y., and Dong, J.: A global moderate resolution dataset of gross primary production of vegetation for 2000–2016, Sci. Data, 4, 170165, https://doi.org/10.1038/sdata.2017.165, 2017. 

Zhang, Y., Joiner, J., Gentine, P., and Zhou, S.: Reduced solar-induced chlorophyll fluorescence from GOME-2 during Amazon drought caused by dataset artifacts, Glob. Chang. Biol., 24, 2229–2230, https://doi.org/10.1111/gcb.14134, 2018. 

Download
Short summary
Photosynthesis, quantified by gross primary production (GPP), is a key Earth system process. To date, there is a lack of a high-spatiotemporal-resolution, real-time and observation-based GPP dataset. This work addresses this gap by developing a SatelLite Only Photosynthesis Estimation (SLOPE) model and generating a new GPP product, which is advanced in spatial and temporal resolutions, instantaneity, and quantitative uncertainty. The dataset will benefit a range of research and applications.
Altmetrics
Final-revised paper
Preprint