A daily, 250 m, and real-time gross primary productivity product (2000 – present) covering the Contiguous United States

. Gross primary productivity (GPP) quantifies the amount of carbon dioxide (CO 2 ) fixed by plants through 10 photosynthesis. Although as a key quantity of terrestrial ecosystems, there is a lack of high-spatial-and-temporal-resolution, real-time, and observation-based GPP products. To address this critical gap, here we leverage a state-of-the-art vegetation index, near ‐ infrared reflectance of vegetation (NIR V ), along with accurate photosynthetically active radiation (PAR), to produce a SatelLite Only Photosynthesis Estimation (SLOPE) GPP product in the Contiguous United States (CONUS). Compared to existing GPP products, the proposed SLOPE product is advanced in its spatial resolution (250 m versus > 500 15 m), temporal resolution (daily versus 8-day), instantaneity (1 day latency versus > 2 weeks latency), and quantitative uncertainty (on a per-pixel and daily basis versus no uncertainty information available). These characteristics are achieved because of several technical innovations employed in this study: (1) SLOPE couples machine learning models with MODIS atmosphere and land products to accurately estimate PAR. (2) SLOPE couples highly efficient and pragmatic gap-filling and filtering algorithms with surface reflectance acquired by both Terra and Aqua MODIS satellites to derive a soil-adjusted NIR V 20 (SANIR V ) dataset. (3) SLOPE couples a temporal pattern recognition approach with a long-term Crop Data Layer (CDL) product


Introduction 30
Gross primary productivity (GPP) quantifies the amount of carbon dioxide (CO 2 ) fixed by plants through photosynthesis (Beer et al., 2010;Jung et al., 2017). Because GPP is the largest carbon flux and influences other ecosystem processes such as respiration and transpiration, monitoring GPP is crucial for understanding the global carbon budget and terrestrial ecosystem dynamics (Bonan, 2019;Friedlingstein et al., 2019). In addition, biomass accumulation driven by GPP is the basis for food, feed, wood and fiber production, and therefore monitoring GPP is crucial for human welfare and development (Guan et al., 35 2016;Ryu et al., 2019).
Over the past two decades, a number of GPP products with different spatial and temporal characteristics have been derived using remote sensing approaches (Xiao et al., 2019). However, since GPP cannot be directly observed at large scales, different models have been developed and used in generating GPP products. Process-based models use a series of nonlinear equations 40 to represent the atmosphere-vegetation-soil system and associated fluxes. For example, a publicly-available global GPP product using process-based models is the Breathing Earth System Simulator (BESS) . Machine-learning models upscale site-observed GPP to a larger scale by establishing non-parametric relationships between ground-truth and gridded explanatory variables. The FLUXCOM GPP product is a typical example of this approach (Jung et al., 2019). Semiempirical approaches utilize equations with a concise physiological meaning (e.g., light use efficiency) that are parameterized 45 with several empirical constraint functions. The MOD17 GPP product (Running et al., 2004), the Vegetation Photosynthesis Model (VPM) GPP product (Zhang et al., 2017), and the Global LAnd Surface Satellite (GLASS) GPP product (Yuan et al., 2010) belong to this category.
With differing principles, assumptions and complexity, existing remote sensing GPP models heavily rely upon inputs with 50 large uncertainties. First, climate forcing, such as temperature, humidity, precipitation and wind speed, are commonly used in these GPP models. However, these meteorological data are not observed but derived from reanalysis approaches and usually have coarse spatial resolution (e.g., > 10 km) and large time lag (e.g., weeks). Second, plant functional types (PFTs) are used to define different parameterization schemes in those models. To date, satellite land cover products are usually characterized by considerably large time lag (> 1 year), relatively low accuracy (≤ 70%) (Yang et al., 2017), and more uncertainties with 55 regards to year-to-year variations (Cai et al., 2014;Li et al., 2018). Third, high-level remote sensing land products such as leaf area index (LAI), fraction of absorbed photosynthetically active radiation (FPAR), clumping index (CI), land surface temperature (LST) and soil moisture (SM) are used by some models. These variables are not directly observed but retrieved by complicated algorithms, and their accuracy still needs significant improvement to meet requirements of earth system models (GCOS, 2011). 60 Alternative approaches which heavily rely on reliable satellite observations with low dependence on uncertain model structure/parameterization and model inputs are highly required. Solar-induced fluorescence (SIF) emerged in recent years may provide a new opportunity for GPP estimation (Guanter et al., 2014). Linear relationships have been found between SIF and GPP at various ecosystems (Liu et al., 2017;Magney et al., 2019;Yang et al., 2015). However, satellite SIF data generally 65 have coarse resolution, large spatial gaps, short temporal coverage, and limited quality (Bacour et al., 2019;Zhang et al., 2018), and therefore not suitable for many applications.
Near-infrared reflectance of vegetation (NIR V,Ref ), defined as the product of normalized difference vegetation index (NDVI) and observed NIR reflectance (NIR Ref ) (Eq. [1]), has recently been presented as a proxy of GPP (Badgley et al., 2017). A 70 global monthly 0.5° GPP dataset has been produced from satellite data using the linear relationship between NIR V,Ref and GPP , explaining 68% GPP variations observed by the FLUXNET network. Several field studies have recently found that taking incoming radiation into account further improves the NIR V ~ GPP relationship (Dechant et al., 2020;Wu et al., 2020). Because MODIS provides long-term and real-time (2000 -present) observations of red (Red Ref ) and NIR (NIR Ref ) reflectance and atmospheric conditions with high spatial (250 m for reflectance and 1 km for atmosphere) and temporal (daily) 75 resolutions, now there is an unprecedented opportunity to generate an observation-based GPP product. , Leveraging the concept of NIR V , here we present a new GPP model and the resultant daily, 250m, and real-time GPP product (2000 -present) covering the Contiguous United States (CONUS) (Jiang and Guan, 2020). The product is named SatelLite Only Photosynthesis Estimation (SLOPE) because (1) the model only uses satellite data, and (2) the model only has two slope 80 parameters. Detailed model design, multi-source satellite data processing, and comprehensive evaluation procedures are elucidated below.

Production of the SLOPE product
The method we used to estimate GPP using the novel vegetation index NIR V,Ref follows the concept of light use efficiency (LUE) (Monteith, 1972;Monteith and Moss, 1977): 85 (2) Since NIR V,Ref has been found strongly correlated to FPAR (Badgley et al., 2017), and moderately correlated to LUE , it is possible to simplify Eq. (2) as: , where a and b are slope and intercept which can be fitted from ground GPP observations. Both PAR and NIR V,Ref can be easily derived from satellite observations with high spatial and temporal resolutions in real time, avoiding complicated but uncertain algorithm/parameterization to quantify FPAR and LUE in Eq. (2). This linear relationship is likely to converge within C3 90 species , but differs between C3 and C4 species (Wu et al., 2019). Accordingly, land cover data with considerably large time lags and relatively low accuracy may not be necessary for the model parameterization. Instead, an inseason C3/C4 species dataset is needed for the accurate calibration of the linear relationship.
Defining the ratio of GPP to PAR as the incident PAR use efficiency (iPUE) gives: 95 Here iPUE is a confounding factor of canopy structure and leaf physiology, representing the capacity of plants to use incoming radiation for photosynthesis. When vegetation is absent, iPUE is zero and NIR V,Ref is expected to be zero too. However, this is not true in reality as >99.9% soils have positive NIR V,Ref values according to a global soil spectral library (Jiang and Fang, 2019), and the correction of NIR V,Ref for soil is needed for better performance at low vegetation cover . To address this issue, we will propose spatially-explicit correction for NIR V,Ref to derive a soil adjusted index SANIR V (see details 100 in section 2.2). Since SANIR V = 0 when iPUE = 0, Eq. (4) becomes: where c is the slope coefficient.
Considering the presence of mixed pixel of C3 and C4 species with the 250 m pixels, Eq. (5) can be rewritten as: where c C4 and c C3 are the coefficients for C4 and C3 species, respectively, and f C4 is the fraction of C4 species in vegetation. 105 Therefore, the SLOPE GPP model is: In the SLOPE model (Eq. [7]), PAR, SANIR V and f C4 are remote sensing inputs, whereas c C4 and c C3 are model parameters to be calibrated using ground-truth GPP data ( Fig. 1). In the following sections, we will elaborate on the derivation of PAR, SANIR V , and f C4 , along with their quantitative uncertainties, and the model calibration for parameters c C4 and c C3 . With the 110 uncertainty of each term (∆c C4 , ∆c C3 , ∆f C4 , ∆PAR and ∆SANIR V ), the uncertainty of GPP can be estimated in a spatiotemporally-explicit manner by: Figure 1. Framework to produce the SLOPE GPP product. The box with dash lines is the legend.

Derivation of PAR 115
SLOPE adopts several machine learning approaches to compute PAR using forcing data mainly from Terra & Aqua/MODIS Atmosphere and Land products (data solely from morning satellite Terra, afternoon satellite Aqua, and combined of the two satellites are called MOD, MYD and MCD, respectively hereinafter). The list of inputs include aerosol optical depth (AOD) at 3 km and 1 km resolutions from MOD/MYD04_3K and MCD19A2 products (Lyapustin et al., 2011;Remer et al., 2013), respectively, total column water vapour (TWV) at 1 km resolution from MOD/MYD05_L2 products (Chang et al., 2015), 120 cloud optical thickness (COT) at 1 km resolution from MOD/MYD06_L2 products (Baum et al., 2012), total column ozone burden (TO3) at 5 km resolution from MOD/MYD07_L2 products (Borbas et al., 2015), white-sky land surface shortwave albedo (ALB) at 500 m resolution from MCD43A3 product (Román et al., 2009), and altitude (ALT) at 30 m resolution from Shuttle Radar Topography Mission Global 1 arc second (SRTMGL1) product (Kobrick and Crippen, 2017).

125
MODIS atmosphere products are swath data and swaths vary day by day. To maintain consistency and facilitate further usage, all data are reprojected using the nearest neighbor resampling approach to the Conus Albers projection on a NAD83 datum (EPSG: 6350) with a 1 km spatial resolution. For swath data, overlap area exists between two paths. In this case, data with smaller sensor view zenith angles provided by MOD/MYD03_L2 products are chosen. MODIS land products and SRTMGL1 are tile data with finer resolution than 1 km. They are reprojected to the EPSG 6350 spatial reference by aggregating all fine 130 resolution pixels within each 1 km grid.
Data gaps exist in all MODIS products and gap-filling is required. For MODIS atmosphere products, gaps in MOD/MYD are first filled by data in MYD/MOD counterpart on the same day, followed by multi-year average on that day. Since the multi-year average of COT is always non-zero, directly using it for gap-filling always implies a cloudy condition. Therefore, 135 CLARA-2 cloud mask at 0.05° acquired from NOAA/AVHRR data is employed (Karlsson et al., 2017). Only MODIS data gaps for AVHRR cloudy pixels are filled by multi-year average of COT, whereas MODIS COT data gaps for AVHRR clear pixels are set to 0. For the MODIS land product, i.e., ALB, a temporally moving window with a 7-day radius is utilized for a specific day, and a Gaussian filter is applied to the time series data within the moving window on a per pixel basis. The filtered values are used to fill gaps on that specific day. 140 Machine learning approaches are used to upscale ground-truth to satellite. Ground-truth is from the Surface Radiation Budget (SURFRAD) Network (Augustine et al., 2000), including seven long-term continuous sites across the CONUS. Daily mean shortwave radiation (SWR) and PAR on the surface are calculated from site observations at 1 -3 min intervals from 2000 through 2018. Daily mean SWR at the top of atmosphere (SWR TOA ) is calculated using latitude and day of year (DOY) 145 information (Allen et al., 1998). Subsequently, atmospheric transmittance (t SWR ) and proportion of PAR in SW (p PAR ) are calculated as SWR/SWR TOA and PAR/SWR. Models are built to estimate t SWR first, followed by p PAR . MOD data representing atmospheric conditions in the morning and MYD for the afternoon are used separately for the estimation, and the two estimates are averaged to account for discrepancies 150 between morning and afternoon. Clear and cloudy conditions are also treated separately in modeling considering the absence/presence of non-zero COT data. For the estimation of t SWR , ALB, ALT and SWR TOA are used in addition to atmosphere data, whereas for p PAR , ALB, ALT and the estimated t SWR are used. A summary of model inputs is listed in Table 1.
Four different machine learning approaches are employed to estimate t SWR and p PAR . They are least absolute shrinkage and 155 selection operator (LASSO) (Tibshirani, 1996), multivariate adaptive regression splines (MARS) (Friedman, 1991), k-nearest neighbors regression (KNN) (Goldberger et al., 2005), and random forest regression (RF) (Liaw and Wiener, 2002). We used Scikit-learn, a free software machine learning library for the Python programming language, to build the models. All the four algorithms were automatically optimized by tuning their hyperparameters using five-fold-cross-validation on their training dataset. All inputs and outputs are the same for the four approaches. Four different PAR estimations are then obtained by Eq. 160 (9), and their ensemble mean and standard deviation are considered as the final estimation and uncertainty, respectively.
[1]) from MODIS band 1 (red) and band 2 (NIR) surface reflectance (SR) at 250 m resolution from MOD/MYD09GQ products (Vermote et al., 2002). Only pixels with quality control (QC) information of 'corrected product produced at ideal quality all bands' were used. Since cloud and cloud shadows substantially reduce NIR V,Ref values, SLOPE adopts three strategies to mitigate the cloud contamination. 170 First, the cloud mask is applied. MOD/MYD COT data processed in Section 2.1 are resampled to the same spatial reference with MOD/MYD SR data and used to mask out cloudy pixels. At this point, a morphological dilation operation is used to enlarge the cloud mask to account for cloud edges. However, since COT data have a coarser resolution (1 km) than SR data (250 m), there are still partial clouds and cloud shadows left after this step. 175 Second, MOD and MYD data are combined. Ideally, on a specific day, MOD and MYD NIR V,Ref should be identical if they are obtained under the same conditions. However, the remaining cloud contamination and sun-target-sensor geometry could cause differences between morning and afternoon observations. Considering vegetation index is more sensitive to cloud contamination than sensor view zenith angle, a simple criterion is applied to combine MOD and MYD observations. If the 180 difference between MOD and MYD NIR V,Ref is greater than or equal to a predefined threshold (0.1 in this study), then the smaller one is likely cloud contaminated and the larger one is used. Otherwise, the average value of the two is used. However, in many cases, both MOD and MYD data are contaminated, and sensor view zenith angle may cause unexpected day-to-day variations.

185
Third, a temporal filter is applied. The filter is based on the assumption that NIR V,Ref should change smoothly within a short time period. Accordingly, a temporally moving window with a 7-day radius is utilized for a specific day. Mean and standard deviation are calculated from the NIR V,Ref time series on a per pixel basis. Values outside the range of mean ± 1.5 standard deviations are considered as outliers and dropped. Subsequently, the mean of the first 3 days and that of the last 7 days are calculated, respectively. If the NIR V,Ref value of the target day is 20% smaller or larger than both the first 3 days mean and the 190 last 3 days mean, then that NIR V,Ref value is considered as an outlier and dropped.
After the removal of outliers, a large amount of data gaps exist and gap-filling is required. Similar to ALB in Section 2.1, a temporally moving window with a 7-day radius is utilized for a specific day, and a Gaussian filter is applied and used to fill gaps on that day. The rest of data gaps are filled with multi-year average of NIR V,Ref . Considering extreme cases that no data 195 is available on a specific day over all years, multi-year average of ± 3 days is used for the final gap-filling.
where NIR V,Peak is the maximum value of multi-year average NIR V,Ref time series on a per-pixel basis. SANIR V does not change In general, SANIR V is supposed to be smooth within a short time period, therefore, the standard deviation within the ±3-day temporal window is 210 calculated as uncertainty.

Derivation of C4 fraction
A National Land Cover Database (NLCD) along with a crop-specific land cover product Cropland Data Layer (CDL) are used to derive the fraction cover of C4 crop in vegetation (f C4 ). NLCD is a comprehensive land cover database produced by the 215 United States Geological Survey (USGS). It provides several main thematic classes such as urban, agriculture, and forest with high accuracy (Homer et al., 2004). The 30 m nationwide NLCD data are available in 2001, 2004, 2006, 2008, 2011, 2013 and 2016. CDL is an agriculture-oriented product produced by the United States Department of Agriculture (USDA). It provides > 100 crop cover types and leverages other land cover types from NLCD (Boryan et al., 2011). Across the CONUS CDL data are available at a 30m spatial resolution and in a yearly temporal frequency from 2008 through 2019, whereas in some areas 220 annual data are available back to the 1990s.
The fraction of C4 crop in vegetated areas is first derived using existing CDL data. NLCD land cover types are categorized into vegetated areas and non-vegetated areas with 30 m resolution. Fraction of vegetated areas in total area is subsequently calculated for each 250 m pixel. Similarly, CDL crop types are categorized into C4-planted areas and non-C4 areas with 30 m 225 resolution. Fraction of C4 crops in total area is subsequently calculated for each 250 m pixel. The ratio of fraction of C4 crops in total area to fraction of vegetated areas in total area is calculated to derive fraction of C4 crop in vegetated areas at 250 m resolution. Since NLCD data is not available every year, an assumption is made that one year NLCD data can represent adjacent To predict the fraction of C4 crop in vegetation for region-years that no CDL data is available, crop rotation patterns are identified from historical data. Assuming that C4 crops are planted following three rotation strategies: C4/non-C4, C4/C4/non-C4, and non-C4/non-C4/C4, and assigning 1 to C4 and 0 to non-C4, a total of eight possible time series during the period of 235 2008 -2019 when nationwide CDL data are available are listed in Table 2. On a per-pixel basis, the time series of the fraction of C4 crop in vegetation during 2008 -2019 is compared with the eight predefined rotation patterns. PeaSLOPEn coefficient r is calculated between actual time series and each of the eight patterns, and the pattern yielding the largest r is the identified rotation pattern. Once the pattern is identified, fraction of C4 crop in vegetated areas in any unknown year can be inferred. If one year is inferred as C4, then the multi-year average of C4 fraction over C4-dominated years is used. Otherwise, the multi-240 year average over C3-dominated years is used. If the largest r is smaller than 0.497, i.e., p > 0.1 for 12 years, then it is considered as no significant pattern and the multi-year average over all years is used. The RMSE between predicted and reference CDL C4 fraction is calculated as uncertainty. To account for the land cover change, the predicted C4 crop fraction is set to 0 in years when NLCD data is not classified as cropland. It is worth mentioning that C4 grassland and shrubland are not considered in this study as no nationwide high-resolution distribution data is available. 245 Table 2. Predefined C4-planting patterns from 2008 through 2019. If C4 crop dominates in a specific year, 1 is assigned.

Calibration for iPUE coefficients 250
SLOPE was calibrated using the GPP data derived from AmeriFlux site observations. The AmeriFlux network is a community of sites that use eddy-covariance technology to measure landscape-level carbon, water, and energy fluxes across the Americas (Baldocchi et al., 2001). A total of 48 sites (324 site years) were involved in this study ( Table S3). All of the 43 sites in the FLUXNET2015 Tier 1 dataset (variable name: GPP_DT_VUT_MEAN; quality control: NEE_VUT_REF_QC) in the CONUS were used, because this dataset was produced by standardized data processing pipeline with strict data quality control protocols 255 and is commonly considered as ground-truth. Additional 7 sites were from the AmeriFlux level 4 dataset (variable name: GPP_or_MDS; quality control: NEE_or_fMDSsqc). This dataset was generated more than ten years ago and only AmeriFlux Core Sites that are not covered by FLUXNET2015 were used for data quality consideration. For both datasets, only days with the best quality control flags were used in the SLOPE modelling and evaluation procedures.

260
We used Eq. (5) to conduct model calibration. Although SLOPE considers iPUE ~ SANIR V relationship for C3 and C4 species, we also tested other configurations for comparison purposes. Configuration 1 ("all"): all data were used together to fit a universal iPUE coefficient c. Configuration 2 ("C3/C4"): data were separated for C3 and C4 species to fit c C3 and c C4 , respectively. It is worth mentioning that only C4 crops (6 sites) were considered as C4 species, whereas C4 grass and shrubs  (Fig. 2a). As an example, on Jul 10, 2020, large areas in New Jersey, Wisconsin, Oklahoma, South Dakota and Montana display significantly lower values than other areas, due to dominant impacts of cloud (Fig. S1). Aerosol optical depth (Fig. S2), total water vapor (Fig. S3) and total ozone burden (Fig. S4) also influences the amount of clear-sky 280 PAR to some degree. For example, the southeastern part of the CONUS show more aerosol and thus lower PAR values than other cloud-free areas. In addition to the total amount of PAR, SLOPE PAR also reveals variations in the ratio of PAR to SWR (Fig. S5). Despite of a relatively small range (0.40-0.46), it is negatively correlates with cloud optical thickness and total ozone burden and positively correlates with total water vapor. PAR uncertainties caused by the difference of the four machine learning algorithms are generally small (< 5%; Fig. 2b). Higher uncertainties are mainly distributed in cloudy and desert areas. 285 To evaluate the SLOPE PAR, we used two different site observation datasets which are independent of the PAR derivation procedure. The first dataset is SURFRAD (Table S1). While SURFRAD data from 2000 through 2018 were used for model training, we used data in 2019 for evaluation. The second dataset is FLUXNET2015 (Table S2). A total of 41 sites providing PAR data were used for the evaluation. For both datasets, only days with the best quality control flags were used. 290 Evaluation results show that SLOPE PAR is in a highly aligned agreement with ground truth independent from the training procedure (Fig. 3). Across the seven SURFRAD sites in 2019 and the 41 AmeriFlux sites from 2000 to 2014, SLOPE PAR achieves an overall coefficient of determination (R 2 ) of 0.91, and root-mean-square errors (RMSE) of 1.09 and 1.19 MJ m -2 d -1 , respectively. In addition, the performance is reasonably stable under different sky conditions, indicated by similar R2 and 295 RMSE values from low to high atmospheric transmittance ( Figure S6).   SLOPE SANIR V demonstrates detailed and distinctive spatial variations in the CONUS (Fig. 4a). In the peak growing season, remarkably high SANIR V values (~ 0.5) from the Corn Belt in the Central US are observed. This area is one of the most productive areas on Earth, producing more than 30% of global corn and soybean (Green et al., 2018). Low values (< 0.2) are mainly observed in grasslands and shrublands in the Western US. Uncertainty is associated with SANIR V data on the pixel basis (Fig. 4b). In general, areas with higher SANIR V values also have higher uncertainties. However, this pattern is altered by atmospheric conditions, where areas with higher cloud optical thickness (Fig. S1), aerosol optical depth (Fig. S2), and water vapour (Fig. S3) values tend to have larger uncertainties. 320 Fig. 5 demonstrates that SLOPE SANIR V is able to capture spatial and temporal variations at small scale (e.g., within a county).

Performance of SANIRV
An overall drop in SANIR V due to an extreme event damage can be observed within a short time period, thanks to the high temporal resolution (daily) of the SLOPE product. In addition, differences between plots possibly indicating different varieties, planting density and management, can also be observed, thanks to the high spatial resolution (250 m) of the SLOPE product. 325 Figure 6. Comparison between SANIR V and raw NIR V derived from MOD09GQ and MYD09GQ products at six AmeriFlux sites (Table S3)  Shaded areas indicate uncertainties of SANIR V . SLOPE SANIR V shows significantly different seasonality for different PFTs (Fig. 6). The evergreen needleleaf forest site US-Blo is characterized by a relatively stable SANIR V seasonal cycle in 2019 (Fig. 6a), indicated by a CV = 14.9% only. The deciduous broadleaf forest site US-Ha1 has a large seasonal variation with a CV = 108.6% (Fig. 6b). The SANIR V value 335 suddenly rises from 0 to 0.3 in May, reaches 0.4 in June and July, and gradually decreases back to 0 in October. The hot desert open shrubland site US-Whs has an overall low SANIR V value (Fig. 6c), with a peak value observed in early October. The grassland site US-AR1 shows a distinct double-peak (in June and September) seasonal pattern (Fig. 6d), which is caused by the precipitation seasonality there. The wetland site US-Myb is characterized by a long growing season and a flat peak from April to November (Fig. 6e). The cropland site US-Bo1 has corn planted in 2019, and it shows the highest SANIR V peak up 340 to 0.5 among all the shown six sites (Fig. 6f). It is worth mentioning that compared to the two raw satellite-observed NIR V provided by MOD09GQ and MYD09GQ products, respectively, SLOPE SANIR V successfully removes the soil impact in the non-growing season as the values equal to or close to zero. In addition, SLOPE SANIR V is gap-free and much less contaminated by noises. Furthermore, spatiotemporally-explicit uncertainty is associated with each SANIR V value. SLOPE predicts a reasonable fraction of C4 crop in vegetation in the CONUS (Fig. 7a). Most of the C4 crops are located in the Corn Belt, especially in Indiana, Illinois, Iowa and Nebraska. A direct comparison between predicted C4 crop fraction (Fig.   8a) and independent reference CDL data (Fig. 8b) indicates that the SLOPE prediction is able to reconstruct the spatial patterns of the fraction of C4 crop in vegetation at 250 m resolution. It is worth mentioning that the uncertainty metric RMSE is sensitive 360

Performance of C4 fraction 345
to extreme values, and it is different from misclassification rate (0.4 does not mean 40%). For a pure pixel of a corn/soybean rotation field, the RMSE = 0.39 if three out of 20 years is misclassified, i.e., misclassification rate = 0.15. A further investigation with regard to interannual dynamics shows that the SLOPE predictions can even perform better than CDL reference data (Fig. 9), benchmarked with ground truth collected in the field. At this point, the CDL land cover could be prone to uncertainties in both satellite observation and classification algorithm, and classification is conducted year by year without 365 an interannual consideration (Lark et al., 2017). SLOPE employs a rotation model to match decadal time series of CDL data, during which procedure noises in CDL data are suppressed. The features that SLOPE is able to reconstruct spatial and interannual patterns of CDL data enables producing GPP in years when CDL data is unavailable (e.g., 2020 and years before 2008 for most regions). It is worth mentioning that uncertainty is also associated with each pixel (Fig. 7b). SLOPE SANIR V shows a strong linear correlation with iPUE (Fig. 10). When data from all 49 sites (324 site years) are used together, the SANIR V ~ iPUE relationship has an overall R 2 value of 0.73 (Fig. 10a). This is composed of R 2 of 0.92 from C4 385 species (Fig. 10b) and 0.70 from C3 species (Fig. 10c). C3 species can be further decomposed into six PFTs (Fig. 10d -10i), among which cropland has the highest R 2 value up to 0.80 (Fig. 10i), whereas evergreen needleleaf forest has the lowest value of 0.46 (Fig. 10d). This relatively weak iPUE ~ SANIRv relationship is expected because evergreen needleleaf forest tends to allocate resources for leaf construction and maintenance at large time scales and does not have much flexibility to change canopy structure and leaf color as a response to varying environment at small time scales Chabot and 390 Hicks, 1982). Previous studies found that changes in xanthophyll cycle instead of chlorophyll concentration or absorbed PAR explained the seasonal variation of photosynthetic capacity in evergreen needleleaf forest (Gamon et al., 2016;Magney et al., 2019). Therefore, SIF was suggested by some studies as a better proxy of photosynthetic capacity in this ecosystem (Smith et al., 2018;Turner et al., 2020), though satellite SIF has coarser spatial resolution, shorter temporal coverage, and larger temporal latency, and lower signal-to-noise ratio than SANIRv. In addition, the relatively weak iPUE ~ SANIRv relationship is also 395 partly because of the small value ranges in both SANIR V and iPUE.
The overall slope is 3.82 gC MJ -1 for all data (Fig. 10a). Distinct difference is found between C4 (5.18; Fig. 10b) and C3 (3.54; Fig. 10c) species, suggesting the importance of separating C4 from C3 species in modelling. The slope values vary to a limited degree within C3 species (Fig. 10d -10i), ranging from 3.26 gC MJ -1 (cropland; Fig. 10i) to 3.80 gC MJ -1 (evergreen needleleaf 400 forest; Fig. 10d), indicating the insignificance of separating different PFTs. It is worth mentioning that the SANIR V ~ iPUE relationship has a zero intercept because of the successful removal of the soil impact. Figure 11. Statistics of the SANIR V ~ iPUE relationship from cross validation. (a) Slopes of the SANIR V ~ iPUE relationship over different subsets. (b) R 2 between AmeriFlux GPP and estimated GPP using different model configurations for the training and testing datasets, respectively. Error bars in both subplots indicate 95% confidential intervals over 500 experiments.
A 100-time repeated 5-fold cross-validation reveals the robustness of the SANIR V ~ iPUE relationships (Fig. 11). Here the 410 repeated cross-validation means the whole GPP dataset from all 49 sites (324 site years) is randomly split into 5 folds, 4 folds for training and 1 fold for testing, and the process is repeated 100 times yielding 500 training-testing splits in total. In all subsets, the uncertainties of the iPUE coefficient c (the slope of the SANIR V ~ iPUE relationship) are less than 1% (Fig. 11a).
When using the three different model configurations, the model performances in simulating the whole training/testing datasets also show little variation (Fig. 11b), in general < 0.5% and < 1.5% for the training and testing datasets, respectively. Moreover, 415 the R 2 values between training and testing datasets, and between C3/C4 and PFT-based configurations are almost identical (~ 0.76). These results suggest using c C4 = 5.18 (Fig. 10b) and c C3 = 3.54 (Fig. 10c) in SLOPE is reasonable. The 95% confidential intervals of c for C4 and C3 species (Fig. 11a) are used as their uncertainties in SLOPE. SLOPE GPP demonstrates detailed and distinctive spatial variations in the CONUS (Fig. 12a). The Corn Belt is the most 430 productive area, largely contributed by the C4 crop corn whose GPP could reach up to 30 gC m -2 d -1 (Fig. 13a). Forested areas in the Eastern US show medium GPP values, followed by forests and croplands in the Western US. Grasslands and shrublands in the Central and Western US generally show low productivity. On this example day, the R 2 of spatial patterns between GPP and SANIR V , GPP and C4 fraction, and GPP and PAR across the CONUS are 0.89, 0.34, and 0.01, respectively. SANIR V , an integrated vegetation index containing information of both FPAR and LUE (Eq. [4]), explains the majority of GPP spatial 435 variation. C4 fraction mainly contributes to the distribution and magnitude of the peak in GPP spatial variation. Although PAR does not influence the nationwide GPP spatial variation, it regulates GPP values at local scale. For example, Northeastern Nebraska shows smaller GPP values than Southeastern Nebrask in spite of similar SANIR V (Fig. 4a) and C4 fraction (Fig. 7a) because of smaller PAR values (Fig. 2a). At small scale (e.g., within a county), the 250 m resolution (~0.06 km 2 per pixel) SLOPE GPP is close to revealing field-level heterogeneity, considering that the mean and median crop field sizes in the 440 CONUS are 0.19 km 2 and 0.28 km 2 , respectively (Yan and Roy, 2016). For example, Fig. 13a shows large contrast in GPP but Fig. 13b is more homogeneous. This is because corn reaches peak growing season in early July when soybean canopy is still open. SLOPE GPP with its pixel size much smaller than field area is therefore able to show GPP difference between corn and soybean. In late August, corn turns yellow while soybean is still green and active, and therefore they have similar GPP values considering corn is C4 while soybean is C3. We suggest that the 250 m resolution makes a big difference from existing global 445 GPP products whose spatial resolutions are at least 500 m (~0.25 km 2 per pixel). Quantitative uncertainty is provided for each SLOPE GPP estimate (Eq. [8]). The spatial pattern shows that the Corn Belt has the largest uncertainty ( Fig. 12b; e.g., 5 gC m -2 d -1 ) due to the considerable contribution from the uncertainty of C4 fraction (Fig. 7b). SLOPE GPP agrees fairly well with ground-truth from the AmeriFlux (Fig. 14). Across all of the 49 sites (328 site years; Fig.  455 14a), SLOPE GPP achieves an overall R 2 of 0.85, RMSE of 1.63 gC m -2 d -1 , and relative error of 37.8%. For individual sites ( Fig. 14b), the median R 2 and RMSE are 0.80 and 1.69 gC m -2 d -1 , respectively. C4 cropland generally shows the highest median R 2 value (0.93), followed by deciduous broadleaf forest and mixed forest (0.88) and C3 cropland (0.87). The lowest median R 2 value (0.69) is observed for evergreen needleleaf forest. With regard to RMSE, smaller median values are found in grassland (1.09 gC m -2 d -1 ), shrubland and woody savannah (1.48 gC m -2 d -1 ), and deciduous broadleaf forest and mixed forest 460 (1.48 gC m -2 d -1 ), whereas C3 (2.15 gC m -2 d -1 ) and C4 (2.01 gC m -2 d -1 ) cropland tend to have larger RMSE values. Figure 15. Comparison between AmeriFlux (black dots) and SLOPE (red curves) daily GPP at six AmeriFlux sites (Table S3)  Shaded areas indicate uncertainties of SLOPE GPP. SLOPE GPP generally captures seasonal and interannual variations of AmeriFlux GPP for different PFTs (Fig. 15). At the evergreen needleleaf forest site US-Blo (Fig. 15a), the GPP seasonal cycle is mainly driven by PAR as the iPUE indicated by 470 SANIR V is fairly stable (Fig. 6a). At the deciduous broadleaf forest site US-Ha1 (Fig. 15b), the start of season and the end of season agree well between AmeriFlux GPP and SLOPE GPP. At the open shrubland site US-Whs (Fig. 15c), the quick rise and drop of GPP in response to the start and end of the wet season are clearly observed in SLOPE GPP. Even the double-peak pattern in 2011 can be observed in SLOPE GPP. At the grassland site US-AR1 (Fig. 15d) (Fig. 15f), the rotation-caused year-to-year variation is distinct, indicated by higher values in odd number years with C4 crop corn planted and lower values in even years with C3 crop soybean planted (Fig. 9d).
It is also observed the lowest GPP peak in 2012 when a severe drought attacked the Central US.

Data availability and data format
The archived daily 250 m resolution SLOPE GPP data product from 2000 to 2019 is distributed under a Creative Commons 480 https://daac.ornl.gov/daacdata/cms/SLOPE_GPP_CONUS/data/) (Jiang and Guan, 2020). Data from 2020 are available from the authors upon request. All data are projected in the standard MODIS Land Integerized Sinusoidal tile map projection and are stored in GeoTIFF format files with a data type of signed 16-bit integer. Each processing tile is in size of 4800 pixels by 485 4800 pixels, representing approximately 1200 km by 1200 km land region. In addition to the GPP product, SLOPE PAR, SANIR V , and C4 fraction, along with their uncertainties, are also released. These datasets are also stored in the same spatial projection and file format with the GPP dataset. PAR (resampled from 1 km to 250 m to be consistent with GPP) and SANIR V are provided on a daily basis, whereas C4 fraction is provided on an annual basis. A README file is provided along with the SLOPE product, which instructs the usage of the data. 490

Conclusions
This study produces a long-term and real-time (2000 -present) GPP product with daily and 250 m spatial and temporal resolutions. The product is based on remote sensing only (SLOPE) model, which uses accurate PAR, soil-adjusted NIR v , and dynamic C4 fraction as inputs. Evaluation against AmeriFlux ground-truth GPP shows that the SLOPE GPP product has a reasonable accuracy, with an overall R 2 of 0.85 and RMSE of 1.63 gC m -2 d -1 . To demonstrate the real-time capacity of the 495 SLOPE GPP product, the latest GPP data on Nov 2, 2020, two days prior to the revision of this manuscript, is shown in Fig.   S6. The spatiotemporal resolution and instantaneity of the SLOPE GPP product are higher than existing global GPP products, such as MOD17, VPM, GLASS, FLUXCOM and BESS. We expect this novel GPP product can significantly contribute to various researchers and stakeholders in fields related to the regional carbon cycle, land surface processes, ecosystem monitoring and management, and agriculture. The approaches used in this study, in particular, the derivation of SANIR V , can 500 also be applied to any other satellite platforms with the two most classical bands: red and NIR. For example, SaTallite dAta IntegRation (STAIR) Landsat-MODIS fusion data which has daily, 30 m spatiotemporal resolution and can be applied at large scale Luo et al., 2018), commercial Planet Labs data with a daily interval and spatial resolution up to 3m (Houborg and McCabe, 2016;Kimm et al., 2020), and the Advanced Very High Resolution Radiometer (AVHRR) with a temporal coverage as far back as 1982 (Franch et al., 2017;Jiang et al., 2017). However, caution should be used in the 505 interpretation of GPP seasonal trajectory in evergreen needleleaf forests because of relatively poor relationship between SANIRv and iPUE, and GPP magnitude in southwestern US grasslands because of the ignorance of fraction of C4 grasslands.
Finally, although the SLOPE product has been generated from 2000 to present, caution should also be used in the interpretation of long-term trend because the SLOPE model, as many other LUE models, does not explicitly consider the CO 2 fertilization effects on vegetation productivity. 510 Boryan, C., Yang, Z., Mueller, R. and Craig, M.: Monitoring US agriculture: The US department of agriculture, national 565 agricultural statistics service, cropland data layer program, Geocarto Int., 26(5), 341-358,