Implementation of CCDC to produce the LCMAP Collection 1.0 annual land surface 1 change product 2

31 The increasing availability of high-quality remote sensing data and advanced technologies have 32 has spurred land cover mapping to characterize land change from local to global scales. 33 However, most land change datasets either span multiple decades at a local scale or cover limited 34 time over a larger geographic extent. Here, we present a new land cover and land surface change 35 dataset created by the Land Change Monitoring, Assessment, and Projection (LCMAP) program 36 over the conterminous United States (CONUS). The LCMAP land cover change dataset consists 37 of annual land cover and land cover change products over the period 1985-2017 at 30-m 38 resolution using Landsat and other ancillary data via the Continuous Change Detection and 39 Classification (CCDC) algorithm. In this paper, we describe our novel approach to implement 40 the CCDC algorithm to produce the LCMAP product suite composed of five land cover and five 41 land surface change related products. The LCMAP land cover products were validated using a 42 collection of ~ 25,000 reference samples collected independently across CONUS. The overall 43 agreement for all years of the LCMAP primary land cover product reached 82.5%. The LCMAP 44 products are produced through the LCMAP Information Warehouse and Data Store (IW+DS) 45 and Shared Mesos Cluster systems that can process, store, and deliver all datasets for public 46 access. To our knowledge, this is the first set of published 30 -m annual land cover and land 47 cover change datasets that include land cover, land cover change, and spectral change spanning 48 from the 1980s to the present for the United States. The LCMAP product suite provides useful 49 information for land resource management and facilitates studies to improve the understanding 50 of terrestrial ecosystems and the complex dynamics of the Earth system. The LCMAP system 51 could be implemented to produce global land change products in the future. 52 53 54 55 56 57 58 59 60


Introduction
abrupt changes detection on the land surface was modified through lessons learnedt from the 153 prototype test to include both gradual land cover transition and abrupt land change so that the 154 algorithm could be used asin an operational setting with the goals of robust, repeatable, and 155 geographically consistent results (Brown et al., 2020). The algorithm was further used to classify  (Dwyer et al., 2018). Landsat 180 ARD is organized in tiles, which are units of uniform dimension bounded by static corner points 181 in a defined grid system (Fig. 1). An ARD tile is currently defined as 5,000 x 5,000 30-meter (m) pixels or 150 x 150-kilometer (km). To implement CCDC algorithms to produce LCMAP 183 Collection 1.0 land change products in CONUS, all available Landsat ARD records of surface 184 reflectance and brightness temperature from the 1980s to 2017 were required. indicate what land cover types were observed before and after a detected change and further to 189 generate LCMAP annual land cover products ( Table 1). The land cover products are produced by 190 using training data from NLCD in 2001. NLCD provides Anderson Level II (Anderson, 1976) 191 land cover classification for CONUS and outlying areas (Homer et al., 2020). Spectral index and 192 change metrics between cloud-corrected Landsat mosaics are used, among other information, to 193 identify change pixels (Jin et al., 2013). These metrics allow NLCD to incorporate temporal and 194 spectral trajectory information into both training data selection and final land cover 195 classification. The NLCD land cover data is used in LCMAP as land cover training data.

213
The removal of invalid and cloud-contaminated data points is important for deriving model 214 coefficients that accurately represent the phenology of the surface, and for the correct 215 identification of model break points. The CCD algorithm uses Landsat ARD PIXELQA values to 216 mask observations identified as cloud, cloud shadow, fill, or (in some cases) snow derived based 217 on the Fmask 3.3 algorithm (Zhu et al., 2015a;Zhu and Woodcock, 2012). Additional cirrus and 218 terrain occlusion bits are provided for Landsat 8 OLI-TIRS ARD that are not available in the 219 Landsat 4-7 TM/ETM+ quality assessment band. To maintain consistency across the historical 220 archive, the algorithm does not rely on these Landsat 8-only QA flags to filter out observations.

221
Landsat ARD containing invalid or physically unrealistic data values are removed. For the 222 surface reflectance bands, the valid data range is between 0 and 10000. Brightness temperature 223 values, which in the ARD are stored as 10 × temperature (kelvin), are converted to 100 × °C and 224 observations are filtered for values outside the range -9320 and 7070 (-93.2-70.7°C). This 225 procedure rescales the brightness temperature values into a roughly similar numerical range as 226 the surface reflectance bands. A multitemporal mask (Tmask) model (Zhu and Woodcock,227 2014a) is implemented first to remove additional outliers by using the multitemporal observation 228 record to identify values that deviate from the overall phenology curve using a specific harmonic 229 model to perform an initial fit to the phenology. Additional details are provided in the 230 Supplementary materials S1.

231
The filtered Landsat ARD is further operated to generate the time series fit by harmonic models 232 whose sinusoidal components are frequency multiples of the base annual frequency. A constant 233 and linear term characterizes the surface reflectance or brightness temperature offset value and 234 overall slope, respectively. The full harmonic model is defined as follows: 235 ( , ) = 0, + 1, + ∑ ( , cos + , sin )

=1
(1) 236 where ω is the base annual frequency (2π⁄T), t is the ordinal of the date when January 1 of the

Regression models and change detection thresholds 248
The best-fit coefficients for the time series model are calculated using a LASSO regression 249 model (Tibshirani, 1996). In contrast to Ordinary Least Squares (OLS) that was used in the  Observations not yet incorporated into the model are evaluated as a group of no fewer than the 270 _ parameter value; this is the "peek window," which "slides" along the time series one observation at a time. Each iteration, a value is calculated for each individual observation 272 within the peek window, as follows:

287
The permanent snow procedure indicates that too few clear (less than 25% of total observations) 288 or water observations, which are identified from the QA band, exist to robustly detect change, 289 and a large fraction of observations are snow. The algorithm will return at most one segment that 290 fits through the entire time series and provide the filtered observations number at least twelve.

291
The model will, under the default settings, fit only four coefficients (i.e., characterizing the 292 reflectance and brightness temperature bands using only a simple harmonic with no higher 293 frequency terms). Unlike other procedures, snow pixels are not filtered out and are fit as part of 294 the annual pattern. This avoids overfitting the model to a seasonally sparse observation record.

295
Similarly, for the insufficient clear observations determined by the QA band, the model will

321
The training data used in XGBoost for the LCMAP Collection 1.0 land cover products is from 322 the USGS NLCD 2001 land cover product (Homer et al., 2020). To meet the LCMAP land cover 323 legend, the NLCD data is first cross-walked to LCMAP classes, as shown in Fig.43 and Table 2.

324
The use of NLCD data that was cross -walked to the LCMAP land cover legend as the training 325 data will reduce uncertainties and improve the consistenceconsistency of annual land cover pixel. This step aims to reduce potential noise in the classifier by removing pixels that may be 334 heavily mixed with different cover types, or whose land cover label may be less reliable. It also 335 removes the narrow linear low-intensity developed pixels corresponding to road networks, which 336 were found to have registration issues with Landsat ARD in some areas.

Ancillary data 339
Ancillary data used in the classification contains two main datasets: the DEM and the WPI layer.

340
Three DEM derivative datasets are implemented as geographic references for land cover 341 classification as ancillary data including topographic slope, aspect, and position index. The WPI 342 is highly related to wetland distribution and has a potential to improve wetland classification in 343 LCMAP.

368
After the classification models in a given tile are trained, predictions are generated for each July 369 1st date that has an associated CCD segment (Fig. 54). The prediction information is supplied to 370 the production step for the creation of land cover. The process is repeated for each tile for the 371 entire CONUS ARD extent.   frequent land cover changes in the 33 years (Fig. 65c). The western part of CONUS, however, 451 contains more spectral changes than in the east (Fig.6d). The NLCD land change estimates also The south ARD tile outlined in Fig. 65(a) covers the northern Dallas region, and the spatial 477 patterns of land cover and change are shown in more detail in Fig. 87. The land cover distributions in the region show that urban land expands considerably from 1985 (Fig. 87a), to 479 1990 (Fig. 78b), and to 2016 (Fig. 87c). The land conversion was primarily from cropland and 480 grass/shrub to developed land. Lake Ray Roberts was created in the late 1980s and captured in 481 the land cover map (Fig. 87b&c). The lake and urban conversion are also visible in the change 482 count from 1985 to 2016 (Fig. 87g), which mainly show as blue, suggesting a one-time 483 conversion. On the other hand, there is almost no change in the urban center (Fig. 87g). Fig. 78 484 (d-f) shows high classification confidence at the urban center, water, grass/shrub, and tree cover 485 areas, whereas cropland has relatively low confidence, indicating frequent management activities 486 over croplands in the regions. The total pixels of different change numbers suggest that one to 487 two change times are dominant, although some pixels change more than three times (Fig. 87h).

488
The land cover distributions in 1985, 1990, and 2017 show an increase in developed land and 489 decreases in cropland and grass/shrub (Fig. 87i).

490
The spatial patterns of land cover and change in the north ARD tile displayed in Fig. 65 and some areas changed multiple times (Fig. 98g). Most changes in forest lands were related to 499 wildland fires that occurred in the region. In 1988, 50 fires burned a mosaic covering nearly 3213 500 km 2 in Yellowstone as a result of extremely warm, dry, and windy weather (NPS, 2021). Trees 501 regrew in some of the burn areas and these changes could occur more than once as shown in the 502 change map that indicates at least two changes in these areas. The total pixels of different change 503 frequencies suggest that one to two changes were dominant and very few pixels changed more 504 than three times (Fig. 98h). The land cover distributions in 1985,1990, and 2017 had increases in 505 grass/shrub after 1985 and reductions in tree cover after that (Fig. 98i).

Validation of land cover product
The overall accuracy between the annual reference land cover label and the LCMAP annual land 508 cover products was calculated as 82.5% (±0.22%, standard error) when summarized for all years.

509
Overall accuracy across the time series  varied within about 1.5% annually, ranging 510 from a high of 83% in the late 1990s to about 82% in the late 2010s (Fig. 109). Per class 511 accuracies across CONUS ranged between 43% and 96% for user's accuracy (Table 3)

556
By implementing the CCDC algorithm through a system engineering approach, LCMAP 557 provides a fully automated framework for land change monitoring. The framework can also be 558 updated to include the latest Landsat records so that it can be used for operational continuous 559 monitoring in a large geographic extent (Brown et al. 2020). Therefore, when new observations 560 become available, the framework can provide timely and consistent land cover characteristics to 561 the public. However, these training data were randomly selected from the NLCD land cover product,

607
suggesting errors could potentially be carried over to the training samples due to potential errors 608 in the training source. Besides uncertainties in training data, some obvious challenges such as 609 class definitional differences between pasture/hay and grassland between NLCD and LCMAP 610 could potentially be carried over to the LCMAP land cover product. Implementing Improving 611 training data by reducing uncertainties and potential errors in a more consistent and accurate way 612 is critical to strengthen land cover classification and to improve the scientific quality of LCMAP 613 products in the future.

614
There are apparent shifts in some land cover types, especially in snow/ice and barren (Fig.76), 615 and a decline in overall agreement (Fig.109) in 2017, the last year of the Collection 1.0 product.

616
The last year's product usually is provisional because limited Landsat observations are available The LCMAP products generated in this paper are available at https://earthexplorer.usgs.gov/ 626 (LCMAP, 2021). All LCMAP land change products are mosaiced for the conterminous United States in the GeoTIFF format. Find exact data as described here at Table   Table 1 LCMAP land cover product specifications  Caption of Figure   Figure 1 Landsat ARD tile grids for the conterminous U.S.          Grass/Shrub Land predominantly covered with shrubs and perennial or annual natural and domesticated grasses (e.g. pasture), forbs, or other forms of herbaceous vegetation. The grass and shrub cover must comprise at least 10% of the area and tree cover is less than 10% of the area. 4

Caption of
Tree Cover Tree-covered land where the tree cover density is greater than 10%. Cleared or harvested trees (i.e. clearcuts) will be mapped according to current cover (e.g. Barren, Grass/Shrub).

Water Bodies
Areas covered with water, such as streams, canals, lakes, reservoirs, bays, or oceans. 6 Wetland Lands where water saturation is the determining factor in soil characteristics, vegetation types, and animal communities. Wetlands are composed of mosaics of water, bare soil, and herbaceous or wooded vegetated cover. 7 Ice and Snow Land where accumulated snow and ice does not completely melt during the summer period (i.e. perennial ice/snow). 8 Barren Land comprised of natural occurrences of soils, sand, or rocks where less than 10% of the area is vegetated.          and numbers of pixels (y-axis) of these changes (h), total pixels of different change (h), and areas of different land cover and areas (y-axis) of different land cover (x-axis) in the three times for the ARD tile 9_6 (i). Figure 109 Overall agreement between LCMAP primary land cover and reference data across CONUS. The cross lines represent +/-one standard errors.