TimeSpec4LULC: A Global Deep Learning-driven Dataset of MODIS Terra-Aqua Multi-Spectral Time Series for LULC Mapping and Change Detection

Land Use and Land Cover (LULCs) mapping and change detection are of paramount importance to understand the distribution and effectively monitor the dynamics of the Earth’s system. An unexplored way to create global LULC maps is by building good quality LULC-models based on state-of-the-art deep learning networks. Building such models requires large global good quality time series LULC datasets, which are not available yet. This paper presents TimeSpec4LULC (Khaldi et al., 2021), a smart open-source global dataset of multi-Spectral Time series for 29 LULC classes. TimeSpec4LULC was 5 built based on the 7 spectral bands of MODIS sensor at 500 m resolution from 2002 to 2021, and was annotated using a spatial agreement across the 15 global LULC products available in Google Earth Engine. The 19-year monthly time series of the seven bands were created globally by: (1) applying different spatio-temporal quality assessment filters on MODIS Terra and Aqua satellites, (2) aggregating their original 8-day temporal granularity into monthly composites, (3) merging their data into a Terra+Aqua combined time series, and (4) extracting, at the pixel level, 11.85 million time series for the 7 bands along with 10 a set of metadata about geographic coordinates, country and departmental divisions, spatio-temporal consistency across LULC products, temporal data availability, and the global human modification index. To assess the annotation quality of the dataset, a sample of 100 pixels, evenly distributed around the world, from each LULC class, was selected and validated by experts using very high resolution images from both Google Earth and Bing Maps imagery. This smartly, pre-processed, and annotated dataset is targeted towards scientific users interested in developing and evaluating various machine learning models, including 15 deep learning networks, to perform global LULC mapping and change detection. 1 https://doi.org/10.5194/essd-2021-253 O pe n A cc es s Earth System Science Data D icu ssio n s Preprint. Discussion started: 12 October 2021 c © Author(s) 2021. CC BY 4.0 License.


Introduction
Accurate Land-Use and Land-Cover (LULC) information, including distribution, dynamics and changes, is of paramount importance for understanding and modelling the natural and human-modified behavior of the Earth's system (Tuanmu and Jetz,20 2014; Verburg et al., 2009). Land cover is an essential variable that provides powerful insights for the assessment and modelling of terrestrial ecosystem processes, biogeochemical cycles, biodiversity, climate, and water resources, etc. (Luoto et al., 2007;Menke et al., 2009;Polykretis et al., 2020). Land uses incorporate many types of modifications that an increasing human population, 9 billion expected by 2050, causes to the Earth surface. LULCs are subjected to anomalies, trends and changes both from anthropogenic and natural origins (Polykretis et al., 2020), with subsequent modifications on their biophysical 25 properties. LULC change is usually interpreted as the conversion from one LULC category to another or/and the modification of land management within LULC (Meyer et al., 1994). The assessment of LULC and LULC change has been identified as an essential planetary boundary to assess the status and trends of social-ecological systems from the local to the global scale in the pursuit of a safe operating space for humanity. For instance, urban sprawl and agriculture expansion or abandonment affect the biodiversity, soil quality, climate, food security, and human health (Lambin and Geist, 2008;Feddema et al., 2005). For this 30 reason, continuous and accurate LULC and LULC change mapping is essential in policy and research to monitor ecological and environmental change at different temporal and spatial scales (Polykretis et al., 2020;García-Mora et al., 2012), and as a decision support system to ensure an effective and sustainable planning and management of natural resources (Kong et al., 2016;Congalton et al., 2014;Grekousis et al., 2015).
Satellite remote sensing in combination with geographic information systems (GIS) have provided convenient, inexpensive, 35 and continuous spatio-temporal information for mapping LULCs and detecting changes on the Earth's surface from regional to global scales (Kong et al., 2016;Kerr and Ostrovsky, 2003;Pfeifer et al., 2012) thanks to their strong ability to cover, timely and repeatedly, wide and inaccessible areas, and to get high spatial and temporal resolution data (Alexakis et al., 2014;Yirsaw et al., 2017;Patel et al., 2019). Since the 1980s, multiple global LULC products (Table 1) have been derived from remotely sensed data, providing alternative characterizations of the Earth surface at varying extents of spatial and temporal resolutions 40 (Townshend et al., 1991;Loveland et al., 2000;Bartholome and Belward, 2005).
One of the most important limitations of global LULC products is the within-product variability of accuracy (across different years, regions, and LULC types), besides the generally low agreement among products in many regions of the world (Tsendbazar et al., 2015b(Tsendbazar et al., , 2016Gao et al., 2020;Gong et al., 2013;Zimmer-Gembeck and Helfand, 2008). Such lack of consensus can translate into huge implications for subsequent global assessments of biodiversity status, carbon balance, or climate change -Satellite sensors: the spatial, temporal and spectral resolutions of the source satellite images strongly determine the precision and accuracy of derived LULCs. Native pixel size can vary from dozens of meters to kilometers, which determines the precision. Revisiting frequency can vary from daily images to several weeks, which determines the possibility of removing cloud and atmospheric noise effects. In addition, the greater the number of spectral bands in a sensor, the greater the amount of complementary information that can help to differentiate among LULC classes. across LULC products, statistics on temporal data availability, and the global human modification index. The annotation quality was further assessed by experts using Google Satellite and Bing Maps very high resolution images in 100 samples per class evenly distributed around the world.

Methods
To build TimeSpec4LULC, we first determined the spatial and temporal agreement across 15 heterogeneous global LULC 105 products (listed in Table 1) for 29 broad and globally harmonized LULC classes. Then, for each class, we extracted a 19-year monthly time series for the seven 500-meter spectral bands of MODIS Terra and Aqua combined. We carried out this process in GEE (Gorelick et al., 2017) since it provides access to freely available satellite imagery under a unified programming, processing and visualization environment.

110
To find the spatial consensus across global LULC products for different LULC classes, we followed five steps: 1) selection of global LULC products, 2) standardization and harmonization of LULC legends, 3) combination of products across space and We used the 15 most updated global LULC products available in GEE (Table 1). These products widely differ in their source satellite data, spatial resolution, temporal coverage, class legend, and accuracy. Given such heterogeneity, we used the consensus across all of them in space and time as a source of reliability to support our annotation. That is, a given LULC class is assigned to a 500 m pixel only if it was consistent over time and space across all the 15 LULC products.

120
Broadly, land cover (LC) refers to the different vegetation types (usually following biotype, plant functional type, or physiognomy schemes, such as forests, shrublands, or grasslands) or other biophysical classes (such as water bodies, snow, or bare soil) that cover the Earth's surface. Whereas, land use (LU) describes the set of human activities that significantly modify a LC (such as urban areas and croplands). To standardize and harmonize the LULC legends across the 15 LULC products, we used expert knowledge (Vancutsem et al., 2013) to create a common nomenclature of 29 broad classes (6 LU classes and 23 LC classes) that were interoperable across all products (see the hierarchical structure of our legend in Fig. 1 and the rule set across products in Table 3). The LULC legend was structured into 6 hierarchical levels (L0 to L5).
The six anthropogenic LU classes contained Urban and built-up areas and five types of croplands. The 23 natural or seminatural LC classes covered three aquatic systems (Marine water bodies, Continental water bodies, and Wetlands) and 18 130 terrestrial systems (Permanent snow, Barren lands, Moss and Lichen lands, Grasslands, Close Shrublands, Open Shrublands, and 12 types of Forests that differed in their canopy type, phenology and tree cover).

Combining products across time and space
For each LULC class, we built a consensus image with its global distribution by agreement across the 12 LULC products (P1 to P12). For products that only contained one image (P1 to P7) referred to a particular year or period, we obtained a binary mask with the targeted LULC class as 1. As further explained below, for products that were a collection over years (P8 to P12), we first obtained a binary mask for each year and then produced their combination over years (Table 4). Then, we obtained the spatial agreement over these 12 masks and used two water bodies products (P13 and P14) and one impervious surface product (P15) to further refine the consensus.
The inter-annual combination of LULC masks was based on two types of rules according to the inter-annual consistency 140 of some products for specific LULC classes. In general, the combination over time was done using the AND operator when handling classes with high temporal stability, namely Urban and built-up areas, water bodies, Permanent Snow, Croplands, Open Shrublands, Barren lands, and Grasslands. Alternatively, since some LULC products showed implausible inter-annual changes for specific classes (e.g., wetlands affected by droughts or large areas of no-forest cover in one year preceded and followed by forest in the previous and following years, respectively), to avoid unrealistic data loss, the MEAN operator was 145 used in the following classes: Moss and Lichen lands, Forests, Close Shrublands, and Wetlands.
Afterwards, for each LULC class, a spatial combination of the 12 masks, one per product, was performed following six specific rules according to the global abundance of each class (Table 4). In general (Rule 1), the MEAN operator was used across products P1 to P12 and the result was multiplied by two water class masks (P13 and P14) to eliminate water pixels from land classes and land pixels from water classes, and by impervious surface class mask of P15 to eliminate impervious pixels 150 from all classes but urban. More relaxed rules where applied to five LULC classes that had too few pixels (less than 1000) with the general rule (see details in Table 4). The application of the explained spatio-temporal combination resulted in a mask for The 30-meter resolution LULC consensus was reprojected to MODIS resolution (approximately equal to 500 m) using the spatial MEAN reducer. This 500 m average consensus was used to explore different pixel-purity thresholds θ for each LULC class. We kept θ = 1, when the number of 500-meter pixels retrieved was greater than 1000, which meant that those pixels were 100% pure for that class. When the number of these pixels was lower than 1000 at θ = 1, we decreased θ to 0.75%, which meant that up to 25% of the pixel could be occupied by another LULC class. In any case, our dataset provides as metadata at 160 pixel level, the spatial purity percentage, so the user can control the desired purity and subsequent sample size (Table 6). To clarify the process of spatial agreement creation across the 15 global LULC products, we provide an example explaining the final mask creation for the class Dense Evergreen Broadleaf (Fig. 2). After performing the spatial agreement process for each LULC class, we combined the final class masks for all the 29 LULC classes to generate one global LULC map describing their distribution (Fig. 3).
165 Figure 3. Distribution of the number of covered countries (FAO-L0) over the 29 LULC classes. This map combines all the LULC classes maps that were generated from the process of spatial agreement across the 15 global LULC products available in GEE.

Extracting times series of spectral data for 29 LULC classes globally
To extract the 19-year monthly time series of the seven 500-meter MODIS spectral bands for each of the 29 LULC classes throughout the entire world, we followed four steps (Fig. 4): 1) spatio-temporal filtering of Terra and Aqua data based on Quality Assessment flags, 2) aggregating the original 8-day Terra and Aqua data into monthly composites, 3) merging the two monthly time series into a Terra+Aqua Combined time series, and 4) data extraction and archiving (Fig. 5).  The quality of any time series of satellite imagery is affected by the internal malfunction of satellite sensors, atmospheric (e.g., clouds, shadows, cirrus, etc.) or land (e.g., floods, snow, fires, etc.) conditions. In addition to the spectral bands, MODIS products provide 'Quality Assessment' (QA) flags as metadata bands to allow the user to filter out spectral values affected by disruptive conditions. Therefore, all QA flags were used to remove noise, spurious values, and outliers in the image collection.

180
"MODLAND QA" flags (bits 0-1) were used to only select pixel values produced at 'ideal quality'. Then, "State QA" flags were used to mask out clouds (bits 0-1), internal clouds (bit 10), pixels adjacent to clouds (bit 13), cirrus (bits 8-9), cloud shadows (bit 2), high aerosol quantities (bits 6-7), and internal fires (bit 11). Regarding the water flag (bits 3-5), it was used to mask out water pixels in terrestrial systems (but not in Permanent Snow, and in Croplands flooded with seasonal water).  for each pixel and month, when both Terra and Aqua had values, we used the mean between them; when one satellite had a missing value, we used the available one; and when both of them had missing values, the combined value remains missing.

Data extraction and archiving
One of the biggest advantages of our dataset is its global scale characteristic since all the LULC classes data were extracted globally from all regions over the world. To take into account all the differences across the globe and thinking of regional  To organize and assess the quality of the extracted global data for all the 29 LULC classes, we first present the description of the dataset structure stored in the repository 2 (Khaldi et al., 2021), then we evaluate the quality of its annotation process.

Data structure description
TimeSpec4LULC is provided in 30 different ZIP files owning the name of the 29 LULC classes (The Barren Lands class is divided into two files since it is too large). Within each ZIP file, there exists a set of seven CSV files, each one corresponding to 215 one of the seven spectral bands. The naming of each file follows this structure: IdOfTheClass_NameOfTheClass_ModisBand.csv For example, for band 1 of the Barren Lands class, the filename is: 01_BarrenLands_MCD09A1b01.csv Inside each CSV file, rows represent the collected pixels for that class. The first 11 columns contain the following metadata: -"IdOfTheClass": Id of the class.
-"PurityOfThePixel": Spatial and inter-annual consensus for this class across multiple land-cover products, i.e., purity of the pixel.

225
-"DataAvailability": percentage of non-missing data per band throughout the time series.
-"Lat": Latitude of the pixel center.

Data quality control
The quality of the dataset annotation was assessed and validated visually by two co-author experts using two very high reso-235 lution imagery (< 1m/pixel) sources, namely Google Earth and Bing Maps imagery 3 . The assessment process includes three stages.
-First, a set of 100 samples is carefully selected from each class following the minimum distance criteria. That is, depending on the overall size of LULC class, the pixels within a minimum distance with respect to each pixel are eliminated until getting 100 pixels evenly distributed over the globe. Fig. 6 shows the distribution of the 2,900 selected pixels.

240
-Second, the class of each pixel of the 29×100 samples is identified visually by the expert eye following the next rule. We consider as ground truth the dominant LULC class, such LULC class occupies at least 70% of the pixel. The presence of up to 30% of features of other different LULC classes within the dominate class are ignored.
-Once the validated LULC classification matrix was obtained (Table 5), the F1 score was calculated at all the LULC levels (from L0 to L5), labeled as LULC-L5 in Fig. 1. As it can be noticed, as we go up from level L0 to L5 the obtained 245 dataset accuracy increases from 87% to 96% due mainly to the forests classification.

Results and Discussion
The total number of collected time series (pixels) in all the 29 LULC classes is 11,856,992, which is large enough to build high quality DL models. The density of the collected pixels over the world is depicted by Fig. 7.
The number of collected time series, at the lowest purity, differs from one class to another according to the global abundance 250 of each class ( Forests classes (except Dense Broadleaf classes). This implies that, within 500 m pixels, the LULC products are less consistent within these classes and that it may exist remaining noise in one LULC class from other LULC classes. Since our dataset provides the level of spatial purity at pixel level, the user can always select the desired purity threshold.
Certain number of time series in each LULC class still contain some missing data that could be handled neither with the monthly aggregation process nor with the Terra-Aqua merging process (Table 7). The LULC classes that have less than exactly the same over all the spectral bands excluding Band 6 which is the most contaminated by gaps. This severe drop in the number of time series without gaps in Band 6 compared to the other bands is due to the known "dead lines" in Aqua band 6 caused by the already reported malfunctioning or noise in some of its detectors (Zhang et al., 2018b).
Our dataset is distributed all over the world's FAO-L0 and FAO-L1 partitions (Fig. 8, Fig. 9, and Table 8). For almost all the 29 LULC classes, the collected time series covers more than 9 countries, except with Moss and Lichen lands, which were 270 collected from 2 countries, and Deciduous Needleleaf Forests, which were collected from 3 countries. This small number of countries covered by these classes is due to their natural scarce distribution over the world. Contrary, some of the LULC classes, namely Water Bodies (Marine and Continental), Rainfed Croplands (Cereal and Broadleaf), and Urban and built-up areas have a broad world coverage, i.e., more than 80 countries and more than 570 departments ( Fig. 8 and Fig. 9).
According to Fig. 10, the gHM index of the five cropland classes, and the Urban and built-up areas class is widely higher 275 (more than 59% of human change) compared to the other land cover classes, which proves their accurate annotation as human Land Uses.
This smartly, pre-processed, and annotated dataset is targeted towards scientific users interested in developing and evaluating various DL models to detect LULCs and monitor their change. For example, TimeSpec4LULC can be used i) to characterize the seasonal and inter-annual dynamics and changes of vegetation types and LULC classes, ii) to perform environmental 280 monitoring, management and planning, ii) to study the intra-class behaviors of LULCs, i.e., assess the behavior of one specific LULC in different areas of the world and see whether it maintains the same pattern or it reveals different patterns, and iii) to study the inter-class differences and similarities of LULCs, i.e., recognise and compare the patterns and dynamics of all LULCs (e.g., time series segmentation).

285
Accurate LULC mapping and change detection is highly relevant for many applications, including Earth system modeling, environmental monitoring, management and planning, or natural hazards assessment, among many others. However, there still exists a high level of disagreement across current global LULC products, particularly for some LULC classes. To address the challenge of improving LULC products and change detection, we have created a smart open-source global dataset of multi-spectral time series for 29 LULC classes containing almost 11.85 million pixels annotated by using the spatio-temporal pixels, and validated with photo-interpretation by experts using very high resolution images from both Google Earth and Bing Maps. The overall accuracy (F1 value) of the annotation varied from 96% at the coarser classification level to 87% at the finest level. This smartly, pre-processed, and annotated dataset is targeted towards scientific users interested in developing and evaluating various machine learning models, including deep learning networks, to perform global LULC mapping and change 300 detection.        Code and data availability. This dataset (Khaldi et al., 2021) is available to the public through an unrestricted data repository hosted by Zenodo at https://zenodo.org/record/5020024#.YWARotpBxaQ.
Author contributions. RK contributed to the conception of the dataset, implemented the code, performed all the data extraction and wrote the paper. DA-S contributed to the conception of the dataset, assessed its quality, provided guidance, and wrote the paper. EG assessed the quality of the dataset. YB contributed to the conception of the dataset. AE and FH provided edits and suggestions. ST contributed to the conception of the dataset, provided guidance, and wrote the paper.
Competing interests. The authors declare that they have no conflict of interest.