Key Landscapes for Conservation Land Cover and Change Monitoring, Thematic and Validation Datasets for Sub-Saharan Africa

Mounting social and economic demands on natural resources increasingly threaten key areas for conservation in Africa. Threats to biodiversity pose an enormous challenge to these vulnerable areas. Effective protection of sites of strategic conservation importance requires timely and highly detailed geospatial monitoring. Larger ecological zones and wildlife corridors warrant monitoring as well, as these areas are under an even higher degree of pressure and habitat loss. To address this, a satellite-imagery-based monitoring workflow was developed to cover at-risk areas at various levels of detail. During the program's first phase, a total area of 560,442 km² in Sub-Saharan Africa was covered, of which 153,665 km² were mapped with 8 land cover classes and 406,776 km² with up to 32 classes. Satellite imagery was used to generate dense time series data from which thematic land cover maps were derived. Each land cover map and change map was fully verified and validated by an independent team to meet our strict data quality requirements. The independent validation datasets for each Key Landscape for Conservation (KLC) are also described and presented here (full and teaser datasets are available at Szantoi et al., 2020A, https://doi.pangaea.de/10.1594/PANGAEA.914261).

The products feature the same thematic land cover legend and geometric accuracy and were processed and validated following the same methodology. All products, including the Copernicus High-Resolution Hot Spot Monitoring (C-HSM) data, are free and open to any user, with guaranteed long-term maintenance and availability under the Copernicus license.
Copernicus is an operational program in which data production takes place on a continuous basis. This paper presents twelve KLC land cover and land cover change datasets covering a total of 560,442 km² of terrestrial land in Sub-Saharan Africa (SSA), mapped under the first phase (Phase 1) of the C-HSM activity. The datasets are based on freely available medium spatial resolution data. Each KLC was individually validated for both the present (~2016) and the change (~2000) dates. The processing chain consists of a preliminary assessment of data availability, pre- and post-processing, and fully independent quality verification and validation steps. For the latter, a second dataset, the validation data, is presented.
Several recent studies call for the sharing of product validation datasets (Fritz et al., 2017; Tsendbazar et al., 2018), especially if a collection received financial support from government grants (Szantoi et al., 2020B). Accordingly, the validation datasets (LC/LCC) associated with each KLC are also shared.

Study Area
The thematic datasets provided here concentrate on Sub-Saharan Africa, a region on the frontline of natural and human-induced changes. The areas were selected based on envisioned and predicted present and future pressures (MacKinnon et al., 2015). In this first phase (Phase 1), 12 large areas totalling 560,442 km² in SSA were selected, mapped and validated (Figure 1). These areas cover various ecosystems and are generally located in transboundary regions (Table 1, Figure 1).

Table 1. Mapped Key Landscapes for Conservation (KLC) within Phase 1. Mapping detail refers to the employed classification scheme: Dichotomous (D) or Modular (M); see the Data collection and mapping guidelines section. Columns: KLC (MacKinnon et al., 2015); Code; Mapping detail; Ecoregion (Dinerstein et al., 2017).

Figure 1. Spatial distribution of the Key Landscapes for Conservation Phase 1 areas.

Thematic dataset production
The production workflow for the entire process is shown in Figure 2, and each stage is explained in detail in the sections below. Landsat TM, ETM+ and OLI imagery at the Level-1 (L1TP) processing level was used to produce the Phase 1 land cover and change maps. The L1TP data were further corrected for atmospheric conditions to produce surface reflectance products for the classification phase. The atmospheric correction module was implemented using the 6S radiative transfer model (Masek et al., 2006). The Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (30 m or 90 m) was used to estimate target height and slope and to correct the surface sun incidence angles for an optional topographic correction. The Aerosol Optical Thickness (AOT) was estimated directly from either Landsat or Sentinel-2 data (Hagolle et al., 2015). Based on each area's meteo-climatic conditions (climate profile and precipitation patterns), season-specific satellite imagery was selected for each KLC (Table 1). Because data were scarce for many areas, especially for the change maps (year 2000), imagery was collected within ±3 years of a target year. In extreme cases, the window was extended to ±5 years, or until four cloud-free observations per pixel were available for the specified date. Cloud and shadow masking was based on the FMASK algorithm (Zhu et al., 2015).
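
The acquisition-window rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: each scene is reduced to its acquisition year plus a per-pixel clear-sky flag (as an FMASK-style cloud/shadow mask would provide), and the text's "±5 years, or until four cloud-free observations" is read as widening the window one year at a time from ±3 up to ±5 years until every pixel has at least four clear observations. The `Scene` record and `select_window` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical scene record: acquisition year plus a per-pixel clear-sky flag."""
    year: int
    clear: list[bool]  # True where the cloud/shadow mask marks the pixel as clear


def select_window(scenes, target_year, n_pixels, min_clear=4, start=3, max_half_width=5):
    """Widen the window around target_year (in years) until every pixel has at least
    min_clear cloud-free observations, stopping at max_half_width.

    Returns (half_width, selected_scenes, satisfied).
    """
    selected = []
    half_width = start
    for half_width in range(start, max_half_width + 1):
        selected = [s for s in scenes if abs(s.year - target_year) <= half_width]
        counts = [0] * n_pixels  # clear observations per pixel inside the current window
        for scene in selected:
            for i, is_clear in enumerate(scene.clear):
                counts[i] += int(is_clear)
        if counts and min(counts) >= min_clear:
            return half_width, selected, True
    # Requirement not met even at the widest window: return it, flagged as unsatisfied.
    return half_width, selected, False
```

For a change map targeting 2000, `select_window(scenes, 2000, n_pixels)` returns the narrowest window in which every pixel reaches four clear observations; a `False` flag marks areas where even ±5 years was not enough and the remaining gaps would have to be handled during visual refinement.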

Land cover classification system
All thematic maps were produced either at the Dichotomous level or at both the Dichotomous and Modular levels of the Land Cover Classification System (LCCS) developed by the Food and Agriculture Organization of the United Nations and the United Nations Environment Programme (Di Gregorio, 2005). The LCCS (ISO 19144-2) is a comprehensive hierarchical classification system that enables comparison of land cover classes regardless of geographic location, mapping date or scale (Di Gregorio, 2005). At the Dichotomous level, the system distinguishes eight major LC classes; at the Modular level, thirty-two LC classes were used (Table 2).
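
The eight Dichotomous-level classes result from three binary decisions in the LCCS key: primarily vegetated vs. primarily non-vegetated, terrestrial vs. aquatic or regularly flooded, and cultivated/managed (or artificial) vs. natural. The class codes encode these decisions, which makes the hierarchy easy to query. The sketch below lists the eight classes with their standard LCCS names and decodes a code into its three criteria; the dictionary and the `decode_dichotomous` helper are illustrative, and the C-HSM attribute tables may store the codes in a different form.

```python
# The eight LCCS Dichotomous-level classes (Di Gregorio, 2005).
LCCS_DICHOTOMOUS = {
    "A11": "Cultivated and Managed Terrestrial Areas",
    "A12": "Natural and Semi-Natural Terrestrial Vegetation",
    "A23": "Cultivated Aquatic or Regularly Flooded Areas",
    "A24": "Natural and Semi-Natural Aquatic or Regularly Flooded Vegetation",
    "B15": "Artificial Surfaces and Associated Areas",
    "B16": "Bare Areas",
    "B27": "Artificial Waterbodies, Snow and Ice",
    "B28": "Natural Waterbodies, Snow and Ice",
}


def decode_dichotomous(code: str) -> dict:
    """Split an LCCS Dichotomous code (e.g. 'A12') into its three binary criteria."""
    if code not in LCCS_DICHOTOMOUS:
        raise ValueError(f"unknown LCCS dichotomous code: {code!r}")
    return {
        "vegetated": code[0] == "A",       # A = primarily vegetated, B = primarily non-vegetated
        "aquatic": code[1] == "2",         # 1 = terrestrial, 2 = aquatic or regularly flooded
        "managed": int(code[2]) % 2 == 1,  # odd = cultivated/managed or artificial, even = natural
        "name": LCCS_DICHOTOMOUS[code],
    }
```

For example, `decode_dichotomous("B15")` reports a non-vegetated, terrestrial, artificial class, which is why a conversion from A11 to B15 crosses the first (vegetated/non-vegetated) split of the hierarchy.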

Automatic classification
Based on the pre-selected imagery, vegetation indices derived from Dense Multitemporal Timeseries (DMT) were generated to reduce data dimensionality and enhance the signal of the surface target. The DMT for each KLC was built from the pre-processed and geometrically co-registered data, forming a geospatial datacube (Strobl et al., 2017). In addition, three vegetation indices were calculated to aid the separation of terrestrial vs. aquatic (NDFI), vegetated vs. barren (SAVI), and evergreen vs. … (1).

All pre-processed data (spectral bands and DMT-based indices) were fed into a Support Vector Machine (SVM) supervised classifier. The SVM classifier can handle high-dimensional data and performs well in mapping heterogeneous areas, including vegetation community types (Szantoi et al., 2013). To produce the thematic maps, the Minimum Mapping Unit (MMU) concept used by Szantoi et al. (2016) was employed: individual pixels (with their land cover class information) were assigned to objects, with the minimum object size set at 0.5 to 5 hectares as a compromise between technical feasibility (pixel size) and the typical size of the observable features (various land cover classes). Still, classification errors (omission and commission of various classes) and false alarms (for land cover change) arose due to data availability (cloud cover, no data) and the seasonal behaviour of land cover (e.g. rapid foliage change). These errors were corrected through expert visual image interpretation, which improved the outputs of the automated process.
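
The Minimum Mapping Unit translates directly into a pixel count: a 30 m Landsat pixel covers 0.09 ha, so 0.5 ha corresponds to about 6 pixels and 5 ha to about 56 pixels. The sketch below is a hypothetical stand-in for the object-building step, not the method of Szantoi et al. (2016): it groups 4-connected pixels of the same class into objects and, in a single pass over the original labels, reassigns every object smaller than the MMU to the most frequent class along its border. The `mmu_pixels` and `sieve` names are illustrative.

```python
import math
from collections import Counter, deque

PIXEL_AREA_HA = 30 * 30 / 10_000  # one 30 m Landsat pixel = 0.09 ha


def mmu_pixels(mmu_ha: float) -> int:
    """Smallest number of 30 m pixels whose area reaches the MMU (small tolerance for float noise)."""
    return math.ceil(mmu_ha / PIXEL_AREA_HA - 1e-9)


def sieve(grid: list[list[str]], min_pixels: int) -> list[list[str]]:
    """Merge 4-connected objects smaller than min_pixels into their dominant neighbouring class."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # the input grid is left untouched
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            cls = grid[r0][c0]
            region, border = [], Counter()
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:  # breadth-first flood fill of one object
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols:
                        if grid[nr][nc] == cls:
                            if not seen[nr][nc]:
                                seen[nr][nc] = True
                                queue.append((nr, nc))
                        else:
                            border[grid[nr][nc]] += 1  # tally classes touching the object
            if len(region) < min_pixels and border:
                new_cls = border.most_common(1)[0][0]
                for r, c in region:
                    out[r][c] = new_cls
    return out
```

Calling `sieve(classified, mmu_pixels(0.5))` removes isolated single-pixel noise while leaving objects of at least 0.5 ha intact; the real workflow also relied on expert interpretation, which a purely geometric rule like this cannot replace.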

Land cover change detection
Land cover change was interpreted as a categorical change in which a particular land cover was replaced by another land cover.
Examples of such conversions include the change from Cultivated and Managed Terrestrial Areas (A11) to Natural and Semi-Natural Terrestrial Vegetation (A12), or from Cultivated and Managed Terrestrial Areas (A11) to Artificial Surfaces and Associated Areas (B15). The basic condition for identifying LC change was a detectable change in spectral reflectance within specific bands of the satellite imagery, further supported by other interpretation cues such as shape and texture patterns. In our methodology, images acquired in two or more different timeframes were used in the identification process. Furthermore, only changes with a periodicity longer than yearly and/or seasonal (dry/wet season) cycles were considered land cover changes. Urban sprawl, tree plantations (large or small) replacing herbaceous crops (large or small), changes in tree cover (closed or open), and the creation of a new water reservoir are long-term changes that qualify as actual LCCs. In our workflow, the LCC process followed the same image pre-processing steps as the LC method, and an independent classification (similar to the LC procedure) of the past date was performed.
Finally, the LC and LCC products were compared and change polygons were extracted. As with the LC product, visual refinement was an important step in producing accurate LCC polygons.
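
Once the recent (LC) and past classifications exist on a common grid, extracting change reduces to finding locations whose class differs between the two dates and summarising the from-to transitions. The sketch below illustrates this on per-pixel labels under that assumption; the production chain works on polygons, and neither the persistence rule (ignoring seasonal or sub-annual changes) nor the visual refinement is modelled. The function names are illustrative, and the 0.09 ha pixel area assumes 30 m Landsat pixels.

```python
from collections import Counter

PIXEL_AREA_HA = 0.09  # one 30 m x 30 m Landsat pixel


def change_pixels(past, recent):
    """Return {(row, col): (past_class, recent_class)} for pixels whose class differs."""
    if len(past) != len(recent) or any(len(a) != len(b) for a, b in zip(past, recent)):
        raise ValueError("past and recent grids must have the same shape")
    changes = {}
    for r, (past_row, recent_row) in enumerate(zip(past, recent)):
        for c, (before, after) in enumerate(zip(past_row, recent_row)):
            if before != after:
                changes[(r, c)] = (before, after)
    return changes


def transition_areas_ha(changes):
    """Summarise changed pixels as hectares per (from, to) transition."""
    return {pair: n * PIXEL_AREA_HA for pair, n in Counter(changes.values()).items()}
```

A transition such as `("A11", "B15")` in the summary corresponds to the cropland-to-artificial-surface example given above.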

Validation dataset production
The validation datasets (Table 3, Figure 3) were created individually for each KLC. The validation points were generated using a stratified random sampling procedure, which ensured a sufficient sample for estimating the accuracy of all land cover and land cover change classes according to their frequency of occurrence. The following formula (Gallaun et al., 2015) was used to determine the minimum number of validation points (per class per KLC):

In cases where classes covered smaller areas in total, additional sampling units were allocated according to the Neyman optimal allocation in order to minimize the variance of the estimator of the overall accuracy for the total sample size $n$ (Gallaun et al., 2015; Stehman, 2012):

$$n_c = n\,\frac{N_c\sqrt{e_c\,(1-e_c)}}{\sum_{k=1}^{K} N_k\sqrt{e_k\,(1-e_k)}}$$

where $n_c$ is the sample size for class $c$, $N_c$ is the population size for class $c$, $e_c$ is the estimated error rate for class $c$, $K$ is the number of classes, $N_k$ is the population size for class $k$, and $e_k$ is the estimated error rate for class $k$.

… were excluded from the accuracy statistics due to an error or disagreement during the evaluation procedure (Table 3, "Number of points LC/LCC"). The blind interpretation step attempted to interpret all validation points using available ancillary data (i.e. higher-resolution imagery), without direct comparison to the generated LC/LCC maps. The plausibility step then reviewed every point whose blind interpretation did not match the corresponding LC/LCC value. After this review, the final validation reference was established.
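
The Neyman allocation can be computed directly: each class receives a share of the total sample size $n$ proportional to $N_k\sqrt{e_k(1-e_k)}$, so larger classes and classes with a less certain accuracy receive more points. The sketch below is a minimal illustration of that formula; it returns fractional allocations because the text does not state how values were rounded or how the result was combined with the per-class minimum, so both are left to the caller. The function name is illustrative.

```python
import math


def neyman_allocation(n, populations, error_rates):
    """Allocate n sample units across classes in proportion to N_k * sqrt(e_k * (1 - e_k))."""
    if len(populations) != len(error_rates):
        raise ValueError("populations and error_rates must have the same length")
    weights = [N * math.sqrt(e * (1 - e)) for N, e in zip(populations, error_rates)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all classes have zero weight")
    return [n * w / total for w in weights]
```

For two equally large classes with anticipated error rates of 0.5 and 0.1, 100 points split 62.5 to 37.5, because the standard deviation term is 0.5 for the first class and 0.3 for the second.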

Technical Validation
Spatial, temporal and logical consistency were assessed through a procedure independent of the producer, to determine the products' positional accuracy, the validity of the data with respect to time (seasonality), and the logical consistency of the data (topology, attribution and logical relationships). A qualitative, systematic accuracy assessment was also performed wall-to-wall through systematic visual examination covering a) global thematic assessment, b) expected polygon size (Minimum Mapping Unit, MMU), c) seasonal effects and d) spatial patterns (i.e. following correct edges).
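
Check b) can be partly automated on the vector products once polygons are expressed in an equal-area projection, such as the Africa Albers Equal Area Conic projection used for the validation data. The sketch below is a hypothetical helper, not the procedure actually applied: it computes each polygon's area with the shoelace formula (coordinates in metres, exterior ring only, holes ignored) and flags polygons smaller than a chosen MMU in hectares.

```python
def ring_area_m2(ring):
    """Shoelace area of a ring of (x, y) vertices in metres; works for open or closed rings."""
    area2 = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
    return abs(area2) / 2.0


def below_mmu(polygons, mmu_ha):
    """Return the ids of polygons whose exterior-ring area is below mmu_ha hectares."""
    return [pid for pid, ring in polygons.items() if ring_area_m2(ring) / 10_000 < mmu_ha]
```

Polygons flagged this way are candidates for visual review rather than automatic errors, since the MMU in this work ranged from 0.5 to 5 ha depending on the feature.
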
The quantitative accuracy assessment (i.e. validation) results are shown in Table 4 (overall accuracies) and in Appendix A (thematic class accuracies per KLC). Generally, the program aimed at a minimum of 85% overall accuracy for each product (KLC) and a minimum of 75% thematic accuracy (producer's and user's) for each class within each KLC. The land cover change (LCC) accuracy should exceed 72%. In exceptional cases, thematic accuracies may fall below the threshold due to the difficulty of discriminating a particular class in a certain KLC.
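
The overall, producer's and user's accuracies can be derived from a confusion matrix of map versus reference labels and checked against the program targets stated above (85% overall, 75% per class). The sketch below is illustrative and uses simple sample counts; with a stratified design, design-based estimators normally weight each stratum by its mapped area, so these unweighted values are only an approximation of the reported statistics. The function names are hypothetical.

```python
import math
from collections import Counter


def accuracy_report(pairs):
    """Overall, user's and producer's accuracy from (map_class, reference_class) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no validation points")
    matrix = Counter(pairs)  # confusion matrix as {(map, reference): count}
    classes = sorted({c for pair in pairs for c in pair})
    overall = sum(matrix[(c, c)] for c in classes) / len(pairs)
    users, producers = {}, {}
    for c in classes:
        mapped = sum(matrix[(c, r)] for r in classes)      # points mapped as c
        referenced = sum(matrix[(m, c)] for m in classes)  # points whose reference is c
        users[c] = matrix[(c, c)] / mapped if mapped else math.nan
        producers[c] = matrix[(c, c)] / referenced if referenced else math.nan
    return overall, users, producers


def meets_targets(overall, users, producers, oa_min=0.85, class_min=0.75):
    """Phase 1 targets: overall >= 85 % and user's/producer's >= 75 % for every class."""
    per_class = [v for v in (*users.values(), *producers.values()) if not math.isnan(v)]
    return overall >= oa_min and all(v >= class_min for v in per_class)
```

User's accuracy (row-wise) measures commission errors and producer's accuracy (column-wise) measures omission errors, which is why the program sets a threshold on both.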

Discussion
There is a direct relationship between population growth, agricultural expansion, energy demand and pressure on land. Given the current state of development, population increase and economic growth, a large portion of the Sub-Saharan population depends on the remaining natural resources to meet its food and energy needs (Brink et al., 2012). The demands of social and economic growth require additional land, typically at the expense of previously untouched areas. Protected areas (e.g. national parks) that remain well preserved (see Figures 4, 5A and 5B) often have regions in close proximity under tremendous pressure. Such areas, often transboundary, need very accurate monitoring and base maps, which this work provides; this is especially important because areas shared between or among countries are frequently not mapped with a common legend, if they are mapped at all. The presented KLC datasets can be used for continuous land cover/use monitoring, evaluation of management practices and effectiveness, scientific advice, habitat modelling, information dissemination and capacity building in the corresponding countries, and to manage natural resources such as forests, soil, biodiversity, ecosystem services and agriculture (Tolessa et al., 2017). Furthermore, regional climate change, biogeochemical and hydrologic models are now capable of using high-resolution LC data for predictions, both in general (Nissan et al., 2019) and for specific regions such as Africa (Sylla et al., 2016; Vondou and Haensler, 2017).
The validation datasets were independently collected and verified through a robust procedure. They can be used for additional land cover mapping, for creating spectral libraries, and for validating other local, regional and global datasets. It is important that different land cover products can be used or compared against one another regardless of their geographic origin. Here, we introduced twelve land cover maps for different areas in Sub-Saharan Africa where quality land cover products are missing (Marshall et al., 2017). These products also come with land cover change information, generally dating back to the year 2000 (±3 years). All data were produced using the unified Land Cover Classification System, whose Modular level can be applied at local scales through its very detailed classes (here 32).

Geist and Lambin (2002) describe the human driving forces of land cover change as an interplay of three key variables: expansion of agriculture, extraction of wood, and development of infrastructure. The main land cover dynamics in Sub-Saharan Africa can be explained by the first two variables, where agricultural expansion is further subdivided into shifting cultivation, permanent cultivation and cattle ranching, and wood extraction is subdivided into commercial wood extraction (clear-cutting, selective harvesting), fuelwood extraction, pole wood extraction and charcoal production. Although the clearing of natural vegetation has traditionally been attributed predominantly to the expansion of new agricultural land (including investments in large-scale commercial agriculture) (Brink and Eva, 2009), firewood extraction and charcoal production are also key factors in forest, woodland and shrubland degradation throughout the region.
This land cover dynamic is not merely a by-product of greater forces such as logging for timber and agricultural expansion; it stems from a specific need to satisfy energy demand (European Commission, 2018). In fact, in Sub-Saharan Africa, extracted wood is mainly used for energy production (Kebede et al., 2010). Although the region possesses a huge diversity of energy sources such as oil, gas, coal, uranium and hydropower, the local infrastructure for, and use of, these commercial energy sources are very limited.

Drivers of change
Traditional energy sources in the form of firewood and charcoal account for over 75% of total energy use in the region (Kebede et al., 2010). Efforts to meet population and economic demands in Sub-Saharan Africa while preserving biodiversity and ecosystem functioning require informed decision-making. The global component of the Copernicus Land Service (Copernicus Global Land), in particular its High-Resolution Hot Spot Monitoring component, presents a unique opportunity for gathering such information.

Sources of errors
As the applied LCCS allows a very detailed hierarchical classification, some classes can be difficult to distinguish from one another. This is especially true in Africa's vast and very heterogeneous landscapes, where agricultural land use is mainly smallholder-based (i.e. very small plots) and shifting cultivation, mostly driven by a lack of fertilizers and poor soils, leads to land abandonment. Landscapes are generally not composed of clearly delimited, well-identifiable cover formations; instead, they usually form a continuum of cover (vegetation) formations that may include tree, shrub and herbaceous layers. These variations, combined with differences in vegetation density (open vs. closed) and height, make class assignment challenging. Moreover, some agricultural classes even distinguish the cultivation type, e.g. fruit tree plantations versus timber plantations. The discrimination of such classes is therefore very difficult and may introduce classification errors.
Apart from the land cover classification itself, errors may also be introduced by climate-induced variability, such as leaf phenology, where deciduous vegetation may appear bare during the dry season.
More generally, difficulties in distinguishing aquatic or regularly flooded surfaces from terrestrial areas were observed in certain KLCs, especially where flooded periods are short.

Datasets current and future use
The C-HSM datasets have been widely used by policy makers (African and European partners) to help identify areas prone to change due to human activities.

Data Availability
The data are provided in shapefile (*.shp) format, with polygon geometry for the land cover and change datasets and point geometry for the validation datasets. The data use the World Geodetic System 1984 geographic coordinate system (GCS) (EPSG:4326) and its datum (EPSG:6326). Besides the same GCS, the validation data are also provided in the Africa Albers Equal Area Conic projected coordinate system (ESRI:102022).

Each of the 12 KLCs is described by two vector layers: a Land Cover (LC) layer and a Land Cover Change (LCC) layer. The LC layer is a wall-to-wall map covering the entire Area of Interest (AOI). The LC temporal reference for the project is the year 2016, although the actual mapping year of each area is noted in the file name (e.g. CAF01_2016) and generally refers to the year from which the largest number of satellite images was used for the classification. The LCC layer provides partial coverage of the AOI, as it contains only the areas (polygons) where thematic change occurred compared to the LC layer. The LCC temporal reference is the year 2000 (±3 years), noted in the file name (e.g. CAF01_2000).
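
Because the mapping year is encoded in each layer's file name, the LC and LCC layers of a KLC can be paired programmatically. The sketch below assumes file stems (names without extension) of the form `<KLC code>_<year>`, with the KLC code made of capital letters followed by digits as in CAF01; the regular expression and the helper names are assumptions, and actual file names in the package may carry additional suffixes.

```python
import re
from collections import defaultdict

# Assumed pattern: KLC code (capital letters + digits), an underscore, and a 4-digit year.
NAME_PATTERN = re.compile(r"^(?P<klc>[A-Z]+\d+)_(?P<year>\d{4})$")


def parse_layer_name(stem):
    """Split a file stem such as 'CAF01_2016' into ('CAF01', 2016)."""
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected layer name: {stem!r}")
    return match["klc"], int(match["year"])


def pair_layers(stems):
    """Group stems by KLC, taking the latest year as the LC layer and the earliest as the LCC layer."""
    years = defaultdict(list)
    for stem in stems:
        klc, year = parse_layer_name(stem)
        years[klc].append(year)
    return {klc: {"LC": max(ys), "LCC": min(ys)} for klc, ys in years.items()}
```

Reading the year from the name rather than assuming 2016 and 2000 matters because, as noted above, the actual mapping year can differ between KLCs.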

Validation points dataset:
Each of the 12 areas was quantitatively validated using a spatially explicit point dataset. These datasets were generated using the method described in the Validation dataset production section, and each point was used to verify the correctness of the LC/LCC maps. The corresponding attributes refer to the most detailed classification level attributes [mapcode_A or mapcode_B] present in the LC and LCC datasets (shapefiles). The plaus201X and plaus200X columns refer to the years the validation sets represent; as these can differ among KLCs, the exact year is always noted in the column name (e.g. plaus2000, plaus2016).
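
Since the reference year is embedded in the plausibility column names, it can be recovered from an attribute table header instead of being hard-coded per KLC. A minimal sketch, assuming only that these columns follow the `plaus<YYYY>` pattern shown above; the function name is hypothetical.

```python
import re

PLAUS_COLUMN = re.compile(r"^plaus(\d{4})$")


def plausibility_years(columns):
    """Map each plausibility column (e.g. 'plaus2016') to the year it represents."""
    years = {}
    for name in columns:
        match = PLAUS_COLUMN.match(name)
        if match:
            years[name] = int(match.group(1))
    return years


print(plausibility_years(["mapcode_A", "plaus2000", "plaus2016"]))
# {'plaus2000': 2000, 'plaus2016': 2016}
```

The earlier year then identifies the change-date reference and the later one the present-date reference for that KLC.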

The naming of all attributes follows the same structure in all datasets. See the Appendix Information and Supplementary Information sections for details.
The complete package (all datasets) is available for download at https://doi.pangaea.de/10.1594/PANGAEA.914261, or individually as source datasets. Besides archiving the datasets at PANGAEA (www.pangaea.de) with corresponding Digital Object Identifiers, the Copernicus Hot Spot website (https://land.copernicus.eu/global/hsm) provides open access to all the land cover/change and validation data presented in this article, as well as technical reports and on-the-fly statistics.

Conclusions and Outlook
The C-HSM service component is part of Copernicus Global Land, which produces near-real-time biophysical variables at medium scale globally. In contrast, the C-HSM activity is an on-demand component that addresses specific user requests in the field of sustainable natural resource management. The products presented here provide the first set of standardized land cover and land cover change datasets, with their corresponding validation datasets, for 12 KLCs in Sub-Saharan Africa. Their geographic distribution covers the tropical and subtropical regions of West, Central and South-Eastern Africa. The next release will also include countries in the Caribbean and Pacific areas of the ACP region, and some areas beyond these regions may be mapped depending on user demand. The most recent land cover change will be reassessed for selected, already-mapped KLCs in order to generate longer-term time series of land cover dynamics. This is not done systematically but on specific customer request. The C-HSM service also encourages stakeholder cooperation and provides capacity-building workshops around the globe. In-person training events give new and existing users the opportunity to learn how to use and interpret the data, operate the web information system, and easily assess recent land cover change using Sentinel-2 image mosaics. The products provided here are of very high quality and can be used directly as base maps and for policy decisions, as well as for comparing and evaluating other land cover products; the validation datasets can also be used for training and validation purposes.
Finally, the service has a high degree of confidence that the data presented here (and in the next phase) are of the highest quality, regularly exceeding 90% overall accuracy. This is ensured by a rigorous, independent production-validation mechanism and feedback loop that continues until the required overall and per-class accuracy levels are reached.
In line with the open access policy of the European Commission's Copernicus Programme, the data are distributed free of charge to any user through a dedicated website (https://land.copernicus.eu/global/hsm). This interactive online information system allows users to browse, analyse and download the data, including the accuracy assessment information.

Appendix Information
Appendix A contains the thematic class accuracies for each KLC, for both the land cover and the land cover change maps.