Mapping the vegetation of the Lake Tana basin, Ethiopia, using Google Earth images

The basin of Lake Tana is one of the most important watersheds in the Nile Basin. It is of great significance to the economy and politics of Ethiopia. In the past, the natural vegetation of the Lake Tana basin was heavily damaged by the continued expansion of cropland. Vegetation must be conserved and restored to protect the natural environment and maintain the biodiversity of the Lake Tana basin. In this research, we mapped the vegetation of the Lake Tana basin through visual interpretation of high-spatial-resolution images provided by Google Earth, supported by field survey data, to provide detailed information on the actual state of the vegetation for planning conservation and restoration. A total of 33 171 polygons were generated to represent the vegetation patches of the Lake Tana basin on the map, and validation using surveyed vegetation plots indicated that 90 % of the patches were correctly identified. The DOI of the dataset used for map production is https://doi.org/10.4121/uuid:48d45053-36f6-411b-96b1-7ae0e22d56d0. We expect that this vegetation map will benefit vegetation conservation and restoration in the Lake Tana basin.


Introduction
Lake Tana, located in the highlands of northwestern Ethiopia, is the country's largest freshwater lake and the third largest lake in the Nile Basin. Lake Tana is the source of the Blue Nile, and its basin is one of the most important catchments in the Nile Basin. It has rich natural resources and great potential for the development of irrigation, hydroelectric power, high-value crops, aquatic products, livestock products, and ecological tourism (Bijan and Shimelis, 2011). The Lake Tana basin is of critical significance to the economy and politics of Ethiopia. It also greatly influences the livelihoods of tens of millions of people in the lower Nile Basin.
Historically, there was a large area of Afromontane forest and many indigenous plant species in the Lake Tana basin; 172 woody species were observed in the basin, many of which were indigenous species (IFAD, 2007a). There are also large areas of wetlands and seasonally flooded plains, which provide multiple services to the local community and serve as a home for many endemic bird species (Ayalew, 2010; Bijan and Shimelis, 2011).
The population density and growth rate of the Lake Tana basin are very high. Over 2 million people reside in this basin, and the population density exceeds 150 people per square kilometer (Yimenu, 2005). The large population and high rate of population growth increase the demand for food. To meet this demand, large areas of forest, grassland, and wetland were transformed into cropland, and more livestock was raised on grassland. Deforestation and overgrazing have resulted in the destruction of large amounts of natural vegetation, a decline in biodiversity and forest stand density, desertification, and soil erosion (Alelign et al., 2007). To protect the natural environment and maintain biodiversity, it is vital that vegetation is restored and conserved in the Lake Tana basin (Bishaw, 2001). Since the 1990s, efforts have been undertaken to conserve and restore the natural vegetation of the Lake Tana basin (Bishaw, 2001; Teketay, 2001). However, its degradation and decline are still a major problem (IFAD, 2007b).
Detailed regional vegetation distribution data are the basis of vegetation management and conservation. Rational and scientific planning of vegetation conservation and restoration can only be conducted for the whole basin when the vegetation of the whole basin is well surveyed and mapped. However, the existing vegetation maps that include the Lake Tana basin were made for Africa, East Africa, and Ethiopia at small scales, such as the vegetation map of Eritrea, Ethiopia, and Somalia at a scale of 1 : 5 000 000 (Pichi Sermolli, 1957), that of Ethiopia and Eritrea (von Breitenbach, 1963), that of Africa at a scale of 1 : 5 000 000 (White, 1983), that of the Horn of Africa (Friis, 1992), that of Ethiopia (Sebsebe et al., 1996, 2004; Sebsebe and Friis, 2009), and the potential vegetation map of Ethiopia at a scale of 1 : 2 000 000 (Friis et al., 2011). The vegetation maps compiled by Pichi Sermolli (1957), von Breitenbach (1963), White (1983), and Friis (1992) were published many years ago at small scales; therefore, they cannot provide detailed information on the actual vegetation of the Lake Tana basin. The potential vegetation map compiled by Friis et al. (2011) also cannot reflect the actual status of the vegetation of the Lake Tana basin. Another map that could present the vegetation of the Lake Tana basin is the land cover/use map developed by Shimelis et al. (2008), at a scale of approximately 1 : 1 700 000. However, only large patches of vegetation were mapped, and many patches were merged or omitted. There is thus a shortage of detailed vegetation data for the Lake Tana basin, which limits the effectiveness of planning for vegetation management and biodiversity conservation. In this research, we therefore produced a vegetation map of the Lake Tana basin using high-spatial-resolution satellite images provided by Google Earth and field survey data. We believe that this map will aid vegetation and biodiversity conservation in the Lake Tana basin.

Study area
Lake Tana is located in the highlands of northwestern Ethiopia (Fig. 1). The average altitude of Lake Tana is approximately 1800 m, and the area of the basin (including water surface area) is 15 096 km². The water surface area is 3000-3600 km² and the maximum water depth is 14 m. Gilgel Abay, Ribb, Gumera, and Megech are the most important rivers feeding into Lake Tana and contribute over 90 % of the total inflow.
The zonal vegetation of the Lake Tana basin is dry evergreen Afromontane forest. However, only small patches of remnant forest currently exist due to heavy deforestation. The biodiversity of the Lake Tana basin is rich, and many endemic plant species grow in this catchment. There are large areas of wetlands in this basin, which are the home of many endemic birds.

Data sources
High-spatial-resolution satellite images provided by Google Earth and vegetation survey data were used to map vegetation. Field vegetation surveys were performed in 2015 and 2016, during which 156 vegetation plots were investigated (Fig. 1).
Prior to conducting the vegetation surveys in the Lake Tana basin, we selected survey sites using Google Earth. These sites were located within large vegetation patches with a uniform appearance. When we reached a selected site, we placed the plot in the central area of the vegetation patch, at least 30 m away from the boundary. We recorded the name, coverage, and height of each species in the vegetation plot (1 m × 1 m for herbaceous, 5 m × 5 m for shrub, and 20 m × 20 m for arboreal plants). Three Ethiopian geobotanists participated in these vegetation surveys and were responsible for species identification. They also helped us to assess the suitability of the sites selected for the vegetation survey.
In addition to the Google Earth images and surveyed vegetation plots, the Atlas of the Potential Vegetation of Ethiopia compiled by Friis et al. (2011) was an important reference in this research.
Earth Syst. Sci. Data, 10, 2033-2041, 2018, www.earth-syst-sci-data.net/10/2033/2018/

Vegetation classification system
Shimelis et al. (2008) classified the vegetation and land cover of the Lake Tana basin into 13 types: forest-mixed, forest-evergreen, forest-deciduous, range-bush, pasture, range-grasses, wetland-mixed, plantation, barley, teff, maize, urban, and water areas. Based on this vegetation classification system and suggestions from the Ethiopian geobotanists, the vegetation of the Lake Tana basin was categorized into seven groups: natural forest, woodland, plantation forest, bushland, grassland, wetland, and cultivated land. Three types of non-vegetation cover, i.e., water body, village, and urban, were also mapped. Sub-types of these vegetation groups exist for variations in dominant species; however, we did not differentiate these sub-types owing to the limitations of the spatial resolution of the satellite images.

Interpretation marks
Fifty-two vegetation plots were randomly selected to establish interpretation marks. The coordinates of the vegetation plots were recorded and then transformed into KML files, which could be read by the Google Earth software. These KML files were opened in Google Earth and used to establish interpretation marks according to the color and texture characteristics of the vegetation in the satellite images (Fig. 2). The keys to image interpretation are as follows:
- Natural forest. Crowns are dense, usually tightly packed, and overlapping in clusters. The texture is coarse and the color is green or dark green. This type is mostly located around churches or near rivers.
- Woodland. This vegetation cover appears as large-crowned trees. The color is green or yellow-green and the texture is coarse. The canopy may be tightly packed or open with visible patches of understory.
- Plantation forest. Uniformly spaced dense trees are almost the same height. The color is dark green and the texture is coarse. The crowns are tightly packed with an almost uniform texture. The patch tends to be rectangular with straight rows.
- Bushland. The signature feature of this type is a coarse texture with mottled tones. Shrubs are unevenly spaced and tend to clump, presenting a mottled pattern in the area. There may be a mixture of scattered trees in the bushland.
- Grassland. This type is almost smooth in appearance. The color is light green, beige, or light brown. Cow trails may be visible.
- Wetland. Coarse texture, dense, and dark green with irregular edges near pools, ponds, rivers, or lakes. Scattered shrubs and trees may exist.
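The plot-selection and KML-export step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used: the function names `split_plots` and `plots_to_kml`, the fixed random seed, and the example coordinates are all hypothetical, and only the plot counts (52 of 156) come from the text.

```python
import random
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # standard KML 2.2 namespace


def split_plots(plot_ids, n_marks=52, seed=0):
    """Randomly split plot IDs into an interpretation-mark set and a validation set.

    The seed is an assumption added here for reproducibility.
    """
    plot_ids = list(plot_ids)
    rng = random.Random(seed)
    marks = set(rng.sample(plot_ids, n_marks))
    validation = [p for p in plot_ids if p not in marks]
    return sorted(marks), validation


def plots_to_kml(plots):
    """Return a KML document (as text) with one Placemark per plot.

    plots: iterable of (name, lon, lat) in WGS84 decimal degrees.
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in plots:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML stores coordinates in lon,lat,alt order.
        coords = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
        coords.text = f"{lon:.6f},{lat:.6f},0"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical usage: 156 surveyed plots, 52 used for interpretation marks.
marks, validation = split_plots(range(1, 157))
kml_text = plots_to_kml([("plot-001", 37.30, 12.00)])  # made-up coordinates
```

The resulting text can be saved with a `.kml` extension and opened in Google Earth; the plots left in `validation` correspond to those reserved for the accuracy assessment.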

Method of vectorization
Visual interpretation was employed to identify vegetation in Google Earth based on the established interpretation marks. The "Add polygon" tool was used to vectorize the vegetation patches at a scale of approximately 1 : 5000. Three people participated in the vectorization of vegetation patches. To ensure that the identification criteria were consistent, vegetation/land identification was conducted by only one person.
The vegetation identification process received beneficial guidance from Ethiopian researchers who are familiar with the vegetation of the Lake Tana basin. We worked with Ethiopian geobotanists in two ways. The first was face-to-face collaboration. From 27 October to 7 December 2015, during the vegetation survey period in the Lake Tana basin, we worked with Ethiopian geobotanists to identify vegetation. From 10 to 25 October 2016, we invited Ethiopian geobotanists to China to make revisions to our vegetation map. From 16 December 2016 to 2 January 2017, we collected vegetation plots in the Lake Tana basin with Ethiopian geobotanists to validate and revise the vegetation map. The second was consultation via email, which we used when we were uncertain about the results of vegetation identification.
Our collaborators from Ethiopia are geobotanists who are very familiar with the Lake Tana basin. They greatly contributed to the identification of vegetation, and their professional knowledge helped to ensure the quality of this vegetation map.
The vectorization and identification process continued for over one and a half years, and 33 171 polygons were generated to represent the vegetation patches of the Lake Tana basin on the map. The other 104 surveyed plots were used to assess the accuracy of vegetation identification. The result of this assessment is presented in Table 1.
The KML files of all vegetation types were imported into the Global Mapper software (v16.0) and then transformed into SHP files, which could be read by ArcGIS (v9.3, ESRI). In ArcGIS, the vegetation type of each polygon was marked in an attributes table and all SHP files were merged into one. Finally, a vegetation map was designed and exported for printing on A1-sized paper (approximate scale of 1 : 310 000) (Fig. 3).
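For readers without access to Global Mapper or ArcGIS, the same idea (read per-type KML polygons, attach the vegetation type as an attribute, and merge everything into one layer) can be sketched with the Python standard library. This is a stand-in for illustration only and not the workflow used to produce the dataset: it writes a GeoJSON-style structure rather than SHP files, and the function names and the `veg_type` attribute name are assumptions.

```python
import json
import xml.etree.ElementTree as ET

NS = {"kml": "http://www.opengis.net/kml/2.2"}
RING_PATH = ".//kml:Polygon/kml:outerBoundaryIs/kml:LinearRing/kml:coordinates"


def kml_polygons(kml_text, vegetation_type):
    """Extract the outer ring of each Placemark polygon as a GeoJSON-style feature."""
    root = ET.fromstring(kml_text)
    features = []
    for placemark in root.iterfind(".//kml:Placemark", NS):
        coords = placemark.find(RING_PATH, NS)
        if coords is None or not coords.text:
            continue  # skip placemarks that are not polygons
        ring = []
        for token in coords.text.split():
            lon, lat = token.split(",")[:2]  # drop the optional altitude
            ring.append([float(lon), float(lat)])
        features.append({
            "type": "Feature",
            "properties": {"veg_type": vegetation_type},  # hypothetical attribute name
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return features


def merge_layers(layers):
    """Merge {vegetation type: KML text} into a single feature collection."""
    merged = []
    for vegetation_type, kml_text in layers.items():
        merged.extend(kml_polygons(kml_text, vegetation_type))
    return {"type": "FeatureCollection", "features": merged}


def to_geojson(layers):
    """Serialize the merged layers as GeoJSON text."""
    return json.dumps(merge_layers(layers))
```

Inner rings (holes) and multi-geometries are ignored in this sketch; a production conversion would need to handle them.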

Accuracy of vegetation identification
Table 1 presents the accuracy matrix of vegetation identification, which was calculated based on the surveyed plots. Plantation forests were all correctly identified. The identification accuracy of grassland was also very high, with only a few plots identified as cultivated land. The identification accuracy of bushland, natural forest, and wetland is lower than that of grassland and plantation forest but higher than that of woodland.
Other land cover types, such as water bodies, villages, and urban areas, are easily identified; therefore, we did not perform accuracy assessment for these three land cover types.
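As an illustration of how a row-normalized accuracy matrix of the kind shown in Table 1 can be derived from validation plots, consider the following minimal sketch. The function names are hypothetical, and the example uses made-up plot counts (20 bushland plots) chosen only so that the percentages reproduce the 85 % and 15 % quoted in the Table 1 caption; the actual per-type plot counts are not given in the text.

```python
from collections import Counter, defaultdict


def accuracy_matrix(pairs):
    """Row-normalized accuracy matrix in percent.

    pairs: iterable of (surveyed_type, mapped_type), one pair per validation plot.
    Returns {surveyed_type: {mapped_type: percent of that type's plots}}.
    """
    counts = defaultdict(Counter)
    for surveyed, mapped in pairs:
        counts[surveyed][mapped] += 1
    matrix = {}
    for surveyed, row in counts.items():
        total = sum(row.values())
        matrix[surveyed] = {mapped: 100.0 * n / total for mapped, n in row.items()}
    return matrix


def overall_accuracy(pairs):
    """Percentage of validation plots whose mapped type equals the surveyed type."""
    pairs = list(pairs)
    correct = sum(1 for surveyed, mapped in pairs if surveyed == mapped)
    return 100.0 * correct / len(pairs)


# Hypothetical example: 20 bushland plots, 17 mapped correctly, 3 mapped as woodland.
example = [("bushland", "bushland")] * 17 + [("bushland", "woodland")] * 3
matrix = accuracy_matrix(example)
# matrix["bushland"] -> {"bushland": 85.0, "woodland": 15.0}
```

Each row sums to 100 %, so the diagonal entry is the share of surveyed plots of that type that were correctly identified on the map.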

Natural forest
Two types of natural forest exist in this basin: dry evergreen Afromontane and riverine forest (Friis et al., 2011). The altitude at which dry evergreen Afromontane forests occur ranges from 1500 to 2700 m. The mean annual temperature and rainfall are 14-25 °C and 700-1100 mm (Friis, 1992). The wide ranges of altitude and rainfall result in complex habitats and species compositions. The characteristic tree-layer species are Podocarpus falcatus and Juniperus procera, and the dominant understory species are Croton macrostachyus, Ficus spp., Olea europaea subsp. cuspidata, Trema orientalis, and Maesa lanceolata.
Owing to the continuous expansion of cropland in the past, the natural forest was gradually destroyed. Small patches of remnant forests can be found in two main forms in this region: protected state and church forests.

Woodland
Acacia-Commiphora woodlands usually occupy dry slopes at altitudes of 1000-1900 m (ANRS, 2004). Such habitats are characterized by large variations in soil and topography and diverse biotic and ecological elements. Most of the plant species in Acacia-Commiphora woodland have small deciduous or leathery evergreen leaves.
There is a large variation in the stand density of Acacia-Commiphora woodlands, and such woodlands were observed in three different formations: dense forest with closed canopies, scattered individuals, and wooded grassland. Acacia-Commiphora woodlands are also known for containing some Acacia, Boswellia, and Commiphora species, which can be used to produce gum and resin.

Plantation forest
Eucalyptus species are the main species of plantation forests. Cupressus lusitanica and pine species were also planted in some areas. In addition, Acacia mearnsii was found in the southern area of the Lake Tana basin.
There are approximately 600 Eucalyptus species worldwide, and over 120 of these are found in Ethiopia (Alemayehu, 2017). Eucalyptus globulus and Eucalyptus camaldulensis are the most common and widely planted species in Ethiopia. E. globulus is usually planted in areas above 2200 m in altitude, and E. camaldulensis is planted in regions with an altitude of 1700-2400 m.
The development of Eucalyptus plantations has been widely criticized because they suppress the growth of indigenous species and use large amounts of groundwater. However, the plantation area of Eucalyptus forest has increased rapidly in the past 15 years (Birru et al., 2003).

Grassland
Grasslands are mainly distributed along rivers, around villages, on mountains and hilltops, on slopes, and on highlands with stony and shallow soils. Common grassland species are Eragrostis spp., Pennisetum spp., Panicum spp., Echinochloa spp., Setaria spp., Hyparrhenia spp., Cymbopogon spp., and Sorghum spp. Scattered shrubs are present on grassland, such as Senna spp. and Maytenus senegalensis.

Wetland
Wetlands have rich biodiversity and provide diverse ecological functions. The lake and its tributaries are the home of 28 fish species, 15 of which are endemic to Ethiopia. Over 300 species of birds have been observed and recorded in the Lake Tana basin, which has been defined as an international bird site by BirdLife International (BLI) (Shimelis, 2013).

Cultivated land
Teff, sorghum, chickpea, rice, maize, and sesame are widely planted in the Lake Tana basin. These crops are often mixed with endemic or exotic tree species, such as Croton macrostachyus, several Acacia species, Albizia gummifera, Cordia africana, Juniperus procera, Grevillea robusta, and Sesbania sesban, forming a complex agroforestry system.

Water body
Lake Tana is the largest water body in this watershed. The total area of Lake Tana is 3080.8 km², which constitutes 98.98 % of the total water surface area.

Village
Many of the villages in the Lake Tana basin are very small. These small villages are sparsely distributed throughout the landscape. It is difficult to vectorize all village patches; therefore, only large villages were identified and vectorized in this research.

Urban
There are two large cities in the Lake Tana basin: Gonder and Bahir Dar. The total urban area is 69.04 km², occupying 0.46 % of the Lake Tana basin.
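The area shares quoted for the water body and urban classes can be cross-checked against the basin area given in the study area description. The short sketch below is only a consistency check on the reported figures; the variable names are hypothetical, and the total water surface area it derives is implied by the reported percentage rather than stated in the text.

```python
# Figures reported in the text (km² unless noted).
BASIN_AREA = 15096.0          # total basin area, including water surface
URBAN_AREA = 69.04            # total urban area
LAKE_TANA_AREA = 3080.8       # mapped area of Lake Tana
LAKE_SHARE_OF_WATER = 98.98   # percent of total water surface that is Lake Tana

# Urban share of the basin, in percent.
urban_share = 100.0 * URBAN_AREA / BASIN_AREA
print(f"urban share of basin: {urban_share:.2f} %")  # prints 0.46 %

# Total water surface area implied by the reported lake share (derived, not reported).
implied_water_area = LAKE_TANA_AREA / (LAKE_SHARE_OF_WATER / 100.0)
print(f"implied total water surface: {implied_water_area:.1f} km2")
```

The urban share reproduces the 0.46 % given above, and the implied total water surface falls within the 3000-3600 km² range quoted for the lake's water surface.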

Discussions
Vegetation/land classification system
IGBP DISCover (Belward, 1996) and the Land Cover Classification System (Di Gregorio and Jansen, 2000) are more detailed than the system adopted in this research. However, these two systems require more information to classify different land cover types. For example, canopy cover and plant height are required to differentiate closed from open shrublands. The differentiation of woody savannas, savannas, and grasslands also depends on the canopy cover and plant height of upper-layer vegetation. In addition, satellite images acquired during different seasons are required to differentiate "evergreen forest" from "deciduous forest". However, it is difficult to collect such information, and it is not easy to identify these vegetation covers based only on Google Earth images.
After over 50 years of deforestation and land reclamation, most needle leaf forests in the Lake Tana basin have been destroyed for the timber trade. Many researchers consider that there is no typical "savanna" in the Lake Tana basin based on the IGBP system, and they prefer to use "grassland" in the land/vegetation classification (Shimelis et al., 2008; Aster and Seleshi, 2009; Wubneh and Amare, 2017).
Therefore, in this research, we merged different forest types (evergreen needle leaf, evergreen broadleaf, deciduous needle leaf, deciduous broadleaf, and mixed forest) into natural forest. We also merged closed and open shrublands into bushland, and woody savannas, savannas, and grasslands were merged into grassland.
Woodland was separated from the natural forest category. Owing to the altitude at which woodland exists, its species composition and community physiognomy are quite different from those of natural forest (dry evergreen Afromontane and riverine forests) (Friis et al., 2011).
Plantation forest (Eucalyptus) is a very important forest type in the Lake Tana basin as it plays a vital role in the development of forestry and agriculture. The Ethiopian ecologists strongly suggested that we differentiate Eucalyptus from other forest types.
Finally, the vegetation of the Lake Tana basin was categorized into seven groups: natural forest, woodland, plantation forest, bushland, grassland, wetland, and cultivated land. Three types of non-vegetation cover, i.e., water body, village, and urban, were also mapped in this research.
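The class merging described in this section can be written down as a simple lookup table, which may be convenient when comparing this map with IGBP-style products. The sketch below is illustrative only: the dictionary and function names are hypothetical, the class labels follow the wording used in the text rather than any official IGBP legend file, and classes that the text does not describe as merged (woodland, plantation forest, wetland, cultivated land) are deliberately left without an IGBP source.

```python
# Seven vegetation groups and three non-vegetation covers used on the map.
VEGETATION_GROUPS = (
    "natural forest", "woodland", "plantation forest",
    "bushland", "grassland", "wetland", "cultivated land",
)
NON_VEGETATION_COVERS = ("water body", "village", "urban")

# Merges described in the text: IGBP-style class -> map class.
IGBP_TO_MAP_CLASS = {
    "evergreen needle leaf forest": "natural forest",
    "evergreen broadleaf forest": "natural forest",
    "deciduous needle leaf forest": "natural forest",
    "deciduous broadleaf forest": "natural forest",
    "mixed forest": "natural forest",
    "closed shrublands": "bushland",
    "open shrublands": "bushland",
    "woody savannas": "grassland",
    "savannas": "grassland",
    "grasslands": "grassland",
}


def to_map_class(igbp_label):
    """Return the map class for an IGBP-style label (case-insensitive).

    Raises KeyError for labels whose merge is not described in the text.
    """
    return IGBP_TO_MAP_CLASS[igbp_label.strip().lower()]
```

For example, `to_map_class("Open shrublands")` returns `"bushland"`, while a label such as "croplands" raises `KeyError` because no such merge is stated here.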

Accuracy of vegetation identification
The validation using the surveyed plots indicated that the identification accuracy exceeded 85 % for most vegetation/land cover types, except woodland. Table 1 shows that misclassifications occurred most often between bushland, woodland, and natural forest. Bushlands are usually composed of low and sparse shrubs. However, in some areas, shrubs can grow to be tall and dense, which makes it difficult to differentiate bushlands from woodlands and natural forests. Woodlands mainly consist of arboreal species and are usually regarded as sparse "forest". However, it is not easy to distinguish between natural forest and dense woodland using remote sensing images.
Misclassification also occurred between grassland, wetland, and cultivated land. The color and texture of grassland and wetland (especially seasonal wetlands) are similar during the dry season. Therefore, wetlands can easily be identified as grasslands. The color and texture of abandoned cultivated land are very similar to those of grassland. Therefore, some grassland plots were identified as cultivated land.
The number and distribution of sampling sites significantly influence the validation of remote sensing data products (Darvishzadeh et al., 2011). In this research, the vegetation plots used to validate the result of vegetation identification were collected along a cement road, which caused biases and uncertainties in the validation.

Flaws in vectorization
The polygons of vegetation patches were generated through manual delineation. The quality of the vector data was greatly influenced by the technicians who conducted the vectorization. Although a training course was held to unify the vectorization criteria, some flaws still occurred during vectorization. We found that the boundaries of some polygons were not delineated strictly along the border of vegetation patches, which negatively affected the quality of this dataset.
C. Song et al.: Vegetation map of the Lake Tana basin
Another issue is that there were gaps between polygons caused by the vectorization approach adopted in this research. The polygons of vegetation patches were delineated by the "Add polygon" tool of Google Earth. If two patches with different vegetation covers are connected or they are very close, then a gap will be created between the two vegetation polygons.
We did not interpret patches of cultivated land because it is difficult to determine which crops are planted among them using Google Earth images. Another reason is that our main objective in this research was to map the natural vegetation of the Lake Tana basin.

Potential uses of this dataset
This vegetation map provides detailed data on the spatial distribution of vegetation in the Lake Tana basin. It could be used to aid local governments in producing development plans for forestry, agriculture, and stockbreeding in the Lake Tana basin. This vegetation map could also be used in the conservation of natural resources as it can help managers to determine the conservation targets for the Lake Tana basin. Moreover, this vegetation map could be used as basic data for studying changes in land use, restoration ecology, landscape ecology, ecological modeling, and hydrological modeling.

Figure 1. Location of the Lake Tana basin, survey route, and plots.

Figure 2. Interpretation marks based on the Google Earth images.

Figure 3. Vegetation map of the Lake Tana basin, Ethiopia.

Table 1. Accuracy of vegetation identification (%). Numbers in the table represent the percentage of plots of one vegetation type that were classified into each category. Taking bushland as an example, "85" means that 85 % of bushland plots were correctly identified as bushland and "15" means that 15 % of bushland plots were identified as woodland.