Glacial lake inventory of high-mountain Asia in 1990 and 2018 derived from Landsat images

There is currently no glacial lake inventory data set for the entire high-mountain Asia (HMA) area. The definition and classification of glacial lakes remain controversial, presenting certain obstacles to extensive utilization of glacial lake inventory data. This study integrated glacier inventory data and 668 Landsat TM, ETM+, and OLI images and adopted manual visual interpretation to extract glacial lake boundaries within a 10 km buffer from glacier extent using ArcGIS and ENVI software, normalized difference water index maps, and Google Earth images. The theoretical and methodological basis for all processing steps including glacial lake definition and classification, lake boundary delineation, and uncertainty assessment is discussed comprehensively in the paper. Moreover, detailed information regarding the coding, location, perimeter and area, area error, type, time phase, source image information, and subregions of the located lakes is presented. It was established that 27 205 and 30 121 glacial lakes (size 0.0054–6.46 km2) in HMA covered a combined area of 1806.47± 2.11 and 2080.12± 2.28 km2 in 1990 and 2018, respectively. The data set is now available from the National Special Environment and Function of Observation and Research Stations Shared Service Platform (China): https://doi.org/10.12072/casnw.064.2019.db (Wang et al., 2019a).


Introduction
Against the background of climate warming and the consequent widespread mass loss of glaciers in alpine regions, increasing volumes of glacial meltwater are being released. This results in glacial lake expansion and extended areas of low-lying terrain (e.g. depressions and troughs) left behind by retreating glaciers in which water can accumulate and new glacial lakes can form (Clague and Evans, 2000;Mool et al., 2001;Song et al., 2016). As both a water resource and a source of flash flood and debris flow hazards, glacial lakes participate in several natural processes, e.g. re-gional energy and water cycles (Slemmons et al., 2013); act as both indicators and containers of environmental information (Wang et al., , 2019bZhang et al., 2019); and drive hillslope erosion and landscape evolution (Cook et al., 2018) in the alpine cryosphere. On the one hand, glacial lakes act as temporary storage for the meltwater resource because a considerable amount of meltwater is retained by glacial lake expansion; e.g. approximately 0.2 % a −1 of the total glacial meltwater was reserved in glacial lakes from 1990 to 2010 in the Tien Shan in central Asia (Wang et al., 2013). On the other hand, given the worldwide expansion in lake area in recent decades, the potential will increase for glacial lakes to develop into glacial lake outburst floods and related debris flows that could threaten downstream residents, infrastructure, and regional ecological and environmental security (Huggel et al., 2002;ICMOD, 2011;Bolch et al., 2012;Haeberli et al., 2016). Thus, glacial lakes perform important roles both in the meltwater cycle and in glacier hazard evolution in the cryosphere.
Following the rapid development of remote sensing technology and computer science, remote sensing imagery acquired by various satellites and sensors has been used widely in glacial lake research. In particular, Landsat imagery has become the most important data source for the dynamic investigation of glacial lakes because of its wide coverage, continuous and long-term temporal sequence, and accessibility. Based on remote sensing data, both the distribution and the characteristics of change in glacial lakes in the mountains and watersheds in the high-mountain Asia (HMA) region have been widely reported (Supplement Table S1). For example, multi-source remote sensing imagery has been used to compile glacial lake inventories for regions of the Tibetan Plateau , the Tien Shan (Wang et al., 2013), the Himalaya (Gardelle et al., 2011;Nie et al., 2017), the Hengduan Mountains (Wang et al., 2017), Uzbekistan (Petrov et al., 2017), Pakistan (Senese et al., 2018), and HMA, excluding the Altai and Sayan (Chen et al., 2020). These inventories have proved an important data resource both for revealing the spatio-temporal characteristics of glacial lakes and for understanding the response of glacial lakes to the effects of climate change in these regions.
Automatic and semi-automatic glacial lake boundary vectorization approaches have been used most widely in regional glacial lake investigations because of their higher efficiency and objectivity in comparison with manual visual vectorization. In such research, water bodies are usually determined based on the characteristics of different remote sensing bands and computer-dependent algorithms, e.g. the normalized difference water index (NDWI), band ratio, support vector machine, decision tree, spectral transformation, object-oriented classification, global-local iterative scheme, active contour model, and random forest (Gardelle et al., 2011;Huggel et al., 2002;Li et al., 2011;Veh et al., 2018;Zhang et al., 2018). However, manual post-processing is often required to calibrate the uncertainties that could easily be produced by the above approaches. Furthermore, the labour costs associated with rectification of lake boundary errors increase sharply with increasing complexity of study area terrain (Yang et al., 2019). With consideration of the accuracy, efficiency, and time overheads associated with the various vectorization approaches, a manual vectorization approach was adopted for investigation of the glacial lakes on the Tibetan Plateau  despite the labour requirements and the anticipated additional errors produced by individual subjectivity (Nie et al., 2017;Yang et al., 2019;Song et al., 2014).
Controversies and knowledge gaps remain regarding available glacial lake inventories for different alpine cryosphere regions, which present certain obstacles to extensive utilization of glacial lake inventory data. The main problems relate to regional differences in lake development, inconsistent specifications of lake definition, and the adoption of various approaches regarding lake interpretation (Yao et al., 2018). There is no existing comprehensive glacial lake inventory for the entire HMA, and knowledge regarding the spatiotemporal characteristics of glacial lakes in this region remains incomplete. The objectives of this study were to fill this knowledge gap by producing a glacial lake inventory data set for HMA derived from Landsat images and to provide fundamental data for water resource evaluation, assessment of glacial lake outburst floods, and glacier hydrology research in the mountain cryosphere region.

Study area
The HMA area mainly comprises the Tibetan Plateau and surrounding alpine ranges. The area is divided into 13 subregions in version 6 of the Randolph Glacier Inventory (RGI 6.0), i.e. the Himalayan area (western Himalaya, central Himalaya, and eastern Himalaya), the Hengduan Mountains, southern and eastern Tibet, Inner Tibet, the Karakoram and western Kunlun, the Qilian Shan and eastern Kunlun, the Hindu Kush, the Pamir, the Hissar-Alay and western Tien Shan, the eastern Tien Shan, and the Altai and Sayan (Arendt et al., 2015;Pfeffer et al., 2014). The boundaries of the 13 subregions and outlines of the glaciers in HMA derived from RGI 6.0 are shown in Fig. 1. This region (26-54 • N, 67-104 • E) is characterized by tremendously complex topographic conditions with a widespread distribution of mountain glaciers. According to the Climatic Research Unit Time Series v4.02 data set (http://data.ceda.ac.uk/badc/cru/data/ cru_ts/cru_ts_4.02/, last access: 5 September 2020), the air temperature of the different subregions in HMA increased at an average annual rate of 0.002-0.054 • a −1 during 1990-2018 ( Fig. 1). The annual rate of change in precipitation in HMA during 1990-2018 varied from −9.9 to 4.2 mm a −1 with a small average rate of increase of 0.3 mm a −1 .
The HMA area has the largest surviving glaciers of any region other than the polar regions. As reported in RGI 6.0, there were 97 973 modern glaciers in our study area, covering a total area of approximately 98 768.86 km 2 . Together, these glaciers produced an average negative mass balance of −150 ± 110 kg m −2 a −1 (Hock et al., 2019), which was the primary source of water supply for the development of glacial lakes. Over recent decades, glaciers in most areas of HMA appear to have experienced widespread mass wastage and area shrinkage (Bolch et al., 2012;Yao et al., 2012;Kääb et al., 2012;Brun et al., 2017). However, the so-called "Karakoram Anomaly" refers to a region that is a prominent exception, which is characterized by glaciers with stable or positive mass balance (Hewitt, 2005;Gardner et al., 2013;Kääb et al., 2015).  , and buffer area within 10 km of glacier extent for glacial lake inventory of high-mountain Asia.

Data source
We developed our glacial lake inventory of HMA based on 668 high-quality images selected from more than 1800 Landsat images with a 30 m spatial resolution derived from the websites of the United States Geological Survey (https://www.usgs.gov/, last access: 5 September 2020) and Geospatial Data Cloud (http://www.gscloud.cn/, last access: 5 September 2020). To ensure the accuracy of glacial lake boundary extraction, the following criteria were applied to imagery selection. First, the cloud coverage in an image had to be < 10 %. Second, for areas with no eligible or only lowquality imagery (because of snow or shadows) in the given year, acceptable images from years closest to the given year were chosen as replacements (Fig. 2). Third, images acquired in summer or autumn (June-November), when lake areas were believed to be near to or at their maximal extent, were set as optimal choices to minimize the impact produced by seasonal area changes in the glacial lakes (Fig. 2). Based on the above criteria, 394 and 274 Landsat images were selected to represent circa 1990 and circa 2018, respectively, which completely covered the buffer area within 10 km of glacier extent acquired from the second Chinese glacier inventory (http://westdc.westgis.ac.cn, last access: 5 September 2020) and RGI 6.0 (https://www.glims.org/RGI/rgi60_dl.html, last access: 5 September 2020). Among the selected images, those acquired during summer and autumn (June-November) accounted for 82.0 % of the total number of selected images, while those acquired during autumn (September-November) accounted for 56.9 % of the total number. In addition, a Shuttle Radar Topography Mission digital elevation model with a spatial resolution of 1 arcsec (http://imagico.de/map/ demsearch.php, last access: 5 September 2020) was used to derive the elevation of the glacial lakes.

Outline of workflow
The methods and workflow adopted in this study to produce the glacial lake inventory mainly included collation of knowledge and formulation of the specifications of the glacial lake inventory, data preprocessing, manual vectorization of glacial lakes, interactive checking and error controlling, and attribute database assignment (Fig. 3).
1. Collation of available knowledge regarding glacial lake inventories. As much literature as possible relevant to the investigation and recording of glacial lakes was collected. The various definitions and classifications of glacial lakes, as well as the methods adopted previously for glacial lake boundary extraction and assessment of the extent of glacial lake distributions, were summarized, and normative rules were formulated for the HMA glacial lake inventory, as explained further in Sect. 4.2.
2. Formulation of the specifications of lake identification. First, a working group of four leading experts in the field was founded in 2014 to discuss and formulate the specifications of the glacial lake inventory. Current knowledge regarding identification of lakes from Landsat imagery (e.g. pixel colour, lake shape, and lake background features) and specifications of vectorization (e.g. viewing scale of 1 : 10 000 on a computer screen vectorization of mixed pixels) were discussed, and unified operating criteria were compiled to guide the glacial lake inventory operatives. Novice vectorization operatives were trained until their vectorization results met the pre-specifications of the inventory.
3. Preprocessing of remote sensing data. Preprocessing of the Landsat imagery included false-colour compositing and calculation of NDWI maps. The false-colour composite images were based on combinations of the operating bands of 7, 5, and 2 or 4, 3, and 2 for Landsat TM and ETM+ images and 5, 4, and 3 for Landsat OLI images. The preliminary lake extent was extracted automatically from each image over the entire HMA area using the NDWI based on the near-infrared band (NIR) and green band (GREEN), which represent the minimum and maximum water reflectance, respectively (McFeeters, 1996;Zhai et al., 2015;Li et al., 2016;Zhang et al., 2018): where B i is the spectral band of Landsat imagery. The NDWI maps were calculated for each selected Landsat image using different region-specific thresholds. Liu et al. (2016) and Du et al. (2014) suggested that it might be preferable to set the optimized threshold of NDWI GREEN/NIR of Landsat OLI images to −0.05. By considering the edge effects according to the mixed pixels, this study initially selected a lower optimal threshold (approx. −0.1) for specific images to obtain the maximum water body. Then, higher thresholds were tested for visual water extraction before a suitable threshold (which varied in the range of −0.10 to 0.20) was selected for a given image to obtain the NDWI map. When manual vectorization was performed on a falsecolour composite image, the NDWI maps of potential glacial lakes were overlaid to assist in glacial lake identification.
4. Manual vectorization and entering of attribute data. The inventory work was performed during 2014-2019. Seven groups were formed to conduct lake boundary vectorization of the 13 HMA subregions. After vectorization of a glacial lake, it was required that manual attribute items (e.g. data source and lake type) be input concurrently.

5.
Interactive checking and accuracy control. First, glacial lakes were discerned via human-computer interaction; i.e. potential glacial lakes were revealed by the NDWI maps or identified visually from the false-colour composite images. Second, glacial lake boundary vectorization results were checked interactively by another vectorization operative to eliminate misclassified areas of shadow and ice and to add areas of glacial lakes evidently omitted in the boundary extraction process. This checking process also minimized the subjective judgement errors of the operatives. Third, attribute items such as glacial lake classification, new or disappeared lakes, and separated or coalesced lakes were checked interactively. In this process, Google Earth imagery was used as an important auxiliary reference data source for error examination.

Definition of glacial lakes
The definition of a glacial lake determines the type of cryosphere water body that will be recorded as a glacial lake. There are multiple definitions of a glacial lake based on different perspectives (Mool et al., 2001;Yao et al., 2018). When glacial lake inventories are undertaken, most emphasize the elementary role of glaciation in the formation of glacial lakes (Clague and Evans, 2000;Mool et al., 2001). The important difference is whether the period of glaciation or the supply source of glacial lakes is given greatest attention. Some studies that have focused on the former have proposed that a glacial lake is a natural water body formed by alpine glacier movement since the Last Glacial Maximum, i.e. distinguishing between ancient or modern glaciers (Liu et al., 1988;Costa and Schuster, 1988). However, other studies have emphasized the relation of glacial lakes to meltwater in glaciated areas (Wang et al., 2013Zhang et al., 2015). The glacial lake inventory data compiled in this study are intended for use both in water source evaluation and in assessment of environmental change in the alpine cryosphere. Thus, lakes related to glaciers or to glaciation in the alpine cryosphere were all recorded as glacial lakes. Most Quaternary glaciers have disappeared, and the remaining relics are incomplete, which makes it difficult to recover a continuous and complete glaciation range in alpine regions. Thus, it is of great importance to determine the range of glaciation in an alpine region when conducting a glacial lake inventory based on remote sensing data. The most practical approach might be to specify an indicator threshold to define the glaciation extent according to relevant findings of existing glacier relics in a typical region. On the one hand, the glaciation frontier can usually be indicated by a specified lowest-elevation threshold, which is generally closely related to the regional climatic context caused by the elevation effect. However, the lowest-elevation threshold might vary enormously with respect to different regions because regional climatic settings differ. For instance, the lowest elevations of 1700 m in Austria (Buckel et al., 2018), 2000 m in Pakistan (Senese et al., 2018), 3000 m in Nepal and Bhutan (Mool et al., 2001), and 3500 m in Peru (Hanshaw and Bookhagen, 2014) were used as specified elevation thresholds to record glacial lakes. On the other hand, defining glaciation extent within a specific distance from modern glacier terminals could be more suitable for the establishment of a glacial lake inventory in relatively large-scale regions with a complex regional climate, because the differing climate within large-scale regions can be indicated approximately by the lowest elevation of individual glacier terminals. Some studies have adopted distances of 2, 3, or 10 km from modern glacier terminals as thresholds with which to define areas of glacial lakes (Petrov et al., 2017;Veh et al., 2018;Wang et al., 2012Wang et al., , 2013. Distances of 2, 5, 10, and 20 km were considered by Zhang et al. (2015). They found that a distance of 10 km from a modern glacier terminal might be a reasonable guide to glaciation extent and a threshold suitable for a glacial lake inventory of the Tibetan Plateau. This was supported by the finding that the most distant glacierized boundary of the Little Ice Age was up to 10 km from the modern glaciers in the Himalaya area (Wang et al., 2012;Nie et al., 2017). Additionally, to record glacial lakes more precisely, combined distance and elevation thresholds have been used simultaneously to define areas of glacial lakes in special small regions; e.g. lakes at elevations above 1500 m and within 2 km of modern glaciers were recorded as glacial lakes in Uzbekistan (Petrov et al., 2017). In this study, given the large scale of the HMA region with its complex climatic context and extremely varied terrain, the data set compiled included glacial lakes within a buffer zone of 10 km from modern glacier extent, which covered an area of approximately 1.25×10 6 km 2 according to the second glacier inventory of China and RGI 6.0 ( Fig. 1).

Classification of glacial lakes
In glaciation regions, the characteristics of glacial lakes, which include the phase of lake formation, lake basin topography, dam material constituents, geometrical relationship with modern glaciers, and source of water supply (or combinations thereof), have been employed as the basis for glacial lake classification systems (Huggel et al., 2002;Liu et al., 1988;Mool et al., 2001;Yao et al., 2018). For instance, based on lake basin topography, lakes in an inventory of the Hindu Kush-Himalaya region were classified as erosion lakes, valley trough lakes, cirque lakes, blocked lakes, lateral moraineand end moraine-dammed lakes, and supraglacial lakes (Liu et al., 1988;Mool et al., 2001). Recently, Yao et al. (2018) presented a complete classification schema for glacial lake inventory and study of glacial lake hazards that included six classes and eight subclasses based mainly on the mechanism of glacial lake formation, lake basin topography, and the geometrical relationship with modern glaciers.
Generally, it is a little difficult to distinguish glacial lake type in terms of material properties, topographic features, and phase of lake formation using remote sensing imagery. Moreover, most of the standards mentioned above were found inapplicable in previous studies of glacial lake classification in large-scale regions such as HMA because of the lack of enough remote sensing data with a satisfactory spatial resolution. In this study, the hydrologic relationship between glacial lakes and modern glaciers was adopted as a classification criterion because the present data set is intended to provide fundamental data for water resource evaluation and glacier hazard assessment. Consequently, glacial lakes were divided into just two types: glacier-fed lakes and non-glacier-fed lakes. The glacier-fed lakes were further divided into three subclasses: supraglacial lakes (lakes developed on glacier surface), ice-contacted lakes (lakes contacting the glacier terminal or margin), and ice-uncontacted lakes (lakes not contacting the glacier but fed directly by glacial meltwater). This classification was based on whether the surface hydrological flow of the modern glacier and topographic features of the lake basin allowed for a lake to receive meltwater from the modern glacier. To achieve reliable classification results, glacial lakes were distinguished with the assistance of 3D digital terrain imagery from Google Earth, a Shuttle Radar Topography Mission digital elevation model, and glacier outlines from RGI 6.0. Based on visual inspection of the satellite images and with reference to 3D digital terrain imagery from Google Earth, we recorded a glacierfed lake when (1) a lake had lower elevation than the modern glacier (mother glacier) and (2) the meltwater of the mother glacier(s) could flow into the lake via surface channels. It is common for glacial lakes to be fed by meltwater through subsurface channels; however, we ignored this because it is difficult to survey the subsurface channels of glacial lakes using remote sensing data. In addition, lake type was distinguished based on the topographic features of the lake basin and the modern glaciers; in most cases, it is possible that the lakes were fed by meltwater flowing through both subsurface and surface channels.

Extraction of lake boundary
This study adopted automatic glacial lake extraction and manual glacial lake boundary vectorization to determine glacial lake boundaries. In the NDWI-based automatic lake boundary extraction approach, two bands were selected to facilitate a ratio calculation to maximize the difference between water and non-water objects in the remote sensing imagery based on a given threshold. The given threshold was determined subjectively with consideration of how much detailed information of the lake water bodies was captured precisely. The given threshold was varied to account for various factors such as the differences in Landsat sensors (i.e. TM, ETM+, and OLI), time phase of images, quality of images, and complexity of surface features. To achieve the optimal threshold for lake water body recognition, the candidate threshold was debugged iteratively for each image. In practice, because the area of the glacial lakes was usually small (see next paragraph) and the spectral features of the lake water bodies were varied, the threshold had to be set to allow for the capture of the greatest number of water body pixels, which consequently resulted in simultaneous acquisition of more non-lake-water-body noise information. It also resulted in more effort in the subsequent manual modification to reduce noise information using methods such as algorithms to eliminate mountain shadows (Gardelle et al., 2011).
Manual visual vectorization distinguishes lake boundaries by identifying the unique texture, colour, and other charac-teristics of glacial lakes in false-colour composite images based on available professional knowledge and accumulated experience in vectorization operations. Even though it was regarded a time-consuming and labour-intensive process, it was also considered an attractive approach because of its consistency, high level of quality control, and reasonably simple operational procedure, given the varied quality of Landsat images available for the large-scale HMA region. In this study, the manual visual vectorization process was generally found more suitable in terms of effort and precision for generating a glacial lake inventory data set of the HMA region in comparison with automatic glacial lake extraction. Therefore, manual visual vectorization in conjunction with NDWI maps was the main method adopted to extract glacial lake boundaries to minimize the deficiencies produced by individual subjectivity of the operatives.
The minimum number of pixels used to extract a glacial lake water body was found to be inconsistent in the available literature. For example, arbitrary threshold areas of 0.0027 km 2 (3 lake water body pixels; Zhang et al., 2015) and 0.0081 km 2 (9 lake water body pixels; Nie et al., 2017) were used in earlier glacial lake investigations. Moreover, minimum threshold areas of 0.01 km 2 (approximately 10 lake water body pixels), 0.02 km 2 (approximately 22 lake water body pixels), and 0.1 km 2 (approximately 111 lake water body pixels) have also been set to evaluate the level of risk of glacial lake outburst floods in the Himalaya and Tien Shan (Petrov et al., 2017;Wang et al., 2013;Worni et al., 2013;Bolch et al., 2011;Allen et al., 2019). Theoretically, 1 pure pixel of a lake water body could be recorded as a glacial lake. However, a glacial lake is generally not represented by 1 pure pixel unless it is aligned perfectly with the raster grid; usually, it would be surrounded partly or fully by 1-8 mixed lake water body pixels (Fig. 4a, b). Consequently, manual delineation was required for approximately one-half, one-eighth, or seven-eighths of the peripheral mixed pixels surrounding pure lake water body pixels (Fig. 4d, e). If 3 or 4 pure lake water body pixels exist in a Landsat image, the maximum number of peripheral mixed pixels is 12 (Fig. 4d, e). Usually, for 3 pure lake water body pixels, the ratio of the area of pure lake water body pixels to the area of peripheral mixed pixels can be expressed as follows: Area of peripheral mixed pixels Area of 3 pure lake water body pixels × 100 % = 6 × 1 2 + 5 × 1 8 + 1 × 7 8 3 × 100 % = 150 %. (2) For 4 pure lake water body pixels, the ratio of the area of pure lake water body pixels to the area of peripheral mixed pixels is Area of peripheral mixed pixels Area of 4 pure lake water body pixels × 100 % = 8 × 1 2 + 4 × 1 8 4 × 100 % = 112.5 %.
(3) Figure 4. Sketches showing the relationships of pure water body pixels and surrounding potential mixed water body pixels: (a) a pure water body pixel surrounded by 4 potential mixed water body pixels, (b) a pure water body pixel surrounded by 8 potential mixed water body pixels, (c) 2 pure water body pixels surrounded by 10 potential mixed water body pixels, (d) 3 pure water body pixels with 12 surrounding potential mixed water body pixels, and (e) 4 pure water body pixels with 12 surrounding potential mixed water body pixels.
Thus, in this study, the minimum glacial lake area recorded was set at 0.0054 km 2 (e.g. 3-4 pure lake water body pixels with approximately 12 peripheral mixed pixels, which equate to approximately 6 full lake water body pixels) because a lake area covering less than 3 pure lake water pixels could possibly have an error of > 100 % (Fig. 4b, c) despite the revised coefficient of 1 standard deviation (0.6872) involved (see Sect. 5).

Input of attribute items
Eight attribute items were input into the HMA glacial lake inventory: lake coding, location (longitude, latitude, and elevation), perimeter, area, type, area error, time phase, source image information, and subregion of located lake. (1) We encoded each glacial lake based on its central location using the same coding format as used by the National Snow and Ice Data Center to encode glaciers. The code can be expressed as "GLmmmmmmEnnnnnN", where m and n represent the results of the longitude and latitude of each glacial lake centroid multiplied by 1000, respectively; GL is the abbreviation of glacial lake; and E and N represent eastings and northings, respectively.
(2) The location information of each glacial lake was labelled as the geographic coordinates of the centroid of the shape of each glacial lake, calculated using ArcMap software. The lake elevation was defined as the average elevation of a buffer zone of a 30 m radius centred on the glacial lake centroid, which was derived from the Shuttle Radar Topography Mission digital elevation model. (3) The area and perimeter of each lake were calculated using Ar-cMap based on the unified geography coordinate system of WGS 1984 and the Asia North Albers equal-area conic projection system, respectively, to avoid errors caused by projection deformation. (4) The error of lake area was calculated using Eqs. (4) and (5) (Sect. 5). (5) Lake type, which was input manually, was defined as supraglacial lake, icecontacted lake, ice-uncontacted lake, or non-glacier-fed lake (see Sect. 4.2.2). (6) Lake time phase was the acquisition date of the original Landsat image, which was recorded as the time phase for each lake. (7) Source image information referred to the image number of the Landsat images used to extract the glacial lake boundary. (8) The subregion to which each lake belonged identified the regional location within the HMA area. Each lake was assigned based on .shp file data of the boundaries of the 13 HMA subregions, obtained from the National Snow and Ice Data Centre, using the ArcMap spatial analysis tool.

Error assessment
The errors associated with glacial lake extraction from remote sensing imagery using manual visual delineation are generally related to components of the quality of the images (e.g. spatio-temporal resolution, cloud coverage, and mountain shadows), experience, operative subjectivity, and the threshold area of the inventory (Gardelle et al., 2011;Hall et al., 2003;Paul et al., 2004;Salerno et al., 2012;Zhang et al., 2015). It has been reported that the area error in glacial lake boundary extraction based on remote sensing images can be approximately ±0.5 pixels depending on the quality of the imagery (Fujita et al., 2009;Salerno et al., 2012). Furthermore, the area error in glacial lake delineation attributable to manual delineation can be assumed to follow a Gaussian distribution (Hanshaw and Bookhagen, 2014). Hence, the theoretical maximum area error in glacial lake boundary extraction is the half area of the edge pixels because pure lake water body pixels are usually surrounded by mixed pixels (Fig. 4). The lake area error of a single glacial lake within 1 standard deviation (1σ ) can be expressed as follows (Hanshaw and Bookhagen, 2014): where P is the perimeter of the glacial lake (m), G is the spatial resolution of the remote sensing imagery (30 m in this data set), 0.6872 is the revised coefficient under 1σ (i.e. approximately 69 % of peripheral pixels are subjected to errors), E is the relative error of the glacial lake, and A is the total area of the glacial lake. Then the accumulation of errors in all the lake areas for the entire study region or subregions can be calculated using the following formula based on error propagation theory: where E T is the area error of the entire study region or subregions, i is the lake of no. i in the entire study region or subregions, and a is the error area of a single lake. The resulting calculated error indicated that the total absolute area error in HMA glacial lakes was approximately ±2.11 and ±2.28 km 2 and the average relative error was ±13.5 % and ±13.2 % in 1990 and 2018, respectively. The relative area errors of each lake varied from 1 % to 79 %, and a significant power exponential relationship was found between the relative area error and the sizes of the glacial lakes (E = 0.050A −0.45 , R 2 = 0.96, α < 0.001; Fig. 5a). Smallsized lakes (i.e. area ≤ 0.01 km 2 , which accounted for 2 % of the total lake area in HMA) had the largest average relative area error of 44.6 % (Fig. 5b). Medium-sized lakes (i.e. area of 0.01-0.1 km 2 , which accounted for 34 % of the total lake area in HMA) had an average relative area error of 22.0 % (Fig. 5c). Large-sized lakes (i.e. area ≥ 0.1 km 2 , which accounted for 64 % of the total lake area in HMA) had the smallest average relative area error of 7.6 % (Fig. 5d). In summary, smaller glacial lakes in the HMA region had larger relative area errors and vice versa.
To further verify the accuracy of the manual delineation of glacial lake boundaries, nine lakes located within the HMA region were surveyed using a portable GPS device (Trimble GeoXH6000) with decimetre accuracy during July-August 2018 (Fig. 6). The lakes selected for field survey covered areas of 0.01-2.97 km 2 . The field-based lake boundaries were compared with those obtained via manual delin-eation (i.e. derived from Landsat OLI imagery acquired during 2018). It was found that the area error (i.e. the percentage difference in the absolute area encircled by the manually delineated lake boundary and that derived by the GPS survey) varied from 5.5 % to 16.3 %. Moreover, it was determined that the average horizontal distance deviation between the two types of boundary varied from 3.2 to 15.3 m (Table 1). Overall, the horizontal deviations were largely confined to 1 pixel, and the average accuracy of the delineation of glacial lake boundaries was within ±0.5 pixels (±15 m).

Distribution and changes in HMA glacial lakes
As indicated by the achieved HMA glacial lake inventory, 30 121 (2080.12 ± 2.28 km 2 ) glacial lakes were identified in 2018 and their distribution had considerable spatial heterogeneity (Fig. 7). The greatest concentration of glacial lakes was in the Altai and Sayan (335.42 ± 0.88 km 2 , which accounted for 16.1 % of the total area of glacial lakes) and the eastern Himalaya (310.37 ± 0.89 km 2 , which accounted for 14.9 % of the total area of glacial lakes). Relatively few glacial lakes were found distributed in the eastern Kunlun and Qilian Shan (38.85 ± 0.29 km 2 , which accounted for 1.9 % of the total area of glacial lakes) and the eastern Tien Shan (40.55 ± 0.32 km 2 , which accounted for 2.0 % of the total area of glacial lakes). The HMA glacial lakes were located within the elevation range of 1357-6247 m in 2018. An approximate normal distribution was presented both for the lakes of the entire HMA region and for the lakes in most subregions. More than 43 % of the HMA lake area has survived within the vertical range of 4500-5400 m, with the peak lake area of 241.89 ± 0.80 km 2 (accounting for 11.6 % of the total area) in the range of 5100-5300 m. The elevation band of Table 1. Horizontal deviations between lake boundaries obtained by manual delineation and field survey using a portable GPS device (Trimble GeoXH6000).

Name
Lake Lake size Horizontal deviations of Area error (labelled in Fig. 6   rate of change of glacial lakes in the different 200 m elevation bands presented a large average trend against elevation, rising as a whole during 1990-2018 (Fig. 7). The lake area expanded most from the elevation of approximately 5000 m, and the rate of expansion reached approximately 28 % at 5700-5900 m in the entire HMA region, although it differed between different subregions. Lake area showed a notable rate of increase with elevation in most subregions, e.g. the Hissar-Alay and western Tien Shan, Hindu Kush, eastern Himalaya, Hengduan Mountains, eastern Tien Shan, and Altai and Sayan. The rate of expansion varied markedly and no observable trends in the rate of increase or decrease with elevation were discovered in Karakoram and western Kunlun, western Himalaya, and Inner Tibet. The rate of expansion in the central Himalaya and southern and eastern Tibet was found to have seemingly decreased with increasing elevation (Fig. 7).

Comparison and limitations
There are at least 34 published reports or data sets on the regional extent of glacial lakes in the HMA area, which are based on various lake boundary extraction methods and dif-ferent data sources (Supplement Table S1). This previous research work examined glacial lakes from as early as 1962 up until 2017. However, it is difficult to evaluate any discrepancies comprehensively because different extents of glacial lake distribution were examined and inconsistent thresholds of minimum lake area were used. Glacial lake inventory data of the Third Pole region in 1990  and of the HMA (Chen et al., 2020) in 2017 were used for comparison because both recorded glacial lakes in the same buffer zone (i.e. within 10 km of the modern glacier extent) and over similar periods. For the comparison, the same thresholds and regions were adopted for the inventory data. Marked discrepancies were found to exist between the different data sets in terms of both the number and the area of the glacial lakes. In 1990, only 4601 glacial lakes (≥ 0.0054 km 2 ) with a total area of 554.33 km 2 were recorded by Zhang et al. (2015), whereas 20 410 glacial lakes with a total area of 1376.23 km 2 were catalogued in the Third Pole region in this study. In 2017, 14 477 glacial lakes with a total area of 1635.94 km 2 were recorded by Chen et al. (2020), whereas we recorded 22 727 glacial lakes (≥ 0.0081 km 2 ) with a total area of 1726.41 km 2 in 2018 in HMA (excluding Altai and Sayan). We consider the discrepancies attributable to three primary factors. (1) The buffer zone within 10 km of the modern glacier extent was inconsistent between the data sets because different glacier inventories were used. (2) Different operatives catalogued the glacial lakes using different remote sensing data covering different periods. (3) Many glacial lakes were possibly missed because of the comparatively less manual vectorization effort involved in the work of  and Chen et al. (2020). Overall, our glacial lake inventory catalogued glacial lakes throughout the entire HMA more comprehensively and with more careful error assessment when compared with available glacial lake data sets from regional or river-basin-based studies.
Several limitations deserve proper consideration when using the glacial lake inventory data. First, a degree of uncertainty resulted from using Landsat image data that covered different periods, i.e. both interannually and seasonally. Although images acquired in summer or autumn (June-November) were set as optimal choices, the selected images covered most seasons of the year; e.g. the images selected in June-November accounted for only 72.3 % and 88.8 % of the total number in 2018 and 1990, respectively. Interannually, images were selected from a span of 10 a (1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995) and 4 a (2016-2019) to obtain sufficient high-quality images of the HMA area. Second, this study recorded all lakes located within the 10 km buffer area of glacier extent as glacial lakes. Therefore, certain lakes that have no relation to glaciers or to glaciation (i.e. non-glacial lakes) in the alpine cryosphere were potentially catalogued in error because of the difficulty in distinguishing non-glacial lakes from glacial lakes based on remote sensing data. Third, we identified water bodies related to glaciers or to glaciation in the alpine cryosphere as glacial lakes. However, in many cases, it was difficult to determine whether such bodies should be recorded as glacial lakes, e.g. cases of long narrow water bodies on rivers and cases where the number of pure water body pixels was small. Thus, some errors and inconsistences were inevitable because of having different operatives performing the lake boundary vectorization and inspection. In future, this glacial lake inventory will be updated and shared on the National Special Environment and Function of Observation and Research Stations Shared Service Platform (China).

Data availability
The data set developed in this study comprises two .shp file documents containing the glacial lake inventory of the HMA region in 1990 and 2018. The data set can now be accessed via the website of the National Special Environment and Function of Observation and Research Stations Shared Service Platform (China): https://doi.org/10.12072/casnw.064.2019.db (Wang et al., 2019a).

Conclusions
A glacial lake inventory of the HMA region was realized based on satellite remote sensing data and GIS techniques. Eight attribute items were recorded in the glacial lake inventory data set of the HMA region. Lake area error was assessed carefully with respect to theoretical analysis of lake boundary pixels and actual boundaries derived by GPS fieldbased surveys. On average, the deviations between the delineation of lake boundaries derived using the two methods were within ±0.5 pixels (±15 m). The relative area errors of each lake in 2018 varied from 1 % to 79 %, and the average relative area errors of ±13.2 % in the entire HMA region were characterized by an increase in the relative area error with decreasing lake size.
Overall, 30 121 glacial lakes with a total area of 2080.12± 2.28 km 2 were catalogued in 2018 in the HMA region. Glacial lakes survived in all 13 subregions of HMA from the elevation of 1357 to 6247 m. Glacial lakes were found to be concentrated in the subregions of Altai and Sayan and the eastern Himalaya and at elevation bands of 4500-5400 m. The HMA glacial lakes have experienced widespread expansion with an average rate of increase in area of 15.2 %. Lake area expanded most in the higher-elevation bands during 1990-2018. The data set is expected to provide basic data to support cryosphere hydrology research, water resources utilization and management, and assessment of glacier-related hazards in the HMA region.
Author contributions. XW designed the study and wrote the manuscript. XG, CY, and QL produced data and performed analysis. All other authors discussed and drafted the formulation of the specifications of the glacial lake inventory in this study. All authors contributed to the final form of the manuscript.
Competing interests. The authors declare that they have no conflict of interest.
Financial support. This research has been supported by the National Natural Science Foundation of China (grant nos. 41771075, 41571061, and 41271091).
Review statement. This paper was edited by Ge Peng and reviewed by two anonymous referees.