Annual 30-meter Dataset for Glacial Lakes in High Mountain Asia from 2008 to 2017

Atmospheric warming is intensifying glacier melting and lake development in High Mountain Asia (HMA), which could increase glacial lake outburst flood hazards and impact water resource and hydroelectric power management. There is a pressing need for obtaining the comprehensive knowledge of the distribution, area of glacial lakes, and also quantification of variability in their size and type at high resolution in HMA. Here, we developed a HMA Glacial Lake Inventory (Hi-MAG) 20 database to characterize the annual coverage of glacial lakes from 2008 to 2017 at 30 m resolution using Landsat satellite imagery. It can be observed that glacial lakes exhibited total area increases of 90.14 km between 2008-2017, a +6.90% change relative to 2008 (1305.59±213.99 km). Annual increase in lake number and area are 306 glacial lakes and 12 km, respectively, and maximum increased lake number occurred in 5400 m elevation, which increased by 249. Proglacial lake dominated areas such as the Nyainqentanglha and Central Himalaya, where around more than half of the glacial lake area 25 (summed over 1°×1° grid) consisted of proglacial lakes showed obvious lake area expansion, while in the regions of Eastern Tibetan Mountains and Hengduan Shan, the unconnected glacial lakes occupied about over half of the total lake area in each grid, exhibited stability or slight reduction. Our results demonstrate proglacial lakes are a main contributor to recent lake evolution in HMA, accounting for 62.87% (56.67 km) of the total area increase. Proglacial lakes in the Himalaya ranges alone accounted for 36.27% (32.70 km) of the total area increase. Regional geographic variability of debris cover, together 30 with trends in warming and precipitation over the past few decades, largely explain the current distribution of supraand proglacial lake area across HMA. The Hi-MAG database is available at: https://doi.org/10.5281/zenodo.4059181 (Chen et al., 2020), it can be used for studies on the complex interactions between glacier, climate and glacial lake, glacial lake outburst floods, potential downstream risks and water resources.


Introduction
High Mountain Asia (HMA), consisting of the whole Tibetan Plateau and adjacent mountain ranges such as the Himalayas, Karakoram, and Pamirs, contains the largest area of mountainous glaciers in the world. Atmospheric warming has resulted in widespread glacier retreat and downwasting in many mountain ranges of HMA (Bolch et al., 2012;Brun et al., 2017), which favors the formation and development of a large number of glacial lakes. However, glacial lakes have been incompletely documented over small time intervals. Glacial-lake development varies according to climatic, cryospheric, and lake-specific conditions, including whether the basin geometry is connected to glaciers and the length of lake and glacier contact (Zhao et al., 2018).
There have been many previously published studies devoted to mapping glacial lakes using remote-sensing data over different regions of HMA. Some works have focused on investigating the development of relatively large glacial lakes. Rounce et al. (2017) identified 131 glacial lakes in Nepal in 2015 that had an area greater than 0.1 km 2 . Li et al. (2020) compiled an inventory of glacial lakes (≥ 0.01 km 2 ) with a spatial resolution of 30 m in the Karakoram mountains. Aggarwal et al. (2017) shared a new dataset of glacial and high-altitude lakes having an area > 0.01 km 2 for Sikkim, eastern Himalaya, in the period 1972-2015. Ukita et al. (2011) constructed a glacial-lake inventory of Bhutan in the Himalayas for the period 2006-2010 based on the high-resolution Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM) and Advanced Visible and Near Infrared Radiometer type 2 (AVNIR-2) data from the Advanced Land Observing Satellite (ALOS). Considering that small lakes represent less of a glacial-lake outburst flood (GLOF) risk, they set 0.01 km 2 as the minimum lake size. Ashraf et al. (2012) used Landsat-7 Enhanced Thematic Mapper Plus (ETM+) images for the 2000-2001 period to delineate glacial lakes greater than 0.02 km 2 in the Hindukush-Karakoram-Himalaya region of Pakistan.
Because small glacial lakes are highly variable in their shape, location, and occurrence and are clearly sensitive to the warming climate and glacier wastage, a growing number of scholars have been paying attention to their abundance. Salerno et al. (2012) provided a complete mapping of glacial lakes (including lake size less than 0.001 km 2 ) and debris-covered glaciers with a 10 m spatial resolution in the Mount Everest region in 2008. Wang et al. (2013) utilized Landsat Thematic Mapper (TM) and ETM+ images for the years 1990, 2000, and 2010 to map glacial lakes with areas greater than 0.002 km 2 in the Tien Shan mountains. Luo et al. (2020) examined glacial-lake changes (lake area > 0.0036 km 2 ) for the entire western Nyainqêntanglha range for five periods between 1976 and 2018 using multitemporal Landsat images. The International Centre for Integrated Mountain Development provided comprehensive information about the glacial lakes (greater than or equal to 0.003 km 2 ) of five major river basins of the Hindu Kush Himalaya using Landsat images from 2005(Sudan et al., 2018. Nie et al. (2017) mapped the distribution of glacial lakes across the entire Himalaya in 2015 using 348 Landsat images at 30 m resolution. They set the minimum mapping unit to 0.0081 km 2 . Zhang et al. (2015) presented a database of glacial lakes larger than 0.003 km 2 in the Third Pole for the years 1990, 2000, and 2010.
All of these studies significantly help to fill the data gap relating to information about glacial lakes in the HMA region. At the global scale, Pekel et al. (2016) used millions of Landsat satellite images to record global surface water over the past 32 years at 30 m resolution, and many large and visible glacial lakes were also included. More recently, Shugar et al. (2020) mapped glacial lakes with areas > 0.05 km 2 around the world using 254 795 satellite images from 1990 to 2018. X.  developed an inventory of glacial lakes with areas greater than 0.0054 km 2 across HMA at two time points (1990 and 2018) using manual mapping with 30 m Landsat images. They were the first to introduce a glacial-lake inventory at such a large scale, and the data shared will serve as a baseline for further studies related to water resource assessment and glacier hazards.
In summary, a homogeneous, annually resolved inventory and analysis of the spatial and temporal extent of different types of glacial lakes over the entire HMA region are still lacking. In this study, we developed an HMA glacial-lake inventory (Hi-MAG) database to characterize the annual coverage of glacial lakes from 2008 to 2017 at 30 m resolution. A total of 40 481 Landsat scenes were processed using the Google Earth Engine (GEE) cloud-computing platform to delineate glacial lakes (located within 10 km of the nearest glacier terminus) larger than 9 (e.g., 3 × 3) pixels (0.0081 km 2 ) (Nie et al., 2017).
Lakes were manually classified into four categories according to their position relative to the parent glacier or their formation mechanisms (Fig. A1). Category (i) constitutes proglacial lakes, which are usually connected to the glacier tongue and dammed by glacier ice or unconsolidated or icecemented moraines (a mixture of ice, snow, rock, debris, clay, etc.). Proglacial lakes are located next to the glacier terminus and receive meltwater directly from their mother glaciers. Category (ii) constitutes supraglacial lakes, which are ponds that form in depressions on low-sloping parts of the surface of a melting glacier and are dammed by ice or the end moraine or stagnating glacier snout. Category (iii) constitutes unconnected glacial lakes, which are not currently directly connected to their parent glaciers, but they may to some extent be fed by at least one of the glaciers located in the basin. They may (but not necessarily) have recently detached from ice contact due to glacial recession. Although not directly connected with the parent glaciers, these glacial lakes are also an outcome of glacier melting in response to atmospheric warming. They can supply fresh water to major river systems of the HMA region, and their changes have F. Chen et al.: Annual 30 m dataset for glacial lakes in High Mountain Asia 743 significant scientific and socioeconomic implications (Nie et al., 2017;. Finally, category (iv) constitutes ice-marginal lakes, which are generally distributed on one side of the glacier tongue, meaning that the lake is dammed by the glacier ice on this side, while on the other side, it is bounded by a lateral moraine. With the increase in atmospheric warming and accelerated melting of glaciers, some glacier tributaries gradually detach from a main trunk glacier. These detachment locations, where glacier melting has been particularly intense, are in some cases also likely to form icemarginal lakes. We note that such ice-marginal lakes are very common in some parts of the world (e.g., Alaska) but are not common in HMA (Armstrong and Anderson, 2020;Capps et al., 2011). Additionally, purely glacier-dammed lakes are formed by the advance of glaciers and dammed by almost pure glacier ice. Although the dam composition and structure are slightly different between proglacial lakes and glacierdammed lakes because they are all located in the front of the glacier tongue and driven by the mother glacier, in the process of appending attributed information to each glacial lake, glacier-dammed lakes were merged into the proglacial-lake category.
Every lake was cross-checked manually for its boundary and attribution. We defined an uncertainty of 1 pixel for the detected glacial-lake boundaries and calculated the error in the lake area for the whole HMA region. We also assessed the inventory for climatic and geomorphological influences on lake distribution across HMA.

Study area
The HMA region refers to a broad high-altitude region in South and Central Asia that covers the whole Tibetan Plateau and adjacent mountain ranges, including the eastern Hindu Kush, western Himalaya, eastern Himalaya, central Himalaya, Karakoram, western Pamir, Pamir-Alay, northern and western Tien Shan, Dzhungarsky Alatau, western Kunlun Shan, Nyainqêntanglha, Gangdise Mountains, Hengduan Shan, Tibetan interior mountains, Tanggula Shan, eastern Tibetan mountains, Qilian Shan, eastern Kunlun Shan, Altun Shan, eastern Tien Shan, central Tien Shan, and eastern Pamir (Figs. 1 and 6a). It extends from 26 to 45 • N and from 67 to 105 • E, and the altitude of the plateau is about 4500 m on average (Baumann et al., 2009). It is made up of alternating mountains, valleys, and rivers, and the terrain is fragmented, showing a decreasing terrain from northwest to southeast. The HMA region has a series of east-west mountain ranges that occupy most of the area. Among these, Tanggula Shan lies in the central part of HMA, with an altitude of over 6000 m. The heights of the 15 highest mountains in the Himalayas are greater than 8000 m, while the peaks of the mountains in the northern plateau are greater than 6500 m. The north-south mountain ranges are mainly distributed in the southeast of the plateau and near the Hengduan Mountains area. These two groups of mountains constitute the geomorphic framework and control the basic pattern of the plateau landform. Continuous and discontinuous permafrost has developed on the higher land and north-facing slopes.
The HMA region is the source of several of Asia's major rivers, including the Yellow, Yangtze, Indus, Ganges, Brahmaputra, Irrawaddy, Salween, and Mekong. They play a crucial role in downstream hydrology and water availability in Asia (Immerzeel and Bierkens, 2010). Most glaciers in the Tibetan Plateau are retreating, except for the western Kunlun (Neckel et al., 2014;Kääb et al., 2015) and the Karakoram, where a slight mass gain is occurring (Bolch et al., 2012;Gardner et al., 2013). Moreover, glaciers in different mountain ranges show contrasting patterns. Local factors (e.g., exposure, topography, and debris coverage) may partly account for these differences, but the spatial and temporal heterogeneity of both the climate and degree of climate change may be the main reason. Glacial lakes are formed and develop temporally with the retreat or thinning of glaciers and are directly or indirectly fed by glacier meltwater. They are located within 10 km of the nearest glacier terminus (Wang et al., 2013;Zhang et al., 2015).
The HMA climate is under the combined and competing influences of the East Asian and South Asian monsoons and of the westerlies (Schiemann et al., 2009). This unique geographical position produces an azonal plateau climate characterized by strong solar radiation, low air temperatures, large daily temperature variations, and small differences between annual mean temperatures (Yao et al., 2012). The annual mean temperature is 1.6 • C, with the lowest temperature of −1 to −7 • C occurring in January and the highest temperature of 7 to 15 • C occurring in July. The cumulative annual precipitation is about 413.6 mm.

Dataset
A total of 40 481 satellite images, including Landsat 5 TM imagery during 2008-2011, Landsat 7 ETM+ imagery in 2012, and Landsat 8 Operational Land Imager (OLI) during the period 2013-2017, were available in GEE and were used to produce the annual glacial-lake maps over the entire HMA (Fig. 2). Here, when Landsat 5 or 8 data were available, Landsat 7 ETM+ imagery with Scan Line Corrector (SLC)-off gaps was generally excluded due to the artifacts induced by the slatted appearance of the original images, but these were exclusively used for the glacial-lake mapping in 2012 since no other Landsat data were acquired that year. For the years before 2008, all the available Landsat 5 TM data in each year (e.g., 2004, 2005, 2006, and 2007) do not fully cover the HMA region.
The SLC-off condition of Landsat ETM+ introduces artifacts because the slatted appearance of the original images is occasionally carried into the glacial-lake map in 2012. Techniques to fill the SLC-off gaps exist, but these create artificial values that will result in false detections of water (Chen et al., 2011). Considering the strong spatial and temporal variability in glacial lakes such as supraglacial lakes, techniques that merge data from one or more SLC-off fill scenes for generation of a gap-free image require careful use, even when using the thousands of Landsat ETM+ images. It is noted that water mapping using time series images at large scales usually avoids the use of such techniques (Mueller et al., 2016). Therefore, Landsat 7 ETM+ data with an intensive slatted appearance are not suitable for the classification of numerous glacial lakes. In this study, because the only useable data source for 2012 was Landsat 7 ETM+, to ensure continuity of annual data from 2008 to 2017, we applied our best efforts to manual extraction of the glacial lakes from the 2012 ETM+ images with the highest possible accuracy.

Satellite imagery selection strategy
One effective solution to reduce the influence of seasonal lake fluctuations on the mapping is to map glacial lakes and measure their long-term changes during stable seasons, when the lake extents are minimally affected by meteorological conditions and glacier runoff. Here, based on analyses of the mapping times of glacial lakes in different regions, the selected time series of Landsat data were generally from July to November. During this period of each year, the Landsat imagery featured lower perennial snow coverage. Following glacier runoff and precipitation, the area of a glacial lake is large, and changes in this area will be small (Nie et al., 2017;Zhang et al., 2015). These lakes may also reach their maximum extent around the end of the glacier ablation season (June to August) (Gardelle et al., 2013;Liu et al., 2014), except in the central and eastern Himalaya, where peak ablation extends into post-monsoon September and October. In monsoon-affected areas such as Nepal and Bhutan, monsoon cloud cover from July to mid-September means that clear-sky images can mostly only be obtained from late September to November. The southeastern Tibet region is problematic not only because the observation season is short but also as a result of abundant cloud cover, which is formed by the warm and humid airflow raised by the topog- raphy Umesh et al., 2018;Qiao et al., 2016).
As the most highly variable glacial lakes in the study area, supraglacial lakes change preferentially in the year, showing an increase in area during the pre-monsoon and rising to their peak area in the early monsoon (June to July) (Miles et al., 2017a, b). Although the selected image seasons are slightly different due to the meteorological conditions in different regions, they all comply with the same criterion that the lakes were in clear-sky images having little snow coverage. This ensured the initial reliability of the mapping of glacial lakes through the GEE cloud-computing platform. If no valid observations could be obtained, then the optimal mapping time needed to be broadened during the whole year.
To further increase data availability and also as the basis for data selection in the periods beyond the optimum mapping time, we set two criteria for the selection of imagery with valid observations over the potential glacial-lake area by using the cloud-score functions in GEE, including (i) cloud cover being less than 20 % in the 10 km buffer around each glacier outline of a Landsat scene or (ii) less than 20 % cloud cover for the entire scene. The cloud-score functions in GEE may have significant difficulty in detecting clouds in mountain headwaters with high snow and ice cover, where large amounts of snow and ice are likely to be identified as clouds. However, in this study, it was considered better to use much stricter criteria to filter out a larger number of images with lots of clouds or objects that look like clouds (snow or ice) to finally select only images with good observations.

Extraction of glacial-lake outlines
For the development of the Hi-MAG database, we applied a systematic glacial-lake detection method that comprised two steps: initial glacial-lake extraction and subsequent manual refinement of these lake-mapping results. The main procedures for glacial-lake mapping using Landsat data, as shown in Fig. 3, are as follows. (i) The Landsat top-of-atmosphere 746 F. Chen et al.: Annual 30 m dataset for glacial lakes in High Mountain Asia data were clipped according to the extent of the glacier buffers and assembled into a time series dataset. (ii) Poorquality observations were identified. These included areas affected by clouds, cloud shadow, topographic shadow, and SLC-off gaps. Here, we used the Fmask routine (Zhu and Woodcock, 2012) to detect the clouds and cloud shadows in the imagery. Fmask has the advantage of being able to process a large number of images in a computationally efficient way. Topographic shadows are located in the areas where the sunlight is blocked. Generally, on the dark side of high mountains, the surface gradients are great, and the terrain reliefs are small. Therefore, topographic shadows were masked using the slopes (larger than 10 • ) and shaded relief values (less than 0.25) calculated from Shuttle Radar Topography Mission (SRTM) data (Li and Sheng, 2012;Quincey et al., 2007). This removes a considerable number of mountain shadows that have a similar spectral reflectance as water bodies. However, the SRTM digital elevation model (DEM) was generated in 2000, which is different from the acquisition time of the Landsat images used for the glacial-lake mapping in this study. The derived slopes and shaded relief cannot therefore fully represent the conditions on the date a given Landsat scene was acquired. As a consequence, some lakes that have grown at steep glacier tongues may be masked, and some mountain shadows that interfere with the mapping results of glacial lakes from GEE still remain, leading to the fact that glacial lakes in steep areas are omitted, and residual shadows are misclassified as glacial lakes. As for the SLC-off gaps in the ETM+ images, lakes outside the gaps will be accurately classified, but if they are covered by gaps, then they will be misclassified. Errors caused by striped gaps in Landsat ETM+ data were manually corrected using additional high-quality scenes across the whole year with the assistance of images from adjacent years. (iii) The modified normalized difference water index (MNDWI) was calculated (Xu, 2006). (iv) The potential glacial-lake areas were extracted by applying an adaptive MNDWI threshold (Li and Sheng, 2012). The minimum number of water pixels used to define a glacial lake in an image is inconsistent in different studies. For example, Zhang et al. (2015) set the smallest detectable glacial lakes in the Third Pole as being larger than 0.0027 km 2 (3 connected pixels) using the Landsat TM and ETM+ data. Nie et al. (2017) selected 0.0081 km 2 (9 connected pixels) as the minimum mapping unit to map glacial lakes in the Himalayas. Other studies have set the minimum threshold areas as 0.001 km 2 (Salerno et al., 2012), 0.002 km 2 (Wang et al., 2013), 0.0036 km 2 (Luo et al., 2020), 0.0054 km 2 (X. , or 0.01 km 2 (Li et al., 2020). A smaller minimum mapping unit will detect more glacial lakes. However, the uncertainty this brings will also be larger than using a larger threshold at the same resolution (Salerno et al., 2012). Our results demonstrate that a lake area covering fewer than 9 water pixels will have an area error of greater than 50 % (see Sect. 4). Given the uncertainty in the areas of glacial lakes and the spatial resolution of Landsat data, in this study, glacial lakes larger than 9 pixels (≥ 0.0081 km 2 ) were considered to be the minimum mapping unit. (v) Manual inspection and refinement of individual glacial lakes were conducted, and the related attributions were added for each lake.
Based on the automated processing, nearly 60 % of glacial lakes in each year can be correctly classified. Of the other lakes that were not properly classified, 30 % were missed, and 10 % were misclassified. For such a large-scale area that is characterized by various and complex climatic, geological, and terrain conditions, this classification method is simple but effective. The results are also reasonable since they provide very low commission errors. To ensure the quality of the inventory, strict quality control was conducted to visually inspect and correct the mapping errors after the automated processing using GEE. False lake features, mainly identified as mountain shadows and river segments, were manually removed by overlapping mapped lake shorelines in the source Landsat imagery and higher-resolution imagery in Google Earth. Some glacial lakes may be covered by ice and clouds for years, grow at steep glacier tongues, or show heterogeneous reflectance with the surrounding backgrounds. For these missing glacial lakes, their boundaries were edited further using ArcGIS. Furthermore, a crosscheck and modification were conducted for each glacial lake based on the lake-mapping results in conjunction with multitemporal Landsat imagery. Here, all the Landsat imagery that was used for the inspection was downloaded manually from the United States Geological Survey (USGS) Earth Explorer website (https://earthexplorer.usgs.gov/, last access: 30 September 2020). The outputs for each lake polygon include information about the lake type, elevation, Euclidian distance to the nearest glacier terminus, area, and perimeter. Note that if there was more than one suitable satellite image in a year, the image with the lowest cloud cover was selected for the calculation of the area and perimeter of a given lake. Each mountain range was characterized individually by utilizing the mountain boundary shapefile in HMA (http://geo.uzh.ch/~tbolch/data/regions_hma_v03.zip, last access: 4 December 2020).

Yearly lake-area-change calculations
Based on the final generated lake inventory data, we used the slope of a linear regression of the lake area (over the grid cells of 1 • × 1 • ) versus mapping year to qualify the yearly changes in lake area during the study period. The change analysis used the Theil-Sen estimator, which chooses the median slope of all the derived fitted lines and can effectively represent long-term area changes due to its robustness for trend detection and its insensitivity to outliers. It is also useful for the elimination of effects arising from differences in sensor performance for the mapping of glacial lakes (Sen, 1968;Song et al., 2018).
Although all the lakes were manually checked and edited, due to the limitation of available images and other factors, the conditions for glacial-lake mapping were not perfectly consistent for each year. For example, the image dates were not consistent across the whole HMA region because of atmospheric disturbances, and there were also influences from varying lake characteristics, image quality (Bhardwaj et al., 2015;Thompson et al., 2012), ice, and shadows that obscured the lakes, which all contributed to detection errors in the lake extent and their annual variation. Generally, these errors were objective and acceptable as a result of the nature of the limited remote-sensing data. For this study, because we used time series data covering a period of 10 years for the estimation of annual changes in lake area and also because the errors only account for a small proportion of the total glacial-lake area for each year, the errors in the observed lake area caused by these different effects do not appear to affect the trends in the statistical results. In addition to the Theil-Sen estimator, a Mann-Kendall trend test was used to detect and further confirm the statistical confidence of the linear regression results, and all the estimated trends were found to fall within the 90 % confidence intervals. The upper and lower change estimates satisfying the 90 % confidence interval for the slope were also derived over the whole HMA region (Fig. A2).

Cross-validation and uncertainty estimates
Accuracy assessment of the mapping results is difficult due to the lack of field measurements of glacial lakes in continentalscale areas such as HMA. To obtain quality-controlled data, the glacial-lake vectors over the entire HMA for the years from 2008 to 2017 were rechecked and re-edited individually through dynamic cross-validation by 10 trained experts. This was a time-consuming process but was essential for maximizing the quality of the data.
A key factor influencing the estimation of the uncertainty in the glacial-lake area measurements is the spatial resolution of the satellite data. In this study, the uncertainty in the glacial-lake area was estimated as an error of ±1 pixel on either side of the delineated lake boundary. The percentage error in the area determinations, A er , is then proportional to the sensor resolution and is given by (Krumwiede et al., 2014) where n is the number of pixels on the boundary of a glacial lake, as approximated by the ratio of the perimeter length and the spatial resolution; m is the area of a pixel in the Landsat image (m 2 ); A gl is the lake area (m 2 ); and the factor 100 converts the value to a percentage.
Assuming an uncertainty of 1 pixel for the detected glacial-lake boundaries, we calculated the systematic errors for the whole HMA region, and the results are shown in Fig. 4. For the years between 2008 and 2017, the area uncertainty in each glacial lake generally ranged from 0.30 % to 50 %, with the mean value falling around 17 % and the standard deviation around 11 % (Fig. 4a). The maximum and mean values of area uncertainty for the glacial lakes in 2010 were the lowest, while for 2016, the corresponding statistics were the highest. This can be attributed to a number of different factors. The maximum in the area uncertainty in glacial lakes is related to the shape and size of a certain lake, as can be seen from Eq. (1). However, its mean value is equal to the sum of the area uncertainties in each glacial lake divided by the total number, which depends on the total number of glacial lakes in a given year as well as the shape and area of each lake. Furthermore, a close relationship can be found between the area uncertainties and sizes of the glacial lakes (Fig. 4b). Most of the large glacial lakes (area ≥ 0.04 km 2 ) have a mean area uncertainty of about 7 %. These systematic errors were more significant for the small-sized glacial lakes. We measured glacial lakes down to 0.0081 km 2 (9 pixels in Landsat imagery), where systematic errors calculated by Eq. (1) were ∼ 50 %.

Distribution of various types and sizes of glacial lakes
The area coverage of glacial lakes increased by 90.14 km 2 in the period 2008-2017, a 6.90 % increase relative to 2008 (1305.59 ± 213.99 km 2 ) (Fig. 5a). A Theil-Sen regression fit to all the data showed a mean expansion rate of 12 km 2 a −1 for the 10-year record, as shown in Fig. 5a. Meanwhile, the estimated changes in glacial-lake number from 2008 (12 593 lakes) to 2017 (15 348 lakes) showed an average increase of 306 lakes a −1 . The steeper percentage increase in the number  of lakes (22.33 %) compared to the slower expansion of their area (8.79 %) based on their linear-fit trends shows that many small glacial lakes formed over this period. The number of lakes increased most rapidly in areas beyond 4400 m above sea level (a.s.l.), especially beyond 5300 m (Fig. 5b). The increase in proglacial lakes was concentrated above 4900 m (Fig. 5c). Unconnected glacial lakes grew very slightly in total area below 4400 m (Fig. 5d) but increased notably more at higher elevations. Glaciers are retreating and thinning at ever-higher elevations (Nie et al., 2017), causing the formation of new supraglacial lakes at high elevations, the expansion of existing ice-contact lakes, and the detachment of glaciers from some lakes. Annual changes in glacial lakes were further analyzed spatially using a 1 • × 1 • grid over 22 mountain regions (Fig. 6a) using Theil-Sen regression analysis. An analysis of the mountain-wide lake area loss or gain from 2008 to 2017 was conducted (Table A1). Negative or undiscernible changes in glacial-lake area were observed in the eastern Tien Shan, eastern Hindu Kush, Hengduan Shan, and eastern Tibetan mountains (Fig. 6b), thus reducing the otherwise overall increasing glacial-lake area in HMA. The eastern Hindu Kush lost 2.8 km 2 of lake area (Table A1) A3) and rapid growth, +0.94 km 2 a −1 , in lake area due to retreat and thinning of debris-covered glaciers . Moderate area gains occurred along most of the western Kunlun and Tanggula Shan, e.g., +0.38 km 2 a −1 in Tanggula Shan. The areas of glacial lakes in Pamir-Alay, eastern Pamir, and eastern Kunlun Shan were spatially and temporally invariant across the whole observation record.
We found that glacial lakes exhibited different expansion trends for different lake types, and supraglacial and icemarginal lakes have relatively few coverage areas comparing with proglacial and unconnected lakes (Fig. 6b, c). In the Nyainqêntanglha and central Himalaya, around half of the glacial-lake area consisted of proglacial lakes, where most growth occurred. In the negative-lake-growth (shrinkage) regions of the eastern Tibetan mountains and Hengduan Shan, unconnected glacial lakes were dominantly occupied. As the interaction with a glacier gradually weakens, part of the water source supplied by that glacier is reduced, and when combined with the effects from atmospheric warming and a decrease in precipitation, this means that regions mainly consisting of unconnected glacial lakes show a trend of decreasing area. Proglacial lakes contributed approximately 62.87 % (56.67 km 2 ) to the total area increase over HMA (Tables A1 and A2). Proglacial lakes in the cen-tral Himalaya, eastern Himalaya, and western Himalaya accounted for 36.27 % (32.70 km 2 ) of the total area increase. In general, proglacial lakes are the main contributor to recent lake evolution in HMA.
We also noted that the large area growth of lakes occurred in areas with a relatively large proportion of small glacial lakes, and this was mainly due to the rapid growth of existing lakes and the formation of new lakes (Fig. 6d). For example, in some areas of the central and eastern Himalaya and Nyainqêntanglha that have large annual increases in lake area (greater than 0.23 km 2 a −1 ), glacial lakes with a size of less than 0.16 km 2 occupied more than 30 % of the total area (Table A3). In particular, in Nyainqêntanglha, the area of small glacial lakes (≤ 0.16 km 2 ) accounted for 69.47 % of the total area.

Influencing factors of current distribution of glacial lakes
To explore factors that have potentially influenced the glacial-lake distribution across HMA, we focus on proglacial and supraglacial lakes, for which the changes are closely related to glaciers, and expansion is most rapid. Proglacial lakes frequently develop from the enlargement and coalescence of one or more supraglacial lakes (Thakuri et al., 2016;Umesh et al., 2018). Proglacial-and supraglacial-lake development from 2008 to 2017 is significantly correlated with initial lake area in 2008 (R = 0.82; Table A4); larger icecontact proglacial lakes imply a larger water body in contact with the calving front of the glacier and more rapid retreat (Truffer and Motyka, 2016;King et al., 2019). For the years before 2008, the year-round Landsat 5 TM data in many years do not fully cover the HMA region. In this study, we constructed the inventory over a 10-year time period. This is shorter than typical glacier response times, which start from a minimum of 10 years for short, steep glaciers to over 150 years for long, debris-covered glaciers (Scherler et al., 2011). Hence, lake expansion is not expected to be coupled with short-term climate trends, particularly for debris-covered glaciers (Umesh et al., 2018). In the inclusion of mass balance forcing of glacial-lake changes, the same questions about response times also occur. Hence, rather than focus on the short-term evolution of lake expansion, we investigated whether the climate and other factors have influenced the overall distribution of lake area, as observed in 2017.
To investigate the factors influencing the predominance of proglacial and supraglacial lakes, geomorphic, topographic, and climate parameters were correlated with lake area over a 1 • × 1 • grid, and these were aggregated (taking the mean or sum) for HMA regions. A statistically significant positive correlation exists between lake area and debris-covered glacial area (after Scherler et al., 2018) across HMA (R = 0.36; Table A4), confirming the predominance of proglacial and supraglacial lakes forming on debris-covered glacier tongues (Nie et al., 2017). Correlations and significance levels strengthen if the Karakoram is excluded (Table A5). The Karakoram is known as an anomaly of positive glacier mass balances and glacier advances (Gardelle et al., 2012) and also has an anomalously small area of proglacial lakes. Glacier length (RGI-Consortium, 2017) and debris cover are strongly correlated (R = 0.85; Table A4), reflecting abundant debris on most large, low-gradient valley glacier tongues in HMA; in turn, there is a statistically significant direct correlation between glacier length and lake area (R = 0.32; Table A4) as these tongues provide the ideal conditions for the coalescence of supraglacial ponds and formation of large proglacial lakes (Nie et al., 2017;Richardson and Reynolds, 2000). Glaciers are generally longest and most heavily debriscovered in the Hindu Kush Himalaya region ( Fig. 7a and b).
Some regions have comparable numbers of large debriscovered glaciers but substantial differences in total lake area and area growth rates (for example, the central Himalaya compared to central Tian Shan or western Pamir; Table A5).
Regional differences in multi-decadal climate trends could play a role in this observation, with Nyainqêntanglha and the central and eastern Himalayan regions all being characterized by rapid warming and decreased precipitation since 1979 ( Fig. 7c and d), favoring negative glacial mass balances (Brun et al., 2017). This plausibly explains why the lake area is typically larger in these regions relative to adjacent regions further to the west and north (e.g., the western Himalaya) despite often similar glacier characteristics (in terms of debris cover and glacier length) ( Fig. 7e and f). Furthermore, there is very little debris-covered area but rapid warming in the eastern Himalaya, where proglacial lakes are abundant (Fig. 7f). These results emphasize that the distribution of supraglacial and proglacial lakes across HMA is primarily associated with the presence of large debris-covered glaciers, but regional variability in warming and precipitation trends over the past few decades has likely also had some influence (Shugar and Clague, 2011;Zhao et al., 2019;Umesh et al., 2018;Scherler et al., 2018). These results are consistent with previous findings at regional scales, which have demonstrated a rapid expansion of proglacial lakes on debris-covered glaciers, with expansion in the upstream direction demonstrated to occur primarily through a process of subsidence at the lake-contact debris-covered glacier tongue (Harrison et al., 2018;Song et al., , 2017a.

Comparison with other lake datasets
We compared our dataset with that of X.  for the closest period (2017 from the Hi-MAG database and 2018 from X.  over the spatial extent of our HMA region. The differences in the total number and area of lakes between these two datasets are 6206 and 223.97 km 2 , respectively. We also found that 2077 glacial lakes with a total area of 178.77 km 2 in our Hi-MAG dataset were not detected by X. . The main reasons for the missed glacial lakes in Hi-MAG are the interference of some bad observations (cloud or snow), glacial lakes that have dried up or burst out, and glacial lakes that were located in the middle of the river.
To test the spatial correlation of the distributions of the glacial lakes in the two datasets, we compared the numbers of glacial lakes and their areas aggregated on a 0.1 • × 0.1 • grid for the HMA regions. The results for the total glaciallake area, areas of glacial lakes larger than 0.04 km 2 , and the number of glacial lakes larger than 0.04 km 2 are shown in Fig. A4. A clear and strong correlation can be observed for all the statistics between the Hi-MAG dataset and the glaciallake data of X. . Most of the points are distributed around the 1 : 1 line, which shows that there is great consistency between the two sets of data.
To quantitatively and systematically evaluate the accuracy of our data, we implemented stratified random sampling (Song et al., 2017b;Stehman, 2012), in which the glacial lakes were divided into four strata. The sample sizes were the spatial resolution (30 m) of the data, and the strata were designed such that C0W0 indicates that both the results are non-glacial lakes, C0W1 indicates a non-glacial lake in the present data and a glacial lake in X. Wang et al.'s (2020) data, C1W0 indicates a glacial lake in the present data and a non-glacial lake in X. Wang et al.'s (2020) data, and C1W1 indicates that both results are glacial lakes.
A total of 4000 points were randomly selected, as shown in Fig. A5. The number of samples for C1W1 and C1W0 were 1300 and 700, respectively, and these numbers have almost the same ratio as that between the total areas for the two strata (1450.50 vs. 732.77 km 2 ). Because of the approximate total area with C1W0, we also randomly selected 700 samples from stratum C0W1. The remaining 1300 samples were from C0W0. Every validation sample was visually examined using Landsat imagery and higher-resolution imagery in Google Earth. Sample pixels were interpreted by a glacial-lake-mapping expert, and ambiguous samples were cross-validated by a second observer. If a sample was difficult to interpret, it was marked as ambiguous and excluded from the accuracy assessment. The sample number estimates were produced for each of the four strata (Table A6), and these strata totals were then summed to obtain the total accuracy.
For the 1300 samples that were considered by both datasets to be non-glacial lakes, after the pixel-by-pixel verification, 1215 were found to indeed be non-glacial lakes, while 37 were missed glacial lakes. In contrast, 1260 out of the 1300 samples belonged to the class of glacial lakes, and 25 were misclassified as glacial lakes by both inventories. A total of 307 error pixels were found in the results of X. , constituting about half of the total validation number. For the glacial lakes identified only by our inventory, 678 out of 700 were correctly classified. Our results yielded high overall classification accuracy (88 %), user's accuracy (97 %), and producer's accuracy (82 %) for glacial-lake classification using Landsat data.
The Hi-MAG dataset was also compared with other Landsat-based lake inventories (Nie et al., 2017;Pekel et al., 2016;Zhang et al., 2015). The number of lakes in Hi-MAG was found to be 7268 higher, and the area was 644.26 km 2 greater than the estimation for the Tibetan Plateau . The largest discrepancies were found in the Gangdise, Himalaya, and Nyainqêntanglha mountains in 2010. Across the Himalaya region, we found 476.09 km 2 of glacial lakes, 4.57 % more than previous estimates in 2015 (Nie et al., 2017). In addition, we qualitatively compared the lake extents between the publicly available high-resolution Global Surface Water (GSW) dataset (Pekel et al., 2016) and our Hi-MAG database summed by mountain range in 2015. For the sake of a reliable comparative analysis, lake polygons in the Hi-MAG dataset were converted into a grid format, and glacial lakes in the GSW were further extracted using the range of glacier buffer (10 km). Hi-MAG detected more glacial lakes in the Himalaya region, eastern Hindu Kush, and Tien Shan and fewer in eastern Pamir and western Kunlun Shan. Figure A6 illustrates the differences between our Hi-MAG glacial-lake results and the GSW-derived lake area for the whole HMA region.
The glacial-lake area observed in our lake dataset in the eastern Pamir and western Kunlun Mountains does not conform to the mapped surface water in the GSW for these sub-regions. While there are numerous glacial lakes from an open-water perspective, actually part of them are river segments. Additionally, the Himalaya, eastern Hindu Kush, and some other Tien Shan areas host thousands of glacial lakes that are not readily observable in the GSW dataset. Large discrepancies in mountainous glacial-lake estimates preclude a significant consistency between the GSW and our Hi-MAG lake data over the HMA region. The region with the highest consistency between GSW and Hi-MAG product is interior Tibet. There is little agreement for Tien Shan, where  Table A4 for details on data sources. the weather is rainy and snowy in the region above 3000 m, and large quantities of ancient glacial deposits have accumulated. Here, glacial lakes are characterized by small sizes, and due to the influence of their source glaciers and lake beds as well as the water depth and sediment inflow, glacial lakes appear to have heterogeneous reflectance in the images. Errors could exist in datasets produced by automated classification, but, as noted, we also conducted detailed manual editing, so we were not relying exclusively on automatic classification. The Karakoram region seems to have fewer glacial lakes in our estimate, owing to the overestimation of surface water on debris-covered glaciers in the GSW dataset.
The low agreement between our Hi-MAG glacial-lake data and the GSW data is mainly due to their lack of systematic glacial-lake inventory and mapping capabilities. The lake dynamics and differing climate contexts within HMA may also lead to inconsistencies between the sub-regions. Hi-MAG might have made better use of the optimum satellite imaging season to map glacial lakes, potentially resulting in more complete mapping by avoiding conditions such as periods of lake ice that may confound mapping.

Known issues and planned improvements
There are several important issues and limitations to the datasets produced and the methods used within this study that are important to highlight to potential users. (i) Bodies of water smaller than 9 connected pixels (e.g., 1×9 or 3×3 pixels, corresponding to 30 × 270 m or 90 × 90 m, respectively) and those obscured by frozen water surface or loose moraines or hidden by terrain shadows were not included. Broken floating ice or isolated moraines that stood in open water for some time were mapped. Supraglacial lakes such as melt ponds developed on the surface of glaciers present particular challenges because of their small size and highly dynamic properties. Most supraglacial lakes are transient or seasonal or at least fluctuate seasonally as they commonly drain and may refill. In fact, this short-duration seasonal water is in general more likely to be underestimated because of temporal discontinuities in the archive and gaps caused by persistent cloud cover. (ii) The spatial and temporal information reported in the Landsat dataset used in this study complements that acquired in the past. Nevertheless, the biggest limitations to glacial-lake mapping from these data are undoubtedly the geographic and temporal discontinuities of the Landsat archive itself. Historical data over the entire HMA before 2008 can be recovered partly from the Landsat 4 TM or Multispectral Scanner (MSS), Landsat 5 TM, and Landsat 7 ETM+ and partly from SPOT, other satellite systems, etc., although data access is not always at the full, free, and open level of Landsat. In this regard, the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) is freely accessible and has a higher resolution than Landsat, but its temporal coverage is very limited in most of HMA. Other Landsatlike moderate-resolution multi-spectral sources could also be used to improve and extend the temporal sampling. For example, the European Space Agency's Sentinel 2a satellite launched in 2015 and provides optical imagery at 10 m resolution (N. Yu et al., 2020), which will benefit future research combing all available satellite observations with GEE cloud-computing power and would make long-term monitoring of changes to HMA's glacial lakes and inland waters possible.

Data availability
The Hi-MAG database is distributed under the Creative Commons Attribution 4.0 License. The data can be downloaded from the data repository Zenodo at https://doi.org/10.5281/zenodo.4275164 .

Conclusions
In conclusion, the Hi-MAG dataset and others have used Earth observation satellite data, especially Landsat imagery, to provide a more consistent delineation of large-scale glacial-lake changes. Some glacial-lake mapping methods have enabled local-scale area estimation or spatial representation of lake extent and change. Such methods result in relatively good performance for lake areas that remain clear and show homogeneous reflectance in the image but do not allow for continental-scale mapping of glacial lakes that have spectral interference from other objects such as glaciers, snow, clouds, turbidity, and the sedimentation characteristics of the glacial lake itself or the atmospheric interference and terrain effects. Automated methods for the extraction of glacial lakes over large-scale areas have been further developed in this work. However, visual interpretation and manual editing are still effective ways to ensure high accuracy of lake inventories and append attributed information for further analysis. Based on an error of ±1 pixel on the lake boundary, the area uncertainty in each glacial lake ranges from 0.30 % to 50 % for the years between 2008 and 2017, and there is a mean area uncertainty of 17 % in the entire HMA region.
Mapping of glacial lakes across the Tibetan Plateau and adjoining ranges reveals a complex pattern of lake occurrence and growth or shrinkage. During the past 10 years, 2755 glacial lakes with a total area of 90.14 km 2 were increased in the HMA region. Proglacial lakes contributed 62.87 % of that increase. We found that most areas in HMA have experienced rapid expansions; the central and eastern Himalaya and central Tien Shan showed the most lake area increases (up to +0.94 km 2 a −1 ). Negative area changes were observed in the eastern Tien Shan, eastern Hindu Kush, Hengduan Shan, and eastern Tibetan mountains. The number of lakes grew very rapidly above 4400 m a.s.l., and proglacial-lake growth is proceeding at high elevations of above 4900 m, but glacier retreat and lake disconnections are also starting to occur at higher elevations, causing the number and area of both classes to increase. At low elevations, few glaciers remain where proglacial lakes can form, and already-detached lakes lack growth mechanisms. Overall, continued growth of glacial lakes can be expected, particularly where large debris-covered tongues remain.
The freely downloadable, detailed Hi-MAG dataset can also be used in future studies to provide a sound and consistent basis on which to quantify critical relationships and processes in HMA, including glacier-climate-lake interactions, glacio-hydrologic models, GLOFs and potential downstream risks, and water resources.      Table A1.

Area
No.

Area
No.

Area
No.

Area
No.

Area
No.

Area
No.

Area
No.

Area
No.

Area
No.

Area
No.      1.00 a Glacier data are derived from the Randolph Glacier Inventory (RGI Consortium, 2017), except for debris cover (after Scherler et al., 2018). Climate data are for ERA-Interim. b ERA-Interim near-surface temperature and precipitation fields for the period 1979-2017 were obtained from the Koninklijk Nederlands Meteorologisch Instituut (KNMI) climate explorer (https://climexp.knmi.nl, last access: 30 September 2020). Table A5. Regional summary of key topographic, geomorphic, and climatological parameters compared to proglacial-and supraglacial-lake area in 2017. Correlation coefficients are bold where p < 0.05; * indicates p < 0.01.

Region
Total area Lake Glacier (