Remote sensing of lake water volumes on the Arctic Coastal Plain of Northern Alaska

. The Pleistocene Sand Sea on the Arctic Coastal Plain (ACP) of northern Alaska is underlain by an ancient sand dune field, a geological feature that affects regional lake characteristics. Many of these lakes, which cover approximately 20% of the Pleistocene Sand Sea, are relatively deep (up to 25 m). In addition to the natural importance of ACP Sand Sea lakes for water storage, energy balance, and ecological habitat, the need for winter 20 water for industrial development and exploration activities makes lakes in this region a valuable resource. However, ACP Sand Sea lakes have received little prior study. Here, we use in situ bathymetric data to test 12 model variants for predicting Sand Sea lake depth based on analysis of Landast-8 Operational Land Imager (OLI) images. Lake depth gradients were measured at 17 lakes in mid-summer 2017 using a HumminBird 798ci HD SI Combo automatic sonar system. The field measured data points were compared to Red-Green-Blue (RGB) bands of a 25 Landsat-8 OLI image acquired on 8 August 2016 to select and calibrate the most accurate spectral-depth model for each study lake and estimate bathymetry. Exponential functions using a simple band ratio (with bands selected based on lake turbidity and bed substrate) yielded the most successful model variants. For each lake, the most accurate model explained 81.8% of the variation in depth, on average. Modeled lake bathymetries were integrated with remotely sensed lake surface area to quantify lake water storage volumes, which ranged from 1.056 ×10 -3 to 57.416 30 ×10 -3 km 3 . Due to variation in depth maxima, substrate, and turbidity between lakes, a regional model is currently infeasible, rendering necessary the acquisition of additional in situ data with which to develop a regional model solution. Estimating lake water volumes using remote sensing will facilitate better management of expanding development activities and serve as a baseline by which to evaluate future responses to ongoing and rapid climate change in the Arctic. All sonar depth data and modeled lake bathymetry rasters can be freely accessed at and Arp, 2018) and (Simpson, 2019), respectively.

1 typically reach maximum depths between 1 to 3 m (Hinkel et al., 2012), an anomalous group of lakes on the ACP approach depths up to approximately 25 m. 45 Deep lakes that are the focus of this study are located on the Pleistocene Sand Sea (Fig. 1), a distinctive region of the ACP named for its foundational Pleistocene-aged sand sheet and sand dunes (Carter, 1981;Williams, 1983;Williams et al., 1978). Located west of the Colville River, this region spans approximately 15,000 km 2 and contains over 16,000 lakes (Jorgenson et al., 2014). The underlying dune field impacts the regional lithology and lake morphology. Lakes here are nestled between the crests of sand dunes and display a form distinct from that of 50 lakes across the rest of Alaska's North Slope (Hinkel et al., 2005;Jorgenson and Shur, 2007). Deep central basins and wide, shallow littoral shelves surrounded by bluffs distinguish Sand Sea lakes from lakes that have formed in ice-rich permafrost terrain. Studies by Livingstone (1954), Rex (1961), Carson and Hussey, (1962), and Carson (1968) assert that the bluffs around lakes erode by winds, which carry sand from the bluff faces into the lakes, forming characteristic sandy littoral shelves. These shelves only reach depths of up to three m, whereas the central 55 basins of such lakes can reach depths over eight times that. Due to this striking depth contrast, the distinction between littoral shelves and central basins is apparent in satellite imagery of most lakes in the area (given low-wind and ice-free conditions). Understanding the geological context and morphology of Sand Sea lakes is important when interpreting their spectral signatures in remotely sensed imagery.
In this study, we (1) present a dataset to help fill the gap concerning lake depth -particularly deep lake 60 depth -measurements in Arctic regions, (2) leverage the dataset to tune linear spectral-depth models at individual lakes for the purpose of lake-wide bathymetry mapping, and (3) integrate the modeled depths across each lake to quantify water volumes. Finally, we assess spectral-depth similarity in lakes across the Sand Sea to evaluate the prospects of regional water volume modeling. The ultimate goal of this research -individual estimates of water volume stored in Sand Sea lakes -is important when evaluating aquatic habitats, conducting industrial activities that 65 require local freshwater supplies (i.e. ice road construction), and understanding regional water and energy balance.
Compared with lakes in surrounding regions of the ACP, Sand Sea lakes tend to be deeper and thus less likely to freeze to the bottom during the winter. Their notable depth means that Sand Sea lakes tend to have lower evaporative losses and are more likely to have basins characterized by floating (rather than bedfast) ice in the winter (Arp et al. 2015;Engram et al., 2018). These unfrozen lake basins provide crucial overwintering habitat for fish and 70 other aquatic life (Jones et al., 2009;Sibley et al., 2008). Furthermore, liquid water is essential for industry during winter, primarily for ice road construction, but also for ice airstrip and ice pad construction, exploratory oil-well drilling, and withdrawal of water for drillers' and researchers' in-camp use (Jones et al., 2009). Unfrozen winter lakes can also store more heat, affecting the regional energy balance (Jeffries et al., 1999). Therefore, depth and volume quantifications of deep Sand Sea lakes can help monitor fish habitat and direct locations of water extraction 75 for wintertime infrastructure and consumption for other purposes.
Previous studies have evaluated water depth and bathymetry of lakes in nearby regions using various methods, but are limited either to shallow lakes or by coarse depth resolution (e.g. Hinkel et al., 2012;Jeffries et al., 1996;Jones et al., 2017;Kozlenko and Jeffries, 2000;Sellman et al., 1975). Such limitations make deep lake depth and volumetric estimation unfeasible. For example, Jeffries et al. (1996) used satellite imagery and radar data to 80 determine which lakes in regions near Utqiaġvik (Barrow) and Atqasuk, Alaska (including lakes in this paper's study area) froze to the bottom during the winter, extrapolating from their results a classification of lakes as being less than or greater than 2.2 m deep. When used in concert with an ice-growth model, this provided a proxy for coarse lake volume estimation, but was limited to shallow lakes. Hinkel et al. (2012) measured in situ bathymetry for 28 lakes. However, the maximum lake depth of this study on the inner ACP was 2.3 -5.2 m. Thus, our dataset 85 and study is unique in its consideration of deep lakes. Furthermore, while one of the model variants we test was successfully used to extrapolate bathymetry in tropical and sub-tropical coastal marine environments (Jagalingam et al., 2015;Stumpf et al., 2003), to our best knowledge, the model has never been applied to high Arctic lakes.
Volumetric estimates with the resolution provided here (30 m horizontal; 0.03 m vertical) have never been attempted for Pleistocene Sand Sea lakes and the method of depth derivation used in this paper has not been employed in the 90 Arctic.

Field methods
Depth points were sampled across 19 lakes during a field expedition between 22 July 2017 and 27 July 95 2017. The method of data collection required landing on each target lake in a float plane. A HumminBird 798ci HD SI Combo automatic sonar unit was attached to the back of a float and sampled depth as the plane taxied or drifted across the lake. Depth points were each measured discretely as part of a depth-gradient transect and were sampled at a frequency of one point per second with an accuracy of 0.03 m (due to intrinsic machine error). The number of points collected per lake is specified in Table 1. 100 Lakes were targeted that were large enough for a float plane to land on in windy conditions (i.e. > ~1 km 2 lake surface area), and showed the presence of a distinct littoral shelf and a deep basin on 2.5-m color-infrared aerial photography (U.S. Geological Survey Digital Orthophoto Quadrangles [DOQs]). A single straight transect line was mapped across each target lake prior to field visits to encompass a wide depth range, however due to windy conditions, such lines were not always followed (Fig. 2). Nevertheless, in all but two lakes, a depth range from the 105 littoral shelf to the deep central basin was captured (Table 1). It should be noted that, as transects were comprised of individual points whose relationship to one another was unimportant to the modeling, the direction, angle, and other qualities of the transect are significantly less important than the range of depths captured.

Depth data processing
Depth data points from 17 of the 19 sampled lakes were compiled into a single file to facilitate initial 110 processing, with the lake IDs maintained in the database for lake-specific analysis. Two lakes at which sampling occurred contained an insufficient number of measurements to justify modeling their bathymetry (models produced for these two lakes would have been strongly overfitted). Top-of-Atmosphere (TOA) reflectance values from the blue band (band 2; 452 -512 nm), green band (band 3; 533 -590 nm), and red band (band 4; 636 -673 nm) of the Landsat image (described in the section below) were extracted to each point. Although Surface Reflectance (SR) 115 imagery was available, we elected to use TOA reflectance initially because SR algorithms are often suboptimal when looking at water bodies due to the low level water leaving radiance and furthermore, we are working at high altitudes, where SR corrections are unreliable. Upon comparison, the SR and TOA reflectance values in our selected RGB imagery (discussed below) were very similar (R 2 > 0.99) at our sample locations. The coastal band (band 1; 435 -451 nm) was not included here as there was no basis for its examination in prior similar studies (e.g. 120 Jagalingam et al., 2015) at the time this analysis was conducted and unexpectedly, preliminary results were not greatly improved by the inclusion of the coastal band.
The dataset was filtered to 13,735 depth points: for each transect collected with the HumminBird sonar, discrete points were evaluated relative to the depths of their neighbors and anomalous and zero depth points were manually removed from the dataset. This step mitigated sonar errors and improved the smoothness of the 125 bathymetric profiles that were generated from each transect. Subsequently, depths collected at the margins of two lakes at the Pik Dunes (70.234 °N, 153.183 °W) were removed from the dataset after manual inspection due to their anomalous spectral signatures. The unique, white color of the sandy substrate at this group of lakes and the extreme shallow nature of the littoral shelves (~ 0.5 m deep) produces a spectral signature near the margins of lakes in the Pik Dunes area that is easily confused with that of the surrounding land and thus should not be used to analyze lake 130 depth. These Pik Dunes depth points represent outliers and had they been included, our models would have had to reconcile associating strikingly different spectral values with similar depth values. This likely would have decreased overall model performance with the only potential benefit of modeling a limited number of marginal pixels more accurately.
To further minimize error caused by associating a single pixel's spectral signature with multiple depth 135 points (i.e. to reduce compatibility issues between the spatial resolution of the sonar transects and Landsat imagery to which the depth points were compared), the dataset was resampled to include only one depth per pixel. This depth was calculated by averaging the sonar depths of all measurements within the pixel, removing depths greater than one standard deviation from this average, and re-calculating the depth mean of the pixel. Aggregating per-pixel measurements allowed us to identify the dominant depth represented by the pixel's lake color and improve the 140 precision of training data (i.e. reduce the range of input depths associated with a given band ratio). This pixelrepresentative depth point provides the final depth value used in analysis. All data visualization and manual data editing was undertaken using ArcMap; automated data editing was done with the aid of ArcGIS and python.

Landsat image selection, processing, and analysis 145
Landsat-8 Operational Land Imager (OLI) imagery was chosen for comparison with measured depth data due to its large swath, 30-m spatial resolution, and quality (as assured by U.S. Geological Survey pre-processing). A cloud-free Landsat image (LC08_L1TP_077011_20160805_20170222_01_TI) was selected that both covered the study area and was acquired on 5 August 2016, that is, at a similar time of year to that of field data collection from the following year (suitable imagery was not available for 2017). The late summer was chosen to provide data for a 150 time when lakes are at an intermediate level, that is, lakes are free of ice, but have not yet reached their lake level minimums (determined when evaporation exceeds precipitation; Jones et al., 2009). It should be noted that water volume varies seasonally and interannually in accordance with precipitation of the preceding twelve months, and therefore the estimated depth data may not be representative of the lake levels year round or from year-to-year.
Nevertheless, these variations in lake level are relatively small, with surface area changes often around 0.6% of total 155 surface area (Jones et al., 2009). Furthermore, of these area changes, the majority of change occurs at the shallow littoral shelves and therefore results in little volume change (Jones et al., 2009).
As no ice-free, cloud-free Landsat images exist that cover all study lakes for late summer 2017, we selected a Landsat image from 2016 in order to maximize the number of lakes included at which field data exists, i.e. the number of lakes at which we could model volume. One potentially promising Landsat image exists that covers our 160 study area, however (a) it was acquired at the end of June, just after ice out, when the lake levels are at a seasonal high, and (b) slight cloudiness over some study lakes produced models that predicted depths up to 48% less accurately. The use of 2016 imagery is further justified as the interannual depth and volume changes are smaller than our error metrics. When considering one representative lake ( Observation of an imagery timeseries of a different group of lakes that are typically highly responsive to water level changes (located at 70.539 °N, 152.733 °W) similarly revealed lake level conditions to be more comparable between 170 5 August 2016 and 21-27 July 2017 than between these latter dates and 1 July 2017. Overlaying lake surface area changes on an airborne LiDAR-derived digital surface model showed a change in water level of ~ 0.10 m between 5 August 2016 and 1 July 2017, indicating a depth change well within our error margins (Alaska North Slope LiDAR Data -Project Code ALCC2012-05). The chosen Landsat image was clipped to the study area and a Normalized Difference Water Index (NDWI) water mask was created using ArcGIS tools to subset our study lakes from the 175 surrounding land pixels (McFeeters, 1996).
Study lakes were then visually assessed in ArcGIS to provide a Boolean turbidity rating for the purpose of analyzing the success of different models. Lake clarity was determined by comparing the selected Landsat image (as an RGB true color composite) with a Landsat image acquired 13 July 2016 (23 days prior to the acquisition date of the selected Landsat image), as well as color-infrared aerial photographs (DOQs) with a 2.5-m horizontal resolution 180 ( Fig. 3). Lakes that showed presence of sediment plumes or water cloudiness near the site of in situ data collection on the selected Landsat image were designated as turbid. Lakes which displayed minimal suspended sediment distant from the area at which depths were recorded were designated as turbid as well, however they were analyzed as if they were clear, as the impacts of sediment would not be seen in the depth point-derived spectral signatures.
Lakes that did not have sediment plumes were designated as clear. We validated our visual assessment using the 185 Total Suspended Matter (TSM) algorithm created by Nechad et al. (2010).

Model application and volume estimation
Twelve variations of a spectral-depth algorithm were examined, each characterized by a specific band ratio, adjustment factor, and growth factor (Table 2). More specifically, the blue to green, blue to red, and green to red 190 band ratios were considered. Such ratios were either simple (e.g. blue band/green band) or transformed according to Stumpf et al. (2003): (1) where Ri and Rj represent the TOA reflectances for bands i and j, respectively. A constant n is included to effect a positive output (Stumpf et al., 2003). We set n to 500, as it ensured that the logarithm would be positive 195 given any feasible band value input, R, from our image.
The band ratio and the depth measurement of the point at which the spectral signature was extracted were correlated using either a linear regression or an exponential function (Fig. 4a-c). The constants obtained from each of these models became the parameters with which to tune the linear or exponential equations for the validation data.
The root mean squared error (RMSE) of each regression between input depths and input band ratios provided error 200 statistics for modeled depths. In summary, the twelve model variations were each characterized by (1) one of three band ratios, (2) one of two transformation methods, and (3) one of two growth relationships (Table 2).
For each lake, half of the data was semi-randomly selected as input data while the remaining data was used for validation purposes. To ensure that the model was trained and validated with data spanning the full range of input depths, however, the maximum and minimum depths were assigned to the group of data to input into the 205 model, while the second deepest and second shallowest depth points were retained in the list of validation data. To obtain the best regional model, this same process was undertaken (i.e. selection of half of the data to train the models; application of each of the 12 models), however a sufficient number of depth points exist in the full dataset such that the explicit assignment of extreme depths values to input and validation data was unnecessary (i.e. the selection was fully random). 210 Each of the 12 models was tested at each of the 17 lakes and on a regional scale. To account for the slight variations in each model's capacity for depth prediction given different random sets of training data, 1,000 trials were performed. This allowed us to assert that the model designated as the most accurate model for a given lake (as determined from one trial) was the same model that most frequently produced the best results for that lake. The best model for each lake, as evaluated by the coefficient of determination between target and predicted data, was used to 215 calculate depths at each pixel in that lake. Depths were multiplied by 900 m 2 (the area of one Landsat pixel) and integrated to quantify the lake's water volume.

Results
The best model variants for individual lakes at which depth data were collected were able to account for 220 58.5% -97.6% of depth variability (median R 2 = 0.86, mean R 2 = 0.82; Table 3). Regional-scale models, however, were able to accurately explain less than half of the regional depth variability. Median uncertainty of single lake depth models (based on RMSE) was 1.23 m, while the average RMSE of the models was 1.44 m. However error was not distributed equally across depths, and models tended to overestimate shallower depths and underestimate deep depths (Fig. 4d-f). When considering model-predicted depths at all study lakes, depth points less than 2.95 m 225 were overestimated by an average of 0.21 m (or 17.2% of their true depth), with 61.3% of depths in this shallowwater group experiencing some model over-prediction. Meanwhile, 66.9% of depths greater than 2.95 m were underestimated, with an average difference between measured and modeled depths of 0.97 m. On average, points deeper than 2.95 m were underestimated by 5%. The threshold of 2.95 m represents the intersection between the 1:1 line and the correlation between measured and predicted depths. Coincidentally, approximately half of the data 230 points used for model validation fall below this threshold. Lake volumes ranged from 1.056 ×10 -3 km 3 at the smallest lake (total surface area = 1.089 km 2 ) to 57.416 ×10 -3 km 3 at the largest lake (total surface area = 18.998 km 2 ), with a median volume of 7.20 ×10 -3 km 3 (Table 4).
Volume and surface area were strongly correlated (R 2 = 0.90) for the 14 lakes at which complete volumes could be modeled (Fig. 5). Linear models predicted negative depths across much of the lakes' shallow littoral shelves; thus, 235 the modeled volumes of the three lakes at which linear models produced the most accurate results are an incomplete representation of the lake's water storage. Pixels at which models predicted negative depths were eliminated and the water volume was instead calculated for the surface area with predicted depths greater than zero (Fig. 6). Ground truth lake volume data do not exist for the study lakes at a similar scale of analysis, rendering error metrics unfeasible (aside from those implicitly contained in the depth model error). 240 The most accurate models (i.e. the models that were best able to determine lake depth for the greatest number of lakes) were models with an exponential growth factor with input band ratios blue/green or blue/red (Table 3). In all but three of the study lakes, an exponential relationship was found between spectral signature and depth. At only two lakes did the green to red band ratio provide the best results. The transform ratio provided the best results in 4 out of the 17 lakes, while the simple ratio was used to best model depths in the remainder of the 245 lakes. The difference between the modeled results of the pure versus transform ratio was marginal however, with an average difference between R 2 values generated by the respective models of 0.016.
The ACOLITE (software developed at the Royal Belgian Institute of Natural Sciences for aquatic applications of Landsat and Sentinel-2) implementation of the TSM algorithm (Nechad et al., 2010) provided quantitative support for the qualitative turbidity assessment, agreeing with the visual assessment in 14 out of 17 250 lakes. However this algorithm proved highly sensitive to depth (spearman rank order correlation = -0.774; p-value < 0.001) and did not detect sediment in deeper waters to the same extent as shallow waters, effectively ignoring the sediment plumes identified visually. Furthermore, the majority of shallow waters were assigned high TSM values by the algorithm, making the differentiation by turbidity at the lake-wide level irrelevant. Considering the points in our transects, 91% of high-sediment (i.e. TSM values in or above the 75 th percentile) points had measured depths < 2 m 255 and only five outlier high-sediment values were detected in points with depths > 4.6 m. To directly address the sediment content in deeper waters, the mean TSM value was calculated at each lake from sample points with depths > 2m. Seven out of eight lakes with the highest average TSM values had been designated as turbid by our qualitative assessment (note that one of these lakes was designated as turbid away from the sampling sitethis is counted as an error). In addition, all but one of the nine lakes with the lowest mean TSM values were designated as clear at the 260 sampling site.
Turbidity appears to partially explain band ratio success. All of the eight lakes at which the blue to green band ratio provided the best result were free of sediment where measurements were taken. Furthermore, all of the seven lakes designated as turbid at the data collection site required incorporation of the red band to achieve the best depth prediction. One anomalous lake at which no sediment was detected required incorporation of the red band to 265 predict depth most accurately.

Modeled depth analysis
Depth was accurately derived from Landsat OLI imagery for individual lakes (the average R 2 value of the 270 selected models for each lake was 0.82). Our R 2 values are consistent with those found in the literature (e.g. Jagalingam et al., 2015;Stumpf et al., 2003) and thus our selected models can be considered successful. The regional scale model, however, was unsuccessful and regional volume analysis was rejected. This finding is consistent with Smith and Pavelsky (2009), who found surprisingly high variability in a collection of remotely sensed lake storage volumes on the Peace-Athabasca Delta, Canada, despite their having similar physiographic 275 setting and morphology.
Unsurprisingly, the blue band proved to be the most useful in determining depth, and was used to tune depths at all but two lakes. Blue light has a shorter wavelength, and consequential higher energy, which allows it to be absorbed less in water than either green or red light. Thus, the reflectance of the blue band decreases less than either the green or red bands in proportion to increasing depth. In contrast, red light is able to penetrate only several 280 meters into most types of water before it becomes absorbed. This band proved useful in distinguishing depths at both the sandy littoral shelves, where water is typically 0.5 -3 m in depth, and where suspended sediment was present in the water. As sand reflects red light more than blue or green light, this is expected. Thus, applying the blue to red band ratio to predict depths at turbid lakes, where sediment reduces blue and green wavelength penetration into deeper waters, is recommended. Conversely, applying the blue to green ratio at clear lakes where blue and green 285 light can effectively penetrate water to the expected deeper depths will likely provide the best depth predictions.
However, it should be noted that the blue band is the most susceptible to contamination from atmospheric aerosols, which may contribute to the lack of model portability between lakes.
The two lakes at which the green/red band ratio best tuned the model were unique in terms of physical factors or sampling locations. One of these lakes showed the presence of an unusual purple-red patch on a shelf 290 between the littoral shelf and deep zone. Underwater vegetation likely accounts for this unusual spectral signature and thus it is unsurprising that this lake required a unique band ratio to accurately tune the model. Measurements at the second lake accounted for the shallowest range of depths of any lake (0.2 m -2.1 m), which may have led to stronger reflectance in the red band, as the sand was more prominent.
In addition, an exponential relationship was able to better model depth ranges that include shallow depths 295 of around 0.8 m, a finding that is likely the result of incomplete transect sampling rather than physical significance.
Of the three lakes at which a linear function provided the best model, two were the lakes at which depths on the littoral shelf were not measured; the third lake contained only a single measurement of the littoral shelf. Therefore, the lakes best modeled with a linear growth relationship are associated with measured bathymetry profiles that do not contain sufficiently shallow littoral shelf depths. This is evidenced by the prediction of negative depths at littoral 300 shelves when applying linear models, the product of the strongly negative y-intercepts that render low spectral signature ratios negative. This leads us to conclude that the linear relationship between band ratios and depths at these lakes is more likely the product of the locations at which data was gathered rather than a result of physical significance. It is thus important to tune models to all regions (and all depths) of the lake.
On the whole, applying an exponential function to compare depths to a simple band ratio provided the best 305 accuracy, with model input bands dependent on physical factors specific to each lake. The simple band ratio is easier to calculate and the accuracy provided by each ratio type model variant is comparable, with slightly better overall results when using the simple ratio. However, to address the underestimation of deep depths and overestimation of shallow depths in our models, additional transformations must be made, a goal that is outside the scope of this project. 310

Limitations
The depth estimates are only tuned to the extrema of depths measured at each lake. Although gathering data across a lake's full bathymetric profile was attempted, it is likely that the depth minima and maxima were not captured at all lakes. Collecting data with sonar attached to a float plane limited the measurement of depths 315 approaching 0 m. Few pixels were sampled at the minimum depth that was able to be measured (0.2 m) and thus there is insufficient tuning to accurately model the littoral shelves of lakes. Furthermore, while we attempted to gather depths across the deep central basins, it is impossible at present to know whether we sampled the deepest point without measuring the entirety of the basin. Thus, modeled depths may not accurately depict a lake's maximum depth. 320 The limited spatial resolution of Landsat imagery, in comparison with sonar depth data, constitutes the primary limitation to this work. As depths had to be averaged to conform to the assumption that each spectral signature corresponded to a discrete depth, the spatial resolution and depth precision of the sonar depths was greatly degraded, potentially accounting for some of the inaccuracies in the model variants. Modeling bathymetry with satellite imagery of a higher spatial resolution would allow for the use of more training points and thus likely 325 improve the accuracy of depth and volume predictions. Furthermore, samples were taken at a small fraction (in terms of surface area) of the lake (i.e. the entire lake's bathymetry was not mapped, rather, data points were collected along discrete and irregular transects). Thus, there exists a mismatch regarding the validation data and the natural phenomenon being modeled. Data at such a small spatial scale can never confirm with total accuracy the detailed nature of lake bed bathymetry. Constrained by cost and time however, collecting data at 17 remote lakes is 330 an important step towards understanding Sand Sea lake bathymetries on Alaska's Arctic Coastal Plain.

Implications and future directions
Lakes on the Pleistocene Sand Sea may be categorized based on depth, littoral substrate, and water clarity, as seen in the study lakes, with such categories providing candidates for different model variations. Future projects 335 may use this work to semi-automatically derive depths across the region, first manually classifying target lakes, and then applying different model variations to each class. Furthermore, subregions of each lake (e.g. deep basins, shallow shelves) may be classified in future studies and a different model variant applied to each subregion (e.g. variants that incorporate the red band applied to littoral shelves). Methods of lake subregion differentiation may include either (1) manual delineation based on spectral signatures or (2) automatic delineation with the aid of 340 synthetic aperture radar (SAR) to determine regions of floating versus bedfast ice (which correspond with deep and shallow water, respectively; demonstrated by Engram et al., 2018;Jeffries et al., 1996). Additional future work may include validation of lake water volumes as additional bathymetric datasets become available.

Conclusion 345
This work represents one successful application of leveraging in situ data collection with satellite remote sensing to estimate lake water depth and volume. Lake volumes can be monitored using remote sensing, however at least one field visit must be made in order to select the best model for a given lake. As of yet, it is still challenging to universally model the bathymetry of lakes across northern Alaska. Instead, field data continues to be necessary to train and calibrate models on a per-lake basis. 350 Furthermore, lake morphology may evolve in glaciated regions such as northern Alaska in response to hydroclimatic changes and permafrost degradation (Arp et al., 2011, Liljedahl et al., 2011, Nitze et al., 2017. This implies that individual field surveys and static modeling efforts such as this one may not accurately represent ground conditions ad infinitum, particularly in the presence of a rapidly warming Arctic climate (Nitze et al., 2017). In addition to the persistent need for field data to address modeling limitations to spatial scale, field data collection 355 and/or dynamic models will be important components if we are to model bathymetry across a longer temporal scale.
Despite these limitations, the simplicity of the modeling approach has important benefits. The models can be tuned very rapidly and require relatively few data points for training in comparison to machine learning models (e.g., Sagawa et al., 2019), a useful feature when training data must be collected in a relatively inaccessible region such as northern Alaska. In addition, the comparative nature of the demonstrated modeling facilitates analysis of 360 individual lake characteristics. Overall, this work provides an effective means for mapping bathymetry of individual lakes in a unique geologic setting on the ACP.

Data Availability
We present a dataset to greatly increase the number of in situ measurements of lake depth on the little-365 studied Inner Arctic Coastal Plain of Alaska. The dataset contains 13,735 point measurements of bathymetric depth measured across 19 lakes, and is freely available through the National Science Foundation Arctic Data Center: https://doi.org/10.18739/A2SN01440 (Simpson and Arp, 2018). The second dataset created for this project is comprised of 17 bathymetry rasters, one for each lake at which a sufficient number of depth points was collected.
These rasters represent the depth predictions of the best performing model for each individual lake and are also 370 freely available through the National Science Foundation Arctic Data Center: https://doi.org/10.18739/A2TQ5RD83 (Simpson, 2019).

Author contributions
Claire E. Simpson and Christopher D. Arp designed the sampling. Christopher D. Arp secured the funding 375 and instrumentation for field work. Claire E. Simpson, Christopher D. Arp, and Benjamin M. Jones conducted the field work. Claire E. Simpson processed and analyzed the data and prepared the figures and tables. Claire E.
Simpson prepared the manuscript with contributions from all co-authors.

Competing interests 380
The authors declare that they have no conflict of interest. Tables   Table 1: Sampling specifications for each study lake. The number of sample points and measured depth range  560 were calculated after the points were processed for quality assurance (e.g. anomalous depth pixels removed) but before resampling to the single point per pixel dataset.
Lake ID Centroid latitude (dd) Centroid longitude (dd) Number of points sampled Table 2: Modeled depth (Z) is calculated with each of four equations that are tuned with each of three input band pairs. Ri and Rj represent the Top-of-Atmosphere reflectances of bands i and j, respectively. Band pairs (band i/band j) include the blue and red bands, the blue and green bands, and the green and red bands. Tunable parameters m1 and m0 are derived by comparing spectral signatures with depth (as in Fig. 4a-c).

Ratio adjustment type
Transform Simple Growth factor Linear Exponential 570  Table 4: Modeled Lake Volumes. Individual lake volumes were estimated by multiplying the modeled depth for each pixel by a constant factor of 900 m 2 (Landsat spatial resolution). Depths were modeled by applying the best spectral-depth model for the lake (Table 3). Linear depth models predicted negative depths for some pixels; volume estimates derived from such models (namely the models applied at lakes 2964, 4365, and 6199) include only those pixels with modeled depths greater than zero. The percent of the surface area for which depth estimates at a lake 595 were positive (in contrast to the total surface area of a given lake derived using the NDWI mask) is quantified.    Coefficients of the trendlines between band ratios and measured depths (a-c) are used to tune the depth models for each lake. Different models (specified for each lake in Table 3) best predicted lake depth at each of these three lakes. Correlation between measured and modeled lake depths at three representative lakes (d-f) reveals underestimation of deeper depths and overestimation of shallow depths. Error bars represent root mean squared error (RMSE). 24 640 Figure 5. A strong correlation exists between surface area and modeled volume for the 17 lakes we analyzed.