the Creative Commons Attribution 4.0 License.
A high-resolution inland surface water body dataset for the tundra and boreal forests of North America
Yijie Sui
Min Feng
Chunling Wang
Xin Li
- Final revised paper (published on 19 Jul 2022)
- Preprint (discussion started on 10 Dec 2021)
Interactive discussion
Status: closed
RC1: 'Comment on essd-2021-278', Anonymous Referee #1, 18 Jan 2022
Review of “A high-resolution inland surface water body dataset for the tundra and boreal forests of North America” by Sui et al.
The presented dataset includes polygons of 6.7 million water bodies in northern North America, where the novelty is the inclusion of small water bodies <0.1 km2 in size (90% of the included water bodies). I can see a value in this dataset, especially for studies of impacts of climate change on high latitude greenhouse gas exchange – where small aquatic ecosystems are likely to have a disproportionately large influence.
The dataset is based on analysis of 10m resolution Sentinel-2 data from 2019. The approach to delineate and validate the polygons for water bodies seems robust, but I suggest that the authors discuss and perhaps analyze what the implications are of using satellite data from a single year. The spatial extent of many high-latitude water bodies, especially smaller water bodies in relatively flat regions, can vary significantly both inter- and intra-annually. Many regions of northern North America have relatively dry climates, and thus can experience significant multi-year drought or flood conditions. For example, it would be good to know what the deviations from normal conditions were in terms of cumulative precipitation for the 2 years prior to the collection of the remote sensing data. If some regions had relatively higher or lower long-term precipitation prior to 2019, this could indicate that the resulting dataset either over- or underestimates the number and areas of small water bodies.
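The check suggested above can be sketched as follows. This is a minimal illustration, assuming a per-region record of annual precipitation totals is available; the function names, the `record` values, and the choice of a z-score as the deviation measure are all hypothetical and not taken from the manuscript.

```python
from statistics import mean, stdev


def window_sum(annual_precip, end_year, window):
    """Total precipitation over `window` consecutive years ending at `end_year`."""
    return sum(annual_precip[y] for y in range(end_year - window + 1, end_year + 1))


def prior_precip_anomaly(annual_precip, imagery_year, window=2):
    """Standardized anomaly (z-score) of the precipitation accumulated over the
    `window` years before `imagery_year`, relative to every other window of the
    same length in the record. Assumes the record has no missing years."""
    years = sorted(annual_precip)
    target_end = imagery_year - 1
    sums = {
        end: window_sum(annual_precip, end, window)
        for end in range(years[0] + window - 1, years[-1] + 1)
    }
    baseline = [total for end, total in sums.items() if end != target_end]
    return (sums[target_end] - mean(baseline)) / stdev(baseline)


# Hypothetical annual precipitation totals (mm) for one region; not real data.
record = {2009: 400, 2010: 420, 2011: 380, 2012: 410, 2013: 390,
          2014: 400, 2015: 405, 2016: 395, 2017: 480, 2018: 500}
z = prior_precip_anomaly(record, imagery_year=2019)
print(f"2017-2018 cumulative precipitation anomaly: {z:+.2f} standard deviations")
```

A clearly positive value for a region would suggest that small water bodies mapped there in 2019 may be more numerous or larger than under normal conditions, and a clearly negative value the opposite.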
It is not explained why the dataset doesn’t also include high latitude regions in Eurasia. I’m assuming it’s about computing power? A full Pan-Arctic lake dataset would have additional uses compared to one that only focuses on North America, as it e.g. would allow for global estimates of high latitude aquatic greenhouse gas emissions.
A minor suggestion could be to change the name of the dataset so it is more precise, so that the name of the dataset indicates the region which it covers. Perhaps “The North America High Latitude Water Body Dataset”, NAHL-WBD?
The manuscript has significant deficiencies in its discussion/explanation of the physiography of North America, and the dominant lake types in different regions. Detailed comments are below in the “specific comments” section. The description of different regions within North America is confusing, and it is hard to follow which regions are actually considered. The maps used in the manuscript do not indicate any of the key regions that are discussed. Other key points are that the reduction of lake types in the discussion into just glacial and thermokarst lakes is insufficient (other types of lake are important, including those in peatland regions), and that focusing on differences in lake characteristics between just the boreal and tundra regions likely misses more important controls on lake morphology and size distribution (e.g. that surficial and bedrock geology are more important for lakes than the terrestrial biome they are located in).
Several previous attempts at assessing the importance of aquatic ecosystems for greenhouse gas emissions have relied on databases that only included larger water bodies, along with assumptions of the number and area of smaller lakes (e.g. see Holgersson et al., 2016 and Cael and Seekell 2016). It would be interesting to see how the number and areas of small water bodies in the presented dataset compare to the assumptions made in these previous studies. While not necessary for the study, you could include an analysis of the relationships between number and areas of water bodies for different regions of the study domain (e.g. contrast the lake size/area distribution for the Canadian Shield with that of coastal lowlands and peatland regions). Are these observed distributions similar to, or different from, those previous attempts which were based on assumptions rather than observations?
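One way such a comparison could be set up is sketched below. It assumes a Pareto (power-law) size distribution fitted by maximum likelihood to the larger water bodies and then extrapolated to small sizes; the function names, the cut-offs `a_min` and `a_small`, and the fitting method are illustrative assumptions, not the procedure used in the manuscript or in the cited studies.

```python
import math


def fit_pareto_exponent(areas_km2, a_min):
    """Maximum-likelihood estimate of alpha for N(>=A) ~ (A / a_min) ** -alpha,
    using only water bodies with area >= a_min. Returns (alpha, tail size)."""
    tail = [a for a in areas_km2 if a >= a_min]
    log_sum = sum(math.log(a / a_min) for a in tail)
    return len(tail) / log_sum, len(tail)


def extrapolated_count(n_tail, a_min, alpha, a_small):
    """Number of water bodies with area >= a_small implied by the fitted power law."""
    return n_tail * (a_small / a_min) ** (-alpha)


def compare_small_lakes(areas_km2, a_min=1.0, a_small=0.001):
    """Contrast the power-law extrapolation with the count actually observed."""
    alpha, n_tail = fit_pareto_exponent(areas_km2, a_min)
    predicted = extrapolated_count(n_tail, a_min, alpha, a_small)
    observed = sum(1 for a in areas_km2 if a >= a_small)
    return {"alpha": alpha, "predicted": predicted, "observed": observed}
```

Running `compare_small_lakes` separately on the polygons of each physiographic region (for example the Canadian Shield versus peatland lowlands) would show where the observed counts of small water bodies depart from the extrapolation.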
Specific Comments:
L40 – What is the reference for the statement “especially in the high latitudes, where half of the lakes are thermokarst lakes”? Depending on what you define as “high latitudes”, this statement may not be accurate. The most lake-rich region in the boreal and arctic region is the Canadian Shield, which predominately is not affected by thermokarst the same way as the coastal tundra plains in Russia and Alaska.
L44 – I don’t understand what is meant by “The shapes of the water bodies correlate to suitability of surrounding ecosystems.” Please clarify.
L46 – The second part of this sentence after the semicolon is not clear in what it refers to: “Lake connectivity affects fish migration (Laske et al., 2019; McCullough et al., 2019), fish habitats, and aquatic assemblages (Napiórkowski et al., 2019; Jiang et al., 2021); improves water self-purification and accelerates water cycling (Glińska-Lewczuk, 2009).”
L49 – What is meant by “water density”?
L39-53. This paragraph is meandering – it starts out with a focus on lake thermokarst, and then it pivots to focus on the ecological influences from lake size and shape. The connection between these two topics is not clear.
L69 – Why was the focus on North America, why not also include the Eurasian boreal and tundra regions to have a consistent high latitude freshwater dataset? Was the limitation computational?
L77 – The Canadian Cordillera generally follows the border between Alberta and B.C., and then the border between the Yukon and the Northwest Territories. I can’t see how the Canadian Cordillera can be described as “eastern mountains” when it is located in western Canada. I also do not understand what is referred to when it is said that the Canadian Cordillera separates the continent into east coastal plains and west plateaus.
L78 – The geographical description of the “eastern coastal plain … located near the Pacific Ocean” doesn’t make sense to me – I don’t understand what region is referred to.
L80 – The climate description of the Canadian Shield doesn’t make sense. The Canadian Shield stretches from the great lakes to the Arctic Ocean (north-south), so there is a very large difference in climate between different parts of the Canadian Shield.
L82 – I think it is inaccurate to say that lakes dominate the landscape when you then state that they cover 36% of the land surface. The word “dominate” indicates >50% in my mind.
L83 – It is not clear what landscape is referred to in this sentence, the full study domain or the Canadian Shield?
L83 – The statement on 36% lake cover is not referenced. Where does this data come from?
L84 – Now the lake area is 30%, which contradicts the stated 36% in the previous sentence?
L86 – I don’t understand what regions are referred to when described as “east and north coast”.
L74-87. The geographical description of lakes in boreal and tundra biomes of North America is very poor in this paragraph. Please use better geographical names and descriptions so that a reader can follow which regions you are referring to. It is also very simplistic, and not correct, to only consider two types of lakes, glacial and thermokarst lakes. Also very important in this study area are organic lakes (not affected by permafrost), and there are also fluvial lakes, meteorite lakes, volcanogenic lakes, and anthropogenic lakes (dams) in the region.
L105 – What does the abbreviation JRC stand for? Not explained.
Figure 2 – What is the scale of the images?
L122 – Please add the reference for the PeRL dataset.
L128 – Perhaps include references in the figure legend – (This study) for SWBI and (Muster et al., 2017) for PeRL.
Figure 4 – You are using a different abbreviation here for your data product – WBI, should it not be SWBI?
L179 – There are substantial areas of boreal forest at elevations > 1 km, especially in the Rocky Mountains and in interior B.C. and the Yukon. How was the 1 km cut-off chosen?
L204-206 – The exclusion of water bodies that were large and elongated; did you check how this exclusion worked on the Canadian Shield? Many lakes on the shield are extremely elongated and may here fall in the excluded category? Also to be noted, the distinction between lakes and rivers can be hard to define on the Canadian Shield as many of the very elongated “lakes” really are part of the watershed drainage network and convey water downstream.
L250-253 – I think the interpretation that the Tundra region has circular lakes formed by thaw, while the Boreal region has irregular-shaped lakes formed by glaciation, is lacking. It is not the Tundra vs Boreal biome that determines the shape of lakes; rather, lake shape is much better described as being determined by the surficial geology, which is not tied to the borders of biomes. The irregular-shaped lakes are found on the Canadian Shield, where there is only thin surficial geology and most of the lakes are incised into the bedrock – and this can be found both in Tundra and Boreal biomes. Similarly, the more circular-shaped lakes are found in regions with thick overburden – either a result of being unglaciated (including aeolian deposits), or of being former sea bottom that has been rising through isostatic rebound, or of being located in regions with thick moraines or widespread peatlands. In particular, I would point out the extensive peatland regions of the Hudson Bay lowlands and the Mackenzie river basin – where large boreal regions are found in SWBI to have circular-shaped lakes.
L256 – This is a generalization which I think needs to be adjusted, similar to the comment above. That is, lake size is likely more linked to surficial geology than to terrestrial biome. Also, the average lake size of the boreal biome is likely strongly influenced by the series of very large lakes that are found along the transition from the interior plains onto the Canadian Shield – Great Bear Lake, Great Slave Lake, Lake Athabasca, Lake Winnipeg, Lake Manitoba, Lake Winnipegosis. Again, the location of these lakes is not linked to which terrestrial biome they are in but rather to specific geological transitions. I would also further emphasize that the vast boreal peatland regions seem to be very distinct in terms of lake size and shape when compared to the Canadian Shield.
L339-340 – As noted above, round lakes are not only due to permafrost processes, but are common in regions with thick surficial geology, which are not restricted to the tundra biome.
L365-366 – I’d like to see a somewhat expanded discussion on the use of a single year to determine lake area. Especially for small lakes this is likely to be a source of uncertainty, particularly at high latitudes where there are often both important seasonal trends in landscape inundation and very pronounced interannual variability due to multi-year dry or wet conditions. Do you have data that can shed light on whether 2019 (and the fall and winter of 2018 that led into 2019) was a year with normal, dry or wet conditions for different parts of North America? That could give an indication of whether these estimates are likely to be higher or lower than the long-term normal.
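For the shape-related comments above (L204-206, L250-253, L339-340), a simple index makes the contrast between round and elongated or irregular lakes concrete. The sketch below uses circularity, 4πA/P², computed from polygon vertices; this is one common compactness measure and is an assumption here, since the comments do not state which shape metric the manuscript uses. The two example outlines are hypothetical.

```python
import math


def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon given as a list
    of (x, y) tuples in a projected coordinate system (e.g. metres)."""
    area2 = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter


def circularity(vertices):
    """4*pi*A / P**2: 1 for a circle, approaching 0 for elongated or convoluted outlines."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4.0 * math.pi * area / perimeter ** 2


# Hypothetical outlines: a near-circular lake and a 10:1 elongated lake.
round_lake = [(math.cos(math.radians(d)), math.sin(math.radians(d))) for d in range(360)]
elongated_lake = [(0, 0), (10, 0), (10, 1), (0, 1)]
print(circularity(round_lake), circularity(elongated_lake))
```

Mapping such an index against surficial geology units rather than biome boundaries would be one way to test the interpretation questioned in these comments.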
Citation: https://doi.org/10.5194/essd-2021-278-RC1
AC1: 'Reply on RC1', Yijie Sui, 05 Jun 2022
We appreciate the reviewer's comment on the value of the dataset. As you suggested, an additional analysis was added in revision to examine the difference between the lake numbers observed in our data and those estimated from the power-law relationship; the name of the dataset was changed from SWBI to WBD-NAHL to be more clearly associated with the North American part of the biomes; we expanded the discussion to discuss the implication of a single year mapping and added the limitation of the biome-based analysis.
RC2: 'Comment on essd-2021-278', Anonymous Referee #2, 25 Jan 2022
General comments:
This manuscript presents a data set of inland surface water bodies (SWBI) for the tundra and boreal forests of North America. The data inventory is generated by an automated approach of mapping the 10 m resolution Sentinel-2 multispectral satellite imagery. The resulting SWBI includes approximately 6.7 million water bodies > 0.001 km2, of which 6 million (~90%) are smaller than 0.1 km2. The data set is also compared with earlier regional or global water body products (e.g., JRC GSW, PeRL, GLWD, and HydroLAKES) and manually interpreted data. The data set, if of good quality, can provide finer-scale water body distribution information over the tundra and boreal forest regions of the North American Arctic, which is critical for studying Arctic surface water bodies in response to changing climate and thawing permafrost. However, several fundamental problems related to the remote sensing mapping method and insufficient data quality checks prevent the paper from being considered for publication.
1. Mapping method
(1) The water body mapping method in section 4.1 is not described clearly. The logic of the mapping procedures, including mapping the updated water frequency, applying the machine learning model for water body identification, and deriving the final water body map, is not described sufficiently, leaving the reader unsure of the rationale for the methodology.
(2) The mapping method integrates the Sentinel-2 derived water frequency and the JRC water dataset. However, the JRC dataset, generated from Landsat imagery, has a spatial resolution of 30 m, which is inconsistent with the 10-m Sentinel-2 data. How can a 10-m SWBI data set be created with water bodies as small as 0.001 km2, about one Landsat pixel? Another Landsat-based water frequency product, the GLAD group’s Global surface water dynamics 1999-2020 (https://geog.umd.edu/news/new-global-surface-water-dynamics-maps-published-remote-sensing-environment), may also be considered for the integrated mapping.
(3) The mapping algorithm (Eq. 4) adopts a weighted linear combination. The weights and water frequency thresholds seem to be determined arbitrarily. In addition, whether the training set size (1250 points) is sufficient to establish the machine learning models for mapping approximately 6.7 million water bodies requires more evidence.
2. Data quality control
The reviewer briefly examined the SWBI dataset. Many mapping errors exist in the data product, e.g., ocean water areas mixed in near the coastline, remaining river segments, rough and irregular polylines, etc. The flowchart (Figure 3) suggests that an identification procedure was conducted, excluding rivers/streams and removing noise. However, many river segments and coastal waters still exist in the data set. How are multiple lakes linked by a river/stream separated? This will influence the counts and morphological metrics of water bodies in the data product. I suggest rigorous quality assurance before publishing the data set.
3. Data validation
In this study, the data quality assessment includes two aspects: the comparison with other data products, e.g., JRC GSW data, HydroLAKES, and GLWD, and the validation against manually interpreted data. However, the earlier data products have water body definitions that are inconsistent with the SWBI. For example, the raster-format JRC GSW contains all water bodies, e.g., lakes, rivers/streams, fish ponds, etc., while HydroLAKES and GLWD only include lakes (and reservoirs). It is therefore not valid to attribute the differences to mapping quality.
Other comments:
Line18: This data set does not include water bodies in the Eurasian Arctic. How can it be described as a more complete representation than the PeRL data?
Line19: I would recommend publicly sharing the manually interpreted data under open access, which would help the quality assessment of future water body data sets.
Line82: Do “lakes and ponds” have different meanings here?
Line83: “…about 50% of the lakes and 30% of lakes by area…” should be “…about 50% of the lakes by count and 30% of lakes by area…”?
Line97-98: For the Sentinel-2 sensors, there are three bands at the 60-m resolution among the total of 12 bands.
Line131: The MNDWI calculation requires the SWIR band, which has a resolution of 20 m for Sentinel-2 imagery. How were the different resolutions of the spectral bands handled?
Line141-143: The references for the NDWI (McFeeters, 1996) and MNDWI (Xu, 2006) were mistakenly cited.
Line155: How was the threshold of “hue <0.45” for extracting water pixels determined? Please test the threshold sensitivity for water bodies under different conditions and images.
Line161: A is the updated water frequency. What is the final mapping result of the water body data set?
Line162: How is the Sentinel-2-derived water frequency (As) derived here? By the machine learning model introduced in the following part?
Line163: The threshold setting that combines water frequency and elevation looks a little odd. What is the rationale for doing this?
Line167: “(Figure 4a)” is inserted in the wrong place.
Line190-191: Do terrain shadows have little influence on the water body mapping for areas with an elevation below 1500 m?
Line197-200: As shown in the data set, the polygons of mapped water bodies were generalized by a GIS tool. Does the simplification tolerance affect the calculation of the geometry metrics of the polygons?
Line243-246: The analyses of water body abundance (the power-law statistics), the counts of different lake size levels for the SWBI, should be added.
Figure 7a: the 5 km × 5 km grid can contain the water area >500 km2???
Line262-263: please indicate the sub-title for maps (a, b, c, and d).
Line273-280: why are the mapped water body areas mostly smaller than the manually interpreted data?
Line315-316: The numbers of SWBI water bodies for the size levels (100~1000, 10~100, 1~10, 0.1~1) are all slightly smaller than those of HydroLAKES. Why? Is this due to lake changes (disappearances) between the different mapping periods, or to mapping uncertainty?
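To illustrate the resolution question raised for Line131: MNDWI is (green − SWIR) / (green + SWIR), and for Sentinel-2 the green band (B3) is at 10 m while the SWIR band (B11) is at 20 m, so the two grids have to be brought to a common resolution first. The sketch below assumes nearest-neighbour upsampling of the SWIR band by a factor of 2; this is only one possible choice and is not known to be what the authors did. The function names and reflectance values are hypothetical.

```python
def upsample_nearest(grid, factor):
    """Repeat every cell `factor` times along both axes (nearest-neighbour resampling)."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def mndwi(green, swir):
    """MNDWI = (green - swir) / (green + swir), cell by cell; None where undefined."""
    result = []
    for g_row, s_row in zip(green, swir):
        result.append([
            (g - s) / (g + s) if (g + s) != 0 else None
            for g, s in zip(g_row, s_row)
        ])
    return result


# Hypothetical reflectances: a 4x4 green tile at 10 m and the matching 2x2 SWIR tile at 20 m.
green_10m = [[0.08, 0.08, 0.05, 0.05],
             [0.08, 0.07, 0.05, 0.06],
             [0.06, 0.06, 0.09, 0.09],
             [0.06, 0.05, 0.09, 0.10]]
swir_20m = [[0.02, 0.20],
            [0.18, 0.03]]
index_10m = mndwi(green_10m, upsample_nearest(swir_20m, 2))
```

With this choice, every 2 × 2 block of 10 m cells shares one SWIR value, so water bodies near the 0.001 km2 minimum size are effectively delineated with 20 m information in that band.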
Citation: https://doi.org/10.5194/essd-2021-278-RC2
AC2: 'Reply on RC2', Yijie Sui, 05 Jun 2022
We thank the reviewer for the encouraging comment on the value of the dataset. Indeed, having a fine-scale water body dataset would be critical for studying the surface water in the pan-Arctic region in response to the changing climate and thawing permafrost. We also thank the reviewer for pointing out the issues in the method description and data quality, particularly regarding rivers/streams in the identified water bodies. We have extensively revised the Methods section to clarify the generation of the water probability layer, the water extent detection, and the final water body identification. A new procedure was introduced to further examine whether the water bodies are small rivers and streams. Details of the changes can be found in the responses to the relevant specific comments.