The authors have responded to all reviewer comments in detail and clearly have taken care to improve the manuscript, and I commend them for their hard work and attention to detail. The manuscript has become clearer in places, including the description of the datasets and the greater emphasis on validation. However, the manuscript has a few areas where it still needs significant improvement.
First, there are important terminology issues that need to be addressed. As the manuscript is currently written, it is extremely misleading, as it does things like claim that Landsat is not NRT but ICESat-2 is, and also claim that ICESat-2 observes far more lakes than Sentinel-2. These things are true in the context of the datasets that are used in this paper (i.e. GSWD for Landsat, BLUEDOT for Sentinel-2, QL data for ICESat-2) but are not true generally, and so this language needs to be updated throughout the paper to prevent the paper from reading like a fundamental misunderstanding of remote sensing data. There’s more detail in the comments below, but I’d suggest referencing the specific dataset, NOT the sensor, throughout the paper to clear up some of this confusion. Similarly, the authors refer to the GSWD-geostatistical model volume time series as the ‘Landsat’ time series in multiple places (including in Figure 3), which is also highly misleading and confusing. I am also still unsure about the value of the NRT analysis in this paper, and feel it may weaken the paper compared to the GSWD-geostatistical model dataset. Lastly, the paper is missing some much-needed discussion/results around frequency of observation and revisit times.
Major Comments
1. Problematic use of data language and descriptions
There are significant issues with how the datasets are described throughout the paper. I understand that the reason so few lakes can be measured with Sentinel-2 is because the authors are not actually performing any classification themselves, but rather using the BLUEDOT dataset. I am not asking the authors to perform this classification, but I do think that every time the authors use the term ‘Sentinel-2’ in this paper, they should replace it with ‘BLUEDOT’. It just seems very misleading to claim that ~24,000 lakes can be estimated in NRT with ICESat-2 vs. only ~4,000 lakes with Sentinel-2.
For one, this is really comparing apples to oranges: ICESat-2 has a revisit time of ~91 days, whereas Sentinel-2’s is 5 days, so time series estimated from Sentinel-2 would be much denser and more valuable. Not acknowledging this distinction throughout the paper is disingenuous. The ‘science’ value of a time series with a 91-day NRT revisit vs. a 5-day NRT revisit is wildly different, so this should be explicitly stated.
And secondly, it isn’t the actual Sentinel-2 data that are being analyzed, but rather a derived product. If the authors had actually classified all Sentinel-2 imagery over the HydroLAKES dataset, they would be able to build NRT time series for all ~170,000 lakes for which they estimated volume in the historical time series. Therefore, the results described here are quite misleading, as they greatly understate the value of Sentinel-2 and greatly overstate the utility of ICESat-2 for NRT monitoring. At the very least, along with changing the terminology, these two issues (i.e. the difference in revisit time and the fact that Sentinel-2 actually observes all lakes) should be discussed thoroughly in the discussion section.
Additionally, as noted in a specific comment below, the authors refer to the practice of deriving H-V relationships for the NRT data as ‘Landsat-ICESat-2’, but this is not a good way to describe what they are doing. They are (1) building a volume time series by combining the gap-filled JRC-GSWO (Landsat) dataset with the geostatistical model and then (2) correlating this with ICESat-2 observations to produce volume estimates from each ICESat-2 observation. This needs to be explicitly stated, and calling a V-H correlation a ‘Landsat-ICESat-2’ correlation is highly misleading, because *Landsat does not measure volume*! This is also extra confusing when the authors describe V-A correlations as ‘Landsat-Sentinel-2’ since technically those both measure area…
2. Concerns about the NRT approach
I still feel very conflicted about the NRT method. If I’m understanding it correctly, the first part of the NRT method works by assessing the correlation between the historical volume data (which itself is based on a geostatistical model) and either NRT height or area, and then simply using a lookup table to convert the NRT data into volume time series. Most studies trying to use area and height data to calculate volume are specifically building curves between area and height and then extrapolating those (which is also done in this paper), whereas the geostatistical lookup table approach (i.e. correlating area OR height with estimated volume, which itself is based on the combination of a geostatistical model and area observations), while applicable to large numbers of lakes, just seems to be overly complicated, especially as it is currently explained.
Also, I’m not sure what the value of this NRT analysis is (especially the ICESat-2 part), as it feels like it would be easier to just classify lake extent from any number of NRT optical sensors and then relate these to the geostatistical model to calculate volume if *actual* NRT data were needed, rather than trying to Frankenstein all these other datasets together, especially in a way that significantly overstates the value of ICESat-2 and understates the value of Sentinel-2. If the goal is just to produce a global dataset of consistent estimates of lake volume, why not just wait until the GSWD data are updated? What value does having an NRT observation every ~91 days actually provide (especially when you could just classify Sentinel-2 data and then have an observation for EVERY lake every 5 days)?
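To check that I am reading the method correctly, here is a minimal sketch of what I understand the height-based lookup step to be. All numbers and names (`hist_pairs`, `height_to_volume`) are invented for illustration and are not taken from the manuscript; I also assume linear interpolation with no extrapolation beyond the observed range, which may not match the authors' implementation.

```python
from bisect import bisect_right

# Hypothetical (height in m, volume in km^3) pairs for one lake: heights
# matched in time to the historical volume series (area + geostatistical model).
hist_pairs = [(101.2, 0.52), (100.4, 0.41), (102.0, 0.66),
              (101.6, 0.58), (100.9, 0.47)]

def height_to_volume(h, table):
    """Linearly interpolate a volume for height h from (height, volume) pairs."""
    pts = sorted(table)                    # sort the lookup table by height
    heights = [p[0] for p in pts]
    if h <= heights[0]:                    # below observed range: clamp
        return pts[0][1]
    if h >= heights[-1]:                   # above observed range: clamp
        return pts[-1][1]
    i = bisect_right(heights, h)           # heights[i-1] <= h < heights[i]
    (h0, v0), (h1, v1) = pts[i - 1], pts[i]
    return v0 + (v1 - v0) * (h - h0) / (h1 - h0)

# Hypothetical new height observations converted to volumes.
nrt_volumes = [height_to_volume(h, hist_pairs) for h in (100.4, 101.4, 103.0)]
```

If this is indeed the procedure, stating it this plainly in the methods (including how heights outside the historical range are handled) would make the section much easier to follow.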
Overall, I understand and appreciate the authors’ approach here – take a bunch of publicly available datasets, try putting them all together – but the end result as currently presented is misleading in places and I do question its scientific value (i.e. what will the NRT data actually be useful for). I personally think the paper and dataset would be stronger if the NRT section were removed entirely and this paper/dataset focused more on the fusion of the gap-filled GSWD data with the geostatistical model (which is a valuable contribution to the community), but I will of course leave that to the authors’ and editor’s discretion.
3. Lack of discussion of revisit times
My apologies if I’ve missed it, but it seems to me this paper is entirely missing discussion about the frequency of observations, particularly for the NRT data. Thorough discussion and presentation of results about revisit times and how often you actually get an NRT observation of each lake is absolutely vital for a reader of this paper to determine the usability/relevance of the resulting dataset. As noted above, this lack of discussion of revisit times in the results/discussion section is especially important given the huge difference between ICESat-2 and Sentinel-2 revisits. For example, Figure 3 and the section about ‘how many lakes can be monitored’ using this approach is not useful without some discussion around frequency of observation.
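As an illustration of the kind of per-lake summary I have in mind, a minimal sketch follows. The observation dates and the function name `revisit_summary` are hypothetical; the nominal counts simply follow from the revisit times quoted above, and actual usable counts will be lower (for example, because of cloud cover for optical data).

```python
from datetime import date
from statistics import median

def revisit_summary(obs_dates):
    """Return (number of observations, median gap in days) for one lake."""
    days = sorted(set(obs_dates))
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    return len(days), (median(gaps) if gaps else None)

# Nominal upper bounds on observations per year from the revisit times alone.
nominal_per_year = {"ICESat-2 (~91 d)": 365 // 91, "Sentinel-2 (5 d)": 365 // 5}

# Hypothetical NRT observation dates for a single lake.
lake_obs = [date(2021, 1, 10), date(2021, 4, 11),
            date(2021, 7, 11), date(2021, 10, 10)]
n_obs, median_gap_days = revisit_summary(lake_obs)
```

Reporting the distribution of these two numbers across all monitored lakes, per dataset, would let readers judge what ‘monitored’ means in Figure 3.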
Specific Comments:
Line 49-53: You might consider referencing/discussing GeoDAR in this paragraph, a new global reservoir and dam database that is arguably better than the databases discussed here: https://essd.copernicus.org/articles/14/1869/2022/
Line 74: There’s a mistake here – the sentence just reads “Compared to…” but presumably there should be more text there
Line 81: The phrasing “whose number soars exponentially as smaller lake sizes are considered” doesn’t make sense here, maybe replace with something like “which are exponentially more numerous than large lakes”
I appreciate the added discussion of the different ICESat-2 products.
Figure 3 (and paragraph above): When discussing what percent of lakes in a given basin can be observed using these techniques, it is imperative to note the percent of what (i.e. what is the denominator). There are millions upon millions of lakes globally, so I assume the denominator here is the 170,000 lakes whose storage dynamics were tracked, but you should be explicit about this.
Line 332: It’s meaningless to state that ‘Landsat and ICESat-2 together could measure lake water storage in nearly all rivers basins worldwide’. If ICESat-2 is only measuring a handful of lakes in a basin, is that really estimating its storage? Perhaps rephrase this sentence.
Line 332-333: When you state that extents and levels are correlated for ~1/4 of all lakes, is that because ICESat-2 only observes 1/4 of all lakes, or are there only good correlations for 1/4?
Line 325-330: I am confused here by the three complementary approaches – I think this should read (1) geostatistical model + extent (Landsat + geostatistical model + Sentinel-2), (2) geostatistical model + height (Landsat + geostatistical model + ICESat-2) and (3) extent + height (Sentinel-2 + Landsat). As written, it is confusing because you are essentially equating Landsat with the geostatistical model. Throughout the paper (for example, in Figure 3), please fix this and make it clear that when you say ‘Landsat’ you mean the GSWO dataset + the geostatistical model.
Line 359: What is GL?
Figure 5. What is the unit on the y axis?
Figure 6. What is GL?
Line 375: I’m not sure I agree that the relative agreement is more important than the absolute error. I can see what the authors mean here, but when discussing the comparison with the Tortini data, the authors should include the SMAPE/MAE to provide better validation context.
Figure 8: All lakes chosen to display in Figure 8 have very high R values, but there clearly are lakes with far worse agreement. This combined with the fact that the authors do not report any absolute error and only the R values (and then don’t show any of the worse R values) makes me somewhat question the accuracy of the results.
Line 447: It is wrong to state that ‘Landsat cannot provide NRT observations’. Landsat data IS available in NRT (at 16-day revisit); it is the JRC-GSWD product that is not NRT.
Line 466: I’m not sure it’s correct to state that the spatial resolution of SWOT is worse than Landsat and Sentinel-2. SWOT doesn’t have a spatial resolution in the same way that Landsat and Sentinel-2 do (i.e. it requires spatial averaging to produce height observations). SWOT will be able to measure lake height and area for ~6 million lakes globally, which is far more than the number of lakes observed here, so perhaps just remove this part about the spatial resolution of SWOT.
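Regarding the absolute-error metrics requested in the comments on Line 375 and Figure 8, the sketch below shows one way they could be computed alongside R. The values are invented, and the SMAPE definition used here (mean of the absolute estimate and reference in the denominator) is an assumption, since several variants exist; the authors should state which one they use.

```python
def mae(est, ref):
    """Mean absolute error, in the units of the inputs."""
    return sum(abs(e - r) for e, r in zip(est, ref)) / len(ref)

def smape(est, ref):
    """SMAPE in percent; pairs where both values are zero are skipped.

    Assumes at least one pair has a non-zero value.
    """
    terms = [abs(e - r) / ((abs(e) + abs(r)) / 2)
             for e, r in zip(est, ref) if abs(e) + abs(r) > 0]
    return 100 * sum(terms) / len(terms)

# Hypothetical reference and estimated volumes for one lake.
ref = [1.1, 1.8, 3.0]
est = [1.0, 2.0, 3.0]
good_mae, good_smape = mae(est, ref), smape(est, ref)

# A series exactly twice the reference correlates perfectly with it (R = 1),
# yet its absolute and relative errors are large.
biased = [2 * v for v in ref]
biased_mae, biased_smape = mae(biased, ref), smape(biased, ref)
```

The second case is why R alone does not convince me: a strongly biased estimate can still show near-perfect correlation.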