Dual-hemisphere sea ice thickness reference measurements from multiple data sources for evaluation and product inter-comparison of satellite altimetry
Abstract. Sea ice altimetry currently remains the primary method for estimating sea ice thickness from space; however, time series of sea ice thickness estimates are of limited use unless they have been quality-controlled against reference measurements. Such reference observations for sea ice thickness validation in the polar regions are sparse and rarely presented in a format matching the satellite-derived products. Here, the first published comprehensive collection of sea ice reference observations including freeboard, thickness, draft and snow depth from sea ice-covered regions in the Northern Hemisphere (NH) and the Southern Hemisphere (SH) is presented. The observations have been collected using airborne sensors, autonomous drifting buoys, moored and submarine-mounted upward-looking sonars, and visual observations. The data package has been prepared to match the spatial (25 km for NH and 50 km for SH) and temporal (monthly) resolutions of conventional satellite altimetry-derived sea ice thickness data products to allow a direct evaluation of these products. This data package, also known as the Climate Change Initiative (CCI) sea ice thickness (SIT) Round Robin Data Package (RRDP), was produced within the ESA CCI sea ice project. The current version of the CCI SIT RRDP covers the polar satellite altimetry era (1993–2021) and is part of ongoing efforts to keep the dataset updated. The CCI SIT RRDP has been collocated with satellite-derived sea ice thickness products from CryoSat-2, Envisat and ERS-1/2 produced within ESA CCI and the Fundamental Data Records for Altimetry (FDR4ALT) project to demonstrate the overlap and inter-comparison between the reference observations and satellite-derived products. Here, the CCI SIT RRDP is introduced along with examples of its use as a validation source for satellite altimetry products; the averaging, collocation and uncertainty methodologies are presented and their advantages and limitations discussed.
Status: final response (author comments only)
RC1: 'Comment on essd-2024-234', Alek Petty, 22 Nov 2024
The paper by Olsen et al. introduces a compiled dataset of sea ice thickness related ‘reference’ measurements (freeboard, ice draft, snow depth, sea ice thickness) from various sources towards the goal of validating satellite-derived (radar) products across both poles through an ESA Climate Change Initiative project. They aim to align the various data with monthly gridded (25/50 km) satellite grid-scales to more easily enable evaluations. The authors make the claim in the abstract (and similar statements in the main manuscript) that this is “the first published comprehensive collection of sea ice reference observations including freeboard, thickness, draft and snow depth from sea ice-covered regions in the Northern Hemisphere (NH) and the Southern Hemisphere (SH)”.
Overall, I think this was a decent effort to compile various sea ice datasets of interest, but I was ultimately disappointed with how basic the methodology was for processing the different datasets and accounting for the different uncertainties and significant differences in spatial scales (representation errors), so much so that I remain unsure how useful this ‘reference’ catalog will really be. It also didn't include a lot of the more recently available data I was expecting to see. Our community hasn’t produced an agreed-upon ‘reference’ data collection as it’s very hard to do this and be consistent with the uncertainties and include a full accounting of things like representation/sampling error, and it often depends on the exact goal of the validation effort.
If your primary goal is to bring in datasets that measure sea ice at vastly different spatial/temporal scales to convert these into ‘reference’ measurements to validate (gridded) satellite products, then you really need to consider how best to do that. I know a lot of studies just bin data into a grid-cell (myself included), but if this paper is focused on creating a reliable/useable reference processed dataset, then I think you need to acknowledge when this works and when it doesn’t and ideally explore better ways of doing that through more sophisticated statistical means.
In a lot of your results example cases, you compare one of the ‘reference’ datasets with a satellite product, observe differences between the two, then say well they are maybe different because the reference dataset has issues (e.g. related to spatial scales and how they were aggregated) …so why produce this reference dataset and use it in the first place? What’s the value of a bad reference dataset that we don't really trust?
Similarly, you treat airborne data as a ‘reference’ dataset, but I think that is very dangerous. NASA’s Operation IceBridge is great for coverage and the multi-sensor nature of the mission, but it still has a lot of issues that are frustratingly yet to be resolved, e.g. the big uncertainties in snow depth from different algorithms applied to the snow radar (King et al., 2015, Kwok et al., 2017) and significant biases between the quick-look and final snow depths (Petty et al., 2023, Fig. S3), which need to be acknowledged. I was quite surprised this wasn’t mentioned at all really.
I also think for this study to work, you should try to actually characterize the uncertainties and/or errors in a consistent way. Your effort to summarize how the uncertainties are described in the product is a decent one and I appreciated the effort you put into this. But take IceBridge, for example: you neglect all the algorithm differences I point to above, so how useful really are those individual product uncertainties?
You state that the reference data should be ‘used with care’ a few times, but to me this is the job of this study! Decide which data to remove because it is just not a trustworthy reference dataset for satellite validation, for whatever reason. It seems like a cop-out to just say use it with care.
Finally, the datasets listed as future work (IceBird, MOSAiC, Nansen Legacy) would have been great to see in this study! Again, I think this paper was neither exhaustive of all available data nor thorough in its methodology, so I encourage the authors to decide on a better strategy based on my comments above.
Specific comments:
I thought it was strange how much the intro talked about radar issues. Why not make it more about the science of why we want to measure basin-scale sea ice thickness? Then, if your focus is radar, make that clear from the start; laser creeping in sometimes was confusing. It is probably also easier to reference the papers that discuss the various issues in more detail and keep your focus on the reference datasets.
L39 – I think that’s still very much TBD and depends on the approach/freeboard used etc!
L41 – this is mixing up actual errors and theoretical uncertainties propagation which I think is confusing.
L45 – this seems like a bit of a stretch for an introduction! Do we really know that with confidence? Is that true for all types of freeboard and ice regime?
L47 – well this is really ‘a lack of uncertainty quantification data’ rather than uncertainties directly I think.
L80 onwards – ok so your aim is to reconcile radar thickness measurements. I think it would thus help to start with what you are interested in and then provide the uncertainty discussion to back that up, as before it was confusing how little you talked about laser.
The CDR is SIT, so shouldn’t thickness be the main validation target?
L135 – “this data package and the methodologies applied herein have the potential of becoming the reference for future comparisons of current and future SIT products.” This is a big claim and I don’t think you have demonstrated this potential considering all the caveats and issues, and the basic methodology (aggregation) discussed here and even in your results.
L407: How is accuracy qualitative? A little confused by that statement. I think it’s basically the same as error, no? So it requires a known truth? Whereas uncertainty can be more theoretical.
L505 ok so maybe stick with the higher number of 10 cm then?
L598: “Collocation is performed by finding all satellite data points obtained within ± 15 days from the date of the reference data, and within the 25 km (50 km for SH) grid cell of the reference coordinates. The average (arithmetic mean) of these satellite points are subsequently allocated to the reference data.” Ok so what uncertainties do we think this introduces? I think you need to provide some educated guesses at the very least.
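For reference, a minimal sketch (not the authors' code) of the collocation step as quoted above, assuming the satellite product has already been gridded and each reference point mapped to a grid cell; the DataFrame column names ('time', 'cell_id', 'sit') are illustrative assumptions:

```python
import numpy as np
import pandas as pd

def collocate(ref_time, ref_cell_id, sat, window_days=15):
    """Sketch of the collocation quoted from L598 (assumptions as noted above):
    average all satellite points within +/- window_days of the reference date
    and inside the same grid cell (25 km NH / 50 km SH) as the reference point."""
    dt = (sat["time"] - ref_time).abs()
    hits = sat[(dt <= pd.Timedelta(days=window_days)) & (sat["cell_id"] == ref_cell_id)]
    if hits.empty:
        return np.nan, 0
    # Arithmetic mean of the matched satellite points, as stated in the manuscript;
    # the number of matched points could be used to flag poorly sampled cells,
    # which speaks to the representation-error question raised above.
    return hits["sit"].mean(), len(hits)
```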
L660 – why bother comparing if you then say it’s not right to compare them? Would you have stated the same if the stats were better? Much better to state from the off which data are appropriate to compare against and why, then show how to use those!
IMB discussion – ok so there’s two things – you’re underestimating the actual uncertainties AND also not really dealing with the representation error.
“Additionally, no specific uncertainty for SD versus SIT is provided, resulting in the acoustic rangefinder sounders’ accuracy used as the uncertainty for both SD and SIT.” Why? I think you should be attempting to figure out what that should be, even if you have to make some assumptions.
References
King, J., Howell, S., Derksen, C., Rutter, N., Toose, P., Beckers, J. F., Haas, C., Kurtz, N., and Richter-Menge, J.: Evaluation of Operation IceBridge quick-look snow depth estimates on sea ice, Geophys. Res. Lett., 42, 2015GL066389, https://doi.org/10.1002/2015GL066389, 2015.
Kwok, R., Kurtz, N. T., Brucker, L., Ivanoff, A., Newman, T., Farrell, S. L., King, J., Howell, S., Webster, M. A., Paden, J., Leuschen, C., MacGregor, J. A., Richter-Menge, J., Harbeck, J., and Tschudi, M.: Intercomparison of snow depth retrievals over Arctic sea ice from radar data acquired by Operation IceBridge, The Cryosphere, 11, 2571–2593, https://doi.org/10.5194/tc-11-2571-2017, 2017.
Petty, A. A., Keeney, N., Cabaj, A., Kushner, P., and Bagnardi, M.: Winter Arctic sea ice thickness from ICESat-2: upgrades to freeboard and snow loading estimates and an assessment of the first three winters of data collection, The Cryosphere, 17, 127–156, https://doi.org/10.5194/tc-17-127-2023, 2023.
Citation: https://doi.org/10.5194/essd-2024-234-RC1
CC1: 'Deriving CS2 penetration factors with snowradar data', Robbie Mallett, 15 Jan 2025
Given what Alek has written about significant inter-product variability in snowradar data, I wanted to briefly raise a point about line 635; it’s suggested that snowradar-derived radar freeboards & snow depths can be used to “directly evaluate” the penetration depth of CryoSat-2’s SIRAL instrument.
We tried to do exactly this for some recent work, and found that the derived penetration depth depended quite strongly on the snowradar algorithm, such that we could not meaningfully achieve what the authors are suggesting in L635.
Our investigation is documented in Section 2 of the supplementary material of Nab et al. (2024). The authors are right that if there were some roughly constant CS2 “penetration factor”, then it could be estimated by regressing the difference in the CS2 & OIB radar freeboards against the coincident OIB-derived snow depths. Higher snow depths would lead to bigger mismatches in the radar freeboards, as the impact of limited penetration would grow. The rate at which the mismatch scales with snow depth would reveal the penetration factor: if the mismatch remained the same as the snow got deeper, the CS2 penetration factor would be 100%. If the mismatch grew in a 1:1 ratio with the snow depth, then the inferred CS2 penetration would be zero (i.e. operating as a laser altimeter).
When we did this, we found the regression slope is 0.21 (penetration = ~80%) for the QL product, but ~0.6 (penetration = ~40%) for the wavelet and peakiness retrackers deployed with pysnowradar. So we couldn’t estimate the penetration depth in this way, without assuming one algorithm is so good as to be “the truth”. It’s possible that there is a “best” algorithm, but I’m yet to see a convincing case made.
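For illustration, a minimal sketch of the regression just described (not the analysis code of Nab et al., 2024); the variable names are assumptions and the inputs are taken to be already collocated 1-D arrays:

```python
import numpy as np

def cs2_penetration_factor(fb_cs2, fb_oib, snow_depth):
    """Regress the CS2-minus-OIB radar-freeboard mismatch against the coincident
    OIB-derived snow depth. A slope of 0 means the mismatch does not grow with
    snow depth (full penetration, factor 1); a slope of 1 means the mismatch
    grows 1:1 with snow depth (zero penetration), so the factor is roughly 1 - slope."""
    mismatch = fb_cs2 - fb_oib
    slope, _ = np.polyfit(snow_depth, mismatch, 1)
    return 1.0 - slope

# As noted above, a slope of ~0.21 (quick-look snow depths) implies ~80 % penetration,
# while ~0.6 (wavelet/peakiness retrackers) implies ~40 %, hence the algorithm dependence.
```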
There is an alternative way of doing this where you assume penetration happens by absolute (not fractional) depth, i.e. imagine the CS2 return originating X cm below the snow surface rather than X % of the snow depth below it. This approach also leads to an unacceptable level of variability in the derived penetration depth based on the snowradar algorithm.
For what it’s worth, our inability to figure out the CS2 penetration depth with snowradar data led to our use of ULS moorings in the main part of the paper. The ULS data allowed us to calculate some penetration depths a bit more reliably. If the authors are looking for a way in which the reference measurements compiled here allow us to learn about CS2 penetration depths/factors, this is potentially a good example.
Nab, C., Mallett, R., Nelson, C., Stroeve, J., & Tsamados, M. (2024). Optimising interannual sea ice thickness variability retrieved from CryoSat‐2. Geophysical Research Letters, 51(21), e2024GL111071.
Citation: https://doi.org/10.5194/essd-2024-234-CC1
- AC3: 'Reply on CC1', Ida Olsen, 20 Feb 2025
RC2: 'Comment on essd-2024-234', Anonymous Referee #2, 26 Jan 2025
The manuscript aims to combine a range of “reference” datasets containing sea ice thickness, sea ice freeboard, snow depth and sea ice draft measurements. The amount of work that has gone into compiling this dataset is commendable. However, the dataset and the description of it require some additional work. Specifics have already been pointed out by Reviewer 1 and I have opted not to repeat those here.
Comments
A concern for the quality of the reference dataset is the introduction of unknown information into the reference dataset, e.g., by adding a timestamp to the AEM-AWI data when a time is missing. The adding of a time stamp (effectively assigning the data additional information) is then listed as very minor pre-processing. Similarly, the submarine dataset is manipulated by adding dates, yet the pre-processing level is listed as 1-3. Minor pre-processing in my opinion would be removing clearly erroneous data points in a QC of the data, not adding information. A definition of the authors' interpretation of the pre-processing flags is therefore needed in the manuscript. For section 8, it should be clearly stated that NaNs have at times been replaced with arbitrary time stamps.
The IMB-CRREL data are listed as the most important data in this study (L337-338) and have also been listed as a dataset where major pre-processing is needed and where the data should be used with care (L342). Assigning the dataset as the most important and at the same time the most unreliable raises questions about the usefulness of this entire reference dataset.
In section 3 the levels 0-3 are listed as pre-processing flags, and in section 4 levels 0-3 are listed as uncertainty flags. Using the same numbers/listings may cause confusion; a recommendation is therefore to use different numbers for the different types of levels.
How was the length of the different sensor time series selected? E.g., why are the Fram Strait wide mooring data after 2018 not included in the study (L224), why are IMB buoys included only until 2015 (L242), and why are ASSIST data included only until 2021 (L278)? On L238, why is the updated version of the data not used in this study? The updated dataset is from 2022, which is now 3 years ago.
Deformed ice appears to be defined as ice >3 m thick (L441), meaning that large parts of the MYI will fall into the deformed ice category while many FYI areas, incl. rubble fields, will be classified as level ice with this definition. Should perhaps a different thickness range have been used here? Rough ice is used in other places (e.g. L462); what are the overall sea ice classifications used in this work? Should rough be the same as deformed? For clarity, please include a sea ice type definition early on in the manuscript. I understand that there may be a large number of ice types and that there will be significant overlap between different ice types, but it's good to be able to see the definitions in one place.
Section 4.2.4. This data is based on visual observation and is very dependent on the experience of the person making the observations, yet this dataset has been given an uncertainty flag of 1, whereas data that are independent of human error, such as the airborne data, have been given an uncertainty flag of 3. Measured and quantified errors have therefore been given a higher degree of uncertainty than those whose errors are not easy (impossible?) to assess. E.g., how common is it for the estimates for the ship data to be made by an experienced observer? In addition, there is a statement on L367-369 where the ship observations show the opposing trend to what is expected; is the reason behind this the data quality? On L669 the SIT for ship observations is described by the authors as dubious. Should the ship-based observations perhaps therefore be ranked at level 3 like the airborne measurements? Section 7.5. It is great to see uncertainty in the data being discussed, though a clear and consistent definition is needed.
Section 7.1. Flight measurements may at times avoid certain ice types, such as thin/young ice and open water areas, which will affect the sea ice distribution in these datasets. The deployment of IMBs on predominantly stable (thicker) sea ice is brought up; it would be useful to also discuss the un-representativeness of the ice types in the airborne data. This may help explain the discrepancies discussed on L638-640. The sea ice in the brackish Baltic Sea has a very low salinity, in areas equivalent to fresh ice. What is the uncertainty associated with this ice type in the reference dataset?
A “reference” dataset such as the one presented could be used by other research groups outside the altimeter community. The downsampling to the 25 km NH (50 km SH) grid cell of the reference coordinates therefore makes the compiled dataset less useful. Others within the altimeter community may also want to perform this downsampling in different mathematical ways, potentially rendering the dataset less used than if the original resolution of the reference data were kept. The effect of this downsampling should also be discussed with the uncertainty assessment in mind, i.e. what uncertainty is introduced by the downsampling. The uncertainties and errors introduced to the data should be quantified and discussed in section 7.6.
Minor comments
L4-5. What is meant by format matching? The file type, the data type?
L47-48. Combining in-situ/airborne/drone-based etc. validation data with any remote sensing sensor is challenging due to the differences in temporal and spatial resolution. Snow depth is not unique in this regard, and it's unclear why this parameter is listed as uniquely challenging.
L138. Why is section 8 listed before 3,4… etc?
Table 1. ASSIST data originate not only from ice breakers, but also from ice-capable ships etc. Later in the text the terms ship (e.g. Figure 2) or support vessels are used; it would be good to be consistent throughout the manuscript. Ship would suffice.
Figure 2. How was the dominant data source assigned? >8%?
Introduce the acronyms the first time they are used; e.g., AEM is used many times without explaining that it stands for Airborne ElectroMagnetic soundings (?). EM is explained on L184.
L337-338. What makes this the most important data for this study?
L451-452. What is the uncertainty for the NPI data?
L467. Consider using months instead of summer and winter to allow for easier interpretation of the time period for the SH and NH.
L476. Is depth = draft depth?
Citation: https://doi.org/10.5194/essd-2024-234-RC2
- AC2: 'Reply on RC2', Ida Olsen, 07 Feb 2025
- AC1: 'Reply on RC1', Ida Olsen, 07 Feb 2025
Data sets
Sea ice thickness reference measurements (ESA CCI SIT RRDP) Ida Lundtorp Olsen and Henriette Skourup https://figshare.com/s/77be0cfd6842d08f1b6b
Model code and software
Code for sea ice thickness reference measurements Ida Lundtorp Olsen and Henriette Skourup https://github.com/Ida2750/ESA-CCI-RRDP-code