Articles | Volume 18, issue 1
https://doi.org/10.5194/essd-18-147-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
A six-year circum-Antarctic icebergs dataset (2018–2023)
Download
- Final revised paper (published on 06 Jan 2026)
- Supplement to the final revised paper
- Preprint (discussion started on 15 May 2025)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on essd-2025-51', Anonymous Referee #1, 23 Jun 2025
- AC1: 'Reply on RC1', Teng Li, 21 Aug 2025
- RC2: 'Comment on essd-2025-51', Anne Braakmann-Folgmann, 04 Jul 2025
- AC2: 'Reply on RC2', Teng Li, 21 Aug 2025
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Teng Li on behalf of the Authors (21 Aug 2025)
Author's response
Author's tracked changes
Manuscript
ED: Reconsider after major revisions (12 Sep 2025) by Désirée Treichler
AR by Teng Li on behalf of the Authors (26 Sep 2025)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (15 Oct 2025) by Désirée Treichler
RR by Anonymous Referee #1 (28 Oct 2025)
RR by Anonymous Referee #2 (11 Nov 2025)
ED: Publish subject to minor revisions (review by editor) (19 Nov 2025) by Désirée Treichler
AR by Teng Li on behalf of the Authors (24 Nov 2025)
Author's response
Author's tracked changes
Manuscript
ED: Publish as is (10 Dec 2025) by Désirée Treichler
AR by Teng Li on behalf of the Authors (13 Dec 2025)
This paper presents a circum-Antarctic iceberg database derived from Sentinel-1 SAR images on the Google Earth Engine platform. The image segmentation and random forest classifier appear to work well in capturing the spatiotemporal distributions of icebergs, including their number and sizes, across the Southern Ocean. However, the authors need to provide more details about their iceberg detection model. While the authors mentioned that they used an ensemble of four random forest (RF) classifiers, each based on different input features, they did not provide any details about the ensemble itself (e.g., the weights assigned to each classifier, or the relative importance of the statistical, histogram, and texture features). I encourage the authors to provide the details of their ensemble process to support the robustness of their method. Please also see my detailed comments below.
L146-147: How are these three subsets divided? Randomly or by any other criteria?
L210: Maybe it would be better to use 40 m, instead of 0.04 km, as already used throughout the manuscript (L69 and L216).
L241: “Based on this analysis, we selected an average thickness of 232 m for the icebergs” -> It is not clear how this value of 232 m is derived.
L256-259: Then, does it mean that the 2018 data were included in training for all iterations but never tested, and that the 2023 data were never used for training? If so, I don't think this is a fair training strategy, because the model could be biased toward the 2018 data. Would it be better to conduct 6-fold cross-validation (i.e., leave-one-year-out cross-validation), for example, with the 2018 data as test data and the remaining years as training data in iteration 1, the 2019 data as test data and the remaining years as training data in iteration 2, and so forth? The authors mentioned that they used the current strategy to "adapt to the time-series nature of the data while minimizing the risks of overfitting" (L256), but I'm not sure how it achieves this.
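For illustration, a minimal sketch of the suggested leave-one-year-out scheme using scikit-learn's LeaveOneGroupOut, with the acquisition year as the grouping variable (the feature matrix X, labels y, and sample counts below are placeholders, not the authors' data):

```python
# Sketch of the suggested leave-one-year-out cross-validation (2018-2023),
# with one fold per acquisition year. X, y, and years are placeholders
# standing in for the authors' labelled samples.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))                 # e.g. statistical/histogram/texture features
y = rng.integers(0, 2, size=600)               # iceberg vs. non-iceberg labels
years = np.repeat(np.arange(2018, 2024), 100)  # acquisition year of each sample

logo = LeaveOneGroupOut()                      # 6 folds: each year held out once as the test set
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, groups=years, cv=logo, scoring="f1")

for test_year, score in zip(np.unique(years), scores):
    print(f"test year {test_year}: F1 = {score:.3f}")
```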
Tables 3 and 4: The authors conducted two performance evaluations: (i) an evaluation for each year (Table 3) and (ii) an evaluation with rolling-window validation (Table 4). I'm not sure that both evaluations are really necessary; to evaluate the model performance, I believe the cross-validation in Table 4 is sufficient.
L263-264: So, which model is finally used to build the iceberg database? Is the database built for each year separately using the per-year random forest models in Table 3, or does the entire database use a single model trained in the final iteration in Table 4?
Section 4.1: The authors should provide a detailed performance evaluation of their "ensemble" RF model. In L150-154, the authors mentioned that they used four RF classifiers and assigned weights to these classifiers, but the manuscript lacks details about this process. It is necessary to specify the performance of each of the four classifiers and how the weights between them were selected.
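For reference, one common way to build such a weighted ensemble is to train one RF per feature group and weight each classifier by its validation performance. The sketch below shows the kind of detail that would be useful to report; the feature-group names, the fourth "all-features" group, the F1-proportional weighting, and the soft voting are my assumptions, not necessarily the authors' procedure:

```python
# Hypothetical weighted soft-voting ensemble of four RF classifiers,
# one per feature group, with weights proportional to validation F1.
# Data, feature groups, and weighting scheme are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 800
groups = {"statistical": slice(0, 6), "histogram": slice(6, 14),
          "texture": slice(14, 22), "all": slice(0, 22)}   # assumed feature groups
X = rng.normal(size=(n, 22))
y = rng.integers(0, 2, size=n)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

models, weights = {}, {}
for name, cols in groups.items():
    rf = RandomForestClassifier(n_estimators=200, random_state=1)
    rf.fit(X_tr[:, cols], y_tr)
    models[name] = rf
    weights[name] = f1_score(y_val, rf.predict(X_val[:, cols]))  # weight = validation F1

total = sum(weights.values())
weights = {k: w / total for k, w in weights.items()}             # normalise weights to sum to 1

def ensemble_proba(X_new):
    """Weighted average of the four classifiers' class probabilities."""
    return sum(w * models[k].predict_proba(X_new[:, groups[k]]) for k, w in weights.items())

y_pred = ensemble_proba(X_val).argmax(axis=1)
print("classifier weights:", weights)
print("ensemble validation F1:", f1_score(y_val, y_pred))
```

Reporting the per-classifier scores and the resulting weights (as printed above) would go a long way toward demonstrating the robustness of the ensemble.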
L300: “several tens of kilometers”: This is too ambiguous. Please provide specific numbers.
L301-303: I would like to ask the authors to provide more details about why the BYU/NIC database cannot capture so many > 5 km icebergs. Does it intentionally skip relatively small icebergs (near 5 km size), or does its iceberg detection algorithm, by itself, have limitations in capturing near-5-km icebergs? What about much larger icebergs, for example, > 10 km?
L339-349: I wonder if the total number of icebergs here and in Table 5 is the “true” number of icebergs. That is, if an iceberg is detected in two different Sentinel-1 scenes, how is it counted? Such an iceberg could be counted in duplicate, as the method proposed in this study can only “detect” icebergs but cannot “track” identical icebergs. This may not be very significant because the authors used mosaicked data, but there is still a possibility that the same iceberg is detected in duplicate (or that some icebergs are missed) due to drift, even over a short period. It would be worthwhile to mention this issue and include any relevant discussion of it.
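If drift between overlapping acquisitions is small, one simple mitigation would be to merge detections whose centroids fall within a distance threshold. A minimal sketch, in which the centroid coordinates, the projected-kilometre units, and the 5 km threshold are illustrative assumptions rather than values from the manuscript:

```python
# Hypothetical deduplication of iceberg detections: merge detections whose
# centroids (here in projected km, e.g. polar stereographic) lie within a
# chosen distance threshold, treating them as the same iceberg.
import numpy as np
from scipy.spatial import cKDTree

def dedupe_detections(centroids_km, threshold_km=5.0):
    """Return one representative index per cluster of nearby detections."""
    tree = cKDTree(centroids_km)
    pairs = tree.query_pairs(r=threshold_km)   # all index pairs closer than the threshold

    # Union-find to group detections linked by any close pair.
    parent = list(range(len(centroids_km)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in pairs:
        parent[find(i)] = find(j)

    clusters = {}
    for idx in range(len(centroids_km)):
        clusters.setdefault(find(idx), []).append(idx)
    return [members[0] for members in clusters.values()]

# Toy example: detections 0 and 1 are 2 km apart and merge into one iceberg.
centroids = np.array([[0.0, 0.0], [2.0, 0.0], [50.0, 50.0]])
print(dedupe_detections(centroids))   # -> [0, 2], i.e. two unique icebergs
```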
L347: We -> we
L355-356: “in the West Antarctic region and in the East Antarctic region” -> It would be better to specify only the Thwaites and Dotson ice shelves and the Holmes and Mertz ice shelves, without mentioning the overly ambiguous “West and East Antarctic regions”.
L379-382: “In the Ross Sea sector, the iceberg proportion remained stable at around 16 % in 2018 and 2019, … remained relatively stable at approximately 20 % over the six-year period.” In these sentences, “iceberg proportion” presumably means the number of icebergs in each sector divided by the total number of icebergs in the Southern Ocean. However, this term can be confused with the fraction of area covered by icebergs (i.e., iceberg area / total ocean area of each sector). Please consider rephrasing these sentences to clarify the meaning of the iceberg proportion. It might be better to discuss just the numbers (Figure 11a) rather than the proportions (Figure 11b).
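To make the distinction explicit, the two readings for a sector s could be written as follows (notation mine, not taken from the manuscript):

```latex
% Count-based proportion (apparently what the manuscript means):
P_{s}^{\mathrm{count}} = \frac{N_{s}}{\sum_{s'} N_{s'}}
\qquad \text{vs.} \qquad
% Area-coverage reading that some readers might assume:
P_{s}^{\mathrm{area}} = \frac{A_{s}^{\mathrm{iceberg}}}{A_{s}^{\mathrm{ocean}}}
% N_s: number of icebergs detected in sector s;
% A_s^iceberg: total iceberg area in sector s;
% A_s^ocean: ocean area of sector s.
```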
L387: This is similar to the previous comment; please clarify the meaning of “total area.” I believe this means the total area of icebergs.
Figure 11: The caption should be corrected; I don’t think this figure includes any information about “five categories.”
L394-401: I’m not sure that this part really “validates” the small-iceberg formation mechanism. The authors only present the distances from large icebergs, which does not provide any direct evidence for the formation mechanism of small icebergs. I don’t think this part is necessary.
L418: Although -> although