This work is distributed under the Creative Commons Attribution 4.0 License.
A Six-year circum-Antarctic icebergs dataset (2018–2023)
Abstract. The distribution of Antarctic icebergs is crucial for understanding their impact on the Southern Ocean's atmosphere and physical environment, as well as their role in global climate change. Recent advancements in iceberg databases, based on remote sensing imagery and altimetry data, have led to products such as the BYU/NIC iceberg database, the Altiberg database, and high-resolution SAR-based iceberg distribution data. However, no unified database exists that integrates various iceberg scales and covers the entire Southern Ocean. Our research presents a comprehensive circum-Antarctic iceberg dataset, developed using Sentinel-1 SAR imagery from the Google Earth Engine (GEE) platform and covering the Southern Ocean south of 55°S. A semi-automated classification method that integrates incremental random forest classification with manual correction was applied to extract icebergs larger than 0.04 km², resulting in a dataset for each October from 2018 to 2023. The dataset not only records the geographic coordinates and geometric attributes (area, perimeter, long axis, and short axis) of the icebergs but also provides uncertainty estimates for area and mass. It reveals significant interannual variability in iceberg number and total area: the number of icebergs increased from 33,823 in 2018 to 51,332 in 2021, corresponding to major ice shelf calving events (e.g., the A68a iceberg in the Weddell Sea), followed by a decline in 2022. The annual average total iceberg area is 44,518 ± 4,800 km², and the average mass is 8,779 ± 3,029 Gt. Validation using test-set samples and a rolling cross-validation across years shows that the incremental random forest classification achieves accuracy, recall, and F1 scores exceeding 0.90; after manual correction, all performance metrics are expected to be even higher. Comparisons with existing iceberg products (including the BYU/NIC iceberg database and the Altiberg database) indicate high consistency in spatial distribution, while our dataset demonstrates clear advantages in spatial coverage, iceberg detection scale, and identification capability in regions with dense sea ice. This dataset serves as a novel resource for investigating the impact of Antarctic icebergs on the Southern Ocean, the mass balance of ice sheets, the mechanisms underlying ice shelf collapse, and the response of iceberg disintegration to climate change.
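For readers who want to see the data-access step in practice, a minimal sketch of assembling an October Sentinel-1 EW mosaic south of 55°S with the GEE Python API is given below. It is illustrative only and not the authors' pipeline; the HH-polarisation filter, the plain `mosaic()` composite, and the example year are assumptions.

```python
# Minimal sketch (not the authors' pipeline): assemble an October Sentinel-1 EW
# mosaic south of 55 deg S with the Google Earth Engine Python API.
# The HH-polarisation filter and the plain mosaic() composite are assumptions.
import ee

ee.Initialize()

# Southern Ocean south of 55 deg S (planar rectangle, geodesic=False).
southern_ocean = ee.Geometry.Rectangle([-180, -90, 180, -55], None, False)

s1_october = (
    ee.ImageCollection('COPERNICUS/S1_GRD')
    .filterDate('2018-10-01', '2018-11-01')            # October of one example year
    .filterBounds(southern_ocean)
    .filter(ee.Filter.eq('instrumentMode', 'EW'))       # extra-wide swath mode
    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'HH'))
    .select('HH')
)

# Composite all October scenes into a single backscatter mosaic.
october_mosaic = s1_october.mosaic()
print('Scenes in composite:', s1_october.size().getInfo())
```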
Status: open (until 20 Jul 2025)
RC1: 'Comment on essd-2025-51', Anonymous Referee #1, 23 Jun 2025
This paper presents a circum-Antarctic iceberg database using Sentinel-1 SAR images in the Google Earth Engine platform. Their image segmentation and random forest classifier seem to work successfully in capturing the spatiotemporal distributions of icebergs, including their number and sizes, across the Southern Ocean. However, the authors need to provide more details about their iceberg detection model. While the authors mentioned that they used an ensemble random forest classifier with four different RF classifiers, based on different input features, they did not provide any details about this ensemble result (i.e., weights to each classifier, importance of statistical features, histogram features, and texture features). I encourage the authors to provide the details of their ensemble process to support the robustness of their method. Please also see my detailed comments below.
L146-147: How are these three subsets divided? Randomly or by any other criteria?
L210: Maybe it would be better to use 40 m, instead of 0.04 km, as already used throughout the manuscript (L69 and L216).
L241: “Based on this analysis, we selected an average thickness of 232 m for the icebergs” -> It is not clear how this value of 232 m is derived.
L256-259: Then, does it mean that 2018 data was included in training for all iterations but not tested at all, and 2023 data was never used for training? If so, I don't think this is a fair training strategy because the model could be biased to 2018 data. Would it be better to conduct 6-fold cross-validation (or so-called Leave-One-Out cross-validation), for example, 2018 data as test data and the remaining years as training data for iteration 1, 2019 data as test data and the remaining years as training data for iteration 2, and so forth? The authors mentioned that they used this strategy to “adapt to the time-series nature of the data while minimizing the risks of overfitting” (L256), but I’m not sure how the current strategy can achieve this.
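For concreteness, the leave-one-year-out scheme suggested in this comment can be written with scikit-learn's LeaveOneGroupOut, using the acquisition year as the group label. The sketch below illustrates the reviewer's suggestion; it is not the evaluation actually reported in the manuscript.

```python
# Sketch of the leave-one-year-out cross-validation suggested by the reviewer:
# each year (2018-2023) is held out once while the remaining years train the model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

def leave_one_year_out(X, y, years):
    """X: feature matrix, y: iceberg/non-iceberg labels, years: year of each sample."""
    scores = {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=years):
        held_out_year = int(np.unique(years[test_idx])[0])
        rf = RandomForestClassifier(n_estimators=200, random_state=0)
        rf.fit(X[train_idx], y[train_idx])
        scores[held_out_year] = f1_score(y[test_idx], rf.predict(X[test_idx]))
    return scores  # one F1 score per held-out year, e.g. {2018: ..., 2019: ..., ...}
```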
Tables 3 and 4: The authors conducted performance evaluations twice: (i) evaluation for each year (Table 3) and (ii) evaluation with rolling window validation (Table 4). I’m not sure that these two different evaluations are really necessary. To evaluate the model performance, I believe cross-validation in Table 4 is enough.
L263-264: So, what model is finally used for building the iceberg database? The database is built each year separately based on the random forest model in Table 3, or does the entire database use a single model trained from the final iteration in Table 4?
Section 4.1: The authors should have provided a detailed performance of their “ensemble” RF model. In L150-154, the authors mentioned that they used four RF classifiers and assigned weights to these classifiers, but the manuscript lacks details about this process. It is necessary to specify the performance of these four classifiers and how the authors select the weights between these models.
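As background to this request, a weighted soft-voting ensemble of random forests trained on separate feature groups might look like the sketch below. The feature groups, the equal weights, and the hand-rolled soft voting are assumptions made for illustration, not the authors' configuration; the manuscript under review does not specify these details.

```python
# Illustrative sketch of a weighted ensemble of random forests, one per feature
# group, with soft voting. The feature groups and weights are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature groups (column indices into the feature matrix X).
feature_groups = {
    'statistical': [0, 1, 2, 3],
    'histogram':   [4, 5, 6, 7, 8],
    'texture':     [9, 10, 11, 12],
    'all':         list(range(13)),
}
# Hypothetical weights; in practice they could be set from per-classifier
# validation scores, which is the information the review asks to be reported.
weights = {'statistical': 0.25, 'histogram': 0.25, 'texture': 0.25, 'all': 0.25}

def fit_ensemble(X_train, y_train, random_state=0):
    """Train one random forest per feature group."""
    models = {}
    for name, cols in feature_groups.items():
        rf = RandomForestClassifier(n_estimators=200, random_state=random_state)
        rf.fit(X_train[:, cols], y_train)
        models[name] = rf
    return models

def predict_ensemble(models, X):
    """Weighted average of per-classifier class probabilities (soft voting)."""
    proba = sum(weights[name] * models[name].predict_proba(X[:, cols])
                for name, cols in feature_groups.items())
    return proba.argmax(axis=1)
```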
L300: “several tens of kilometers”: This is too ambiguous. Please provide specific numbers.
L301-303: I would like to ask the authors to provide more details about why the BYU/NIC database cannot capture so many > 5 km icebergs. Does it intentionally skip relatively small icebergs (near 5 km size), or does its iceberg detection algorithm, by itself, have limitations in capturing near-5-km icebergs? What about much larger icebergs, for example, > 10 km?
L339-349: I wonder if the total number of icebergs here and in Table 5 is the “true” number of icebergs. That is, if an iceberg is detected in two different Sentinel-1 scenes, how is this iceberg counted? This iceberg could be counted in duplicate, as the methods proposed in this study can only “detect” icebergs but cannot “track” identical icebergs. This could not be so significant because the authors used mosaiced data, but there is a possibility that the same icebergs are detected in duplicate (or some icebergs are missed) due to their drift even over a short period. It would be worthwhile to mention this issue and include any relevant discussion about it.
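One geometric way to screen for the double counting raised in this comment is to compare detections from different scenes and flag pairs with high intersection-over-union. The GeoPandas sketch below illustrates that idea under a hypothetical table schema (columns 'geometry' and 'scene_id'); it is not part of the authors' workflow, and drifting icebergs would still require tracking rather than a purely geometric test.

```python
# Illustrative check for duplicate detections in overlapping scenes: pairs of
# polygons from different scenes whose intersection-over-union (IoU) exceeds a
# threshold are flagged as likely the same iceberg. Hypothetical schema only.
import geopandas as gpd

def flag_duplicates(icebergs: gpd.GeoDataFrame, iou_threshold: float = 0.5):
    """icebergs needs columns 'geometry' and 'scene_id' (assumed schema)."""
    pairs = gpd.sjoin(icebergs, icebergs, predicate='intersects')
    duplicates = []
    for i, row in pairs.iterrows():
        j = row['index_right']
        if i >= j or icebergs.loc[i, 'scene_id'] == icebergs.loc[j, 'scene_id']:
            continue  # skip self-pairs and detections from the same scene
        a, b = icebergs.geometry[i], icebergs.geometry[j]
        iou = a.intersection(b).area / a.union(b).area
        if iou > iou_threshold:
            duplicates.append((i, j))
    return duplicates
```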
L347: We -> we
L355-356: “in the West Antarctic region and in the East Antarctic region” -> It would be better to only specify Thwaites and Dotson ice shelves and Holmes and Mertz ice shelves, without mentioning the too-ambiguous “West and East Antarctic regions”.
L379-382: “In the Ross Sea sector, the iceberg proportion remained stable at around 16 % in 2018 and 2019, … remained relatively stable at approximately 20% over the six-year period.” In those sentences, the “iceberg proportion” may indicate “the number of icebergs in each sector / the number of total icebergs in the Southern Ocean.” However, I feel like this term “iceberg proportion” can be confused with “how much area (in percentage) is covered by icebergs (i.e., iceberg area / total ocean area of each sector).” Please consider rephrasing these sentences to clarify the meaning of the iceberg proportion. It could be good to discuss just the numbers (in Figure 11a), rather than the proportions (in Figure 11b).
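Spelled out, the two readings of “iceberg proportion” that this comment contrasts are, for a sector $s$:

```latex
% (1) share of the total iceberg count, (2) fractional area coverage of the sector
P_{\text{count}}(s) = \frac{N_{\text{icebergs}}(s)}{N_{\text{icebergs, total}}},
\qquad
P_{\text{coverage}}(s) = \frac{A_{\text{icebergs}}(s)}{A_{\text{ocean}}(s)}
```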
L387: This is similar to the previous comment; please clarify the meaning of “total area.” I believe this means the total area of icebergs.
Figure 11: The caption should be corrected; I don’t think this figure includes any information about “five categories.”
L394-401: I’m not sure that this part really “validates” the small iceberg formation mechanism. The authors just present the distance from large icebergs, and it does not provide any direct clues for the small iceberg formation mechanisms. I don’t think this part is necessary.
L418: Although -> although
Citation: https://doi.org/10.5194/essd-2025-51-RC1
RC2: 'Comment on essd-2025-51', Anne Braakmann-Folgmann, 04 Jul 2025
The research article “A Six-year circum-Antarctic icebergs dataset (2018-2023)” presents a novel and valuable dataset of iceberg population, distribution and area estimates for October in six consecutive years covering the whole Southern Ocean south of 55 deg (wherever Sentinel 1 EW data is available). It is the first study to include icebergs of all sizes with a minimum of 0.04 km2 and covering both open water and sea ice. Therefore, I consider this study novel, innovative and valuable for many downstream applications and future studies and recommend publication after some minor revisions listed below:
General: On Zenodo, where the data is published, there is one section specifically for iceberg detection code and the iceberg sample set, but not for the iceberg vector outlines, which are the main dataset. I would suggest adding a paragraph on them explaining what the data contains and what units each variable comes in! Ideally, the units should also be added to the header within the dataset (e.g. area [km^2] rather than just area) or there should be a readme file with the same information added to the iceberg vector outlines zip file for ease of use.
L9/10: You don’t mention mass as a geometric attribute, but that there is an uncertainty estimate for mass. As mass is not directly derived from the data, I would either leave it out or explain that mass is derived using a constant thickness and density.
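A quick consistency check illustrates this point that mass is a derived quantity rather than a geometric attribute: taking the abstract's mean total area (44,518 km²), the 232 m mean thickness quoted at L241, and assuming a typical bulk iceberg density of about 850 kg m⁻³ (the density value is not stated in this excerpt), the abstract's mean mass is recovered:

```latex
% Back-of-the-envelope check; 850 kg m^{-3} is an assumed typical iceberg density.
M \approx \rho\, A\, H
  \approx 850\ \mathrm{kg\,m^{-3}} \times 44{,}518\ \mathrm{km^{2}} \times 232\ \mathrm{m}
  \approx 8.8 \times 10^{15}\ \mathrm{kg} \approx 8{,}800\ \mathrm{Gt}
```

which is consistent with the 8,779 Gt average mass quoted in the abstract.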
L12: The statement that this is related to A68 is not clearly backed up by your analysis or discussed in the paper. Either leave out or add more discussion
Table 1: I would suggest adding the studies by Wesche and Dierking and Barbat to the table
Figure 1: This is a nice plot and clearly motivates why you picked October. However, did you just pick one location (indicated by the coordinates in the legend) for each surface type? And did e.g. the iceberg not move? From the text it is not clear at all whether this analysis was based on 1 pixel, 1 area or how many samples (area and number of images, locations) were used. Please explain.
Figure 2: How does the iceberg classification result impact your iceberg thickness calculation? Isn’t it solely based on altiberg? And the area/perimeter is independent of thickness?! So, I would suggest two parallel processing chains and merging them only for the mass (if I understand correctly).
L93: I assume most places are covered by several Sentinel 1 scenes within 1 month. How do you select which scenes to use and how do you ensure that icebergs are not missed or counted twice when they drift between scenes that are up to 30 days apart?
I am missing an explanation somewhere in your methods how you define each iceberg object. My understanding is that you classify each superpixel into iceberg or not and then do a manual correction. When do you merge neighbouring superpixels that were classified as iceberg into one iceberg? And have you tested how far apart two icebergs need to be to separate them? Or does each superpixel need manual redrawing of the outline anyway before it becomes an iceberg?
L145: What do you mean by sample points here? Are these the superpixels derived by SLIC? Or individual pixels? Or merged icebergs?
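For orientation, the sketch below shows how SLIC superpixels are typically produced from a backscatter image with scikit-image and how adjacent superpixels labelled as iceberg could then be merged into single objects via connected components. The parameter values and the merging step are illustrative assumptions rather than the authors' settings, and the random forest classifier is replaced by a placeholder callable.

```python
# Sketch of SLIC superpixel generation and merging of adjacent iceberg-labelled
# superpixels into single objects. Parameter values (n_segments, compactness)
# and the connected-component merge are illustrative assumptions only.
import numpy as np
from skimage.segmentation import slic
from skimage.measure import label, regionprops

def icebergs_from_superpixels(backscatter, is_iceberg):
    """backscatter: 2-D SAR image; is_iceberg: callable mapping a superpixel's
    pixel values to True/False (stand-in for the trained classifier)."""
    # 1. Over-segment the image into superpixels.
    segments = slic(backscatter, n_segments=5000, compactness=0.1,
                    channel_axis=None, start_label=1)

    # 2. Build a binary iceberg mask from per-superpixel decisions.
    mask = np.zeros_like(backscatter, dtype=bool)
    for seg_id in np.unique(segments):
        pixels = backscatter[segments == seg_id]
        if is_iceberg(pixels):
            mask[segments == seg_id] = True

    # 3. Merge touching iceberg superpixels into connected objects and measure
    #    their geometry (values are in pixels here; scale by the pixel size).
    objects = regionprops(label(mask))
    return [(obj.area, obj.perimeter, obj.major_axis_length, obj.minor_axis_length)
            for obj in objects]
```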
L178/179: How do you identify which icebergs are counted twice? Most have rather generic shapes or can rotate and break up in between.
L182: As you use a constant thickness and density for all icebergs, I think it would be better to just assign those parameters to individual icebergs that are actually derived from the data (i.e. area, perimeter, axes, coordinates) and only use the thickness and density to calculate the overall mass of icebergs in each year. For this application your assumptions seem fair and some of the uncertainty will average out, whereas the smaller bergs will certainly be thinner than you assume and some giant bergs will be thicker, so assigning the average thickness to each berg seems like an unnecessary stretch.
L301-303: I am surprised that BYU/NIC miss so many icebergs. Are most of the ones missing from their database around the threshold of 5 km? Or are you sure you weren’t counting some double? Or accidentally merged two smaller bergs into one bigger one?
L332: Small icebergs are more influenced by wind, not by currents.
L350-352: It does not make sense to analyse trends in mass if your thickness and densities are constant. You can analyse trends in area, but the mass is just a multiple of your area, so just leave this section out.
Figure 8: Very nice figure!
L417: Thickness and density will also depend on the calving location/mother ice shelf (Dowdeswell and Bamber, 2007; Ligtenberg et al., 2011).
Figure 10: Add a comment that the y axis in c starts at 80 % - it’s easy to miss
Citation: https://doi.org/10.5194/essd-2025-51-RC2
Model code and software
A Six-year circum-Antarctic icebergs dataset (2018-2023) Zilong Chen et al. https://doi.org/10.5281/zenodo.15332566
Viewed
- HTML: 263
- PDF: 49
- XML: 14
- Total: 326
- BibTeX: 11
- EndNote: 16