This work is distributed under the Creative Commons Attribution 4.0 License.
A Six-year circum-Antarctic icebergs dataset (2018–2023)
Abstract. The distribution of Antarctic icebergs is crucial for understanding their impact on the Southern Ocean's atmosphere and physical environment, as well as their role in global climate change. Recent advancements in iceberg databases, based on remote sensing imagery and altimetry data, have led to products such as the BYU/NIC iceberg database, the Altiberg database, and high-resolution SAR-based iceberg distribution data. However, no unified database exists that integrates various iceberg scales and covers the entire Southern Ocean. Our research presents a comprehensive circum-Antarctic iceberg dataset, developed using Sentinel-1 SAR imagery from the Google Earth Engine (GEE) platform, covering the Southern Ocean south of 55°S. A semi-automated classification method that integrates incremental random forest classification with manual correction was applied to extract icebergs larger than 0.04 km², resulting in a dataset for each October from 2018 to 2023. The resulting dataset not only records the geographic coordinates and geometric attributes (area, perimeter, long axis, and short axis) of the icebergs but also provides uncertainty estimates for area and mass. The dataset reveals significant interannual variability in iceberg number and total area: the number of icebergs increased from 33,823 in 2018 to 51,332 in 2021, corresponding to major ice shelf calving events (e.g., the A68a iceberg in the Weddell Sea), followed by a decline in 2022. The annual average total iceberg area is 44,518 ± 4,800 km², and the average mass is 8,779 ± 3,029 Gt. Validation using test set samples and a rolling cross-validation of interannual variability shows that the incremental random forest classification achieves accuracy, recall, and F1 scores exceeding 0.90; after manual correction, all performance metrics are expected to be even higher.
Comparisons with existing iceberg products (including the BYU/NIC iceberg database and the Altiberg database) indicate a high consistency in spatial distribution, while our dataset demonstrates clear advantages in terms of spatial coverage, iceberg detection scale, and identification capabilities in regions with dense sea ice. This dataset serves as a novel data resource for investigating the impact of Antarctic icebergs on the Southern Ocean, the mass balance of ice sheets, the mechanisms underlying ice shelf collapse, and the response mechanisms of iceberg disintegration to climate change.
Status: open (until 20 Jul 2025)
RC1: 'Comment on essd-2025-51', Anonymous Referee #1, 23 Jun 2025
This paper presents a circum-Antarctic iceberg database using Sentinel-1 SAR images in the Google Earth Engine platform. Their image segmentation and random forest classifier seem to work successfully in capturing the spatiotemporal distributions of icebergs, including their number and sizes, across the Southern Ocean. However, the authors need to provide more details about their iceberg detection model. While the authors mentioned that they used an ensemble random forest classifier with four different RF classifiers, based on different input features, they did not provide any details about this ensemble result (i.e., weights to each classifier, importance of statistical features, histogram features, and texture features). I encourage the authors to provide the details of their ensemble process to support the robustness of their method. Please also see my detailed comments below.
L146-147: How are these three subsets divided? Randomly or by any other criteria?
L210: Maybe it would be better to use 40 m, instead of 0.04 km, as already used throughout the manuscript (L69 and L216).
L241: “Based on this analysis, we selected an average thickness of 232 m for the icebergs” -> It is not clear how this value of 232 m is derived.
L256-259: Then, does it mean that 2018 data was included in training for all iterations but not tested at all, and 2023 data was never used for training? If so, I don't think this is a fair training strategy because the model could be biased to 2018 data. Would it be better to conduct 6-fold cross-validation (or so-called Leave-One-Out cross-validation), for example, 2018 data as test data and the remaining years as training data for iteration 1, 2019 data as test data and the remaining years as training data for iteration 2, and so forth? The authors mentioned that they used this strategy to “adapt to the time-series nature of the data while minimizing the risks of overfitting” (L256), but I’m not sure how the current strategy can achieve this.
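The leave-one-year-out scheme suggested here can be sketched as follows. This is an illustrative sketch of the referee's proposal, not the authors' code; the year list and function name are hypothetical.

```python
# Hypothetical sketch of leave-one-year-out (6-fold) cross-validation:
# each year serves as the test set exactly once, with the remaining five
# years used for training, so no single year (e.g., 2018) dominates.
YEARS = [2018, 2019, 2020, 2021, 2022, 2023]

def leave_one_year_out(years):
    """Yield (train_years, test_year) splits, one per year."""
    for test_year in years:
        train_years = [y for y in years if y != test_year]
        yield train_years, test_year

folds = list(leave_one_year_out(YEARS))
# Six folds; the first tests on 2018 and trains on 2019-2023.
```

Under this scheme every year is held out once, which avoids the bias of a rolling window in which the earliest year is always in the training set and the latest year is never used for training.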
Tables 3 and 4: The authors conducted performance evaluations twice: (i) evaluation for each year (Table 3) and (ii) evaluation with rolling window validation (Table 4). I’m not sure that these two different evaluations are really necessary. To evaluate the model performance, I believe cross-validation in Table 4 is enough.
L263-264: So, what model is finally used for building the iceberg database? The database is built each year separately based on the random forest model in Table 3, or does the entire database use a single model trained from the final iteration in Table 4?
Section 4.1: The authors should have provided a detailed performance of their “ensemble” RF model. In L150-154, the authors mentioned that they used four RF classifiers and assigned weights to these classifiers, but the manuscript lacks details about this process. It is necessary to specify the performance of these four classifiers and how the authors select the weights between these models.
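For clarity, the kind of detail requested could take the form below: a minimal sketch of weighted soft voting over four classifiers, one per feature group. This is not the authors' implementation; the weights and probabilities are placeholders.

```python
# Illustrative weighted soft-voting ensemble over four RF classifiers
# (statistical, histogram, texture, and combined features).
# All numbers below are placeholders, not values from the manuscript.
def ensemble_probability(probs, weights):
    """Combine per-classifier iceberg probabilities by weighted average."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probs, weights)) / total

weights = [0.25, 0.20, 0.25, 0.30]   # placeholder per-classifier weights
probs = [0.92, 0.85, 0.88, 0.95]     # placeholder per-classifier P(iceberg)
p = ensemble_probability(probs, weights)
label = "iceberg" if p >= 0.5 else "not iceberg"
```

Reporting the actual weights and the per-classifier performance in such a scheme (and how the weights were chosen, e.g., from validation accuracy) would substantiate the ensemble's robustness.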
L300: “several tens of kilometers”: This is too ambiguous. Please provide specific numbers.
L301-303: I would like to ask the authors to provide more details about why the BYU/NIC database cannot capture so many > 5 km icebergs. Does it intentionally skip relatively small icebergs (near 5 km size), or does its iceberg detection algorithm, by itself, have limitations in capturing near-5-km icebergs? What about much larger icebergs, for example, > 10 km?
L339-349: I wonder if the total number of icebergs here and in Table 5 is the “true” number of icebergs. That is, if an iceberg is detected in two different Sentinel-1 scenes, how is this iceberg counted? This iceberg could be counted in duplicate, as the methods proposed in this study can only “detect” icebergs but cannot “track” identical icebergs. This could not be so significant because the authors used mosaiced data, but there is a possibility that the same icebergs are detected in duplicate (or some icebergs are missed) due to their drift even over a short period. It would be worthwhile to mention this issue and include any relevant discussion about it.
L347: We -> we
L355-356: “in the West Antarctic region and in the East Antarctic region” -> It would be better to only specify Thwaites and Doston ice shelves and Holmes and Mertz ice shelves, without mentioning too ambiguous “West and East Antarctic regions”.
L379-382: “In the Ross Sea sector, the iceberg proportion remained stable at around 16 % in 2018 and 2019, … remained relatively stable at approximately 20% over the six-year period.” In those sentences, the “iceberg proportion” may indicate “the number of icebergs in each sector / the number of total icebergs in the Southern Ocean.” However, I feel like this term “iceberg proportion” can be confused with “how much area (in percentage) is covered by icebergs (i.e., iceberg area / total ocean area of each sector).” Please consider rephrasing these sentences to clarify the meaning of the iceberg proportion. It could be good to discuss just the numbers (in Figure 11a), rather than the proportions (in Figure 11b).
L387: This is similar to the previous comment; please clarify the meaning of “total area.” I believe this means the total area of icebergs.
Figure 11: The caption should be corrected; I don’t think this figure includes any information about “five categories.”
L394-401: I’m not sure that this part really “validates” the small iceberg formation mechanism. The authors just present the distance from large icebergs, and it does not provide any direct clues for the small iceberg formation mechanisms. I don’t think this part is necessary.
L418: Although -> although
Citation: https://doi.org/10.5194/essd-2025-51-RC1
Model code and software
A Six-year circum-Antarctic icebergs dataset (2018-2023) Zilong Chen et al. https://doi.org/10.5281/zenodo.15332566