This work is distributed under the Creative Commons Attribution 4.0 License.
SAR Image Semantic Segmentation of Typical Oceanic and Atmospheric Phenomena
Abstract. The ocean surface exhibits a variety of oceanic and atmospheric phenomena. Automatically detecting and identifying these phenomena is crucial for understanding oceanic dynamics and ocean-atmosphere interactions. In this study, we select 2,383 Sentinel-1 WV mode images and 2,628 IW mode sub-images to construct a semantic segmentation dataset that includes 12 typical oceanic and atmospheric phenomena. Each phenomenon is represented by approximately 400 sub-images, resulting in a total of 5,011 images. The images in this dataset have a resolution of 100 meters and dimensions of 256 × 256 pixels. We propose a modified Segformer model to semantically segment these multiple categories of oceanic and atmospheric phenomena. Experimental results show that the modified Segformer model achieves an average Dice coefficient of 80.98 %, an average IoU of 70.32 %, and an overall accuracy of 87.13 %, demonstrating robust segmentation performance on typical oceanic and atmospheric phenomena in SAR images.
Status: final response (author comments only)
RC1: 'Comment on essd-2024-222', Anonymous Referee #1, 07 Aug 2024
I appreciate the effort and work put into this manuscript, which focuses on constructing a dataset of SAR images annotated for 12 types of oceanic and atmospheric phenomena and developing a deep learning model to segment these phenomena. The paper addresses a significant topic and provides valuable contributions. However, several areas need to be addressed to improve the overall quality and clarity of the manuscript.
1. Dataset
1) Ground Truth Determination: The criteria for determining the boundaries of each phenomenon are not clearly defined. For example, the internal wave is identified by its wave crest lines, while the pure ocean wave includes both wave crest lines and surrounding seawater. The boundary size for eddies is not clearly defined, and typically, eddies detected by SAR are accompanied by biological slicks, which are not considered in the dataset. The sea ice regions in the images seem to include ice leads, yet the entire area is labeled as sea ice. Additionally, the separation between low wind speed areas and biological slicks or oil spills is not clearly explained. A more rigorous and transparent method for defining these boundaries is needed.
2) Sample Diversity: Compared to the Sentinel-1 SAR dataset by Wang et al. (2019), this manuscript adds internal waves and eddies, but uses the IW mode data, which primarily captures nearshore areas. Internal waves and eddies, particularly eddies, typically occur offshore. Using IW mode data limits the representation of these phenomena. Additionally, the manuscript mentions using 484 IW images to select samples of internal waves and eddies, but it is unclear where these images are located, how representative they are, and the criteria for their selection.
3) Geographical Information: Oceanic and atmospheric phenomena vary significantly with the scale and the region of occurrence. The TenGeo-SAR dataset (Wang et al., 2019), on which this study builds, does not provide geographical information, making it difficult for readers or users to assess the representativeness of the images. Except for the final rainfall image, the manuscript does not provide geographical coordinates for all the SAR images, hindering reproducibility and assessment of the data’s representativeness.
2. Deep Learning Model
1) Metric Calculation: Many of the selected phenomena, such as fronts, internal waves, and icebergs, are significantly smaller in pixel count compared to the background (seawater). The manuscript does not exclude the background when calculating metrics, leading to potentially inflated performance scores. A model that outputs only seawater would achieve high scores under these conditions. A more accurate evaluation would exclude the background from metric calculations.
2) Segmentation Accuracy: The manuscript includes full SAR images for testing the model, which is commendable. However, in the case of internal wave extraction (Figures 12 and 13), several rain cells are visible but not identified by the model. The manuscript should include a comparison with ground truth and corresponding metrics for all phenomena. Additionally, the rationale for selecting only internal waves and rain cells for demonstration should be clarified (Section 4.4). The use of GPM half-hour rainfall data introduces temporal and spatial discrepancies with SAR imaging, which should be acknowledged. Figure 16 illustrates a noticeable discrepancy in the center location of the rainfall. It is recommended to define the criteria for identifying rain cells and directly compare them with ground truth rather than relying on GPM data.
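To make the background-exclusion point in (1) concrete, per-class Dice and IoU that skip the seawater class could be computed as follows. This is a minimal illustrative sketch, not the authors' code; the function name, class indexing, and toy masks are assumptions.

```python
import numpy as np

def per_class_metrics(pred, truth, n_classes, background=0):
    """Mean Dice and IoU over foreground classes only.

    pred, truth: integer label masks of identical shape.
    background: class index excluded from the averages.
    """
    dices, ious = [], []
    for c in range(n_classes):
        if c == background:
            continue
        p, t = pred == c, truth == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        if union == 0:                # class absent in both masks: skip
            continue
        dices.append(2 * inter / (p.sum() + t.sum()))
        ious.append(inter / union)
    return float(np.mean(dices)), float(np.mean(ious))

# Toy case: a predictor that outputs only "seawater" (class 0).
truth = np.zeros((8, 8), dtype=int)
truth[2:4, 2:4] = 1                   # a small 4-pixel phenomenon
pred_all_background = np.zeros_like(truth)

dice, iou = per_class_metrics(pred_all_background, truth, n_classes=2)
print(dice, iou)                      # 0.0 0.0 once background is excluded
```

Note that the same all-seawater prediction would still score 60/64 ≈ 93.8 % overall pixel accuracy when the background is counted, which is the inflation the comment above describes.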
Others
1) Geographical Coordinates and Imaging Time: Remote sensing images should include geographical coordinates and imaging time, which are crucial in geoscience research.
2) Terminology and Labeling: On line 289, ‘individual’ might not be accurate; it is a group approaching the shore (Figure 13b).
3) The abbreviation IW is ambiguous and can refer to both Sentinel-1 imaging mode and internal waves.
4) The term BG in the figures is not explained in the text.
5) Units and numbers should have a space in between (e.g., lines 348 and 349, page 14).
This manuscript presents valuable work on SAR image segmentation of oceanic and atmospheric phenomena. Addressing the issues mentioned above will significantly enhance the manuscript’s clarity, rigor, and impact. I look forward to seeing the revised version.
Citation: https://doi.org/10.5194/essd-2024-222-RC1
RC2: 'Comment on essd-2024-222', Anonymous Referee #2, 18 Oct 2024
Overview:
The manuscript focuses on using Sentinel-1 Synthetic Aperture Radar (SAR) images to create a dataset for semantic segmentation of typical oceanic and atmospheric phenomena. The authors construct a dataset using 2,383 Wave mode (WV) images and 2,628 Interferometric Wide swath mode (IW) sub-images, focusing on 12 phenomena (including pure ocean waves, wind streaks, micro-scale convection, rainfall, low wind areas, biological slicks, atmospheric fronts, oceanic fronts, icebergs, sea ice, ocean eddies, and internal waves). The images are manually annotated and prepared for deep learning tasks.
The authors modify the Segformer deep learning model to enhance segmentation accuracy by incorporating advanced features like Atrous Spatial Pyramid Pooling (ASPP) and Coordinate Attention (CA) modules. The authors claim that the proposed model achieves an overall accuracy of 87.13%, outperforming other models (like U-Net and DeepLabV3+ in most tasks), especially in detecting small-scale phenomena like icebergs and eddies. However, it is unclear if these results are influenced by hyperparameter tuning or the preprocessing of the images (e.g. poor image contrast).
The manuscript demonstrates the model's improved performance in segmenting multiple phenomena and verifies and validates the model against held-out data and relative to other technologies (GPM).
Major comments:
- The manuscript focuses on the validation of a machine-learning model. This seems to be beyond the scope of this journal and I recommend the work be submitted elsewhere. The manuscript also creates a dataset - this is relevant to the journal; however, the dataset details are not sufficient to make it useful to others in the community. For example, how were the images selected? Are these images representative of the population? How many examples of each class are available? Is that number sufficient statistically (or enough for developing AI/ML models)? How was the labeling done (what were the guidelines)?
- A major limitation of the dataset and model is that multiple phenomena cannot co-occur in one pixel. However, several statements in the manuscript suggest that multi-tagging is possible, which is confusing.
- More references and clarifications are needed. The manuscript is difficult to follow because the statements are ambiguous. Several examples listed in the specific comments below identify statements that could be improved. Mainly the Introduction and Conclusion could use more references to provide the appropriate context.
- All Figures and Tables should be referenced in the text.
- The major finding is that the modified Segformer model performs better than the other models. However, it is unclear how hyperparameter tuning would influence the results.
Specific comments:
L12 - Are you referring to the ocean surface as observed by SAR? Otherwise, it is obvious that the ocean surface exhibits ocean phenomena.
L13 - Why is automatically detecting phenomena crucial? The statement is unjustified. Are you referring to the size of the SAR data?
L18 - Maybe some readers are unfamiliar with the “average dice coefficient” so these 2 statements might not be the best way to communicate the results. I suggest generalizing your findings in the abstract better.
L22 - change matter to mass and add a reference or two.
L27 - why is remote sensing efficient? It covers space well but not time! It is expensive to develop, launch, and maintain a satellite system and its datasets.
L30-31: add a reference or two to support this statement.
L35-38: add a reference or two to support this statement.
L44: numerous studies - but only one is listed. I suggest adding more references here.
L47: good motivation - but which studies are you referring to? Add references.
L59: “we think that using only 100 manually annotated samples for each phenomenon is insufficient to achieve the best segmentation results”
-> This may be so, but in scientific journals, opinions should be supported by evidence or by prior objective studies. Please revise.
L65 and L69-71 are inconsistent. The goals and the sections should be consistent.
It seems the development of the AI/ML model might not be within the scope of this journal.
L78 - Why are these phenomena “typical”? I expect that their occurrence in the open ocean is rare.
L82- wind streaks
L95 - How were the 2383 WV images selected? How does this sample affect your results?
L97 Figure 2 - the contrast is poor and it is difficult to view the phenomena. I question if the pre-processing described in Wang et al., (2019) (referenced in L103) was applied correctly.
L101 - How were the 484 IW images selected? How does this sample affect your results?
L123 - 8bit vs 16bit - Why is this distinction important? How does the digit precision impact the results? This information is distracting if it is unimportant.
L122 -256x256 subimages - Why?
This window size (256 × 100 m ≈ 25.6 km) might not resolve all phenomena. This spatial scale limits the model development and output by not considering features larger than ~25 km.
L131 Figure 5:
- Label all subpanels.
- Describe all subpanels in the text.
- BG - I guess that means background?
- (a) This is likely not an atmospheric front but rather an atmospheric gravity wave.
- (c) Regions of slicks are also low wind areas. How would this multi-tagging influence the results? This seems to be a potential issue with this approach.
- (c, third one from the left) The POW might be WS, but the contrast makes it difficult to decipher.
L136 “improved” relative to what? This statement seems to be comparing this approach to a previous approach. Please clarify or revise the statement.
L155 “better utilize contextual information” - Maybe this is true but it is not common knowledge. I suggest adding references to support these claims of why the model setup is more appropriate.
L172 The idea of augmentations is good. However, how the augmentations are implemented is unclear. My concern is that some of the augmentations might not be feasible depending on the physical environment and satellite trajectories.
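To illustrate this concern: augmentations that flip an image and its mask together are geometrically consistent, yet can still be physically implausible for orientation-dependent phenomena. The sketch below is illustrative only (random arrays stand in for a SAR sub-image and its label mask; it is not the authors' pipeline).

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((256, 256))        # stand-in for a SAR sub-image
mask = (image > 0.9).astype(int)      # stand-in for its label mask

def flip_pair(img, msk, axis):
    """Flip image and mask together so labels stay aligned."""
    return np.flip(img, axis=axis), np.flip(msk, axis=axis)

# Geometrically valid, and harmless for isotropic targets such as
# icebergs. But for orientation-dependent phenomena (wind streaks,
# internal-wave packets propagating shoreward), a flip reverses the
# apparent direction relative to the satellite track and may create
# samples that never occur physically.
aug_img, aug_mask = flip_pair(image, mask, axis=1)
assert aug_img.shape == image.shape
```

Stating which transforms were applied to which classes, and why they are physically admissible, would address this.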
L182 Table 1 & L217 Table 2- how does the hyperparameter tuning influence the results? I am unsure if this is a fair comparison between models.
L191 Figure 7 - The poor contrast in the iceberg example might be making the problem very difficult for the models.
L226 small icebergs - I expect the reason is the poor contrast in the images rather than the size of the target.
L241 What is the overlap rate?
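For clarity, "overlap rate" in sliding-window inference is usually defined as the fraction of tile width shared by adjacent tiles (stride = tile × (1 − overlap)). A minimal illustrative sketch of how tile offsets follow from it (this is an assumption about the authors' tiling scheme, not their implementation, and it assumes tiles no larger than the image):

```python
def tile_positions(length, tile=256, overlap=0.5):
    """Top-left offsets of overlapping tiles along one image axis."""
    stride = max(1, int(tile * (1 - overlap)))
    positions = list(range(0, length - tile + 1, stride))
    if positions[-1] != length - tile:   # ensure the image edge is covered
        positions.append(length - tile)
    return positions

# A 1000-pixel axis with 256-pixel tiles at 50 % overlap -> stride 128
print(tile_positions(1000))  # [0, 128, 256, 384, 512, 640, 744]
```

Reporting the chosen overlap rate (and how predictions in overlapped regions are merged) would make the full-image results reproducible.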
L240 Figure 8 - there are likely mixed rolls and cells in the bottom left of the image. It does not seem that the combination of these phenomena is possible in your approach. There might be more eddies than the 2 noted in your figure.
L254: How were these images selected? They look mostly homogeneous, with 1-2 phenomena each.
L259 Figure 10 - the image contrast is poor. This likely makes it more difficult for your models.
Figure 13b - It seems the smaller-scale internal waves are not tagged (in the hand-tagged image) or found by the model.
L306 Figure 15: Use a color for the detected regions that is not in the grey scale; the figure is difficult to interpret as is. It also seems that cold pools (old rain) are neither labeled in the archive nor captured by the model.
L335-342: Ambitions are high but most statements are not justified or linked to the literature on the topic.
L346: It does not seem that multiple tags are possible for a given pixel. Please clarify
L353: What is the contribution to oceanography? The work is focused on validating a tool(/model) that can be applied to physical science problems.
Citation: https://doi.org/10.5194/essd-2024-222-RC2
Data sets
A dataset for semantic segmentation of typical oceanic and atmospheric phenomena from Sentinel-1 images Quankun Li, Xue Bai, and Xupu Geng https://doi.org/10.5281/zenodo.11410662