the Creative Commons Attribution 4.0 License.
A Manually Labeled Contrail Dataset from MSG/SEVIRI
Abstract. Contrails — thin ice clouds formed by aircraft — are a major contributor to aviation-induced climate forcing, yet their observational characterization remains limited. We present a manually labeled contrail dataset derived from observations of the Meteosat Second Generation (MSG) SEVIRI instrument over Europe and the North Atlantic, comprising 140 scenes of 256 × 256 pixels. Each scene was independently annotated by three labelers, with ground truth established via majority consensus. To provide additional context, the dataset includes outputs from two satellite retrievals: CiPS (Cirrus Properties from SEVIRI) and ProPS (Probabilistic Cloud Top Phase retrieval), offering information on cloud cover and cloud top phase, cirrus probability, ice optical thickness, and ice cloud top height. These complementary variables enable detailed investigations, such as factors influencing contrail visibility. The dataset supports analyses of contrail detection, contrail characteristics, cloud-contrail interactions, and environmental conditions affecting detection. By providing high-quality labeled data with auxiliary cloud information, this resource facilitates the development and evaluation of contrail studies, contributes to improved understanding of aviation-related cloud effects, and informs strategies for climate impact mitigation. The full dataset is available at: https://doi.org/10.5281/zenodo.17669444.
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-740', Anonymous Referee #1, 07 Feb 2026
- RC2: 'Comment on essd-2025-740', Anonymous Referee #2, 13 Feb 2026
This paper describes the development and characteristics of a dataset of manually labeled contrails in images captured by the MSG SEVIRI instrument.
The paper is extremely well written: I did not find any typographical errors. It reads well and I have very few line-by-line comments, which mostly pertain to the figures.
The study contributes to our understanding of the limitations in manually labeled datasets of contrails in satellite images, which is crucial given that these datasets are typically used to create and evaluate automated contrail detection techniques.
I also have some more general questions:
- The period over which satellite data is acquired is relatively long: are there any relevant issues related to sensor degradation over this time period, that could affect the creation of this dataset? And what is the rationale behind the decision to label images in the period 2013 – 2018, but then also between 2023 and 2024?
- Were the images projected in some way, to address image distortion at higher satellite viewing zenith angles? Were the locations of images that were labeled in some way limited by the viewing zenith angle?
- Why was the decision made to only consider images from the hour leading up to the labeled image? And was this found to be sufficient, or would a longer period have been helpful? For example, (Meijer et al., 2022) uses 2 hours.
- Were the labelers trained in some way? The paper mentions a labeling guide: it would be very helpful to include this with the paper, like was done by (Meijer et al., 2022; Ng et al., 2024). Additionally, I believe Ng et al. (2024) implemented a “training process” that had labelers learn on a “gold-standard dataset” before “graduating” to labeling new images.
- How was labeler agreement affected by the proportion of land/sea pixels in an image? How was labeler agreement affected by the number of contrail pixels present in the image?
- How does the size of this dataset compare to previous labeled datasets (Meijer et al. 2022, Ng et al. 2024)? It would be interesting to compare the number of labeled pixels (accounting also for the lower resolution of the SEVIRI instrument) and the fraction of contrail pixels within these datasets. The distribution of contrail pixels among images might be another interesting feature to compare.
Another general comment is that for certain words, like “sun”, the capitalization is inconsistent across the paper. It might be worthwhile to do a check throughout the manuscript.
Once my questions and line-by-line comments are addressed, I consider this paper ready for publication.
Line-by-line comments
Figure 1: It might also be helpful to include the original satellite image (like figure 5a or 5e) next to this for interpretation.
Line 101: Why does this say: (e.g. 2021)?
Figure 2: The legend is cut off. It might also be helpful to include the original satellite image (like figure 5a or 5e) next to this for interpretation.
Figure 6: The resolution of this figure is a bit too low to inspect the ash RGB images. I think it may also be helpful to combine the 3 different labels in a way that highlights their agreement and disagreement, as I now have to switch back and forth between the different labels to identify differences.
Line 178: “the own image assessment” what is meant with this?
Table 3: It is not clear to me how the different columns in this table are computed. There are 6 columns, but 14 labelers in total?
Figure 7: perhaps include horizontal axis labels for plots b) and c).
Figure 8c): abbreviation “PC” not defined.
Line 370: “as global attribute” -> “as a global attribute”
Table A1:
- “timestamp” is in UTC?
- row “native_coordinates”: “boundary box” -> “bounding box”
Table A2:
- First row: “Consensus” should this be capitalized?
- “acquisition_time”: First letter in “Description” not capitalized
References
Meijer, V.R., Kulik, L., Eastham, S.D., Allroggen, F., Speth, R.L., Karaman, S., Barrett, S.R.H., 2022. Contrail coverage over the United States before and during the COVID-19 pandemic. Environ. Res. Lett. 17, 034039. https://doi.org/10.1088/1748-9326/ac26f0
Ng, J.Y.-H., McCloskey, K., Cui, J., Meijer, V.R., Brand, E., Sarna, A., Goyal, N., Van Arsdale, C., Geraedts, S., 2024. Contrail Detection on GOES-16 ABI With the OpenContrails Dataset. IEEE Trans. Geosci. Remote Sens. 62, 1–14. https://doi.org/10.1109/TGRS.2023.3345226
Citation: https://doi.org/10.5194/essd-2025-740-RC2
RC3: 'Comment on essd-2025-740', Anonymous Referee #3, 22 Feb 2026
Gabriel et al. present a manually labeled dataset of aircraft-induced contrails derived from MSG/SEVIRI imagery. Given the radiative importance of contrails, particularly in the context of quantifying aviation-related anthropogenic climate forcing, this dataset represents a valuable contribution. The provision of a carefully constructed ground truth is especially relevant for the development and evaluation of machine learning–based contrail detection algorithms.
The manuscript is clearly written, and the methodology for contrail identification and labeling is well described. The inclusion of auxiliary variables improves the usefulness of the dataset. The data structure and metadata are generally well documented in both the text and the accompanying tables. I have only a few minor comments that should be addressed before the manuscript can be considered for publication in ESSD:
Line 80: It may be worthwhile sharing why the considered dataset has a gap of 5 years between 2018 and 2023.
Line 122: The manuscript states that approximately 40% of scenes contain no visible contrails and 60% were selected to include contrails. Was this segregation intentional? The wording suggests that it was. If so, this should be clarified, including the rationale for this choice.
Line 124: The manuscript notes that the selected 140 images are not uniformly distributed in space and time. This suggests that the dataset is not intended to represent real-world contrail climatology. In addition, the relatively small sample size indicates that the primary application of the dataset is likely to be the development and validation of contrail detection algorithms (e.g., ML-based approaches), rather than for statistical analyses of contrail occurrence or detailed studies of meteorological drivers. I suggest clearly highlighting these points in the manuscript.
Table 3 is not discussed in the manuscript text. Also, it is not clear what the columns represent.
Figure 8 presents contrail length and width distributions in pixel units. Since pixel size varies with viewing geometry, it would be more physically meaningful to express these quantities in distance units (e.g., km).
Line 255: It is worth noting here that the auxiliary parameters related to cloud optical properties are only available during daytime.
Citation: https://doi.org/10.5194/essd-2025-740-RC3
Data sets
Annotated Contrail Dataset for Meteosat Second Generation (MSG) V. Santos Gabriel et al. https://doi.org/10.5281/zenodo.17669444
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 252 | 138 | 31 | 421 | 20 | 37 |
The authors present a manually labeled contrail dataset derived from Meteosat Second Generation SEVIRI imagery. The manuscript describes the labelers as following established best practices from the literature for their labeling campaign, such as having multiple labelers annotate each scene and providing temporal context to the labelers. Additionally, the authors set a nice precedent, being the first group to additionally collate and provide auxiliary data (CiPS and ProPS, Ash color scheme, reflectances/BTs, land-sea mask, surface altitude) for each scene that is useful for further scientific/ML analysis. The manuscript is well written, clear, and comprehensive in its documentation of the dataset. The dataset is of high quality and fills an unmet need of contrail labels in geostationary imagery over the MSG domain, and represents a substantial amount of expert manual labor that would be nontrivial to reproduce. I had no trouble reading the article nor downloading and using the dataset based on it.
There are, however, two major technical corrections needed in the data that must be resolved before acceptance:
Minor issues to also address:
Data:
Manuscript: