the Creative Commons Attribution 4.0 License.
Characterizing clouds with the CCClim dataset, a machine learning cloud class climatology
Axel Lauer
Rémi Kazeroni
Martin Stengel
Veronika Eyring
Abstract. We present a new Cloud Class Climatology dataset (CCClim), quantifying the global distribution of established morphological cloud types over 35 years. CCClim combines active and passive sensor data with machine learning (ML) and provides a new opportunity for improving the understanding of clouds and their related processes. CCClim is based on cloud property retrievals from the European Space Agency's (ESA) Cloud_cci dataset, adding relative occurrences of eight major cloud types as defined by the World Meteorological Organization (WMO) at 1° resolution. The ML framework used to obtain the cloud types is trained on data from multiple satellites in the Afternoon Constellation (A-Train). Using multiple spaceborne sensors reduces the impact of single-sensor problems such as the difficulty passive sensors have in detecting thin cirrus or the small footprint of active sensors. We leverage this to generate sufficient labeled data to train supervised ML models. CCClim's nearly gapless global coverage from 1982 to 2016 allows process-oriented analyses of clouds on climatological timescales. Similarly, the moderate spatial and temporal resolutions make it a lightweight dataset while enabling straightforward comparison to climate models. CCClim creates multiple opportunities to study clouds, of which we sketch out a few examples. Along with the cloud type frequencies, CCClim contains the cloud properties used as inputs to the ML framework, such that all cloud types can be associated with relevant physical quantities. CCClim can also be combined with other datasets such as reanalysis data to assess the dynamical regime favoring the occurrence of a specific cloud type or its radiative effects. Additionally, we show an example of how to evaluate a global climate model by comparing CCClim with cloud types obtained by applying the same ML method used to create CCClim to output from the icosahedral nonhydrostatic atmosphere model (ICON-A).
CCClim can be accessed via the digital object identifier: 10.5281/zenodo.8369202 (Kaps et al., 2023b)
Arndt Kaps et al.
Status: open (until 26 Dec 2023)
RC1: 'Comment on essd-2023-424: Scratching the surface of extending active observation "curtains" of clouds', Anonymous Referee #1, 24 Nov 2023
This paper is a noteworthy addition to the growing literature on applying ML to Earth Science remote sensing, in this case cloud remote sensing. As observational systems change and certain capabilities are interrupted or discontinued, we need innovative methods to cover data gaps. AI and ML have a lot to offer in that regard.
My recommendation is to accept the paper with minor revisions. I was tempted to recommend a major revision because I think that the active dataset is not used to its full potential, but it would be unfair to demand from the authors a paper with a different direction. So this paper is judged on the merits of the fundamental choices they have made. Still, there are some major issues with how ML was implemented (see below).
What makes this paper, in my opinion, less than it could have been is the information content of the active observations they decided to use: so-called "cloud types". There are two issues: (1) The main appeal of active observations is that they can resolve (in many cases) cloud vertical structure, so choosing just a cloud type "flag" (the only or most dominant cloud type in an active observation "ray" or profile) seems like the least interesting choice; (2) Even these "cloud types" from 2B-CLDCLASS-LIDAR (CC-L) are taken too literally by the authors: they are not the WMO cloud types the authors imply them to be, but just cloud labels that may have some association with them. I suspect that frequently they do not have the morphological characteristics that surface observers use to classify clouds. More on this topic below.
On cloud types:
The WMO has 10 main cloud classifications (https://cloudatlas.wmo.int/en/cloud-classification-summary.html). There is no Deep Convection (DC) as in CC-L, but rather Cumulonimbus (Cb). There are also cirrocumulus and cirrostratus, which do not exist in CC-L (presumably all under Cirrus). But the biggest tell of the lack of correspondence between CC-L and WMO is the small occurrence frequency of stratus (noted by the authors). If one checks daytime stratus occurrence over the ocean from surface (ship) observers in the Warren Atlas (https://atmos.uw.edu/CloudMap/WebO/index.html), there is plentiful stratus in the extratropical oceans. So, in the same way that the 9 ISCCP cloud types, defined by arbitrary boundaries in the TAU-CTP joint histogram, cannot be taken as equivalent to the corresponding WMO cloud types (cloud morphology as it appears from the surface), as mentioned by the authors, the CC-L cloud types cannot be assumed to mean the same thing as the WMO classification either. Another tell: the small SW CRE peak the authors find for Ns, which are optically thick rain-producing clouds.
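To make the point about fixed boundaries concrete, here is a minimal sketch of an ISCCP-style lookup in the TAU-CTP plane. It assumes the commonly cited ISCCP thresholds (cloud-top pressure at 440 and 680 hPa, optical thickness at 3.6 and 23); the function name and the handling of values exactly on a boundary are illustrative assumptions, not taken from the paper or from ISCCP code.

```python
# ISCCP-style labels from cloud-top pressure (CTP) and optical thickness (TAU).
# Thresholds are the commonly cited ISCCP boundaries; which side a value
# exactly on a boundary falls on is an assumption made for this sketch.
ISCCP_TABLE = {
    "low": ("Cu", "Sc", "St"),
    "mid": ("Ac", "As", "Ns"),
    "high": ("Ci", "Cs", "DC"),
}


def isccp_cloud_type(ctp_hpa, tau):
    """Return the ISCCP-style label for one retrieval (hypothetical helper)."""
    if ctp_hpa > 680.0:
        level = "low"
    elif ctp_hpa > 440.0:
        level = "mid"
    else:
        level = "high"

    if tau < 3.6:
        thickness = 0  # optically thin
    elif tau < 23.0:
        thickness = 1  # medium
    else:
        thickness = 2  # optically thick

    return ISCCP_TABLE[level][thickness]


print(isccp_cloud_type(800.0, 30.0))  # low and optically thick
print(isccp_cloud_type(300.0, 1.0))   # high and optically thin
```

The label depends only on two retrieved scalars, which is why such a "stratus" need not look like stratus to a surface observer.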
Other major comments:
-- Applying the ML algorithm cloud type to off-nadir MODIS pixels, when the training has been conducted with nadir MODIS cloud retrievals (that coincide with the active observations), can be justified only if it has been previously shown that the cloud retrievals are statistically the same at different parts of the MODIS swath, i.e., there are no biases in off-nadir retrievals (especially as one moves farther away from nadir).
-- Similarly, applying the ML algorithm to ESACCI clouds (from AVHRR) has to be justified by showing that MODIS and AVHRR cloud property retrievals are statistically equivalent (the authors are aware they're not -- lines 337-338). Because of different retrieval algorithms I doubt they are. Also, does ESACCI also include morning clouds? The training was conducted with afternoon clouds.
-- Similarly, applying the ML algorithm to climate model clouds is a huge stretch. What makes the model clouds equivalent to those of MODIS? You cannot even rigorously define a cloud-only grid column optical thickness when the grid is not overcast (it depends on cloud fraction profile and overlap). At a minimum, the model should have provided cloud output from the MODIS simulator.
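One way the statistical-equivalence checks requested above could be approached is a two-sample Kolmogorov-Smirnov comparison of a retrieved quantity (for example, cloud optical thickness) between two populations, such as nadir versus off-nadir pixels or MODIS versus AVHRR retrievals. The sketch below is only an illustration under stated assumptions: the data are synthetic lognormal draws standing in for real retrievals, and the function names are hypothetical. The critical value uses the standard large-sample approximation with c = 1.358 for a significance level of 0.05.

```python
import bisect
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The supremum is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical(n_a, n_b, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * ((n_a + n_b) / (n_a * n_b)) ** 0.5


# Synthetic stand-ins for optical thickness retrievals (NOT real data).
random.seed(0)
nadir = [random.lognormvariate(2.0, 0.8) for _ in range(2000)]
off_nadir = [random.lognormvariate(2.3, 0.8) for _ in range(2000)]  # biased on purpose

d = ks_statistic(nadir, off_nadir)
d_crit = ks_critical(len(nadir), len(off_nadir))
print(f"D = {d:.3f}, critical value (alpha = 0.05) = {d_crit:.3f}")
print("distributions differ" if d > d_crit else "no significant difference detected")
```

Real satellite pixels are spatially correlated, so the effective sample size is smaller than the pixel count and the critical value above would be too permissive; a proper analysis would have to account for that.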
Some minor comments:
-- Figure 1 would have been more complete if the datasets used in each step (CUMULO, ESACCI, etc) were added as labels.
-- Figure 2: Unclear to me: So if I properly weighted land and ocean and then normalized by the undetermined, I'd get close to what is shown in Fig. 2a? Perhaps you can say that.
-- Figure 4: Why do this over the southern oceans where (beyond the nearly non-existent St) there is virtually no Cu and no DC? Perhaps show time series and trends for different regions for each cloud type, namely the areas where each dominates relative to its global mean? (Fig. 6)
-- Figure 7: An interesting complement to Fig. 7 would be a table showing the contribution of each cloud type to the global CRE, weighted by RFO (subject to the disclaimer of no "pure" grid cells).
-- Lines 319-322: Of course there will be few St in CCClim since you started with a small frequency of St in CC-L. Same with DC. The difference, though, is that DC are truly very rare while St are artificially rare in CC-L.
-- Lines 344-346: Not sure what you mean by this.
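The table suggested in the Figure 7 comment could, for instance, be built by splitting each grid cell's CRE among cloud types in proportion to their RFO and then taking an area-weighted global mean. The sketch below shows one possible reading of "weighted by RFO"; the proportional split, the cosine-latitude area weights, the data layout, and the numbers are all assumptions made for illustration, and the approach inherits the disclaimer about there being no "pure" grid cells.

```python
import math


def cre_contributions(cells, cloud_types):
    """Area-weighted global CRE contribution per cloud type (illustrative only).

    cells: list of dicts with 'lat' (degrees), 'cre' (W m-2) and
           'rfo' (dict mapping cloud type to relative frequency of occurrence).
    """
    totals = {t: 0.0 for t in cloud_types}
    weight_sum = 0.0
    for cell in cells:
        w = math.cos(math.radians(cell["lat"]))  # area weight on a regular grid
        weight_sum += w
        rfo_sum = sum(cell["rfo"].get(t, 0.0) for t in cloud_types)
        if rfo_sum == 0.0:
            continue  # no classified cloud in this cell; its CRE is left unattributed
        for t in cloud_types:
            share = cell["rfo"].get(t, 0.0) / rfo_sum
            totals[t] += w * cell["cre"] * share
    return {t: v / weight_sum for t, v in totals.items()}


# Two made-up grid cells, purely to show the bookkeeping.
example_cells = [
    {"lat": 0.0, "cre": -40.0, "rfo": {"Sc": 0.6, "Ci": 0.2}},
    {"lat": 60.0, "cre": -20.0, "rfo": {"Sc": 0.5, "Ci": 0.5}},
]
contrib = cre_contributions(example_cells, ["Sc", "Ci"])
for cloud_type, value in contrib.items():
    print(f"{cloud_type}: {value:.2f} W m-2")
```

When every cell contains some classified cloud, the per-type contributions sum to the area-weighted mean CRE, which gives a simple consistency check for such a table.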
Citation: https://doi.org/10.5194/essd-2023-424-RC1
Data sets
CCClim - A machine-learning powered cloud class climatology Arndt Kaps, Axel Lauer, Veronika Eyring https://doi.org/10.5281/ZENODO.8369202