Updated observations of clouds by MODIS for global model assessment
Robert Pincus
Paul A. Hubanks
Steven Platnick
Kerry Meyer
Robert E. Holz
Denis Botambekov
Casey J. Wall
- Final revised paper (published on 14 Jun 2023)
- Preprint (discussion started on 06 Sep 2022)
Interactive discussion
Status: closed
RC1: 'Comment on essd-2022-282', Anonymous Referee #1, 03 Oct 2022
Review for “Updated observations of clouds by MODIS for global model assessment” by Pincus et al.
This manuscript documents a new dataset that allows for the comparison of clouds observed by the National Aeronautics and Space Administration (NASA)’s two Moderate Resolution Imaging Spectroradiometer (MODIS) instruments onboard the Terra and Aqua satellites and clouds that are simulated by climate models. The manuscript describes the idiosyncrasies of the product, its technical implementation and caveats regarding bugs in the processing of the dataset.
Overall, the documentation is easy to read and well-written. The background is presented quite well, except for a few ambiguities that are detailed below. My main suggestions would be to further clarify the purpose and scope of the product and of the manuscript about the product, as well as the variables that are described in the manuscript. I am also wondering if it would make more sense for the dataset to be completely finished before writing the documentation in a publication separate from the User’s Guide, since it may lead to confusion or disorganization if the same product is documented in different places. The unfinished products were mentioned twice in the manuscript. Specific comments follow.
- The authors state that the product is made for the “convenience” of end users, but also mention on line 39 of the Introduction: “The system was also quite fragile and ceased production when NASA updated the production of MODIS datasets”. It’s not transparent to me what is meant by the claim that the system was quite fragile, or why production ceased. The statement on line 46, “The dataset, produced using a system designed to be more robust to changes in the upstream data, provides a set of custom cloud-related parameters using specific dataset definitions more closely aligned with the MODIS simulator than are the standard datasets”, is also not transparent to me. What upstream data are the authors referring to, and what changes were made to them? Also, importantly, does this mean that using the standard MODIS product to compare against climate models is incorrect, or is this product really only designed for the convenience of end users? Please clarify.
- A number of variables are described but I would recommend that in each case the variables themselves (as listed in Table 1) are spelled out to be clear and equations written out if applicable, which would be relevant for a publication in ESSD, e.g. line 257: when weighting by the “cloud fraction” in the MODIS simulator, Section 3.3 variables.
- Why is there no height-resolved cloud retrieval fraction saved as a separate product? Only the height-resolved cloud fraction from the cloud mask seems to be reported.
- Lines 260-263: “To facilitate comparisons with the MODIS simulator we have provided Python code, described below, that transforms a set of monthly files containing all variables to datasets with time series of each variables, which may be written as netCDF files and/or Zarr stores.” Does this sentence mean that the Python code takes individually saved netCDF files that each contain one month of data and simply concatenates the files into a time series for any individual variable and time period of choice? Please clarify.
- Is there a reason why this manuscript was submitted when the complete dataset is still under development? It seems that it would make more sense for the product to be finished first and then a final paper to be published on it for completeness and to avoid confusion in the future where the same product might be separately documented. It seems that the current User’s Guide that is already available on the Internet is sufficient in the interim?
- Figure 8: So if a user wanted the total joint histogram for liquid and ice, could they simply sum up the liquid and ice separately?
- Lines 136-137: “…and the condensed water path estimated from the product of optical thickness and particle size”. For ice clouds, what equation was used?
Typos and other minor edits:
- Line 211: “Liquid clouds are substantially more common than ice clouds (see Figure 5)”. This is a general statement but does not seem to hold in the middle right part of the plot. Also, although the outline of the continents is somewhat visible from the cloud fields themselves, it would be nice to have the outline of the continents drawn in.
- Typo on line 109: “of the of the”
- Typo on line 127: “is is ratio…”
- Why are low clouds consistently labelled as p_c greater than or equal to 440 hPa? Is this a consistent typo? I found this in a couple of places. I think you mean 680 hPa? But then line 229 uses 800 hPa as the threshold which is inconsistent with the previously mentioned definitions.
- Although it is probably clear to most readers, please spell out all acronyms to avoid any possible ambiguity, including MODIS and ISCCP
Citation: https://doi.org/10.5194/essd-2022-282-RC1
RC2: 'Comment on essd-2022-282', Anonymous Referee #2, 03 Oct 2022
Review of “Updated observations of clouds by MODIS for global model assessment” by R. Pincus et al.
This paper describes the data processing of new MODIS products that have been specifically designed to facilitate the evaluation of climate models. The paper is well written and the content is adequate for publication in ESSD. It provides a very valuable overview of the dataset for the modelling community, presented in a more accessible format than the Users' Guide. I believe this paper is a useful addition to the scientific literature and deserves publication with only minor changes. Please see my specific comments below.
L25: CFMIP. Please expand acronym.
L50: summaries -> summarises?
L69-78: Is the 5km resolution true or nominal? It is not clear to me if the number of pixels that go into the calculation of the cloud fraction depends on the VZA. The text seems to imply that 25 '1km pixels' are used for all VZAs.
L89: missing ( in citation.
L109: "of the of the"
L139-142: it would be helpful to explain why this approach for computing monthly averages is used.
L183-184: please can you quantify the impact of this bug, how small is "small"?
Figures 2-6: sometimes the global-mean values are mentioned in the text. It would be helpful to provide the global-means in the figure titles or in the captions.
Colour scales: please rationalise the use of the colour scales. Many different colour scales are used for no apparent reason. I suggest to consolidate all of them into two: dark to white, white to dark.
Figure 7: I assume that the results are area-weighted global means? Please clarify in the figure caption.
Figures 7,8,9: please revise labelling of y axes (e.g. cloudtoppressure) and include units.
Citation: https://doi.org/10.5194/essd-2022-282-RC2
RC3: 'Comment on essd-2022-282', Anonymous Referee #3, 04 Oct 2022
Summary
The authors describe a new cloud dataset derived from MODIS satellite observations intended for direct comparison to the output from the MODIS simulator that is part of the broader COSP satellite simulator package run in many modern climate models. The paper details the salient portions of the MODIS cloud retrieval and processing stream, the manner in which the present dataset is constructed (and how it is different from other MODIS level 3 products), and provides some visualization of the types of fields that are readily available from the dataset and their potential scientific uses. This will certainly be a useful dataset for the community working on clouds, in particular those evaluating clouds in climate models.
Major Comments
I am a bit surprised that the dataset is not produced in such a way as to be applicable to models “right out of the box” as is the case for datasets like GOCCP. The motivation for leaving some processing as an exercise for the end-user, which seems ripe for accidental misuse, was not clear to me (though the provided python code is of course welcome). Is it an attempt to leave some flexibility for end-users’ diverse needs? If the COSP run in climate models produces monthly mean cloud property fields (cloud fraction, joint histograms, log(tau), etc), why can’t this dataset provide the same right out of the box? Lines 257-260 indicate that even the model output will have to be further processed in order to match what is provided here, another place where user error can creep in (which fields require weighting by cloud fraction? Does cloud fraction have to be weighted by cloud fraction? Which cloud fraction -- mask or retrieval -- should be used as the weight?). Is there a plan to provide python code for processing the MODIS simulator output in such a way as to be directly comparable to what is produced by the provided python code? Could all of this post-processing be avoided from the outset by just providing an idiot-proofed dataset that is as close as possible to MODIS simulator output?
I am also a bit surprised (as I am currently downloading the dataset via the github instructions) that the filenames have such cryptic names, particularly the timestamp which seems to be reporting a Julian day at the start of the month rather than a format like YYYYMM. This results in further reliance on the python script rather than being able to quickly assess what a file contains.
For several fields, it is not clear to me that there is a COSP counterpart; what is the reasoning for including these fields in the “MODIS-COSP” dataset if COSP does not provide them? These fields include the two versions of cloud fraction (from the mask and from the retrieval); the partly cloudy pixel fields; and the additional statistics like standard deviations, sum-of-squares, etc.
Minor Comments
- Throughout, the low cloud criterion is listed as p>440 hPa (line 129, Figure 4 caption, Table 1 caption), whereas I believe it should be something else like p>680 hPa. Also, in the captions of Figure 4 and Table 1, mid-level clouds are listed as those with CTP greater than 680 hPa and smaller than 440 hPa.
- L164: “also able compute”
- L178: I believe “optical” should be capitalized.
- L187-190: Suggest reassuring the reader here that a script is provided to do this
- L200: “for much the”
- Figs 4 and 5: The sum of these maps equals 1 (not the retrieved total cloud fraction) everywhere, right? In this case, should the colorbar label instead be just “fraction”?
- Figs 5,8,9: Suggest putting ice clouds above liquid clouds, which seems a more natural orientation.
- Figure 5 caption: should be “phase” singular, I believe
- Figs 7-9: please provide units on the axes. Also, the y-axis in Figure 7 is labeled as “cloudtoppressure” – suggest breaking it up into 3 words.
- L261: “variables” should be singular
- Tables 1 and 2: Is it possible to report the name of the equivalent field from the MODIS simulator in a 4th column, or state that it is not available?
- Figures: The authors clearly had a lot of fun trying out various matplotlib color schemes, including my favorite, the Eye of Sauron scheme in Figure 4. I wonder if this may be distracting and unintentionally conveying differences that are not meant to be conveyed, considering many figures show the same field (cloud fraction). I am not sure whether any of these are unfriendly for color-blindness, but that should also be considered. The scheme in Figure 9 seems to artificially distinguish cloud fractions larger than about 0.007 from those below, but I’m not sure why that would be useful. In some figures lighter colors = larger values, but the opposite is true for Figures 3, 7-9.
Citation: https://doi.org/10.5194/essd-2022-282-RC3
AC1: 'Reply to reviewer comments on essd-2022-282', Robert Pincus, 20 Dec 2022
We are grateful for the prompt and constructive feedback provided by all three reviewers. We apologize for the length of time in responding, which was partly the result of the pandemic. Below we describe how we have modified the manuscript to address the concerns. We have corrected typographic and graphical mistakes identified by the reviewers without further comment. Reviewer comments below are italicized.
Common concerns
Three comments by two reviewers focused on the clunkiness of the data set as provided:
Rev 1: Lines 260-263: “To facilitate comparisons with the MODIS simulator we have provided Python code, described below, that transforms a set of monthly files containing all variables to datasets with time series of each variables, which may be written as netCDF files and/or Zarr stores.” Does this sentence mean that the Python code takes individually saved netCDF files that each contain one month of data and simply concatenates the files into a time series for any individual variable and time period of choice? Please clarify.
Rev 3: I am a bit surprised that the dataset is not produced in such a way as to be applicable to models “right out of the box” as is the case for datasets like GOCCP. The motivation for leaving some processing as an exercise for the end-user, which seems ripe for accidental misuse, was not clear to me (though the provided python code is of course welcome). Is it an attempt to leave some flexibility for end-users’ diverse needs?
Rev 3: I am also a bit surprised (as I am currently downloading the dataset via the github instructions) that the filenames have such cryptic names, particularly the timestamp which seems to be reporting a Julian day at the start of the month rather than a format like YYYYMM. This results in further reliance on the python script rather than being able to quickly assess what a file contains.
We have expanded the discussion at the end of section 2 to explain why the data are provided in this less-than-ideal format, which is the result of the constraints under which the data are produced. We’ve moved the discussion of the simplifying Python scripts to the same location and elaborated on other tools we’ve made available.
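For illustration, here is a minimal sketch of this kind of workflow, not the distributed scripts themselves: it assumes xarray (with dask and the netCDF/Zarr backends installed) and a hypothetical filename pattern, parsing the year-plus-day-of-year timestamp (e.g. A2016183 for 1 July 2016) before stacking the monthly files along a time dimension.

```python
# Sketch only: parse day-of-year filename timestamps and concatenate
# monthly files into one time series. The filename pattern and variable
# name are assumptions, not the dataset's actual conventions.
import glob
import re
from datetime import datetime

import xarray as xr

paths = sorted(glob.glob("MCD06COSP_M3_MODIS.A*.nc"))  # hypothetical pattern
# 'A2016183' -> year 2016, day 183, i.e. 2016-07-01
dates = [
    datetime.strptime(re.search(r"\.A(\d{7})\.", p).group(1), "%Y%j")
    for p in paths
]

# Stack the monthly files along a new time dimension, then write one
# variable as netCDF and the full set as a Zarr store.
ds = xr.open_mfdataset(paths, combine="nested", concat_dim="time")
ds = ds.assign_coords(time=("time", dates))
ds["Cloud_Mask_Fraction"].to_netcdf("cloud_mask_fraction.nc")  # name assumed
ds.to_zarr("modis_cosp_timeseries.zarr")
```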
Reviewers 2 and 3 found our choice of different color schemes to plot different-but-related quantities to be distracting:
Figures: The authors clearly had a lot of fun trying out various matplotlib color schemes... I wonder if this may be distracting and unintentionally conveying differences that are not meant to be conveyed, considering many figures show the same field (cloud fraction). I am not sure whether any of these are unfriendly for color-blindness, but that should also be considered. The scheme in Figure 9 seems to artificially distinguish cloud fractions larger than about 0.007 from those below, but I’m not sure why that would be useful. In some figures lighter colors = larger values, but the opposite is true for Figures 3, 7-9.
Colour scales: please rationalise the use of the colour scales. Many different colour scales are used for no apparent reason. I suggest to consolidate all of them into two: dark to white, white to dark.
We have experimented with using a more uniform color scale, e.g. a grey scale in Figures 1-5. Recognizing that this is somewhat a matter of opinion, we found this more confusing, since it lulls readers into the mistaken sense that they are comparing the same quantity. We have now noted in both text and figure captions that each distinct physical quantity is plotted with a unique color scale. We have ensured that brighter colors represent more and/or more reflective clouds in the maps (Figures 2, 4, 5) and have noted in the caption to figure 3 that darker colors indicate more dramatic differences. The color scales are from the “Colorcet” package from Holoviz and are indeed designed with various color-blindnesses in mind. The use of darker colors to indicate larger values in joint histograms follows the conventions used in plotting such histograms (e.g. doi:10.1175/JCLI-D-11-00248.1).
Reviewer 1
The authors state that the product is made for the “convenience” of end users, but also mention on line 39 of the Introduction: “The system was also quite fragile and ceased production when NASA updated the production of MODIS datasets”. It’s not transparent to me what is meant by the claim that the system was quite fragile, or why production ceased. The statement on line 46, “The dataset, produced using a system designed to be more robust to changes in the upstream data, provides a set of custom cloud-related parameters using specific dataset definitions more closely aligned with the MODIS simulator than are the standard datasets”, is also not transparent to me. What upstream data are the authors referring to, and what changes were made to them? Also, importantly, does this mean that using the standard MODIS product to compare against climate models is incorrect, or is this product really only designed for the convenience of end users? Please clarify.
We have revised the introduction to make our goals more explicit, although we hesitate to spend much time describing datasets that are no longer being produced. We have enumerated a longer list of barriers, added language to emphasize that our data set is a technical convenience, and been more explicit about “upstream data.” We have now emphasized that the present data are a direct aggregation of the pixel-scale observations (the older data were not) but have not explained why the older data production ceased.
A number of variables are described but I would recommend that in each case the variables themselves (as listed in Table 1) are spelled out to be clear and equations written out if applicable, which would be relevant for a publication in ESSD, e.g. line 257: when weighting by the “cloud fraction” in the MODIS simulator, Section 3.3 variables.
That the observations are averaged to reflect the underlying population, rather than assuming that each day is equally well-sampled, is explained three times in the manuscript. We address the time averaging of simulations in an expanded section 4.2 described more fully in the response to reviewer 3.
Why is there no height-resolved cloud retrieval fraction saved as a separate product? Only the height-resolved cloud fraction from the cloud mask seems to be reported.
As we noted on line 145: “Summing the [joint histogram of cloud optical thickness and cloud top pressure] over all optical thickness bins and reducing the resolution in cloud-top pressure allows users to compute high, middle, and low cloud fractions consistent with cloud optical properties (as opposed to the cloud mask).”
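To make that recipe concrete, a minimal sketch, with assumed (not actual) variable and dimension names and ISCCP-style pressure boundaries:

```python
# Sketch: high/middle/low cloud fractions consistent with the optical
# retrievals, derived from the tau/CTP joint histogram. Names assumed.
import xarray as xr

ds = xr.open_dataset("modis_cosp_month.nc")      # hypothetical file
hist = ds["COT_CTP_Joint_Histogram"]             # dims include ctp_bin, tau_bin

ctp_profile = hist.sum(dim="tau_bin")            # collapse optical thickness
ctp = ctp_profile["ctp_bin"]                     # bin-center pressures, hPa

# ISCCP-style boundaries: high <= 440 hPa, middle 440-680 hPa, low >= 680 hPa
high_cf = ctp_profile.where(ctp <= 440).sum(dim="ctp_bin")
mid_cf = ctp_profile.where((ctp > 440) & (ctp < 680)).sum(dim="ctp_bin")
low_cf = ctp_profile.where(ctp >= 680).sum(dim="ctp_bin")
```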
Is there a reason why this manuscript was submitted when the complete dataset is still under development? It seems that it would make more sense for the product to be finished first and then a final paper to be published on it for completeness and to avoid confusion in the future where the same product might be separately documented. It seems that the current User’s Guide that is already available on the Internet is sufficient in the interim?
We are unclear what the reviewer means here. The data set is complete, though we are still advocating for continuous updating. The user’s guide is certainly valuable, but it is extremely detailed, mutable, and has not been subject to peer review.
Figure 8: So if a user wanted the total joint histogram for liquid and ice, could they simply sum up the liquid and ice separately?
They could. As noted at line 85, this would exclude pixels for which the phase could not be identified, though these pixels would contribute to the “Total” or unsegregated histogram.
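In code, that caveat looks like this, a sketch with hypothetical variable names:

```python
# Sketch: liquid plus ice falls short of the total joint histogram
# wherever the phase could not be identified. Names are assumed.
import xarray as xr

ds = xr.open_dataset("modis_cosp_month.nc")               # hypothetical
liquid_plus_ice = ds["JHisto_Liquid"] + ds["JHisto_Ice"]  # names assumed
undetermined = ds["JHisto_Total"] - liquid_plus_ice       # nonnegative residual
```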
Lines 136-137: “…and the condensed water path estimated from the product of optical thickness and particle size”. For ice clouds, what equation was used?
To keep the focus on the aggregated data set we are producing we refer readers to the papers describing the MODIS pixel-scale data.
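For orientation only, the relation commonly used for liquid clouds under the assumption of a vertically homogeneous cloud ties water path to optical thickness τ, effective radius r_e, and the density of liquid water ρ_l; for ice the coefficient depends on the assumed particle density and habit, which is why the pixel-scale papers are the authoritative reference:

```latex
% Commonly used liquid-cloud approximation (vertically homogeneous cloud):
\mathrm{LWP} \approx \tfrac{2}{3}\,\rho_{l}\,\tau\,r_{e}
```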
Reviewer 2
L69-78: Is the 5km resolution true or nominal? It is not clear to me if the number of pixels that go into the calculation of the cloud fraction depends on the VZA. The text seems to imply that 25 '1km pixels' are used for all VZAs.
Thanks; we’ve clarified this in the text, and indeed the reviewer is correct.
L139-142: it would be helpful to explain why this approach for computing monthly averages is used.
We have added a phrase explaining that the time-averaging used by standard MODIS products makes the tacit assumption that each day is equally well-sampled.
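A toy example of the distinction, with made-up numbers:

```python
# The naive average treats every day as equally well sampled; weighting
# by the number of observations reflects the underlying pixel population.
import numpy as np

daily_mean = np.array([0.6, 0.4, 0.5])   # daily grid-cell means
n_obs = np.array([100, 10, 50])          # pixels contributing each day

naive = daily_mean.mean()                            # 0.50
weighted = (daily_mean * n_obs).sum() / n_obs.sum()  # about 0.556
```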
L183-184: please can you quantify the impact of this bug, how small is "small"?
As the data are missing it’s hard to gauge the true impact. We now comment on the lower limit: cloud fraction is based on fewer pixels than is cloud mask in less than 1% of monthly-mean grid cells.
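A sketch of the check behind that lower bound, with assumed variable names:

```python
# Flag monthly-mean grid cells where the cloud-fraction retrieval drew
# on fewer pixels than the cloud mask. Variable names are assumptions.
import xarray as xr

ds = xr.open_dataset("modis_cosp_month.nc")                # hypothetical
n_retrieval = ds["Cloud_Retrieval_Fraction_Pixel_Counts"]  # name assumed
n_mask = ds["Cloud_Mask_Fraction_Pixel_Counts"]            # name assumed

affected = n_retrieval < n_mask
print(f"{float(affected.mean()) * 100:.2f}% of grid cells affected")
```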
Tables 1 and 2: Is it possible to report the name of the equivalent field from the MODIS simulator in a 4th column, or state that it is not available?
We have instead added text at the end of the table captions to guide users. Adding this column makes the table overflow the page.
Reviewer 3
If the COSP run in climate models produces monthly mean cloud property fields (cloud fraction, joint histograms, log(tau), etc), why can’t this dataset provide the same right out of the box? Lines 257-260 indicate that even the model output will have to be further processed in order to match what is provided here, another place where user error can creep in (which fields require weighting by cloud fraction? Does cloud fraction have to be weighted by cloud fraction? Which cloud fraction -- mask or retrieval -- should be used as the weight?). Is there a plan to provide python code for processing the MODIS simulator output in such a way as to be directly comparable to what is produced by the provided python code? Could all of this post-processing be avoided from the outset by just providing an idiot-proofed dataset that is as close as possible to MODIS simulator output?
For several fields, it is not clear to me that there is a COSP counterpart; what is the reasoning for including these fields in the “MODIS-COSP” dataset if COSP does not provide them? These fields include the two versions of cloud fraction (from the mask and from the retrieval); the partly cloudy pixel fields; and the additional statistics like standard deviations, sum-of-squares, etc.
Both of these perceptive questions focus on the ability to compare the observational data described here to output from the MODIS simulator. We have revised and added to section 4.2 describing these comparisons. On the technical front we note that time aggregation is normally done within climate models as the simulation advances. We highlight the observational averaging strategy specifically so that users can implement time aggregation correctly. (The post-processing solution proposed by the reviewers won’t usually apply.) We have added material emphasizing how definitions of cloud fraction differ between observations and the proxy and pointing readers to previous work exploring how these might be reconciled.
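As one concrete example of the weighting in question, a sketch (assuming xarray; the variable names are illustrative, not any model's actual output names) of a cloud-fraction-weighted monthly mean of an in-cloud quantity:

```python
# Weight each timestep's in-cloud value by its cloud fraction so that
# cloud-free times do not dilute the in-cloud average.
import xarray as xr

sim = xr.open_dataset("model_cosp_output.nc")    # hypothetical
cf = sim["modis_cloud_fraction"]                 # assumed name
tau = sim["modis_in_cloud_optical_thickness"]    # assumed name

tau_monthly = (tau * cf).resample(time="MS").sum() / cf.resample(time="MS").sum()
```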
Citation: https://doi.org/10.5194/essd-2022-282-AC1