Reply on RC3

This paper investigates the influence of a snow cover on near-surface air temperature measurements, which are generally subject to solar radiation-induced errors (even when sensors are protected in a shield). Such errors are amplified by a contribution of surface-reflected shortwave radiation, additionally heating sensor shields and sensors. To quantify this so-called albedo effect, several pairs of identical sensor/shield combinations were deployed at two neighboring locations experiencing the same meteorological conditions. At one site the seasonal snow cover was left unchanged; at the other, snow was removed after every snowfall so that these sensors were mounted above bare ground. Prior to the field experiment, the sensor pairs were tested and characterized in controlled laboratory environments to determine relative errors and biases. Air temperature, shortwave radiation and wind speed data were collected and analyzed to quantify the albedo effect on air temperature measurements by comparing temperature observations at the two sites. Observed temperature differences were related to the magnitude of incident solar radiation and wind speed to assess their relative importance. Albedo-induced errors were found to be as large as 3.8°C. The study follows certain WMO guidelines and proposes recommendations for measurement protocols to be applied at reference stations, e.g., in the context of GCOS.

Nevertheless, I have several concerns, suggestions, and questions which I would like to bring to the attention of the authors, detailed in the list of specific and sometimes more general comments below. It seems to me that the available data and results are partly underexploited and carry potential for more conclusive results. The manuscript needs substantial revision as indicated by the comments below. Integrating these suggested elements, clarifications, and discussions could strengthen this work and make it a more mature and consolidated paper. I am aware that I am raising a lot of points, but I hope that these comments are constructive suggestions and helpful for further improving this paper.
Thank you for the suggestions, comments, and points, which help us improve the manuscript. All of them are addressed below.
The manuscript needs linguistic editing to improve spelling, grammar, and wording.
The manuscript has been thoroughly reviewed by native English-speaking collaborators.
The abstract should be more specific with respect to results and findings of the study.
The second part of the abstract has been revised, adding the main findings and more details on the results. Some text on recommendations has been removed to stay within the word limit.
Field data have been acquired at 2m above ground. When the paper talks about "near-surface" air temperature measurements, what is the upper limit for this term, and up to which height do you expect the albedo effect to influence air temperature measurements, e.g., for a sensor on a 10m mast?
For the experiment we relied on the WMO/CIMO Guide #8 to instruments and methods of observation, which prescribes that near-surface sensors should be placed between 1.25 and 2 m above ground. At the chosen site, snow cover deeper than 2 - 1.25 = 0.75 m is extremely rare; we therefore decided that no mechanism for adjusting sensor height was necessary, as the height remained within specifications in all conditions. Temperature sensors at 10 m height are not standard and are not considered "near-surface", so we did not estimate the contribution of albedo to such installations.
In order not to weigh down the text, these considerations have not been added. Only a quick reference to the definition of "near surface" has been added at the first occurrence in the text.
Related to comment #2, given a 5m radius of the snow-free surface and a 2m height of the sensors above ground, how does this relate to the field of view of the radiometer used, which normally sees a hemisphere? In other words, what is the footprint the sensors (pyranometer and thermometer) see? Is a 10m diameter sufficiently large to neglect contributions from outside the circle?
The stated field of view of the pyranometer is 180° (but the effective one is a little less). In our configuration, a 1.5 m high pyranometer sees a diameter of 10 m at the ground under an angle of 146°, which we considered enough for our purposes. By trigonometric considerations, doubling the snow-free radius under the sensor (thus multiplying the area to be cleared of snow fourfold) would have increased the field of view by a mere 16°. This consideration has been added in the revised manuscript.
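The trigonometric argument above can be checked with a short sketch. It assumes the sensor sits directly above the centre of a flat, circular snow-free area; the function and variable names are illustrative and do not come from the manuscript.

```python
import math

def ground_view_angle_deg(height_m, radius_m):
    """Full angle (degrees) subtended at the sensor by a flat ground disc
    of the given radius centred directly below it (simplifying assumption)."""
    return 2.0 * math.degrees(math.atan(radius_m / height_m))

# Values quoted in the reply: 1.5 m sensor height, 5 m snow-free radius.
angle_5m = ground_view_angle_deg(1.5, 5.0)
# Doubling the radius quadruples the area that has to be cleared.
angle_10m = ground_view_angle_deg(1.5, 10.0)
gain = angle_10m - angle_5m

print(f"5 m radius: {angle_5m:.1f} deg, 10 m radius: {angle_10m:.1f} deg, gain: {gain:.1f} deg")
```

Under these assumptions the angle for the 5 m radius is close to the 146° quoted above, and the gain from doubling the radius is close to 16°.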
Section 2: I don't see much of a "theoretical study" in this section; is this statement related to previous work of the authors? Maybe consider revising the section title.
Yes, the "theoretical study" refers to Musacchio et al., 2019. The section title has been revised in the updated manuscript.
Section 2.2 could briefly mention the influence of the identified quantities. In particular, the role of snow depth and the snow properties should be mentioned. Why is snow density not considered?
The influence of the identified quantities was more explicit in the previous work, and we felt it would have been unnecessary repetition. However, it has been added in the revised manuscript.
Snow density was not considered because previous studies (for instance, Bohren and Beschta, 1979) concluded that snowpack albedo depends only weakly on it. This has been added to the revised manuscript.
Has the solar zenith angle been considered to have an impact? This is not only influencing the shortwave radiative heat flux but also the snow albedo (even significantly at large zenith angles).
These are really questions for the previous theoretical study (Musacchio et al., 2019). However, zenith angle was not considered for two reasons. First, the site is in a narrow valley and the sun's rays never arrive at large zenith angles, being blocked by the orography of the site. Second, while it is true that zenith angle influences the flux of reflected radiation, we only considered the ratio between global and reflected radiation, which is much less influenced by it: for instance, Xiong et al. (2015) show that, for high values of albedo (like those of snow), the dependence is almost flat; for smaller values, the dependence steepens after ~60°, which is basically never reached at our site.
P05L143: Why is the sensor at the central location considered the "reference" for the other two air temperature measurements if subsequently not considered in the analysis?
"Reference" in this instance means only that it provided a second set of temperature measurements to check whether the other sensors were measuring correctly. The wording may be misleading, so it has been changed in the revised manuscript.
Tables 1 and 2: Provide manufacturer, device model and sensor accuracy, resolution and response time information for temperature sensor and shield in the Tables. Additionally, describe the differences between sensor/shield combinations C and D, as well as E and F; this information is absent from the text and table. Which data logging devices have been used in this study?
From the very beginning of the experiment, it was decided that no manufacturer or model would be named in the text, for various reasons: it was not intended as a competition, and we did not want to influence the market in any way. Manufacturers have been informed of the results for their instruments, but this is information we do not intend to make public.
Other information, such as (stated) uncertainty and resolution, will be added to the Tables. Logging devices were the standard ones provided by each manufacturer, whose uncertainty was added to the final budget.
Figure 2 is not necessary and can be removed. The completed Tables 1 and 2 will be more useful.
We think Figure 2 is important, given that we are not going to explicitly state the manufacturers and models of the sensors, because it immediately conveys the different sizes and shapes of the shields. Besides, it shows the laboratory characterization phase, which was an integral part of the experiment. We would therefore like to keep the picture in the revised manuscript.
Negative signs only mean that the sensor arbitrarily assigned to measurement point (a) reads lower values than the identical one assigned to measurement point (b). Redrawing the plot just to avoid negative values seems unnecessary to us; an explanation of the signs has been added to the caption.
Sensors are sampled at the same frequency (10 min), which is the frequency later used during the field experiment. This is better explained in the revised text.
All plots have been redrawn in color, trying to maintain readability when printed in greyscale and for color-blind readers.
E and F are not shown because they were added to the batch later, so for them we relied on the manufacturer's characterization. This information has been added in the revised manuscript.
Looking at Figure 5, I am surprised by the site selection; this is very heterogeneous terrain with buildings, trees, roads, streams, and complex topography nearby, all being elements that should be absent or far away according to the cited list of criteria. How do you justify the choice of this site given these constraints? The discussion section gives some explanation and justification for this choice. Nevertheless, my understanding of flat, open, homogeneous terrain is very different, and I would have selected a location with homogeneous fetch of at least a few 100m in all directions. I even fear that in wintertime sensors could experience shading from the tall trees South of the sites when the sun is low. Did the recent studies of Coppa et al. (2021) and Garcia Izquierdo et al. also consider potential effects of albedo in the hemispheric field of view of the sensors, influence of obstacles on local turbulence and sensible heat fluxes or terrain effects by topography and vegetation on longwave radiation?
We would have liked to have a flat, unobstructed, homogeneous 100-m field as well, but that is not what one finds in the Italian Alps and, we suspect, in most other mountain settings. We had to work with what was available. The site was chosen because it is easily reachable, supplied with electric power and other instruments, and constantly managed and guarded. We judged that the influence of the cited obstacles was symmetrical on the two sites and, given that all the measurements we considered were relative, we assumed that these influences cancel out in the analysis.
The shading possibility by the trees was considered, but ruled out once in the field.
The cited studies did not investigate the effects mentioned by the reviewer because this was outside their scope: in these studies, vegetation (and thus the albedo) was uniform and the topography was more forgiving. Local turbulence was investigated and found to be negligible at the sampling interval used in both studies (10 min).
The reviewer is correct in pointing this out. As a matter of fact, radiometers were not yet installed in that picture: they were installed later, together with sensors E and F which, as said, were added to the experiment at a later stage.
Unfortunately, we do not have detailed pictures that show all three measurement points in summer configuration, with the radiometers. They can be spotted, with a little difficulty, in Figures 7 (now 6(a)) and especially 8 (now 6(b)), as the horizontal lower bar extending to the right (South). The pictures have been combined as Figure 6(a) and 6(b).

A sentence is added to the caption in
Section 3.2 lacks information on the experiment duration, time periods measured, sampling frequency.
All this information is present in Section 4, where it seems a better fit.
Section 2.2. raises expectations on how snow thickness and snow conditions have been measured with reference to Section 3.2, which then only states that any fresh snow has been removed, and the snow at site (a) has not been characterized. If no snow has been removed at site (a), the snow surface characteristics change over time influencing the snow albedo. How has the snow depth been measured, manually or with an automatic acoustic range finder? Not clear in the manuscript.
Measured snow depth was always lower than ~40 cm. While it is true that snow characteristics change over time, our measurement periods after snow removal lasted only a few days, to prevent snow degradation from becoming important.
It is true that the snow at site (a) was never touched, but since the analysis was always conducted after a snowfall, we assume that the snow always had close-to-maximum albedo.
All this has been explained better in the revised version of the manuscript.
P08L232 (cf. also #12): Apart from topographic shading, shadows should actually be excluded based on the site selection criteria. And if an instrument is shaded by another sensor this should be identical for both stations. Otherwise, there are undesired perturbations from nearby objects or topography which should be removed. Also, vegetation, water level in rivers should not be a factor for a homogeneous site (see P08L240-241). It is not clear to me, how you account for these influences, e.g., the water level in the streams on both sides of the measurement field, and I have a hard time understanding the values presented in Table 4. I think this needs further explanation.
We agree with the reviewer that undesired perturbations from nearby objects or topography should not be a factor for homogeneous sites. In real conditions, compromises are always needed between minimizing such factors on the one hand and securing logistical advantages (access, power, maintenance) on the other. This and further explanations have been added to clarify the differences and uncertainty values in Table 4.
Figure 9: Include the sensor type and shield in the figure legend. Y-axis label: put the units in [brackets] or (parenthesis), otherwise it could be misread as "delta_t per degree C"; this applies to most figures in the manuscript.
As with the earlier request, we do not think it is wise to clog plots with information that can be obtained elsewhere in the manuscript.

As for the units, some journals apply this convention in plots. The submission guidelines of this journal do not prescribe a format for units in plot labels. However, the plots have been redrawn with an updated label format, following the reviewer's request.
P10L292: I understand the motivation for choosing a threshold value for the difference in reflected SW radiation between the two sites for identifying the largest albedo effect. However, it would be interesting to have a quantification of this effect over the continuum of differences in reflected radiation. Could this still be considered? Otherwise, include a reference to Figure 10 where this choice is graphically justified.
We attempted to include data below this threshold as well, but this produced a large number of temperature differences below 0.1 °C, inflating the 0 °C to 0.2 °C range (first bar of the graph in Fig. 10, now Fig. 9). The resulting plot would lose graphical detail at the higher and more important difference values, which would appear "compressed". Moreover, below this threshold it was impossible to discriminate among the different kinds of sensors and shields. This has been added to the text, and more has been added to the figure caption.
The reflected radiation over snow should always be larger than that over bare ground?! How do you explain the negative differences sometimes exceeding 100 W/m2?
Data shown in the plots are the whole 10-min dataset, not filtered. This is now better explained in the caption.
The plots are not actually bar plots but line plots: this explains the diagonal lines that, depending on the pixel resolution of the image, sometimes produce this "moiré" effect. However, the plots have been re-drawn to reduce this effect.
Regarding color-coding essential information in the plots, we would like to reduce that to a minimum: as a color-blind person, I have a lot of difficulty with it. Besides, combining plots a) and b) has already been tried, with little success in terms of readability.
Negative values in panel c) are very few relative to the whole dataset (~600 out of ~28000), and most of them are very small in magnitude: these are due to errors in the measurement of radiation being larger than the values themselves, as shown in Figure 8. Larger negative values are mostly confined to days around 14 November, even before the first snowfall event. We cannot be sure about what happened, but it was not related to snow. This explanation has been added to the caption as well.
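As a rough illustration of the argument above, negative differences can be separated into those explainable by measurement uncertainty and those that are not. This is only a sketch: the function name, the sign convention (snow site minus bare-ground site), the uncertainty value and the sample data are all assumptions for illustration, not values from the study.

```python
def classify_negative_differences(diff_w_m2, uncertainty_w_m2):
    """Return indices of negative reflected-radiation differences
    (snow site minus bare-ground site, an assumed convention), split into
    those within the measurement uncertainty and those beyond it."""
    within, beyond = [], []
    for i, d in enumerate(diff_w_m2):
        if d >= 0:
            continue  # positive differences are the expected case
        if abs(d) <= uncertainty_w_m2:
            within.append(i)
        else:
            beyond.append(i)
    return within, beyond

# Illustrative samples only (W/m2); 10 W/m2 is a hypothetical uncertainty.
within, beyond = classify_negative_differences([5.0, -2.0, -120.0, 30.0], 10.0)

# Share of negative samples using the approximate counts quoted in the reply.
negative_share = 600 / 28000
print(within, beyond, f"{negative_share:.1%}")
```

With the approximate counts given in the reply, negative values make up roughly 2% of the dataset.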
Section 4.2 does not explicitly discuss the difference of passive vs. actively ventilated radiation shields which is a crucial point that definitely should be included in the discussion. This section also gives the misleading impression that the differences depend on the sensor type while the main cause is rather the sensor shield absorbing (incident and reflected) shortwave radiation. This point should be clarified.
By "sensor type" we always mean "combination of sensor and shield", as the "types" in Table 1 clearly state. This has been added to the text to make it clearer.
As we only had one type of actively aspirated shield, it did not seem fair to draw general conclusions. Besides, this was never the scope of the manuscript: the work was performed in order to test the method proposed in Musacchio et al. (2019).
Given that this issue is raised again by the reviewer in comment 25, it seems more fitting to discuss it there.
Figure 12: Add the sensor type and radiation shield information to the panel letter to facilitate reference.
We do not want to repeat information that is already present in the manuscript, at the risk of clogging the plot. Please refer to Table 1 for this information.
Figure 13: Use color for better readability. I would also try to identify clusters to characterize different behavior of the available sensor/shield combinations.
As for the previous comment on colors, we would like to keep them in plots to a minimum.
However, in order to improve the readability of the plot, it has been divided into 6 panels, one for each pair of sensors.
Figure 14 and Section 4.3: I am surprised that any possible difference between ventilated and passive shields is not at all discussed (apart from one sentence in Section 5). Figure 14 suggests that an aspirated sensor is not necessarily performing better than a non-ventilated, cf. A vs. D, respectively. From Figures 13 and 14 it can be concluded that the magnitude of the albedo effect is a combination of the influence of reflected shortwave radiation and wind speed; radiation alone does not explain the temperature differences (see Figure 14) and wind speed seems to dominate radiation effects. Sections 4.2 and 4.3 would largely benefit from such kind of a discussion. Additionally, a discussion on how the albedo-induced error in air temperature measurements compares to the magnitude of errors due to incident solar radiation heating of the shield and sensor may be another interesting point. Perhaps even identifying a ratio/relation between the two. And finally, what is the advantage of a helical shield in comparison to a standard multi-disk shield?
As mentioned in the answer to comment 22, having only one type of actively ventilated shield prevents us from drawing general conclusions on the issue of actively vs. passively ventilated shields.
We agree with the reviewer's remark about active shields not necessarily being better than passive ones. However, we must keep in mind that, by comparing the A and D systems, we are talking not only about shields but also about sensors. As a matter of fact, sensor A is a Pt100 while sensor D is a thermo-hygrometer: a conclusion about the performance of the shield alone does not seem definitive. Helical shields seem to maximize the air intake, cooling the sensor inside more effectively than other kinds of shields, but this is something to be investigated, perhaps with a theoretical study. This has been added to the revised manuscript as well.

In other similar experiments on obstacle effects on near-surface temperature measurements, it often emerges the fact that wind dominates radiation in terms
Section 4.4: Certainly an interesting point, but out of the scope of this study (as you already mention) since it is not related to albedo and solar radiation but rather to longwave radiation. You risk opening a can of worms here and I would remove this section completely (as well as the related lines 359-361).

The paragraph and related sentences have been removed.
Have the instruments been monitored with an automatic camera to see occurrences of precipitation deposition on the sensors (especially the radiometers) or effects of riming? This way, spurious data can be flagged and removed.
No automatic camera was set up, because the costs would not have been proportionate to the benefit. As a matter of fact, only periods after a snowfall (and subsequent snow clearing by us) were used for the analysis: in these cases, snow was always cleared from the sensors by hand.
A major point that really surprised me is that the albedo has not been plotted, analyzed, and exploited. Given that incoming and reflected shortwave measurements are available at both sites, I would have expected a plot and a detailed analysis of the albedo for the entire period. This is actually what the paper title suggests. Also, it is a pity that surface temperature has not been measured at the two sites which I consider would have been very useful for the analysis. IR surface temperature is an interesting indicator linked to longwave emission and possible sensible heat flux contributions. I strongly suggest including these elements, at least the albedo if IRT measurements are not available.
A full and detailed exploration of the relationship between the albedo and the error induced on the temperature sensors is outside the scope of this work; however, we agree with the reviewer that a plot and an analysis of the albedo can be of interest to the reader. Figure 8 has been added to the manuscript, showing the evolution of the albedo measured at both sites.
As for the surface temperature, it is certainly something that we'll keep in mind should we repeat the experiment in the future.
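For readers who want to reproduce an albedo time series from paired radiometer readings, a minimal sketch follows. It uses the standard definition of albedo as the ratio of reflected to incoming shortwave radiation; the function name, the low-light cut-off of 50 W/m2 and the sample values are assumptions for illustration, not the processing actually used for Figure 8.

```python
def albedo_series(incoming_w_m2, reflected_w_m2, min_incoming_w_m2=50.0):
    """Albedo (reflected / incoming shortwave) per sample.

    Samples with missing data or with incoming radiation below the
    (hypothetical) cut-off are returned as None, since the ratio becomes
    unreliable when the denominator is small."""
    result = []
    for sw_in, sw_out in zip(incoming_w_m2, reflected_w_m2):
        if sw_in is None or sw_out is None or sw_in < min_incoming_w_m2:
            result.append(None)
        else:
            result.append(sw_out / sw_in)
    return result

# Illustrative values only: a bright sample over snow and a low-light sample.
example = albedo_series([400.0, 10.0], [320.0, 9.0])
print(example)
```

Masking low-light samples is one possible design choice; averaging over midday hours would be another.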
The "recommendation" at the end of the manuscript suggests the generation of a correction for the albedo influence on near-surface air temperature measurements, but the paper does not propose such a correction for the study performed. Such a correction, maybe even shield-specific, would be of interest to the measurement community.
The reason why such correction was not calculated is already stated in the conclusion: it would need much more data and time, in order to cover all possible meteorological and geometrical combinations. Within the scope of our experiment, uncertainties would have been simply too high, with the risk of doing more harm than good.
This study quantifies the albedo effect on air temperature measurements which is the objective of the paper. This effect has been quantified to be as large as 3.8°C. For this value and as an example, the given conditions could be added, e.g., the corresponding incident and reflected SW radiation as well as the wind speed. It should be mentioned that these albedo-induced errors do not include radiative errors due to heating of the sensor shield from incident solar radiation; that error has to be added to determine a complete shortwave radiation-induced error on air temperature measurements. Since no reference air temperature independent from radiation errors is available in this study, the total error due to heating of the sensor by solar radiation is not quantified which should be mentioned explicitly (once again in the conclusions).
As a matter of fact, some of the information on the conditions that generated the largest albedo effect is already available in Figure 12. However, it has been made more prominent in the revised text.
The valid considerations raised by the reviewer have been added as well.
Almost throughout the manuscript, why do you call your results, analysis, tests etc. "preliminary"? e.g., L 134, 149, 154, 234, 273, 275, 304.
We are trying to repeat the experiment with other instruments and, hopefully, a longer time baseline, which will give us more robust results. Since this is not mentioned in the manuscript, however, we have removed all instances of the word "preliminary".
All the following minor comments have been agreed to and addressed in the text, except when explicitly stated, with due explanation.
We cannot be precise at this stage, since we are talking in general, not about our site. If someone wants to repeat the experiment and has a lot of space to use, it is useless for us to say "5 m" for instance. We replaced the vague statement with "at least".