Comment on essd-2021-313

It is a great pleasure to participate in this open discussion. This paper presents significant work toward improving the practicality of the current MODIS LST product. As my PhD dissertation is also related to cloudy-sky LST estimation and LST data applications, this work strongly interests me, and I believe that if the released dataset achieves satisfactory accuracy and data quality, it will help the community monitor surface thermal dynamics, among other applications.

1) This study assessed the results by hypothetically removing some clear-sky days, filling them, and comparing the filled values with the official MODIS LST. However, unlike shortwave variables (e.g., surface reflectance), LST under clouds is affected by the cloud cover itself (a cooling effect in the daytime and a warming effect at nighttime), so clear-sky samples are not representative of cloudy-sky cases.
Besides, the hypothetically removed clear-sky pixels usually have enough clear-sky adjacent pixels for interpolation, so they can easily achieve better interpolation accuracy than actual cloudy cases; it might not be fair to rely only on this assessment method to demonstrate cloudy-sky accuracy. Moreover, this assessment cannot cover the filled LST samples where the official MODIS product has poor data quality (partially cloud-covered or large view zenith angle), even though they are also an important part of this work's contribution. Therefore, I am really curious about the accuracy validated against ground measurements, which can assess clear-sky and cloudy-sky samples comprehensively and independently. The manuscript mentions that site-measured records are not comparable to remotely sensed LST because of spatial scale mismatch and broadband emissivity (BBE) uncertainty. In fact, these issues have been discussed in previous studies. By analyzing Landsat LST, Li et al. (2021) showed that SURFRAD sites have little heterogeneity issue for validating 1-km LST products, and BBE would not introduce noticeable errors into the validation (Xing et al., 2021). That is why most related studies on this topic used ground measurements for assessment (e.g., Xu and Cheng, 2021; Zeng et al., 2018).
The proposed work followed Li et al. (2018) and did not consider site validation, but I think this is mainly because Li et al. (2018) focused only on US urban areas, where no ground LST or upward longwave radiation measurements are available; besides, the heterogeneity issue in urban areas is substantial and cannot be ignored. As this study generated a global gap-free LST product, it would be a great opportunity to assess the product globally using high-quality site measurements, such as SURFRAD and BSRN.
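For reference, ground LST at radiation sites is commonly derived from the measured upwelling and downwelling longwave radiation by inverting the Stefan-Boltzmann relation with a broadband emissivity. The following is only a minimal sketch of that standard conversion; the function name `ground_lst` and the example values are my own assumptions, not taken from the manuscript or from any site's processing chain.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def ground_lst(lw_up, lw_down, bbe):
    """Invert the surface longwave balance for temperature in kelvin.

    lw_up, lw_down: upwelling / downwelling longwave radiation (W m^-2)
    bbe: broadband emissivity (dimensionless, 0 < bbe <= 1); assumed known
    """
    lw_up = np.asarray(lw_up, dtype=float)
    lw_down = np.asarray(lw_down, dtype=float)
    # Subtract the sky radiation reflected by the surface.
    emitted = lw_up - (1.0 - bbe) * lw_down
    return (emitted / (bbe * SIGMA)) ** 0.25


# Hypothetical example values, only to show how a BBE change propagates.
lw_up_example = 0.97 * SIGMA * 300.0**4 + 0.03 * 350.0
lst_a = ground_lst(lw_up_example, 350.0, 0.97)
lst_b = ground_lst(lw_up_example, 350.0, 0.98)
print(float(lst_a), float(lst_b))
```

Running the same record with two emissivity values, as in the last lines, is a simple way to check how sensitive the site LST is to the BBE assumption.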
It is reasonable that the site-validated RMSE is not comparable to the cross-validation results. However, by separately validating the clear-sky and cloudy-sky results against ground site measurements and comparing the RMSEs on clear and cloudy days, this study can demonstrate the stability of the accuracy and the real interpolation accuracy of the proposed product, which I believe is what users care about.
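The separate validation suggested above could be organized roughly as follows. This is a sketch only: the function name `stratified_rmse`, the input arrays, and the cloudy flag are hypothetical, and I assume the flag would come from the official MODIS quality information for the matched overpass.

```python
import numpy as np


def stratified_rmse(lst_filled, lst_site, is_cloudy):
    """Return sample count, bias, and RMSE for clear-sky and cloudy-sky matchups."""
    filled = np.asarray(lst_filled, dtype=float)
    site = np.asarray(lst_site, dtype=float)
    cloudy = np.asarray(is_cloudy, dtype=bool)
    # Drop matchups where either value is missing.
    valid = np.isfinite(filled) & np.isfinite(site)

    stats = {}
    for name, mask in (("clear", valid & ~cloudy), ("cloudy", valid & cloudy)):
        err = filled[mask] - site[mask]
        stats[name] = {
            "n": int(mask.sum()),
            "bias": float(err.mean()) if err.size else float("nan"),
            "rmse": float(np.sqrt((err**2).mean())) if err.size else float("nan"),
        }
    return stats
```

Reporting the two groups side by side (with sample counts) would show directly whether the accuracy degrades under real cloud cover.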
2) I am still a little confused about the methodology. Would you please explain in plain language how the interpolation method captures day-to-day variation where cloud cover persists over considerable spatiotemporal scales, such as rainy seasons (which can last for weeks) in southern China or tropical areas, without introducing passive microwave or modeling data?
3) Interpolation-based methods usually face a trade-off between computational efficiency and accuracy because they essentially rely on statistical relationships with reference pixels: the more reference pixels they can obtain, the higher the accuracy, but the more time is consumed. This category of methods is well developed, so would you please highlight the breakthrough of the proposed work: how does it differ from previous interpolation methods, and how is efficiency improved while high accuracy is maintained?
4) I also have a question about neighboring pixels in a spatial window: it looks like land cover type and elevation differences were not considered in the method (maybe I missed it). Land cover type strongly affects LST even when pixels are spatially close. For example, forest LST is the thermal signal from the canopy, whereas the LST of neighboring open grassland could come from the ground (even the soil surface), and the two differ significantly. That is why Zhang et al. (2021) built models by land cover type in each window. Jin and Treadon (2003) also built typical diurnal temperature cycles for each land cover type. May I ask whether you have included such considerations? If not, could you please add some discussion to clarify that those differences will not significantly affect the interpolation accuracy?
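To make the land cover and elevation concern concrete, here is a rough sketch of a window-based fill that only uses neighbors of the same land cover class and similar elevation. It is not the authors' method and not the model of Zhang et al. (2021); the inverse-distance weighting, the function name `similar_neighbor_fill`, and the `half_window` and `max_dz` thresholds are all assumptions chosen for illustration.

```python
import numpy as np


def similar_neighbor_fill(lst, land_cover, elevation, row, col,
                          half_window=2, max_dz=100.0):
    """Inverse-distance-weighted fill from same-class, similar-elevation neighbors."""
    r0, r1 = max(row - half_window, 0), min(row + half_window + 1, lst.shape[0])
    c0, c1 = max(col - half_window, 0), min(col + half_window + 1, lst.shape[1])
    rows, cols = np.mgrid[r0:r1, c0:c1]
    win = lst[r0:r1, c0:c1]

    usable = (
        np.isfinite(win)
        & (land_cover[r0:r1, c0:c1] == land_cover[row, col])
        & (np.abs(elevation[r0:r1, c0:c1] - elevation[row, col]) <= max_dz)
    )
    usable[row - r0, col - c0] = False  # never use the target pixel itself
    if not usable.any():
        return float("nan")  # no comparable neighbor in the window

    dist = np.hypot(rows[usable] - row, cols[usable] - col)
    weights = 1.0 / dist  # dist > 0 because the target pixel is excluded
    return float(np.sum(weights * win[usable]) / np.sum(weights))
```

Comparing such a filtered fill with an unfiltered one over heterogeneous windows would be one way to quantify how much land cover and elevation differences matter for the interpolation accuracy.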
Other Comments: I also have some minor comments that might be helpful for publication of the paper.
1) I found some typos and grammatical errors in the manuscript; would you please revise them? The list below is not exhaustive, so I would suggest English editing to improve the writing quality.
Line 11: parameter -> parameters
Line 35: "measured" is not accurate; could be "retrieved"
Line 40: ASTER -> Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER)
Line 47: "a" is missing before "larger number"
Line 50: "a" is missing before "daily"
Line 57: "the" is missing before "conterminous US"
Line 65: relationships
Line 71: "a" is missing before "hybrid method"
Line 83: "the" is missing before "methods mentioned above"
2) In the supplementary materials, would you please explain why this method refers to center pixels from neighboring blocks for interpolation (which could be 10 km from the target pixel) rather than using neighboring pixels? S1 is easy to understand, but could you clarify why "S2 Implementation of the ICW method" is designed this way? Thanks!