A long-term daily gridded snow depth dataset for the Northern Hemisphere from 1980 to 2019 based on machine learning

Hu, Yanxing; Che, Tao; Dai, Liyun; Zhu, Yu; Xiao, Lin; Deng, Jie; Li, Xin

doi:10.5194/essd-2022-63

Preprints

https://doi.org/10.5194/essd-2022-63

28 Mar 2022

Status: this preprint has been withdrawn by the authors.

Abstract. A high-quality snow depth product is very important for cryospheric science and its related disciplines. Current long time-series snow depth products covering the Northern Hemisphere can be divided into two categories: remote sensing snow depth products and reanalysis snow depth products. However, existing gridded snow depth products have some shortcomings. Remote sensing-derived snow depth products are temporally and spatially discontinuous and tend to underestimate snow depth, while reanalysis snow depth products have coarse spatial resolutions and great uncertainties. To overcome these problems, in our previous work we proposed a novel data fusion framework based on Random Forest Regression of snow products from Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E), Advanced Microwave Scanning Radiometer 2 (AMSR-2), Global Snow Monitoring for Climate Research (GlobSnow), the Northern Hemisphere Snow Depth (NHSD), ERA-Interim, and Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), together with geolocation (latitude and longitude) and topographic data (elevation), all of which were used as independent input variables. Observations from more than 30,000 ground sites were used as the dependent variable to train and validate the model in different time periods. This fusion framework resulted in a continuous, long time-series daily snow depth product over the Northern Hemisphere with a spatial resolution of 0.25°. Here we compared the fused snow depth and the original gridded snow depth products with 13,272 observation sites, showing an improved precision of our product. The evaluation indexes of the fused (best original) dataset yielded a coefficient of determination R² of 0.81 (0.23), Root Mean Squared Error (RMSE) of 7.69 (15.86) cm, and Mean Absolute Error (MAE) of 2.74 (6.14) cm. Most of the bias (88.31 %) between the fused snow depth and in situ observations fell between -5 cm and 5 cm.
The accuracy assessment of independent snow observation sites – Sodankylä (SOD), Old Aspen (OAS), Old Black Spruce (OBS), and Old Jack Pine (OJP) – showed that the fused snow depth dataset had high precision for snow depths of less than 100 cm with a relatively homogeneous surrounding environment. In the altitude range of 100 m to 2000 m, the fused snow depth had a higher precision, with R² varying from 0.73 to 0.86. The fused snow depth had consistent trends based on the spatiotemporal analysis and Mann-Kendall trend test method. This fused snow depth product provides the basis for understanding the temporal and spatial characteristics of snow cover and their relation to climate change, the hydrological and water cycle, water resource management, the ecological environment, and snow disaster and hazard prevention. The new fused snow depth dataset is freely available from the National Tibetan Plateau Data Center (TPDC) and can be downloaded at https://dx.doi.org/10.11888/Snow.tpdc.271701 (Che et al., 2021). The dataset can also be downloaded at https://zenodo.org/record/6336866#.Yjs0CMjjwzY.
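
The evaluation indexes quoted in the abstract can be illustrated with a minimal sketch. It assumes that R² is the coefficient of determination computed against the observations and that bias means predicted minus observed; the preprint may define either differently. The function name and the sample values are hypothetical and are not taken from the dataset.

```python
import math

def evaluation_metrics(observed, predicted, bias_window_cm=5.0):
    """Return R^2, RMSE, MAE and the share of biases within +/- bias_window_cm."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Assumption: bias is defined as predicted minus observed.
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "share_within_window": sum(abs(e) <= bias_window_cm for e in errors) / n,
    }

# Illustrative snow depths in cm, not taken from the dataset.
obs = [0.0, 10.0, 20.0, 40.0]
pred = [2.0, 8.0, 26.0, 40.0]
metrics = evaluation_metrics(obs, pred)
```

Reporting the share of biases inside a fixed window alongside RMSE and MAE shows how much of the error is carried by a small number of large misses.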

Received: 15 Feb 2022 – Discussion started: 28 Mar 2022

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Interactive discussion

Status: closed

RC1: 'Comment on essd-2022-63', Baptiste Vandecrux, 03 May 2022

Review of “A long-term daily gridded snow depth dataset for the Northern Hemisphere from 1980 to 2019 based on machine learning” by Hu et al.

Baptiste Vandecrux (bav@geus.dk)

General comment:

This article uses multiple gridded snow depth products and combines them with a random forest (RF) algorithm trained on more than 30 000 in situ observations of snow depth. The study covers an interesting topic as snow depth is critical for many climatic and ecological processes. The use of machine learning (ML) algorithms is also interesting for the combination of data from multiple sources. The study therefore has clear potential, as the dataset could benefit many other research groups. However, I have major concerns on the following topics:

- The novelty of the work: It seems that the same authors have presented a similar study in Hu et al. (2021). It is unclear what the differences are between this previous study and the current work. I am unsure that adding one or two inputs to the same framework justifies a new publication. However, I also identify several flaws in the methods (see comments below) that, if fixed, could justify that the new product has improved enough since Hu et al. (2021).

- There is a confusion between snow depth and snow water equivalent (SWE) in the studies cited in the introduction. The two quantities are not interchangeable, and the authors should justify, with the appropriate studies, why snow depth is a variable relevant to monitor and difficult to observe at large scale. Work on SWE can naturally be reviewed as a related, but distinct, field of research.

- The choice and training of the RF algorithm: RF algorithms are known to be very good at (over)fitting the training data and to perform poorly outside of their training set. There is currently nothing in the method that ensures that the training set is representative of the conditions in which the RF algorithm is being used. It is now standard procedure to de-cluster the training data to make sure that the training data is not dominated by one specific type of sample.

- Overfitting: Nothing is said about how the hyper-parameters of the RF algorithm are being set and about any measures to prevent over-fitting. For example, neural networks can be trained with noise added to the training data to prevent overfitting. Maybe something similar exists for RF regressions. Additionally, the evaluation of the fused dataset using randomly selected samples cannot properly evaluate the output of an overfitted algorithm because the random subset will have the same distribution, and therefore the same structure (clustering), as the rest of the training set. Only a carefully set-up spatial cross-validation can make sure that an ML algorithm works appropriately in all the different areas where it is eventually applied.

- The trend analysis has some major methodological flaws that will need to be addressed.

Considering these issues, I recommend a major revision of the paper, unless the editor considers that the necessary reshaping would deserve a new submission.

Specific comments:

l.48 “mass” of what? Please split the sentence in two.

l.50-53: The second half of this paragraph states that "knowledge on snow depth and its trends are lacking", that there are "limited surface observations" and that remote sensing methods are "inadequate". This is not exactly the current state of research in snow depth mapping because this same study builds on numerous gridded snow depth products and thousands of in situ observations. Please re-frame this paragraph and acknowledge properly the previous work.

l. 61-63: Please give references for each of these products.

l.85: “conventional” do you mean “convolutional”?

l. 79-90: Consider discussing this additional reference:

Shao, D., Li, H., Wang, J., Hao, X., Che, T., and Ji, W.: Reconstruction of a daily gridded snow water equivalent product for the land region above 45° N based on a ridge regression machine learning approach, Earth Syst. Sci. Data, 14, 795–809, https://doi.org/10.5194/essd-14-795-2022, 2022.

l.90: Hu et al. (2021). This study seems very similar to what is presented here. Please describe the study in further detail and make explicit how the presented study builds on top of the previous one. What were the limitations of the previous study and what is the novelty in this new one?

l.94: Mudryk et al. (2015) compared snow water equivalent (SWE), not snow depth.

l.93: “more than 50%” in SWE, not snow depth

l.95: Mortimer et al. (2020) evaluate SWE products, not snow depth.

l.97: “Globsnow snow depth”, it was the SWE that was evaluated there

l.98: “Previous assessments” which studies do you refer to?

l.107: Snauffer et al. (2018) should be added and discussed in line 79-90 where different ML algorithms are being used.

l.107-115 are partly redundant with l.79-90. They should be moved there and merged into a paragraph dedicated to ML algorithms used for snow depth retrieval.

l.115-117: Are the methods, and therefore produced data, the same as in Hu et al. (2021)?

If yes, then I think it raises the issue of the novelty of the study.

If not, then a paragraph in the intro should be dedicated to the limitations of Hu et al. (2021) and how the present study builds further and presents an improved product compared to Hu et al. (2021).

l.144-145 “In these two…” This sentence is unclear. Is the snow depth always set to 5 cm when being detected? How are deeper snowpacks considered?

l.147-148: “The accuracy…” Give reference

l.166: remove “,” between “study” and “attempted”

l. 166-167: “Venäläinen et al., (2021)” This reference was not discussed in the intro when introducing the ML algorithms in snow depth retrieval.

l.181: Please give a reference for this dataset.

l.185: Please give a reference for this dataset.

l.190: Please give a reference for this dataset.

l.196: Please give a reference for this dataset.

l.200-205: These data are very important as they are the most objective way to evaluate your fused dataset. Please show on a map that they are located across a wide range of geographical locations and cover the different land categories for which you fit different RF models.

l.219-223: This is an insufficient level of detail for the core of your method. The documentation should be sufficient to reproduce your product. The fitting procedure and the hyperparameter selection should also be detailed to show how you avoid overfitting and to make the RF able to predict outside of its training set.

ML algorithms are very sensitive to the training data and to any imbalance therein. The training data should be de-clustered: it should be made sure that the observed snow depths cover the whole spectrum of retrieved snow depth and are located in all the elevations and all the land categories that are used as input to the algorithm. The de-clustering could be done by assigning weights to observations and/or by duplicating observations from under-represented subsets.
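
One way to realise the weighting option is inverse-frequency weighting per stratum. The sketch below is illustrative only: the function name, the choice of strata (here an elevation band and a land category) and the sample labels are assumptions, not part of the manuscript or of the review.

```python
from collections import Counter

def decluster_weights(strata):
    """Weight each sample by the inverse of its stratum's frequency.

    `strata` holds one hashable label per observation, for example a tuple of
    (elevation band, land category). Weights are scaled so that they sum to
    the number of samples and every stratum carries the same total weight.
    """
    counts = Counter(strata)
    n_samples = len(strata)
    n_strata = len(counts)
    return [n_samples / (n_strata * counts[s]) for s in strata]

# Hypothetical labels: three lowland forest samples and one alpine sample.
labels = [("low", "forest"), ("low", "forest"), ("low", "forest"), ("high", "alpine")]
weights = decluster_weights(labels)
```

Weights of this kind could then be passed to a regressor that accepts per-sample weights, such as the `sample_weight` argument of scikit-learn's `RandomForestRegressor.fit`.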

Since your objective is to use the fused dataset for spatio-temporal analysis, a spatial or temporal cross validation should be conducted to investigate the robustness of your algorithm. This could be done by iteratively removing different regions or different years from the training set and using these removed samples for evaluation. Of course the final product should use as many samples as possible, but the evaluation of the RF algorithm is currently insufficient to build trust in its output.
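
The iterative removal of regions or years described above amounts to a leave-one-group-out split. The following is a minimal sketch with hypothetical region labels; model training and scoring are left out.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group.

    `groups` holds one label per sample, such as a region name or a year, so
    that every sample from the held-out group is excluded from training.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test

# Hypothetical region label for each of six observations.
regions = ["alps", "alps", "siberia", "rockies", "siberia", "rockies"]
folds = list(leave_one_group_out(regions))
```

Using the year of each observation as the group label gives a temporal cross-validation with the same function. scikit-learn offers an equivalent splitter, `LeaveOneGroupOut`, in `sklearn.model_selection`.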

L.269-275: This should be moved to paragraph 3.1, or even to the description of the input snow depth datasets further up. Please quantify these data gaps.

L. 271 and 272: Replace “missing” by “gaps”

l. 278: “was properly…” replace with “projection was set to”

l.278: “spatio” replace with “spatial”

Section 4.1: The first paragraph about temporal availability and the last paragraph about file format could be moved to a new subsection 3.3, as this material is not properly a result. It is about data availability and format.

l.289-291: In the training set of snow depth observations, many samples are redundant (e.g., daily snow courses will have similar values from one day to the next). Consequently, randomly extracting samples from the observation dataset will leave just as much information in the training set. The RF algorithm will then be very good at (over)fitting the training set and producing outstanding results on the test set. For a fair evaluation of all products, some observations should be left out from the RF training. Preferably this left-out data should be representative of various geographical and natural settings to evaluate the product in different conditions. I thought that the data presented in lines 200-205 would serve that purpose?

l. 293: “..in situ observations.” Add a reference to Figure 1.

Figure 1: Do these statistics apply to the same samples? Can you give their number? I understood that the original snow depth products have different spatial coverage and are sometimes missing data. Do these evaluation samples have data available for all products?

l.299-306 and Figure 2: I suggest that you present a mosaic of scatter plots with the original snow depth products; it will illustrate your statement in lines 302-306. Please also be quantitative. What is "not very accurate" l. 305?

Figure 2: Is there any point above 250cm? You could narrow the axis' limits.

Figure 3: I am surprised by the small number of observations in the Himalayas. Aren't there any snow depth measurements available there?

Table 2: Please present the numbers for transparency. NaN means "not a number". A bad number is still a number.

l. 338-339: “Compared with the original…” Please present a mosaic of scatter plots at the 7 sites and for the 6 products involved. This will illustrate properly this statement.

Figure 4: Please make these plots fit on one page.

Section 4.4: Are you again comparing the training set? If yes, then it should be moved just after the section 4.2. Please refer to Figure 5 early in this paragraph and please guide the reader to which panel each statement is related.

l. 371: “BIAS” is a word, not an acronym. It should be lower case throughout the manuscript.

Figure 5: Please add a unit for bias, make “bias” lowercase, and resize so that all panels fit on one page. The last three panels should have the same bin size as the others.

Table 3: Please provide and discuss the mean error.

Section 4.6: Consider having section 4 only for the evaluation of the dataset and a section 5 for the spatio-temporal analysis.

l. 399: “North America and Eurasia” It is unclear what is included in these two domains. Are the northern parts of Africa and South America included? If they are, then the domains should be renamed to something more neutral (A & B, or west & east). Please be aware that significant snowpacks can be present in the northern Andes and in the Atlas Mountains.

I recommend making the trend analysis in narrower regions (North/South America, West/East Europe, Asia with or without the Himalayas...). These regions should be illustrated in Fig 6. Please present first the spatial pattern of average snow depth (without any trend) and then the trend analysis to avoid confusion.

l. 400. “There was an overall trend decrease followed by a slowly increase” From when to when? By how much? With what level of significance? Please refer to Fig 6.

Section 4.6: The first paragraph is about trends, then the following two paragraphs are about spatial distribution, then comes section 4.7 that presents trends again. Please present the spatial distribution of average snow depth before analyzing the temporal trends.

Figure 6: I am surprised that the Himalayas are not being highlighted as a deep snow area. Rearrange Fig 6 and 7 so that Fig 6 has all the maps of snow depth and Fig 7 has all the trend analysis for different seasons and for different regions.

l. 417 What is “roughly similar”? Please be specific and quantitative.

l. 420 “… significantly lower than that in winter and spring.” what was the average in winter and spring then?

Figure 7: Have you investigated why spring 1984 had snow depth 30% higher than average?

Heading of section 4.7: Comparison to what?

Section 4.7: There is a confusion between the analysis of a snow depth change rate, which refers to the fitting of a linear model, and the Mann-Kendall test, which is a statistical test that only tests whether the trend is monotonic or not. The Mann-Kendall test, in its original form, does not give the magnitude of the trend. It only tells if a trend is significantly positive or negative. A linear regression and a discussion of whether the fitted slopes are statistically different from zero would be better suited here. The results of this trend analysis should be discussed in more detail. Are these results reasonable? Do they match with other studies?
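
The distinction can be made concrete with a minimal sketch. The Mann-Kendall statistic S and its normal-approximation Z score indicate only the direction and significance of a monotonic trend, while a least-squares slope gives a magnitude. The sketch assumes a series without tied values and without serial correlation, and it omits the significance test of the slope; the function names and the sample values are hypothetical.

```python
import math

def mann_kendall(series):
    """Return the Mann-Kendall S statistic and its normal-approximation Z score.

    Assumes no tied values, so the variance needs no tie correction.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

def ols_slope(times, values):
    """Return the least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

# Illustrative annual mean snow depths in cm, not taken from the dataset.
years = [2000, 2001, 2002, 2003, 2004, 2005]
depth = [30.0, 29.0, 27.5, 28.0, 26.0, 25.0]
s_stat, z_score = mann_kendall(depth)
slope = ols_slope(years, depth)  # change rate in cm per year
```

A negative Z beyond about -1.96 indicates a decreasing trend at the 5 % level under the normal approximation, but only the slope says by how many centimetres per year.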

l. 431: I don't understand this test value. Does it apply to the hemisphere-average snow depth trend?

l.444: I don't see how this sentence is related to the rest of the paragraph.

l. 444-446: Please move to method.

l. 447: Yes, but how do you deal with it? It is not clear. Do you extrapolate the product south of 35°N or fill it with a certain value? Do you only have RF models without GlobSnow south of 35°N? This should be explained clearly in the methods.

l. 447: “In this study…” Can you elaborate? How can it be fixed in the future?

l. 451-452: “more snow survey” More data is good, better data is even better. What data would you need to make your fusion even better? Are there certain geographical areas, land types, elevations or latitudes that have insufficient in situ observations? Please elaborate and please be specific and quantitative when possible.

l. 453-458: “black-box models” This is only partially true, and you raise an interesting point: how to understand and interpret the output of the ML algorithm. Tests such as the permutation feature importance (Breiman, 2001) or Shapley values (https://github.com/slundberg/shap, Strumbelj and Kononenko, 2014) would represent a valuable addition to the paper to explain which of the input snow depth datasets is the most important in different regions or periods.

Breiman, Leo. “Random Forests.” Machine Learning 45 (1). Springer: 5-32 (2001).

Shapley sampling values: Strumbelj, Erik, and Igor Kononenko. "Explaining prediction models and individual predictions with feature contributions." Knowledge and information systems 41.3 (2014): 647-665.
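
Permutation feature importance can be sketched in a model-agnostic form: shuffle one input at a time and record how much the error grows. This is a simplified variant that does not use the out-of-bag samples of Breiman (2001), and the toy model below is a hypothetical stand-in for a fitted RF model.

```python
import random

def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Return the increase in mean squared error after shuffling each feature.

    `predict` maps one feature row (a list) to a prediction. A larger increase
    means the model relies more on that feature.
    """
    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    rng = random.Random(seed)
    baseline = mse(rows)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(permuted) - baseline)
    return importances

# Hypothetical stand-in for a fitted model: it uses feature 0 and ignores feature 1.
def toy_model(row):
    return 2.0 * row[0]

rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]
importances = permutation_importance(toy_model, rows, targets, n_features=2)
```

Running the same procedure separately per region or per period would show which input product dominates where, which is the question raised in the comment above.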

l.458: “In future study…” This sentence is not clear. Consider removing.

Section 5.2: This paragraph is not clear to me. Please provide a table that summarizes the different tests being considered and metrics that allow the comparison of the results of these different tests. It looks very similar to a down-sized spatial and/or temporal cross-validation. As mentioned earlier, this should be a key part of how the algorithm is evaluated and should therefore be presented in greater detail.

Section 5: Consider removing the Discussion section and renaming Section 4 as “Results and discussion”. The content of section 5 can be merged with existing paragraphs or, if that is not possible, remain as separate subsections.

l.470-471: “It the future…” That is very true, and that is why the fitting procedure should be the subject of extra care to avoid overfitting and to allow the RF algorithm to perform decently outside of its training set.

l.476 “Regarding the limitations…” This sentence should be moved to the previous subsection as it deals with limitations.

Citation: https://doi.org/10.5194/essd-2022-63-RC1

RC2: 'Comment on essd-2022-63', Anonymous Referee #2, 24 Jun 2022

Review for Hu et al. (2022)

Summary

The authors have produced a daily gridded snow-depth dataset for the northern hemisphere for the period 1980 to 2019 using machine learning, specifically using a random forest method. The dataset incorporates remote sensing data from multiple products derived from various sensors, reanalysis data, and in situ measurements. Different combinations of datasets are chosen for different periods based on data availability. Additional datasets such as land surface type and topographic information are incorporated into the scheme as additional input variables to improve the estimated depth values. The authors indicate that the scheme improves snow depth estimates substantially relative to the best available current products using various metrics including the coefficient of determination, root mean squared error, and mean absolute error. The fused dataset is less accurate at high elevations, and would benefit in terms of accuracy from further validation and input datasets.

General comments

The dataset appears to be well thought out and such a dataset is indeed important and potentially useful for many applications from climate studies to water resource management. However I feel the manuscript requires major changes with regard to presentation as well as further evidence and descriptions to support the authors’ claims. I do not feel it is suitable for final publication in the present form. In general:

The methods used to produce the dataset are not adequately discussed. It appears that the authors have discussed some of the methods and their reasons for choices e.g. which machine learning algorithm to use, in previous studies, but these decisions and choices should also be summarized in this description paper for the benefit of users of this dataset.

I am concerned that the in situ locations used for validation are the same locations as the training data, and therefore the method may artificially appear to be successful. I believe the authors mention performing some sensitivity experiments at the end of the paper but I think a section should be devoted to exploring some of the choices made and their impact on the dataset. Additionally is any uncertainty information for the training in situ datasets incorporated into the analysis? How might this affect the results?

It would be greatly beneficial to users of the dataset to have an estimate of uncertainty for all points and times in the final dataset, or at least to provide some flags for low/high/medium quality data, derived from the data used here. This should be provided if possible.

The discussion section is very brief and now more or less summarizes the paper. The purpose of the discussion section should be: e.g. discussing limitations of the methods and data used. I believe the discussion section needs more elaboration on this.

I’m missing important information on the timing of the snow depth values in the different snow depth products and in situ observations. Only for the meteorological station observations from China you mention the measurement time of 8am, but not for the other products and observations. This potential mismatch in timing may induce a larger bias than is actually the case. Please elaborate on these timings or, if the timings are not available, elaborate on the potential effect of different timings between each product/observation.

The writing in the manuscript needs considerable improvement. Spelling and grammar should be reviewed by a fluent English speaker. Sometimes paragraphs are too short and should be combined. In particular I suggest using present rather than past tense in most cases which would improve readability. The text is sometimes redundant and should be revised to avoid repetition of information.

Please add citations for the remote sensing and in situ datasets in the data description section.

Specific comments

L16: The abstract is a bit long and could be shortened somewhat.

L16: Please briefly elaborate on this statement. Why is it important for these disciplines?

L25-26: I suggest starting a new sentence here, e.g. “and topographic data. Here we incorporated these datasets as independent input variables to a random forest regressor to generate a gridded northern hemisphere snow depth dataset for the 1980 to 2019 period.”

L32: Change “was distributed…” to “was in the range of -5 to 5 cm.”

L39-42: Does this sentence belong here or should it be in the data availability section?

L54: What are manual observations? Do you mean in situ observations?

L58: Can you quantify the “highest elevations”?

L68: Please briefly explain what it means when a snow depth data set saturates.

L69-70: Please elaborate on the structural limitations.

L74: Suggest changing to read “Additionally, some reanalysis datasets…” unless you mean that some reanalysis datasets overestimate snow depth at high latitudes.

L105: Please elaborate on “combined and integrated improvements”.

L107-108: Why is this a new approach? It is more precise compared to what? What is the auxiliary information here?

L109-110: What does it mean that the fusion is improved? Do you mean that errors were reduced?

L110-111: Please note which “machine learning methods” were used.

L112: Please quantify the increased estimation accuracy if an estimate was provided.

L114: How do you know that the positive aspects of each product are incorporated into the fused data set?

L115-116: Please describe what the candidate independent variables are and why they are candidates.

L117: Please elaborate on the different bins.

L118: The data fusion framework has already been proposed in Hu et al. 2021. Please clarify that this paper isn’t proposing the framework but is presenting the dataset and validation by comparing it with observational snow depth data.

L123: Please elaborate on what validation works are.

L143-144: Can you elaborate on the calibration?

L144-145: From this sentence it seems that the snow depth can only have a depth of 5 cm. Is that what you mean to say? Is this a minimum snow depth value?

L145-146: Please elaborate briefly on the spatio-temporal interpolation.

L153-154: Is this less than a 30% deviation with respect to observations? Please clarify.

L155-157: Please provide a citation for this statement.

L163: In L67 you say this product excludes the area above 35N. Please check and adjust accordingly.

L164-165: Precision and accuracy are two different concepts. Do you mean you use both of them?

L166-167: If the attempt was successful, please elaborate. If not, this sentence is not necessary.

L172: Please elaborate on how you get daily data from the 6-hourly data. Do you take the mean?

L174: Do you mean in the process of “making” the MERRA-2 data set? If so, please rewrite to make that clear.

L175: Please elaborate on improving the quality of the data.

L178: Is this nearest neighbor interpolation? Please clarify.

L181-182: As noted in the general comments, please provide citations for these datasets.

L182: What is the spatial distribution of the GHCN data set? Is it distributed sufficiently across the NH to be able to draw conclusions about regions outside of China and Russia?

L187-188: Please explain the meaning of “per five days data.”

L189: Please elaborate on the rigorous data standards.

L191-192: What is a quality checked field? Please elaborate.

L192: What was the method of removing the anomalous snow depth fields? Is this the quality checking procedure?

L197-198: Please elaborate on the inter-annual consistency and climatological outlier check.

L198: The number of station sites used in the two Russian data sets is missing.

L200: Are the seven data sets mentioned here different from the four data sets described in section 2.2? If not, please elaborate.

L200: How/why did you choose these data sets? Could sites not used in the training also be chosen for this purpose?

L202-203: It might be good to mention the specific years that are covered by the other data sets as well.

L203-204: What is meant by “snow depth retrieved model” are these simulations of snow in earth system models? Please clarify.

L207: What else does the auxiliary data include?

L211-212: Can you justify this assumption? How might this impact the results?

L212-213: Please elaborate on what you mean with “snow depth data as a whole”.

L215: Which dataset is being referred to here, GMTED2010 and/or GTOPO30? Please elaborate.

L220-221: I believe RFR is the abbreviation for random forest fusion framework. Please clarify. Although this is discussed in another study, it would be helpful to include details as to why the RFR method showed the best performance. Also a brief description of each of these methods should be provided.

L232-233: Please elaborate on what is meant by “different models were established”, also what is meant by “15 models can be employed to train and verify the model.”

L236-237: Additional details are needed here. Why is the random forest model the best?

L244: Please elaborate on the ““leave-one-year-out” cross-validation”.

L259-260: This is confusing. Suggest revising to read: “As noted above the fused dataset provides continuous daily data from 1980 to 2019, with several gaps.” Then the gaps can be mentioned. It is not clear whether the gaps occur every year or whether only certain years have gaps.

L264-265: Please elaborate on this.

L266-267: Briefly explain why these areas are excluded.

L271: Please elaborate on why the NHSD and GlobSnow inevitably have a large amount of data missing.

L272: Why do data gaps in 2 of the 7 snow depth products lead to data gaps in the fused data set? Shouldn’t the other datasets be able to fill the gap? Please elaborate.

L275-276: Do the data gaps arise because of the striping you mention in L274-275? Please elaborate and clarify.

L276-277: This seems an important limitation of the machine learning fused framework. Please elaborate on why this happens and what it means for your results.

L299-300: The in situ observations are at the point-scale, while the fused data set is at 0.25 deg resolution. Can you comment on errors introduced from this comparison?

L302-306: Where do you get these conclusions from? If from figure A2, please refer to that figure.

L304: What does “its overestimation and underestimation were obvious” mean? Please elaborate.

L305-306: What does “and there were many points of underestimating and overestimating disorderly distribution” mean? Please rewrite.

L309: Suggest changing “BIAS” to “bias” throughout. The statement here is unclear. Suggest revising to: “The fused data bias fell mostly between -5 and +5 cm, with 88.31% of the bias falling within that range.”

L317-318: What does “percentage of each interval” mean? The percentage of the total amount of data?

L320-323: In section 4.2 you say you use 90% of the in situ observations for model training while you retain the other 10% for model verification. I believe the 10% of measurements are taken from the same locations as the other 90% while these are separate locations. But this is unclear. How do the locations here relate to the other in situ locations mentioned earlier? Would it be possible to also exclude some of those locations to improve the analysis?

L324: Please briefly elaborate on which regions you mean.

L325: I suggest extending the analysis shown in Table 1 to also be performed on the other snow depth datasets. This will reveal the success of the various methods assessed against the independent in situ measurements. As it stands the analysis only describes the strengths and weaknesses of the fused dataset without showing its performance against other datasets.

L329: It would be good to mention the countries or regions these sites are in.

L329-330: Not necessary to explain abbreviations of R2, RMSE, and MAE, you already did this.

L330: Does this mean it is impossible to calculate the R2? I'm not sure why a large error would prevent you from calculating the R2.

L330: Suggest changing the column descriptions in Table 2 from “RMSE / cm” to “RMSE [cm]”. Same for the other column descriptions. It now reads as “RMSE per cm”.

L332-333: “... their accuracies were still relatively high compared to those of other gridded snow depth products”. Please elaborate on which snow products you mean.

L333-334: Not sure what this means, please rewrite. I also do not see any inflection points, which is where the direction of the curvature changes. The curves in Figure 4 are all in the same direction.

L338-339: Here it appears that a comparison is made with the performance of the original gridded datasets at this site. However, the data is not provided. As noted above it would be best to also include that analysis, perhaps as a set of tables in the appendix.

L340: Please elaborate briefly on what a relative low elevation is.

L340: Please elaborate briefly on the “better performance”. The performance is better than what?

L341: Please describe what is in the file, rather than the file type.

L344: Please describe why this site is better suited for measuring precipitation.

L346: In L342 you say that SBBSA is located in a basin. Is the basin above 3700?

L348-349: Are two sites sufficient to characterize an entire basin? How large is the basin? Please elaborate briefly. And what does 'the large area snow depth' mean? The area of the 0.25 deg pixel or of a larger regional scale?

L350: Please elaborate on what you mean with “but this site has a higher altitude”.

L352-354: How do you get to these conclusions? Please elaborate briefly. I believe that the authors are discussing changes in snow depth with elevation. There is a rapid change in depth with elevation that cannot be captured in the fused dataset. This is consistent with a larger bias for the highest snow depths. Please clarify.

L369-381: This paragraph needs rewriting. Please clarify the meaning of “relative frequency of BIAS”, “slightly overestimated trend”, and “distribution charts of relative frequency”.

L390-391: Accuracy cannot have poor precision. Data can have accuracy and precision. Please rewrite.

L391: Use either elevation or altitude consistently throughout the manuscript.

L392: Consistency is probably not what you mean here. Please rewrite.

L393: What do you mean with “both snow depth and error of the fused dataset were greater”? Please clarify.

L394-395: Move this sentence to earlier in the paragraph when you talk about these elevation ranges.

L400: Discuss when these changes (decrease followed by increase) occur in the timeseries.

L402: What does “relatively smooth” mean? Please clarify.

L407-408: What do you mean with this sentence? Figure 6b shows high snow depth values in the west, as well as the east. Please adjust. Also remove the word 'distribution'.

L408: From the spatial pattern in Canada? That is probably not what you mean, but this sentence does make it look like that. Please adjust.

L409: The snow depth of the Tibetan Plateau was also less than what? Please clarify.

L418: What is a “distribution area”. Please clarify.

L418-419: What about Scandinavia, Svalbard, eastern Siberia, and Alaska?

L419: What do you mean with “eastern European plain”? The little area slightly east of the European Alps? That hardly seems like an important area to mention given all the other large areas with high snow depth values.

L421: What does “relatively smooth” mean? Please clarify.

L421-422: Are the authors referring to the machine learning methods with regard to dividing into seasons, and the method of validation when referring to dividing snow depth into different intervals?

L422: What does “more reasonable and precise” mean?

L430: What is a “changing trend”? Suggest replacing this with simply “trend”.

L431: What does a test value of -3.28 mean?

L431-432: What shows a significantly decreasing trend? Can you quantify this?

L433: Please quantify the trends.

L446-451: Redundant and does not belong in the discussion section.

L454: Please elaborate on “based on experience”.

L454: Add citations to “previous studies”.

L462-465: This should be in the results section.

L463: What do you mean with “different spatial positions in the training sample (same time), different times of training samples”? Please clarify.

L465-467: What do you mean with this?

L467-468: This is not a proper way to train and verify the ML model. These years may differ significantly in climate, and thus in snow depth. You need more years of training data to train the ML model.

L468-470: Please clarify what this means.

L470-471: What do you mean with this? In L465 you say that you use all the NH data because of the generalization ability of ML. Also, please cite these claims.

L471-472: Here you say again that the ML model is able to generalize. Please clarify.

L472: Please elaborate on “new training is advisable”.

L472: Not clear what you mean with eliminating “one variable”. What variables are these?

L474-475: Here you say again that the ML model cannot generalize across different spatial locations. This argument is inconsistent and needs to be revised.

L477: Add citation to “as found in previous studies”.

L492: Do you mean accuracy instead of precision?

L493: If you've validated this, your conclusion cannot be "likely more accurate...". You should be able to have a firmer conclusion. Also missing citations.

Technical comments

L18: Replace “product” with “products”.

L25: Change “incorporated” to “incorporating”.

L27: Replace “different time period” with “a different time period” or “different time periods”.

L34: Replace “under” with “for”.

L46: Replace “is measured” with “are measured”.

L48: Replace “spatial-temporal” with “spatio-temporal”.

L59: Remove “retrieved”.

L59: Replace “spatiotemporal” with “spatio-temporal”.

L69: Change “susceptive” to “susceptible”.

L73: Remove comma after “latitudes”.

L89: Change “showed” to “have exhibited”.

L92: Replace “Mudrky” with “Mudryk”.

L99: Replace “plain” with “plains” and “forest” with “forested”.

L100: Replace “satisfying” with “satisfactory”. Can you quantify this statement?

L104: Not sure the word “even” is necessary here.

L106: Remove “real”.

L108: Replace “the ANN model” with “their ANN model”.

L108-109: Replace “to have a lower … than” with “to have a reduced MAE of 40% compared to an MAE of 60% of”.

L112: Replace “compared with” with “compared to”.

L113: Remove “products”.

L124: Replace semicolon with period.

L124: Replace “summarized” with “discussed”.

L128: Remove “the” before Northern.

L135: Make separations between row descriptions (most left column) more clear.

L145: Replace “spatiotemporal” with “spatio-temporal”.

L149: Replace “the ANN” with “an ANN”.

L154-155: This sentence is redundant.

L156: Change “underestimate when the snow depth” to “underestimate snow depth when the depth is deeper than…”

L160: Combine into one paragraph.

L161: Replace “included some in situ” with “includes a number of in situ”.

L162: Replace “mountain” with “mountainous”.

L166: Replace “mountain” with “mountainous”.

L166: Remove comma after “study”.

L169: Remove “from the fourth generation of reanalysis”.

L170: Replace “from” with “by”.

L173: Combine into one paragraph.

L193: Remove “also”.

L195: Not sure what this sentence means. Please rewrite.

L209: Remove “covers”.

L210: Remove “land” after Hemisphere.

L218: In L30 you use indexes as the plural for index. Here you use indices. Both are correct but it's best to be consistent throughout the manuscript.

L220: Replace “try fuse” with “generate fused”.

L220: Replace “datasets at” with “datasets of”.

L222-223: Replace “was referenced from” with “can be found in”.

L233-235: Replace “existing accuracy assessment” with “an existing accuracy assessment” or “existing accuracy assessments”.

L240: Change “second period include” to “second period includes”.

L245: It is 2022 now, so the data set you’re using does not cover the last 40 years. Please rewrite.

L247-248: Change to read “We evaluated the accuracy of the fused snow depth and the original gridded snow depth products against the in situ observations.”

L249: Change to “snow depth products as follows:”

L252: Make sure the variables in the text are aligned with the rest of the text. They are elevated right now.

L252: Both variables are now called S_i. Please change one of them and adjust accordingly.

L255: Combine this with the previous paragraph.

L255: Replace “variation trend” with “trend” and “We” with “we”.

L270-271: Replace “large data missing exist” with “large amounts of data are missing”.

L271: Remove “were”.

L272: Change “resulting in the similar missing in the” to “resulting in similar data gaps in the”

L278: Please make the projection part of the sentence more clear.

L278: Replace “spatio” with “The spatial”.

L279: You already mentioned the GeoTiff file type. This sentence can be removed.

L281-282: The first part of this sentence can be removed; you elaborate on the filename format in the next sentence.

L285: This sentence is not necessary.

L288: Remove comma after “2019”.

L289: Do you mean “machine learning model training”?

L305: Remove “as a reanalysis snow depth product”.

L308: Replace “snow depth” with “fused snow depth”.

L310-311: Remove “This also indicated that the consistency between the fused snow depth and ground station observations was very good over the entire Northern Hemisphere”.

L333: Replace “The fused snow depth can accurately estimate deeper snow” with “The fused snow depth product contains accurate estimates of deeper snow”.

L340: Replace “an” with “a”.

L344: Replace semicolon with period.

L345: Replace “shallower” with “smaller”.

L346: Replace “land cover type of this pixel” with “land cover type of this site”.

L347-348: Replace “range was varied” with “varies”

L351: Remove “During the winter … at this altitude”.

L363: (Fig. 3) I suggest compressing this figure so that it fits on one page. This could be done by removing some of the locations and moving them to the appendix, compressing the y-axis and reducing space between figures. Please adjust the x-ticks to improve readability. Suggestion: fewer x-ticks and mention just the year, not the month/day. This will also reduce the size of each sub-figure.

L368: Remove “levels of”.

L369: Replace “levels” with “depths”.

L370: Combine paragraphs.

L371-372: The wording is strange here. I would suggest noting that for 90% of the data, the bias falls between -5 and 5 cm. Similar wording can be applied throughout.

L378-379: Replace “In the last … than 50 cm;” with “For snow depths larger than 50 cm,”.

L379-380: Replace “Although the … were underestimated” with “Although the estimates for large snow depths are underestimated”.

L380: Figure 5 needs to be referenced in the beginning of this paragraph, not at the end.

L381: What is the difference between “small error” and “high accuracy” in this sentence? They seem to mean the same. Please clarify.

L386: (Fig. 5) Is the legend item "Frequent Count" meant to represent the bias between fused snow depth and in situ observations? If so, please clarify that in the legend. Also, please explain the legend item "Gauss". Presumably this is a Gaussian distribution fitted to the data?

L397: Remove abbreviation explanations. They have already been explained.

L400: Replace “slowly” with “slow”.

L406: Replace “shallower” with “less”.

L409: Replace “shallower” with “less”.

L419: Replace “the farthest east of Canada” with “eastern Canada”.

L419: Replace “Alps” with “European Alps”.

L420: Remove capitalization of the seasons.

L424: Please make the x-ticks in plots a, b, and c consistent. Either one tick every 20 degs, or every 40 degs.

L427: Remove “of change”.

L433: Replace “area” with “of the area”.

L437: Remove “very”.

L439: Replace “changed response times” with “change rate”.

L446: Replace “spatio-temporal” with “spatio-temporally”.

L476: Remove “Regarding the limitations of this study,”.

L483: Replace “theses” with “these”.

L490: Replace “consistency” with “agreement”.

L498: Replace “leaning” with “learning”.
Citation: https://doi.org/10.5194/essd-2022-63-RC2

Interactive discussion

Status: closed

RC1: 'Comment on essd-2022-63', Baptiste Vandecrux, 03 May 2022

Review of “A long-term daily gridded snow depth dataset for the Northern Hemisphere from 1980 to 2019 based on machine learning” by Hu et al.

Baptiste Vandecrux (bav@geus.dk)

General comment:

This article combines multiple gridded snow depth products using a random forest (RF) algorithm trained on more than 30 000 in situ observations of snow depth. The study covers an interesting topic, as snow depth is critical for many climatic and ecological processes. The use of machine learning (ML) algorithms is also interesting for combining data from multiple sources. The study therefore has clear potential, as the dataset could benefit many other research groups. However, I have major concerns on the following topics:

- The novelty of the work: It seems that the same authors have presented a similar study in Hu et al. (2021). It is unclear what the differences are between this previous study and the current work. I am unsure that adding one or two inputs to the same framework justifies a new publication. However, I also identify several flaws in the methods (see comments below) that, if fixed, could justify that the new product has improved enough since Hu et al. (2021).

- There is confusion between snow depth and snow water equivalent (SWE) in the studies cited in the introduction. The two quantities are not interchangeable, and the authors should justify, with the appropriate studies, why snow depth is a relevant variable to monitor and why it is difficult to observe at large scales. Work on SWE can naturally be reviewed as a related, but distinct, field of research.

- The choice and training of the RF algorithm: RF algorithms are known to be very good at (over)fitting the training data and to perform poorly outside of their training set. There is currently nothing in the method that ensures that the training set is representative of the conditions in which the RF algorithm is being used. It is now standard procedure to de-cluster the training data to make sure that it is not dominated by one specific type of sample.

- Overfitting: Nothing is said about how the hyper-parameters of the RF algorithm are set or about any measures to prevent overfitting. For example, neural networks can be trained with noise added to the training data to prevent overfitting. Maybe something similar exists for RF regressions. Additionally, evaluating the fused dataset using randomly selected samples cannot properly assess the output of an overfitted algorithm, because the random subset will have the same distribution, and therefore the same structure (clustering), as the rest of the training set. Only a carefully set-up spatial cross-validation can make sure that an ML algorithm works appropriately in all the different areas where it is eventually applied.

- The trend analysis has some major methodological flaws that will need to be addressed.

Considering these issues, I recommend a major revision of the paper, unless the editor considers that the necessary reshaping would deserve a new submission.

Specific comments:

l.48 “mass” of what? Please split the sentence in two.

l.50-53: The second half of this paragraph states that "knowledge on snow depth and its trends are lacking", that there are "limited surface observations" and that remote sensing methods are "inadequate". This is not exactly the current state of research in snow depth mapping because this same study builds on numerous gridded snow depth products and thousands of in situ observations. Please re-frame this paragraph and acknowledge properly the previous work.

l. 61-63: Please give references for each of these products.

l.85: “conventional” do you mean “convolutional”?

l. 79-90: Consider discussing this additional reference:

Shao, D., Li, H., Wang, J., Hao, X., Che, T., and Ji, W.: Reconstruction of a daily gridded snow water equivalent product for the land region above 45° N based on a ridge regression machine learning approach, Earth Syst. Sci. Data, 14, 795–809, https://doi.org/10.5194/essd-14-795-2022, 2022.

l.90: Hu et al. (2021). This study seems very similar to what is presented here. Please describe the study in further detail and make explicit how the presented study builds on the previous one. What were the limitations of the previous study and what is the novelty of this new one?

l.94: Mudryk et al. (2015) compared snow water equivalent (SWE), not snow depth.

l.93: “more than 50%” in SWE, not snow depth

l.95: Mortimer et al. (2020) evaluate SWE products, not snow depth.

l.97: “Globsnow snow depth”, it was the SWE that was evaluated there

l.98: “Previous assessments” which studies do you refer to?

l.107: Snauffer et al. (2018) should be added and discussed in line 79-90 where different ML algorithms are being used.

l.107-115 are partly redundant with l.79-90. They should be moved there and merged into a paragraph dedicated to ML algorithms used for snow depth retrieval.

l.115-117: Are the methods, and therefore produced data, the same as in Hu et al. (2021)?

If yes, then I think it raises the issue of the novelty of the study.

If not, then a paragraph in the intro should be dedicated to the limitations of Hu et al. (2021) and how the present study builds further and presents an improved product compared to Hu et al. (2021).

l.144-145 “In these two…” This sentence is unclear. Is the snow depth always set to 5 cm when being detected? How are deeper snowpacks considered?

l.147-148: “The accuracy…” Give reference

l.166: remove “,” between “study” and “attempted”

l. 166-167: “Venäläinen et al., (2021)” This reference was not discussed in the intro when introducing the ML algorithms in snow depth retrieval.

l.181: Please give a reference for this dataset.

l.185: Please give a reference for this dataset.

l.190: Please give a reference for this dataset.

l.196: Please give a reference for this dataset.

l.200-205: These data are very important as they are the most objective way to evaluate your fused dataset. Please show on a map that they are located across a wide range of geographical locations and cover the different land categories for which you fit different RF models.

l.219-223: This is an insufficient level of detail for the core of your method. The documentation should be sufficient to reproduce your product. The fitting procedure and the hyperparameter selection should also be detailed to show how you avoid overfitting and to make the RF able to predict outside of its training set.

ML algorithms are very sensitive to the training data and to any imbalance therein. The training data should be de-clustered: it should be ensured that the observed snow depths cover the whole spectrum of retrieved snow depth and are located across all the elevations and all the land categories that are used as input to the algorithm. The de-clustering could be done by assigning weights to observations and/or by duplicating observations from under-represented subsets.
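As an illustration of the weighting option, the sketch below assigns each observation a weight inversely proportional to the size of the stratum it belongs to, so that every stratum carries the same total weight. The strata used here (land category and elevation band), the labels, and the function name are assumptions made for the example; the appropriate stratification would have to be chosen by the authors.

```python
from collections import Counter


def inverse_frequency_weights(strata):
    """Weight each sample by n / (k * count of its stratum).

    n is the number of samples and k the number of distinct strata, so the
    weights sum to n and every stratum receives the same total weight n / k.
    """
    counts = Counter(strata)
    n = len(strata)
    k = len(counts)
    return [n / (k * counts[s]) for s in strata]


# Hypothetical strata: (land category, elevation band) for each observation.
strata = [("forest", "low")] * 6 + [("tundra", "high")] * 2
weights = inverse_frequency_weights(strata)

for stratum in sorted(set(strata)):
    total = sum(w for w, s in zip(weights, strata) if s == stratum)
    print(stratum, "total weight:", round(total, 3))
```

Weights of this kind can be supplied as per-sample weights when fitting the regressor (for instance, the `fit` method of scikit-learn's `RandomForestRegressor` accepts a `sample_weight` argument). Whether the authors' implementation supports this is not stated in the manuscript.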

Since your objective is to use the fused dataset for spatio-temporal analysis, a spatial or temporal cross validation should be conducted to investigate the robustness of your algorithm. This could be done by iteratively removing different regions or different years from the training set and using these removed samples for evaluation. Of course the final product should use as many samples as possible, but the evaluation of the RF algorithm is currently insufficient to build trust in its output.
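The iterative hold-out described above can be sketched as a leave-one-group-out split, where the group label is either a region or a year. The snippet only generates the index splits; the region names and the function name are hypothetical, and the model fitting and scoring inside the loop are left out on purpose.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical region label for each training sample (years would work the same way).
sample_regions = ["alps", "alps", "siberia", "rockies", "siberia", "rockies"]
splits = list(leave_one_group_out(sample_regions))

for region, train_idx, test_idx in splits:
    # In a real evaluation the RF would be refitted on train_idx here
    # and scored on test_idx, which it has never seen.
    print(region, "train:", train_idx, "test:", test_idx)
```

Because every sample of the held-out region (or year) is excluded from training, the resulting scores describe performance outside the training set, which a random 90/10 split of samples cannot do.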

L.269-275: This should be moved to paragraph 3.1, or even to the description of the input snow depth datasets further up. Please quantify these data gaps.

L. 271 and 272: Replace “missing” by “gaps”

l. 278: “was properly…” replace with “projection was set to”

l.278: “spatio” replace with “spatial”

Section 4.1: The first paragraph about temporal availability and the last paragraph about file format could be moved to a new subsection 3.3, as they are not properly results. They are about data availability and format.

l.289-291: In the training set of snow depth observations, many samples are redundant (e.g. daily snow courses will have similar values from one day to the next). Consequently, randomly extracting samples from the observation dataset will leave just as much information in the training set. The RF algorithm will then be very good at (over)fitting the training set and producing outstanding results on the test set. For a fair evaluation of all products, some observations should be left out of the RF training. Preferably, this left-out data should be representative of various geographical and natural settings to evaluate the product in different conditions. I thought that the data presented in lines 200-205 would serve that purpose?

l. 293: “..in situ observations.” Add a reference to Figure 1.

Figure 1: Do these statistics apply to the same samples? Can you give their number? I understood that the original snow depth products have different spatial coverage and are sometimes missing data. Do these evaluation samples have data available for all products?

l.299-306 and Figure 2: I suggest that you present a mosaic of scatter plots with the original snow depth products; it will illustrate your statement in lines 302-306. Please also be quantitative. What is "not very accurate" in l. 305?

Figure 2: Is there any point above 250cm? You could narrow the axis' limits.

Figure 3: I am surprised by the small number of observations in the Himalayas. Aren't there any snow depth measurements available there?

Table 2: Please present the number for transparency. NaN means "not a number". A bad number is still a number.
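To make the point concrete, the sketch below computes the three metrics for a deliberately poor prediction. It assumes R² is defined as 1 - SS_res/SS_tot (the manuscript may instead use the squared correlation coefficient), and the numbers and the function name are invented for illustration. Under this definition R² is undefined only when the observations have zero variance; a poor fit simply gives a low or negative value.

```python
import math


def r2_rmse_mae(observed, predicted):
    """Return (R2, RMSE, MAE), with R2 defined as 1 - SS_res / SS_tot."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are identical")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r2, rmse, mae


# Invented snow depths in cm: the prediction is far off at every sample.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [40.0, 10.0, 50.0, 5.0]
r2, rmse, mae = r2_rmse_mae(observed, predicted)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} cm, MAE = {mae:.2f} cm")
```

The resulting R² is negative, which signals a fit worse than predicting the observed mean, but it is still a number that can be reported in the table.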

l. 338-339: “Compared with the original…” Please present a mosaic of scatter plots at the 7 sites and for the 6 products involved. This will illustrate properly this statement.

Figure 4: Please make these plots fit on one page.

Section 4.4: Are you again comparing the training set? If yes, then it should be moved just after the section 4.2. Please refer to Figure 5 early in this paragraph and please guide the reader to which panel each statement is related.

l. 371: “BIAS” is a word, not an acronym. It should be lower case throughout the manuscript.

Figure 5: Please add a unit for bias, make “bias” lowercase, and resize so that all panels fit on one page. The last three panels should have the same bin size as the others.

Table 3: Please provide and discuss the mean error.

Section 4.6: Consider having section 4 only for the evaluation of the dataset and a section 5 for the spatio-temporal analysis.

l. 399: “North America and Eurasia” It is unclear what is included in these two domains. Are the northern parts of Africa and South America included? If they are, then the domains should be renamed to something more neutral (A & B, or west & east). Please be aware that significant snowpacks can be present in the northern Andes and in the Atlas Mountains.

I recommend making the trend analysis in narrower regions (North/South America, Western/Eastern Europe, Asia with or without the Himalayas...). These regions should be illustrated in Fig 6. Please present first the spatial pattern of average snow depth (without any trend) and then the trend analysis to avoid confusion.

l. 400. “There was an overall trend decrease followed by a slowly increase” From when to when? By how much? With what level of significance? Please refer to Fig 6.

Section 4.6: The first paragraph is about trends, then the following two paragraphs are about spatial distribution, then comes section 4.7 that presents trends again. Please present the spatial distribution of average snow depth before analyzing the temporal trends.

Figure 6: I am surprised that the Himalayas are not being highlighted as a deep snow area. Rearrange Fig 6 and 7 so that Fig 6 has all the maps of snow depth and Fig 7 has all the trend analysis for different seasons and for different regions.

l. 417 What is “roughly similar”? Please be specific and quantitative.

l. 420 “… significantly lower than that in winter and spring.” what was the average in winter and spring then?

Figure 7: Have you investigated why spring 1984 had snow depth 30% higher than average?

Heading of section 4.7: Comparison to what?

Section 4.7: There is confusion between the analysis of a snow depth change rate, which refers to the fitting of a linear model, and the Mann-Kendall test, which is a statistical test that only tests whether the trend is monotonic or not. The Mann-Kendall test, in its original form, does not give the magnitude of the trend. It only tells if a trend is significantly positive or negative. A linear regression, together with a discussion of whether the fitted slopes are statistically different from zero, would be better suited here. The results of this trend analysis should be discussed in more detail. Are these results reasonable? Do they match with other studies?
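The distinction can be illustrated with a short sketch: the Mann-Kendall statistic S and its standardised value Z only indicate the direction and significance of a monotonic trend, while a least-squares fit provides the rate of change. The version below uses the normal approximation without a correction for ties, and the series is synthetic; the function names and the data are assumptions for illustration, not the authors' procedure.

```python
import math


def mann_kendall(values):
    """Return (S, Z) of the Mann-Kendall test (normal approximation, no tie correction)."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


def ols_slope(x, y):
    """Return the least-squares slope of y against x (units of y per unit of x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Synthetic annual mean snow depth (cm), decreasing by 0.5 cm per year.
years = list(range(2000, 2010))
depth = [30.0 - 0.5 * k for k in range(10)]

s_stat, z_stat = mann_kendall(depth)
slope = ols_slope(years, depth)
print(f"S = {s_stat}, Z = {z_stat:.2f} (|Z| > 1.96 means significant at the 5% level)")
print(f"slope = {slope:.2f} cm per year")
```

A test value such as the -3.28 quoted in the manuscript would correspond to Z here: it says the decrease is significant, not how large it is. The slope (and a test of whether it differs from zero, which this sketch does not include) is what quantifies the trend.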

l. 431: I don't understand this test value. Does it apply to the hemisphere-average snow depth trend?

l.444: I don't see how this sentence is related to the rest of the paragraph.

l. 444-446: Please move to method.

l. 447: Yes, but how do you deal with it? It is not clear. Do you extrapolate or fill with a certain value the product south of 35degN? Do you only have RF models without GlobSnow south of 35degN? This should be explained clearly in the methods.

l. 447: “In this study…” Can you elaborate? How can it be fixed in the future?

l. 451-452: “more snow survey” More data is good, better data is even better. What data would you need to make your fusion even better? Are there certain geographical areas, land types, elevations or latitudes that have insufficient in situ observations? Please elaborate and please be specific and quantitative when possible.

l. 453-458: “black-box models” This is only partially true, and you raise an interesting point: how to understand and interpret the output of the ML algorithm. Tests such as the permutation feature importance (Breiman, 2001) or Shapley values (https://github.com/slundberg/shap, Strumbelj and Kononenko, 2014) would represent a valuable addition to the paper to explain which of the input snow depth datasets are the most important in different regions or periods.

Breiman, Leo.“Random Forests.” Machine Learning 45 (1). Springer: 5-32 (2001).

Shapley sampling values: Strumbelj, Erik, and Igor Kononenko. "Explaining prediction models and individual predictions with feature contributions." Knowledge and information systems 41.3 (2014): 647-665.
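A minimal sketch of the permutation feature importance idea is given below: one input column is shuffled at a time and the resulting increase in error on an evaluation set is recorded. The toy model, the data, and the function names are placeholders invented for the example; in practice the trained RF and held-out observations would take their place, and Breiman's original formulation uses the out-of-bag samples of the forest.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model(row) against the targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for f in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature f permuted.
            shuffled = [r[:f] + (v,) + r[f + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Placeholder for a trained model: it depends on feature 0 only."""
    return 2.0 * row[0]


rows = [(float(i), float((7 * i) % 5)) for i in range(10)]
targets = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, targets, n_features=2)
print("importance of feature 0:", scores[0])
print("importance of feature 1:", scores[1])
```

Applied per region or per period, scores of this kind would show which input product the fused estimate relies on most, which is the information requested in the comment above.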

l.458: “In future study…” This sentence is not clear. Consider removing.

Section 5.2: This paragraph is not clear to me. Please provide a table that summarizes the different tests being considered and metrics that allow the comparison of the results of these different tests. It looks very similar to a down-sized spatial and/or temporal cross-validation. As mentioned earlier, this should be a key of how the algorithm is evaluated and therefore presented in greater details.

Section 5: Consider removing the Discussion section and renaming Section 4 as “Results and discussion”. The content of section 5 can be merged with existing paragraphs or, if that is not possible, remain as separate subsections.

l.470-471: “It the future…” That is very true, that is why the fitting procedure should be subject of extra care to avoid overfitting and to allow the RF algorithm to perform decently outside of its training set.

l.476 “Regarding the limitations…” This sentence should be moved to the previous subsection as it deals with limitations.

Citation: https://doi.org/10.5194/essd-2022-63-RC1
RC2: 'Comment on essd-2022-63', Anonymous Referee #2, 24 Jun 2022
Review for Hu et al. (2022)

Summary

The authors have produced a daily gridded snow-depth dataset for the northern hemisphere for the period 1980 to 2019 using machine learning, specifically using a random forest method. The dataset incorporates remote sensing data from multiple products derived from various sensors, reanalysis data, and in situ measurements. Different combinations of datasets are chosen for different periods based on data availability. Additional datasets such as land surface type and topographic information are incorporated into the scheme as additional input variables to improve the estimated depth values. The authors indicate that the scheme improves snow depth estimates substantially relative to the best available current products using various metrics including the coefficient of determination, root mean squared error, and mean absolute error. The fused dataset is less accurate at high elevations, and would benefit in terms of accuracy from further validation and input datasets.

General comments

The dataset appears to be well thought out and such a dataset is indeed important and potentially useful for many applications from climate studies to water resource management. However I feel the manuscript requires major changes with regard to presentation as well as further evidence and descriptions to support the authors’ claims. I do not feel it is suitable for final publication in the present form. In general:

The methods used to produce the dataset are not adequately discussed. It appears that the authors have discussed some of the methods and their reasons for choices e.g. which machine learning algorithm to use, in previous studies, but these decisions and choices should also be summarized in this description paper for the benefit of users of this dataset.

I am concerned that the in situ locations used for validation are the same locations as the training data, and therefore the method may artificially appear to be successful. I believe the authors mention performing some sensitivity experiments at the end of the paper but I think a section should be devoted to exploring some of the choices made and their impact on the dataset. Additionally is any uncertainty information for the training in situ datasets incorporated into the analysis? How might this affect the results?

It would be greatly beneficial to users of the dataset to have an estimate of uncertainty for all points and times in the final dataset, or at least to provide some flags for low/high/medium quality data, derived from the data used here. This should be provided if possible.

The discussion section is very brief and now more or less summarizes the paper. The purpose of the discussion section should be: e.g. discussing limitations of the methods and data used. I believe the discussion section needs more elaboration on this.

I’m missing important information on the timing of the snow depth values in the different snow depth products and in situ observations. Only for the meteorological station observations from China you mention the measurement time of 8am, but not for the other products and observations. This potential mismatch in timing may induce a larger bias than is actually the case. Please elaborate on these timings or, if the timings are not available, elaborate on the potential effect of different timings between each product/observation.

The writing in the manuscript needs considerable improvement. Spelling and grammar should be reviewed by a fluent English speaker. Sometimes paragraphs are too short and should be combined. In particular I suggest using present rather than past tense in most cases which would improve readability. The text is sometimes redundant and should be revised to avoid repetition of information.

Please add citations for the remote sensing and in situ datasets in the data description section.

Specific comments

L16: The abstract is a bit long and could be shortened somewhat.

L16: Please briefly elaborate on this statement. Why is it important for these disciplines?

L25-26: I suggest starting a new sentence here, e.g. “and topographic data. Here we incorporated these datasets as independent input variables to a random forest regressor to generate a gridded northern hemisphere snow depth dataset for the 1980 to 2019 period.”

L32: Change “was distributed…” to “was in the range of -5 to 5 cm.”

L39-42: Does this sentence belong here or should it be in the data availability section?

L54: What are manual observations? Do you mean in situ observations?

L58: Can you quantify the “highest elevations”?

L68: Please briefly explain what it means when a snow depth data set saturates.

L69-70: Please elaborate on the structural limitations.

L74: Suggest changing to read “Additionally, some reanalysis datasets…” unless you mean that some reanalysis datasets overestimate snow depth at high latitudes.

L105: Please elaborate on “combined and integrated improvements”.

L107-108: Why is this a new approach? It is more precise compared to what? What is the auxiliary information here?

L109-110: What does it mean that the fusion is improved? Do you mean that errors were reduced?

L110-111: Please note which “machine learning methods” were used.

L112: Please quantify the increased estimation accuracy if an estimate was provided.

L114: How do you know that the positive aspects of each product are incorporated into the fused data set?

L115-116: Please describe what the candidate independent variables are and why they are candidates.

L117: Please elaborate on the different bins.

L118: The data fusion framework has already been proposed in Hu et al. 2021. Please clarify that this paper isn’t proposing the framework but is presenting the dataset and validation by comparing it with observational snow depth data.

L123: Please elaborate on what validation works are.

L143-144: Can you elaborate on the calibration?

L144-145: From this sentence it seems that the snow depth can only have a depth of 5 cm. Is that what you mean to say? Is this a minimum snow depth value?

L145-146: Please elaborate briefly on the spatio-temporal interpolation.

L153-154: Is this less than a 30% deviation with respect to observations? Please clarify.

L155-157: Please provide a citation for this statement.

L163: In L67 you say this product excludes the area above 35N. Please check and adjust accordingly.

L164-165: Precision and accuracy are two different concepts. Do you mean you use both of them?

L166-167: If the attempt was successful, please elaborate. If not, this sentence is not necessary.

L172: Please elaborate on how you get daily data from the 6-hourly data. Do you take the mean?

L174: Do you mean in the process of “making” the MERRA-2 data set? If so, please rewrite to make that clear.

L175: Please elaborate on improving the quality of the data.

L178: Is this nearest neighbor interpolation? Please clarify.

L181-182: As noted in the general comments, please provide citations for these datasets.

L182: What is the spatial distribution of the GHCN data set? Is it distributed sufficiently across the NH to be able to draw conclusions about regions outside of China and Russia?

L187-188: Please explain the meaning of “per five days data.”

L189: Please elaborate on the rigorous data standards.

L191-192: What is a quality checked field? Please elaborate.

L192: What was the method of removing the anomalous snow depth fields? Is this the quality checking procedure?

L197-198: Please elaborate on the inter-annual consistency and climatological outlier check.

L198: The number of station sites used in the two Russian data sets is missing.

L200: Are the seven data sets mentioned here different from the four data sets described in section 2.2? If not, please elaborate.

L200: How/why did you choose these data sets? Could sites not used in the training also be chosen for this purpose?

L202-203: It might be good to mention the specific years that are covered by the other data sets as well.

L203-204: What is meant by “snow depth retrieved model”? Are these simulations of snow in earth system models? Please clarify.

L207: What else does the auxiliary data include?

L211-212: Can you justify this assumption? How might this impact the results?

L212-213: Please elaborate on what you mean with “snow depth data as a whole”.

L215: Which dataset is being referred to here, GMTED2010 and/or GTOPO30? Please elaborate.

L220-221: I believe RFR is the abbreviation for random forest fusion framework. Please clarify. Although this is discussed in another study, it would be helpful to include details as to why the RFR method showed the best performance. Also a brief description of each of these methods should be provided.

L232-233: Please elaborate on what is meant by “different models were established”, also what is meant by “15 models can be employed to train and verify the model.”

L236-237: Additional details are needed here. Why is the random forest model the best?

L244: Please elaborate on the ““leave-one-year-out” cross-validation”.
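
For reference, "leave-one-year-out" cross-validation, as the term is commonly used, holds out all samples from one calendar year per fold and trains on the remaining years. The sketch below is a minimal Python illustration of that general idea only; the sample layout `(year, features, depth_cm)` and the function name are hypothetical, and whether the manuscript follows exactly this scheme is what the comment asks the authors to state.

```python
def leave_one_year_out(samples):
    """Yield (held_out_year, train, test), holding out one calendar year per fold.

    samples: list of (year, features, depth_cm) tuples (hypothetical layout).
    """
    years = sorted({s[0] for s in samples})
    for year in years:
        train = [s for s in samples if s[0] != year]
        test = [s for s in samples if s[0] == year]
        yield year, train, test

# Synthetic toy data: three years with two samples each.
samples = [(y, None, float(d)) for y in (2001, 2002, 2003) for d in (5, 10)]
folds = list(leave_one_year_out(samples))
```

Each fold would then train a model on `train`, predict on `test`, and the per-year scores would be aggregated.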

L259-260: This is confusing. Suggest revising to read: “As noted above the fused dataset provides continuous daily data from 1980 to 2019, with several gaps.” Then the gaps can be mentioned. It is not clear whether the gaps occur every year or whether only certain years have gaps.

L264-265: Please elaborate on this.

L266-267: Briefly explain why these areas are excluded.

L271: Please elaborate on why the NHSD and GlobSnow inevitably have a large amount of data missing.

L272: Why do data gaps in 2 of the 7 snow depth products lead to data gaps in the fused data set? Shouldn’t the other datasets be able to fill the gap? Please elaborate.

L275-276: Do the data gaps arise because of the striping you mention in L274-275? Please elaborate and clarify.

L276-277: This seems an important limitation of the machine learning fused framework. Please elaborate on why this happens and what it means for your results.

L299-300: The in situ observations are at the point-scale, while the fused data set is at 0.25 deg resolution. Can you comment on errors introduced from this comparison?

L302-306: Where do you get these conclusions from? If from figure A2, please refer to that figure.

L304: What does “its overestimation and underestimation were obvious” mean? Please elaborate.

L305-306: What does “and there were many points of underestimating and overestimating disorderly distribution” mean? Please rewrite.

L309: Suggest changing “BIAS” to “bias” throughout. The statement here is unclear. Suggest revising to: “The fused data bias fell mostly between -5 and +5 cm, with 88.31% of the bias falling within that range.”
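
The statistic behind the suggested wording is the share of paired samples whose bias lies within a symmetric range. A minimal Python sketch follows; it assumes bias is defined as fused minus observed (a sign convention not confirmed here), and the function name and values are illustrative only.

```python
def share_within(fused_cm, observed_cm, limit_cm=5.0):
    """Percentage of pairs whose bias (fused - observed, assumed) lies in [-limit, +limit]."""
    biases = [f - o for f, o in zip(fused_cm, observed_cm)]
    inside = sum(1 for b in biases if -limit_cm <= b <= limit_cm)
    return 100.0 * inside / len(biases)

# Synthetic values in cm; the biases are 2, -8, 0 and 6.
fused = [12.0, 30.0, 4.0, 55.0]
observed = [10.0, 38.0, 4.0, 49.0]
pct = share_within(fused, observed)
```

Reporting this percentage alongside the range limits makes the sentence unambiguous about what the 88.31 % refers to.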

L317-318: What does “percentage of each interval” mean? The percentage of the total amount of data?

L320-323: In section 4.2 you say you use 90% of the in situ observations for model training while you retain the other 10% for model verification. I believe the 10% of measurements are taken from the same locations as the other 90% while these are separate locations. But this is unclear. How do the locations here relate to the other in situ locations mentioned earlier? Would it be possible to also exclude some of those locations to improve the analysis?

L324: Please briefly elaborate on which regions you mean.

L325: I suggest extending the analysis shown in Table 1 to also be performed on the other snow depth datasets. This will reveal the success of the various methods assessed against the independent in situ measurements. As it stands the analysis only describes the strengths and weaknesses of the fused dataset without showing its performance against other datasets.

L329: It would be good to mention the countries or regions these sites are in.

L329-330: Not necessary to explain abbreviations of R2, RMSE, and MAE, you already did this.

L330: Does this mean it is impossible to calculate the R2? I'm not sure why a large error would impede you from calculating the R2.
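
To illustrate the point, the sketch below computes RMSE, MAE and a coefficient of determination in Python for a deliberately poor set of predictions. It assumes the definition R² = 1 − SS_res/SS_tot; the manuscript may instead use the squared correlation coefficient, which is not confirmed here. All values are synthetic.

```python
import math

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def mae(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def r2(pred, obs):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    if ss_tot == 0:
        # Undefined only when the observations have no variance.
        raise ValueError("R^2 is undefined for constant observations")
    return 1.0 - ss_res / ss_tot

# Synthetic example with large errors (cm).
obs = [10.0, 20.0, 30.0]
pred = [40.0, 10.0, 60.0]
scores = {"rmse": rmse(pred, obs), "mae": mae(pred, obs), "r2": r2(pred, obs)}
```

Under this definition a large error yields a low or negative R² rather than an incomputable one; the value is undefined only when the observed series is constant.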

L330: Suggest changing the column descriptions in Table 2 from “RMSE / cm” to “RMSE [cm]”. Same for the other column descriptions. It now reads as “RMSE per cm”.

L332-333: “... their accuracies were still relatively high compared to those of other gridded snow depth products”. Please elaborate on which snow products you mean.

L333-334: Not sure what this means, please rewrite. I also do not see any inflection points, which is where the direction of the curvature changes. The curves in Figure 4 are all in the same direction.

L338-339: Here it appears that a comparison is made with the performance of the original gridded datasets at this site. However, the data is not provided. As noted above it would be best to also include that analysis, perhaps as a set of tables in the appendix.

L340: Please elaborate briefly on what a relative low elevation is.

L340: Please elaborate briefly on the “better performance”. The performance is better than what?

L341: Please describe what is in the file, rather than the file type.

L344: Please describe why this site is better suited for measuring precipitation.

L346: In L342 you say that SBBSA is located in a basin. Is the basin above 3700?

L348-349: Are two sites sufficient to characterize an entire basin? How large is the basin? Please elaborate briefly. And what does 'the large area snow depth' mean? The area of the 0.25 deg pixel or of a larger regional scale?

L350: Please elaborate on what you mean with “but this site has a higher altitude”.

L352-354: How do you get to these conclusions? Please elaborate briefly. I believe that the authors are discussing changes in snow depth with elevation. There is a rapid change in depth with elevation that cannot be captured in the fused dataset. This is consistent with a larger bias for the highest snow depths. Please clarify.

L369-381: This paragraph needs rewriting. Please clarify the meaning of “relative frequency of BIAS”, “slightly overestimated trend”, and “distribution charts of relative frequency”.

L390-391: Accuracy cannot have poor precision. Data can have accuracy and precision. Please rewrite.

L391: Use either elevation or altitude consistently throughout the manuscript.

L392: Consistency is probably not what you mean here. Please rewrite.

L393: What do you mean with “both snow depth and error of the fused dataset were greater”? Please clarify.

L394-395: Move this sentence to earlier in the paragraph when you talk about these elevation ranges.

L400: Discuss when these changes (decrease followed by increase) occur in the timeseries.

L402: What does “relatively smooth” mean? Please clarify.

L407-408: What do you mean with this sentence? Figure 6b shows high snow depth values in the west, as well as the east. Please adjust. Also remove the word 'distribution'.

L408: From the spatial pattern in Canada? That is probably not what you mean, but this sentence does make it look like that. Please adjust.

L409: The snow depth of the Tibetan Plateau was also less than what? Please clarify.

L418: What is a “distribution area”. Please clarify.

L418-419: What about Scandinavia, Svalbard, eastern Siberia, and Alaska?

L419: What do you mean with “eastern European plain”? The little area slightly east of the European Alps? That hardly seems like an important area to mention given all the other large areas with high snow depth values.

L421: What does “relatively smooth” mean? Please clarify.

L421-422: Are the authors referring to the machine learning methods with regard to dividing into seasons, and the method of validation when referring to dividing snow depth into different intervals?

L422: What does “more reasonable and precise” mean?

L430: What is a “changing trend”? Suggest replacing this with simply “trend”.

L431: What does a test value of -3.28 mean?
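
If the quoted value is a standardized test statistic, for example a Mann-Kendall Z (an assumption; the comment does not say which test was used), it can be reported together with its significance level. The Python sketch below converts such a Z value to a two-sided p-value under a standard normal reference distribution; the function name is illustrative.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

z = -3.28  # value quoted in the comment, treated here as a standardized Z (assumption)
p = two_sided_p_value(z)
significant_at_1_percent = p < 0.01
print(f"z = {z}, two-sided p = {p:.4f}")
```

Under that assumption the negative sign would indicate a decreasing trend, and stating the test name, the statistic and the p-value together would answer the question.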

L431-432: What shows a significantly decreasing trend? Can you quantify this?

L433: Please quantify the trends.

L446-451: Redundant and does not belong in the discussion section.

L454: Please elaborate on “based on experience”.

L454: Add citations to “previous studies”.

L462-465: This should be in the results section.

L463: What do you mean with “different spatial positions in the training sample (same time), different times of training samples”? Please clarify.

L465-467: What do you mean with this?

L467-468: This is not a proper way to train and verify the ML model. These years may differ significantly in climate, and thus in snow depth. You need more years of training data to train the ML model.

L468-470: Please clarify what this means.

L470-471: What do you mean with this? In L465 you say that you use all the NH data because of the generalization ability of ML. Also, please cite these claims.

L471-472: Here you say again that the ML model is able to generalize. Please clarify.

L472: Please elaborate on “new training is advisable”.

L472: Not clear what you mean with eliminating “one variable”. What variables are these?

L474-475: Here you say again that the ML model cannot generalize across different spatial locations. This argument is inconsistent and needs to be revised.

L477: Add citation to “as found in previous studies”.

L492: Do you mean accuracy instead of precision?

L493: If you've validated this, your conclusion cannot be "likely more accurate...". You should be able to have a firmer conclusion. Also missing citations.

Technical comments

L18: Replace “product” with “products”.

L25: Change “incorporated” to “incorporating”.

L27: Replace “different time period” with “a different time period” or “different time periods”.

L34: Replace “under” with “for”.

L46: Replace “is measured” with “are measured”.

L48: Replace “spatial-temporal” with “spatio-temporal”.

L59: Remove “retrieved”.

L59: Replace “spatiotemporal” with “spatio-temporal”.

L69: Change “susceptive” to “susceptible”.

L73: Remove comma after “latitudes”.

L89: Change “showed” to “have exhibited”.

L92: Replace “Mudrky” with “Mudryk”.

L99: Replace “plain” with “plains” and “forest” with “forested”.

L100: Replace “satisfying” with “satisfactory”. Can you quantify this statement?

L104: Not sure the word “even” is necessary here.

L106: Remove “real”.

L108: Replace “the ANN model” with “their ANN model”.

L108-109: Replace “to have a lower … than” with “to have a reduced MAE of 40% compared to an MAE of 60% of”.

L112: Replace “compared with” with “compared to”.

L113: Remove “products”.

L124: Replace semicolon with period.

L124: Replace “summarized” with “discussed”.

L128: Remove “the” before Northern.

L135: Make separations between row descriptions (most left column) more clear.

L145: Replace “spatiotemporal” with “spatio-temporal”.

L149: Replace “the ANN” with “an ANN”.

L154-155: This sentence is redundant.

L156: Change “underestimate when the snow depth” to “underestimate snow depth when the depth is deeper than…”

L160: Combine into one paragraph.

L161: Replace “included some in situ” with “includes a number of in situ”.

L162: Replace “mountain” with “mountainous”.

L166: Replace “mountain” with “mountainous”.

L166: Remove comma after “study”.

L169: Remove “from the fourth generation of reanalysis”.

L170: Replace “from” with “by”.

L173: Combine into one paragraph.

L193: Remove “also”.

L195: Not sure what this sentence means. Please rewrite.

L209: Remove “covers”.

L210: Remove “land” after Hemisphere.

L218: In L30 you use indexes as the plural for index. Here you use indices. Both are correct but it's best to be consistent throughout the manuscript.

L220: Replace “try fuse” with “generate fused”.

L220: Replace “datasets at” with “datasets of”.

L222-223: Replace “was referenced from” with “can be found in”.

L233-235: Replace “existing accuracy assessment” with “an existing accuracy assessment” or “existing accuracy assessments”.

L240: Change “second period include” to “second period includes”.

L245: It is 2022 now, so the data set you’re using does not cover the last 40 years. Please rewrite.

L247-248: Change to read “We evaluated the accuracy of the fused snow depth and the original gridded snow depth products against the in situ observations.”

L249: Change to “snow depth products as follows:”

L252: Make sure the variables in the text are aligned with the rest of the text. They are elevated right now.

L252: Both variables are now called S_i. Please change one of them and adjust accordingly.

L255: Combine this with the previous paragraph.

L255: Replace “variation trend” with “trend” and “We” with “we”.

L270-271: Replace “large data missing exist” with “large amounts of data are missing”.

L271: Remove “were”.

L272: Change “resulting in the similar missing in the” to “resulting in similar data gaps in the”

L278: Please make the projection part of the sentence more clear.

L278: Replace “spatio” with “The spatial”.

L279: You already mentioned the GeoTiff file type. This sentence can be removed.

L281-282: The first part of this sentence can be removed; you elaborate on the filename format in the next sentence.

L285: This sentence is not necessary.

L288: Remove comma after “2019”.

L289: Do you mean “machine learning model training”?

L305: Remove “as a reanalysis snow depth product”.

L308: Replace “snow depth” with “fused snow depth”.

L310-311: Remove “This also indicated that the consistency between the fused snow depth and ground station observations was very good over the entire Northern Hemisphere”.

L333: Replace “The fused snow depth can accurately estimate deeper snow” with “The fused snow depth product contains accurate estimates of deeper snow”.

L340: Replace “an” with “a”.

L344: Replace semicolon with period.

L345: Replace “shallower” with “smaller”.

L346: Replace “land cover type of this pixel” with “land cover type of this site”.

L347-348: Replace “range was varied” with “varies”

L351: Remove “During the winter … at this altitude”.

L363: (Fig. 3) I suggest compressing this figure so that it fits on one page. This could be done by removing some of the locations and moving them to the appendix, compressing the y-axis and reducing space between figures. Please adjust the x-ticks to improve readability. Suggestion: fewer x-ticks and mention just the year, not the month/day. This will also reduce the size of each sub-figure.

L368: Remove “levels of”.

L369: Replace “levels” with “depths”.

L370: Combine paragraphs.

L371-372: The wording is strange here. I would suggest noting that for 90% of the data, the bias falls between -5 and 5 cm. Similar wording can be applied throughout.

L378-379: Replace “In the last … than 50 cm;” with “For snow depths larger than 50 cm,”.

L379-380: Replace “Although the … were underestimated” with “Although the estimates for large snow depths are underestimated”.

L380: Figure 5 needs to be referenced in the beginning of this paragraph, not at the end.

L381: What is the difference between “small error” and “high accuracy” in this sentence? They seem to mean the same. Please clarify.

L386: (Fig. 5) Is the legend item "Frequent Count" meant to represent the bias between fused snow depth and in situ observations? If so, please clarify that in the legend. Also, please explain the legend item "Gauss". Presumably this is a Gaussian distribution fit to the data?

L397: Remove abbreviation explanations. They have already been explained.

L400: Replace “slowly” with “slow”.

L406: Replace “shallower” with “less”.

L409: Replace “shallower” with “less”.

L419: Replace “the farthest east of Canada” with “eastern Canada”.

L419: Replace “Alps” with “European Alps”.

L420: Remove capitalization of the seasons.

L424: Please make the x-ticks in plots a, b, and c consistent. Either one tick every 20 degs, or every 40 degs.

L427: Remove “of change”.

L433: Replace “area” with “of the area”.

L437: Remove “very”.

L439: Replace “changed response times” with “change rate”.

L446: Replace “spatio-temporal” with “spatio-temporally”.

L476: Remove “Regarding the limitations of this study,”.

L483: Replace “theses” with “these”.

L490: Replace “consistency” with “agreement”.

L498: Replace “leaning” with “learning”.

Citation: https://doi.org/10.5194/essd-2022-63-RC2

Yanxing Hu, Tao Che, Liyun Dai, Yu Zhu, Lin Xiao, Jie Deng, and Xin Li

Data sets

Long-term series of daily snow depth dataset over the Northern Hemisphere based on machine learning (1980-2019) Che, T., Hu, Y., Dai, L., Xiao, L. https://zenodo.org/record/6336866#.Yjs0CMjjwzY

Long-term series of daily snow depth dataset over the Northern Hemisphere based on machine learning (1980-2019) Che, T., Hu, Y., Dai, L., Xiao, L. https://dx.doi.org/10.11888/Snow.tpdc.271701

This preprint has been withdrawn.

Short summary

We propose a data fusion framework based on the random forest regression algorithm to derive a comprehensive snow depth product for the Northern Hemisphere from 1980 to 2019. This new fused snow depth dataset not only provides information about snow depth and its variation over the Northern Hemisphere but also presents potential value for hydrological and water cycle studies related to seasonal snowpacks.

