the Creative Commons Attribution 4.0 License.
Global snow water equivalent product derived from machine learning model trained with in situ measurement data
Abstract. Snow water equivalent (SWE) quantifies the volume of water stored in snowpacks and therefore critically affects the timing and amount of water discharged into groundwater sources and rivers. SWE has been estimated using various methods, including in situ measurements, remote sensing, and physics-based models; however, each of these methods presents certain limitations, including high costs, low spatiotemporal resolution, and uncertainty in model representation and parameter calibration. To address these challenges, in this study we developed a machine learning-based daily global gridded SWE (SWEML) product with a spatial resolution of 0.25°, covering the period from 1980 to 2020. To develop this product, we first applied the k-means clustering algorithm to topographical and climatic variables to classify global in situ SWE measurements into 13 clusters. Subsequently, we adopted the random forest algorithm to correlate daily in situ SWE measurements (n = 11,653) with meteorological forcing and terrain attributes. We compared SWEML with other SWE datasets, including the GlobSnow dataset from the European Space Agency, the Global Land Data Assimilation System (GLDAS) dataset, and SWE estimates from the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E). Globally, the overall root mean square error (RMSE) was 10.80 mm and the overall bias was -6.89 mm; accuracy was particularly high in mountainous and high-elevation areas such as the Rocky Mountains in the U.S., with a Pearson correlation coefficient (R) of 0.99 and an RMSE of 16.88 mm. Furthermore, both snow accumulation during winter and snowmelt during spring were well depicted in SWEML, which is only possible with a high-temporal-resolution product.
Overall, the daily gap-free global SWEML product introduced in this study can significantly contribute to water resource management efforts in snow-dominant regions and provide a robust reference for data assimilation in global-scale land surface modeling. The SWEML is available at https://doi.org/10.5281/zenodo.14195794 (Seo et al., 2024).
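The two-stage workflow described in the abstract — k-means clustering of sites by terrain and climate attributes, followed by cluster-wise random forest regression — can be sketched as follows. This is a minimal illustration with synthetic data; the feature names, sample sizes, and per-cluster training choice are placeholders, not the authors' actual implementation.

```python
# Sketch of the two-stage SWEML approach: k-means clustering of sites,
# then one random forest per cluster. All data here are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_sites = 200

# Hypothetical topographic/climatic attributes per site
# (e.g., elevation, mean temperature, mean precipitation).
site_attrs = rng.normal(size=(n_sites, 3))
clusters = KMeans(n_clusters=13, n_init=10, random_state=0).fit_predict(site_attrs)

# Hypothetical meteorological forcing + terrain features, and SWE targets (mm).
X = rng.normal(size=(n_sites, 5))
y = np.abs(X[:, 0] * 20 + rng.normal(scale=5, size=n_sites))

# Train one random forest per cluster.
models = {}
for c in np.unique(clusters):
    mask = clusters == c
    rf = RandomForestRegressor(n_estimators=50, random_state=0)
    rf.fit(X[mask], y[mask])
    models[c] = rf

# Predict cluster-by-cluster.
pred = np.concatenate(
    [models[c].predict(X[clusters == c]) for c in np.unique(clusters)]
)
```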
Status: open (until 15 Mar 2025)
CC1: 'Comment on essd-2024-349', Jessica Lundquist, 04 Feb 2025
Hi! I just want to provide some references for you regarding your interpretation of your ML dataset's performance over mountain areas relative to many global datasets.
Basically, your model is trained on in situ data, such as SNOTEL, in mountains, and these in situ data report far more snow than either GLDAS or GlobSnow. Passive microwave remote sensing saturates in snow deeper than 200 mm and thus _underestimates_ mountain snow. GLDAS is too coarse in resolution to resolve orographic precipitation and thus _underestimates_ mountain snow. Please check out the references below to better understand the variation in global datasets for mountain snow; they vary widely in their accuracy and inherent biases.
Fang, Y., Liu, Y., Li, D., Sun, H., & Margulis, S. A. (2023). Spatiotemporal snow water storage uncertainty in the midlatitude American Cordillera. The Cryosphere, 17(12), 5175-5195. https://tc.copernicus.org/articles/17/5175/2023/
Mortimer, C., Mudryk, L., Cho, E., Derksen, C., Brady, M., & Vuyovich, C. (2024). Use of multiple reference data sources to cross-validate gridded snow water equivalent products over North America. The Cryosphere, 18(12), 5619-5639. https://tc.copernicus.org/articles/18/5619/2024/
Mudryk, L., Mortimer, C., Derksen, C., Elias Chereque, A., & Kushner, P. (2025). Benchmarking of snow water equivalent (SWE) products based on outcomes of the SnowPEx+ Intercomparison Project. The Cryosphere, 19(1), 201-218. https://tc.copernicus.org/articles/19/201/2025/
Wrzesien, M. L., Pavelsky, T. M., Durand, M. T., Dozier, J., & Lundquist, J. D. (2019). Characterizing biases in mountain snow accumulation from global data sets. Water Resources Research, 55(11), 9873-9891.
Citation: https://doi.org/10.5194/essd-2024-349-CC1
RC1: 'Comment on essd-2024-349', Anonymous Referee #1, 25 Feb 2025
This paper describes a novel global snow water equivalent dataset at 0.25° spatial resolution and daily temporal resolution, derived from a machine learning model (random forest) trained with in situ SWE data and other land surface and climate variables. The authors describe their machine learning approach and the datasets used, and then move on to an evaluation of their global SWE product against in situ measurements, as well as a comparison with other reference gridded SWE datasets. The authors claim that their dataset is far superior to other reference datasets when evaluated against ground SWE measurements.
The idea of trying to generate such a dataset with machine learning is great and will hopefully at some point be possible with high accuracy. In fact, it seems that the ML model learned spatial features quite well. However, I have found several big issues in the data that the authors have not presented or discussed at all. After inspecting the dataset and the paper, I think the current approach did not generate the outcome that the authors claim to have generated, especially with regard to the temporal features of the SWE time series. I am sorry that I cannot recommend this data paper for publication. Below I explain the reasons, together with supporting figures in the attached document. Despite my negative review of this dataset/manuscript, I do encourage the authors to either rebut me if I am wrong or to revise their approach completely and try to improve the dataset/paper.
The biggest issue shows up in the evaluation of the dataset. The authors claim their dataset is far superior to other reference SWE datasets based on a few performance metrics (RMSE, MAE, bias) of their SWE estimates compared to ground observations. Indeed, the average performance values are much better (Table 3 in the manuscript), almost too good to be true, with errors very close to zero or very low. However, I think this is an artifact that results from averaging over the entire globe and/or overfitting of the machine learning model. I might be wrong, but I suspect that the issue comes from the division of the in situ data into training/testing/validation sets. The authors used a lot of long-term daily time series of SWE (e.g., SNOTEL) to train/test/validate the dataset, and as far as I can see from the model code that the authors shared (see the GitHub link they provide), they divided the entire dataset randomly, instead of by stations or regions. I assume that the split then simply removes random points from a time series, which are very easily filled in by the ML model (as a linear interpolation would equally do). When the time series are evaluated only at those removed points, the performance metrics are great, but a linear interpolation would do equally well.
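The leakage mechanism the referee describes can be demonstrated directly: a random split puts near-duplicate neighbouring days from the same station in both train and test, while a station-wise split holds out entire stations. Below is a hedged sketch with synthetic, invented station time series (not the authors' actual data or code), contrasting `KFold` with `GroupKFold` from scikit-learn:

```python
# Random vs station-wise cross-validation on synthetic autocorrelated
# "SWE" series; station IDs and signals are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(1)
n_stations, days = 20, 100

rows, groups = [], []
for s in range(n_stations):
    offset = rng.normal(scale=50)          # station-specific snow level
    t = np.arange(days)
    swe = np.maximum(0, offset + 10 * np.sin(t / 15)
                     + rng.normal(scale=1, size=days))
    for ti, v in zip(t, swe):
        rows.append((s, ti, v))
        groups.append(s)

data = np.array(rows)
X = data[:, :2]                            # station id + day-of-record
y = data[:, 2]
groups = np.array(groups)

rf = RandomForestRegressor(n_estimators=30, random_state=0)

# Random split: neighbouring days of one station leak across folds.
r2_random = cross_val_score(
    rf, X, y, cv=KFold(5, shuffle=True, random_state=0)).mean()

# Station-wise split: whole stations are held out, so no temporal leakage.
r2_station = cross_val_score(
    rf, X, y, cv=GroupKFold(n_splits=5), groups=groups).mean()

print(f"random split R2:  {r2_random:.2f}")
print(f"station split R2: {r2_station:.2f}")
```

With the random split, the model effectively interpolates within time series it has already seen, so its score is inflated relative to the station-wise split, which is the referee's point.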
Whether I am right or wrong about this, there are clear issues with the time series in the SWE dataset, which totally contradict the perfect performance metrics that the authors show in the paper. In fact, the only evaluation they have done is over average values of the entire dataset, and average monthly values in Figure 10. In Figure 11, it is already noticeable that average SWE values can be rather high even in areas that should have very little or even no snow. I plotted peak SWE for the year 2012 (Figure 1 in the attached document) to inspect maximum SWE values, and one can observe that SWE seems overestimated over many areas (from common snow climate knowledge). This is clearer when zooming into those areas. For example, there are large areas at low (tropical) latitudes, without mountains, with SWE values that are not zero, which is unrealistic. I have plotted time series from a few points to check what daily time series of SWE look like in the dataset. In Figure 2 in the attached document I plotted the SWEML time series at the exact location of a SNOTEL site, which is used for training. The time series here already have some strange features, with probably too many peaks, but they still look realistic. However, when moving to other areas where no training data was available (Figure 4 in Iceland, and Figure 5 in the Pyrenees), the SWE time series look completely wrong and unrealistic. In Figure 2 I also plotted an example SWE time series at a point in the Amazon rainforest, with SWE values hovering around 1 all year long. I suggest that the authors revise the entire protocol, approach, and machine learning algorithm to understand why the ML model learns the time series in such an erratic way. They should also apply quality control to the generated time series, to avoid huge unrealistic jumps in SWE (e.g., Figure 4 in the attached document) or non-zero SWE where there should not be any snow at all (e.g., Figure 2 in the attached document).
The authors should also apply a much more extensive evaluation of the time series, to make sure that the ML model did not only learn spatial features but also temporal features.
There are also other big issues that I briefly describe below:
- It is unclear what the novelty of this dataset is, besides trying to improve the performance of estimated SWE compared to existing reference datasets. The resolution is still quite coarse, and the authors do not clearly explain which needs of the scientific community this dataset addresses.
- Although the dataset includes mountain regions, the resolution is still very coarse for high mountains, but this is not discussed as a limitation at all.
- The data clusters generated by the authors' approach (Figure 4) seem to follow exactly the spatial patterns of different datasets (Figure 2). It would be good to discuss why that is the case, as it looks strange.
- In lines 109-111, what does it mean that “we adjusted the raw in situ measurements within individual grid cells to align with the mean and variance of the corresponding SWE data sourced from ECMWF”? So was the validation data “modified”? That could also help explain the low errors.
- It is unclear what data is used for validation. Section 2.1.2 seems to say that the reference datasets (gridded datasets) are used for validation, but as far as I understood, the in situ point measurements are also used, so it is a bit confusing. Line 78 mentions “11,653 daily in situ measurements”, line 105 mentions “11,653 grid points” and line 136 mentions “1,706,825 data points from 6,133 sites”. At this point it is unclear what is what.
- Figure 2: What is “meas. Elev”? Please also add units to the x axis labels. Furthermore, if there is so much data above 2,000 m, why is the evaluation in Figure 9 only done for elevations below or above 1,000 m?
- Figure 7: As explained earlier, it is strange that the errors are so low, they should not be. When comparing “point data” (ground measurements) with gridded estimates (0.25 deg), there should be much more variability in the errors observed, as one value at 25 km resolution can’t perfectly match all the points in one same grid cell, so there is something odd here.
- Figure 11: Clearly a variety of errors here. SWE cannot be negative, least of all in the GlobSnow dataset (panel c). This all needs to be revised, as it is clearly wrong. This could also explain the strange time series in Figure 10a.
- The authors only briefly describe some potential data uses of the dataset in the conclusions, and there is no limitations section at all.
- The details about how the data is stored and organised (NetCDF files) should be briefly mentioned at least.
- There are a number of language and technical errors throughout the paper. The paper needs to be thoroughly revised.
RC2: 'Comment on essd-2024-349', Anonymous Referee #2, 26 Feb 2025
The manuscript ‘Global snow water equivalent product derived from machine learning model trained with in situ measurement data’ uses existing machine learning techniques and libraries to develop a daily gridded global SWE product. It uses a random forest approach trained on in situ SWE data, climate variables and land surface information. Unfortunately, I cannot recommend the manuscript for publication at this time. There are too many flaws and insufficient explanations of the methods and SWEML itself is of questionable quality.
First, the in situ ‘SWE’ data sources listed in Table 1 that are supposedly used to train the model do not all provide SWE observations. For example, GHCN provides snow depth, not SWE. I did a quick investigation of the list of sites (SWE_Insitu_data_info.txt) and not all of them actually observe SWE (e.g. some of the NVE sites are snow depth only). Did the authors screen for SWE observations? If so, how? It would be better to only report the number of sites with SWE in Table 1. The background information on the in situ data is insufficient and, at times, inaccurate. See Fontrodona-Bach et al. (2023), Mortimer et al. (2024), Mortimer and Vionnet (2025) as well as the literature already cited in your manuscript.
Second, the datasets used to assess SWEML are not the most appropriate choices. As detailed in Mortimer et al. (2020), which the authors cite in their paper, passive-microwave-only datasets are poorly correlated with other datasets and are not suitable in most environments; they can provide acceptable performance in open areas with relatively shallow snow. Similarly, although the ESA GlobSnow product provides improved SWE retrievals compared to standalone methods, owing to the assimilation of in situ snow depths to constrain the retrieval, the product still saturates at moderate levels of SWE (~150–200 mm). I recommend the authors consider the work of Mudryk et al. (2025) when selecting gridded reference datasets for SWEML evaluation. Both the EO-only products and GlobSnow underestimate moderate and deep snowpacks and can have large uncertainties in forested areas. This contradicts the statement in your manuscript that GlobSnow overestimates; overestimation is usually limited to shallow snowpacks. Thus, if SWEML is a good representation of the true SWE value, you should not expect SWEML to correlate strongly with the AMSR-E products outside of shallow prairie snow.
The authors employ the commonly used Python scikit-learn (sklearn) library for their machine learning work, which is a justified and reasonable approach. However, in this instance the 'black box' situation is a bit problematic, because the authors do not appear to have a good handle on the in situ data being fed into the model or on the gridded datasets being used to evaluate their output. Further, the methods described in the text are insufficient to reproduce the study. While I appreciate the authors sharing their code through the GitHub repository, this does not replace the need to clearly outline the methods in the main manuscript. Use of a specific Python library should be documented in the methods, because many of the details provided in the text are specific to that workflow but have little meaning outside of it.
Finally, I am unconvinced about the SWEML product itself or of the added value it provides. The stated motivation is the need for reliable spatiotemporally continuous SWE estimates. However, there are already many gridded SWE products with higher spatial resolution than SWEML so it’s not clear what the added benefit of the product is beyond it being another time series of SWE estimates. In mountain regions specifically, the 0.25° grid spacing is pretty coarse. While machine learning provides a potential avenue for SWE estimation the result presented in this study does not appear to provide added value in terms of a better product than is already available from simple temperature index models. I conducted some preliminary checks of the data and there are nonzero SWE values in locations that do not generally receive snow.
Additional fairly major comments
Did the authors test other implementations? For the selected variables, did you test what happens when a variable is removed, or did you rely only on the correlation analysis and the literature? Having inspected the data, I wonder if the authors might be able to improve the estimates; there is SWE where there shouldn't be any. How can the China SWE reanalysis be considered the same type of information as the in situ data? Is it because of a lack of data in that region? If so, why not use other regional reanalyses in areas with sparse data coverage? What is the impact of this choice? Does it make sense?
Fig. 5 and 6 – Are these comparisons of the machine learning output with the data it was trained on, or are they from a leave-one-out analysis? If the former, then you would expect good agreement, wouldn't you? This is alluded to in Figure 10, where the mean SWE of SWEML (if that is what is being presented, which is unclear) has the same seasonal evolution as the 'in situ' data but is consistently 5–10 mm higher.
Evaluation – How did you treat 0/no SWE in your evaluations? If all months and days are being included, then the summer months might be artificially improving your evaluation results.
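The referee's point about zero-SWE days can be illustrated numerically: days on which both the product and the observations are trivially zero shrink the apparent RMSE. A small sketch with invented numbers (not from the actual dataset):

```python
# Illustration with invented numbers: snow-free days, where product and
# observation both report 0, dilute the error and shrink the RMSE.
import numpy as np

obs = np.array([0.0] * 180 + [50, 120, 200, 150, 80])  # mostly snow-free year
est = np.array([0.0] * 180 + [30, 90, 140, 100, 50])   # underestimating product

def rmse(a, b):
    """Root mean square error between two equal-length arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

all_days = rmse(obs, est)                     # zeros included
snow_days = rmse(obs[obs > 0], est[obs > 0])  # snow-covered days only

print(f"RMSE, all days:       {all_days:.1f} mm")   # ~6.7 mm
print(f"RMSE, snow days only: {snow_days:.1f} mm")  # ~40.7 mm
```

The same product looks roughly six times more accurate when the 180 snow-free days are kept in the evaluation, which is why the handling of 0/no-SWE values needs to be stated.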
Table 1 - You need to provide references and links to the data. To my knowledge, GHCN does not provide SWE. The variable SNWD is snow depth (mm) while SNOW is snowfall in mm. https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily; https://docs.opendata.aws/noaa-ghcn-pds/readme.html.
Finally, there is no mention of the fact that some of the data you are including are daily observations (e.g., automated sites) while others are more temporally limited (e.g., manual snow transect observations) (see the references provided in the general comments, as well as papers such as Hill et al. (2019) already cited in your manuscript), nor of whether this may influence how the model learns by providing more information in certain areas than in others (this also applies to the confusing decision to include a reanalysis dataset as training data). L102 'removed outliers' needs to be defined.
The literature cited is insufficient and at times misrepresented. Some in-text citations are missing from the reference list.
Inconsistent between 0.25° grid in the text versus 25 km in the data availability and Zenodo. Which is it?
Technical and grammatical errors throughout.
Fontrodona-Bach, A., Schaefli, B., Woods, R., Teuling, A.J. and Larsen, J.R.: NH-SWE: Northern Hemisphere Snow Water Equivalent dataset based on in situ snow depth time series, Earth Syst. Sci. Data, 15, 2577–2599, https://doi.org/10.5194/essd-15-2577-2023, 2023.
Mortimer, C., Mudryk, L., Cho, E., Derksen, C., Brady, M., and Vuyovich, C.: Use of multiple reference data sources to cross-validate gridded snow water equivalent products over North America, The Cryosphere, 18, 5619–5639, https://doi.org/10.5194/tc-18-5619-2024, 2024.
Mortimer, C., Vionnet, V.: Northern Hemisphere in situ snow water equivalent dataset (NorSWE, 1979-2021), EGUsphere [preprint], https://doi.org/10.5194/essd-2024-602, 2025.
Mudryk, L., Mortimer, C., Derksen, C., Elias Chereque, A., and Kushner, P.: Benchmarking of snow water equivalent (SWE) products based on outcomes of the SnowPEx+ Intercomparison Project, The Cryosphere, 19, 201–218, https://doi.org/10.5194/tc-19-201-2025, 2025.
Citation: https://doi.org/10.5194/essd-2024-349-RC2
Data sets
Global snow water equivalent product derived from machine learning model trained with in situ measurement data Jungho Seo, Mahdi Panahi, JiHyun Kim, Sayed M. Bateni, and Yeonjoo Kim https://doi.org/10.5281/zenodo.14195794