Reconstruction of daily gridded snow water equivalent product for the Pan-Arctic region based on a ridge regression machine learning approach

Snow water equivalent is an important parameter of the surface hydrological and climate systems, and it has a profound impact on Arctic amplification and climate change. However, existing snow water equivalent products differ greatly from one another. In the Pan-Arctic region, the existing snow water equivalent products have limited time spans, limited spatial coverage, and coarse spatial resolution, which greatly limits the application of snow water equivalent data in cryosphere change and climate change studies. In this study, using the ridge regression model (RRM), a machine learning algorithm, we integrated various existing snow water equivalent (SWE) products to generate a spatiotemporally seamless and high-precision RRM SWE product. The results show that it is feasible to use a ridge regression model based on a machine learning algorithm to prepare snow water equivalent products on a global scale. We evaluated the accuracy of the RRM SWE product using Global Historical Climatology Network (GHCN) data and Russian snow survey data. The MAE, RMSE, R, and R² between the RRM SWE product and observed snow water equivalents are 0.24, 30.29 mm, 0.87, and 0.76, respectively. The accuracy of the RRM SWE dataset is improved by 24%, 25%, 32%, 7%, and 10% compared with the original AMSR-E/AMSR2 SWE dataset, ERA-Interim SWE dataset, Global Land Data Assimilation System (GLDAS) SWE dataset, GlobSnow SWE dataset, and ERA5-land SWE dataset, respectively.

data integrity, their accuracy is poor, with an MAE of 0.65 (Snauffer et al., 2016). The snow water equivalent data from stations and meteorological observations cannot meet the needs of hydrometeorological and climate change research, mainly because station SWE records are discontinuous in time and have severe gaps. Further, hydrometeorological studies often require spatiotemporally continuous gridded data as driving input (Pan et al., 2003). There are great differences among remote sensing SWE, reanalysis SWE, data assimilation SWE, and observed SWE. For remote sensing SWE, the spatiotemporal characteristics of different passive microwave snow water equivalent data differ significantly due to differences in sensors or retrieval algorithms (Mudryk et al., 2015a). Data assimilation SWE and reanalysis SWE also tend to exhibit different spatiotemporal characteristics due to differences in model design, driving data, assimilation methods, etc. (Vuyovich et al., 2014). In summary, although a variety of snow water equivalent data exist worldwide, their quality is uncertain.

Previous studies have shown that all kinds of snow water equivalent data in the Northern Hemisphere have advantages and disadvantages, and none of them performs well in all aspects (Mortimer et al., 2020). An effective approach is to fuse the different kinds of snow water equivalent data in time and space, integrate their respective advantages, and generate a relatively complete snow water equivalent dataset. Many scholars have conducted in-depth studies on snow water equivalent data fusion. The main fusion methods can be classified into the following categories: multiproduct direct average (Mudryk et al., 2015b), linear regression (Snauffer et al., 2016), data assimilation (Pulliainen, 2006), "multiple" collocation (Pan et al., 2015), and machine learning (Snauffer et al., 2018; Xiao et al., 2018; Wang et al., 2020). Studies have shown that even the simplest multisource data average is more accurate than a single snow water equivalent product (Snauffer et al., 2018). ... data but also make full use of site observation data to train the sample data, which easily generates snow water equivalent data products with large spatial scales and long time series (Broxton et al., 2019; Bair et al., 2018).

In summary, fusing multisource snow water equivalent data with a machine learning algorithm, starting from the existing snow water equivalent data products, is an effective way to prepare snow water equivalent products with long time series and large spatial scales while retaining the advantages of the individual products. In this study, we integrated multisource snow water equivalent data products into the RRM SWE product using the ridge regression model, a machine learning algorithm. We selected ERA-Interim SWE data, GLDAS SWE data, GlobSnow SWE data, AMSR-E/AMSR2 SWE data, and ERA5-land SWE data, which have relatively complete time series, as the original data for producing the RRM SWE product. The missing parts of the ERA-Interim SWE data, AMSR-E/AMSR2 SWE data, and GlobSnow SWE data are filled by the spatial-temporal interpolation method. The GHCN dataset (Menne et al., 2016) and Russian snow survey data (Bulygina et al., 2011) are used as training sample data of "true snow water equivalent", and the effect of altitude on the algorithm is also considered.

The research region of the RRM SWE product is the land region north of 45° N (hereinafter referred to as the Pan-Arctic region) (Fig. 1).

... parameters from January 1979 to present (https://apps.ecmwf.int/datasets/data/interim-full-daily/levtype=sfc/). We obtained the snow water equivalent dataset with a daily temporal resolution, a spatial resolution of 0.25°, and NETCDF4 data format.

The spatial range of the data is the Pan-Arctic region north of 45°N.

The GLDAS is a model used to describe global land information; it contains data such as global rainfall, water evaporation, surface runoff, subsurface runoff, soil moisture, surface snow cover distribution, temperature, and heat flow distribution (Rodell et al., 2004). This assimilation system includes data with spatial resolutions of 1°×1° and 0.25°×0.25° and temporal resolutions of 3 hours, 1 day, and 1 month. The GLDAS data are available for download from the Goddard Earth Sciences Data and Information Services Center (GES DISC). We obtained a snow water equivalent dataset with a daily temporal resolution, a spatial resolution of 0.25°, and NETCDF4 data format.

According to the verification results in Fig. 3 and ... deviation between the ERA5-land SWE dataset and the measured snow water equivalent is higher than that of the GlobSnow SWE dataset, but its correlation with the measured values is higher than that of the GlobSnow SWE dataset, and its integrity is better in terms of temporal and spatial series. In addition, the overall accuracy of the ERA-Interim SWE dataset and GLDAS SWE dataset is relatively low, but their integrity is higher than that of the GlobSnow SWE dataset and AMSR-E/AMSR2 SWE dataset in terms of temporal and spatial series. The AMSR-E/AMSR2 SWE dataset has a higher estimation accuracy in the low-value region of snow water equivalent. Moreover, in the Pan-Arctic region, most of the existing snow water equivalent data products have gaps of varying extent in time and space. Clearly, the accuracies of the existing snow water equivalent products are uneven, and no snow water equivalent dataset is perfect.
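The verification statistics used throughout (MAE, RMSE, R, and R²) can be illustrated with a minimal sketch. It assumes the textbook definitions and treats R² as the square of the Pearson correlation, which is consistent with the reported R = 0.87 and R² = 0.76. The exact equations used in the study are not shown in this excerpt (the reported MAE appears to be dimensionless, so it may be normalized differently), and the function name `accuracy_metrics` is hypothetical.

```python
import math


def accuracy_metrics(estimated, observed):
    """Return MAE, RMSE, Pearson R and R^2 for paired SWE values (mm).

    Assumes standard definitions; the paper's own formulas may differ.
    """
    n = len(observed)
    errors = [e - o for e, o in zip(estimated, observed)]
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)

    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / math.sqrt(var_e * var_o)  # Pearson correlation coefficient

    return {"mae": mae, "rmse": rmse, "r": r, "r2": r * r}


# Illustrative values only, not data from the study.
gridded_swe = [10.0, 20.0, 30.0]
station_swe = [12.0, 18.0, 33.0]
metrics = accuracy_metrics(gridded_swe, station_swe)
```

Applying the same function to each product against the same station test set gives directly comparable scores.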

The verification results also indicate the following ranking orders: ... and it also fills the gap in the original snow water equivalent data in terms of spatial and temporal resolutions.

Based on the kernel density estimation method, we analyze the density distribution of different SWE datasets (Fig. 4). The results show that the RRM SWE dataset is closest to the 1:1 line and has the highest accuracy. The RRM SWE dataset is particularly accurate for SWE estimation in the low-value region, and the test data are concentrated near the 1:1 line in the high-density region (kernel density estimation > 0.00015) (Fig. 4).
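As a rough illustration of the kernel density analysis, the sketch below evaluates a two-dimensional Gaussian kernel density estimate for pairs of (observed SWE, estimated SWE). The product Gaussian kernel, the fixed bandwidths, and the name `gaussian_kde_2d` are assumptions made for this example; the excerpt does not state which kernel or bandwidth the study used.

```python
import math


def gaussian_kde_2d(points, query, bandwidth):
    """Estimate the density at `query` from (observed, estimated) pairs.

    Uses a product Gaussian kernel with separate bandwidths (hx, hy) in mm.
    This is an assumed, generic KDE, not necessarily the study's setup.
    """
    hx, hy = bandwidth
    qx, qy = query
    norm = 1.0 / (2.0 * math.pi * hx * hy * len(points))
    total = 0.0
    for px, py in points:
        zx = (qx - px) / hx
        zy = (qy - py) / hy
        total += math.exp(-0.5 * (zx * zx + zy * zy))
    return norm * total


# Illustrative pairs (mm), clustered near the 1:1 line at low SWE.
pairs = [(10.0, 12.0), (15.0, 14.0), (20.0, 22.0), (150.0, 110.0)]
dense = gaussian_kde_2d(pairs, (15.0, 15.0), (10.0, 10.0))
sparse = gaussian_kde_2d(pairs, (150.0, 20.0), (10.0, 10.0))
```

Evaluating the density on a regular grid of query points and thresholding it is one way to delineate a high-density region like the one referred to in Fig. 4.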

The relative difference between the RRM SWE data and the GLDAS SWE data is the highest, exceeding 80% in most low altitude regions (Fig. 7). The relative difference between the RRM SWE data and the GlobSnow SWE data is relatively small overall, especially in most high latitude areas, where it is less than 10% (Fig. 7). Overall, the annual average relative differences between the RRM SWE data and the AMSR2 SWE, ERA-Interim SWE, GLDAS SWE, GlobSnow SWE, and ERA5-land SWE data are 39%, 41%, 49%, 26%, and 33%, respectively (Fig. 7). Previous studies have shown that the accuracy of snow water equivalent in the Northern Hemisphere estimated by GlobSnow SWE data is higher (Pulliainen et al., 2020), while the spatial distribution pattern of the RRM SWE data is close to the estimation result of

Based on the Mann-Kendall trend test (see Fig. 8 and Table 3

In this study, we propose a method to fuse multisource snow water equivalent data by a ridge regression model based on machine learning. Using this method, we prepared a spatiotemporally seamless snow water equivalent dataset, RRM SWE, by combining the original AMSR-E/AMSR2 SWE dataset, ERA-Interim SWE dataset, GLDAS SWE dataset, GlobSnow SWE dataset, and ERA5-land SWE dataset. The RRM SWE dataset covers 1979-2019 at a daily temporal resolution and a spatial resolution of 10 km, and its spatial range is the Pan-Arctic region.
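The fusion idea can be sketched as a small ridge regression in which the five product SWE values and the station elevation act as predictors and the station-observed SWE is the target. This is only an illustration under stated assumptions: the closed-form solution with an unpenalized intercept, the penalty `alpha`, the feature layout, and the names `fit_ridge`, `predict`, and `solve_linear_system` are all hypothetical, and the sample numbers are invented rather than taken from the study.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(features, targets, alpha=1.0):
    """Fit ridge regression; returns (weights, intercept).

    Solves (Xc^T Xc + alpha * I) w = Xc^T yc on mean-centered data,
    so the intercept itself is not penalized.
    """
    n, p = len(features), len(features[0])
    mean_x = [sum(row[j] for row in features) / n for j in range(p)]
    mean_y = sum(targets) / n
    xc = [[row[j] - mean_x[j] for j in range(p)] for row in features]
    yc = [t - mean_y for t in targets]
    gram = [[sum(r[i] * r[j] for r in xc) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(p)]
    weights = solve_linear_system(gram, rhs)
    intercept = mean_y - sum(w * mx for w, mx in zip(weights, mean_x))
    return weights, intercept


def predict(weights, intercept, row):
    """Fused SWE estimate for one grid cell and day."""
    return intercept + sum(w * v for w, v in zip(weights, row))


# Hypothetical training samples: five product SWE values (mm) followed by
# station elevation (km); targets are station-observed SWE (mm).
train_x = [
    [40.0, 55.0, 30.0, 48.0, 52.0, 0.2],
    [80.0, 95.0, 60.0, 88.0, 90.0, 0.5],
    [10.0, 20.0, 8.0, 14.0, 16.0, 0.1],
    [120.0, 150.0, 90.0, 130.0, 140.0, 1.1],
]
train_y = [50.0, 92.0, 15.0, 135.0]
weights, intercept = fit_ridge(train_x, train_y, alpha=1.0)
fused_swe = predict(weights, intercept, [60.0, 70.0, 45.0, 66.0, 68.0, 0.3])
```

Because the ridge penalty depends on feature scale, standardizing the predictors before fitting is usually advisable in practice. The penalty also keeps the system solvable when the input products are strongly correlated with one another, which is the typical situation for SWE products covering the same region.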

The RRM SWE data product has the best accuracy, especially for the estimation of low snow water equivalent. The accuracy ranking of the snow water equivalent datasets, as verified with the test dataset, is as follows: RRM SWE > GlobSnow SWE > ERA5-land SWE > AMSR-E/AMSR2 SWE > ERA-Interim SWE > GLDAS SWE. The accuracy of the RRM SWE dataset is higher than that of the existing snow water equivalent products at most elevation intervals. Moreover, the RRM SWE dataset fills the spatial and temporal gaps in the original snow water equivalent datasets.

Compared with traditional fusion methods, machine learning methods have clear advantages. We find that a simple machine learning algorithm is both efficient and accurate in the preparation of snow water equivalent products on a global scale. Without losing the advantages of existing snow water equivalent products, this method can also make full use of station observation data to integrate the advantages of various snow water equivalent products. The model training process does not rely too much on a specific sample, and the model has a strong generalization ability. In addition,