The NIEER AVHRR snow cover extent product over China – A long-term daily snow record for regional climate research

A long-term AVHRR snow cover extent (SCE) product over China, covering 1981 to 2019, has been generated by the snow research team at the Northwest Institute of Eco-Environment and Resources (NIEER), Chinese Academy of Sciences. The NIEER AVHRR SCE product has a spatial resolution of 5 km and a daily temporal resolution, and is completely gap-free. It is produced through a series of processes including quality control, cloud detection, snow discrimination and gap-filling (GF). A comprehensive validation against ground snow-depth measurements during snow seasons in China yielded an overall accuracy of 87.4%, a producer's accuracy of 81.0%, a user's accuracy of 81.3%, and a Cohen's kappa value of 0.717. Another validation against higher-resolution snow maps derived from Landsat-5 Thematic Mapper (TM) images yielded an overall accuracy of 87.3%, a producer's accuracy of 86.7%, a user's accuracy of 95.7%, and a Cohen's kappa value of 0.695. These accuracies are significantly higher than those of existing AVHRR products. For example, compared with the well-known JASMES AVHRR product, the overall accuracy increased by approximately 15 percent, the omission error dropped from 60.8% to 19.7%, the commission error dropped from 31.9% to 21.3%, and the Cohen's kappa (CK) value increased by more than 114 percent. The new AVHRR product is available at https://dx.doi.org/10.11888/Snow.tpdc.271381 (Hao et al. 2021).

1 Introduction
Snow cover is closely linked to climate. On the one hand, owing to its high albedo, snow strongly affects the surface radiation budget and thereby the climate system (Warren, 1982; Huang et al., 2019). On the other hand, changes in climate in turn affect global and regional snow cover. With the continuous warming of the global climate, snow cover on the Earth has been shrinking markedly over the past several decades (Barnett et al., 2005; Bormann et al., 2018).
Therefore, long-term snow cover data are not only particularly important for climate research, but are also an indispensable indicator of climate change.
Remote sensing has been widely used to monitor snow cover extent (SCE) globally and regionally at various spatial and temporal resolutions since the beginning of the satellite era in the 1960s (Konig et al., 2001; Dozier and Painter, 2004; Frei et al., 2012; Wang et al., 2014). The Northern Hemisphere Weekly Snow Cover and Sea Ice Extent (NHSCE) product provides weekly SCE at a spatial resolution of about 190 km from 1966 to 1997 (Robinson et al., 1993). Although its time coverage is long, the NHSCE product has a low spatio-temporal resolution, relies on hand-drawn snow line maps, and has incomplete spatial coverage due to swath gaps or cloud obscuration, which largely restricts its application in climate research.
With the development of satellite sensors, SCE products with high spatial resolution have been issued in recent decades. The Interactive Multi-sensor Snow and Ice Mapping System (IMS) provides daily SCE at spatial resolutions of 24 km, 4 km, and 1 km from 1997 to the present (Helfrich et al., 2007; Ramsay, 1998). The Moderate Resolution Imaging Spectroradiometer (MODIS) provides daily SCE at a spatial resolution of 500 m from 2000 to the present (Hall et al., 2002; Riggs et al., 2017). The Fengyun daily SCE products have a spatial resolution of 1 km from 2003 to the present (Min et al., 2021). These SCE datasets are of good quality and have a high spatio-temporal resolution, but their records are too short to establish a climatological baseline of snow cover.
The Japan Aerospace Exploration Agency (JAXA) recently issued the long-term SCE product JASMES with a spatial resolution of 5 km throughout the Northern Hemisphere. This product consists of satellite-derived daily, weekly, and half-monthly averaged global snow cover derived from 5 km resampled radiance data of AVHRR Global Area Coverage (GAC) onboard the NOAA series satellites and of MODIS onboard the Terra and Aqua satellites (2000 to the present) (Hori et al., 2017).
Although the JASMES product offers a long time series and significantly enhanced spatial and temporal resolution, several shortcomings have been found. (1) The JASMES product uses AVHRR data before 2000 and MODIS data after 2000. Although the data were calibrated by the authors, the bandwidths of the two sensors are not consistent, and using the same algorithms for both can cause discontinuities in the data.
(2) Previous work showed that the JASMES snow product has an excessive cloud mask, which causes a considerable number of snow pixels to be misidentified as cloud pixels.
(3) The JASMES snow algorithm tends to underestimate snow in China, especially on the Qinghai-Tibet Plateau. (4) Finally, JASMES SCE exhibits incomplete spatial coverage caused by clouds and data gaps. These shortcomings limit its application in snow monitoring and climate studies in China. Thus, China still lacks a high-quality, long-term SCE product with complete spatial coverage for climate research.

(4) A multi-level decision tree algorithm for snow discrimination is applied, which significantly improves snow discrimination accuracy. (5) Improved gap-filling (GF) strategies are adopted to obtain complete snow coverage. (6) Land surface temperature reanalysis is used to exclude false snow identifications. Owing to these improvements, the new AVHRR SCE product may serve as a baseline record for climate and other related applications.

AVHRR surface reflectance CDR
The NOAA Climate Data Record (CDR) of AVHRR Surface Reflectance Version 4 (AVHRR SR V4) was used as the basic input data. AVHRR SR V4 is generated from AVHRR Global Area Coverage (GAC) Level 1b data through geolocation, calibration, and atmospheric correction, and has latitudinal and longitudinal dimensions of 3600×7200, covering the globe at 0.05° spatial resolution (Vermote et al., 2014). The dataset contains surface reflectance, brightness temperatures, and quality control flags for the period between June 24, 1981, and May 16, 2019. Google established the Google Earth Engine (GEE) cloud computing platform in 2012. GEE gives researchers quick access to massive amounts of remote sensing data without downloading them, and supports scientific analysis and visualization of petabyte-scale geospatial datasets (Gorelick, 2012). In this study, all AVHRR SR V4 images were processed on the GEE cloud platform. The reflectance and brightness temperature data are described in Table 1. The quality control flags are summarized in Table 3.
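To make the 3600×7200 grid at 0.05° concrete, the following minimal sketch maps a latitude/longitude pair to a row/column index. It assumes that row 0 starts at 90°N and column 0 at 180°W; this orientation, and the function name `latlon_to_rowcol`, are illustrative assumptions and are not taken from the product documentation.

```python
CELL_DEG = 0.05               # grid spacing stated for AVHRR SR V4
N_ROWS, N_COLS = 3600, 7200   # latitudinal x longitudinal dimensions


def latlon_to_rowcol(lat, lon):
    """Return (row, col) of the 0.05-degree cell containing (lat, lon).

    Assumption: row 0 begins at 90N and col 0 begins at 180W.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    # Clamp so that the southern and eastern edges fall in the last cell.
    row = min(int((90.0 - lat) / CELL_DEG), N_ROWS - 1)
    col = min(int((lon + 180.0) / CELL_DEG), N_COLS - 1)
    return row, col
```

A station coordinate can then be matched to the grid cell used for validation by calling `latlon_to_rowcol(lat, lon)`.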

Landsat-5 TM snow map
This study used two groups of Landsat-5 Thematic Mapper (TM) snow maps across China from 1985 to 2013.
The first group was used as "true" values to acquire the training data of AVHRR surface reflectance. The TM snow maps were produced by the improved "SNOMAP" algorithm developed by Chen et al. (2020) for the snow season (November 1 through March 31 of the following year). Each map contained three classes, namely snow, non-snow, and cloud. Considering sensor attenuation before and after 2000, TM images were selected separately for the two periods. Table 2 shows the number of Landsat-5 TM scenes used for training before and after 2000. The second group of maps was used as ground truth to evaluate the AVHRR SCE product. A total of 9 Landsat-5 TM snow maps were used as the validation dataset (Fig. 1). To ensure reliability and representativeness, the training and validation samples were evenly distributed across the three major seasonally snow-covered regions of mainland China: North Xinjiang, Northeast China, and the Qinghai-Tibet Plateau.

AVHRR Training Samples
Snow and non-snow training samples from the AVHRR were generated from spatially and temporally

Ground snow-depth measurements provided by the China Meteorological Administration (CMA) were used to validate the AVHRR SCE product. Daily snow depth was measured near the stations using a professional meter ruler. All measurements were conducted at 08:00 Beijing time when the fractional snow cover in the field of view was more than 50% (CMA, 2003). Validation stations were carefully selected because too many non-snow samples can bias the accuracy assessment. To ensure reliable validation, each selected CMA station had ≥ 20 days with true snow (> 1 cm) per snow season (Metsämäki, 2016). In total, 191 meteorological stations over a 38-year period (1981 to 2019; Fig. 1) were used to validate the AVHRR SCE product. The selected CMA stations were evenly distributed across the three major seasonally snow-covered regions in China.

Ancillary data
Che et al. (2008) and Dai et al. (2015) generated snow-depth data through inter-sensor calibration of passive-microwave observations from multiple satellites, providing daily, 0.25-degree snow-depth data for China from 1979 to 2020. This long-term daily snow-depth dataset is available at http://data.tpdc.ac.cn and was used as a supplement in the gap-filling strategies. To alleviate cloud/snow confusion, we used a daily land surface temperature (LST) product obtained by averaging the hourly ERA5 land climate reanalysis dataset on the GEE platform (Muñoz Sabater, 2019). Digital Elevation Model (DEM) data were used as auxiliary data in the cloud and snow discrimination algorithm, masking, and validation. The SRTM DEM product has an original resolution of 90 m and is also available on GEE. To match the AVHRR products, these datasets were resampled or aggregated to 5 km.
3 Methodology
Figure 2 shows the steps in the generation of the NIEER AVHRR SCE product. Starting with AVHRR surface reflectance version 4 (AVHRR SR V4) data on the GEE platform, valid observations were first selected using the quality control flags of AVHRR SR V4. Second, an improved cloud detection algorithm was developed to distinguish cloudy pixels, water pixels, and clear pixels. Third, clear pixels were classified as snow-covered or snow-free by a multi-level decision tree, generating a set of AVHRR preliminary SCE records. Fourth, the gaps caused by clouds or invalid observations in the preliminary SCE record were filled with a set of gap-filling techniques, including Hidden Markov Random Field (HMRF)-based interpolation and snow-depth interpolation. Finally, postprocessing based on land surface temperature and DEM was conducted to exclude false snow identifications.

Quality control of AVHRR
Only observations valid in all AVHRR channels were employed to directly generate SCE records by using the quality control bit flags of AVHRR SR V4. Table 3 shows all the quality control information from AVHRR SR V4 and the status of usage in this study. After quality control processing, the valid pixels were used as input for retrieval and the invalid pixels were regarded as gap pixels.
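The quality control step amounts to testing individual bits of a packed flag value. The sketch below illustrates the idea only: the bit positions in `INVALID_BITS` are hypothetical placeholders, and the actual bit layout must be taken from Table 3 or from the AVHRR SR V4 product documentation.

```python
import numpy as np

# Hypothetical bit positions marking an unusable observation.
# These are placeholders, not the real AVHRR SR V4 layout.
INVALID_BITS = (0, 1)


def valid_mask(qa, invalid_bits=INVALID_BITS):
    """Return True where none of the given QA bits is set."""
    qa = np.asarray(qa, dtype=np.uint16)
    bad = np.zeros(qa.shape, dtype=bool)
    for bit in invalid_bits:
        # Shift the bit of interest to position 0 and test it.
        bad |= ((qa >> bit) & 1) == 1
    return ~bad
```

Pixels where `valid_mask` is False would then be treated as gap pixels and passed to the gap-filling stage.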

Cloud detection algorithm
In this study, we could not directly adopt the cloud flags of AVHRR SR V4 because of their obvious cloud overestimation.

We adopted the cloud test scheme of Hori et al. (2017), but adjusted the critical threshold value of BT37-BT11. Because the earlier BT37-BT11 thresholds imposed a stricter cloud test and ignored the cloud/snow confusion problem, further optimization was needed to minimize both the misclassification and the omission of clouds. Therefore, we focused on optimizing the cloud algorithm thresholds. Using the Landsat-5 TM maps as true values, we derived the frequency distribution characteristics of BT37-BT11 for cloud and snow samples from AVHRR SR. Table 4 shows the cloud discrimination schemes, with ten cloud detection schemes and four non-cloud schemes. Taking the A1 type as an example, Fig. 3 shows how the optimal BT37-BT11 threshold was determined. Fig. 3(a) presents the BT37-BT11 frequency distributions of cloud and snow training samples from AVHRR before 2000, and Fig. 3(b) presents the variation of the overall accuracy (OA) with the BT37-BT11 threshold. The optimum accuracy (84.76%) occurred at the cross-point of the snow and cloud frequency distributions, at a BT37-BT11 threshold of 14.5 K. This cross-point also represents a compromise between cloud omission error (10.49%) and commission error (19.92%). Thus, the final threshold value was set to 14.5 K according to the optimal OA, which means that a pixel is classified as cloud when BT37-BT11 > 14.5 K. Following the same procedure, the optimal BT37-BT11 thresholds were obtained from AVHRR data before and after 2000, as listed in Table 4.
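The threshold selection described above can be sketched as a simple scan over candidate thresholds that keeps the one with the highest overall accuracy. This is a minimal illustration rather than the authors' implementation; the function name `best_threshold` and the sample values in the usage note are assumptions made for the example.

```python
import numpy as np


def best_threshold(cloud_vals, snow_vals, candidates):
    """Scan candidate BT37-BT11 thresholds (K) and return (threshold, OA).

    A sample is labelled cloud when its value exceeds the threshold,
    otherwise snow. OA is the fraction of correctly labelled samples.
    """
    cloud_vals = np.asarray(cloud_vals, dtype=float)
    snow_vals = np.asarray(snow_vals, dtype=float)
    n_total = cloud_vals.size + snow_vals.size
    best_t, best_oa = None, -1.0
    for t in candidates:
        n_correct = np.sum(cloud_vals > t) + np.sum(snow_vals <= t)
        oa = n_correct / n_total
        if oa > best_oa:  # keep the first threshold reaching the maximum
            best_t, best_oa = float(t), float(oa)
    return best_t, best_oa
```

With synthetic samples such as `cloud_vals = [15, 16, 18, 20]` and `snow_vals = [10, 12, 14, 14.5]`, scanning `np.arange(10, 20.5, 0.5)` selects the threshold that separates the two groups. With real training samples the two distributions overlap, and the maximum OA falls near their cross-point, as in Fig. 3.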

(SR3-SR2), NDVI, the normalized difference snow index (NDSI), and BT differences between BT11 and BT12 (BT11-BT12). For snow discrimination, the NDSI was one of the primary tests. The NDSI is usually calculated using the green (around a wavelength of 0.50 μm) and shortwave infrared (around a wavelength of 1.60 μm) bands. As there were no shortwave infrared observations around 1.60 μm in AVHRR SR V4, we used the reflectance at 3.7 μm for an NDSI-like calculation, following Hori et al.
To improve snow discrimination under clear skies, all decision rules were re-adjusted according to the training samples from high-resolution snow maps. We developed a three-level decision tree algorithm whose optimal threshold values were obtained from the training data. Using Landsat-5 TM data as true values, we obtained the frequency distribution characteristics of SR1, BT11, SR3/SR2, SR3-SR2, NDVI, and NDSI from AVHRR data in the snow and non-snow areas. Figure 4 shows the flowchart of the three-level decision tree snow discrimination algorithm.

1) First-level decision tree
SR1, BT11 combined with DEM, and SR3/SR2 were chosen as first-level discriminators. The main purpose of the first-level decision tree is to exclude pixels that are definitely non-snow. Snow has high reflectance in the SR1 band and low brightness temperature in the thermal infrared BT11 band.
Since our training tests showed that SR3/SR2 distinguishes snow less well than SR3-SR2, SR3/SR2 was chosen as a first-level discriminator. Based on the frequency distributions of snow and non-snow pixels of the first-level discriminators, derived with reference to the Landsat-5 TM maps, a confidence level of 95% of snow samples was set to obtain the threshold values for certain non-snow pixels. As shown in Table 5, for the samples before 2000, pixels with SR1 > 0.14, BT11 < 274 K when DEM < 1300 m or BT11 < 281 K when DEM ≥ 1300 m, and SR3/SR2 < 0.50 were regarded as possible snow pixels, while the remaining pixels were non-snow pixels. The possible snow pixels were used as input for the second-level decision tree.

2) Second-level decision tree
NDVI and SR3-SR2 were chosen as second-level discriminators. The second-level decision tree was mainly used to extract certain snow pixels from the possible snow pixels. Based on the frequency distributions of snow and non-snow pixels among the possible snow pixels passed on by the first-level decision tree, a confidence level of 99% of non-snow samples was set to obtain the threshold values for certain snow pixels. For the samples before 2000, a pixel was classified as certain snow when NDVI < -0.16 or SR3-SR2 < -0.81 (Table 5). The other pixels remained potential snow pixels and were used as input for the third-level decision tree.

3) Third-level decision tree
NDSI was used as the third-level discriminator because of its excellent ability to discriminate snow cover from other land covers. Based on the frequency distributions of the potential snow pixels passed on by the second-level decision tree, the optimal NDSI threshold value was determined by a method similar to that of the cloud test. Figure 5 shows how the optimal NDSI threshold was determined. Fig. 5(a) presents the NDSI frequency distribution histograms of snow and non-snow pixels. The cross-point of the snow and non-snow distributions, which has the highest overall accuracy (85.87%), was chosen as the optimal NDSI threshold (0.73), as shown in Fig. 5(b). The cross-point also represents a compromise between snow omission error (15.83%) and commission error (13.03%). Thus, pixels with NDSI > 0.73 were identified as snow for the samples before 2000.
Following the same strategy, optimal snow discrimination threshold values were obtained from AVHRR data before and after 2000 (Table 5). Using the above-mentioned algorithm, we produced the AVHRR preliminary SCE record for China based on the AVHRR SR V4.
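The three levels can be summarized in a short sketch using the pre-2000 thresholds quoted in the text. It is an illustration under stated assumptions, not the production code: the function name is hypothetical, NDVI and the NDSI-like index are taken as precomputed inputs because their exact formulas are not reproduced here, and cloud and water pixels are assumed to have been removed beforehand.

```python
import numpy as np


def classify_snow_pre2000(sr1, sr2, sr3, bt11, dem, ndvi, ndsi):
    """Return a boolean snow mask for clear-sky pixels (pre-2000 thresholds)."""
    sr1, sr2, sr3, bt11, dem, ndvi, ndsi = (
        np.asarray(a, dtype=float) for a in (sr1, sr2, sr3, bt11, dem, ndvi, ndsi)
    )
    # Level 1: discard pixels that are definitely not snow.
    bt_limit = np.where(dem < 1300.0, 274.0, 281.0)  # K, elevation dependent
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = sr3 / sr2
    possible = (sr1 > 0.14) & (bt11 < bt_limit) & (ratio < 0.50)
    # Level 2: accept pixels that are certainly snow.
    certain = possible & ((ndvi < -0.16) | ((sr3 - sr2) < -0.81))
    # Level 3: decide the remaining potential snow pixels with the NDSI test.
    by_ndsi = possible & ~certain & (ndsi > 0.73)
    return certain | by_ndsi
```

The post-2000 thresholds differ (Table 5), so in practice the threshold values would be passed in as parameters selected by acquisition date.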

For daily AVHRR preliminary SCE records, gaps due to frequent cloud obscuration or swath gaps remained serious. Two gap-filling strategies described below were used to generate a spatially complete daily AVHRR SCE record.

HMRF-based spatio-temporal modeling
Here, we present a spatio-temporal modeling technique for filling gap pixels in daily snow cover estimates based on the time series of AVHRR preliminary SCE records. The technique integrates the spatial and temporal contextual information of the AVHRR preliminary SCE record within an HMRF model (Melgani and Serpico, 2003). Huang et al. (2018) originally used HMRF-based spectral, spatio-temporal, and environmental information to reclassify snow and non-snow classes in MODIS snow products. In our study, we used only the spatio-temporal information to fill gap pixels. The core of this method is to compute the spatio-temporal cubic energy function for every gap pixel from its neighborhood pixels and then to classify the gap pixel as snow, non-snow, or still gap, where U_T is the total energy function of belonging to a given class βn. We first calculated U(β1) and U(β2) based on a surrounding spatio-temporal cube of 3 rows × 3 columns × 3 days. If U(β1) > U(β2), the gap pixel was classified as snow. If U(β1) < U(β2), it was classified as non-snow. If U(β1) = U(β2), or if there were not enough valid pixels to calculate U(βn), we extended the spatio-temporal neighborhood to 3 rows × 3 columns × 5 days. If there were still insufficient valid pixels, the neighborhood was expanded to 5 rows × 5 columns × 5 days. If this strategy also failed, the gap pixel was kept as a gap.
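The expanding-neighborhood logic can be illustrated with a deliberately simplified stand-in for the energy function. In the sketch below, plain counts of snow and non-snow neighbors replace U(β1) and U(β2); this is an assumption made for illustration and is not the HMRF energy used in the product. The gap code 255, the `min_valid` parameter, and the function name are also hypothetical.

```python
import numpy as np

GAP, NON_SNOW, SNOW = 255, 0, 1
# (spatial window size, temporal window size), tried in this order
WINDOWS = ((3, 3), (3, 5), (5, 5))


def fill_gap_pixel(stack, t, r, c, min_valid=1):
    """Classify one gap pixel of a (day, row, col) stack by neighbor votes.

    Simplification: vote counts stand in for the class energies.
    """
    for size_xy, size_t in WINDOWS:
        hx, ht = size_xy // 2, size_t // 2
        cube = stack[max(t - ht, 0): t + ht + 1,
                     max(r - hx, 0): r + hx + 1,
                     max(c - hx, 0): c + hx + 1]
        n_snow = int(np.sum(cube == SNOW))
        n_non = int(np.sum(cube == NON_SNOW))
        # Decide only with enough valid neighbors and no tie.
        if n_snow + n_non >= min_valid and n_snow != n_non:
            return SNOW if n_snow > n_non else NON_SNOW
    return GAP  # all three neighborhoods failed; the pixel stays a gap
```

Applying `fill_gap_pixel` to every gap pixel of the preliminary record, while reading neighbors from the unfilled stack, keeps the result independent of the processing order.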

The HMRF-based modeling provides a rigorous interpolation framework for optimally integrating spatio-temporal context. To test the effect of HMRF-based interpolation on gap pixels, we compared the monthly average gap ratio of the AVHRR preliminary SCE record from 1981 to 2019 before and after HMRF-based interpolation (Table 6). The gap ratio before HMRF-based interpolation was within 40%-60% (average: 47.8%), whereas the gap ratio after HMRF-based interpolation ranged between 0.2% and 6.4% (average: 2.7%). Almost 90% of the gap pixels were thus filled. The HMRF-based spatio-temporal model significantly improved the practicability of the AVHRR SCE product.

Interpolation based on passive microwave snow-depth data
Although most gap pixels were filled by the HMRF-based spatio-temporal interpolation, about 6% of gaps still remained in the daily SCE data. Therefore, a fusion method combining the passive microwave daily snow-depth data and the AVHRR snow cover data was applied to these residual gap pixels. The passive microwave daily snow-depth data (25 km) were resampled to the same cell size as the AVHRR data (5 km) by nearest neighbor interpolation. If the collocated snow depth was ≥ 2 cm, the gap pixel was considered a snow pixel. Otherwise, it was considered a non-snow pixel (Hao et al., 2019).
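A minimal sketch of this residual fill is given below. It assumes that the 25 km and 5 km grids are aligned so that each coarse cell covers exactly 5 × 5 fine cells, in which case nearest neighbor resampling reduces to block replication. The gap code 255 and the function name are assumptions; the class codes 0 and 3 follow the product legend (non-snow, snow from SD).

```python
import numpy as np

GAP, NON_SNOW, SNOW_FROM_SD = 255, 0, 3


def fill_from_snow_depth(sce, sd_25km, factor=5, sd_min_cm=2.0):
    """Fill remaining gap pixels of a 5 km SCE map from 25 km snow depth (cm)."""
    sce = np.asarray(sce)
    sd_25km = np.asarray(sd_25km, dtype=float)
    # Block replication: nearest neighbor resampling for aligned grids.
    sd_5km = np.repeat(np.repeat(sd_25km, factor, axis=0), factor, axis=1)
    if sd_5km.shape != sce.shape:
        raise ValueError("snow-depth grid does not match the SCE grid")
    out = sce.copy()
    gaps = sce == GAP
    out[gaps & (sd_5km >= sd_min_cm)] = SNOW_FROM_SD
    out[gaps & (sd_5km < sd_min_cm)] = NON_SNOW
    return out  # pixels with missing (NaN) snow depth remain gaps
```

Only gap pixels are modified, so classifications obtained from AVHRR or from the spatio-temporal interpolation are never overwritten by the coarser microwave data.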

Postprocessing based on surface temperature and DEM
Because of their similar optical properties, ice-cloud pixels are sometimes mistaken for snow pixels, which results in artifact snow cover in Southern China even during summer, when and where snow is impossible. Following the MODIS algorithm, the postprocessing uses LST from the ERA5 reanalysis together with the DEM to eliminate these false snow pixels. The thresholds are as follows: a pixel is reclassified as snow-free when LST ≥ 275 K and DEM ≤ 1300 m, or when LST ≥ 281 K and DEM ≥ 1300 m.
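The temperature screen can be written as a single elevation-dependent comparison. The sketch below assumes elevation in metres, consistent with the 1300 m split used in the first-level decision tree, and uses a hypothetical function name.

```python
import numpy as np


def remove_false_snow(snow, lst_k, dem_m):
    """Reset snow pixels that are too warm for snow at their elevation.

    snow: boolean snow mask; lst_k: daily LST in K; dem_m: elevation in m.
    """
    snow = np.asarray(snow, dtype=bool)
    lst_k = np.asarray(lst_k, dtype=float)
    dem_m = np.asarray(dem_m, dtype=float)
    # 275 K limit at or below 1300 m, 281 K limit above it.
    too_warm = np.where(dem_m <= 1300.0, lst_k >= 275.0, lst_k >= 281.0)
    return snow & ~too_warm
```

Because 281 K is above 275 K, treating a pixel at exactly 1300 m with the lower-elevation limit gives the same result as applying both stated conditions.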

A confusion matrix similar to that given in Table 7 is used to assess all associated AVHRR SCE data in this paper. Following previous studies (Dong et al., 2014; Zhang et al., 2019), four accuracy metrics were used: the overall accuracy (OA), the producer's accuracy (PA), the user's accuracy (UA), and Cohen's kappa (CK) value. The OA is the ratio of correctly detected cases to all cases.
The PA is the proportion of actual snow cases that are correctly detected as snow by AVHRR.
The UA is the proportion of true snow cases among all snow cases detected by AVHRR. The sum of PA and omission error equals one, and the sum of UA and commission error equals one (Arsenault et al., 2014). The CK value is an overall measure of agreement and is considered a more robust metric than OA (Cohen, 1960; Powers and Ailab, 2011).
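For reference, the four metrics follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions, with snow as the positive class; the argument names (`tp` for snow detected as snow, `fp` for non-snow detected as snow, `fn` for snow detected as non-snow, `tn` for non-snow detected as non-snow) are chosen for this example.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Return OA, PA, UA, omission/commission errors and Cohen's kappa."""
    n = tp + fp + fn + tn
    oa = (tp + tn) / n
    pa = tp / (tp + fn)   # share of reference snow that was detected
    ua = tp / (tp + fp)   # share of detected snow that is truly snow
    # Chance agreement expected from the row and column totals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    ck = (oa - pe) / (1 - pe)
    return {
        "OA": oa, "PA": pa, "UA": ua,
        "omission": 1 - pa, "commission": 1 - ua, "CK": ck,
    }
```

For a balanced example with 40 correct snow cases, 40 correct non-snow cases and 10 errors of each kind, OA, PA and UA are all 0.8 while CK is 0.6, which illustrates how CK discounts agreement expected by chance.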

As mentioned above, we used 38 years of CMA ground snow-depth measurements at 191 stations to validate the new NIEER AVHRR SCE product.

(1977), this would place the level of agreement as "substantial". Overall, these results show that the new NIEER AVHRR product is accurate and agrees well with the measurements at CMA stations.
To assess the stability and reliability of the NIEER AVHRR SCE product, Fig. 7 presents the annual fluctuation of the four accuracy metrics over the past 38 years. The OA ranged within 80%-90%, the PA and UA ranged within 70%-90%, and the CK value ranged from 0.61 to 0.8. The largest annual fluctuations occurred in 1993, 1994, and 2017, and were mainly caused by the poor quality of the raw satellite data rather than by the algorithm. In summary, the product maintained high accuracy with small annual fluctuations, which indicates the effectiveness and stability of the training framework with different thresholds before and after 2000.

Validation with Landsat-5 TM SCE maps
The measurements from CMA stations provide time-continuous validation. However, the "point to area" evaluation method ignores the spatial heterogeneity within one satellite pixel (Huang et al., 2011). The snow condition at an individual CMA station may not represent the larger area viewed by AVHRR. The "area to area" method, which uses higher-resolution images, offers a good way to assess the spatial distribution of snow in the AVHRR SCE product.
In this study, 9 Landsat-5 snow maps were used to further evaluate the NIEER AVHRR product. Table 9 gives the validation results of our maps against the Landsat-5 TM SCE maps. The OA was as high as 87.3%. The high UA and lower PA revealed that the product has a slight tendency to underestimate the snow cover extent. The CK value (0.695) of the "area to area" method also indicated "substantial" agreement and was close to that of the ground measurement validation (0.717). Therefore, from both the point view (ground measurements) and the area view (Landsat-5 SCE maps), the NIEER AVHRR product is accurate. In general, the NIEER AVHRR SCE product is promising for climatic and other related studies in China.
Figure 9 further displays three illustrative examples of the detailed differences between NIEER AVHRR SCE maps and Landsat-5 SCE reference maps. The three images (serial numbers C1, C5, and C8) were located in Northeast China, the Qinghai-Tibet Plateau, and North Xinjiang, respectively. The NIEER AVHRR SCE maps clearly agree well with the higher-resolution snow maps over extensive snow-covered areas. However, along the boundaries of snow-covered areas, the NIEER AVHRR SCE maps failed to identify most of the snow pixels in the Landsat-5 SCE maps, which can be explained by the limited ability of our product to detect pixels with low fractional snow cover.

Uncertainties of the NIEER AVHRR SCE product
The validation based on both CMA stations and Landsat TM images indicated that the NIEER AVHRR SCE product performs well for extensive and deep snow cover. To explore the uncertainties of our product in areas of thin snow cover, we set different snow depth (SD) thresholds based on CMA measurements to further evaluate the NIEER AVHRR SCE product. Figure 10 shows the accuracy metrics of the product under different SD thresholds (SD ≥ 1 cm, SD ≥ 2 cm, SD ≥ 3 cm, SD ≥ 4 cm, and SD ≥ 5 cm).

The results showed that the OA, UA, and CK values of the product decreased with increasing SD threshold, whereas the PA values increased. As the SD threshold increased, the UA decreased sharply and the PA increased slightly. On the whole, the OA and CK values showed a significant decreasing trend. Our algorithm thus performed well at lower SD thresholds, which indicates that the product has a good ability to recognize shallow snow.
According to the temporal distribution of snow cover in China, three seasonal snow periods were defined: the snow accumulation period, the stable snow period, and the snow melting period. The snow accumulation period is November. The stable snow period runs from the beginning of December to the end of February of the following year, and the snow melting period is March. Figure 11 presents the accuracy results of the NIEER AVHRR SCE product in the different snow periods. The OAs of the accumulation period (87.7%), stable period (86.7%) and melting period (89.0%) were similar. However, the PAs, UAs and CK values of the accumulation and melting periods were markedly lower than those of the stable snow period. The product had the highest omission error (29.5%) during the accumulation period because of mixed pixels in the early snowfall season, and the highest commission error (30.3%) during the melting period because of the influence of wet snow.

Comparison of NIEER AVHRR and JASMES SCE product
To assess our product more objectively, we compared the NIEER AVHRR SCE product with the JASMES SCE product. Since the JASMES SCE product was generated from AVHRR data only from 1981 to 1999, comparisons were made against the same ground snow-depth reference measurements in 19 snow seasons (1981-1999). Compared with the JASMES product, the OA of the NIEER product increased by approximately 15 percent, the omission error dropped from 60.8% to 19.7%, the commission error dropped from 31.9% to 21.3%, and the CK value increased by more than 114%. The JASMES product markedly underestimated snow in China. In addition, there were about 50 thousand validation samples for our product but only about 36 thousand for the JASMES product, which indicates that our product fills more gap pixels than JASMES. On the whole, the snow and cloud detection algorithm and the gap-filling strategy of our product performed better than those of JASMES.
To examine the spatial differences between the two products, comparison maps were constructed for November 15, 1985. Figure 12 presents the two SCE maps and their difference.
There were significant differences in mapped snow extent between the two maps in the three major seasonal snow regions in China, i.e., North Xinjiang, Northeast China, and the Qinghai-Tibet Plateau.
Our product mapped more snow than JASMES in North Xinjiang, on the Qinghai-Tibet Plateau, and in the non-forest area of Northeast China. The largest discrepancy occurred on the Qinghai-Tibet Plateau, where our product identified more snow-covered area than JASMES. The JASMES map had more snow than our product in the forested area of Northeast China. Three improvements contributed to these differences. First, the proposed snow algorithm improved snow discrimination accuracy and largely reduced omission errors. Second, the cloud detection algorithm effectively reduced cloud-snow confusion and recovered the snow pixels that were misidentified as cloud pixels in JASMES.
Third, the gap-filling strategy provided complete spatial coverage of snow cover.

Data availability
The NIEER AVHRR SCE product files are named NIEER_GF AVHRR SCE_yyyymmdd_DAILY_5km_V01 (V01 denotes the first version). The product has a spatial resolution of 5 km and a daily temporal resolution. It spans latitude 16-56°N and longitude 72-142°E, and is freely accessible at https://dx.doi.org/10.11888/Snow.tpdc.271381 (Hao et al., 2021). Detailed information on the product is listed in Table 11. The values in the product are classified as non-snow (0), snow from AVHRR (1), snow from HMRF (2), snow from SD (3), water (4), and filling value (255).
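As a usage illustration, the class codes listed above can be collapsed into a binary snow mask once a daily grid has been read into an array. The sketch assumes the grid is already available as a NumPy integer array; reading the file is not shown because the file format is not specified here, and the function names are hypothetical.

```python
import numpy as np

CLASS_NAMES = {
    0: "non-snow",
    1: "snow from AVHRR",
    2: "snow from HMRF",
    3: "snow from SD",
    4: "water",
    255: "filling value",
}
SNOW_CODES = (1, 2, 3)
LAND_CODES = (0, 1, 2, 3)


def snow_mask(sce):
    """Return True where the pixel is snow, regardless of its source."""
    return np.isin(sce, SNOW_CODES)


def snow_fraction(sce):
    """Return the snow-covered share of classified land pixels."""
    land = np.isin(sce, LAND_CODES)
    if not land.any():
        return float("nan")
    return float(snow_mask(sce).sum() / land.sum())
```

Keeping the three snow codes separate until this step makes it possible to check how much of the mapped snow comes from direct AVHRR retrievals and how much from gap-filling.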

7 Conclusions
In this study, a daily AVHRR SCE product with a spatial resolution of 5 km across mainland China from 1981 to 2019 was generated by the snow research team at NIEER, Chinese Academy of Sciences.

The NIEER AVHRR SCE product used a multi-level decision tree algorithm for cloud and snow discrimination and an improved GF technique. The product was validated using snow depth measurements provided by the China Meteorological Administration and higher spatial resolution SCE maps derived from Landsat-5 TM.

JASMES product, the NIEER product OA increased by approximately 15 percent, the omission error dropped from nearly 60% to 19.7%, the commission error dropped from 31.9% to 21.3%, and the CK value increased by more than 114%. Accordingly, the NIEER AVHRR product had a higher accuracy than the JASMES product. Furthermore, the NIEER product is completely gap-free over China, which permits wide application.

Finally, we assessed the behavior of the NIEER AVHRR product during the snow accumulation, stable snow, and melting periods. The product performed best during the stable period, and was more accurate in the snow accumulation period than in the melting period. In general, the algorithm had a relatively high ability to identify shallow snow, but some uncertainties remained in patchy snow areas, for thinner snow, and in rugged terrain. As a long-term record, the dataset will provide a valuable data source for analyzing the influence of climate change on the cryosphere on multiple time scales.