The AntSMB dataset: a comprehensive compilation of surface mass balance field observations over the Antarctic Ice Sheet



Introduction
Against the background of rapid global warming, changes in the mass balance of the Antarctic Ice Sheet (AIS) have aroused wide international concern; the ice sheet has contributed 14.0±2.0 mm to global sea level rise since 1979 (Rignot et al., 2019). Antarctic mass balance depends on the partitioning between ice discharge into the ocean and net snow accumulation at the surface, i.e., surface mass balance (SMB). The recent negative mass balance of the ice sheet reflects an ice dynamical loss larger than the mass gain from SMB (e.g., Shepherd et al., 2012; Shepherd et al., 2018). Although ice discharge governs Antarctic mass balance on decadal or longer time scales, the considerable inter-annual variability is largely determined by fluctuations in SMB (Rignot et al., 2019). Because the annual net mass input to the entire ice sheet through snowfall is equivalent to about 6 mm of global sea level decline (IPCC, 2019), even small fluctuations in Antarctic SMB can result in large variability and trends in global sea level.
SMB is defined as the sum of precipitation, surface and drifting snow sublimation, erosion/deposition caused by drifting snow, and surface meltwater run-off. Since the first international polar year 1957/58 (IPY), a number of scientific Antarctic traverses/expeditions have been carried out to measure SMB by means of stakes, ice cores/snow pits, ultrasonic sounders, or ground-penetrating radar (GPR) (e.g., Isaksson and Melvold, 2002; Mayewski et al., 2005). Due to logistical constraints in the harsh environment, gaps in the spatial coverage of SMB measurements are still large, and long-term records are scarce (Favier et al., 2013). As a result, substantial caveats arise when SMB is quantified at the ice sheet scale by simple interpolation of these observations (Magand et al., 2008; Genthon et al., 2009). Climate models and atmospheric reanalysis products provide an important alternative for assessing SMB over large areas. The outputs of regional climate models have been used to calculate ice sheet SMB in recent decades by a wealth of Antarctic mass change studies (e.g., Rignot et al., 2011; Shepherd et al., 2012; Rignot et al., 2019). However, these simulations depend on ground-based observations to improve their accuracy and resolution. Before application, model performance needs to be carefully assessed against in situ observations, as done in previous studies (Medley et al., 2013; Wang et al., 2015; Van Wessem et al., 2018; Agosta et al., 2019; Wang et al., 2020). To improve ice sheet SMB estimates, field measurements have been cross-compared with remotely sensed data (Arthern et al., 2006) or with climate model outputs (e.g., Monaghan et al., 2006; Van de Berg et al., 2006; Medley et al., 2019; Wang et al., 2019).
Thus, it remains pivotal to compile all available observations from the past to the present in order to better estimate spatial and temporal variability in SMB, and to constrain climate models and remote sensing algorithms. Vaughan and Russell (1997) performed the pioneering work of compiling all multi-year averaged SMB field measurements over the AIS, and this compilation was described in detail by Vaughan et al. (1999). However, according to Magand et al. (2007), this dataset includes many unreliable data and should be used with caution. To improve on this, Favier et al. (2013) updated the database with new field measurements carried out during 1999-2012, applying the quality control proposed by Magand et al. (2007). Recently, several compilations of SMB measurements at annual resolution have been published (e.g., Mayewski et al., 2013; Altnau et al., 2015; Thomas et al., 2017; Montgomery et al., 2018). Despite the numerous field measurements in these datasets, most cover only a limited area of the AIS. In particular, these datasets miss a large number of annually resolved stake/stake farm observations, such as data from the Japanese Antarctic Research Expedition (JARE), the South Pole and Vostok. In addition, available SMB measurements derived from GPR are not, or at least not fully, included in these datasets. Furthermore, ultrasonic sounder data from automatic weather stations (AWSs) at daily or higher resolution have not been compiled until now.

In this study, our objective is to generate a comprehensive, quality-controlled SMB database for Antarctica, using all available measurements from stakes or stake networks, snow pits or ice cores, GPR, and ultrasonic sounders. The dataset includes SMB measurements at daily, annual, and multi-year resolutions, which can be applied to the validation and calibration of climate models and remote sensing products, the development of remote sensing algorithms, the examination of spatial and temporal patterns in Antarctic SMB, and the assessment of the drivers of SMB changes across multiple scales. As an example of model validation, we compare the dataset with the ERA5 reanalysis.
2 Description of the AntSMB dataset

Data collections and sources
We compile the dataset of SMB measurements over the AIS by searching the literature and public data portals (e.g., the National Snow and Ice Data Center, NSIDC; PANGAEA; and the World Data Service for Paleoclimatology, NOAA), by collecting the supplements of publications, and by asking individual data generators by email to contribute their field measurements. If two or more email requests went unanswered, we consider the data to be unavailable to the public, and they are not included in this dataset.
The new data sources in the database include GPR measurements over West Antarctic coastal zones during 2010-2017 (Dattler et al., 2019), over the Thwaites Glacier in 2009 (Medley et al., 2013), and along the route between Dome C and Vostok in 2012 (Le Meur et al., 2018) (Fig. 1). Together they cover an area of 22025 km². A large number of new stake measurements were acquired by revisiting the traverses from Zhongshan Station to Dome Argus (Ding et al., 2015), from Syowa Station to Dome Fuji (Motoyama et al., 2015), and between Progress Station and Vostok Station (Khodzher et al., 2014). In addition to a new long-term ice core SMB record at the South Pole (Winski et al., 2019), this dataset includes previously published but unreleased time series of SMB records from ice cores drilled in the Lambert Glacier Basin (Xiao et al., 2001; Li et al., 2009; Ding et al., 2017). Furthermore, an important update of annually resolved SMB data comes from the continuous stake network measurements performed at the South Pole, Vostok, and six sites on the traverse between Syowa Station and Dome Fuji. In addition, this is the first public release of the published high-resolution ultrasonic sounder observations on Berkner Island (Reijmer et al., 1999; Reijmer and Van den Broeke, 2003), in Dronning Maud Land (Van den Broeke et al., 2004), on the East Antarctic Plateau (Reijmer and Van den Broeke, 2003), and along the Chinese traverse from Zhongshan Station to Dome Argus (Liu et al., 2019), which are very useful for investigating intra-annual and seasonal cycles of SMB. The other records of the database are obtained from existing SMB data compilations, including the multi-year averaged SMB measurements by Favier et al. (2013) and Wang et al. (2016), time series of ice core records at annual resolution by Mayewski et al. (2013), Altnau et al. (2015) and Thomas et al. (2017), and SMB component measurements over the Antarctic Ice Sheet and Greenland Ice Sheet (SUMup dataset) by Montgomery et al. (2018).

Selection criteria
In order to establish a comprehensive, complete and quality-controlled AIS SMB product for a variety of scientific applications, quantitative criteria for record inclusion are designed to focus the database on high-resolution, well-dated records and to optimize spatial coverage. The criteria are as follows.
Firstly, the records must be published through peer review or be publicly available. The required duration and temporal resolution of the records vary by measurement type. We select ultrasonic sounder records with a minimum duration of one year. For annually resolved archives (ice core, stake and stake network measurements), the duration of records included in this dataset should be at least 10 years, but less than 1000 years. For the multi-year averaged observations, the included records must span more than 3 years, which is the minimum number of years for an accurate estimation of the mean local SMB with an uncertainty of less than 10% (Magand et al., 2007).
Secondly, the essential parameters for each SMB record must be provided, including location, measurement methodology, time coverage, and references to the primary data sources.
Thirdly, the different kinds of records are quality-checked to the highest degree possible before being selected into the dataset.
1) To ensure that the multi-year averaged SMB data are reliable at each site, we select data determined from anthropogenic radionuclides and volcanic horizons with errors of less than 10%, or from stake measurements spanning more than three years, as suggested by Magand et al. (2007). Records dated using both stable isotopes and chemical markers, or natural radionuclides, are reliable (Magand et al., 2007) and are thus included in the dataset. We also include the available GPR-based snow accumulation rate data, because their uncertainties can be below 5% at a firn depth of 10 m, and decrease with increasing depth, after post-processing that includes interpretation of reflectors, correct density estimates, and proper calibration with ice cores (Spikes et al., 2004; Eisen et al., 2008).
2) SMB records from annually resolved ice cores should be either cross-dated or layer-counted. Their chronology should include at least two age control points, with one near the youngest part and another near the oldest part of the time series (Stenni et al., 2017). They must also be confirmed by the data generator. Furthermore, ice core SMB records are corrected for the effects of firn density and the vertical strain rate profile (Thomas et al., 2017).
3) Preliminary quality control of the AWS snow accumulation data has been performed by the data owners by removing null measurements and physically anomalous snow accumulation data (i.e., data outside the initial and final accumulation values) (e.g., Braaten et al., 1997; 2000). Some high-frequency noise still occurs in the AWS snow accumulation data. To reduce this noise, we discard the data points lying more than one standard deviation from a running daily value, as done by Fountain et al. (2010) and Cohen and Dean (2013).
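The noise filter described in item 3) can be sketched as follows. This is only an illustration under stated assumptions, not the processing code used for the dataset: the text does not specify how the running daily value or its standard deviation is computed, so the sketch assumes a centred running mean over a window of `window` samples (24 for hourly data) and the standard deviation of the same window. The function name `despike` and its parameters are hypothetical, and the input is assumed to contain no missing values.

```python
from statistics import mean, pstdev


def despike(heights, window=24, n_sigma=1.0):
    """Flag samples lying more than n_sigma standard deviations from a running mean.

    heights: snow-height readings at a fixed sampling interval (no gaps assumed).
    window:  number of samples in the centred running window (assumption:
             24 corresponds to one day of hourly data).
    Returns a list of the same length in which rejected samples are None.
    """
    half = window // 2
    cleaned = []
    for i, h in enumerate(heights):
        lo, hi = max(0, i - half), min(len(heights), i + half + 1)
        neighbours = heights[lo:hi]        # window is truncated at the series ends
        centre = mean(neighbours)          # running "daily" value
        spread = pstdev(neighbours)        # spread within the same window
        cleaned.append(h if abs(h - centre) <= n_sigma * spread else None)
    return cleaned
```

For example, `despike([1.0] * 10 + [1.5] + [1.0] * 10, window=5)` rejects only the isolated 1.5 m reading. A one-sigma threshold is strict, so in practice it also removes some genuine accumulation steps; that trade-off is inherent in the criterion rather than in this sketch.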

Stakes
Stakes are the easiest and most traditional way to measure SMB. After a stake is placed vertically in the snow or ice, relative variations of snow surface height over a certain period can be determined by repeated measurements of the distance between the top of the stake and the surface. Changes in snow height are multiplied by snow density to yield the corresponding SMB.
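As a minimal sketch of this conversion, the function below turns two readings of the distance from the stake top to the snow surface into an SMB value. The function and parameter names are hypothetical, and a single bulk snow density is assumed for the whole accumulated layer.

```python
def stake_smb(top_to_surface_start_m, top_to_surface_end_m,
              snow_density_kg_m3, years):
    """Return SMB in kg m^-2 yr^-1 from two stake readings.

    The snow surface rises when the exposed stake length shrinks, so the
    height change is the first reading minus the second reading.
    """
    height_change_m = top_to_surface_start_m - top_to_surface_end_m
    return height_change_m * snow_density_kg_m3 / years
```

For instance, an exposed length that decreases from 1.50 m to 1.20 m over one year, with an assumed density of 350 kg m⁻³, gives an SMB of about 105 kg m⁻² yr⁻¹; a negative result indicates net ablation.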
This simple method has been widely applied over Antarctica by almost all national glacier surveys. However, in most cases, the spatial representativity of a single stake record is very limited due to large natural spatial variability and small-scale disturbance from post-depositional effects such as the interaction between the stake and the local wind. To reduce the related uncertainties, stake lines along a transect or stake farms are often used (e.g., Frezzotti et al., 2005; Kameda et al., 2008; Ding et al., 2011). In particular, these measurements are useful for investigating the spatial distribution of SMB at scales of less than a kilometer.
Because repeated measurements are required, stake observations are only performed in easily accessible regions. Due to logistic constraints in the extreme environment of Antarctica, the time span of the measurements usually ranges from 1 year to several years or more.

Snow pits/ice cores
Snow pits and ice cores are used to reconstruct SMB changes through time by determining the age and density of different layers. The dating depends on the time markers preserved in the snow pit or ice core column. Annual layers are dated by counting seasonal changes in various parameters, including the visual stratigraphy, the oxygen and hydrogen isotopic composition, major chemical ion content, hydrogen peroxide, electric conductivity, and so on. When layer counting is integrated with prominent horizons of known age from volcanic or radioactive markers, dating accuracy is greatly improved, yielding time series of annual snow accumulation. Furthermore, these reference horizons can also be used to estimate the mean SMB between horizons.

In Antarctica, annual SMB in high accumulation zones can be calculated by counting annual layers based on the seasonal variations of multiparameter records in combination with reference horizons. However, seasonal cycles can hardly be identified in regions with low accumulation of less than 100 kg m⁻² yr⁻¹, especially on the East Antarctic Plateau. Thus, reference horizons may be the most reliable dating method in low accumulation areas, although they only yield a mean SMB between two reference horizons.

GPR
GPR maps firn stratigraphy along a profile from the surface, and the radar-identified firn layers of equal age along continuous profiles provide detailed insight into SMB patterns. To calculate SMB, the isochronous layers must be well dated, which usually depends on complementary depth-age relationships from highly resolved ice core records along the radar profile.
During the past few decades, ground-based GPR has been widely used to estimate the spatial variation of recent and historical SMB over Antarctica (e.g., Frezzotti et al., 2007; Anschütz et al., 2008; Müller et al., 2010). Most recently, newly developed airborne radar systems have revolutionized SMB measurements over the Antarctic Ice Sheet (Kanagaratnam et al., 2004, 2007). It can robustly resolve the stratigraphy at the shallow (10 m [...]
Relative to point measurements such as stakes and snow pits/ice cores, the advantage of GPR observations is that they yield a more accurate representation of spatial variations of SMB. Furthermore, the radar images of deep internal horizons allow us to quantify long-term variability in SMB. The errors of GPR-based SMB observations are associated with the depth and age of the reflector, and with the extrapolation of density along the radar profile. The resulting uncertainties were estimated to be about 4% of the calculated SMB at a firn depth of 10 m, and about 0.5% at a depth of 60 m, after calibration of depth and layer thinning and robust dating (isochronal accuracy of about 1 year) (Spikes et al., 2004).
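The basic calculation behind a GPR-derived SMB value can be sketched as the firn mass above a dated reflector divided by the reflector age. The sketch below assumes a density profile given at discrete depths (for example from a nearby core), integrates it with the trapezoidal rule, and ignores the layer-thinning correction mentioned above. All function and parameter names are hypothetical.

```python
def mass_above_reflector(depths_m, densities_kg_m3, reflector_depth_m):
    """Integrate density from the surface down to the reflector (kg m^-2).

    depths_m must be increasing and start at the surface (0 m). Density is
    interpolated linearly at the reflector depth.
    """
    if not 0.0 <= reflector_depth_m <= depths_m[-1]:
        raise ValueError("reflector depth lies outside the density profile")
    mass = 0.0
    for i in range(len(depths_m) - 1):
        z0, z1 = depths_m[i], depths_m[i + 1]
        rho0, rho1 = densities_kg_m3[i], densities_kg_m3[i + 1]
        if z0 >= reflector_depth_m:
            break
        if z1 > reflector_depth_m:
            # Cut the last segment at the reflector depth.
            rho1 = rho0 + (rho1 - rho0) * (reflector_depth_m - z0) / (z1 - z0)
            z1 = reflector_depth_m
        mass += 0.5 * (rho0 + rho1) * (z1 - z0)
    return mass


def gpr_smb(depths_m, densities_kg_m3, reflector_depth_m, reflector_age_yr):
    """Mean SMB (kg m^-2 yr^-1) since the reflector was at the surface."""
    mass = mass_above_reflector(depths_m, densities_kg_m3, reflector_depth_m)
    return mass / reflector_age_yr
```

With an illustrative profile of 350, 450, 500 and 600 kg m⁻³ at 0, 5, 10 and 20 m, a reflector at 10 m dated to 25 years gives 4375 kg m⁻² / 25 yr = 175 kg m⁻² yr⁻¹.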

AWS
In Antarctica, some AWSs equipped with ultrasonic sensors measure snow surface height changes by detecting the vertical distance to the surface. Combined with density observations, snow height changes can be converted to SMB. Although data quality is occasionally poor during blowing snow or fog, this method continuously yields SMB records at high (typically hourly) temporal resolution (van den Broeke et al., 2004; Gorodetskaya et al., 2013), which can be used to identify individual accumulation/ablation events, to quantify the seasonal cycle of snow accumulation, and, coupled with other AWS observations, to calculate the surface energy balance.
As with a single ice core or stake observation, AWS measurements represent a single location, and their spatial representativity is possibly limited. In addition, after collection of the raw snow height data, a correction for the temperature dependence of the speed of sound must be applied. The uncertainty of AWS height measurements is estimated to be ±1 cm or 0.4% of the distance to the surface. This means that the measurements are not sufficient to resolve the smaller snow accumulation events that usually occur in the interior of the East Antarctic Plateau.
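The speed-of-sound correction and the quoted uncertainty can be illustrated with a short sketch. The text does not name the sensor model or the exact correction used, so the sketch assumes the form commonly applied to ultrasonic rangers whose raw reading is referenced to the speed of sound at 0 °C, namely multiplication by the square root of the ratio of air temperature to 273.15 K. It also assumes that the larger of the two quoted uncertainty terms applies. The function names are hypothetical.

```python
import math


def correct_distance(raw_distance_m, air_temp_c):
    """Temperature-correct an ultrasonic distance reading.

    Assumption: the raw value was computed with the speed of sound at 0 degC,
    and the speed of sound scales with the square root of absolute temperature.
    """
    return raw_distance_m * math.sqrt((air_temp_c + 273.15) / 273.15)


def distance_uncertainty_m(distance_m):
    """Uncertainty of +/-1 cm or 0.4% of the distance (larger one assumed)."""
    return max(0.01, 0.004 * distance_m)
```

At -30 °C a raw reading of 2.00 m corrects to roughly 1.89 m, a shift of about 11 cm, which is far larger than the ±1 cm sensor uncertainty and shows why the correction cannot be skipped on the cold plateau.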

Structure and metadata
The AntSMB dataset includes three subsets, i.e., (1) multi-year averaged SMB observations from stakes, ice cores and GPR measurements, (2) annually resolved SMB measurements from ice cores, stakes and stake networks, and (3) AWS daily snow height measurements. To facilitate data reuse, subsampling and re-analysis for scientific research, each record in the three subdatasets includes essential information, i.e., the name of the measurement site, site location, measurement method, time coverage of the measurements, and citations. Site locations include latitude, longitude and surface elevation. Latitude and longitude are given in decimal degrees relative to the WGS84 ellipsoid. As listed in the dataset's metadata, measurement techniques include firn/ice core, snow pits, stake or stake network, ultrasonic sounder and GPR. Table 1 summarizes the essential information for each measurement. The uncertainties of the measurement methods have been discussed in detail by Eisen et al. (2008).
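One way to hold the metadata listed above in code, together with the duration thresholds stated in the selection criteria, is sketched below. The class, field and subset names are hypothetical and do not reflect the actual file layout of the dataset; the inclusive year count used for the duration is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class SMBRecord:
    site_name: str
    latitude_deg: float    # decimal degrees, WGS84
    longitude_deg: float   # decimal degrees, WGS84
    elevation_m: float
    method: str            # e.g. "ice core", "stake", "GPR", "ultrasonic sounder"
    subset: str            # "multi_year", "annual" or "aws_daily" (hypothetical labels)
    start_year: int
    end_year: int
    citation: str

    @property
    def duration_years(self) -> int:
        # Assumption: first and last years are both counted.
        return self.end_year - self.start_year + 1


def meets_duration_criterion(record: SMBRecord) -> bool:
    """Apply the duration thresholds from the selection criteria."""
    d = record.duration_years
    if record.subset == "aws_daily":
        return d >= 1               # at least one year of sounder data
    if record.subset == "annual":
        return 10 <= d < 1000       # at least 10 and less than 1000 years
    if record.subset == "multi_year":
        return d > 3                # more than three years
    raise ValueError(f"unknown subset: {record.subset}")
```

Keeping the thresholds in one function makes it easy to re-filter the records for a stricter use case, such as the model comparison later in the paper, without touching the stored metadata.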
Among the three subdatasets, the multi-year mean SMB subdataset is the largest, including unique measurements by radar isochrones with a coverage of 22025 km², 2276 stake measurements, and 1303 ice core and snow pit observations.
The annually resolved SMB subdataset contains 687 time series, of which 79 come from the compilation of ice core snow accumulation by Thomas et al. (2017), and 26 from the shallow firn core records in Dronning Maud Land (DML) collected by Altnau et al. (2015). Continuous stake surface height measurements at sub-annual resolution are available for the traverse route from Syowa Station to Dome F since the 1970s (Motoyama et al., 2015). We converted these measurements to SMB for the subdataset by multiplying snow height changes by the snow density estimated from Wang et al. (2015).
AWS snow accumulation data are obtained by determining variations in the vertical distance between the sensor and the snow surface using ultrasonic height rangers. The measurements are performed at 32 sites, of which ten are located in Dronning Maud Land, seven on the Ross Ice Shelf, and four along the Chinese traverse route from Zhongshan Station to Dome A.
The comprehensive observed SMB database collects SMB field data at daily, annual and multi-year scales from the whole AIS. The spatial distribution of the records is uneven within Antarctica. AWS snow accumulation measurements cover a wide range of areas, including the coastal zones of East Antarctica (McMorrow et al., 2001) and West Antarctica (van Lipzig et al., 2004b) with high snow accumulation, the dry East Antarctic Plateau (Reijmer and Van den Broeke, 2003; van den Broeke et al., 2004), and the Ross Ice Shelf (Cohen and Dean, 2013) [...] (Figure 1b). However, large parts of the Antarctic interior with low snow accumulation remain undocumented, which is easily understood because seasonal stratigraphy in ice cores is almost unavailable in regions with accumulation of less than 100 kg m⁻² yr⁻¹ (Frezzotti et al., 2007; Frezzotti et al., 2013). Compared with the SMB compilation by Favier et al. (2013), the spatial coverage of the multi-year SMB subdataset has greatly improved, especially for West Antarctica and the Antarctic Peninsula. Despite this improvement, SMB records are still sparse for the region from the Ronne Ice Shelf via the South Pole to Dome C, and for the coastal zone of East Antarctica.

Temporal variability in the SMB records
The records in the comprehensive SMB dataset cover different time spans, ranging from a minimum of 1 year to a maximum of 1000 years. The time periods covered are closely associated with the measurement method. AWSs provide very high-resolution measurements of snow height changes, but the records generally span only a few years (1-18 years). Although a significant advantage of ice cores is that they record SMB changes over long time spans, it is difficult to perform these observations at a high spatial density. Stake farms are the easiest method of observing SMB, but continuous measurements are available only for several years to tens of years, largely due to the logistic constraints of the extreme Antarctic environment. GPR can detect the local SMB from the last tens of years to about 1000 years along continuous profiles of the snowpack. The temporal resolution of GPR measurements depends on the age estimates of the reflection horizons, and the resulting records in our dataset range from decadal to centennial.
For the annually resolved SMB subdataset, of the 183 time series from ice core and stake network measurements, 47 span from 1801 onwards (Fig. 3a). The number of time series peaks during the early 2000s, when ice cores were retrieved in Dronning Maud Land (Altnau et al., 2015) and the West AIS (Mayewski et al., 2013). Prior to 1800, the number of time series decreases greatly, with only ten extending beyond the past 500 years, and five beyond the past 1000 years (Fig. 3a). The sharp decline since the mid-2000s results from a lack of coring efforts. Annually resolved stake measurements cover the past 40 years, peaking from the mid-1990s to the early 2000s (Fig. 3b).
For the multi-year averaged SMB subdataset, 83% of the records, with the exception of radar measurements, cover less than 20 years, and 43% span less than 5 years. Figure 3c presents the distribution of years in which these records were measured from 1950 onwards. The distribution of the measurements is relatively even until the 1990s, when the number of samples increases. The temporal coverage of radar observations ranges from 25 years to 185 years.

Inter-comparison of the different types of SMB measurements
The dataset compiles different types of SMB measurements, including ice cores/snow pits, stakes, ultrasonic sensors and GPR. It is critical to investigate whether the resulting data show systematic discrepancies due to the distinct measurement methods. In particular, the measurements by ice core, stake, and ultrasonic sensor are performed at the centimeter scale, whereas GPR samples at the meter scale. Despite the scale difference, GPR measurements averaged over nearly 100 years agree well with 5-year averaged single stake measurements at the corresponding locations along the transect near Talos Dome, with differences of around 10% (Frezzotti et al., 2007). Given that no existing observed SMB dataset can be used as an independent reference for the different types of Antarctic multi-year averaged SMB observations, we compare SMB determined by different methods at the same or nearby locations, as presented in Figure 4. These locations are mainly distributed near Talos Dome, along a transect from Terra Nova Bay to Dome C, in western Dronning Maud Land, and at Dome F and Dome A. Despite the different averaging periods, the methods match each other reasonably well, with the largest discrepancy being less than 20%, which is consistent with previous similar inter-comparisons (e.g., Vaughan et al., 2004; Frezzotti et al., 2005; Anschütz et al., 2007).
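A comparison of this kind can be sketched by pairing sites from two methods that lie within a search radius and computing their percentage difference. The text gives neither the search radius nor the exact definition of the percentage difference, so both are assumptions here: the radius `max_km` is a free parameter and the difference is taken relative to the mean of the two values. All names are hypothetical.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def percent_difference(smb_a, smb_b):
    """Absolute difference relative to the mean of the two values (%)."""
    return 100.0 * abs(smb_a - smb_b) / (0.5 * (smb_a + smb_b))


def compare_nearby(sites_a, sites_b, max_km=10.0):
    """Pair sites of two methods within max_km of each other.

    sites_a, sites_b: lists of (lat, lon, smb) tuples.
    Returns a list of (distance_km, percent_difference) tuples.
    """
    pairs = []
    for lat_a, lon_a, smb_a in sites_a:
        for lat_b, lon_b, smb_b in sites_b:
            d = great_circle_km(lat_a, lon_a, lat_b, lon_b)
            if d <= max_km:
                pairs.append((d, percent_difference(smb_a, smb_b)))
    return pairs
```

For two co-located values of 80 and 88 kg m⁻² yr⁻¹, `percent_difference` returns about 9.5%, which is the order of agreement reported above for the Talos Dome transect.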

Comparison with the previous AIS SMB observation datasets
Here, we present an unprecedentedly comprehensive compilation of SMB observations at daily, annual and multi-year scales. For the compilation of the multi-year averaged ground-based SMB observations, including stake/stake farm, snow pit/ice core, and GPR, we apply the same quality control criteria as used in the compilation by Favier et al. (2013) [...] Berkner Island from the European Project for Ice Coring in Antarctica (EPICA, Oerter, 2008a-l). The Antarctica 2k database constructed by Thomas et al. (2017) included 80 ice core records spanning at least 30 years, while shorter records and other ground-based measurements were omitted. In contrast, our AntSMB dataset focuses on the collection of annually resolved snow accumulation records from different kinds of measurements covering the whole ice sheet. As a result, this dataset contains 175 annually resolved ice core snow accumulation records, 8 stake network measurements covering at least 10 years, and 512 time series of continuous stake measurements spanning more than 18 years.
Previous SMB compilations centered on glaciological observations on annual and longer timescales (e.g., Vaughan et al., 1999; Frezzotti et al., 2013; Thomas et al., 2017), which are useful for examining trends and large-scale variability of AIS snow accumulation. Nevertheless, they do not shed light on SMB changes at much shorter timescales, such as the synoptic scale and individual accumulation events. AWSs provide high-resolution (typically hourly) snow accumulation measurements, which, relative to other methods such as snow pits, ice cores, and stakes, are an advantage for quantifying the seasonal cycle of SMB and for examining the synoptic sources of individual accumulation events. Snow accumulation data from individual or several AWSs in different sectors of Antarctica have been published by previous studies (e.g., Reijmer and Van den Broeke, 2003; Thiery et al., 2012; Cohen and Dean, 2013; Thomas et al., 2015). However, these data have not been systematically compiled until now. Our dataset is the first attempt to collect all AWS snow accumulation measurements in Antarctica.

ERA5 output
Reanalysis assimilates a large number of observations into a numerical model to generate a spatially and temporally complete state of the atmosphere. Because the main assimilated data are atmospheric and oceanic measurements, reanalysis outputs are not entirely subject to the density of surface observations, and thus have the potential to provide important information over regions with few or even no surface observations. Recent studies have revealed that the European Centre for Medium Range Weather Forecasts (ECMWF) interim reanalysis (ERA-Interim) is likely to be the best, or among the best, reanalysis datasets for representing inter-annual variability in Antarctic precipitation (e.g., Bromwich et al., 2011; Wang et al., 2016).
ERA5 is the fifth generation ECMWF reanalysis product, produced by the Integrated Forecasting System (IFS) Cy41r2, which was operational in 2016 (Hersbach et al., 2020). Compared with ERA-Interim (~80 km and 60 pressure levels), a major advantage of ERA5 is its much higher horizontal and vertical resolution (~31 km and 137 pressure levels, respectively) and its higher output frequency (hourly). Furthermore, IFS Cy41r2 includes a more advanced 4DVar assimilation scheme together with an uncertainty estimate, and many more observations are assimilated. Detailed improvements can be found in Hersbach et al. (2020). This reanalysis dataset has replaced ERA-Interim, updates of which stopped in August 2019. Here, our main objective is to determine whether ERA5 provides a good representation of SMB compared with the AntSMB observational dataset. Despite the recent release of ERA5 data extending back to 1950, we only use the outputs for the 1979-2018 period, due to the spurious shift in reanalysis outputs in 1979, largely caused by changes in the amount of assimilated observations (e.g., Zhang et al., 2018; Huai et al., 2019; Wang et al., 2020).

Multi-year averaged SMB observations
Given that the output of climate models centers on climate information since 1979 in Antarctica, it is necessary to define a special dataset for the model comparison. To match the period covered by the models, we only retain observations starting from 1950 onwards in the multi-year averaged SMB subdataset. In particular, we discard observations starting for the 1950-

Annually resolved SMB observations
To estimate the temporal performance of ERA5 for snow accumulation, we use the records from the annually resolved SMB subdataset covering at least 10 years starting from 1979. This results in 159 time series of annually resolved SMB. The representativeness of SMB measurements at a single site for a region is affected by local noise from the interaction between wind and the local snow surface, especially in regions with accumulation rates of less than 120 kg m⁻² yr⁻¹ (e.g., Frezzotti et al., 2005, 2007; Ding et al., 2011). This is confirmed on the DML plateau, where ERA5-simulated annual SMB series at individual sites are highly correlated with each other (r>0.70), whereas the SMB time series from different ice cores are poorly correlated, even when they come from the same drilling site. As a result, the relationships between ice core records and the corresponding ERA5 simulations at the drilling location are variable, including significantly negative, significantly positive and insignificant correlations (Fig. 5a). Varied linear relationships between the simulated and observed time series are also found over Berkner Island and the Ronne Ice Shelf, where the density of cores is high, with r values ranging from -0.35 to 0.67. In these two areas, the differences in the standard deviation of annual SMB among individual ice cores are large, even for records from the same location (Fig. 5c). At the South Pole, ERA5 shows a significant correlation (r=0.68, p<0.05) with stake farm measurements, but not with the individual ice core records (Fig. 5a). The most likely explanation is that SMB derived from stake networks is less noisy, because averaging over many stakes removes small-scale spatial variability. At most sites, the standard deviation of annual ice core records is larger than that of the corresponding ERA5 simulations for their overlapping periods (Fig. 5d).
To reduce local noise and better assess the performance of ERA5, we first average the individual observation records in the same grid cell, and then stack the averaged time series within the same geographic region. If there are both ice core records and stake farm observations at the same location, the stake farm measurements are used. Because sites at the top of ice domes likely have minor local noise (Monaghan et al., 2006a), the four time series of ice core records from the ice domes are retained in the estimate. Following Frezzotti et al. (2007, 2013), a single ice core site with accumulation of more than 700 kg m⁻² yr⁻¹ allows the determination of annual SMB at ±10% accuracy, which corresponds to the accuracy of the instrumental measurement, and hence the corresponding ice core records are retained. After compositing and filtering, 48 locations or regions with annually resolved SMB are left for comparison with ERA5 simulations.
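The two-step compositing described above can be sketched as follows. This is an illustration under assumptions, not the procedure used in the paper: grid cells are approximated by rounding coordinates to a regular 0.25° grid, stacking is taken to be a simple average of the cell-mean series for each year, and the region label is supplied by the caller. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def grid_cell(lat, lon, res_deg=0.25):
    """Snap a location to the nearest node of a regular grid (assumed 0.25 deg)."""
    return (round(lat / res_deg) * res_deg, round(lon / res_deg) * res_deg)


def average_by_key(keyed_series):
    """Average annual series that share a key.

    keyed_series: iterable of (key, {year: smb}) pairs.
    Returns {key: {year: mean smb over all series with that key}}.
    """
    pooled = defaultdict(lambda: defaultdict(list))
    for key, series in keyed_series:
        for year, value in series.items():
            pooled[key][year].append(value)
    return {key: {year: mean(vals) for year, vals in sorted(years.items())}
            for key, years in pooled.items()}


def composite(records, region_of):
    """records: list of ((lat, lon), {year: smb}); region_of: cell -> region name."""
    # Step 1: average all records falling in the same grid cell.
    cells = average_by_key((grid_cell(lat, lon), s) for (lat, lon), s in records)
    # Step 2: stack the cell averages belonging to the same region.
    return average_by_key((region_of(cell), s) for cell, s in cells.items())
```

Averaging within cells first prevents a cluster of cores at one drilling site from dominating the regional stack, which is the main reason for doing the compositing in two steps.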

Spatial performance of ERA5 output
A comparison of the density distribution of ERA5 precipitation minus evaporation (P-E) with the filtered multi-year averaged SMB observations reveals that, over the West AIS, the multi-year averaged dataset is representative of the high accumulation zone, but not of the bins with accumulation rates of 100-300 kg m⁻² yr⁻¹ (Fig. 6a). Over the East AIS, however, the dataset represents the entire P-E spectrum of the model (Fig. 6b). As shown in Fig. 6c and d, the dataset also samples the elevation distribution of SMB well for both the West AIS and East AIS, especially between 200 and 1000 m elevation, which was not correctly sampled by the SMB observation dataset compiled by Favier et al. (2013).
ERA5 reveals large spatial gradients of snow accumulation over the AIS (Fig. 7a), with values higher than 1000 kg m⁻² yr⁻¹ at the margins, and lower values (less than 30 kg m⁻² yr⁻¹) in the hinterland of the East Antarctic Plateau. There is a very high correlation between ERA5 output and the observed SMB (R² = 0.93, p < 0.01, calculated on the logarithm of SMB values because of the lognormal SMB distribution). The major spatial pattern of the ERA5 simulations is in good agreement with the multi-year observations (Fig. 7a). Dry biases occur at most sites of the inland East AIS and the Ross Ice Shelf, and around Byrd station, whereas wet biases occur at the East AIS margins and at some sites in West AIS coastal regions (Fig. 7b). The mean bias amounts to 6.6% of the average observed SMB, which is slightly higher than that of regional climate models (MAR and RACMO2.3p2) (Agosta et al., 2019; Van Wessem et al., 2018). ERA5 robustly captures the sharp decrease in SMB with elevation (Fig. 7c). Compared with observations in each 200 m elevation bin, ERA5 is slightly wet below 1600 m elevation, whereas dry biases occur in inland Antarctica at elevations above 3000 m.
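The two summary statistics quoted above can be sketched in a few lines. The sketch assumes paired, strictly positive model and observed values at the same sites, takes R² as the squared Pearson correlation of the log-transformed values, and expresses the mean bias as a percentage of the mean observed SMB. The function names are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def log_r_squared(model, obs):
    """R^2 computed on log-transformed SMB (values must be positive)."""
    r = pearson_r([math.log(m) for m in model], [math.log(o) for o in obs])
    return r ** 2


def relative_mean_bias_percent(model, obs):
    """Mean model-minus-observation bias as a percentage of mean observed SMB."""
    return 100.0 * (mean(model) - mean(obs)) / mean(obs)
```

Working in log space keeps the few very high coastal values from dominating the correlation. Sites with zero or negative SMB, such as ablation areas, cannot be log-transformed and would have to be handled separately.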

Temporal performance of ERA5 output
A recent study showed that ERA5 presents relatively good skill in representing snow accumulation changes on the synoptic timescale, as observed at the AWSs over the Ross Ice Shelf and along the traverse route from Zhongshan Station to Dome A, with 56%-88% of extreme snowfall events captured (Liu et al., 2019). Given that these AWS observations are included in our AntSMB dataset, to avoid repetition, here we compare cumulative daily snowfall from ERA5 with the corresponding accumulation records from 11 AWSs over the DML (Fig. 8). Gaps occur in the AWS records at most stations because of sensor or data transmission problems. The daily cumulative AWS records show decreases in snow accumulation, which reflect the important role of drifting snow, compaction, sublimation or even ablation in accumulation changes. Despite the noise from these post-depositional processes, stepped increases are observed for both ERA5
To further assess the temporal performance of ERA5, we use the continuous time series of stake measurements along the JARE traverse route from Syowa station to Dome F. These stake measurements are divided into four subgroups, as done for this traverse route by Wang et al. (2015). Stake measurements in each subgroup are stacked, and then compared with the composites of ERA5 simulations for the respective subgroup (Fig. 9). ERA5 overestimates the observed SMB in the coastal and katabatic regions, but underestimates it in the inland plateau region. The modeled records match observations particularly well in the coastal, higher katabatic and inland plateau regions, with r² values of >0.5. Observed SMB at the lower katabatic region is simulated well by the reanalysis dataset.
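The subgroup comparison above amounts to correlating two annual series over their common years and checking the sign of the mean difference. A minimal sketch follows, assuming each stacked series is held as a `{year: smb}` mapping; the function name and data layout are hypothetical.

```python
def compare_series(obs, model):
    """Compare two annual series over their overlapping years.

    obs, model: {year: smb} mappings (hypothetical layout).
    Returns (r2, bias), where r2 is the squared Pearson correlation and
    bias is the mean model-minus-observation difference.
    """
    years = sorted(set(obs) & set(model))
    x = [obs[y] for y in years]
    m = [model[y] for y in years]
    n = len(years)
    mean_x, mean_m = sum(x) / n, sum(m) / n
    sxy = sum((a - mean_x) * (b - mean_m) for a, b in zip(x, m))
    sxx = sum((a - mean_x) ** 2 for a in x)
    smm = sum((b - mean_m) ** 2 for b in m)
    return sxy ** 2 / (sxx * smm), mean_m - mean_x
```

A positive bias in this sketch corresponds to the model overestimating the observed SMB, and a negative bias to underestimation.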
Overall, ERA5 reproduces the interannual variability in observed SMB acceptably at most sites over the AIS, which indicates that much of the atmospheric circulation is represented by this reanalysis product. Nevertheless, its performance is limited at some sites in the Lambert Basin, inland West Antarctica, and parts of the East Antarctic coast. This may result from processes unresolved in ERA5, such as drifting snow, and from its limited performance for the storm frequency related to synoptic-scale circulation and for sublimation changes caused by circulation variations. Detailed interpretation of the uncertainty of ERA5 is beyond the scope of this study.

Data availability
The comprehensive SMB observation dataset is available through the Big Earth Data Platform for Three Poles and can be downloaded from https://doi.org/10.11888/Glacio.tpdc.271148 (Wang et al., 2021). In this repository, the three subdatasets that make up the dataset are provided in Excel spreadsheet format together with metadata files.

Discussion and conclusions
The dataset provides an unprecedentedly comprehensive compilation of SMB observations, with better spatial coverage than previous studies. In particular, our compilation greatly improves the spatial density of measurements at elevations of 200-1000 m, which are not correctly sampled by the dataset of Favier et al. (2013). However, there is a clear need to increase the spatial density of annually resolved SMB measurements over the inland East AIS, and of daily SMB observations over West Antarctica and the 90°-170°E sector of East Antarctica.
This dataset can be used to estimate the temporal and spatial changes in the AIS SMB. A temporally homogeneous climatology of SMB for the second half of the 20th century may be obtained by temporal rescaling of the multi-year averaged SMB subdataset against ERA5 outputs, as done by Medley et al. (2019) and Wang et al. (2019). The available syntheses of time series from the annually resolved SMB subdataset will allow investigation of regional snow accumulation changes during the past several decades or centuries (Kaspari et al., 2004; Frezzotti et al., 2013; Altnau et al., 2015; Thomas et al., 2017). Combining the annual SMB subdataset with reanalysis products or the outputs of regional climate models can generate gridded datasets that better constrain the temporal and spatial variability of the AIS SMB at different scales (Monaghan et al., 2006b; Medley et al., 2019; Wang et al., 2019). The availability of AWS snow height measurements will allow insights into synoptic and seasonal patterns of SMB, which are vital for model estimation and ice core dating studies.
In the current study, we have compared the observations with ERA5 output. Despite discrepancies in magnitude, ERA5 represents the spatial variations of the SMB observations well, and captures a large proportion of the inter-annual variability.
Similarly, this dataset can be used to evaluate the quality of other atmospheric reanalyses and of regional or global climate models, such as JRA-55, MERRA-2, RACMO2.3, MAR and CESM. Moreover, the dataset includes a high spatial density of stake and GPR measurements along several transects from the coast to the interior, which correctly sample the actual distribution of SMB and thus provide stringent constraints on the models in these specific regions.
Annually resolved SMB observations in the database are also likely to be used as an important input to data assimilation for paleoclimate reconstructions (Dalaiden et al., 2020). The dataset is of vital importance for the improvement of remote sensing algorithms for Antarctic snow accumulation/snowfall rate, such as the CloudSat 2C-SNOW-PROFILE product (Palerme et al., 2014; Behrangi et al., 2016).
The scientific community is expected to apply this dataset in Antarctic hydrological studies, model-data inter-comparison and the development of remote sensing algorithms. The cryospheric community is also encouraged to share further SMB observation data so that this dataset can be updated in the future.