SCDNA: a serially complete precipitation and temperature dataset for North America from 1979 to 2018

Station-based serially complete datasets (SCDs) of precipitation and temperature observations are important for hydrometeorological studies. Motivated by the lack of serially complete station observations for North America, this study seeks to develop an SCD from 1979 to 2018 from station data. The new SCD for North America (SCDNA) includes daily precipitation, minimum temperature (Tmin), and maximum temperature (Tmax) data for 27 276 stations. Raw meteorological station data were obtained from the Global Historical Climate Network Daily (GHCN-D), the Global Surface Summary of the Day (GSOD), Environment and Climate Change Canada (ECCC), and a compiled station database in Mexico. Stations with at least 8-year-long records were selected, which underwent location correction and were subjected to strict quality control. Outputs from three reanalysis products (ERA5, JRA-55, and MERRA-2) provided auxiliary information to estimate station records. Infilling during the observation period and reconstruction beyond the observation period were accomplished by combining estimates from 16 strategies (variants of quantile mapping, spatial interpolation, and machine learning). A sensitivity experiment was conducted by assuming that 30 % of observations from stations were missing – this enabled independent validation and provided a reference for reconstruction. Quantile mapping and mean value corrections were applied to the final estimates. The median Kling–Gupta efficiency (KGE) values of the final SCDNA for all stations are 0.90, 0.98, and 0.99 for precipitation, Tmin, and Tmax, respectively. The SCDNA is closer to station observations than the four benchmark gridded products and can be used in applications that require either quality-controlled meteorological station observations or reconstructed long-term estimates for analysis and modeling. The dataset is available at https://doi.org/10.5281/zenodo.3735533 (Tang et al., 2020). Published by Copernicus Publications. 2382 G. Tang et al.: SCDNA

1. Self-contained infilling only uses records from the target station to estimate its own missing values. Typical methods include interpolation based on data from the previous and subsequent days or replacing missing values by the long-term mean (Kemp et al., 1983;Pappas et al., 2014). Self-contained infilling, however, only performs well for variables with high temporal autocorrelation such as temperature and is problematic for daily precipitation (Simolo et al., 2010;Teegavarapu and Chandramouli, 2005) and in covering lengthy data gaps.
2. Spatial interpolation uses neighboring stations (identified based on spatial distance or statistical similarity) to estimate data at the target station. Spatial interpolation methods can be divided into two types: the first uses information only from neighboring stations, and common methods include linear interpolation and inverse distance weighting (IDW; Shepard, 1968). The second method needs information from both neighboring and target stations. Typical examples include the revised normal ratio (NR; Young, 1992) and the single best estimator (Eischeid et al., 1995(Eischeid et al., , 2000, both of which use correlation coefficients (CCs) between target and neighboring stations to estimate merging weights. This second type of spatial interpolation also includes more sophisticated methods (e.g., multiple linear regression, optimal interpolation, and kriging) that build a functional relationship between neighboring and target stations (Simolo et al., 2010). Previous studies have shown that multiple linear regression based on the least absolute deviation (MLAD) criteria performs better than many interpolation methods such as IDW, NR, and optimal interpolation in infilling/reconstruction (Eischeid et al., 2000;Kanda et al., 2018).
3. Quantile mapping (QM) is widely used to correct biases in meteorological data (Maraun, 2013;Cannon et al., 2015), and it performs well in estimating missing station data (Simolo et al., 2010;Newman et al., 2015Newman et al., , 2019Devi et al., 2019). In QM-based estimations, the cumulative distribution functions (CDFs) of observations from neighboring and target stations are derived, and the record at the target station is estimated as the inverse of its CDF using concurrent CDF probability information from neighboring stations. QM can avoid the problem of overestimating wet days in precipitation series and preserve the frequency distribution of time series, which is useful for estimating extreme events (Cannon et al., 2015).
4. Machine learning techniques have been successfully applied to infill station record gaps (Dastorani et al., 2010;Wambua et al., 2016). For example, Coulibaly and Evora (2007) estimated missing daily precipitation and temperature in northeastern Canada using six types of artificial neural networks (ANNs). Ustaoglu et al. (2008) estimated daily temperature using three ANN methods in Geyve and the Sakarya basin, Turkey. Gene expression programming was applied in the estimation of missing monthly rainfall data in Malaysia (Che Ghani et al., 2014). Sattari et al. (2017) recommended that a decision tree algorithm can be used to estimate monthly precipitation due to its simplicity and high accuracy. Serrano-Notivoli et al. (2019) applied the k nearest neighbors regression to reconstruct minimum temperature (T min ) and maximum temperature (T max ) observations in Spain to form a gridded dataset.
Previous SCDs have been developed using multiple infilling and reconstruction methods. For instance, Eischeid et al. (2000)  There is currently no SCD for North America; this means that researchers often must collect station data from different databases, which is time consuming and may cause inconsistencies between studies based on different methods.
Responding to this need, we develop a retrospective 40year daily SCD for North America (SCDNA) of precipitation, T min , and T max from 1979 to 2018. Central America and the Caribbean are also covered by SCDNA. The three variables are selected because (1) most stations measure precipitation and temperature, while other variables, such as humidity and wind speed, are measured at fewer stations, and (2) precipitation and temperature data are fundamental inputs for hydrological modeling. Station observations are collected from four global and regional databases and undergo strict quality control to eliminate dubious records. Since the performance of infilling and reconstruction methods differs in space and time, the results from 16 strategies are merged to produce a single deterministic estimate. Finally, the SCDNA is compared to four gridded products to demonstrate its performance and areas for improvement. The SCDNA is expected to have a wide variety of applications in North America, and the methodology can be used to produce SCDs in other regions of the world.

Meteorological station data
This study uses precipitation, T min , and T max station data from four databases: the Global Historical Climate Network Daily (GHCN-D; https: //www.ncdc.noaa.gov/ghcnd-data-access, last access: 18 October 2019; Menne et al., 2012), the Global Surface Summary of the Day (GSOD; https://catalog.data. gov/dataset/global-surface-summary-of-the-day-gsod, last access: 15 October 2019), Environment and Climate Change Canada (ECCC; https://climate.weather.gc.ca/ historical_data/search_historic_data_e.html, last access: 22 December 2019), and the Mexico database from Servicio Meteorológico Nacional under the Comisión Nacional del Agua (Livneh et al., 2015). This study uses daily precipitation totals from each dataset. Only stations with at least 8-year-long precipitation or T min and T max records between 1979 to 2018 are utilized. The requirement for minimum recording length is different among studies (e.g., Eischeid et al., 2000;Newman et al., 2015). We adopted a relatively short time limitation because (1) 8-year-long records are sufficient for providing basic support for missing value estimation ( Fig. S1 in the Supplement) and (2) the open-access dataset and codes enable users to design customized data selection criteria according to their research requirements.
The numbers of stations with at least 8-year-long records are 33 026, 4619, 3634, and 4049 for GHCN-D, GSOD, ECCC, and the Mexico database, respectively (Table 1). Their spatial distributions are shown in Fig. S2 in the Supplement. GHCN-D has compiled a large amount of data from many sources including the Mexico database and ECCC. For identical stations from different sources, we keep the one with the longer observation history, resulting in the exclusion of ∼ 95 % of stations from the Mexico database and the adoption of ∼ 91 % of the stations from ECCC. Stations with more than 30 % missing values in the observation period are excluded because they could be seasonal stations or suffer serious instrumentation problems. Stations overlapping in space (same latitude and longitude) and without sufficient metadata for discrimination are merged (see Sect. 3.2). The above screening reduces the available stations from 45 328 to 31 772 (Table 1), yet more stations are discarded due to quality control procedures (Sect. 3.1). The final SCDNA includes 24 615 precipitation, 19 604 T min , and 19 611 T max stations; note that the numbers of T min and T max stations differ as quality controls can result in excluding the one and reserving the other in some stations.
Most stations are located in the contiguous United States (CONUS), southern Canada, and Mexico, while few stations are located in high-latitude regions such as the Arctic Archipelago (Fig. 1b and c). The spatial distributions of precipitation and temperature stations are similar except in eastern CONUS where precipitation stations have a higher density.
In North America, more station observations occur in the US than in Canada and Mexico (Fig. 2). The number of samples in the US increases from 1979 to 2018, and there are more precipitation samples than temperature samples. For Canada, the numbers of precipitation and temperature samples are similar and show a decrease from 1988 to 2018; the sample number in 2018 is only 61.76 % of that in 1988. Mexico has more meteorological samples than Canada, yet this number decreases after 1983. The decreasing trend is especially sharp after 2012, which may be due to the delay in data collection or termination of some stations. Figure 3 shows the fractions of missing values for all stations during the observation period (referred to as ratio-1) and during the entire period from 1979 to 2018 (referred to as ratio-2). For temperature, ∼ 20 % of the stations have more than 20 % missing values in the observation period (ratio-1), and ∼ 20 % of the stations have more than 70 % missing values in the entire period (ratio-2). For precipitation, the fraction of missing values is larger. The fractions show strong spatial variations (Fig. S3 in the Supplement). Ratio-2 is smaller for precipitation stations in the western US and temperature stations in central US but is larger in Canada and Alaska. Most stations in Mexico have a higher ratio-1 than other regions in North America, indicating that those stations have notable fractions of missing values during the observation period.
In summary, the curves of ratio-1 indicate that a small number of missing values need infilling during the observa-  The densities of stations at the 0.5 • × 0.5 • resolution for precipitation and temperature, respectively. T min and T max stations are highly consistent, and thus T min is used to represent temperature in (c). The nested black boxes show examples of DEM and station densities. tion period, while the curves of ratio-2 indicate that extensive reconstruction is needed over the entire period.
Many types of precipitation and temperature measurement instruments are used at stations from different sources. For example, the Type B rain gauge has been used by Environment Canada since the 1970s for most weather stations (Devine and Mekis, 2008;Wang et al., 2017), while tipping bucket and weighing rain gauges are also used in some stations (Metcalfe et al., 1997). Nipher-shielded snow gauges have been used by some synoptic stations, while ruler measurements are still used by more stations (Mekis and Brown, 2010). Station data in the US are from many organizations or programs with different instrument configura-tions. For instance, the standard rain gauge is used by the Cooperative Observer Program, while snow telemetry uses storage-type gauges or tipping buckets. A better understanding of instrument specifications and historical changes is important for climate studies (Pielke et al., 2007;Whitfield, 2014;Ma et al., 2019). A detailed summary of station instruments is provided in the documentation of the dataset (https://doi.org/10.5281/zenodo.3735533).

Reanalysis products
We use reanalysis precipitation, T min , and T max from the fifth generation of European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalyses of the global   Table 2). The three products are chosen be-cause they are representative products from different international organizations, and they or their predecessors (ERA-Interim, JRA-25, and MERRA) have been widely used by researchers. ERA5 and JRA-55 do not provide daily outputs; thus, daily precipitation is accumulated from sub-daily estimates, while daily T min and T max are estimated by the sub-daily minimum and maximum temperature values. Gridded reanalysis precipitation is linearly interpolated to match point-scale station data, and T min and T max are downscaled using temperature lapse rate (TLR; see Sect. 3.1).

Auxiliary data
The Multi-Error-Removed Improved-Terrain digital elevation model (MERIT DEM) at a 3 s (∼ 90 m at the Equator) resolution (Yamazaki et al., 2017) is used in this study. To enable temperature downscaling, the high-resolution DEM is spatially averaged to the original resolutions of ERA5, MERRA-2, and JRA-55 ( Table 2). The MERIT DEM data may be slightly different than the DEM data used in the three reanalysis products, and this will have a limited impact on missing data estimation (Sect. 3.3.2).
The Multi-Source Weighted-Ensemble Precipitation (MSWEP) V2.2 dataset (Beck et al., 2017(Beck et al., , 2019 is utilized for the comparison with the SCDNA developed by this study. MSWEP merges data from ground observations, satellite products, and reanalysis models and performs better than all products used for merging (Beck et al., 2019). The comparison can show whether the SCDNA is a better choice than MSWEP to fill in gaps in station precipitation observations.

Methodology
The methodology to produce the SCDNA includes three primary steps ( Fig. 4): (1) preparing a unified precipitation and temperature database from multiple sources (Sects. 2.1 and 3.1); (2) downscaling reanalysis estimates (Sects. 2.2 and 3.2) that are used in QM-based and machine-learningbased data estimation (Sect. 3.3) and in comparison with the SCDNA (Sect. 4.5); and (3) producing the SCDNA from 1979 to 2018 based on 16 strategies (Sect. 3.3). The following subsections summarize the work in each step of the methodology (Sect. 3.1, 3.2, and 3.3), as well as the approach used to evaluate the performance of the method (Sect. 3.4).
In this study, infilling refers to the estimation of missing values during the observation period, while reconstruction refers to estimating values outside of the observation period when no station record is available (Fig. 5). Station records that fail quality control are treated as missing values.  Stations are merged if their latitude and longitude match other stations. The problems of overlapping locations are caused by the identification alteration of one station for different periods, recording/rounding bias of station location information, inconsistent naming rules of different sources, and other factors. Although it is possible that multiple stations are deployed in the same location for experimental aims, location merging is done to preserve internal consis-tencies as inconsistent records at the same location are selfcontradictory.
The method for location merging includes several steps. First, overlapping stations are extracted and grouped. Stations within the same group that have nonoverlapping recording periods are simply merged into one time series. Otherwise, the Spearman's rank CC (SCC) between precipitation series from all station pairs in the group is calculated. For SCC < 0.7, the station group is discarded due to large discrepancies; for 0.7 < SCC < 0.9, the discrepancy is considered tolerable, and the station with the longest record is kept; for SCC > 0.9, stations are considered highly correlated, and their data are merged into one time series, while for overlapping periods, the station with the longest record is used.
Overall, 1240 stations are involved in location merging, stratified in 586 station groups. Around 10 % of the groups contain more than two stations, and the largest group contains five stations. After location merging, only 207 groups are kept and merged into unified times series (Table 1). Despite the steps taken above, the merged series could contain inhomogeneities due to the combination of records from multiple stations.

Quality control
To ensure station observations undergo strict and comprehensive quality control, we adopted the methods used to produce previous station-based datasets. For T min and T max , we followed the method designed by Durre et al. (2010) which is adopted by GHCN-D . The procedures include five types of checks: integrity checks, outlier checks, internal and temporal consistency checks, spatial consistency checks, and extreme mega consistency checks. A few of the procedures in Durre et al. (2010) require other variables such as snowfall and thus are not adopted in this study. In addition, the quality flags in this study are partly different from those of GHCN-D because of the different sources, numbers, and temporal periods of stations.
For precipitation, quality control procedures consist of three parts. The first part is similar to that for temperature. The second part (four types of checks) follows procedures designed by Hamada et al. (2011) which have been adopted by the Asian Precipitation -Highly-Resolved Observational Data Integration Towards Evaluation (APHRODITE; Yatagai et al., 2012). The third part (two types of checks) adopts strategies by Beck et al. (2019) used in the production of MSWEP. Note that although Durre et al. (2010) and Hamada et al. (2011) share some common traits for precipitation, both of them are adopted to ensure quality control reliability. The details of quality checks are in Appendix B.

Downscale reanalysis data
The reanalysis temperature estimates are downscaled to match point-scale station observations using temperature lapse rate (TLR) according to the following: where T R is 2 m reanalysis air temperature, T s is downscaled temperature, h is the height difference between station elevation and reanalysis grid elevation. TLR shows notable spatiotemporal variations (Minder et al., 2010), and estimating TLR based on ground observations over a large domain is difficult due to the sparsity of stations. Yet recent studies show that reanalysis outputs offer an alternative to estimating gridded TLR (e.g., Gao et al., 2012). The gradient of air temperature at different pressure levels above the ground can be used to approximate near-surface TLR (Gao et al., 2012(Gao et al., , 2018Gruber, 2012). Tang et al. (2018b) compared eight temperature downscaling methods in CONUS and found that methods based on reanalysis-derived TLR can achieve higher accuracy compared to fixed TLR (e.g., −6.5 • C km −1 ) or statistical interpolation downscaling methods. Hence, this study uses the linear regression slope between MERRA-2 air temperature and geopotential heights from 300 to 1000 hPa pressure levels to represent TLR for each month at a resolution of 0.5 • × 0.625 • ( Table 2). MERRA-2 is used because it directly provides monthly data and masks temperature data if the pressure level is below the land surface. The choice of pressure levels needs further investigation because relationships between vertical and near-surface temperature vary with regions. Complicated TLR phenomena such as inverse lapse rate are not considered for simplicity. The climatological mean of TLR ( Fig. S4 in the Supplement) decreases from −4.8 • C km −1 in the northeast continent (i.e., Canadian Arctic Archipelago) to −7.2 • C km −1 in the southwest continent (i.e., Rocky Mountains in CONUS). The smaller TLR magnitude at high latitudes is consistent with previous studies (e.g., Gardner et al., 2009;Marshall et al., 2007).
We generate estimates for every station and every day from 1979 to 2018 (Fig. 5). The estimates from these 16 strategies and the SCDNA are evaluated using station observations, and the performance of the SCDNA is compared to four benchmark gridded products. Then, the estimates of the SCDNA are corrected for further accuracy improvement. Finally, estimates are replaced by station observations when observations exist and pass quality control checks. The variance and spatial correlation analyses are performed to compare the statistical properties of station observations and estimates (see Sect. 4).
Only stations with at least 3000 valid values are included in the infilling and reconstruction effort. The nine steps (termed Step-1 to Step-9) of SCDNA production are described below. Unless otherwise stated, the steps are implemented for each target station (s), each variable (precipita- Figure 5. Diagram of the infilling and reconstruction for a specific station (referred to as A). The entire period from 1979 to 2018 is divided into the observation period and the reconstruction period. The data flows of variance and spatial correlation analyses are shown in the nested yellow boxes. Station B is a nearby station to A. tion, T min , and T max ), and each day of the year (DOY, i.e., 1-366).

Data extraction
-Step-1. Spatiotemporally concurrent reanalysis estimates (ERA5, JRA-55, and MERRA-2) are extracted, including precipitation, T min , T max , and TLR. Precipitation is linearly interpolated from gridded reanalysis estimates, and temperature is downscaled (i.e., corrected for the elevation difference between the reanalysis grid cell and the station elevation) based on TLR (Sect. 3.1). - Step-2. Neighboring stations (at least 1 and at most 30) with at least an 8-year overlapping period with station s are found within the search radius of 200 km. These stations are ranked from closest to farthest according to their CC with the target station. The SCC is used for precipitation, and Pearson CC (PCC) is used for T min and T max . CC is calculated using data within a 31 d window centered around the current DOY from all years. - Step-3. The empirical CDFs of s, neighboring stations, and reanalysis estimates are obtained using data within the same 31 d window.

Infilling and reconstruction
-Step-4. For each day (d) corresponding to the DOY, the estimated data are acquired based on 16 strategies which are divided into five groups.
Group 1: quantile mapping with neighboring stations (QMN) -QMN-1. For all neighboring stations with valid records, the station with the highest CC in Step-2 is selected. The estimated data for s and d are obtained using Eq. (2): where X i is precipitation or temperature for d from the selected neighboring station i, F i is the empirical CDF of i corresponding to the DOY, F −1 s is the inverse CDF of s corresponding to the DOY, and X s is the estimated data.
-QMN-2. For all neighboring stations with observations, estimated values are obtained using Eq. (2) which are merged based on Eq. (3): where n is the number of neighboring stations, is the QM-based estimate from i, and W i is the weight calculated using Eq. (4). CC i is the CC (SCC or PCC) between data from s and i corresponding to the DOY. W i is assigned 0 if CC i is negative.
-QMN-3. Similar to QMN-2 but the weight is calculated according to the distance (D i ) between s and i based on Eq. (5). Although the exponent of distance (k) varies in different studies, −2 is the most common choice (Teegavarapu and Chandramouli , 2005): -QMN-4. The median of QMN-1 to QMN-3 is used as the estimated data. The strategy of using median values is the same with Eischeid et al. (2000), which could be closer to actual observations than QMN-1 to QMN-3.
Group 2: quantile mapping with reanalysis products (QMR) Reanalysis products provide useful information for SCDNA production as (1) remote regions may not have enough neighboring stations and (2) neighboring stations also have missing values which could result in gaps of estimates at the target station.
-QMR-4. The median of QMR-1 to QMN-3 is used as the estimated data.

Group 3: interpolation (INT)
The three interpolation methods used in this study are MLAD (referred to as INT-1), NR (referred to as INT-2), and inverse distance weighting (IDW; referred to as INT-3). They are described below. Following Eischeid et al. (2000), neighboring stations with a CC lower than 0.35 are excluded. The remaining stations are ranked from high CC to low CC. A maximum of four neighboring stations are used in the interpolation.
For T min and T max , direct interpolation from neighboring stations to s could be biased due to the elevation differences between stations. Temperature data from neighboring stations are downscaled to the elevation of s based on Eq. (1).
-INT-1. MLAD minimizes the sum of absolute errors. It is more robust than regression based on least squares because, while least square estimation is effective when the errors are normally distributed and independent, environmental variables, especially precipitation, often violate the assumption of normality (Eischeid et al., 2000). MLAD has been well documented with better performance in gap infilling than other interpolation methods (Eischeid et al., 1995(Eischeid et al., , 2000Kanda et al., 2018;Young, 1992). The formula is shown in Eq. (6): where c i (i = 0, 1, . . . , n) is regression coefficients estimated using data within a 31 d window for each DOY.
Different d values corresponding to the same DOY could have different combinations of neighboring stations due to the limitation of observation availability. MLAD is performed for each combination to ensure that effective estimates are available for all days.
-INT-2. NR is an interpolation method proposed by Paulhus and Kohler (1952) and modified by Young (1992). The modified version is adopted in this study, which combines information from neighboring stations by re- (3). The weight is calculated using Eq. (7): where N i is the number of samples used to calculate CC i between s and i. The SCC is used for precipitation, and the PCC is used for temperature.
-INT-3. IDW is one of the most common interpolation methods. It is implemented similar to NR in that the inverse squared distance, as shown in Eq. (5), is used as the weight.
-INT-4. The median of INT-1, INT-2, and INT-3 is used as the estimated data. The input data are from neighboring stations and concurrent reanalysis estimates. For each s, neighboring stations are determined in a way similar to Step-2, but the CC is calculated using data from the entire observation period. Neighboring stations with a CC lower than all reanalysis products (ERA5, JRA-55, and MERRA-2) are excluded. The remaining neighboring stations and three reanalysis products form a complete repository of input features. Then, for each day that s has no observation, the input features are extracted from the repository in three steps: (1) neighboring stations without observations for the day are excluded, (2) the remaining neighboring stations and reanalysis products are ranked according to their CC with s, and (3) at most five stations/reanalysis products with the highest CC are selected. In this way, s will have multiple combinations of input features to ensure that all days with missing values have estimates. All combinations are used to train and test the ANN and RF models, resulting in multiple estimated series for s. The final estimates of s are generated in three steps: (1) the Kling-Gupta efficiency (KGE ; Kling et al., 2012) of all estimated series is calculated using all observations of s and ranked from high to low KGE (see Sect. 3.4 for definition of KGE ), (2) the series with a higher KGE is used to constitute the estimates of s in sequence, and (3) the second step is repeated until there are no missing values for s. This approach ensures that the "best" and complete estimates are provided for s.
-MAL-1. A four-layer ANN is used. The input layer has a maximum of five nodes (depending on the number of input features), the two hidden layers both have 20 nodes, and the output layer has one node for generating precipitation or temperature estimates. The transfer functions are hyperbolic tangent sigmoid for hidden layers and linear for the output layer. The training function is resilient backpropagation. The model is trained using the first 50 % of data, validated using the subsequent 20 % of data, and tested using the final 30 % of data.
-MAL-2. An RF model with 50 trees is built with 70 % training data and 30 % testing data. The minimum number of samples per tree leaf is five. The input nodes depend on the number of input features like MAL-1.
Group 5: multi-strategy merging (MRG) -MRG-1. KGE is used to rank the performance of the 11 strategies (QMN-1 to QMN-3, QMR-1 to QMN-3, INT-1 to INT-3, and MAL-1 to MAL-2) as a CC cannot reflect the magnitude difference (e.g., bias) between target and reference series. The first three cases of the 11 strategies are merged using squared KGE as the weight. The individual weight is assigned 0 if KGE is negative.
-MRG-2. The median of the three selected strategies in MRG-1 is used as the estimated data.

Generating serially complete records
-Step-5. In this step, Step-3 and Step-4 are repeated based on 70 % of data of s from the observation period.
Then, the KGE of estimates from all strategies is calculated using the remaining 30 % of observations. MAL-1 and MAL-2 are not repeated because they are trained on 70 % of observations. Although the evaluation samples are different among stations, the results are reliable and stable as shown in the results section. This step is implemented because QMN-1 to QMN-4, QMR-1 to QMR-4, and INT-1 in Step-4 use all data of s from the observation period to select stations, estimate empirical CDFs, and carry out regression. This potential overfitting problem could lead to the better performance of these strategies in the observation period but worse performance in the reconstruction period. KGE calculated in Step-4 can represent the accuracy of estimates in the observation period, while KGE calculated in Step-5 can represent the accuracy of estimates in the reconstruction period. - Step-6. In the observation period, the strategy with the highest KGE in Step-4 is selected to contribute the extension/reconstruction to the SCDNA. In the reconstruction period, first, the strategy with the highest KGE in Step-5 is determined; then, the estimates from the corresponding strategy in Step-4 are used to constitute the SCDNA because the empirical CDF and regression based on all observations in Step-4 could be more representative than the 70 % of observations in Step-5.
-Step-7. Estimates in Step-6 are corrected for certain climatological biases using station data from the observation period. Precipitation estimates are often subjected to wet-day bias. Two methods are implemented to address this problem. First, QM is performed based on the CDF of s in Step-3. However, QM may reduce the accuracy of estimated precipitation in some cases, for which the method used in Beck et al. (2019) is adopted. This method subtracts a tiny value (0.01 mm) from the original precipitation series and rescales the series to restore the original mean value. This operation is repeated until the estimated series shows an equal number of wet days (> 0.5 mm d −1 ) with observations of s. In addition to wet-day bias correction, mean value correction is implemented. The ratio between the mean values of precipitation estimates and observations is calculated for the observation period, which is used to rescale estimated series in both observation and reconstruction periods. For T min and T max , QM correction and mean value correction are also implemented. - Step-8. The accuracy of the SCDNA is evaluated and compared to benchmark datasets based on actual observations (Fig. 5). Then, the estimates are replaced by observations whenever possible to generate the final SCDNA. Very occasionally, estimated T min could be larger than estimated T max , for which T max is replaced by the maximum T max and T min is replaced by the minimumT min of the estimates from the 16 strategies. - Step-9. The serially complete data of SCDNA is quality controlled again using methods introduced in Sect. 3.1.2 to exclude stations with unreliable estimates.

Evaluate the precipitation and temperature estimates
KGE , which was proposed by Gupta et al. (2009) and modified by Kling et al. (2012), is used to support the merging of different strategies (Sect. 3.3) and the evaluation of the estimated precipitation and temperature: where r is the PCC, β is the bias ratio, γ is the variability ratio, µ is the mean value, and σ is the standard deviation (SD). The subscripts e and o represent estimated and reference time series, respectively. KGE ranges from negative infinity to 1. If two series exactly match, the KGE is 1. A β or γ value smaller/larger than 1 indicates that the mean value or variability of observations is underestimated/overestimated. In Sect. 4, the evaluation during the observation period is based on the complete station observations (i.e., Step-4 in Sect. 3.3.2), while the evaluation during the reconstruction period is realized using 30 % of independent station observations (i.e., Step-5 in Sect. 3.3.3). Unless otherwise stated, SCDNA estimates in Sect. 4 are after correction (Step-7 in Sect. 3.3.3). In Sect. 4.5, SCDNA estimates are compared with gridded products (ERA5, JRA-55, MERRA-2, and MSWEP). In addition to the three SCDNA variables (precipitation, T min , and T max ), mean temperature (T mean , the mean of T min and T max ) and daily temperature range (T range , the difference between T max and T min ) are also included. The involvement of T range can contribute to a more objective comparison between SCDNA and reanalysis products because the TLR-based downscaling of reanalysis temperature contains uncertainties which could affect the evaluation of T min , T max , and T mean . Although there exist differences between the TLRs of T min and T max , T range can reduce the effect of scale mismatch between gridded reanalysis temperature and point station temperature on evaluation results.

Comparison of infilling and reconstruction strategies
The value of a given infilling/reconstruction strategy can be quantified by the extent that a strategy is selected for use in the final SCDNA dataset. In this sense, the contribution ratios define the proportion of estimates that come from a specific strategy. Figure 6 shows that the contribution ratios of QMN, QMR, and INT to missing value estimation are generally smaller than 20 % in North America. Please note that QMN refers to all strategies within this group unless the strategy number is specified right after QMN. This also applies to other groups. QMR shows the smallest contribution ratios for almost all stations among the five groups. Compared to other regions in North America, contribution ratios of QMR are higher for precipitation stations in the western US and temperature stations in Mexico. INT shows lower contribution ratios in the Rocky Mountains compared to the western US, indicating that statistical interpolation without considering topographic effect is subjected to substantial uncertainties in complex terrain. MAL shows notably higher contribution ratios than QMN, QMR, and INT, particularly for T min and T max . The ratios of MAL are higher than 20 % for ∼ 30 % of precipitation stations, ∼ 65 % of T min stations, and ∼ 68 % of T max stations. MRG has the highest contribution ratios throughout North America. The average contribution ratios of MRG are 59.88 %, 41.59 %, and 40.56 % for precipitation, T min , and T max , respectively. For precipitation, MRG is particularly effective in high-latitude regions (northern Canada and Alaska), the western US, and Mexico. Figure 7 shows the KGE and contribution ratios of 16 strategies. The KGE of estimated precipitation is lower than that of estimated T min and T max due to the stronger spatial and temporal homogeneity of temperature (Fig. 7). The median KGE values of T min and T max are generally above 0.9, and the accuracy of estimated T max is higher than that of T min . The KGE during the reconstruction period is smaller than that during the observation period, which is particularly obvious for QMN, QMR, and INT-1 compared to other strategies, because QMN and QMR transfer CDF during the observation period to other periods, and INT-1 transfers the regression relationship during the observation period to other periods. MAL suffers a slight degradation in the reconstruction period, and the better performance of MAL-2 than MAL-1 shows that RF could be a better choice than ANN to estimate missing data. For MRG, the differences of KGE between the two periods are relatively small. For example, the median KGE values of MRG-1 for T max are 0.99 and 0.98 for the observation and reconstruction periods, respectively. MRG also shows higher KGE and a narrower quantile range than other strategies, particularly for precipitation, as it benefits from merging estimates from multiple strategies.
Regarding contribution ratios (Fig. 7), strategies with higher KGE often have larger contributions to the estimated series. However, this is not always true because the selection of strategies is performed for each DOY. Note that the contribution ratios of MAL-2 are even higher than MRG-1 during the observation period for T min and T max , although MRG-1 achieves higher KGE than MAL-2 for most stations. This is because MAL-2 could be the best choice for more DOY than MRG-1 even though MRG-1 may achieve the best overall performance. An example using T min data from one station is shown in Fig. S5 in the Supplement.
In the reconstruction period when observations are absent, the contribution ratios of MAL-2 decrease drastically compared to the observation period, contributing to the increased ratios of other strategies (particularly MRG-1). Although QMR shows the lowest contribution ratios, reanalysis products have implicit contributions to other strategies (e.g., MAL and MRG). Overall, MRG-1 shows much higher contribution ratios than all the other strategies (including MRG-2) during the reconstruction periods, indicating that it is the most important strategy in missing value estimation. Hence, combining information from multiple strategies is more reliable, and KGE -based merging is more effective than the median-value-based estimation.

Impact of reconstruction on spatial correlation and series variance
All infilling/reconstruction strategies except QMR rely on information from neighboring stations; this could affect the spatial correlation structure and the variance of SCDNA series. Space-time correlations and other properties (e.g., intermittency of precipitation) are important considerations because they can influence the performance of follow-on applications that use the SCDNA as input. Theoretically, QMN strategies could significantly inflate the spatial correlation but retain variance of station observations. The spatial correlation inflation in INT strategies could be lower, but the variance would be underestimated due to smoothing. QMR-1 is used as an example to demonstrate the effect of QM on the spatial correlation and series variance (Fig. S6 in the Supplement) because QMN uses different station combinations for every DOY, which would mask the effect of QM on final estimates. If the ERA5 used by QMR-1 is replaced by station observations, the results should be generally consistent. According to Fig. S6, the spatial correlation is substantially inflated by QMR-1, particularly for T min and T max , while the SD of QMR-1 estimates is very close to that of observations. This supports the design of estimating missing data using neighboring stations for each DOY as otherwise the inflation of CCs could be very substantial for the entire period. The spatial correlation based on station observations (Fig. 8a, d, and g) shows obvious seasonal variations with CCs lower in the warm season and higher in the cold season. The seasonality of CCs for T max is weaker compared to that for precipitation and T min . The SCDNA estimates capture the seasonal patterns but underestimates the variation (Fig. 8b, e, and h) because the inflation of spatial CCs is larger in the warm season than cold season (Fig. 8c, f, and i). Moreover, the inflation is larger for neighboring stations with a lower correlation with the target station. We tested selected neighboring stations according to their distance from the target station, and similar results were acquired. For precipitation, the median CC differences of all stations are close to 0.1 in the cold season and range between 0.1 and 0.15 in the warm season. For T min , the median CC differences are generally between 0.05 and 0.15. The CC differences of T max are relatively homogeneous for different seasons and generally fluctuate between 0.05 and 0.1. The inflation of CC is because (1) the estimates from the 10 neighboring stations and the target station are generally derived using highly overlapped information (Sect. 3.3.1) and (2) the estimation is realized for each DOY for all strategies except MAL, meaning that calculating CCs for each DOY shows the inflation to the largest extent.
The final SCDNA replaces estimates by observations, which can largely relieve the inflation of spatial correlation (Fig. S7 in the Supplement) depending on the degree to which observations are present in the record. For T min and T max , the CC is very close to that based on observations; for precipitation, the correlation in wintertime is even lower than that based on observations. Figures 9 and 10 show CCs between estimates at the target station and observations at the neighboring station. For precipitation, most strategies exhibit a similar spatial correlation structure with observations for most stations. QMR largely underestimates CCs compared to observations, which should be attributed to the differences between precipitation of reanalysis products and stations. There are notable differences in different strategies within one group. For example, QMN-1 shows larger inflation when observation-based CCs are higher, which is not seen in QMN-2 to QMN-4. This is probably because QMN-1 only uses information from the one neighboring station with the highest correlation with the target station for each DOY. The higher observation-based CC in Fig. 9 means this neighboring station could be more frequently used by QMN-1 to estimate data for the target station, resulting in the larger inflation of the CC. Another example is that INT-1 underestimates the CC for 68.75 % of stations, whereas INT-2 to INT-4 overestimates the CC for almost all stations. For SCD-1, the inflation of the CC is observed for 76.60 % of stations, whereas the magnitude of overestimation is smaller than that in Fig. 8. The mean values of observation-based and estimate-based CCs are 0.71 and 0.77, respectively. SCD-2 replaces estimates by observations and is the final dataset of this study. It reduces the mean estimate-based CC to 0.70. The overall spatial correlation structure of observations is generally preserved by SCD-2. However, SCD-2 calculates the CC for the entire period which is different from the period of observation-based CCs, resulting in uncertainties such as the underestimation for some stations when observation-based CCs are larger than 0.7.
The spatial correlation of T min is much stronger than that of precipitation (Fig. 10). Most strategies overestimate the CC for most stations, whereas the magnitude is quite small. For example, SCD-1 inflates the CC for 96.96 % of stations, while the mean CC values for observations (0.95) and SCD-1 (0.96) are very close to each other. QMR still underestimates CCs similar to Fig. 9 for precipitation. The CC based on SCD-2 is generally consistent with that based on observations, while slight underestimation exists for some stations when observation-based CCs are higher than 0.9. T max shows similar spatial correlation patterns to T min (Fig. S8 in the Supplement).
In summary, the inflation of the CC is inevitable particularly when estimates are obtained using information from a sole data source such as one neighboring station or one reanalysis product. The inflation is larger if each DOY is treated separately (Fig. 8; Fig. S7 in the Supplement) but smaller if the CC is calculated for all years (Figs. 9,10;Fig. S8 in the Supplement). Combining information from multiple sources (stations and reanalysis) and combining multiple strategies for each DOY are beneficial for estimating the overall spatial correlation structure. The spatial correlation structures vary for different strategies, and further studies are needed to clearly demonstrate how and why the estimation-based CCs differ from observation-based CCs.
The variability of observations and of the corrected and uncorrected SCDNA estimates (Step-7 in Sect. 3.3.3) is compared using the SD of the observation period (Fig. 11). The SD of uncorrected SCDNA precipitation is lower than that of observations, while after correction the SD agrees very well with observations. The mean values of SD are 7.36, 6.30, and 7.36 for observations, uncorrected SCDNA, and corrected SCDNA, respectively. For T min and T max , corrected and uncorrected SCDNA estimates both show consistent variability to observations. Figure 8. CC between target and neighboring stations for all DOY using station observations (the first column), SCDNA estimates (second column), and differences between SCDNA-and observation-based CCs (the third column). The CC is calculated for the observation period. For each target station, 10 neighboring stations are selected according to the correlation between time series from target and neighboring stations. Smaller numbers represent a higher correlation. For example, station 1 represents the neighbor with the highest CC in comparison to the target station. Each curve represents the median CC of all stations.

The performance of the serially complete dataset
Uncorrected SCDNA estimates show high accuracy in North America (Fig. 12). For precipitation, the median KGE of all stations is 0.87, and the median values of r, β, and γ are 0.91, 0.92, and 0.96, respectively. The KGE for Mexican stations generally ranges between 0.6 and 0.8, which is smaller than that in the US and southern Canada. Some stations in the Rocky Mountains, the Caribbean, Alaska, and northern Canada (regions with complex topography or climate) also show lower KGE for precipitation estimates. The spatial distribution of r is similar to that of KGE , while the magnitude is higher. According to γ , most stations underestimate precipitation variability, which is consistent with Fig. 11; β is generally lower than 1 in most regions of North America, particularly in the Rocky Mountains and Mexico where SCDNA underestimates precipitation.
Estimated temperature shows much higher KGE compared to precipitation. The median KGE and r of T min are 0.97 and 0.99, respectively. For T max , the median of KGE and r are 0.99 and 0.99, respectively. The median γ and β are both between 0.99 and 1 for T min and T max with small variations, particularly for T max (Fig. 12); the KGE of T min and T max is lower in the Caribbean and Mexico. For T min , the KGE for some stations around 45 • N and the Rocky Moun-G. Tang et al.: SCDNA Figure 9. Scatter density plots of CCs between precipitation from the target station and neighboring stations. For each target station, the neighboring station with the highest correlation with the target station is selected. The x axis represents the CC between observed precipitation from target and neighboring stations. The y axis represents the CC between estimated precipitation from the target station and the observed precipitation from the neighboring station. Each panel corresponds to one strategy in Sect. 3.3.2. SCD-1 represents SCD estimates after correction, while SCD-2 replaces estimates with observations. The CC is calculated for the overlapping observation period between target and neighboring stations, and the only exception is SCD-2 which calculates the CC using precipitation from target and neighboring stations during the entire period.  tains is lower than surrounding regions, although γ is spatially homogeneous for the same region. This is because the mean T min is close to 0 for some stations in this region, resulting in the large magnitude of β and γ . In contrast, T max exhibits homogeneous performance in the same region for all metrics.
Corrected SCDNA estimates (see  in the Supplement) have higher accuracy than uncorrected estimates (Fig. 12). For example, the median KGE for precipitation is improved from 0.87 to 0.90 after correction. The KGE for T min and T max is also improved but not as significantly as precipitation. The value of β equals 1 for all stations due to the mean value correction. For precipitation, γ changes from negative to positive for all stations, whereas the magnitude of the bias (deviation from 1) is smaller after correction. As a result, the spatial distribution of metrics for T min is also more homogeneous.
The distributions of KGE vary during the year (Fig. 13). For precipitation, more stations show lower KGE during summer (DOY 150 to 250) than at other times of the year, which may be due to the variability of summertime convective precipitation. For T min , some stations show lower KGE from DOY 100 to 250. The seasonal variation of KGE for T max is relatively weak, although KGE is slightly more concentrated at a higher level during spring and autumn than winter and summer. The overall performance of T max is better than T min and precipitation.

Comparison between the serially complete dataset
and gridded products SCDNA precipitation and temperature are compared with benchmark gridded products to demonstrate whether the SCDNA is a good choice when station data are unavailable. Actual station observations are used as reference. Although assessing gridded products using point-scale station data contains uncertainties (Tang et al., 2018a), the objective of this section is to illustrate their agreement with station observations in lieu of providing an exhaustive quantitative assessment of their real-world accuracy. Overall, the SCDNA achieves much higher KGE than reanalysis products for all variables (Fig. 14). For precipitation, the median KGE differences between the SCDNA and ERA5, JRA-55, and MERRA-2 are 0.48, 0.57, and 0.54, respectively. The corresponding KGE differences for T min are 0.46, 0.61, and 0.36, respectively. The improvement for T max is smaller, particularly in the eastern US where the topography is relatively flatter compared to the western US. The KGE differences of T mean are lower than T min but higher than T max due to the offset effect. T range suffers little from the elevation differences between stations and reanalysis Figure 12. The spatial distributions of KGE and its three components (r is the CC, β is the bias ratio, and γ is the variability ratio) for uncorrected SCDNA estimates over North America during the observation period. The maps are at a resolution of 0.5 • . The value for each grid cell is the median value of all stations within this grid cell. grids and is suitable to demonstrate the differences between SCDNA and reanalysis products. The median KGE differences for T range between the SCDNA and ERA5, 0.48,and 0.31,respectively. SCDNA and MSWEP precipitation is compared (Fig. 15). Since MSWEP merges data from numerous stations, the evaluation of MSWEP based on station data is not independent, which could result in the overestimation of its KGE . Even so, SCDNA precipitation shows higher KGE than MSWEP for 98.97 % of stations with a median KGE difference of 0.31. Figure 15 shows notable differences between MSWEP and SCDNA at the Canada-US border and the US-Mexico border. This is because MSWEP infers gauge reporting times by searching for the highest correlation between gauge data and the temporally shifted reanalysis/satellite estimates (Beck et al., 2019). The estimated temporal shift could vary with countries, which results in distinct differences of station-based evaluation results along national boundaries. The accumulation periods of station and MSWEP precipitation are inconsistent in some cases, which could affect the evaluation of MSWEP (see Sect. 5.1).
Note that the evaluation does not indicate that the SCDNA has higher accuracy than the gridded products; rather, the results show that SCDNA is a better substitute than gridded products when station observations are unavailable.

Observation time of stations
Meteorological stations in different countries usually have different local observation times, and stations in the same country may also experience changes in observation time (Vincent et al., 2012). Most station databases including those used in this study do not account for reporting time inconsistencies due to lack of hourly observations and welldocumented station metadata. Vincent et al. (2009) examined several methods to adjust the time of daily precipitation observations, which, however, often altered observed precipitation intensity. Beck et al. (2019) inferred the reporting time of daily precipitation observations by calculating the SCC between the series of stations and gridded products, which is  useful to correct the bias of gridded products. A simple experiment is carried out using the method of Beck et al. (2019) to infer the lag day of station series. For precipitation, 6418 stations show a non-negligible time shift from the reporting date (Fig. S10 in the Supplement). However, this method may be unsuitable for temperature because the estimated lag day is mostly zero, and the inferred reporting time cannot be directly applied to adjust station observations. The inconsistent reporting time has different impacts on precipitation, T min , and T max . For example, if a station records data from 08:00 on 1 January to 08:00 on 2 January, the station will probably use 2 January as the reporting time. However, two-thirds of the 24 h time is on 1 January, indicating that the accumulated precipitation could mostly occur on 1 January. T max could also occur during the daytime on 1 January, but it is hard to determine on which day T min occurs, which makes it challenging to adjust precipitation, T min , and T max at the same time. The difference between universal and local time makes this problem more complicated. Thus, the reporting time of stations is not corrected here due to aforementioned difficulties.

Homogenization
Inhomogeneities in station observations are defined as variations that are not caused by weather and climate factors. Long-term station records are often subjected to inhomogeneities due to factors like station relocation, observation time change, instrument change, and surrounding environment change (Venema et al., 2012). Many methods have been developed to identify breakpoints and homogenize station series at annual, monthly, or even daily scales (e.g., Ma et al., 2008;Vincent et al., 2002Vincent et al., , 2012. Different methods could generate different estimates of inhomogeneities as shown by many comparison studies (e.g., Beaulieu et al., 2008;Reeves et al., 2007;Venema et al., 2012). The four station databases (Sect. 2.1) used in this study provide original station records without homogenization. The SCDNA would inherit the potential inhomogeneities contained in these databases, and the infilling/reconstruction may also lead to discontinuities. The homogenization of the SCDNA is challenging considering that (1) the dataset covers a broad range of climate, topography, and countries, (2) the number of stations is large and differences between station periods (ranging from 8 to 40 years) are substantial, and (3) the suitability of existing methods for the homogenization of infilling/reconstruction estimates needs exploration. Therefore, homogenization is not carried out in this study, which, however, is an important direction for future studies.

Limitations of the KGE statistic
We use KGE because it incorporates information about correlation, bias, and variability and hence provides more information on methodological performance than an individual metric. For example, the PCC between temperature estimates and observations is usually close to 1 and cannot reflect the bias term, while the mean square error is prone to the effect of extreme values (or outliers). However, KGE also has limitations. For example, the values of KGE depend on the units of measurement (e.g., Santos et al., 2018) -in our case, the β values for temperature are clearly always close to 1 if the units of measurement for temperature are in Kelvin. Since these statistics incorrectly indicate very small temperature biases, we used degrees ( • ) for all KGE calculations in this study, ensuring that β has more leverage in the KGE statistics. Moreover, and critical to our analysis, the normalization used in the KGE formula (β and γ ) means that the KGE values are low when the denominators of β and γ are close to 0 (e.g., Santos et al., 2018). This problem is especially acute for temperature -for instance, we found that KGE values were very small for cases when µ o is close to 0. Nevertheless, the number of cases where µ o is close to 0 is rather small, where ∼ 0.5 % of all cases (based on all stations and all DOY) show absolute values of mean T min smaller than 0.1 • . For cases with µ o close to 0, the ranking based on KGE is similar to the ranking based on mean absolute error, which means that KGE can still function as a ranking indicator when its value is low. Further work is needed to both comprehensively evaluate the alternative infilling strategies presented in this paper and evaluate more advanced multi-method merging strategies.

Potential improvement directions
Several steps could be taken to improve the SCDNA. First, the optimal strategy could be different for each station as shown in the results in this study. Therefore, the quality of SCDNA may be further improved by using more infill-ing/reconstruction methods, which would yield diminishing returns at some point. For example, the long short-term memory (LSTM) could be suitable for providing missing station observations. Optimizing the configuration of various strategies will be necessary to balance computation efficiency and estimation accuracy, particularly when the number of stations is large. Second, some stations suffer from undercatch, which depends on gauge type, precipitation phase, environmental conditions, etc. The bias caused by undercatch can be substantial for stations located at high latitudes and the mountains (Yang et al., 2005;Scaff et al., 2015). Third, the SCDNA does not distinguish between rainfall and snowfall. Considering that a large part of North America has frequent snowfall in winter, precipitation phase classification will be useful for hydrometeorological studies. Auxiliary data from reanalysis and satellite products could be used to partition precipitation into rain and snow. Finally, although the SCDNA agrees well with station observations, long-term trends are difficult to reconstruct when actual observations are unavailable, meaning the SCDNA may not be suitable for climate trend analysis in the reconstruction period. Some gridded datasets use only stations with long-term records (e.g., Wood, 2008;Werner et al., 2019) to achieve temporally consistent estimates, whereas such stations are very few. Reasonable trend estimation is challenging but meaningful for SCD.
Furthermore, other variables such as wind and humidity observed by stations also suffer from the same problems faced by precipitation and temperature. Future studies should explore whether the current methodology is applicable to other variables. An SCD covering more variables would be useful for research in various fields.

Data availability
The SCDNA dataset is available at https://doi.org/10.5281/zenodo.3735533 (Tang et al., 2020) in netCDF format. The basic variables are station identification, latitude, longitude, elevation, date, and TLR derived in Sect. 3.2. Stations that undergo location merging (Sect. 3.1.1) are identified, and all relevant stations are included in the data file. For precipitation, T min , and T max , the variables in the netCDF4 file include original station observations, quality flags provided by original station databases, quality flags provided by this study, estimates from 16 strategies, uncorrected SCDNA estimates, corrected SCDNA estimates, the final SCDNA with estimates replaced by observations, data source flags indicating the source of each record in SCDNA (observations or 16 strategies), and accuracy metrics (KGE and its three components) for all estimates (16 strategies and SCDNA).
Scripts used to produce the SCDNA are available at https: //github.com/tgq14/GapFill (last access: 21 July 2020). The dataset will be regularly updated to cover the latest periods.

Conclusions
This study developed a daily SCD of precipitation, T min , and T max for 27 276 stations from 1979 to 2018 over North America (SCDNA). The original station data are compiled from multiple sources and undergo strict quality control. Many stations have non-negligible fractions of missing values in the observation and reconstruction periods. For each station, the infilling and reconstruction are implemented using 16 strategies (quantile mapping, statistical interpolation, and machine learning) based on information from neighboring stations and concurrent reanalysis estimates (ERA5, JRA-55, and MERRA-2). The final SCDNA combines estimates from the 16 strategies and is corrected using station observations. The spatial correlation is preserved and might be slightly inflated. The SCDNA estimates reproduce the variance of original station observations very well, particularly for temperature. The median KGE of the final precipitation, T min , and T max for all stations is 0.90, 0.98, and 0.99, respectively. The comparison with four benchmark gridded products shows that the SCDNA has much better agreement with station observations. The SCDNA will be useful for a variety of hydrometeorological studies in North America.
1. Integrity checks. The first type of integrity check is a duplication check to identify duplicated records for time series in different time periods. The second type of integrity check includes the streak check to identify consecutive identical values and the frequent-value check to identify close but not necessarily consecutive identical values. The world record exceedance check sets lower (−89.4 • C) and upper (57.7 • C) bounds of temperature.
2. Outlier checks. These include the gap check that examines the frequency distributions for all calendar months and the climatological outlier check that is based on the traditional z score (e.g., Hubbard and You, 2005).
3. Internal and temporal consistency checks, including the iterative temperature consistency check. These ensure some inherent relationships are abided (e.g., T min cannot be larger than T max ). The spike/dip check identifies temperatures which deviate from previous and following days by at least 25 • . The lagged temperature range check identifies abnormally large differences between T min and T max during a 3 d time window.
4. Spatial consistency checks, including the regression check and the spatial corroboration check. The regression check builds regression relationships between temperature at the target location and selected nearby stations to determine whether temperature at the target station should be flagged according to regression residuals and standardized residuals. The spatial corroboration check flags temperature at the target station if the value deviates far from the temperature at neighboring stations.
5. Extreme mega consistency checks. These ensure that certain relationships hold for the entire record of stations. For example, T max cannot be higher than the lowest T min for the calendar month, and vice versa.
For precipitation, quality control strategies are from three studies. The first part is similar to temperature but does not include the third type of checks (internal and temporal consistency checks). The second part is from Hamada et al. (2011).
1. Repetition checks. The nonzero check identifies constant daily values (> 10 mm d −1 ) that occur for more than 4 d. The zero check compares the annual zeroprecipitation frequency with its climatological value to spot unusual frequencies of zero.
2. Duplicated monthly or sub-monthly record check. The temporal CC and the number of days with equal precipitation are used to identify whether 2 different months have the same records caused by human errors.