A 500-year runoff reconstruction for European catchments

Abstract. Since the beginning of this century, Europe has been experiencing severe drought events (2003, 2007, 2010, 2018 and 2019) which have had adverse impacts on various sectors, such as agriculture, forestry, water management, health, and ecosystems. During the last few decades, projections of the impact of climate change on hydroclimatic extremes were often capable of reproducing changes in the characteristics of these extremes. Recently, the research interest has been extended to include reconstructions of hydro-climatic conditions, so as to provide historical context for present and future extremes. While reconstructions of temperature, precipitation, drought indicators, or 20th-century runoff are available for Europe, long-term runoff reconstructions are still lacking (e.g., only monthly or daily runoff series for short periods are commonly available). Therefore, we considered reconstructed precipitation and temperature fields for the period between 1500 and 2000, together with reconstructed scPDSI, natural proxy data, and observed runoff over 14 European catchments, to calibrate and validate the semi-empirical hydrological model GR1A and two data-driven models (Bayesian recurrent and long short-term memory neural networks). The validation of the input precipitation fields revealed an underestimation of the variance across most of Europe. On the other hand, the data-driven models have proven capable of correcting this bias in many cases, unlike the semi-empirical hydrological model GR1A. The comparison to observed historical runoff data has shown a good match between the reconstructed and observed runoff and between the runoff characteristics, particularly deficit volumes. The reconstructed runoff is available via figshare, an open-source scientific data repository, under the DOI https://doi.org/10.6084/m9.figshare.15178107 (Sadaf et al., 2021).

Figure 1. Spatial distribution of the observed GHCN precipitation and temperature stations, GRDC discharge gauges and proxies for precipitation and temperature.

Data-driven models, by contrast, are able to capture complex non-linear relationships. While the lack of physical constraints in the data-driven models limits their application under contrasting (changing) boundary conditions (in comparison with those of the model training period), their advantage is that they can often directly use biased reconstructed data as an input series.
The objective of the present study is to provide a long-term hydrological reconstruction for Central European catchments, utilizing the available gridded precipitation (Pauling et al., 2006) and temperature (Luterbacher et al., 2004) reconstructions, natural proxies (Ljungqvist et al., 2016) and other long-term historical data sources. Specifically, we use a combination of a conceptual hydrological model (GR1A; Mouelhi et al., 2006) and two data-driven models (Chen et al., 2020; Okut, 2016) to simulate the annual evolution of runoff over the period 1500-2000. We pay particular attention to low flows during drought years. Using long-term data on climatic conditions and runoff may provide an efficient technique for visualizing droughts and low-flow periods. The structure of the paper is as follows: the considered hydroclimatic reconstructions, natural proxies and observed data are described in Section 2. In Section 3, we introduce the data selection and pre-processing, the hydrological and data-driven models and the drought identification. The reconstructed input fields, as well as our runoff simulations considering four input data combinations (precipitation, temperature, raw proxy and drought indicator) and two data-driven approaches, together with the hydrological model, are evaluated in Section 4. Finally, we provide certain guidelines on the advantages and limitations of the individual approaches.

Hydroclimatic reconstructions
We used reconstructed seasonal precipitation and temperature gridded data (0.5° × 0.5°) over the European domain (30.25° N-70.75° N / 29.75° W-39.75° E) from 1500 to the present day. In Pauling et al. (2006), precipitation (P) was reconstructed by applying principal component regression to documentary evidence (i.e., memoirs, annals, newspapers), speleothem proxy records (Proctor et al., 2000) and tree-ring chronologies from the International Tree-Ring Data Bank (ITRDB). Reconstructed temperature (T) is obtained from Luterbacher et al. (2004), which relies on historical records and seasonal natural proxies (i.e., ice cores from Greenland and tree-rings from Scandinavia and Siberia). We refer to these data sets as reconstructed forcings. Additionally, we used data from the Old World Drought Atlas (OWDA; Cook et al., 2015), which contains information regarding moisture conditions across Europe, specifically the self-calibrated Palmer Drought Severity Index (scPDSI) based on summer-related tree-ring proxies for the period from 0 to 2012 CE.

Other hydro-climate proxy information
We also included raw proxy series for hydroclimatic variables by Ljungqvist et al. (2016) in our analysis, as listed in the Supplementary tables (S1 and S2). We considered 20 precipitation-related proxies consisting of three tree-ring widths, eight lake sediments, five peat bogs, two speleothems and two peat humidifications. Similarly, there were 17 temperature-based proxies including six tree-rings, three ice cores, three lake sediments, two speleothems and three written records. These proxies are not evenly distributed across Europe (Fig. 1). The available series, typically spanning hundreds of years, were restricted to 1500-2000 in our study. Data standardization was conducted by subtracting the mean and dividing by the standard deviation (both calculated from the portion of the time series after 1900). Missing values were calculated by linear approximation and, in this way, we obtained a consistent set of proxy information for each (annual) time step. It has been previously established that these proxies correlate well with climatic variables, such as precipitation and temperature (Riechelmann and Gouw-Bouman, 2019).
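The standardization and gap-filling described above can be sketched as follows (a minimal Python illustration rather than the tooling used by the authors; `standardize_proxy` and its restriction to interior gaps are our simplifications):

```python
from statistics import mean, stdev

def standardize_proxy(years, values, ref_year=1900):
    """Standardize a proxy series by the mean and standard deviation
    of its post-1900 portion, after linearly interpolating interior
    gaps (marked as None)."""
    # Linear interpolation of missing values (interior gaps only)
    filled = list(values)
    for i, v in enumerate(filled):
        if v is None:
            lo = next(j for j in range(i - 1, -1, -1) if filled[j] is not None)
            hi = next(j for j in range(i + 1, len(filled)) if filled[j] is not None)
            w = (i - lo) / (hi - lo)
            filled[i] = filled[lo] * (1 - w) + filled[hi] * w
    # Reference statistics taken from the instrumental era (>= 1900)
    ref = [v for y, v in zip(years, filled) if y >= ref_year]
    m, s = mean(ref), stdev(ref)
    return [(v - m) / s for v in filled]
```
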

The Global Historical Climatology Network (GHCN)
The GHCN dataset (Peterson and Vose, 1997) - one of the largest observational databases, collated by the National Oceanic and Atmospheric Administration (NOAA; Quayle et al., 1999) - was used to verify the accuracy of the precipitation and temperature reconstructions. The GHCN-m (version 2) dataset contains observed temperature, rainfall and pressure data from 1701 to 2010. Data for the majority of stations are, however, only available after 1900. GHCN-m precipitation and temperature from GHCN V2, as well as from the new GHCN V4 version, were included in the preliminary analysis (Menne et al., 2012).
We found 113 precipitation and 144 temperature stations within the European domain (see Fig. 1) with records dating back earlier than 1875. Most stations are geographically concentrated in Central Europe, and few stations are located in the eastern and northern areas of Europe (see Fig. 2). These data are hereafter referred to as the GHCN data.

Observed runoff
The Global Runoff Data Center (GRDC; https://www.bafg.de/GRDC/EN/Home/homepage_node.html) provides data for more than 2780 gauging stations in Europe, with the oldest records starting from 1806. The runoff series from the GRDC were selected on the condition that data were available for at least 25 years prior to 1900. In total, there were 21 such stations, predominantly in Central Europe: 11 in Germany, two in France, two in Switzerland, one in the Czech Republic, one in Sweden, one in Finland, one in Lithuania and one in Romania (see Fig. 1). These stations cover 12 European river basins (Rhine, Loire, Elbe, Danube, Weser, Main, Glama, Salzach, Nemunas, Göta älv, Inn and Kokemaenjoki), with areas ranging from nearly 6 100 km² (Kokemaenjoki, Muroleenkoski, Finland) to 576 000 km² (Danube, Orsova, Romania). The mean annual discharge (Q_mean) varies from 50 m³ s⁻¹ to 5 600 m³ s⁻¹, and the records span different time periods for each catchment.

The most extensive records were available at KRV Sweden and Dresden, containing the longest discharge series of 212 and 208 years, respectively. The gauging station in Köln also provided 195 years of data for the Rhine River. Note that some of the gauging stations are located in close proximity to one another and therefore show a greater degree of similarity in their runoff time series (e.g., the two stations in Basel, Rhine). Detailed information on all the selected stations and their salient characteristics is provided in Table 2.

Study area
In the first section of the study, the analysis is performed across the European region bounded by (30.25° N-70.75° N / 29.75° W-39.75° E), in which the grid-based reconstruction of precipitation and temperature was verified against the observational data.
In the second section, we focus on the 21 specific Central European catchments corresponding to the available long-term GRDC discharge records. The study area and the observational data of the hydroclimatic variables are shown in Figure 2.

Methods

This section is divided into three parts. The first part describes the selection and pre-processing of the reconstructed forcings (i.e., precipitation and temperature) for validation across Europe and the preparation of data for runoff simulation in the individual catchments. The hydrologic and data-driven models used for runoff simulation are introduced in the second part. Finally, we describe the methods for the evaluation of the simulated runoff (including drought identification).

Data pre-processing
We prepared two datasets. The first consists of the reconstructed forcings and the corresponding GHCN data for all available European stations with long records (see Section 2.3). We considered the selected GHCN stations and the data from the corresponding grid cells of the reconstructed forcings for this forcing validation exercise. To understand how well the reconstructed forcings match the GHCN data across time scales, we aggregated both the reconstructed forcings and the GHCN data from the seasonal scale to 1-, 2- and up to 30-year aggregations.
The second dataset covers the 21 selected catchments and consists of the reconstructed forcings, the proxy data and runoff for the calibration and validation over the individual catchments (Fig. 2). The catchment-average precipitation and temperature were estimated from the reconstructed forcings by averaging the grid cells covering the specific catchment boundary. Similarly, we calculated the catchment-average PDSI from the OWDA, and also selected the raw proxy data from inside the catchment or within a 100 km buffer around it.

Hydrologic model (GR1A)
To simulate runoff in each catchment, we applied the annual time-scale hydrologic model GR1A (Mouelhi et al., 2006). This model builds upon the work of Manabe (1969), considering dynamic storage and antecedent precipitation conditions. The model consists of a simple mathematical equation with a single (optimized) parameter X:

$$Q_i = P_i \left\{ 1 - \left[ 1 + \left( \frac{0.7\,P_i + 0.3\,P_{i-1}}{X\,E_i} \right)^{2} \right]^{-0.5} \right\}$$

where Q, E and P represent the annual runoff, potential evapotranspiration and precipitation, respectively, and i denotes the year index. The parameter X is optimized by maximizing the Nash-Sutcliffe efficiency (NSE) between the observed and modelled runoff. The potential evapotranspiration was calculated using the temperature-based formula of Oudin et al. (2005).
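A minimal sketch of the model and its calibration objective (illustrative Python; the function names are ours, and the equation follows the GR1A formulation of Mouelhi et al., 2006):

```python
def gr1a(P, E, X):
    """GR1A annual rainfall-runoff model (after Mouelhi et al., 2006).
    P, E: lists of annual precipitation and potential evapotranspiration [mm];
    X: the single model parameter. Returns simulated annual runoff [mm]."""
    Q = []
    for i in range(1, len(P)):
        # Antecedent precipitation: 70 % current year, 30 % previous year
        p_eff = 0.7 * P[i] + 0.3 * P[i - 1]
        Q.append(P[i] * (1.0 - 1.0 / (1.0 + (p_eff / (X * E[i])) ** 2) ** 0.5))
    return Q  # the first year is lost to the P[i-1] term

def nse(obs, sim):
    """Nash-Sutcliffe efficiency, used here as the calibration objective."""
    m = sum(obs) / len(obs)
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / \
        sum((o - m) ** 2 for o in obs)
```

Calibration then amounts to a one-dimensional search for the X that maximizes `nse(obs, gr1a(P, E, X))`.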

Data-driven models
Data-driven methods, such as Artificial Neural Networks (ANNs; Kwak et al., 2020; Hu et al., 2018; Senthil Kumar et al., 2005), are widely used to capture complex rainfall-runoff relations. ANNs consist of artificial neurons organized in layers and connections that route the signal through the network. Each connection has an associated weight that is optimized within the calibration (in the context of ANNs, known as training).

There are many kinds of ANNs, which differ in structure, type and direction of connections, functional forms used for neuron activation, and training procedure.
In the present study, we considered two approaches: long short-term memory (LSTM) neural networks and Bayesian regularized neural networks (BRNN). These techniques are commonly used to determine the relationship between rainfall and runoff (Hu et al., 2018; Xiang et al., 2020; Kratzert et al., 2018; Ye et al., 2021). For both model types, we considered combinations of gridded forcing, OWDA-based scPDSI, proxies and lagged gridded forcing as inputs to the network. Specifically, the network using only gridded forcing is referred to as "Gridded", the network combining gridded forcing and natural proxies as "Gridded+Proxies", the network with gridded forcing and OWDA scPDSI as "Gridded+PDSI", and the network which includes lagged gridded forcing as "Gridded+Lag".

Figure A1 shows the architecture of the LSTM, a modified version of the recurrent neural network based on the backpropagation algorithm (Hochreiter and Schmidhuber, 1997). This structure allows the LSTM to learn long-term dependencies while controlling the overfitting problem (Chen et al., 2020). An LSTM cell generally consists of two unit states (hidden and cell states) and three distinct gates (forget, input and output). The cell state saves the long-term memory from the previous unit, while the hidden state acts as a working memory to process information inside the gates. These gates determine which information is to be processed, remembered and transferred to the next state. Within the LSTM, different activation functions, such as the hyperbolic tangent (tanh) and sigmoid (σ), are used to update the unit states. The implementation of the LSTM is carried out by means of the R packages "keras" (Arnold, 2017) and "tensorflow" (Abadi et al., 2016).
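For reference, the standard LSTM cell update (in which the gates are conventionally termed forget, input and output) can be written as:

```latex
% Standard LSTM cell equations (Hochreiter and Schmidhuber, 1997);
% x_t is the input, h_t the hidden state, c_t the cell state.
\begin{align}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate}\\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate}\\
\tilde{c}_t &= \tanh\!\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{candidate cell state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update}\\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate}\\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{align}
```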
The training process of the LSTM is time-consuming due to its inherent complexity. The BRNN method, in contrast, offers fast learning and convergence, and helps to capture the complex relationship between rainfall and runoff responses (Ye et al., 2021). This method initializes the ANN parameters using Bayesian regularization (Okut, 2016): the initial weights are assigned a prior distribution, typically normal, during model training. Through the Bayesian formulation, the prior probability distribution of the weights is updated to the posterior probability distribution. We trained this model in R using the "brnn" function of the "caret" package (Kuhn, 2015).
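The Bayesian regularization underlying the "brnn" implementation minimizes a weighted sum of the data misfit and a weight penalty; the text above does not spell out the objective, so we state the conventional MacKay-style form here:

```latex
% E_D is the sum of squared residuals (targets t_i vs. outputs y_i),
% E_W the sum of squared network weights; the hyperparameters
% alpha and beta are inferred within the Bayesian framework.
F(\mathbf{w}) = \beta E_D + \alpha E_W,
\qquad
E_D = \sum_{i}\left(t_i - y_i\right)^2,
\qquad
E_W = \sum_{j} w_j^2
```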
In both cases, the model optimization runs were conducted several times, and the run with the best performance was considered for further evaluation. To reduce the likelihood of overfitting during calibration/training, a fraction of the calibration data was set aside to check performance on an independent (so-called "testing") set. In addition, the network hyperparameters (such as the number of neurons, activation functions, etc.) were iteratively tuned to yield fast convergence and good skill.

Goodness-of-fit assessment
We used a set of seven statistical metrics to assess the performance of the simulated runoff, namely: the correlation coefficient (R), the index of agreement (D), the ratio of standard deviations (rSD), the Nash-Sutcliffe efficiency (NSE), the Kling-Gupta efficiency (KGE), the root mean square error (RMSE) and the mean absolute error (MAE). The mathematical formulations of these metrics are provided in Appendix A1.

Runoff drought identification
To check the utility of our reconstruction, we finally explore how well the runoff droughts are represented in the simulations.

Our study considers hydrological droughts, defined on the basis of the streamflow deficit, following the threshold level approach (Yevjevich, 1967; Rivera et al., 2017; Sung and Chung, 2014). This approach is typically used at daily or monthly time scales, considering 0.1 or 0.2 quantile threshold levels. To accommodate the annual scale used here, we defined the start of a drought as the year in which runoff falls below the 0.33 quantile (regular drought) or the 0.05 quantile (extreme drought). The drought persists until the runoff rises above the threshold again. Drought length and severity (the cumulative difference between runoff and the threshold) were then calculated for each identified drought. The hydrologic drought series can be further assessed to understand critical aspects of the (temporal) runoff dynamics and to classify past droughts in Europe (Cook et al., 2015; Wetter and Pfister, 2013).
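The threshold level approach described above can be sketched as follows (a simplified Python illustration using a linearly interpolated empirical quantile; the function names are ours, not the authors'):

```python
def quantile(x, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(x)
    pos = q * (len(s) - 1)
    lo = int(pos)
    return s[lo] + (pos - lo) * (s[min(lo + 1, len(s) - 1)] - s[lo])

def drought_events(runoff, q=0.33):
    """Identify annual droughts with the threshold level approach:
    a drought starts when runoff falls below the q-quantile threshold
    and persists until runoff rises above it again. Returns a list of
    (start_index, length, severity) tuples, where severity is the
    cumulative deficit below the threshold."""
    thr = quantile(runoff, q)
    events, start, deficit = [], None, 0.0
    for year, r in enumerate(runoff):
        if r < thr:
            if start is None:
                start, deficit = year, 0.0
            deficit += thr - r
        elif start is not None:
            events.append((start, year - start, deficit))
            start = None
    if start is not None:  # drought still ongoing at the series end
        events.append((start, len(runoff) - start, deficit))
    return events
```
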

Results and discussion
In this section, we analyze the 500-year-long reconstruction over space and time across Europe. Firstly, we provide a comparison between the GHCN observed precipitation and temperature and the corresponding grid cells of the Pauling et al. (2006) and Luterbacher et al. (2004) reconstructions. Next, the reconstructed runoff series for the selected catchments are evaluated against the corresponding observed GRDC runoff data.
Two distinct model types were investigated, i.e., a process-based hydrological model (GR1A) and two data-driven models (BRNN and LSTM). While the former takes gridded precipitation and temperature as input, in the case of the latter we also considered natural proxies, the OWDA-based scPDSI and lagged forcings as additional inputs.

Evaluation of reconstructed precipitation and temperature fields
The 500-year-long paleoclimate reconstructions of precipitation (P) and temperature (T) were validated against the GHCN observational data. The spatial maps for the comparison are given in Figs. 2 and 3. The reconstructed data are verified against observational P and T across 99 and 94 European sites, respectively. Figure 2 shows that the correlation coefficient (R) of the P reconstruction at most of the sites is above 0.5; the index of agreement (D) is larger than 0.6; KGE and NSE show values below 0.5 (NSE) and 0.6 (KGE); the rSD is greater than 0.7 and the RMSE varies between 50 and 100. We found relatively good performance for the temperature reconstruction, as depicted in Fig. 3. In this case, the RMSE between the reconstructed and observational T is around 0.2 °C; rSD fluctuates between 0.95 and 1.05, while R is higher than 0.84 and D is above 0.90. The NSE and KGE values were above 0.5 at many stations. Some stations indicated a worse performance and did not adequately capture the observed temperature variability.
Furthermore, we tested the skill of the gridded reconstructed forcings in capturing the multi-temporal characteristics of the observed P and T dynamics, i.e., aggregated time-scale features ranging from seasonal to 30-year data. To this end, the seasonal values of the P and T series were aggregated over windows ranging from 0.25 years (i.e., seasonal) to 30 years (with annual increments), using non-overlapping windows.
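The non-overlapping aggregation can be sketched as follows (illustrative Python, not the authors' code; the choice between mean and sum depends on whether the variable is intensive, like temperature, or a total, like precipitation):

```python
def aggregate(series, window, how="mean"):
    """Aggregate a seasonal or annual series into non-overlapping
    windows of `window` steps; a trailing incomplete window is dropped."""
    n = len(series) // window
    blocks = [series[i * window:(i + 1) * window] for i in range(n)]
    if how == "mean":
        return [sum(b) / window for b in blocks]
    return [sum(b) for b in blocks]  # "sum", e.g. for precipitation totals
```
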
The GOF statistics (Section 3.4) between each GHCN station and the corresponding reconstruction grid cell were estimated.
In Figure 4, we present the median GOF statistics (black line), the ranges between the 25th and 75th quantiles (light envelope) and the 10th and 90th quantiles (dark envelope) of the distribution of the GOF statistics over the stations for each aggregation time step.
The RMSE for precipitation and temperature drops from initially high values at seasonal scales to relatively stable values for aggregations longer than 10 years. This is expected, since the RMSE depends on the number of observations. With regard to the other statistics, except for correlation, which shows relatively stable values across aggregations, it is evident that the reconstruction skill decreases with increasing aggregation time scale. In particular, the variance is underestimated, and this underestimation is more substantial for long aggregations (see the rSD panel in Fig. 4). This may imply that the utility of the reconstructed forcing datasets for multi-year (drought) assessment is limited (and results should be interpreted with caution).

It is worth noting that the large spread of the GOF statistics is mainly due to outlying values at grid cells located along the boundary of the domain (i.e., the interface between land and sea/ocean) and at high elevations (cf. also Figs. 2 and 3). In general, reconstructed precipitation exhibits greater differences from observations than temperature. This may be because the proxies considered in the reconstruction rely on different seasons and climate conditions. Additionally, the short instrumental records available before the 20th century may be affected by technical errors, such as problems with instrumentation, station relocation and dating issues (Dobrovolný et al., 2010). Moreover, other studies (e.g., Ljungqvist et al., 2020) stated that the precipitation series employed for the reconstructions were shorter and more error-prone than the temperature series before the 20th century (Pauling et al., 2006; Harris et al., 2014). Finally, the chosen statistical technique (principal component regression) could also contribute to the variance underestimation at larger time scales (Pauling et al., 2006).

Runoff simulation

For runoff prediction, we considered several input-variable combinations for the models (i.e., the GR1A hydrologic model and the BRNN and LSTM data-driven models), as detailed in Table 4. The available GRDC observed runoff time series at each gauging location were split into two parts: calibration (after 1900), used to identify the model parameters, and validation (prior to 1900), used for independent verification by means of the GOF statistics.
The GR1A conceptual hydrological model was driven by the gridded reconstructions of P and T to simulate the runoff for each catchment separately. The simulated runoff series were then compared to the corresponding GRDC observations, and the results were summarized by means of the GOF statistics. As can be seen in Table 3, the correlation and NSE statistics for calibration achieve reasonable results for most of the catchments, with a few exceptions (i.e., Kokemaenjoki, Goeta, Nemunas and Inn). These relatively poorer skills over catchments in northern Europe are in line with the previous findings of Seiller et al. (2012), who noted that lumped hydrological models often exhibit larger uncertainties and fail to capture extreme catchment responses in such regions.

To further demonstrate the differences between the individual models, we show the simulated runoff series of all models for the catchments with the highest (Blois, Loire) and lowest (Smalininkai, Nemunas) performance in Figure 5. The performance of the models is comparable during the calibration period for the Loire River. Clearly, all data-driven models are capable of mimicking the observed runoff, while the GR1A model exhibited certain minor deviations, primarily until 1930. In the validation period, the differences between the models are more visible, in particular for above-average flows. At the beginning of the validation period (1870-1880), all models failed to simulate the high annual flows.

In the case of the Nemunas catchment, the GR1A simulation deviates strongly from the observed data and cannot capture the mean flow level. However, the calibration is poor even for the data-driven models, which do not simulate the year-to-year variability appropriately. Interestingly, the error of the GR1A model is smaller in validation than in calibration. The data-driven models perform similarly in both periods, with only minor differences between the two. Looking at the GOF statistics, the models considering the OWDA-based scPDSI or lagged forcings (e.g., P_{t-1}) perform slightly better in validation.

The runoff reconstruction datasets
As a first step, we excluded the catchments that exhibited poor performance in validation (see Table 3). As a threshold, we considered a validation NSE of 0.5 for at least one model, following the approach used by Ayzel et al. (2020). The best-performing model for each of the remaining catchments is summarized in Table 4. The combination of gridded forcing with lagged values results in the best performance over nine catchments, of which seven are driven by the BRNN and the remainder by the LSTM. The LSTM with gridded forcing and OWDA-scPDSI was best in one case, and the remaining four catchments were most appropriately simulated with the BRNN and BRNN [Gridded+PDSI]. It should be noted that the differences between the well-performing models are small, as noted in Figure 5 and further demonstrated in Figure 6.

Identification of low flows and significant hydrological drought events
In the final step of the analysis, we compared the droughts identified in the reconstructions with those in the GRDC observed series (Fig. 7). The match between the simulated and observed runoff deficits is weaker than that for the annual runoff time series. For most of the stations, the simulated deficit is lower than the corresponding observed estimate. This suggests that the reconstructed precipitation and temperature fields do not represent the inter-annual variability correctly, which is in line with the findings from Fig. 4. Despite this widespread issue with the representation of inter-annual persistence, Fig. 8 shows that the runoff deficits are simulated reasonably well for the Rees-Rhine and Köln-Rhine catchments.

2. There is no significant difference between the BRNN- and LSTM-simulated runoff, either in terms of the individual values or in relation to the validation metrics.
3. Validation skill metrics suggest that for runoff prediction, it is beneficial to consider data-driven models that explicitly account for serial dependence either through input data (e.g., time-lagged input fields) or directly in the model structure (e.g., LSTM -networks).
4. The droughts identified in the reconstructed series correlate well with significant documented events (such as 1540, 1669, and 1921).
The reconstructed series relies heavily on the consistency of the underlying reconstructed precipitation (Pauling et al., 2006) and temperature (Luterbacher et al., 2004) forcing fields. Unfortunately, this cannot be fully verified directly, due to the lack of sufficiently long observational data sets. With the limited information available (GHCN), we identified several notable deficiencies in the reconstructed forcings, in particular an underestimated variance in the precipitation reconstruction, leading to inconsistencies with the observed runoff (e.g., demonstrated by the poor results of GR1A for some catchments). Moreover, the proxy records (also used in the development of the gridded reconstructions) are spatially heterogeneous; some regions are better represented than others, which inevitably results in poorer performance over the latter.
However, the skill of the precipitation and temperature reconstructions for deriving runoff across the selected catchments is fairly good. In addition, the data-driven methods used in the paper are capable of removing systematic bias (as was demonstrated in validation). We cannot be sure that the link between reconstructed forcing and runoff is stationary when going back in time.
Moreover, when the number of natural proxies decreases, the uncertainty increases. The reconstructed data should, therefore, always be considered with caution. In addition, we showed that the skill of the reconstructed forcings decreases with time-scale.
This may imply problems with the representation of multi-year droughts.
Future research could consider further improvements of the simulations, e.g., by training a meta-model combining the runoff simulations from several fitted models. Since the interest is often focused not on the runoff series itself but on some other indicator (such as the PDSI in the case of drought), it is also possible to simulate the drought indices directly, considering either the precipitation and temperature input fields or the simulated runoff. Finally, discrete classifiers could also be used to simulate the drought (or water level) classes directly.

A1 Goodness-of-fit metrics

We used several statistical measures to assess the skill of the runoff reconstruction, comparing the grid-based simulations against the observed dataset. These measures are mathematically defined as follows. The ratio of standard deviations is

$$\mathrm{rSD} = \frac{SD_{g}}{SD_{o}}$$

where g_i and o_i refer to the gridded (simulated) and observed time series at point i, respectively. The rSD has an ideal value of 1: the observed variability is underestimated when the value is less than one, and overestimated when the value is greater than one.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(g_i - o_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|g_i - o_i\right|$$
The RMSE and MAE measure how well the predictions fit the observations. Both range from 0 to infinity, with 0 indicating a perfect fit.
Cor computes the correlation between the observed and predicted data; the method can be specified as "kendall" or "spearman".

Kendall's tau or Spearman's rho are used to estimate the rank-based skill. The Nash-Sutcliffe efficiency (NSE), alternatively referred to as model efficiency (Nash and Sutcliffe, 1970), is a metric of the model's overall skill. It is defined as

$$\mathrm{NSE} = 1 - \frac{\sum_{i}\left(o_i - g_i\right)^2}{\sum_{i}\left(o_i - \overline{o}\right)^2}$$

NSE values below zero indicate that the predictions are on average less accurate than using the long-term mean of the observed time series, mean(o_i).
Another efficiency coefficient, the index of agreement (D), represents a decided improvement over the coefficient of determination, but it is also sensitive to extreme values, owing to the squared differences.
The index of agreement ranges from 0.0 to 1.0, with higher values signifying a better agreement between the model and 415 observations, similar to the interpretation of the coefficient of determination.
The Kling-Gupta efficiency (KGE) index is calculated from three primary components: r, α, and β. The symbol r denotes the Pearson product-moment correlation coefficient; α denotes the ratio of the standard deviations of the simulated and observed values; and β denotes the ratio of the means of the simulated and observed values. α, β, and r all have an ideal value of one:

$$\mathrm{KGE} = 1 - \sqrt{\left[s_1\,(r-1)\right]^2 + \left[s_2\,(\alpha-1)\right]^2 + \left[s_3\,(\beta-1)\right]^2}$$

where s is a numeric vector of length three containing the scaling factors used to adjust the relative importance of the individual components.
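As a compact illustration, the metrics above can be computed as follows (a Python sketch with the default scaling factors s = (1, 1, 1); the function name is ours):

```python
from math import sqrt
from statistics import mean, pstdev

def gof(obs, sim):
    """Goodness-of-fit statistics used in the study (a minimal sketch):
    r, rSD, RMSE, NSE and KGE with default scaling factors s = (1, 1, 1).
    Assumes both series are non-constant (non-zero standard deviation)."""
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    r = sum((o - mo) * (s_ - ms) for o, s_ in zip(obs, sim)) / (len(obs) * so * ss)
    rsd = ss / so                      # ratio of standard deviations (ideal: 1)
    rmse = sqrt(mean([(o - s_) ** 2 for o, s_ in zip(obs, sim)]))
    nse = 1 - sum((o - s_) ** 2 for o, s_ in zip(obs, sim)) / \
        sum((o - mo) ** 2 for o in obs)
    beta = ms / mo                     # bias ratio
    kge = 1 - sqrt((r - 1) ** 2 + (rsd - 1) ** 2 + (beta - 1) ** 2)
    return {"r": r, "rSD": rsd, "RMSE": rmse, "NSE": nse, "KGE": kge}
```
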

A2 Data pre-processing of the LSTM
To build the LSTM model, we used the Keras environment with its high-level application programming interface (API) for neural networks and TensorFlow. Figure A1 represents the structure of the LSTM neural model for the rainfall-runoff relationship in the individual catchments. We designed our network by stacking one LSTM and two dense layers on top of one another. As shown in Fig. A1, the model was configured with four distinct input combinations, each of which was normalized to [0, 1] in the training and testing phases. The model uses the Rectified Linear Unit (ReLU) activation, component-wise multiplication and a dropout parameter of 0.1. According to Kingma and Ba (2014), the optimization algorithm plays a significant role in convergence and optimization. For this reason, the Adam optimizer was chosen, as it performs stochastic gradient descent (SGD) more efficiently using the backpropagation algorithm. During compilation, the learning rate was set to 0.001 or 0.002, and the batch sizes and numbers of epochs were selected randomly. In addition, the mean absolute error was used as the objective function to minimize the residuals. A checkpoint algorithm was also applied to monitor the model's accuracy.
Finally, the best model output, i.e., the one with the minimum loss and best accuracy, is saved.
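The described setup can be sketched as follows (a Python/Keras approximation of the R "keras" configuration described above; the layer width `units` and the exact layer ordering are our assumptions, not taken from the paper):

```python
# Hyperparameters as described in the text; values beyond these
# (batch size, number of epochs) were tuned per catchment.
HPARAMS = {
    "activation": "relu",    # Rectified Linear Unit
    "dropout": 0.1,
    "optimizer": "adam",     # stochastic-gradient-descent variant
    "learning_rate": 0.001,  # or 0.002
    "loss": "mae",           # mean absolute error objective
}

def build_lstm(n_timesteps, n_features, units=32):
    """One LSTM layer stacked with two dense layers, per Fig. A1.
    `units` is an illustrative width, not taken from the paper."""
    from tensorflow import keras  # imported lazily; requires TensorFlow

    model = keras.Sequential([
        keras.layers.LSTM(units, input_shape=(n_timesteps, n_features)),
        keras.layers.Dropout(HPARAMS["dropout"]),
        keras.layers.Dense(units, activation=HPARAMS["activation"]),
        keras.layers.Dense(1),  # annual runoff output
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=HPARAMS["learning_rate"]),
        loss=HPARAMS["loss"],
    )
    return model
```
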