New high-resolution data sets for near-surface daily air temperature
(minimum, maximum and mean) and daily mean wind speed for Europe
(the CORDEX domain) are provided for the period 2001–2010 for the
purpose of regional model validation in the framework of DecReg,
a sub-project of the German MiKlip project, which aims to develop
decadal climate predictions. The main input data sources are SYNOP observations, partly supplemented by station data from the
ECA&D data set (

In climate research, meteorological observations are preferably provided as continuous regular grids. In this form the data can be used for regional or global climate monitoring and for comparison with the output of numerical weather prediction models and climate models. Among the main and most reliable primary data sources are measurements taken at ground station networks such as SYNOP (synoptic observations recommended by the World Meteorological Organization, WMO). Interpolation or averaging procedures are used to transform such point data into values representative of grid cells of regular size and spacing.

For near-surface temperature a variety of gridded observational
data sets are available. One of the most prominent global data sets is the
HadCRUT4 data set provided by the UK Met Office Hadley Centre and the
Climatic Research Unit (CRU) at the University of East Anglia

In addition to the pure observational data sets, so-called reanalysis data are an alternative source. Measurements from surface stations, radiosondes and satellites for different meteorological parameters serve as input data for the assimilation scheme of weather prediction models. This yields three-dimensional gridded data for a variety of parameters describing the initial state of the atmosphere. However, the dependency on model physics makes reanalysis data unsuitable for the evaluation of model forecasts.

Concerning the near-surface wind speed, very few gridded
observational data sets are currently available. On a national level a few
efforts have been made to calculate horizontal wind fields based on
station reports

Within DecReg (decadal regional predictability), a sub-project of
MiKlip (decadal climate predictions), the predictive skill of regional
climate models on a decadal timescale is investigated using hindcast
experiments. Independent gridded observational data sets for the
European region are used as reference. In order to meet the spatial
scale of these models (

A variety of interpolation methods can be used to derive continuous field
data based on point measurements (see overview given by, for example,

In this work a combination of regression and kriging is used to
compute gridded data in 0.044

The main input data sources used in this work are SYNOP reports. Daily minimum
temperatures (

Figure

Daily

Accuracy of daily mean temperatures

Differences of daily temperature data from SYNOP and ECA&D for
January 2010 at stations with identical coordinates. Mean and maximum
absolute daily deviation for minimum temperature

The national weather agencies contributing data to ECA&D determined
daily temperature parameters based on partly deviating observation intervals.
This leads to potential discrepancies between SYNOP (using consistent
intervals; see Sect.

To take account of this consistency problem, an algorithm was designed to
include suitable data from the ECA&D archive. It considers the
density of SYNOP reports in each target area and compares the data at
identical stations (with the same coordinates and altitudes). The differences
at these stations were used as an
indicator of the consistency in the target region for each
day. Depending on the presence of SYNOP station data in a certain area,
different thresholds for considering or rejecting ECA&D station data
were used (e.g., daily deviation required to be smaller than 1

In Fig.

The temporal evolution during 2001–2010 of input data for the different
parameters is illustrated in Fig.

Temporal evolution of total input data used for the interpolations 2001–2010 (see color code). The dotted curves show the base number of SYNOP stations. The differences indicate the increase due to the inclusion of ECA&D data. For wind speed, only SYNOP data are used for the interpolations.

All input data (the hourly data of each day and the daily data of each month) were quality checked for different types of inhomogeneities: (1) outliers, (2) significant shifts in the time series, (3) constant data over longer intervals and (4) exceedance of climatological thresholds. The strategies are described in more detail below, using hourly temperature data as an example.
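
As a rough illustration of check (1), the following sketch computes outlier test values for one daily cycle of hourly temperatures. The names dtn and dtx with the suffixes 1 (absolute) and 2 (relative) follow the notation of the description below. The function name, the handling of a zero standard deviation and the choice to drop exactly one minimum and one maximum before averaging are assumptions made for this example, and no thresholds are applied.

```python
from statistics import mean, stdev


def outlier_test_values(cycle):
    """Return absolute and relative outlier test values for one daily cycle.

    cycle: hourly temperatures of one day (at least four values).
    """
    if len(cycle) < 4:
        raise ValueError("need at least four values per daily cycle")
    ordered = sorted(cycle)
    tmin, tmax = ordered[0], ordered[-1]
    core = ordered[1:-1]  # assumption: drop one minimum and one maximum
    core_mean = mean(core)
    core_sd = stdev(core)

    # Absolute test values: distance of each extreme from the trimmed mean.
    dtn1 = abs(core_mean - tmin)
    dtx1 = abs(core_mean - tmax)

    # Relative test values: scaled by the spread of the remaining values.
    # Assumption: a flat remainder (sd == 0) yields infinity for a nonzero
    # distance and 0.0 otherwise.
    def relative(distance):
        if core_sd > 0:
            return distance / core_sd
        return float("inf") if distance > 0 else 0.0

    return {"dtn1": dtn1, "dtx1": dtx1,
            "dtn2": relative(dtn1), "dtx2": relative(dtx1)}


# Hypothetical daily cycle with one suspicious spike of 40 degrees.
values = outlier_test_values([10.0, 11.0, 12.0, 13.0, 14.0, 40.0])
print(values)
```

In this made-up cycle the maximum stands out far more clearly than the minimum in both the absolute and the relative test value, which is the kind of contrast a threshold decision would act on.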

For type 1 the minimum and maximum of each daily cycle at a target
station are subtracted
from the mean of the cycle, with the extremes omitted from the
averaging. These test values (denoted dtn and dtx in the program) are
considered both in absolute terms (adding notation 1) and in relative terms
(divided by the standard deviation of the daily values without extremum,
notation 2). Based on experiments with data from several example months,
empirical thresholds were determined to decide whether a value is
considered an outlier. Depending on the number of daily
values and the
comparison of absolute vs. relative test value, these thresholds range
between 8 and 14

For inconsistencies of types 2 and 3 a running standard deviation
(SD

In the case of wind speed, similar techniques, but with adjusted thresholds, were
developed to detect erroneous data. Concerning climatological
thresholds, a global upper value of 65

For the time series of daily values similar tests following the
strategies for the hourly data are applied. This quality check is
particularly important for the SYNOP extreme values and for all ECA&D
data because related hourly data are often not available. For stations
where both extreme values and hourly raw data are available,
a consistency check between extremes based on these hourly data and the
aggregated extremes (denoted “hrl” and “agg” in the following) is
made (e.g., minimum

The seven regions used for temperature interpolation. The regional fields are finally merged using the regional weights (grey color scale). Stations used to calculate regional lapse rates are orange.

The hourly data are also used to fill gaps in monthly time series of
the extremes, if the overlap between, for example,

During the interpolation process a monthly background field for each
parameter is first created. For this purpose, only time series with no more
than six missing values within a considered month are used.
To achieve a more precise estimate of the monthly
mean, missing values are reconstructed by linear regression with
neighboring stations (depending on available
stations search radii increased stepwise to a maximum of

For temperature interpolation, a regression kriging approach

Predictor fields of altitude

The steps are performed separately in seven overlapping regions
(see Fig.

As the first step, a multiple linear regression of the monthly means for each
station against data fields of altitude (using elevation data from the
shuttle radar topography mission (SRTM; see

Among the three predictors, altitude is the most crucial because
temperature typically depends strongly on it and altitude changes
occur on very small spatial scales. Thus, linear regression against
altitude can substantially improve the final interpolation results in
regions with pronounced orography. In a new setup
applied in this work the dependency of monthly temperature on
altitude is determined first and independently of the two other
predictors on the basis of station data from mountainous areas. This
strategy was chosen because height coefficients derived from the
standard multiple regression are potentially affected by strong
horizontal temperature contrasts. For example, if mountain stations are
concentrated in a part of a region with relatively low temperatures,
lapse rates tend to be overestimated in the previous setup.
Implausible lapse rates were diagnosed under such
conditions, especially for the Scandinavian
region. We could
solve this issue with an independent regression step on altitude based on
station data from valleys and mountains in relatively close proximity. The orange dots in Fig.

A second modification of the altitude regression setup was implemented in this work: daily temperature–altitude dependencies are also estimated, using the same strategy as above. In this way day-to-day variations, which occur especially in winter, can be taken into account.
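
The separate altitude regression can be illustrated with an ordinary least-squares fit of temperature against station height. This is a minimal sketch, not the implementation used for the data set: the function name and the station values are hypothetical, and the selection of mountain and valley stations is assumed to have happened beforehand. The same function could be applied to monthly means or to the values of a single day.

```python
def altitude_regression(altitudes_m, temperatures_c):
    """Least-squares fit T = intercept + slope * z.

    Returns (slope, intercept); the slope is in degrees Celsius per metre
    and is negative when temperature decreases with height.
    """
    n = len(altitudes_m)
    if n < 2 or n != len(temperatures_c):
        raise ValueError("need two or more matching altitude/temperature pairs")
    mean_z = sum(altitudes_m) / n
    mean_t = sum(temperatures_c) / n
    sxx = sum((z - mean_z) ** 2 for z in altitudes_m)
    if sxx == 0:
        raise ValueError("all stations have the same altitude")
    sxy = sum((z - mean_z) * (t - mean_t)
              for z, t in zip(altitudes_m, temperatures_c))
    slope = sxy / sxx
    return slope, mean_t - slope * mean_z


# Hypothetical valley and mountain stations in one region.
altitudes = [200.0, 500.0, 900.0, 1500.0]
temperatures = [8.7, 6.75, 4.15, 0.25]
slope, intercept = altitude_regression(altitudes, temperatures)
print(f"lapse rate: {slope * 1000:.2f} K per km")
```

Restricting the input lists to nearby valley and mountain stations is what keeps horizontal temperature contrasts out of the fitted slope; the fit itself is unchanged.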

Comparison of regression setups used to estimate temperature lapse
rates for mean temperature in Scandinavia.

For 2010 we compared the height coefficients according to the previous
setup (setup 1, involving all predictors and station data) and the new setup
(setup 2, separate regression with subsets). Reference lapse rates were
calculated using the regional averages of representative data pairs from
mountains (highest available station in a target area, at least
700 m a.s.l.) and nearby valleys. The data of up to two suitable valley
stations (at least 400 m below the height of the mountain station)
were used for each mountain station to obtain a robust average. A minimum
distance of 0.8

In comparison to the daily reference lapse rates (Fig.

Latitudinal changes of the solar radiation as well as land–sea distribution and atmospheric dynamics (preferentially leading to a zonal air mass exchange) affect the predictor parameter of the long-term zonal monthly mean temperature. Continentality reflects the buffering effect of the oceans on annual temperature changes. In contrast to altitude, these two predictor fields exhibit moderate spatial changes. Their potential for improving the interpolation is thus important in regions with a low observation density, e.g., in North Africa.

Another modification compared to

Whole-domain averages of spatial variance explained by single predictors (%) for monthly mean temperature and 4 tested months. In the bottom row the results for the multiple regression model involving all predictors are given.

In Table

The monthly regression residuals (observations minus values according
to regression model) are interpolated on a

In the final step the differences between daily and monthly temperatures are interpolated following the same concept as above. Before this daily interpolation all daily anomalies are height-normalized using the daily regression coefficients (correcting deviations from the monthly mean) for the temperature–altitude relationship determined in step one. The daily temperature field is finally calculated as the sum of the monthly temperatures, the interpolated daily height-normalized residuals and the reversal of the height normalization.
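
The bookkeeping of this final step can be sketched for a single station and a single grid cell. The sketch assumes that the height normalization subtracts a linear term `daily_coeff * altitude` from the daily anomaly and that the reversal adds the same term back at the grid-cell altitude. The function names and numbers are hypothetical, and the kriging of the normalized anomalies is omitted.

```python
def normalize_anomaly(daily_temp, monthly_temp, altitude_m, daily_coeff):
    """Height-normalized daily anomaly at a station.

    daily_coeff: assumed daily deviation of the temperature-altitude
    coefficient from its monthly value, in degrees per metre.
    """
    anomaly = daily_temp - monthly_temp
    return anomaly - daily_coeff * altitude_m


def daily_grid_value(monthly_grid, anomaly_grid, grid_altitude_m, daily_coeff):
    """Daily temperature of a grid cell.

    Sum of the monthly field, the interpolated height-normalized anomaly
    and the reversal of the height normalization.
    """
    return monthly_grid + anomaly_grid + daily_coeff * grid_altitude_m


# Hypothetical station at 800 m on a day with a deviating lapse rate.
coeff = -0.002
normalized = normalize_anomaly(-3.0, 1.5, 800.0, coeff)

# For a grid cell at the station's own position and altitude, the
# observed daily value is recovered.
recovered = daily_grid_value(1.5, normalized, 800.0, coeff)
print(normalized, recovered)
```

Normalizing before interpolation means that the interpolated anomaly field contains no altitude signal of its own, so the altitude effect enters the final field only through the explicit reversal term.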

An important aspect in the interpolation using kriging is the adjustment of
the kriging parameters (for details see, for example,

The corresponding graph, illustrated in Fig.

Several strategies for fitting the variogram function to the station data in
each region were tested in this work. First, a “null” variogram is defined
based on experiments with data from 4 example months (January 2001,
July 2001, January 2010, July 2010). Using cross validation, i.e., leaving out
one data point at a time and reproducing it based on the information from
the remaining stations, different combinations of the three parameters are
tested. The best-performing parameter values define the “null” variogram.
An automated function for fitting variograms

Idealized variogram with the parameters nugget, sill and range
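
To make the three parameters concrete, the following sketch evaluates a spherical variogram model. The spherical form is an assumption chosen for illustration, since the model type is not stated here, and the sill is taken to be the total sill including the nugget. The function name and parameter values are hypothetical.

```python
def spherical_variogram(distance, nugget, sill, range_):
    """Semivariance of a spherical model at a given separation distance.

    nugget: semivariance just above zero distance.
    sill:   total semivariance reached at and beyond the range.
    range_: distance at which the sill is reached (same unit as distance).
    """
    if distance < 0 or range_ <= 0 or not 0 <= nugget <= sill:
        raise ValueError("invalid distance or variogram parameters")
    if distance == 0:
        return 0.0
    if distance >= range_:
        return sill
    ratio = distance / range_
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)


# Hypothetical parameters: nugget 0.2, sill 1.0, range 300 km.
for h in (0.0, 50.0, 150.0, 300.0, 600.0):
    print(h, round(spherical_variogram(h, 0.2, 1.0, 300.0), 3))
```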

For the interpolation of daily wind speed a new method based on the
concept used for temperature was developed. Different predictor fields
correlated with wind speed were tested and chosen. Again, the seven
regions displayed in Fig.

Alpine region (denoted region 8 in the following) introduced for the interpolation of wind speed.

After testing a variety of potential predictor fields, four parameters
were chosen for the linear regression (see Fig.

Coastal distance is also of high relevance for the mean wind speed in
10

Surface roughness describes the deviations of a surface from an ideal
smooth form. On the Earth's surface obstacles such as bushes, trees or
buildings increase the surface roughness and thus affect the movement
of air. According to theory the wind speed change with distance from
the surface shows the following simplified dependency (under neutral
stability conditions) on the roughness length

Predictor fields of exposure

In addition to the three “static” predictor parameters above, the use of
meteorological field data can provide valuable information for regions with
low station coverage.

Despite the filter algorithm applied for reanalysis data in regions of
high altitudes, a slight dependency of reanalysis data on exposure and
roughness length remains for areas like the Alps, the Atlas Mountains,
the Caucasus and parts of Turkey. Thus, for the Alps, where a good
coverage with station data is available, a new region
(Fig.

Whole-domain averages of the spatial variance explained by single predictors (%) for monthly mean wind speed and 4 tested months. The bottom row shows the result for the multiple regression model involving all predictors.

Regional variogram parameters nugget (relative to sill) and range
(

Regional variogram parameters nugget (relative to sill) and range
(

The four predictors are used for the regression of the data of monthly
mean wind speed. In contrast to temperature, wind speed data tend to
produce a logarithmic distribution. Therefore, ratios between monthly
wind data and the corresponding area mean of the related region are
considered. Core regions relevant for the determination of the
regression coefficients in each region were defined in the same way
as for temperature. For the new region 8, weights above 0.5 define its
core area. The test results of the monthly regression for the 4
example months are listed in Table

Linear regression on a daily basis was also tested, focusing
especially on the predictive skill of the daily ERA-Interim reanalysis
data. We found good correlations between ERA-Interim and the
daily observations. On average, 31 % of the daily variance could be
explained by ERA-Interim over the 4 tested months (same as above).
Thus, an additional regression step on a daily basis is applied using
daily anomalies of ERA-Interim 850

Steps in the interpolation of daily mean temperature for 31 July
2010.

Following the same scheme as described for temperature, the
normal-score transformed residuals of the monthly regression are
interpolated using simple kriging. Again, a “null” variogram
optimized on the basis of cross-validation experiments for 4 tested
months (same as above) was determined for monthly and daily means in
each region. The results are shown in Table

After normal-score back transformation the gridded monthly residuals
are added to the gridded regression values and multiplied by the
absolute mean wind speed of the considered region (correcting the
normalization applied before regression) to obtain the monthly field
of

In the daily kriging step the daily anomalies with respect to the
monthly mean at each station are interpolated. Here, ratios instead
of absolute deviations are considered, respecting the characteristics
of wind speed distribution. As noted in Sect.

Steps in the interpolation of daily mean wind speed for 28 February
2010.

For each of the three interpolation steps uncertainty estimates are
recorded. For the two kriging steps the kriging variance is used as
a measure of uncertainty. Kriging variance is known to lack precision
on a local scale, since local variation of the data is not considered
in the estimation of uncertainty. More sophisticated approaches were
suggested by

Detailed image of central Europe for

Furthermore, cross validation for all data within the example years
2001 and 2010 is applied to obtain error estimates for all station
coordinates. In Sect.

In Fig.

Relative explained variance for monthly mean wind speed and for the monthly mean of the three temperature parameters (see color code) for 2001–2010.

Corresponding results for daily mean wind speed on 28 February 2010 are shown
in Fig.

Annual cycles (means 2001–2010) of relative explained variance for
monthly mean temperature (

In the following the results of the monthly regression analysis for
the full decade 2001–2010 are presented for wind speed and for the three temperature parameters. In Fig.

This aspect is investigated more closely in Fig.

For wind speed (Fig.

For two years, 2001 and 2010, the quality of the final interpolation
product is evaluated by applying “leave one out” cross validation
(as defined in Sect.

Cross-validation results for daily mean temperature data in
January 2010

The RMSE is 1.68

Figure

Cross-validation results for daily mean wind speed data
January 2010

In Fig.

The variability curves in Fig.

Besides the accuracy of the interpolation, expressed here in the global
measure RMSE, its ability to preserve the observed spatial variability is
also of importance. Some methods tend to smooth small-scale features

Annual cycle of daily RMSE according to cross validation for 2001
(black) and 2010 (blue) for

To assess the characteristics of the temperature data set in comparison with
the daily E-OBS grid data (version 13.0;

As shown in Sect.

To evaluate the temperature grid data in mountain regions, a comparison with
ERA-Interim reanalysis temperatures at 850

Time series of the ratio of spatial variance of interpolated vs. observed station data for the years 2001 and 2010 based on cross validation. The color code indicates the parameters and years.

The outcomes of this comparison for January 2010 are shown in Fig.

In a further analysis the ERA-Interim temperatures at 850

Comparison of daily temperature grid data from DecReg and E-OBS for
January 2010:

Overall, E-OBS and DecReg mountain temperatures at around 1500

The observation intervals of the daily E-OBS data are not consistent throughout the domain, which is a result of the differing procedures applied by the national weather agencies providing data to ECA&D. The SYNOP input data used in DecReg are all based on the same daily intervals. Thus, potential discontinuities of temperature fields near national borders are avoided and comparability with model data for defined intervals is ensured.

Apart from the causes discussed above, differences in the distribution
of the input station data used in E-OBS and DecReg can also lead to deviating
grid data. This aspect is important for regions where the density of stations
is generally low, as observed around the Mediterranean Sea, especially in the
early years of the decade (compare Fig.

As mentioned in Sect.

Comparison of daily temperature grid data at altitudes of around
1500

The outcomes for

Comparison of cross-validation results with gridded uncertainty
estimates for daily mean temperature in January 2010

For wind speed (Fig.

In this work interpolation schemes for daily station data of minimum,
maximum and mean temperature as well as daily mean wind speed in
0.044

As an important prerequisite for the interpolation, a pre-processing step to derive daily means from hourly SYNOP data, combined with thorough quality control, was established. Detailed quality control procedures were also developed for the other input data, i.e., the daily extreme temperatures and the data of the ECA&D archive. In order to maintain consistency with SYNOP, a selection algorithm was implemented that controls the integration of ECA&D data in regions where SYNOP data are sparse and consistency between the two sources is high.

For the time period 2001–2010 the spatial variation of the monthly means can be well explained by the predictors. We obtain relative explained variances in the range of 80–90 % for the temperature parameters and about 50–60 % for wind speed.
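
The relative explained variance quoted here can be read as the coefficient of determination of the regression. A minimal sketch under that assumption, with a hypothetical function name and made-up station values:

```python
def explained_variance(observed, predicted):
    """Fraction of the variance of the observations explained by a model."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more matching value pairs")
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observations have no variance")
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical monthly means at four stations and their regression estimates.
station_means = [1.0, 2.0, 3.0, 4.0]
regression_values = [1.1, 1.9, 3.2, 3.8]
share = explained_variance(station_means, regression_values)
print(f"{100 * share:.0f} % of the spatial variance explained")
```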

Cross validation is performed for the years 2001 and 2010 to assess
the quality of the daily interpolation products. For daily mean
temperature, RMSEs of about 1–2

Concerning the conservation of spatial variance, very good performance is found for the temperature parameters. In the interpolation products 90–100 % of the observed variance is typically preserved. Lower values are occasionally recorded only for minimum temperature. For daily wind speed, 60–80 % of the original variance is preserved after interpolation. The relatively high degree of unexplained small-scale variance leads to a smoothing of the wind data.
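
Both cross-validation measures used in this assessment, the RMSE and the ratio of interpolated to observed spatial variance, can be computed from pairs of observed and cross-validated values. The sketch below assumes one such pair per station for a single day; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import pvariance


def cross_validation_scores(observed, estimated):
    """Return (rmse, variance_ratio) for one day of cross-validation pairs.

    variance_ratio is the spatial variance of the estimates divided by the
    spatial variance of the observations; values below 1 indicate smoothing.
    """
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two or more matching value pairs")
    squared_errors = [(e - o) ** 2 for o, e in zip(observed, estimated)]
    rmse = sqrt(sum(squared_errors) / len(squared_errors))
    var_obs = pvariance(observed)
    if var_obs == 0:
        raise ValueError("observations have no spatial variance")
    return rmse, pvariance(estimated) / var_obs


# Hypothetical daily mean wind speeds in m/s; the estimates are smoother
# than the observations.
obs = [2.0, 4.0, 6.0, 8.0]
est = [3.0, 4.5, 5.5, 7.0]
rmse, ratio = cross_validation_scores(obs, est)
print(f"RMSE {rmse:.2f} m/s, variance ratio {ratio:.2f}")
```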

Comparison of cross-validation results with gridded uncertainty
estimates for daily mean wind speed in January 2010

The cross-validation results are also used to evaluate the quality of the gridding uncertainty based on kriging variance and regression errors. On average, a reasonable consistency between these data is found. Nevertheless, temporal and spatial variations of uncertainty occurring on small scales are not adequately reflected in the gridded uncertainties.

In comparison with the E-OBS temperature data
occasional discrepancies of more than 5

The regression kriging approaches used in this work for the
interpolation of daily temperature and wind speed observations on
a grid size of 0.044

For the dependency of temperature on altitude more reliable regression
results are obtained by performing this regression separately and on
the basis of representative stations. Also, day-to-day variations of
this dependency are considered in the new setup used in this
study. Nevertheless, the linear regression approach applied to the
relatively large areas of each region is not capable of reflecting
nonlinear vertical temperature changes and spatial differences of
this parameter within a region. More complex approaches considering
this issue in the calculation of high-resolution grid data in
mountainous regions have been published

Concerning the regression of wind speed, a considerable part of spatial variance on a monthly basis (40–50 %) remains unexplained by the predictors used in this work. For predictor fields of exposure, coastal distance and roughness length it would be more realistic to take into account the current wind direction and local predictor conditions determined for this wind direction. This strategy would introduce further complexity in the calculations. However, the percentage of variance explained by predictors as well as the final interpolation accuracy could likely be increased.

The gridded error estimates calculated for the daily and monthly
products are, on regional average, reasonable, but for certain days and
areas these estimates are found to be unrealistic. An alternative
approach yielding more reliable errors

However, users of these grid data are advised to consider the IQR uncertainty fields provided in separate files in their analyses. Especially in parts of North Africa the uncertainties are usually very high because observations are very sparse. To deal with this issue, tolerable IQR thresholds for a specific analysis could be defined to exclude regions with less reliable data.
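
Such a threshold-based exclusion could look like the sketch below. It treats the grid and its IQR uncertainty field as plain nested lists; the function name, the threshold and the values are hypothetical, and reading the fields from the published files is left out.

```python
def mask_by_iqr(values, iqr, max_iqr):
    """Replace grid values whose IQR uncertainty exceeds max_iqr by None."""
    if len(values) != len(iqr):
        raise ValueError("value and IQR grids differ in shape")
    masked = []
    for value_row, iqr_row in zip(values, iqr):
        if len(value_row) != len(iqr_row):
            raise ValueError("value and IQR grids differ in shape")
        masked.append([v if u <= max_iqr else None
                       for v, u in zip(value_row, iqr_row)])
    return masked


# Hypothetical 2 x 2 temperature grid with a poorly observed second row.
temperature = [[12.1, 13.4], [25.0, 27.3]]
uncertainty = [[0.8, 1.1], [3.5, 4.2]]
reliable = mask_by_iqr(temperature, uncertainty, max_iqr=2.0)
print(reliable)
```

The appropriate value of `max_iqr` depends on the analysis; a regional average, for example, may tolerate larger cell-level uncertainty than a study of local extremes.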

The data sets presented in this article are published at

The authors would like to acknowledge the Bundesministerium für
Bildung und Forschung (BMBF) for funding this project. The
colleagues in the department Regional Climate Monitoring at
Deutscher Wetterdienst (DWD) are acknowledged, especially Maya Körber and Andrea Kreis for their support with technical issues
related to UNIX and database queries. We also thank our partners
participating in the DecReg project for the fruitful cooperation
during the 26 months. Furthermore, the E-OBS data set from the EU-FP6
project ENSEMBLES (