MOPREDAScentury: a long-term monthly precipitation grid for the Spanish mainland

Beguería, Santiago; Peña-Angulo, Dhais; Trullenque-Blanco, Víctor; González-Hidalgo, Carlos

doi:https://doi.org/10.5194/essd-15-2547-2023

Articles | Volume 15, issue 6

https://doi.org/10.5194/essd-15-2547-2023

Articles | Volume 15, issue 6

Data description paper

23 Jun 2023

Data description paper |

| 23 Jun 2023

MOPREDAScentury: a long-term monthly precipitation grid for the Spanish mainland

Santiago Beguería, Dhais Peña-Angulo, Víctor Trullenque-Blanco, and Carlos González-Hidalgo

Abstract

This article describes the development of a monthly precipitation dataset for the Spanish mainland, covering the period between December 1915 and December 2020. The dataset combines ground observational data from the National Climate Data Bank (NCDB) of the Spanish meteorological service (AEMET) and new data rescued from meteorological yearbooks published prior to 1951 that were never incorporated into the NCDB. The yearbooks' data represented a significant improvement of the dataset, as it almost doubled the number of weather stations available during the first decades of the 20th century, the period when the data were more scarce. The final dataset contains records from 11 312 stations, although the number of stations with data in a given month varies largely between 674 in 1939 and a maximum of 5234 in 1975. Spatial interpolation was used on the resulting dataset to create monthly precipitation grids. The process involved a two-stage process: estimation of the probability of zero precipitation (dry month) and estimation of precipitation magnitude. Interpolation was carried out using universal kriging, using anomalies (ratios with respect to the 1961–2000 monthly climatology) as dependent variables and several geographic variates as independent variables. Cross-validation results showed that the resulting grids are spatially and temporally unbiased, although the mean error and the variance deflation effect are highest during the first decades of the 20th century, when the observational data were more scarce. The dataset is available at https://doi.org/10.20350/digitalCSIC/15136 under an open license and can be cited as Beguería et al. (2023).

Download & links

How to cite.

Received: 18 Feb 2023 – Discussion started: 02 Mar 2023 – Revised: 12 May 2023 – Accepted: 21 May 2023 – Published: 23 Jun 2023

1 Introduction

Sea and land weather station records are crucial information sources to study the evolution of climate over the last century and beyond and are the result of the sustained effort of many volunteers and climate and weather agencies around the world (see Strangeways, 2007). A large number of projects have focused on collecting and curating data from different sources in order to improve the spatial and temporal coverage of the datasets and even rescue old data that had not been digitized and remain unknown to the broad public. These efforts are particularly required in regions with large spatial variability and heterogeneous precipitation regimes, such as Mediterranean climate regions of the world. Especially in those areas, however, research does not provide unanimous results; for example, trend analyses show differences according to the period selected, dataset, or study area (Hoerling et al., 2012; Mariotti et al., 2015; Zittis, 2018; Deitch et al., 2017; Caloiero et al., 2018; Peña-Angulo et al., 2020; among many others).

In parallel with these efforts, many research groups have focused on developing spatial and temporal complete grids that override the fragmentary character of observational (station-based) datasets. The development of gridded climatic datasets from point observations has experienced a fast development in the first decades of the 21st century, aided by the tremendous improvement of computing capabilities and the implementation of complex interpolation methods in standard statistical packages and programming languages (New et al., 2002; Hijmans et al., 2005; Harris et al., 2014, 2020; Schamm et al., 2014). Gridded datasets offer numerous advantages over point-based observational data that make them best suited to climate and environmental studies. While observational datasets are limited to the locations of climatic stations and the time series are often fragmentary in time, gridded datasets offer a continuous spatial and temporal coverage. Having a continuous coverage is most relevant for computing regional or even global averages, which are crucial in climate change studies. Gridded data are also often a necessity as simulation model inputs, which usually require continuous climatic forcing data.

Users of gridded data, however, must not forget that grids are in fact models and not directly observed data, and as such they are not devoid of issues. Interpolation methods are not perfect, and they have inherent problems such as the deflation of (spatial and temporal) variance, as we discussed in Beguería (2016). Also, since the spatial and temporal coverage of observational datasets is often not homogeneous (some areas and time periods are over-represented, while others may lack any data), there are potential sources of bias. Despite this, gridded datasets are currently used in the vast majority of studies that make use of climate data.

In a previous work we described the development of a gridded dataset of monthly precipitation for Spain, MOPREDAS, spanning 1946–2005 (González-Hidalgo et al., 2011). Other gridded precipitation datasets have later been developed for Spain with a daily temporal resolution, such as Spain02 for 1950–2003 (Herrera et al., 2012), SAFRAN-Spain (1979–2014; Quintana-Seguí et al., 2016, 2017), SPREAD (1950–2012; Serrano-Notivoli et al., 2017), AEMET-Spain (1951–2017; Peral et al., 2017), or Iberia01 (1971–2015; Herrera et al., 2019). Currently, no dataset exists spanning back to the first decades of the 20th century. This is due in part to the drastic decrease in the number of available observations prior to 1950. The objective of this article is to describe the development of the MOPREDAScentury dataset, a gridded dataset of monthly precipitation over mainland Spain covering the period 1916–2020, aimed at becoming a reference spatial–temporal dataset to assess changes in the spatial and temporal patterns of precipitation over Spain. The process includes the rescue of old records not included in the National Climate Data Bank (NCDB) of the Spanish meteorological service (AEMET) which allow the observational sample to be increased and are critical for developing a gridded dataset. The text describes this data rescue process and the spatial interpolation, presents the main results of a cross-validation assessment, and discusses several issues related to the development of the dataset.

2 Data and methods

The development of the MOPREDAScentury dataset encompassed two distinctive steps: (i) improving the observational dataset available in digital format, especially for the first half of the 20th century, and (ii) using spatial interpolation techniques to create the gridded dataset. This section describes both steps, as well as the procedure used for evaluating the resulting dataset.

2.1 Data rescue (yearbooks)

The MOPREDAScentury dataset combines land-based weather station data digitized and stored in the National Climate Data Bank (NCDB) and newly digitized records from meteorological yearbooks (YB) that were published by different government offices until 1950 such as Ministerio de Fomento, Servicio Meteorológico (a part then of the Instituto Geográfico y Catastral) and Ministerio del Aire. The data rescue process from the yearbooks was carried out in two main steps: (a) digitization and (b) matching with the data series in the NCDB. Digitization was carried out by manual reading and typing the data into digital files, using the scanned version of the YB collection stored at AEMET's public repository (https://repositorio.aemet.es, last access: 12 June 2021). Matching the digitized data series with those in the NCDB proved to be a laborious task, as the identification of the weather stations in the YB was not consistent across the books and did not always coincide with the NCDB. Similar difficulties were found when rescuing temperature data to develop the MOTEDAScentury dataset (Gonzalez-Hidalgo et al., 2015, 2020; a detailed description of the matching process can be found in these references). The rescued yearbooks' data had a fair level of overlapping with the NCDB, but they allowed us to fill in gaps and extend many time series back into the first decades of the 20th century. There were also a number of data series that were completely new.

The augmented dataset resulting from the combination of the NCDB and the YB rescued data was subjected to a quality control. Thus, the observations were automatically flagged as suspicious in the following cases:

sequences of 12 identical monthly values occurring in different years in the same station, or in the same year in different stations;
sequences of 7 or more consecutive months with zero precipitation in the same station;
individual months with precipitation equal to or greater than 1000 mm.

The flagged data (suspicious values) were then manually checked in their original sources (books) to discard digitization errors, in which case they were corrected, and if not they were compared against three or four neighbouring stations to decide whether to maintain or discard them. An example of data rejection is provided in Table 1.

Table 1Example of an anomalous value discarded by comparison with their nearest neighbours (December of 1920).

Download Print Version | Download XLSX

2.2 Spatial interpolation (two-step method)

We use geostatistical techniques for the interpolation of monthly precipitation. Geostatistics is now a well-known field, and it has been presented in a wide range of introductory texts (Goovaerts, 1997), so we should provide only a brief summary here. The key element in geostatistics is the variogram (or, more commonly, the semivariogram), which is a function that relates the semi-variance γ between any pair of measurements to the spatial distance between them, h:

\begin{matrix} (1) & γ (h) = \frac{1}{2} E [{\{Z (x) - Z (x + h)\}}^{2}] . \end{matrix}

An empirical semivariogram can be constructed from a set of geographically explicit measurements by analysing all the possible paired observations, and a mathematical model can then be fit to provide a continuous estimation of the relationship between any pair of points. This function can then be used to derive interpolation weights, being the basis of a family of interpolation methods known as Gaussian process regression or, in the geostatistical literature, kriging. Kriging interpolation yields best linear unbiased predictions (BLUPs) at unsampled locations, being a major reason for its widespread use.

The most frequently used form of kriging is ordinary kriging (OK), in which the interpolated values are linear weighted averages of the n available observations, z(x), and an unknown constant value, Z(x₀):

\begin{matrix} (2) & \hat{z} (x) = \sum_{i = 0}^{n} λ_{i} z (x_{i}) - Z (x_{0}), \end{matrix}

where λ_i are the interpolation weights, with the condition that they sum to 1 ( $\sum_{i = 0}^{n} λ_{i})$ , so the interpolation is unbiased. Although kriging does not require any distribution assumptions on the data, OK relies on second-order stationarity. That is, it is assumed that the expected value of Z(x₀) is constant over the spatial domain ( $E [Z (x_{0})] = μ (x) = m)$ and that the covariance for any pair of observations depends only on the distance between them ( $E [\{Z (x) - m\} \{Z (x + h) - m\}] = c (h))$ .

Here we used two extensions of OK, universal kriging (UK) and indicator kriging (IK). Universal kriging relaxes the first assumption and allows a spatially non-stationary mean to be dealt with, sometimes called a spatial trend. The interpolated values thus consist of a deterministic part (the trend), μ(x), and a stochastic part or residual, ρ(x):

\begin{matrix} (3) & \hat{z} (x) = μ (x) + ρ (x) = \sum_{k = 0}^{m} α_{k} f_{k} (x) + \sum_{i = 0}^{n} λ_{i} z (x_{i}), \end{matrix}

where f_k(x) are spatially varying variables and α_k are unknown regression coefficients. Therefore, UK allows for the inclusion of co-variables as predictors for the interpolation and can therefore be viewed as a mixed-effect model or a combination of regression and interpolation.

Indicator kriging, on the other hand, is useful for binary variables (event/no event) and provides an estimation of the transition probability. It uses an indicator function to transform the variable into a binary outcome instead of working with the original variable, yielding event probabilities as a result, $\hat{p} (x) = \hat{p} (z (x) = 0)$ . IK can be based on either OK or UK, accepting co-variables as spatial predictors in the latter case.

Here we adopted a two-step approach, consisting of using IK for predicting precipitation occurrence and UK for predicting the precipitation magnitude. This is an approach most commonly used for the interpolation of daily precipitation (Hwang et al., 2012; Serrano-Notivoli et al., 2019) and less so for monthly data. In the case of our study area, as we will see later on, the frequency of zero-precipitation months is not irrelevant, so a two-step approach was advisable. Therefore, in a first step we used the following indicator function to transform the observed variable in millimetres into zero-precipitation events:

\begin{matrix} (4) & I (x) = \{\begin{array}{cc} 1 & if z (x) = 0 \\ 0 & otherwise \end{array} . \end{matrix}

Then, we used indicator kriging to obtain estimated zero-precipitation probabilities, $\hat{p} (x)$ . In a second step we used universal kriging for estimating precipitation magnitude, $\hat{z} (x)$ . Once the two predictions were performed, we combined them into a single estimated precipitation field z^′(S) according to the following rule:

\begin{matrix} (5) & z^{'} (x) = \{\begin{array}{cc} 0 & if \hat{p} (x) \leq p_{t} \\ \hat{z} (x) & otherwise \end{array}, \end{matrix}

where $p_{t} \in (0, 1)$ is a classification threshold. Determining the classification threshold is a complex task, since different values can be used that lead to better performance on the event of interest (zero monthly precipitation, in our case) at the cost of allowing more false negatives, or the opposite. We shall discuss the classification in the Discussion section of this article.

We used five co-variables for the deterministic part: the easting and northing coordinates, the altitude, the distance to the coastline (Fig. A1), and the monthly climatology: zero-precipitation probability for IK (Fig. A2) and mean precipitation for UK (Fig. A3). To obtain the climatologies we computed spatial fields of monthly mean precipitation using UK and the geographic covariates mentioned above, based on data from a sample of 1698 observatories with at least 35 years of data over the period 1961–2000. This period was selected because it contained the highest number of serially complete data series, while encompassing a long enough period (40 years) to allow for stable average values of the two variables of interest. In the Discussion section we provide a comparison of this approach using the original data and using a full normalization. All co-variables were re-scaled to a common range between 0 and 1 to facilitate model parameter fitting.

One peculiarity of the variable of interest (monthly precipitation) is that it can only take non-negative values. Also, when a number of observations are considered over a sufficiently large area, the data often show a skewed distribution. One common solution to both issues is to use a logarithmic transformation of the data, i.e. interpolating on ln (x) instead of x, an approach that is sometimes known as lognormal kriging in the geostatistics literature. This generates additional issues, though, as this approach tends to overestimate lower precipitation and underestimate high precipitation (Roth, 1998). Another drawback is that it does not allow us to interpolate observations of zero precipitation, which is sometimes solved by adding a small offset, for example, interpolating on ln (x+1), although this has other undesired effects, and it does not provide a good estimation of zero observations. Here we opted for using the original variable, i.e. without a log transformation, although we compare both options in the discussion section.

Another transformation that is often applied when interpolating precipitation fields, especially with weighted averaging methods, is to transform the original values to anomalies. This is in fact a way of normalizing the data in space, as it eliminates the differences that occur between locations that systematically tend to receive much higher or lower precipitation. As new observatories appear and disappear around a given point, this could lead to biases in the interpolation that could introduce anomalies in the interpolated series. Here we decided to use anomalies computed as ratios to the long-term climatologies (mean values) computed above.

Selection of the semivariogram model and parameter fitting are fundamental steps for kriging. There are many different semivariogram models available, and there is no general rule as to how to choose one over the other but the modeller's experience. While some models offer greater flexibility to adapt to the empirical semivariogram, parameter estimation can become a problem in some cases because there are no exact solutions, and the iterative algorithms used do not always yield good results. This is usually not a problem when performing one interpolation as the analyst often tries different options and checks that there are no substantial differences between the results, but it can be an issue when a large number of interpolations need to be done and an automatic process needs to be designed, as was the case here. We used the function krige from the gstat package for R (Pebesma, 2004; Gräler et al., 2016) to perform the kriging interpolations and autofitVariogram from the package automap (Hiemstra et al., 2009) to compute the semivariogram coefficients. The Matérn semivariogram model was the most frequently selected one, both for indicator and universal kriging. It is a highly flexible model that often yields optimum results. In some cases, though, a Gaussian model was preferred by the automated procedure, and in a few cases the automatic process was not able to converge to good parameter values, so a spherical or an exponential (less flexible but more robust) model was enforced. The frequency of each semivariogram model over the whole time period is provided, for IK and UK, in the additional material (Table A1).

Best linear unbiased predictions (BLUPs), characterized by their mean and standard deviation, were then cast over a point grid at regular distance over longitude and latitude, with a mean distance of 10 km between points.

2.3 Evaluation

Evaluation of the interpolation results is fundamental to fully understand the benefits of the interpolated dataset, the limitations, and the best use cases. Here we performed a thorough evaluation based on several statistics and checks, for both the IK (probability of zero precipitation) and the UK (precipitation magnitude) interpolations. To evaluate the performance of the interpolation method to estimate values at unmeasured locations, we followed a leave-one-out cross-validation (LOOCV) approach. This is an iterative process in which the interpolation is repeated as many times as there are data, each time removing one observation from the training dataset that is later used to compare the estimated and observed values. Although this is a time-consuming process, it allows us to obtain an independent sample that better represents the ability of the model to estimate values when no data are available. By not removing other observations that the one being used for evaluation, we could also test the effect of having a varying number of observations in the vicinity.

There is no consensus about the most appropriate statistic to evaluate binary classifications and their associated confusion matrices. A confusion matrix, also known as an error matrix, has four categories: true positives, TP (pred =0, obs =0); true negatives, TN (pred >0, obs >0); false positives, FP (pred =0, obs >0); and false negatives, FN (pred >0, obs =0); and the total positive (P) and negative (N) observations and the positive and negative predicted totals (PP = TP + FP and PN = TN + FN, respectively). A variety of statistics can be calculated based on these quantities, of which here we focused on the following ones.

The positive prediction value (PPV) is the fraction of positive predictions that are true positive: $PPV = PP / TP$ .
The negative predictive value (NPV) is the fraction of negative predictions that are true negative: $NPV = NP / TN$ .
The true positive rate (TPR) is the fraction of positive cases correctly predicted: $TPR = TP / (TP + FN)$ . It can be considered the probability of detection (if a case is positive, the probability that it would be predicted as such).
The true negative rate (TNR) is the fraction of negative cases that were correctly predicted: $TNR = TN / (TN + FP)$ . It can be considered a measure of how specific the test is (if a case is negative, the probability that it would be predicted as such).

The PPV and NPV are not intrinsic to the test as they also depend on the event's prevalence (fraction of positive cases in the observed sample). In a highly un-balanced sample, such as the case of zero precipitation in our dataset in which the proportion of station/months with zero precipitation is very low, these two statistics will be highly affected. The TPR and TNR, on the contrary, do not depend on prevalence, so they are intrinsic to the test. In diagnostic testing the TPR and TNR are the most used and are known as sensitivity and specificity, respectively. In informational retrieval, the main ratios are the PPV and TPR, where they are known as precision and recall.

We also computed two metrics that summarize the elements of the confusion matrix, so they can be considered overall measures of the quality of the binary classification. The F1 score is computed as

\begin{matrix} (6) & F 1 = \frac{2 TP}{2 TP + FN + FP} . \end{matrix}

As it can be seen, the F1 ignores the count of true negatives, so it places more emphasis on the positive cases (zero-precipitation months, in our case). The Matthews correlation coefficient (MCC), on the other hand, produces a high score only when the prediction results are good in all the four confusion matrix categories. It is equivalent to chi-square statistics for 2×2 contingency tables. Its value ranges between −1 and 1, with values close to zero meaning a bad performance (not higher than a random classifier), while 1 represents a perfect classification.

\begin{matrix} (7) & MCC = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP) (TP + FN) (TN + FP) (TN + FN)}} \end{matrix}

All these statistics vary between 0 and 1, where a high value (close to 1) is interpreted as indicating a high accuracy.

For the evaluation of the magnitude we used the following metrics:

the mean absolute error (MAE) and the relative MAE (RMAE), as global error measures;
the mean error (ME) and the relative ME (RME), as global bias measures;
the ratio of standard deviations (RSD), as a measure of variance deflation;
the Kling–Gupta efficiency (KGE), as an overall goodness-of-fit measure.

Model evaluation was performed globally considering the whole dataset but also for each month individually to make it possible to analyse the temporal evolution of the performance statistics.

Finally, in order to determine the benefits of the data rescue process, we compared the original (NCDB) and augmented (NCDB+YB) datasets. We used the same cross-validation scheme described above, but in this case the validation was restricted to the period covered by the yearbooks (1916–1950). Another important difference is that we only used the observations present in both the NCDB and NCDB+YB datasets for computing cross-validation statistics, although the whole datasets were used for performing the interpolation.

https://essd.copernicus.org/articles/15/2547/2023/essd-15-2547-2023-f01

Figure 1Temporal evolution of the number of active weather stations in the dataset: national digital data bank (NCDB), newly digitized yearbook (YB) datasets, and their combination (NCDB+YB).

MOPREDAScentury: a long-term monthly precipitation grid for the Spanish mainland

2.1 Data rescue (yearbooks)

2.2 Spatial interpolation (two-step method)

2.3 Evaluation

3.1 Data rescue and quality control

3.2 Gridded dataset

3.3 Validation: probability of zero precipitation

3.4 Validation: magnitude

3.5 Evaluation of the combined dataset