Homogenization of Portuguese long-term temperature data series: Lisbon, Coimbra and Porto

Three long-term temperature data series measured in Portugal were studied to detect and correct non-climatic homogeneity breaks and are now available for future studies of climate variability. Series of monthly minimum (Tmin) and maximum (Tmax) temperatures measured in the three Portuguese meteorological stations of Lisbon (from 1856 to 2008), Coimbra (from 1865 to 2005) and Porto (from 1888 to 2001) were studied to detect and correct non-climatic breaks. These series, together with monthly series of average temperature (Taver) and temperature range (DTR) derived from them, were tested in order to detect breaks, using firstly metadata, secondly a visual analysis, and thirdly four widely used homogeneity tests: von Neumann ratio test, Buishand test, standard normal homogeneity test, and Pettitt test. The homogeneity tests were used in absolute (using temperature series themselves) and relative (using sea-surface temperature anomalies series obtained from HadISST2.0.0.0 close to the Portuguese coast or already corrected temperature series as reference series) modes. We considered the Tmin, Tmax and DTR series as most informative for the detection of breaks due to the fact that Tmin and Tmax could respond differently to changes in position of a thermometer or other changes in the instrument's environment; Taver series have been used mainly as control. The homogeneity tests showed strong inhomogeneity of the original data series, which could have both internal climatic and non-climatic origins. Breaks that were identified by the last three mentioned homogeneity tests were compared with available metadata containing data such as instrument changes, changes in station location and environment, observation procedures, etc. Significant breaks (significance 95 % or more) that coincided with known dates of instrumental changes were corrected using standard procedures.
It was also noted that some significant breaks, which could not be connected to known dates of any changes in the instrument park or in station location and environment, were probably caused by large volcanic eruptions. The corrected series were again tested for homogeneity; the corrected series were considered free of non-climatic breaks when the tests of most of the monthly series showed no significant (significance 95 % or more) breaks that coincide with dates of known instrument changes. Corrected series are now available within the framework of the ERA-CLIM FP7 project for future studies of climate variability (doi:10.1594/PANGAEA.785377).


Introduction
Long instrumental climatological records assume a paramount role in the studies of variation of the atmospheric conditions. They provide vital information about climate variability, trends and cycles. Unfortunately, long-term series often contain inhomogeneities caused by a number of non-climatic factors that could produce unrealistic trends, shifts and jumps (Peterson et al., 1998; Aguilar et al., 2003). These inhomogeneities originate from changes in instruments, station locations and surrounding environment, observation routines and methods of preliminary data treatment. Undoubtedly, such inhomogeneities have to be detected and corrected beforehand, and only after that could the data series be used in any kind of climate studies.
The problem of identification and correction of non-climatic inhomogeneities has been studied thoroughly (see, e.g. the review in Peterson et al., 1998). The simplest way to detect shift-like inhomogeneities is a visual analysis, preferably by an experienced meteorologist (Peterson et al., 1998). This method is clearly subjective, so it is best used as an initial part of the analysis, providing information about "doubtful" periods that have to be studied thoroughly with other, objective methods.
At the moment, there are many objective statistical methods accepted by the scientific community that can detect the presence and probable date of inhomogeneities, and new methods continue to be developed (see e.g. Venema et al., 2012). Most of these methods belong to one of three groups: likelihood-based methods, linear-regression-based methods, and non-parametric methods (Wang et al., 2007). In climate studies, the most commonly used methods are the standard normal homogeneity test (SNHT; Alexandersson and Moberg, 1997) and its variations, the Buishand cumulative deviation test (Buishand, 1982), the non-parametric rank Pettitt test (Pettitt, 1979), the two-phase regression methods (e.g. Solow, 1987) and others. These methods not only estimate the level of inhomogeneity of the tested series, but also detect the most probable homogeneity break points (hereafter: breaks). Other tests, like the von Neumann ratio test (von Neumann, 1941), do not give any information about the date of the break, but estimate the overall level of inhomogeneity in the data.
The task of correcting non-climatic breaks is complicated by the fact that not all inhomogeneities existing in data series are of non-climatic origin. There are breaks that originate from "real" climate changes, like volcanic aerosol ejections or abrupt changes of atmospheric and/or oceanic circulation. The non-climatic inhomogeneities have to be somehow separated from the others, and this task could be done using the metadata: a record of station relocations, changes in station environment, changes in the instrument park, observation routines, applications of new formulae to calculate means, etc. The metadata could provide precise information about the dates and reasons for non-climatic changes and are consequently ideal for use in any homogenization procedure. Moreover, all available information about the stations' history should be preferred over statistical methods, especially in the task of detecting the break dates (Venema et al., 2012). Any break detected by statistical methods has to be checked against metadata, and if there is a written note that some intervention took place in the station setup at the break date, this break should be considered as non-climatic and (in most cases) be corrected (Peterson et al., 1998; Aguilar et al., 2003).
The analysis of the separate monthly series could provide different break points for each month, both due to the randomness of the meteorological time series and to the fact that some inhomogeneities could have a larger effect during the warm part of the year than during the cold part. Therefore, not only annual but also monthly (or seasonal) means have to be analysed in the process of homogenization (Aguilar et al., 2003).
The detected non-climatic inhomogeneities required correction. The correction procedure was constructed so that all data were corrected in line with the conditions of the last homogeneous part of the data series: a period ranging from the last break to the end of the series. In this way, future incoming data will not damage the homogeneity of the data series (Aguilar et al., 2003). The procedure of correction is applied to the data series backward in time, starting from the most recent break. The most usual way to correct non-climatic breaks is to calculate the means of the studied parameter during some time before and after each of the breaks. The adjustment value is then the difference (or ratio in the case of parameters like precipitation) between these means. Accordingly, the adjustment value is applied to the inhomogeneous part (the part before the break) of the series (Aguilar et al., 2003).

Homogeneity tests
Despite the fact that the main role in the detection of the breaks was assigned in this study to the metadata, four simple, widely used statistical homogeneity tests were applied to the data (Klein Tank, 2007): von Neumann ratio test, Buishand test, standard normal homogeneity test (SNHT), and Pettitt test. The first test allows one to estimate only the presence of breaks in the data set, whereas the last three tests also give information about the possible dates of such breaks. The use of tests of different types (parametric, non-parametric, likelihood), which also have different sensitivities in different parts of the series, could help to obtain more significant results.
Three of these tests (Buishand test, SNHT and Pettitt test) were used both in absolute mode (statistical analysis of the temperature series themselves) and in relative mode. In relative mode the reference series were, for coastal stations, monthly anomalies (relative to the 1961-1990 period) of sea-surface temperature (SST) obtained with HadISST2.0.0.0 in the 2 grid points nearest the Portuguese coastal stations (Rayner et al., 2012) or, for the non-coastal station (Coimbra), already corrected temperature series. To perform the relative homogeneity tests, the temperature and SST series were standardised (transformed to series with mean of 0 and standard deviation of 1); afterwards, the differences between the standardised temperature and SST series were calculated and subjected to homogeneity tests.
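The construction of a relative-mode series described above can be sketched in a few lines of Python. This is only an illustration: the function names, the example numbers and the use of the sample standard deviation (`ddof=1`) are assumptions of the sketch, not details given in the text.

```python
import numpy as np


def standardize(series):
    """Return the series rescaled to mean 0 and standard deviation 1."""
    x = np.asarray(series, dtype=float)
    # ddof=1 (sample standard deviation) is an assumption of this sketch.
    return (x - x.mean()) / x.std(ddof=1)


def relative_series(temperature, reference):
    """Difference of the standardized candidate and reference series."""
    return standardize(temperature) - standardize(reference)


# Hypothetical data: a reference series and a candidate that follows it
# but carries an artificial 1-unit upward shift in its second half.
reference = np.array([0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1, 0.5])
candidate = 15.0 + reference + np.array([0, 0, 0, 0, 1, 1, 1, 1])
diff = relative_series(candidate, reference)
print(np.round(diff, 2))
```

Because the shared variability is removed by the subtraction, the artificial shift stands out in `diff`, and this difference series is what would be passed to the homogeneity tests.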
Hereafter, for each of the test descriptions, n is the data set length, $Y_i$ is the i-th element of the data set, and $\bar{Y}$ is the mean value of the data set.

von Neumann ratio test
The von Neumann ratio N (von Neumann, 1941) is the ratio of the sum of squared successive differences to the sum of squared deviations from the mean:

$$N = \frac{\sum_{i=1}^{n-1} (Y_i - Y_{i+1})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

When the sample is homogeneous the expected value is N = 2. If the sample contains a break, then the value of N tends to be lower than this expected value (Buishand, 1981). If the sample has rapid variations in the mean, then values of N may rise above two (Klein Tank, 2007). This test gives no information about the location of the shift. The critical values for N (for n ≥ 20), with probability level α, are obtained from the normal approximation of Buishand (1981), in which $u_\alpha$ is the α-th percentile of a standard normal variate taken from the standard normal table. Critical values for N for different data set lengths are given in Table 1. It should be mentioned that, for a number of data sets with similar breaks and a similar level of variations of the mean, the data set with the smaller standard deviation has the smaller N as well (see Eq. 3 in Buishand, 1981). This means that annually averaged parameters should have smaller N values than monthly averaged ones.
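As an illustration of the von Neumann ratio, the sketch below computes N as the sum of squared successive differences divided by the sum of squared deviations from the mean. The critical-value helper is an assumption of this sketch: it uses the large-sample normal approximation with mean 2 and variance 4(n - 2)/(n² - 1), which may differ in form from the expression used in the paper; the function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def von_neumann_ratio(y):
    """Sum of squared successive differences over sum of squared deviations."""
    y = np.asarray(y, dtype=float)
    successive = np.sum(np.diff(y) ** 2)
    spread = np.sum((y - y.mean()) ** 2)
    return successive / spread


def approx_critical_value(n, alpha=0.05):
    """Lower critical value of N from a normal approximation (assumption)."""
    u_alpha = NormalDist().inv_cdf(alpha)  # negative for alpha < 0.5
    return 2.0 + u_alpha * math.sqrt(4.0 * (n - 2) / (n ** 2 - 1))


# Hypothetical series of 20 values with one clear upward break in the middle.
step = np.array([0.0] * 10 + [1.0] * 10)
n_ratio = von_neumann_ratio(step)
is_inhomogeneous = n_ratio < approx_critical_value(step.size, alpha=0.05)
print(n_ratio, is_inhomogeneous)
```

A series with a break gives N well below 2, whereas a rapidly alternating series gives N above 2, in line with the behaviour described in the text.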

Buishand test - parametric test
This test supposes that tested values are independent and identically normally distributed (null hypothesis). The alternative hypothesis assumes that the series has a jump-like shift (break). This test is more sensitive to breaks in the middle of time series (Costa and Soares, 2009). The test statistics, which are the adjusted partial sums (Buishand, 1982), are defined as

$$S^*_0 = 0, \qquad S^*_k = \sum_{i=1}^{k} (Y_i - \bar{Y}), \quad k = 1, \ldots, n$$

When series are homogeneous, the values of $S^*_k$ will fluctuate around zero because no systematic deviations of the $Y_i$ values with respect to their mean will appear.
Q-statistics: if a break is present in year K, then $S^*_k$ reaches a maximum (negative shift) or minimum (positive shift) near the year k = K:

$$Q = \max_{0 \le k \le n} \left| \frac{S^*_k}{s} \right|$$

R-statistics (range of the rescaled partial sums):

$$R = \frac{\max_{0 \le k \le n} S^*_k - \min_{0 \le k \le n} S^*_k}{s}$$

where s is the sample standard deviation. Buishand (1982) gives critical values for Q and R, tabulated as $Q/\sqrt{n}$ and $R/\sqrt{n}$, for different data set lengths (see Table 1).
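A minimal Python sketch of the Buishand statistics follows. It assumes the standard definitions from Buishand (1982): adjusted partial sums rescaled by the standard deviation computed with divisor n, and Q and R divided by the square root of n for comparison with tabulated critical values. The function name and the test series are hypothetical.

```python
import numpy as np


def buishand_test(y):
    """Return (Q/sqrt(n), R/sqrt(n), k) for the Buishand partial-sum test."""
    y = np.asarray(y, dtype=float)
    n = y.size
    s = y.std(ddof=0)  # divisor n, as in Buishand (1982); an assumption here
    # Adjusted partial sums S*_0 ... S*_n, with S*_0 = 0.
    partial = np.concatenate(([0.0], np.cumsum(y - y.mean())))
    rescaled = partial / s
    k = int(np.argmax(np.abs(rescaled)))  # last index before the shift
    q_scaled = np.abs(rescaled).max() / np.sqrt(n)
    r_scaled = (rescaled.max() - rescaled.min()) / np.sqrt(n)
    return q_scaled, r_scaled, k


# Hypothetical series: 10 values at 0 followed by 10 values at 1.
step = np.array([0.0] * 10 + [1.0] * 10)
q_scaled, r_scaled, k_break = buishand_test(step)
print(q_scaled, r_scaled, k_break)
```

For this upward shift the partial sums reach their minimum at k = 10, which matches the rule that a positive shift produces a minimum of the partial sums near the break.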

Standard normal homogeneity test - likelihood ratio test
SNHT is one of the most popular homogeneity tests in climate studies. The null and alternative hypotheses in this test are the same as in the Buishand test; however, unlike the Buishand test, SNHT is more sensitive to breaks near the beginning and the end of the series (Costa and Soares, 2009). Alexandersson and Moberg (1997) proposed a statistic T(k) to compare the mean of the first k years of the record with that of the last (n − k) years:

$$T(k) = k\,\bar{z}_1^2 + (n - k)\,\bar{z}_2^2, \quad k = 1, \ldots, n - 1$$

where

$$\bar{z}_1 = \frac{1}{k} \sum_{i=1}^{k} \frac{Y_i - \bar{Y}}{s}, \qquad \bar{z}_2 = \frac{1}{n - k} \sum_{i=k+1}^{n} \frac{Y_i - \bar{Y}}{s}$$

and s is the sample standard deviation. If a break is located at the year K, then T(k) reaches a maximum near the year k = K. The test statistic $T_0$ is defined as

$$T_0 = \max_{1 \le k < n} T(k)$$

The null hypothesis is rejected if $T_0$ is above a certain level, which is dependent on the sample size. Critical values for different data set lengths are given in Khaliq and Ouarda (2007); see Table 1.
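The SNHT statistic can be sketched in Python as below. The sketch assumes the usual single-shift form T(k) = k z1² + (n - k) z2², where z1 and z2 are the means of the standardized series before and after position k; the use of `ddof=1` for the standard deviation and the function name are assumptions.

```python
import numpy as np


def snht(y):
    """Return (T0, k0, T) where T[k-1] = T(k) for k = 1 .. n-1."""
    y = np.asarray(y, dtype=float)
    n = y.size
    z = (y - y.mean()) / y.std(ddof=1)  # ddof=1 is an assumption
    csum = np.cumsum(z)
    k = np.arange(1, n)                 # candidate break positions 1 .. n-1
    z1 = csum[:-1] / k                  # mean of the first k values
    z2 = (csum[-1] - csum[:-1]) / (n - k)  # mean of the last n-k values
    t = k * z1 ** 2 + (n - k) * z2 ** 2
    best = int(np.argmax(t))
    return float(t[best]), int(k[best]), t


# Hypothetical series: 10 values at 0 followed by 10 values at 1.
step = np.array([0.0] * 10 + [1.0] * 10)
t0, k0, t_all = snht(step)
print(t0, k0)
```

The value `t0` would then be compared with the critical value for the given series length, for example from the table of Khaliq and Ouarda (2007).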

Pettitt test - non-parametric rank test
The null and alternative hypotheses in this test are the same as in the Buishand test, and this test is also more sensitive to breaks in the middle of the series (Costa and Soares, 2009). The ranks $r_1, \ldots, r_n$ of the $Y_1, \ldots, Y_n$ are used to calculate the statistics (Pettitt, 1979):

$$X_k = 2 \sum_{i=1}^{k} r_i - k(n + 1), \quad k = 1, \ldots, n$$

If a break occurs in year K, then the statistic is maximal or minimal near the year k = K:

$$X_K = \max_{1 \le k \le n} |X_k|$$

The statistical significance (for probability level α) is estimated with the approximation of Pettitt (1979). Critical values for $X_K$ for different data set lengths are given in Table 1.
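A short Python sketch of the Pettitt rank statistic is given below. It assumes the rank form X_k = 2 Σ r_i - k(n + 1) with average ranks for ties, and uses the approximate significance probability p ≈ 2 exp(-6 X_K² / (n³ + n²)) from Pettitt (1979); whether the paper used exactly this expression is an assumption, and the function names are hypothetical.

```python
import math

import numpy as np


def average_ranks(y):
    """Ranks 1 .. n, with tied values sharing their average rank."""
    y = np.asarray(y, dtype=float)
    order = np.argsort(y, kind="stable")
    ranks = np.empty(y.size)
    ranks[order] = np.arange(1, y.size + 1)
    for value in np.unique(y):
        tied = y == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def pettitt_test(y):
    """Return (X_K, k, approximate p-value) for the Pettitt test."""
    ranks = average_ranks(y)
    n = ranks.size
    k = np.arange(1, n + 1)
    x = 2.0 * np.cumsum(ranks) - k * (n + 1)
    idx = int(np.argmax(np.abs(x)))
    x_k = float(abs(x[idx]))
    # Approximate significance probability (Pettitt, 1979), capped at 1.
    p = min(1.0, 2.0 * math.exp(-6.0 * x_k ** 2 / (n ** 3 + n ** 2)))
    return x_k, int(k[idx]), p


# Hypothetical series: 10 values at 0 followed by 10 values at 1.
step = np.array([0.0] * 10 + [1.0] * 10)
x_k, k_break, p_value = pettitt_test(step)
print(x_k, k_break, p_value)
```

Because the test works on ranks only, it does not rely on the normality assumption of the Buishand test and SNHT.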

Homogenization procedure
At first, the series were inspected for outliers that could appear due to typing and/or OCR procedures. This manual and visual inspection was applied to the data both in tabular and in graphical form. Afterward the following procedure was used for homogenizing the temperature data (see also Fig. 1):
1. Detection of possible breaks in the original data series using visual analysis and the aforementioned homogeneity tests (absolute and relative).
2. Comparison of the break dates with available metadata and climatic forcing data (like volcanic eruptions, anthropogenic landscape changes, etc.). It is possible that metadata do not list all changes in the stations' environments that occurred during the measurement periods; however, in this study we found no significant (as estimated by the statistical tests) breaks that could not be associated with metadata records or other sources.
3. Selection of non-climatic breaks in the data series for correction.
4. Correction of non-climatic breaks:
4.1. For each break (t_break), starting from the latest in time to the first,
a. selection of a time interval (Δt) around the current break, taking into consideration the length of the homogeneous periods before and after the current break;
b. calculation of the mean values of the temperature parameters (<T>(time period)) for each month separately during two time intervals, before the break (time period = t_break − Δt) and after the break (time period = t_break + Δt);
c. calculation of the corrections (dT) for each month separately as the difference of these means, dT = <T>(t_break + Δt) − <T>(t_break − Δt);
d. smoothing of the 12 monthly correction values dT by 3-month adjacent averaging to achieve a reasonable variation of dT throughout the year;
e. ignoring all dT that are smaller than instrumental errors (0.1 °C);
f. correction of the data for the periods before the current break using dT for each month.
4.2. Proceed to the previous (earlier) break (starting from step 4.1).
5. Visual analysis and homogeneity tests of the corrected data sets (see step 1).
6. In addition, to estimate the "quality" of the correction (Venema et al., 2012), the centered root mean square errors (CRMSE, see e.g. Taylor, 2001 and Gleckler et al., 2008) were calculated as well, using SST (see Sect. 2.4) series or already corrected temperature series for other stations as reference series. The final number of corrected breaks and the time intervals for corrections (Δt) were chosen in a way that minimizes not only the breaks detected by the homogeneity test statistics but also the number of months (for each station and each temperature parameter) for which the CRMSEs of the corrected data are greater than the corresponding CRMSEs for the original series.
In case the analysis of the corrected series shows the absence of non-climatic breaks (with 95 % significance), the corrected data series are considered to be homogenized for non-climatic breaks with significance of at least 95 %.
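The correction steps and the CRMSE check described above can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' code: the data are assumed to be stored as an array of shape (years, 12), the break year is taken as the first year of the later homogeneous period, the 3-month smoothing is assumed to wrap around from December to January, and dT is added to the data before the break. All names are hypothetical.

```python
import numpy as np


def monthly_corrections(data, years, break_year, dt, min_abs=0.1):
    """Smoothed monthly corrections dT for a single break."""
    before = (years >= break_year - dt) & (years < break_year)
    after = (years >= break_year) & (years < break_year + dt)
    # dT = mean after the break minus mean before the break, per month.
    d_t = data[after].mean(axis=0) - data[before].mean(axis=0)
    # 3-month adjacent averaging; the Dec/Jan wrap-around is an assumption.
    smooth = (np.roll(d_t, 1) + d_t + np.roll(d_t, -1)) / 3.0
    # Ignore corrections smaller than the instrumental error.
    smooth[np.abs(smooth) < min_abs] = 0.0
    return smooth


def correct_breaks(data, years, break_years, dt):
    """Apply corrections backward in time, latest break first."""
    years = np.asarray(years)
    corrected = np.array(data, dtype=float)
    for break_year in sorted(break_years, reverse=True):
        d_t = monthly_corrections(corrected, years, break_year, dt)
        corrected[years < break_year] += d_t
    return corrected


def crmse(series, reference):
    """Centered root mean square error between a series and a reference."""
    a = np.asarray(series, dtype=float)
    b = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(((a - a.mean()) - (b - b.mean())) ** 2)))


# Hypothetical data: 40 years x 12 months, 1 degree too cold before 1920.
years = np.arange(1900, 1940)
raw = np.zeros((40, 12))
raw[years < 1920] -= 1.0
fixed = correct_breaks(raw, years, [1920], dt=10)
print(fixed.min(), fixed.max())
```

Processing the latest break first means that each earlier correction is computed from data already aligned with the most recent homogeneous period, which is the reason the procedure runs backward in time.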

Volcanic eruptions and their effect on temperature variations
Some inhomogeneities detected in the meteorological data do not correspond to known dates of instrumental or environmental changes. It is possible that these breaks could be caused by some sudden but natural forcings, e.g. volcanic eruptions (Martínez et al., 2010). The eruptions are accompanied by the injection of SO2 and dust into the stratosphere. The increase of the dust and aerosol load in the stratosphere causes a reduction of the solar radiation in the lower atmosphere and leads to changes in the lower atmosphere circulation patterns during 2-4 yr after the eruptions (Robock, 2000). Table 2 shows major volcanic eruptions with the dust volcanic index (DVI) reaching more than 100 (from Mann et al., 2000 and the NCDC database) from 1850 to 2000. The inhomogeneities that coincide with periods of strong eruptions (1855-1856, 1861-1862, 1875, 1883-1904, 1963-1964, 1982-1984, and 1991-1994) could be of natural (volcanic) origin, provided there were no records of instrumental changes for such epochs. In case some instrumental changes did take place during these periods, it would be difficult to make reasonable corrections only for the non-climatic part of these particular breaks.
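The screening logic of this section can be illustrated with a small Python sketch that compares a detected break year with the eruption periods listed above and with known instrument changes. The labels, the one-year tolerance and the function names are assumptions made for illustration; the instrument-change years in the example are the Porto dates mentioned in the text.

```python
# Periods of strong eruptions as listed in the text (inclusive years).
ERUPTION_PERIODS = [
    (1855, 1856), (1861, 1862), (1875, 1875), (1883, 1904),
    (1963, 1964), (1982, 1984), (1991, 1994),
]


def in_eruption_period(year, periods=ERUPTION_PERIODS):
    """True if the year falls inside one of the eruption periods."""
    return any(start <= year <= end for start, end in periods)


def classify_break(year, instrument_change_years, tolerance=1):
    """Label a detected break; the tolerance in years is hypothetical."""
    near_change = any(abs(year - c) <= tolerance
                      for c in instrument_change_years)
    volcanic = in_eruption_period(year)
    if near_change and volcanic:
        return "mixed"            # hard to separate the non-climatic part
    if near_change:
        return "non-climatic"     # candidate for correction
    if volcanic:
        return "possibly volcanic"
    return "unexplained"


porto_changes = [1916, 1922, 1947, 1984]
for break_year in (1916, 1963, 1984, 1930):
    print(break_year, classify_break(break_year, porto_changes))
```

A "mixed" label corresponds to the situation described in the last sentence above, where an instrumental change and an eruption coincide and a clean correction is difficult.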

Sea-surface temperature anomalies series
Monthly SST anomalies series (relative to the 1961-1990 period), obtained with HadISST2.0.0.0 (Rayner et al., 2012) in 2 grid points near the Portuguese coast during the period from 1899 to 2010, have been used as reference series to perform relative homogeneity tests and calculate CRMSE values. These series (comprised of a combination of 10 ensemble members) were extracted for the grid cells located between 8-9° W and 41-42° N in the case of the Porto nearest grid point and between 9-10° W and 38-39° N for Lisbon. HadISST2.0.0.0 is based on version 2.5 of the International Comprehensive Ocean Atmosphere Data Set (ICOADS) and includes updated ocean satellite data, among other components. Also, homogeneity adjustments have been applied by Rayner et al. (2012) to the SST data to correct for known biases in the data. In the current analysis the SST series measured near Porto (mean of 10 ensemble members, hereafter "SST Porto") have been used as reference series for the homogenization of the Porto temperature series, and the SST series measured near Lisbon (mean of 10 ensemble members, hereafter "SST Lisbon") have been used in the homogenization of the Lisbon temperature series.

Table 4. Correlation coefficients between the temperature series from Porto and Lisbon (r_L) and Coimbra (r_C) calculated for the period 1910-1932 (±10 yr around the gap). Significances of the correlation coefficients (p) are smaller than 0.02 with only one exception: p = 0.37 for the correlation coefficient between Tmin of Porto and Lisbon in June (m6). Regression coefficients (A, L, C) for the regression models (Porto Tmin/max = A + L × Tmin/max(Lisbon) + C × Tmin/max(Coimbra)) are chosen using the best subset procedure with maximization of the adj. R² parameter and calculated using data for the period 1910-1932 (±10 yr around the gap). The (adj. R² × 100) values show the percentage of the variability of the dependent variables (Porto Tmin and Tmax series) that has been accounted for by the model under consideration.

Data description and metadata
The original data set contains monthly averages of daily minimum (Tmin) and maximum (Tmax) temperature and their annual means measured by Instituto Geofísico (Observatório Meteorológico da Serra do Pilar) da Universidade do Porto (IGUP), Porto, from 1888 to 2001. The data set length is 114 yr. Measurement errors are ±0.2 °C (valid for all observed temperature series presented here).
The meteorological station of Porto has been in regular operation since 1888, when it was put under the jurisdiction of the Observatório Meteorológico da Princesa D. Amélia, now IGUP, on the south side of the river Douro. In 1916 the station location was changed slightly and the thermometer was moved from the tower (10.3 m above the ground) to the ground level (1.3 m above the ground). The data set has a gap from September 1920 to February 1922; there is also a possibility that the thermometer was replaced in March 1922. In 1947 and again in 1984/1985, changes in the observation times were made. Table 3 shows known dates of possible non-climatic breaks due to instrument changes (Pinhal, 2008).
Changes in the location of the instruments could result in sudden jumps of the measured parameter values. Tmin and Tmax could respond differently to changes in position of the respective thermometers, depending on the character of the changes in the instrument's environment (Aguilar et al., 2003). Therefore, the variations of DTR could be more important for the detection of the breaks; breaks could be weak in the Tmin or Tmax series, but clearly seen in the DTR series (Wijngaard et al., 2003). The following parameters have been analysed (valid for all temperature series presented here): Tmin, Tmax, the temperature range DTR and the average temperature Taver. All series contain monthly and annual means; Tmin and Tmax are measured values, DTR and Taver are calculated values. To perform relative homogeneity tests, the differences between the standardized Tmin, Tmax and Taver series and the standardized SST Porto series were calculated.

Interpolation of the gap from September 1920 to February 1922
The gap in the data from September 1920 to February 1922 should be filled before the data are subjected to the homogeneity analysis. It is possible to fill the gap using simple linear interpolation of the one or two missing values in each of the monthly data series. On the other hand, it is possible to build a regression model for a more realistic interpolation, using data from nearby meteorological stations, namely the Coimbra and Lisbon data series. All 12 monthly series of Tmin and Tmax were interpolated separately. The time period used for the regression models is 10 yr before the gap (1910-1919) plus 10 yr after the gap (1923-1932). First, the correlation coefficients (r) between temperature parameters measured in Porto and in Coimbra and Lisbon were calculated (Table 4). Multiple regression models for the Porto Tmin and Tmax series were built using the Coimbra and Lisbon data as regressors. The models have been built using the "best subset" method, maximizing the adj. R² parameter. The regression coefficients are shown in Table 4 alongside the (adj. R² × 100) values that show the percentage of the variability of the dependent variable (Porto series) that has been accounted for by the model under consideration. As one can see, the multiple regression models are good approximations of the real data and can be used for the gap interpolation in the Porto data. Similar regression models have been calculated using a smaller time period: ±5 yr around the gap (1915-1919 plus 1923-1927). However, the 5-yr-around-gap models give, in general, worse approximations of the real data than the 10-yr-around-gap models. Finally, the gap from September 1920 to February 1922 was interpolated using the 10-yr-around-gap multiple regression models for each parameter and for each month separately. Annual values of Tmin and Tmax for 1920-1922 have been calculated using both measured and interpolated monthly data. Both regression models (only for annual means) alongside the original annual means for all three meteorological stations are shown in Fig. 2a-b. Please note that the annual interpolation values for the 5- and 10-yr-around-gap models shown in Fig. 2 [...]

There are significant jumps in the monthly Tmin variations clearly seen during warm months (from April to September). However, there are no jumps in the monthly Tmax variations that could be easily detected by the visual analysis. This difference could be explained by the different sensitivity of Tmin and Tmax to the change in location and in the instrument height.
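The gap-filling regression described earlier in this section (best-subset selection of the Lisbon and Coimbra regressors by adjusted R²) can be sketched in Python as follows. This is a minimal illustration with synthetic numbers; the function names and the data are hypothetical, and ordinary least squares with an intercept is assumed.

```python
import itertools

import numpy as np


def fit_ols(X, y):
    """Least-squares fit with intercept; returns (coefficients, adj. R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    n, p = len(y), X.shape[1]
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, adj_r2


def best_subset(regressors, y):
    """Try every non-empty subset of regressors; keep the best adj. R^2."""
    y = np.asarray(y, dtype=float)
    best = None
    for size in range(1, len(regressors) + 1):
        for subset in itertools.combinations(regressors, size):
            X = np.column_stack([regressors[name] for name in subset])
            coef, adj_r2 = fit_ols(X, y)
            if best is None or adj_r2 > best[2]:
                best = (subset, coef, adj_r2)
    return best


def predict(subset, coef, new_values):
    """Evaluate the fitted model on new regressor values."""
    X = np.column_stack([np.asarray(new_values[s], float) for s in subset])
    return coef[0] + X @ coef[1:]


# Synthetic training data standing in for one calendar month, 20 years.
rng = np.random.default_rng(0)
lisbon = rng.normal(size=20)
coimbra = rng.normal(size=20)
porto = 1.0 + 0.5 * lisbon + 0.3 * coimbra
subset, coef, adj_r2 = best_subset({"lisbon": lisbon, "coimbra": coimbra}, porto)
filled = predict(subset, coef, {"lisbon": [2.0], "coimbra": [1.0]})
print(subset, np.round(coef, 3), np.round(filled, 3))
```

In practice one such model would be fitted for each month and each parameter (Tmin and Tmax) on the years around the gap, and the fitted model would then be evaluated on the Lisbon and Coimbra values of the missing months.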

Homogeneity tests results
Figure 4 shows the von Neumann ratio for 12 monthly series of Tmin, Tmax, DTR and Taver. This test shows strong inhomogeneities in all four series and DTR in particular. As one can see, variations of the homogeneity of the data series strongly depend on the temperature parameter:
- Tmin: data series of warm months (from April to September) are more inhomogeneous than those of cold ones;
- Tmax: data series of warm months (from April to September) are less inhomogeneous than those of cold ones, with one exception, May;
- DTR: data series of only two months (January and February) are apparently homogeneous;
- Taver: these data are more homogeneous than Tmin and Tmax. They could be labelled as inhomogeneous with a probability of 95 % only in March and May.
However, these breaks are seen only for some months and not by all three homogeneity tests at the same time. Therefore, the only break that has to be corrected is the 1916 break. Other non-climatic breaks have no statistically significant effect or could not be corrected due to the coincident influence of other (climatic) forces like the volcanic eruption around 1984. This conclusion was confirmed later during the correction procedure (Sect. 3.3.1): correction of possible breaks in 1922 and 1947 makes the corrected series even more inhomogeneous.

Preliminary conclusions
- The Tmin data set shows inhomogeneities in 1916, near the 1920s, and in 1947, 1963 and 1984;
- the Tmax data set shows inhomogeneities in 1916, near the 1920s and 1930s, and in 1947, 1963 and 1984;
- the DTR data set shows strong inhomogeneity during the period 1916-1922 and, probably, a weak break in the 1940s;
- Taver shows weak breaks around the 1920s, 1930s, 1947 and 1984;
- Tmin during warm months and Tmax during cold months are more inhomogeneous than in other months.
The inhomogeneities that are not associated with known dates of instrumental changes may be due to internal climatic variations caused, for example, by major volcanic eruptions. The most significant non-climatic break occurred in 1916 due to changes in the location and height of the instruments. This break requires correction. Other breaks detected by the homogeneity tests have no statistically significant effect or could not be corrected due to the coincident influence of other (climatic) forces.

Correction procedure
To correct the non-climatic breaks we used the procedure described in Sect. 2.2. The best corrections were obtained when the Tmin and Tmax data sets were divided into two periods: 1888-1915 and 1916-2001. For each month the means of the temperature parameters for certain time intervals (±5 yr for Tmin and ±15 yr for Tmax) around 1916 were calculated. The data for the 1888-1915 period were corrected using the correction values calculated as described above. All correction values are shown in Fig. 6. The corrections were applied to the Tmin and Tmax data sets. Afterwards, corrected values of DTR and Taver were calculated. Results of the correction as well as the original data are shown in Fig. 7a-d. Please note that, because standardized values are used, the differences between temperature series and SST anomalies presented in Fig. 7 (and in similar figures for other stations) differ between the corrected and original series even for non-corrected periods: the series means and standard deviations that are used in the standardizing procedure change after correction.

Homogeneity of the corrected series
All four corrected data sets were subjected to the same homogeneity tests as the original data. The results of these tests for Tmin, Tmax, Taver and DTR are shown in Figs. 8-9. As one can see from the comparison of similar statistics for the original (Figs. 4-5) and corrected (Figs. 8-9) data, the latter data sets are less inhomogeneous but still contain inhomogeneities coinciding with the volcanic eruptions that occurred at the end of the 19th and 20th centuries. Some absolute tests for some months (not shown) still show breaks of homogeneity in a period lasting from 1922 to 1947, although there is no consistency between the three homogeneity tests (Buishand and Pettitt tests and SNHT) in relation to the dates of the breaks. The relative homogeneity tests showed an almost total absence of breaks around the dates of known instrument changes. Therefore, corrections for these breaks were not necessary. The homogeneity level given by the von Neumann ratio of the corrected data series (Fig. 8) still depends on the temperature parameter; among all parameters Tmin is the least homogeneous. One of the possible reasons for the remaining inhomogeneities in the Tmin data series is the volcanic effect. Figure 10 shows the CRMSEs of the corrected series (with SST Porto as reference series) plotted versus the corresponding CRMSEs of the original series.
As one can see, the inhomogeneity level of both Tmin (left panel) and Tmax (right panel) decreases or stays unchanged for all monthly series. Breaks detected in the corrected data sets by the different homogeneity tests are rarely coincident, except for the end of the 20th century (the epoch of the El Chichón and Pinatubo eruptions; see Table 2). Sometimes the tests still show breaks of homogeneity during different periods, but there is no consistency between the three homogeneity tests (Buishand and Pettitt tests and SNHT) in the dates of the breaks. In our opinion, these inhomogeneities are caused by the application of correction values that are already smoothed by a 3-month adjacent average to maintain the annual cycle (see Sect. 2.2), and we believe that in these cases additional corrections are not necessary. Thus, we consider the data sets of Tmin and Tmax corrected by the procedure described in the paper as free of non-climatic changes with a significance of at least 95 %.

The meteorological station of Lisbon/Geofísico has been in regular operation since October 1854. During the first ten years the thermometers were positioned on the terrace of the Observatory Tower of the old Escola Politécnica, located in the Jardim Botânico. This three-storied tower was built in 1854, thus leading to the foundation of the Infante D. Luiz Observatory (now IGIDL). This building proved inadequate for systematic observations and a new 4-storied tower was inaugurated in October 1863 in the main central edifice of Escola Politécnica, with the thermometers being reinstalled on the new terrace. This building still houses the IGIDL today and some of its meteorological instruments, but the instrument park containing the thermometers (the Stevenson shelter), initially installed on the platform of the new observatory tower, was transferred to the grounds in Jardim Botânico in 1941 (the distance between the two locations is about 120 m). In 1979 the location of the instrument park in Jardim Botânico was slightly changed. Additionally, in January 1977 changes in the times of observation were made (Carvalho, 2001). Table 3 summarizes the information about possible non-climatic breaks that could appear in the Lisbon/Geofísico temperature series.
To perform relative homogeneity tests, the differences between the standardized Tmin, Tmax and Taver series and the standardized SST Lisbon series were calculated.

Visual analysis
Figure 11a-d shows the time variations of the annual series of Tmin, Tmax, Taver and DTR, respectively. As one can see, the DTR variations (Fig. 11d) show two easily visible breaks in 1863/1864 and 1940/1941. These breaks correspond to the two most significant changes in the location of the instruments: the move to a new place in 1864 and the relocation of the instruments from the top of the tower to the ground level in 1941. At first sight, it seems that the minor changes in the thermometer height that took place from 1917 to 1937 and the minor changes in the location of the instruments in 1979 were too small to have a significant influence on the data homogeneity.
These two breaks have different influences on the Tmin and Tmax variations (see Fig. 11a and b). As one can see, during the first break (1864) there are significant jumps both in Tmin and Tmax. However, during the second break in 1941 there is a significant jump in Tmax but a very small one (if any) in Tmin. On the contrary, the difference between Tmin and SST (Fig. 11a, blue line) has a significant jump in 1941, whereas the difference between Tmax and SST (Fig. 11b, blue line) shows no visible jumps. This dissimilarity could be explained by the different character of the changes in the location of the instruments. In 1864 the instruments were moved to a new place with a new microclimate; in 1941 the change was mainly in the height of the instruments, not so much in location, causing a significant jump only in one of the extremes (see Aguilar et al., 2003). These conclusions were also derived from the visual analysis of the 12 monthly data series for each of the four temperature parameters.

Homogeneity tests results
The von Neumann test statistics for the 12 monthly series of T min, T max, T aver and DTR are shown in Fig. 12. The variations of the homogeneity of monthly data series (given by the von Neumann ratio) strongly depend on the temperature parameter:
- T min: data series of warm months are more inhomogeneous than those of cold ones;
- T max: all months show strong inhomogeneity except August (m8);
- T aver: data series of months from January to June and October (m1 to m6 and m10) are inhomogeneous;
- DTR: data series of warm months are more inhomogeneous than those of cold ones.
The average of the 12 monthly test statistic series (absolute and relative) of the Buishand, SNHT and Pettitt tests for T min, T max, T aver and DTR is shown in Fig. 13a-d, respectively. The grey vertical lines mark the dates of known changes in thermometer position. As one can see, some of these dates (namely, 1864 and the period from 1916 to 1941) coincide with significant breaks depicted by the maxima (or minima) of the curves. It should be mentioned that for T min the coincidences between the dates of known instrumental changes and the break years detected by the absolute tests are rare, whereas for T max, DTR and T aver these coincidences are very frequent. Also, there are two periods of possible break years detected by the homogeneity tests that do not coincide with known dates of instrument changes: one is at the end of the 19th century/beginning of the 20th century (approximately from (1880) 1890 to 1900) and the second is at the end of the 20th century (approximately from 1970 to 1990). Relative homogeneity tests (blue lines) of T min show significant breaks around 1937 (small changes in the thermometer height) and homogeneity tests of T max show significant breaks around 1941.
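To make the break-detection step more concrete, the following sketch computes the Pettitt statistic for a single series. It assumes the common rank-based formulation X_k = 2 * (sum of the first k ranks) - k * (n + 1), with ties given average ranks; the function names and the sample series are hypothetical, and the comparison with critical values (Table 1) is left out.

```python
def average_ranks(y):
    """Ranks 1..n of the values in y; tied values share their average rank."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    ranks = [0.0] * len(y)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y[order[j + 1]] == y[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pettitt_statistic(y):
    """Return the series X_k (k = 1..n) and the k with the largest |X_k|."""
    n = len(y)
    ranks = average_ranks(y)
    x = []
    cumulative = 0.0
    for k in range(1, n + 1):
        cumulative += ranks[k - 1]
        x.append(2 * cumulative - k * (n + 1))
    k_break = max(range(n), key=lambda i: abs(x[i])) + 1
    return x, k_break


# Hypothetical series with a step after the 5th value
series = [1.0, 1.2, 0.9, 1.1, 1.0, 2.0, 2.1, 1.9, 2.2, 2.0]
x, k_break = pettitt_statistic(series)
```

For this sample series the extreme of X_k falls at k = 5, i.e. the last value before the step; a break would be flagged only if |X_k| exceeded the critical value for the series length.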

Preliminary conclusions
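T min data sets show inhomogeneities in the 1860s, near the 1970s-1980s and, possibly, near the 1880s-1890s; T max is more sensitive than T min to the changes of the thermometer height that took place in 1864 and from 1916 to 1941. T max data sets show strong inhomogeneity during this period. There are also some inhomogeneities near the 1880s-1890s and 1970s-1980s; DTR and T aver show the same three periods of inhomogeneity: near the 1880s-1890s, 1910s-1940s and 1970s-1980s. Temperature series of warm months contain more inhomogeneities than those of cold months. The inhomogeneities that are not associated with known dates of instrumental changes could appear due to internal climatic variations caused by, e.g., major volcanic eruptions. The most significant non-climatic breaks occurred in 1864 and 1941 due to changes in the instruments location (1864) and height (1941). These breaks have to be corrected. Small changes in the thermometer height took place from 1917 to 1936, and the short periods between the changes do not allow us to estimate statistically significant corrections. The relocation of the station in 1979 does not significantly (with significance 95 % or more) affect the homogeneity of the data: the means of the temperature parameters for 1941-1978 and 1979-2008 are the same within the instrumental and statistical errors.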

Correction procedure
The T min and T max data sets were divided into three periods: 1856-1863, 1864-1940, and 1941-2008. We started from the most recent break, 1940/1941. For each month the means of the temperature parameters for certain time intervals (±20 yr for T min and ±45 yr for T max) around 1941 were calculated. The second break (1863/1864) was corrected using means calculated for the time intervals 1864 ± 8 yr both for T min and T max. All correction values are shown in Fig. 14. As one can see, the corrections for the second period [...]
T max are non-zero for all months whereas the corrections for [...] almost in all months. The possible reason for the remaining inhomogeneities in the T min series is the volcanic effect clearly seen in Fig. 17a-d.
Figure 18 shows the CRMSEs of the corrected series (SST Lisbon is used as the reference series) plotted versus the corresponding CRMSEs of the original series. As one can see, the inhomogeneity level of T min (left panel) slightly decreases (the dots lie below the bisector); in contrast, the inhomogeneity level of T max (right panel) stays almost the same for 10 out of 12 monthly series, while the CRMSE of two monthly series slightly increases.
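As a rough illustration of the quantity compared in these plots, the sketch below computes a centred root mean square error between a series and its reference. It assumes the usual definition (RMSE after removing the mean of each series); the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean


def crmse(series, reference):
    """Centred RMSE: RMSE of the two series after subtracting their own means."""
    if len(series) != len(reference) or not series:
        raise ValueError("series must be non-empty and of equal length")
    ms = mean(series)
    mr = mean(reference)
    squared = [((s - ms) - (r - mr)) ** 2 for s, r in zip(series, reference)]
    return sqrt(mean(squared))


# A constant offset between series and reference gives CRMSE = 0,
# whereas a step that is absent from the reference gives CRMSE > 0.
offset_only = crmse([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
with_step = crmse([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])
```

Under this definition a successful correction of a step lowers the CRMSE, which is what a dot below the bisector indicates.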
Sometimes the tests still show breaks of homogeneity in the period from 1917 to 1936, but there is no consistency between the three homogeneity tests (Buishand and Pettitt tests and SNHT) in the dates of the breaks. In our opinion, these inhomogeneities are caused again by the application of the smoothed correction values, and we believe that in these cases additional corrections are not necessary. Thus, we consider the data sets of T min and T max corrected by the procedure described in the paper as free of non-climatic changes with a significance of at least 95 %.

Data description and metadata
The original data set contains monthly averages of daily minimum (T min) and maximum (T max) temperature and their annual means measured at Instituto Geofísico da Universidade de Coimbra (IGUC), Coimbra, from 1865 to 2005. The data set length is 141 yr.
According to the IGUC logbooks, during the entire period (1865-2005) the meteorological station remained in the same location, the park of IGUC. However, the park of instruments has undergone some changes in position and environment, described in Table 3. There were two more or less significant changes in the instruments location in 1922 and 1933; besides that, the standard (Stephenson's) shelter was installed in 1922, and in 1950 the thermometer height increased slightly (from 1.15 m to 1.45 m). Since Coimbra is not a coastal station, the already corrected temperature series for Porto and Lisbon were used as reference series; the differences between the T min, T max and T aver series and the corresponding series for Porto and Lisbon were calculated to perform relative homogeneity tests.

Visual analysis
Figure 19a-d shows the time variations of the annual series of T min, T max, T aver and DTR, respectively. The DTR variations show an easily visible break in 1921/1922 (relocation of the instruments and installation of the shelter) coinciding with a significant jump in T min (Fig. 19a), but not in T max (Fig. 19b). Another break probably appears in 1949/1950 (changes in thermometer height); it can be seen both in T min and T max data. This break is, however, absent in the DTR data (probably due to almost equal shifts in T min and T max). There is also a small break in 1932/1933 (small relocation).

Homogeneity tests results
The statistics of the four homogeneity tests applied to this data set are shown in Figs. 20 and 21a-d [...]
- T min: data for months from February to June (m2 to m6) are more inhomogeneous than others;
- T max: all months show strong inhomogeneity;
- DTR: data for months of the second half of the year are more inhomogeneous than others;
- T aver: data from February to June and October (m2 to m6 and m10) are inhomogeneous (temperature data from Lisbon discussed in Sect. 4 show similar characteristics of the annual inhomogeneities variations).
The average of the 12 monthly statistic series for the other three homogeneity tests (in absolute and relative mode) applied to the series of T min, T max, T aver and DTR is shown in Fig. 21a-d, respectively. It should be mentioned that the SNHT statistics, both for annual and monthly T max, show an unexpected behaviour: despite the absence of any jumps in the temperature data, the SNHT statistic shows strong inhomogeneities at the end of the data set (2002-2005). These inhomogeneities do not correlate with inhomogeneities detected in the same data by the other tests. This unexpected behaviour could be explained by the known tendency of the SNHT to generate false alarms close to the start and the end of data sets (Wang et al., 2007). Therefore, to avoid ambiguity in the interpretation, Fig. 21b does not show SNHT statistics for T max for 2002-2005. The analysis of the homogeneity test statistics provides the most probable time periods of the breaks in the data homogeneity: around 1885-1890, around 1905, around 1916, around 1920, around 1930-1936, in the 1940s, 1960s and 1980s. Many inhomogeneities that are detected by the tests but could not be associated with known instrumental changes correspond to volcanic effects.
The comparison between the homogeneity test statistics of the Coimbra and Lisbon data shows a more or less similar character of the annual inhomogeneities variations for both places. These similarities arise from the relative proximity of Lisbon and Coimbra and the similarity of their climatic variations, as well as from the volcanic origin of a number of inhomogeneities of the data.
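For reference, the SNHT statistic discussed above can be sketched as follows, assuming the standard single-shift form T(k) = k * z1^2 + (n - k) * z2^2, where z1 and z2 are the means of the standardized series before and after position k. The function name and the sample series are hypothetical, and the sample standard deviation is an assumed choice.

```python
from statistics import mean, stdev


def snht_statistic(y):
    """Return T(k) for k = 1..n-1 and the k at which T(k) is largest."""
    n = len(y)
    if n < 3:
        raise ValueError("series is too short")
    m = mean(y)
    s = stdev(y)
    if s == 0:
        raise ValueError("series is constant")
    z = [(v - m) / s for v in y]
    t = []
    for k in range(1, n):
        z1 = mean(z[:k])  # mean of the first k standardized values
        z2 = mean(z[k:])  # mean of the remaining n - k values
        t.append(k * z1 ** 2 + (n - k) * z2 ** 2)
    k_break = max(range(len(t)), key=lambda i: t[i]) + 1
    return t, k_break


# Hypothetical series with a step after the 5th value
series = [1.0, 1.2, 0.9, 1.1, 1.0, 2.0, 2.1, 1.9, 2.2, 2.0]
t, k_break = snht_statistic(series)
t0 = max(t)  # test statistic T0, to be compared with the critical value
```

For k close to 1 or n - 1, one of the two segment means rests on very few values, which is one plausible reading of why the statistic can behave erratically near the ends of a data set.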

Preliminary conclusions
T max data showed more inhomogeneities than other temperature parameters; T min data sets showed inhomogeneities near the 1880s, 1900s, 1920s, 1960s and 1980-1990s; DTR and T aver showed strong inhomogeneities around 1885-1890, around 1905, around 1916, 1922, around 1930-1936, around 1941, in the 1960s and 1980s; and T min and T aver data had more inhomogeneities during warm months. The inhomogeneity levels of T max and DTR data were more or less constant throughout the year. The most significant non-climatic break occurred in 1922 due to changes in the instruments location. This break is clearly seen in the relative homogeneity test statistics both for T min and T max. Another break was associated with the small relocation of the instruments park in 1933. This break is seen only in the relative homogeneity test statistics for T max. These two breaks required correction. The change in the thermometer height in 1950 showed no significant (significance 95 % or more) effect on the homogeneity of the temperature data.

Correction procedure
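To correct the non-climatic breaks, the T min and T max data sets were divided into three periods: 1865-1921, 1922-1932, and 1933-2004. We started from the most recent break, 1932/1933. This break was corrected only in the T max series. For each month the means of T max for time intervals of ±10 yr around 1933 were calculated. The second break (1921/1922) was corrected both in the T max and T min series using intervals of ±10 yr for T max and ±40 yr for T min. All correction values are shown in Fig. 22. Results of the corrections as well as the original data are shown in Fig. 23a-d for annual series.

Homogeneity of the corrected series
All four corrected data sets were subjected to the same homogeneity tests as the original data. The results of these tests for T min, T max, T aver and DTR are shown in Figs. 24 and 25a-d (similarly to Fig. 21). As one can see from the comparison of the homogeneity test statistics of the original (Figs. 20-21) and corrected (Figs. 24-25) series, the corrected data sets are less inhomogeneous. The statistics of the relative homogeneity tests show far fewer inhomogeneities in the corrected series than the statistics of the absolute homogeneity tests. The corrected series still contain inhomogeneities caused (probably) by volcanic eruptions. It can be seen that, as a whole, the annual variations of the homogeneity of the corrected data series given by the von Neumann ratio still depend on the temperature parameter; T min is the least homogeneous among all parameters. Despite the fact that the annual series still contain non-climatic inhomogeneities, the monthly series, in most cases, are free of them. For a couple of months the homogeneity tests still show breaks in homogeneity in the period from 1922 to 1933, but there is no consistency between the three homogeneity tests (Buishand and Pettitt tests and SNHT) in the dates of the breaks. Figure 26 shows the CRMSEs of the corrected series (corrected Porto and Lisbon temperature series are used as reference series) plotted versus the CRMSEs of the original series. As one can see, the inhomogeneity level of T min (left panels) decreases slightly (the dots are close to the bisector), whereas the inhomogeneity level of T max (right panels) significantly decreases for all monthly series when compared to the Lisbon temperature series (lower panels) and for 10 monthly series when compared to the Porto temperature series (top panels). These homogeneity tests allow one to consider the corrected series of T min and T max as free of non-climatic changes with a significance of at least 95 %.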
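The break corrections based on window means can be illustrated with the sketch below. It is an assumption-laden simplification, not the procedure actually used: it assumes that the correction is the difference between the means of a window of ±k yr on either side of the break and that this constant is added to all values before the break. The function name, the sample data and the handling of a single series (rather than 12 monthly series with smoothed corrections) are hypothetical.

```python
from statistics import mean


def mean_shift_correction(years, values, break_year, window):
    """Shift all values before break_year by (mean after - mean before).

    The two means are taken over `window` years immediately before the
    break and `window` years starting at the break year.
    """
    before = [v for y, v in zip(years, values)
              if break_year - window <= y < break_year]
    after = [v for y, v in zip(years, values)
             if break_year <= y < break_year + window]
    if not before or not after:
        raise ValueError("no data on one side of the break")
    shift = mean(after) - mean(before)
    corrected = [v + shift if y < break_year else v
                 for y, v in zip(years, values)]
    return corrected, shift


# Hypothetical series with a 1.0 degree step in 1941
years = list(range(1931, 1951))
values = [15.0 if y < 1941 else 16.0 for y in years]
corrected, shift = mean_shift_correction(years, values, 1941, 10)
```

When several breaks are present, working backwards from the most recent break, as done here for the stations, keeps the most recent segment unchanged and adjusts the earlier segments to it.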

Conclusions
Homogeneity tests show the presence of strong non-climatic breaks in all temperature series. Most of the detected breaks were corrected, and the homogeneity tests of the corrected series show no significant (significance 95 % or more) breaks around dates of instrumental changes.

Porto
One strong non-climatic break was detected in the temperature series of Porto Serra do Pilar, IGUP. This break was caused by the changes in the instruments location and height (1916). This break did not coincide with known volcanic eruptions of significant strength and required correction. Other breaks detected by the homogeneity tests either had low levels of significance (lower than 95 %) or coincided with (and were probably caused by) strong volcanic eruptions. Such was the case of the possible non-climatic break in 1984, which could not be corrected due to the aforementioned coincidence. The break that took place in 1916 was corrected.

Lisbon
Two strong non-climatic breaks were detected in the temperature series of Lisbon, IGIDL. These breaks were caused by the changes in the instruments location (1864) and height (1941). These breaks were corrected. Other breaks detected by the homogeneity tests had low levels of significance (lower than 95 %).

Coimbra
Two strong non-climatic breaks were detected in the temperature series of Coimbra, IGUC. These breaks were caused by the changes in the instruments location (1922 and 1933). These breaks were corrected.

Figure 2. Variations of T min (a) and T max (b) measured in Lisbon, Coimbra and Porto-Serra do Pilar from 1917 to 1925 (annual means) and approximations by multiple regression models for time periods of ±5 and ±10 yr around the gap - annual sums. Grey vertical lines mark the period of absent data. The bold red lines show the accepted interpolation.
The significances (p) of the correlation coefficients are smaller than 0.02 with a single exception. There are strong correlations (r = 0.51...0.92) between the temperature variations in Porto and Lisbon and Coimbra for almost all months, with only one exception: the correlation between the Porto and Lisbon series of T min (June, m6): r L = 0.21, p = 0.37. Nevertheless, it is still possible to use the data from Lisbon and Coimbra as regressors for Porto data in multiple regression models.

Figure 3. Porto: annual variations of T min (a), T max (b), T aver (c) and DTR (d); temperature series are shown in black, differences between temperature and SST Porto series are shown in blue. Grey vertical bands show dates of known instruments relocation (see Table 3).
are just average values calculated on the basis of monthly interpolation values of each model for presentation purposes only and were not used for interpolation. As can be seen, the interpolation using the multiple regression models instead of simple linear
Earth Syst. Sci. Data, 4, 187-213, 2012, www.earth-syst-sci-data.net/4/187/2012/

Figure 4. Porto: von Neumann ratio statistics for monthly series of T min, T max, DTR and T aver. Black straight lines show probability levels.

Figure 5. Porto: average of 12 monthly series of Buishand Q test (left panels), SNHT (middle panels) and Pettitt test (right panels) statistics of T min (a), T max (b), T aver (c) and DTR (d). Statistics of temperature series are shown in black, statistics of differences between temperature and SST series are shown in blue. Solid and dashed horizontal lines show probability levels of 99 % and 95 %, respectively. Grey vertical lines show known dates of instrumental changes, cyan broad vertical lines show periods of strong volcanic influence.

Figure 5a-d shows the test statistics (absolute and relative) of the Buishand, SNHT and Pettitt tests for T min, T max, T aver and DTR, respectively. The average of the 12 monthly statistics series is plotted in these figures to emphasize the main features of each homogeneity test statistic and for better visualisation. From Fig. 5a-d it is possible to detect the strongest break in data homogeneity around 1916, the date of the move to a new location and the change in the thermometers height. Also, for some months (not shown) there are breaks in the homogeneity around the 1920s (gap and probable change of the thermometer), 1930s (unknown origin), 1947 (changes in the measurements time), 1963 (volcanic eruption), and 1984 (changes in the measurements time, which coincide with volcanic eruptions). However, these breaks are seen only for some months and not by all three homogeneity tests at the same time. Therefore, the only break that has to be corrected is the 1916 break. Other non-climatic breaks have no statistically significant effect or could not be corrected due to the coincident influence of other (climatic) forces like the volcanic eruption around

Figure 7. Porto: original and corrected annual series of T min (a), T max (b), T aver (c) and DTR (d); temperature series are shown in black and red, differences between temperature and SST Porto series are shown in blue and cyan. Grey vertical bands show dates of known instruments relocation (see Table 3).

Figure 8. Porto: same as Fig. 4 but for the corrected series.

Figure 9. Porto: same as Fig. 5a-d but for the corrected series.

Figure 11. Lisbon: annual variations of T min (a), T max (b), T aver (c) and DTR (d); temperature series are shown in black, differences between temperature and SST Lisbon series are shown in blue. Grey vertical bands show dates of known instruments relocation (see Table 3).

Figure 12. Lisbon: von Neumann ratio statistics for monthly series of T min, T max, DTR and T aver. Black straight lines show probability levels.

Figure 13. Lisbon: average of 12 monthly series of Buishand Q test (left panels), SNHT (middle panels) and Pettitt test (right panels) statistics of T min (a), T max (b), T aver (c) and DTR (d). Statistics of temperature series are shown in black, statistics of differences between temperature and SST series are shown in blue. Solid and dashed horizontal lines show probability levels of 99 % and 95 %, respectively. Grey vertical lines show known dates of instrumental changes, cyan broad vertical lines show periods of strong volcanic influence.

Figure 15.

Figure 16. Lisbon: same as Fig. 12 but for the corrected series.

Figure 17. Lisbon: same as Fig. 13a-d but for the corrected series.

Figure 19. Coimbra: annual variations of T min (a), T max (b), T aver (c) and DTR (d); Coimbra temperature series (black) and differences between Coimbra and Porto (blue) and Lisbon (green) temperature series. Grey vertical bands show dates of known instruments relocation (see Table 3).

Figure 20. Coimbra: von Neumann ratio statistics for monthly series of T min, T max, DTR and T aver. Black straight lines show probability levels.

Figure 21. Coimbra: average of 12 monthly series of Buishand Q test (left panels), SNHT (middle panels) and Pettitt test (right panels) statistics of T min (a), T max (b), T aver (c) and DTR (d). Statistics of temperature series are shown in black, statistics of differences between Coimbra and Porto are shown in blue and between Coimbra and Lisbon are shown in green. Solid and dashed horizontal lines show probability levels of 99 % and 95 %, respectively. Grey vertical lines show known dates of instrumental changes, cyan broad vertical lines show periods of strong volcanic influence.

Figure 23. Coimbra: original and corrected annual series of T min (a), T max (b), T aver (c) and DTR (d); Coimbra temperature series (red and black) and differences between Coimbra and Porto (cyan and blue) and Lisbon (dark green and green) temperature series. Grey vertical bands show dates of known instruments relocation (see Table 3).

Figure 24. Coimbra: same as Fig. 20 but for the corrected series.

Table 1. 99 % critical values for the following homogeneity test statistics: N of the von Neumann ratio test, T 0 of the SNHT, X K of the Pettitt test and Q of the Buishand partial sum tests, for data sets with different lengths (114, 141 and 153 elements).
In the von Neumann ratio test the null hypothesis is that the data are independent, identically distributed random values; the alternative hypothesis is that the values in the series are not randomly distributed. The von Neumann ratio N is defined as the ratio of the mean square successive (year to year) difference to the variance (von Neumann, 1941).
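The verbal definition of the von Neumann ratio is commonly written as N = sum_{i=1}^{n-1} (Y_i - Y_{i+1})^2 / sum_{i=1}^{n} (Y_i - Ybar)^2, where Ybar is the mean of the series. The sketch below implements this common form; the exact normalization used for Table 1 is an assumption, and the function name and sample series are hypothetical.

```python
def von_neumann_ratio(y):
    """Sum of squared successive differences over sum of squared deviations."""
    n = len(y)
    if n < 2:
        raise ValueError("series is too short")
    ybar = sum(y) / n
    numerator = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))
    denominator = sum((v - ybar) ** 2 for v in y)
    if denominator == 0:
        raise ValueError("series is constant")
    return numerator / denominator


# Hypothetical series: a single step gives a ratio well below 2,
# rapid alternation gives a ratio above 2.
step = von_neumann_ratio([0.0] * 10 + [1.0] * 10)
alternating = von_neumann_ratio([0.0, 1.0] * 10)
```

With this form, a random series is expected to give a ratio close to 2, while a series containing a break tends to give lower values; the test gives no information on the date of the break.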

Table 2. Major volcanic eruptions from 1850 to 2000. DVI values taken from the NCDC database.

Table 3. Known dates of changes in thermometer heights (h t) and locations for the three Portuguese stations of Lisbon, Coimbra and Porto.