the Creative Commons Attribution 4.0 License.
EEAR-Clim: A high density observational dataset of daily precipitation and air temperature for the Extended European Alpine Region
Abstract. The Extended European Alpine Region (EEAR) exhibits a well-established and very high-density network of in-situ weather stations, hardly attainable in other mountainous regions of the world. However, the strong fragmentation into national and regional administrations and the diversity of data sources have so far hampered full exploitation of the available data for climate research. Here, we present EEAR-Clim, a new observational dataset gathering in-situ daily measurements of air temperature and precipitation from a variety of meteorological and hydrological services covering the whole EEAR. Data collected include time series from recordings up to 2020, the longest ones spanning up to 200 years. The overall observational network encompasses about 9000 in-situ weather stations, significantly enhancing data coverage at high elevations and achieving an average spatial density of one station per 6.8 8 km2 over the period 1991–2020. Data collected from many sources were tested for quality to ensure internal, temporal, and spatial consistency of time series, including outliers removal. Data homogeneity was assessed through a cross-comparison of the outcomes using three methods well established in the literature, namely Climatol, ACMANT, and RH Test. Quantile matching was applied to adjust inhomogeneous periods in time series. Overall, about 4 % of data were flagged as non-reliable and about 20 % of air temperature time series were corrected for one or more inhomogeneous periods. In the case of precipitation time series, fewer breakpoints were detected, confirming the well-known challenge of properly identifying inhomogeneities in noisy data. The dataset aims to serve as a powerful tool for better understanding climate change over the European Alps.
Status: open (until 23 Nov 2024)
RC1: 'Comment on essd-2024-285', Anonymous Referee #1, 15 Nov 2024
Review of "EEAR-Clim: A high density observational dataset of daily precipitation and air temperature for the Extended European Alpine Region" by Bongiovanni, Matiu, Crespi, Napoli, Majone and Zardi.
The authors present a temperature and precipitation climate dataset for the extended Alpine region. The data collected for this dataset are extensive, with about 9000 stations, and the quality control and homogeneity testing of the time series are thorough and follow recent views. The Alpine region is highly sensitive to climate change, but its topography is complex and the data sources are many, so a dataset like the EEAR is valuable for the scientific community and also for climate services providers in the region, such as national and regional meteorological services.
The documentation and the amount of rigour in the approach are impressive. As far as this reviewer can see, no methodological errors have been made. Some slight clarifications are possible though.
Perhaps it is relevant to add that the publication of this manuscript and the dataset sparked interest at the climatological departments of MeteoSwiss and GeoSphere. And it is in this direction that I have my most important concern. While the authors rightly note in the introduction that the EEAR fills a gap in the dataset landscape that urgently needs to be filled, the EEAR is not completely independent from the existing datasets. Specifically, the reader who is active in Alpine climate research will be interested in the differences between the EEAR and existing datasets, like the APGD for a pan-Alpine view on precipitation. Also, GeoSphere published quality-controlled and homogenized series for Austria using a somewhat different approach from the one used in this study. What is missing in the study is an initial assessment of differences between the key existing datasets and the EEAR. Clearly, it will not be possible to make a comparison against all national datasets, but the team of authors is sufficiently well acquainted with the existing dataset landscape to select a few (like 2) datasets to make a meaningful initial assessment of differences.
The second concern is actually more of an invitation to contemplate making this an operational, living dataset rather than a dataset which is very good but is not updated. I would like to urge the authors to add a section in the manuscript discussing the possibility and requirements to make this dataset operational. The guess of this reviewer is that one bottleneck in making the dataset operational is access to data. Perhaps the authors may want to refer to recent initiatives, like the EUMETNET Rodeo project https://rodeo-project.eu/ which builds upon the High-Value Datasets directive of the European Commission (https://digital-strategy.ec.europa.eu/en/news/commission-defines-high-value-datasets-be-made-available-re-use).
Other issues the authors may want to look into
* line 32. The Hofstra et al. and Kysely & Plavcova publications refer to rather ancient versions of E-OBS. It would be appreciated if a remark could be added which makes clear that station density in this part of the greater Alpine region (and in other parts) has improved significantly, although it remains fair to state that EEAR still has a higher station density in many parts of the greater Alpine area.
* lines 214-221. The issue of time shift is indeed an important issue. The 3-day moving window approach alleviates this to some degree, for example if a 24-h accumulated rainfall amount is attached to the date of the start of the 24-h period where the reference has attached this amount to the end date of the period. However, it is likely that the reference and target station provide data that partially overlap. For these cases it is less clear if the sliding window approach works. In this case, sometimes accumulated precip of the previous (or next) day will match better than the amounts recorded on days with the same date stamp. Would it be possible for the authors to spend a few more words on this subject and how this has been approached?
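One way to make the day-shift question concrete is to compare a target series with its reference at lags of -1, 0 and +1 day and see which alignment correlates best. The sketch below is only an illustration on synthetic data; the function name `best_lag` and the use of Pearson correlation are assumptions, not the procedure used in the manuscript.

```python
import numpy as np

def best_lag(target, reference, lags=(-1, 0, 1)):
    """Return the lag (in days) at which target correlates best with reference.

    A lag of +1 means target[t + 1] is compared with reference[t], i.e. the
    target stamps the same event one day later than the reference.
    """
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    scores = {}
    for lag in lags:
        if lag > 0:
            t, r = target[lag:], reference[:-lag]
        elif lag < 0:
            t, r = target[:lag], reference[-lag:]
        else:
            t, r = target, reference
        ok = ~np.isnan(t) & ~np.isnan(r)          # ignore missing days
        scores[lag] = np.corrcoef(t[ok], r[ok])[0, 1]
    return max(scores, key=scores.get), scores

# Synthetic daily precipitation (hypothetical values, not EEAR-Clim data).
rng = np.random.default_rng(0)
reference = rng.gamma(shape=0.5, scale=8.0, size=365)
shifted = np.roll(reference, 1)   # same totals, stamped one day later
shifted[0] = np.nan               # first day has no counterpart after the shift

lag, scores = best_lag(shifted, reference)
```

When the two 24-h accumulation windows only partially overlap, the correlation would be expected to be shared between two adjacent lags rather than peaking clearly at one, which is the ambiguous case raised in the comment above.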
Perhaps it is worthwhile to check https://doi.org/10.5194/essd-15-1441-2023 where a similar problem is encountered.
Very minor issues
* line 140: typo in 'metrics' (is plural, should be singular)
* table 2: the description of the outliers detection in the table and in the text (line 210-212) could be more complete. The table adds criteria on temperature which are not described in the text.
* line 311: Sadly, the ETCCDI no longer exists as a WMO Expert Team. It has been replaced by ET-SCI (Sector-specific Climate Indices) https://climpact-sci.org/assets/etsci/etsci-poster-20190220-en.pdf Not much activity seems to come from this Expert Team. All I can suggest is to add a link to the https://etccdi.pacificclimate.org/ website to guide the unaware reader.
* line 489: It would be appreciated if data providers to ECA&D were acknowledged here as well (https://knmi-ecad-assets-prd.s3.amazonaws.com/documents/ECAD_datapolicy.pdf)
Citation: https://doi.org/10.5194/essd-2024-285-RC1
RC2: 'Comment on essd-2024-285', Gilles Delaygue, 17 Nov 2024
Bongiovanni et al. have collected and gathered different datasets of daily temperature and precipitation measurements for a large region centered over the European Alps, from national and regional meteorological agencies. They tested the quality and homogenised part of these data in order to provide a more homogeneous dataset which could be used to study both climatic variability and long term trends over approximately the last century.
I find this work and the constructed dataset extremely valuable, and the manuscript worth being published. The authors did spend a great effort to gather new data, test and correct them as much as possible in order to provide a more homogeneous dataset. Compared to existing datasets, this one has several differences:
it provides both temperature and precipitation, up to year 2020;
measurements from different regional agencies in north Italy have been gathered, incl. from newly digitized archives, which is a complicated task compared to national agencies;
a huge work has been devoted to check and homogenise the series;
data have been kept at the station level, rather than being spatially averaged ('gridded');
large area including the European Alps, with part of the Mediterranean area;
some long series (although most of them are rather short).
I am an end user of meteorological series (and have also contributed to exhuming old data), so my comments will focus on these aspects and not so much on the correction techniques. I feel that the manuscript could describe the dataset a bit more in depth, make clear its strengths and weaknesses compared to other datasets, and provide some advice and caveats on its use. The introduction could be shortened to leave more space for such a description. My comments below are suggestions in that direction (but they may miss the point; I apologize in that case).
A. General comments
1. The dataset is large (9000 series), with a complicated time structure (essentially series shorter than 30 years, but also long ones; their overlap is not clear). The quality check and corrections are extensive, but homogenisation was applied to less than 20% of the series (12% for P). Hence the dataset is still quite heterogeneous, and I feel that more explanations, or even advice, could be given to potential users, especially on strengths and weaknesses compared to other datasets. For instance, it would be interesting to know which part of the dataset is considered more accurate (e.g., when an absolute threshold like 0ºC is considered), and which part is considered less accurate (but still useful for statistical analysis).
Two specific comments:
1.1 Authors underline the possibility to study climatic change from their dataset but 70% of the T series, and more than half of the P series, are shorter than 30 years, the duration required to calculate a climatic mean (WMO definition). Hence to me the largest part of the dataset cannot be used to study climatic trends, but rather climatic interannual to decadal variability. This should be underlined.
1.2 A homogenisation procedure has been applied, and its effect on long term trends is not clear to me: if the correction were perfect, the procedure should preserve these trends. However, in practice, the correction of the 'non-climatic' change is not perfect, and this may increase or decrease the long term trend. I guess this effect has been tested in previous studies, by applying other procedures (use of metadata; least-square fitting with nearby stations, etc.). Maybe some indications could be given.
2. The whole interest of the dataset is to compare series in different areas. However, techniques and procedures of measurements have been widely different between operators, with strong differences due to both the daily resolution and the complex mountain environment. How far can the series of this dataset be compared; can some systematic differences be expected? I think this point should be addressed, since most users of the dataset will have no clue about this question.
At least since the 1990s the measurement techniques and procedures should be well known. I guess measurement techniques did converge over the recent decades, and this concerns most of the data, but did the measurement procedures also converge (e.g. times of measurement in the day)? Some technical knowledge was also acquired with intercomparison projects (esp. the ones led by WMO).
This problem is especially sensitive for precipitation and extremes; it is also sensitive because the dataset aggregates mountain and plain areas, where techniques and measurement biases are different.
2.1 Difference in measurement technique is especially a problem for precipitation in mountain areas, due to a potentially strong wind bias, and even more for estimating snow water equivalent, which can be very tricky. WMO has organised intercomparison projects, since at least the 1980s (Sevruk et al.): did it help to homogenise the measurement techniques? Since then?
2.2 Measurement procedures have also been widely different. The case of precipitation is the most sensitive since a precipitation event can be spread over a few hours to a few days: the measurement procedure could be to read the daily total at 6am and attribute the value to the previous day, or to read the total at 6pm and attribute the value to the same day, so that a precipitation event could be cut and spread over different days, with a strong impact on precipitation intensity.
3. The structure of the dataset is not yet clear to me.
3.1 Time structure. I do not understand what is actually displayed by Figure 2a. I expected Fig. 2a-b to be cumulative distributions (esp. because continuous lines/curves suggest a time continuity). Obviously this is not the case, but Fig. 2a is not clear to me, even with the indication (L.132) that these are "10-yr increments". The first value of station number seems to correspond to a record length of 1 year; is that the number of stations with a record length of 1 to 10 years? More explanations should be given in the figure legend, esp. by giving an example for both figures. I guess Fig. 2a would be clearer with vertical bars.
3.2 Further, comparing both figures 2a and b, I have the impression that the longest series are not necessarily the latest. Hence the information on the series overlap is missing in these figures, and it would be great to see some information on their overlap (maybe with an additional figure).
3.3 Spatial structure. How has the area been defined? The whole focus of the study is about "mountain terrain", but Fig. 3b shows that, in fact, most of the area has an elevation lower than, say, 400m, and most of the stations are below 500m (L.149). This seems at odds with the claim that (L.108) "EEAR is predominantly constituted by complex terrain and hence characterized by strong elevation gradients". I suggest giving some clue about the area definition (why 3-18E, 43-49N?), and qualifying the above comments.
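The record-length and overlap information asked for in comments 1.1 and 3.2 could be derived from a simple station inventory. The sketch below assumes a hypothetical inventory holding the first and last year of each station record (made-up values, not taken from EEAR-Clim). It computes the share of records reaching 30 years, the number of stations active in each year, and the pairwise overlap between records.

```python
import numpy as np

# Hypothetical inventory: first and last year of each station record (inclusive).
start = np.array([1901, 1955, 1991, 1995, 2004, 2010])
end = np.array([2020, 1989, 2020, 2020, 2020, 2020])

length = end - start + 1              # record length in years
share_30 = np.mean(length >= 30)      # fraction of records with at least 30 years

years = np.arange(start.min(), end.max() + 1)
# active[i] = number of stations whose record covers years[i]
active = ((start[:, None] <= years) & (years <= end[:, None])).sum(axis=0)

# Pairwise overlap in years between station records (0 if the records are disjoint).
overlap = np.minimum(end[:, None], end) - np.maximum(start[:, None], start) + 1
overlap = np.clip(overlap, 0, None)
```

This treats each record as continuous between its first and last year; gaps inside a record would need a per-year availability table instead of start and end years.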
B. Specific comments
Abstract:
L.7-8 The fact that most of the 9000 series are short (less than 30 years) and restricted to the period 1991-2020 should be underlined.
L.14-15: "better understanding climate change" and climatic variability, mostly, since most of the series are too short to address a climatic trend (i.e., over less than 30 years)
L.16 "The continuous warming of the climate" > "Global warming" [suggestion]
L.21-22 "benefits from a density of weather stations and length of data series not easily attainable in many other regions": neither clear nor complete to me. I guess what is meant is the availability of many stations and long duration of series. But this richness should be compared to the heterogeneity of the mountain environment, which requires many more stations to resolve "the complex nature of Alpine terrain" (L.37).
L.31 E-OBS is a gridded product, it would be interesting to have some indications of strengths/weaknesses compared to this dataset.
L.32 A word is missing here, I guess "has a lower density"
L.53 "intensity of extreme weather events" is very sensitive to the measurement procedure (cf. Comment 1.2), hence the need to document the procedure as metadata.
L.58-60 "data quality", "accurate data", is a bit too vague to me, quality depends on the use of data.
L.125 "TP" for total precipitation is a bit confusing to me, since T refers to temperature; why not P instead?
L.125 "global provider" (GHCN): maybe make it clear that it is only a provider, not a producer, of meteorological observations
L.126 Table 1: ECA&D is a provider, not a producer of data. Why was it used to get data? I guess these series were already quality checked and homogenised, if so, were the procedures similar to the ones used in this work?
L.126 Table 1 & Status of data: further explanations should be given about the data availability, rather than just the 'open data' flag. What does "available without restrictions" mean? If about 2700 series are not included in the dataset, then why consider them? (Hopefully restrictions concern raw data and not corrected/homogenised data?)
L.140 "distance between stations is a useful metrics" not really in mountain area, but at least the simplest metrics
L.166 daily averages: please make it clear over which period sub-daily data have been averaged (0-24h, 6h-6h, other?)
L.178 I understand that the thresholds indicated in Table 2 are used to detect possible problems, which are then manually checked. But how were data finally judged to be "definitely erroneous"? It is important for end users of the dataset to know precisely which criteria were applied to finally reject a value.
It could be, and this should be explained, that these erroneous values were very easy to spot manually against the adjacent days. If, instead, the detection of erroneous days was semi-automatic, then the thresholds are important. I would expect that Tmin could change by more than 20ºC over two days in winter. Concerning the precipitation limit, the discussion in L.190-192 on extreme precipitation is not clear to me. Daily totals above 500mm are rare, but have been consistently measured (e.g., in the SE part of the French Massif Central). And daily, or 24h, totals close to 1000mm can be found in the meteorological archives within the EEAR.
L.178 Table 2: P > 9 times P95 and P > 5 times P95
L.347-349 I do not understand the test here: homogenised series have been used to test 2 procedures (Exp1-2), so why did the authors expect to detect heterogeneities in them?
L.352 Homogenisation procedure: how many years before and after each breakpoint have been used to calculate the quantiles? (not clear in Squintu et al whether 5 or 20 years)
L.363-365 Any possibility to couple the breakpoint detection and homogenisation for all temperature variables (Tn, Tx, Tm)? Would that improve the correction?
L.405 "varied" > "variable"?
L.434 "Most of them consist of at least 60 years": this does not seem consistent with Fig. 2 and Table A1, which show that most are shorter than 30 years?
L.441 "newly digitized data": I do not think this information is given elsewhere, although it is important; it should be underlined at the beginning
L. 451 "QC" "improved the overall accuracy" : this is expected, of course, but this is not tested/shown?
L.462 "was used as a basis to carry out an extended analysis" but not described in this manuscript > make it clear
L. 463 "a high-resolution interpolated version" i.e. a gridded version?
L.478 "Values below 0.1 mm" per day? (homogenisation procedure is based on monthly averages)
Table A1 should be central in the manuscript since it gives the overall structure of the dataset
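To make the question on L.352 concrete, the sketch below shows where the window length enters a quantile-matching adjustment. It is a deliberately simplified version with no reference series and no seasonal stratification, so it is not the procedure of Squintu et al. or of the manuscript; the names `quantile_match` and `window` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def quantile_match(values, years, break_year, window=20,
                   probs=np.linspace(0.05, 0.95, 19)):
    """Adjust values before break_year so their quantiles match the later segment.

    Only data within `window` years on either side of the break are used
    to estimate the quantiles.
    """
    values = np.asarray(values, dtype=float)
    years = np.asarray(years)
    before = (years < break_year) & (years >= break_year - window)
    after = (years >= break_year) & (years < break_year + window)
    q_before = np.quantile(values[before], probs)
    q_after = np.quantile(values[after], probs)
    adjusted = values.copy()
    early = years < break_year
    # Adjustment for each early value, interpolated from its position among
    # q_before; values beyond the outermost quantiles get the end adjustments.
    adjusted[early] += np.interp(values[early], q_before, q_after - q_before)
    return adjusted

# Synthetic daily temperatures with an artificial break (hypothetical data).
rng = np.random.default_rng(1)
years = np.repeat(np.arange(1961, 2021), 365)
temps = rng.normal(10.0, 5.0, size=years.size)
temps[years < 1990] -= 1.5        # early segment reads 1.5 degrees too low

fixed = quantile_match(temps, years, break_year=1990, window=20)
```

With `window=5` instead of 20, each quantile is estimated from a quarter of the data, so the adjustment in the tails becomes noisier; this is why the window length is worth stating explicitly.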
----------
Citation: https://doi.org/10.5194/essd-2024-285-RC2
Data sets
EEAR-Clim: A high density observational dataset of daily precipitation and air temperature for the Extended European Alpine Region Giulio Bongiovanni, Michael Matiu, Alice Crespi, Anna Napoli, Bruno Majone, and Dino Zardi https://doi.org/10.5281/zenodo.10951609