Dense labelled time-series for mapping European forest disturbance agents

Viehweger, Jonas; Hirschmugl, Manuela; Puhm, Martin; Burtscher, Raphael; Gallaun, Heinz; Deutscher, Janik

doi:10.5194/essd-2026-185

12 May 2026

Status: this preprint is currently under review for the journal ESSD.

Abstract. Attributing disturbance agents to canopy mortality in European forests remains difficult due to sparse, heterogeneous, and often single-agent reference data. We present DISFOR (Viehweger et al., 2026), a uniformly re-interpreted ground-truth dataset of forest disturbance agents, designed to serve as training data for multi-temporal classification and analysis of disturbance agents with Sentinel-2 time-series. The dataset comprises 3,822 unique sample points, each defined at the 10×10 m pixel level, labelled by disturbance event and agent, and fully temporally segmented into consecutive events and forest states for the years 2015–2024. Labels follow a three-level hierarchical scheme that supports analyses from broad "alive vs. disturbed" partitions to specific agents such as bark beetle, windthrow, wildfire, and salvage logging. Samples were drawn from multi-source ancillary data (e.g. EFFIS, FORWIND, Copernicus EMS, and regional forestry datasets) and consistently re-interpreted using an open-source interface and generally available primary data such as Sentinel-2 and very-high-resolution imagery. For each sample we provide interpreter confidence, cluster identifiers to capture spatio-temporal autocorrelation, and additional metadata. Alongside interpreted samples, we release two Sentinel-2 data products tailored to complementary use cases: (i) a tabular single-pixel reflectance time series for 2015–2024 and (ii) georeferenced 32×32-pixel image chips centred on the sampled points, suitable for computer vision applications, with Python utilities for reproducible data loading and filtering. The dataset is suited for training and calibration of both change detection and agent attribution algorithms at sub-annual resolution. It supports training on single timestamps as well as on time-series, and facilitates studies that integrate spectral dynamics with spatial context. By harmonizing multiple first-level disturbance agent products and providing dense, temporally explicit labels, this resource lowers a major barrier to developing European forest disturbance and recovery monitoring.
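
The cluster identifiers mentioned in the abstract suggest one practical use: keeping spatially or temporally correlated samples on the same side of a train/test split. The following is a minimal sketch of such a group-aware split, not part of the DISFOR library. The record layout and the field names `sample_id`, `cluster_id`, and `agent` are hypothetical and chosen for illustration only; the actual column names should be taken from the dataset documentation.

```python
import random
from collections import defaultdict


def split_by_cluster(samples, test_fraction=0.2, seed=0):
    """Split samples so that every cluster lands entirely in train or test.

    `samples` is a list of dicts with a (hypothetical) "cluster_id" key.
    """
    # Group the samples by their cluster identifier.
    by_cluster = defaultdict(list)
    for sample in samples:
        by_cluster[sample["cluster_id"]].append(sample)

    # Shuffle the cluster ids (not the samples) reproducibly.
    cluster_ids = sorted(by_cluster)
    random.Random(seed).shuffle(cluster_ids)

    # Reserve a fraction of the clusters, at least one, for testing.
    n_test = max(1, round(len(cluster_ids) * test_fraction))
    test = [s for cid in cluster_ids[:n_test] for s in by_cluster[cid]]
    train = [s for cid in cluster_ids[n_test:] for s in by_cluster[cid]]
    return train, test


# Toy records: 30 samples in 10 clusters of 3 samples each (illustrative only).
samples = [
    {
        "sample_id": i,
        "cluster_id": i // 3,
        "agent": "windthrow" if i % 2 else "bark beetle",
    }
    for i in range(30)
]

train, test = split_by_cluster(samples)
train_clusters = {s["cluster_id"] for s in train}
test_clusters = {s["cluster_id"] for s in test}
print(len(train), len(test), train_clusters & test_clusters)
```

Splitting on clusters rather than on individual samples trades a slightly uneven test size for a split in which no cluster contributes to both sets, which is the property that matters when neighbouring samples are autocorrelated.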

Received: 10 Mar 2026 – Discussion started: 12 May 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 18 Jun 2026)

RC1: 'Comment on essd-2026-185', Loïc Dutrieux, 29 May 2026
This is a very interesting data compilation and re-interpretation effort that will certainly benefit the community greatly, as data with this level of thematic detail are either scarce or highly heterogeneous in quality. I have a few overall comments and some more minor remarks linked to sections or statements made in the manuscript.
Coverage:

It could have included at least two more datasets:
https://www.nature.com/articles/s41597-026-07084-8

https://zenodo.org/records/10449087

Details on labeling process:

The manuscript includes few details on the labeling process. I would have liked to see illustrations and textual descriptions of the spectro-spatio-temporal aspects of the various dynamics considered in the dataset. Some feedback on the ideal visual interpretation parameters (color composites, vegetation indices) would be a nice addition too. Finally, further discussion of some specific labeling challenges would make the data creation process fully transparent.
Temporal conceptualization of classification framework:

The manuscript discusses how various dynamics are mapped either to temporal segments or to abrupt events, but it is not clear how this conceptualization is later handled in the dataset. I see only temporal segment labels in the description of the dataset fields.
Minor remarks:

Line 30: The community often makes bark beetle its own class and considers it to be the only type of biotic disturbance. It surely is by far the largest biotic agent, but it's not the only one.

Line 41: Metadata format heterogeneity is real, and datasets cannot be ingested in their raw form into a machine learning pipeline, but given data scarcity, this is hardly a blocking issue.

Line 42: DEFID2 is in theory broader in scope than just insects; it includes all pests and diseases.

Lines 61-66: I believe the process you described can be characterized as augmentation.

Line 69: You use "training data"; good to make it clear whether it's indeed intended for training, or if it is a more generic "reference dataset".

Line 96: It would be good to have some idea of the number of samples for which the visual re-interpretation did not confirm the agent or even the disturbance. In my experience, compiled datasets are widely heterogeneous in quality.

Lines 104-110: I'm not entirely sure it is relevant to name the projects at this stage

Line 134: How were local news reports discovered?

Line 138: FORWIND has not been updated in a long time and overlaps only marginally with the Sentinel era.

Line 147: In my opinion you're missing an introductory paragraph here detailing the general idea and key assumptions behind this app (visual interpretation potential, simultaneous spatial and temporal visualization, ergonomics, etc.).

Figure 2: The rightmost explanation bubble explains that the sample footprint is not the same. Why?

Line 160: Dieback is a common term used instead of decline

Line 161: Stretching the concept of segments/events, not only disturbances would fit into events, but any transition between two segments, such as a stabilization at the end of a regrowth period.

Line 164: I do not understand why you'd need two labels per segment.

Perhaps a conceptual figure illustrating segments/labels/events would help here.

Line 210: I think this was already mentioned a few paragraphs earlier.

Line 235: Fully agree, I would even further insist on that. Because of unknown or zero inclusion probabilities, the dataset cannot be used for assessment of mapping accuracy. It can be used for model performance assessment though, but that does not necessarily translate to mapping accuracy.

Line 247: OK, but which baseline was used? Differences are marginal between baselines, but still good to be exhaustive.

Line 248: How do you justify applying the offset but not the scaling factor?

Line 259: Not sure it makes sense to use COG for a 32-pixel chip that is probably made of a single data block. That's fine, the greater includes the lesser.

Lines 303-309: These are new elements; it's uncommon to mention things that have not been discussed previously in a conclusion
Loïc Dutrieux

Citation: https://doi.org/10.5194/essd-2026-185-RC1
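
RC1's remark on Line 248 distinguishes two separate steps when converting Sentinel-2 digital numbers (DN) to reflectance: an additive offset and a scaling factor. The sketch below illustrates that distinction only and does not reproduce the processing used for DISFOR. It assumes the values commonly documented for Sentinel-2 products from processing baseline 04.00 onwards (additive offset of -1000, quantification value of 10000); the function names and the `has_offset` flag are hypothetical, and no-data handling is omitted.

```python
# Assumed constants for Sentinel-2 processing baseline 04.00 and later.
ADD_OFFSET = -1000             # additive offset applied to the stored DN
QUANTIFICATION_VALUE = 10000   # divisor turning DN into reflectance


def harmonize_dn(dn, has_offset):
    """Apply only the additive offset.

    The result stays an integer DN that is comparable between products
    stored with and without the offset.
    """
    return dn + ADD_OFFSET if has_offset else dn


def to_reflectance(dn, has_offset):
    """Apply the offset and then the scaling factor to obtain reflectance."""
    return harmonize_dn(dn, has_offset) / QUANTIFICATION_VALUE


# The same surface stored under both conventions (illustrative values).
print(harmonize_dn(1500, has_offset=True))    # 500
print(harmonize_dn(500, has_offset=False))    # 500
print(to_reflectance(1500, has_offset=True))
```

Applying only the offset keeps values as integers on a common DN scale, while the division is a uniform rescaling that a user can apply later; whether that is the authors' rationale is not stated on this page.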

Data sets

DISFOR Jonas Viehweger et al. https://doi.org/10.57967/hf/7983

Model code and software

DISFOR – Python library Jonas Viehweger https://github.com/JR-DIGITAL/DISFOR

DISFOR – Python library documentation Jonas Viehweger https://jr-digital.github.io/DISFOR/

Latest update: 31 May 2026

Short summary

This publication presents DISFOR, a dataset labelling the timing and agent (wildfire, windthrow, clear cuts, etc.) of forest disturbance events in Europe. The dataset should enable progress in automatic disturbance agent classification using satellite data. It was created by deriving disturbance agents from various existing disturbance agent products. To harmonize the products, the timing and the assigned disturbance agent were re-interpreted using Sentinel-2 time-series.

