04 Jul 2024
A Sentinel-2 Machine Learning Dataset for Tree Species Classification in Germany

Maximilian Freudenberg, Sebastian Schnell, and Paul Magdon

Abstract. We present a machine learning dataset for tree species classification in Sentinel-2 satellite image time series of bottom-of-atmosphere reflectance. The dataset is based on the German national forest inventory of 2012 and on analysis-ready satellite imagery computed with the FORCE processing pipeline. From the national forest inventory data, we extracted the tree positions, selected the 387 775 trees in the upper canopy layer and automatically extracted the corresponding bottom-of-atmosphere reflectance time series from Sentinel-2 L2A images. These time series are labeled with the corresponding tree species, which enables pixel-wise classification tasks. Furthermore, we provide auxiliary information such as the approximate tree position, the year of possible disturbance events and the diameter at breast height. Temporally, the dataset spans July 2015 to the end of October 2022, with ca. 75.3 million data points for trees of 51 species and species groups, as well as 13.8 million observations for non-tree background. Spatially, it covers all of Germany. The dataset is available under the following DOI (Freudenberg et al., 2024):

Short summary
Classifying tree species in satellite images is an important task for environmental monitoring and forest management. Here we present a dataset containing Sentinel-2 satellite pixel time series of individual trees, intended for training machine learning models. The dataset was created by merging information from the German national forest inventory of 2012 with satellite data. It sparsely covers all of Germany for the years 2015 to 2022 and comprises 51 species and species groups.