A Sentinel-2 Machine Learning Dataset for Tree Species Classification in Germany
Abstract. We present a machine learning dataset for tree species classification from Sentinel-2 satellite image time series of bottom-of-atmosphere reflectance. The dataset is based on the German national forest inventory of 2012 and on analysis-ready satellite imagery computed with the FORCE processing pipeline. From the national forest inventory data, we extracted the tree positions, selected 387 775 trees in the upper canopy layer, and automatically extracted the corresponding bottom-of-atmosphere reflectance time series from Sentinel-2 L2A images. Each time series is labeled with the corresponding tree species, which enables pixel-wise classification. In addition, we provide auxiliary information such as the approximate tree position, the year of possible disturbance events, and the diameter at breast height. Temporally, the dataset spans July 2015 to the end of October 2022 and contains ca. 75.3 million data points for trees of 51 species and species groups, as well as 13.8 million observations of non-tree background. Spatially, it covers all of Germany. The dataset is available under the following DOI (Freudenberg et al., 2024): https://doi.org/10.3220/DATA20240402122351-0
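To give a rough sense of scale, the figures quoted in the abstract can be combined into a few summary ratios. The sketch below is only a back-of-the-envelope check, not part of the dataset tooling. It assumes that each selected tree contributes exactly one time series and that the 75.3 million data points are per-tree observations; the constant and function names are illustrative and do not come from the dataset.

```python
# Back-of-the-envelope scale check using only figures quoted in the abstract.
# Assumption: every selected tree contributes exactly one time series, and the
# 75.3 million "data points" are observations belonging to those trees.

N_TREES = 387_775                   # trees selected from the upper canopy layer
N_TREE_OBSERVATIONS = 75.3e6        # approximate, as stated ("ca. 75.3 million")
N_BACKGROUND_OBSERVATIONS = 13.8e6  # non-tree background observations


def months_inclusive(start_year, start_month, end_year, end_month):
    """Number of calendar months from start to end, counting both endpoints."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


def mean_observations_per_tree(total_observations, n_trees):
    """Average number of observations per tree time series."""
    return total_observations / n_trees


# July 2015 through the end of October 2022.
months = months_inclusive(2015, 7, 2022, 10)

per_tree = mean_observations_per_tree(N_TREE_OBSERVATIONS, N_TREES)
per_tree_per_month = per_tree / months
background_share = N_BACKGROUND_OBSERVATIONS / (
    N_TREE_OBSERVATIONS + N_BACKGROUND_OBSERVATIONS
)

print(f"months covered:              {months}")
print(f"mean observations per tree:  {per_tree:.1f}")
print(f"mean observations per month: {per_tree_per_month:.2f}")
print(f"background share of data:    {background_share:.1%}")
```

Under these assumptions, the mean is roughly 194 observations per tree over 88 months, or about two per tree per month. Since the input totals are themselves rounded, these ratios are indicative only.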