https://doi.org/10.5194/essd-2025-340
20 Aug 2025
Status: this preprint is currently under review for the journal ESSD.

BorFIT: A Novel LiDAR-Based Training Dataset for Individual Tree Segmentation and Species Detection in northern boreal Forests

Jacob Schladebach, Birgit Heim, Léa Enguehard, Mareike Wieczorek, Jakob Broers, Robert Jackisch, Josias Gloy, Kunyan Hao, James Tretton, Anna Gorshunova, and Stefan Kruse

Abstract. BorFIT is a novel training data set designed to assist in the segmentation of individual trees and the detection of species from LiDAR point clouds, thus contributing to deep-learning-based forestry applications. AI-supported individual tree detection has progressed significantly in recent years; however, satisfactory results remain elusive in dense and structurally complex boreal forests. We compiled a training data set designed to remedy this issue. It comprises 384 reference plots, each a LiDAR point cloud covering 20 m × 20 m, with up to 200 manually segmented and species-classified trees per plot. We carried out LiDAR surveys at 146 sites between 2021 and 2024 in East Siberia (Yakutia), northwest Canada, and Alaska (USA); the sites were selected along a bioclimatic gradient to represent the circumboreal region. From each transect-derived LiDAR point cloud, we extracted a minimum of four reference plots (each 20 m × 20 m) based on the maximum tree heights within the plots, so as to systematically sample the apparent tree density gradient. We manually segmented the identifiable trees within each reference plot point cloud, yielding 16,530 individual trees in total. Following segmentation, we trained four randomForest classifiers to predict the species of every segmented tree. The predicted tree species are: Picea mariana (Britton, Sterns Poggenb.), Picea sitchensis ((Bong.) Carrière), Picea glauca ((Moench) Voss), Pinus contorta (Douglas ex Loudon), Abies lasiocarpa ((Hook.) Nutt.), Larix laricina ((Du Roi) K.Koch), Betula papyrifera (Marshall), Betula neoalaskana ((Regel) Ashburner McAll.), Populus balsamifera (L.), Populus tremuloides (Michx.), Pinus sylvestris (Thunb.) and Alnus glutinosa (L.). The data offer the means for 3D spatial analysis of species distribution and stand structure around the circumboreal region. Furthermore, they can be used as a training data set for artificial intelligence (AI) applications and thereby improve our understanding of the boreal forest’s vegetation reorganization in response to significant global warming.
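The plot extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that at least four 20 m × 20 m plots per transect were chosen based on maximum tree heights, so the selection rule used here (cells evenly spaced along the ranking by maximum height) is an assumption, as are the function names `max_height_per_cell` and `pick_plots` and the input format (height-normalized points as `(x, y, z)` tuples in metres).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, height above ground) in metres; assumed format
Cell = Tuple[int, int]              # integer grid index of a 20 m x 20 m cell

PLOT_SIZE_M = 20.0  # plot edge length stated in the abstract


def max_height_per_cell(points: List[Point], size: float = PLOT_SIZE_M) -> Dict[Cell, float]:
    """Bin points into size x size grid cells and keep the maximum height per cell."""
    tallest: Dict[Cell, float] = {}
    for x, y, z in points:
        cell = (int(x // size), int(y // size))
        if z > tallest.get(cell, float("-inf")):
            tallest[cell] = z
    return tallest


def pick_plots(tallest: Dict[Cell, float], n_plots: int = 4) -> List[Cell]:
    """Pick n_plots cells evenly spaced along the ranking by maximum height.

    Hypothetical selection rule: the abstract does not specify how plots
    were chosen beyond 'based on maximum tree heights'.
    """
    if n_plots < 2:
        raise ValueError("n_plots must be at least 2")
    # Sort by height; the cell index breaks ties so the result is deterministic.
    ranked = sorted(tallest, key=lambda c: (tallest[c], c))
    if len(ranked) <= n_plots:
        return ranked
    step = (len(ranked) - 1) / (n_plots - 1)
    return [ranked[round(i * step)] for i in range(n_plots)]


# Toy transect: seven cells along x, each with one low point and one canopy point.
canopy_heights = [2.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0]
demo_points: List[Point] = []
for i, h in enumerate(canopy_heights):
    demo_points.append((20.0 * i + 10.0, 5.0, 0.3))  # understorey return
    demo_points.append((20.0 * i + 12.0, 7.0, h))    # tallest return in the cell

selected = pick_plots(max_height_per_cell(demo_points))
```

In this toy transect the selection spans the shortest to the tallest cell, which mirrors the stated aim of sampling the apparent density gradient; a real workflow would read the points from the published point cloud files rather than a hand-built list.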

Competing interests: At least one of the (co-)authors is a member of the editorial board of Earth System Science Data.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 26 Sep 2025)


Data sets

Reference dataset of individual trees from the Tundra-Taiga-Ecotone and Northern boreal forests BorFIT Stefan Kruse, Jacob Schladebach, Jakob Broers, Kunyan Hao, James Tretton, Anna Gorshunova https://doi.pangaea.de/10.1594/PANGAEA.980505

Model code and software

BorFIT Jacob Schladebach https://github.com/StefanKruse/BorFIT/tree/main

Latest update: 20 Aug 2025
Short summary
BorFIT is a novel training dataset for LiDAR point cloud segmentation and tree species detection in boreal forests. Comprising 384 plots across Siberia, Canada, and Alaska, it features 16,530 manually segmented trees of 12 species. BorFIT supports AI applications for analyzing species distribution, stand structure, and boreal forest response to climate change.