<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="data-paper" specific-use="SMUR" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">ESSDD</journal-id>
<journal-title-group>
<journal-title>Earth System Science Data Discussions</journal-title>
<abbrev-journal-title abbrev-type="publisher">ESSDD</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Earth Syst. Sci. Data Discuss.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">1866-3591</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/essd-2025-340</article-id>
<title-group>
<article-title>BorFIT: A Novel LiDAR-Based Training Dataset for Individual Tree Segmentation and Species Detection in Northern Boreal Forests</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Schladebach</surname>
<given-names>Jacob</given-names>
<ext-link>https://orcid.org/0009-0001-8381-2228</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Heim</surname>
<given-names>Birgit</given-names>
<ext-link>https://orcid.org/0000-0003-2614-9391</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Enguehard</surname>
<given-names>Léa</given-names>
<ext-link>https://orcid.org/0000-0002-2144-8264</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wieczorek</surname>
<given-names>Mareike</given-names>
<ext-link>https://orcid.org/0000-0002-3180-1607</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Broers</surname>
<given-names>Jakob</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Jackisch</surname>
<given-names>Robert</given-names>
<ext-link>https://orcid.org/0000-0001-5696-8721</ext-link>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Gloy</surname>
<given-names>Josias</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Hao</surname>
<given-names>Kunyan</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Tretton</surname>
<given-names>James</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Gorshunova</surname>
<given-names>Anna</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Kruse</surname>
<given-names>Stefan</given-names>
<ext-link>https://orcid.org/0000-0003-1107-1958</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Potsdam, Germany</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Technical University of Berlin, Berlin, Germany</addr-line>
</aff>
<pub-date pub-type="epub">
<day>20</day>
<month>08</month>
<year>2025</year>
</pub-date>
<volume>2025</volume>
<fpage>1</fpage>
<lpage>31</lpage>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2025 Jacob Schladebach et al.</copyright-statement>
<copyright-year>2025</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri"  xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://essd.copernicus.org/preprints/essd-2025-340/">This article is available from https://essd.copernicus.org/preprints/essd-2025-340/</self-uri>
<self-uri xlink:href="https://essd.copernicus.org/preprints/essd-2025-340/essd-2025-340.pdf">The full text article is available as a PDF file from https://essd.copernicus.org/preprints/essd-2025-340/essd-2025-340.pdf</self-uri>
<abstract>
<p>BorFIT is a novel training dataset designed to support individual tree segmentation and species detection from LiDAR point clouds, thus contributing to deep-learning-based forestry applications. AI-supported individual tree detection has advanced considerably in recent years; however, satisfactory results remain elusive in dense, structurally complex boreal forests. We compiled a training dataset to address this gap. It comprises 384 LiDAR point clouds in the form of reference plots, each covering 20 m × 20 m and containing up to 200 manually segmented and species-classified trees. We carried out LiDAR surveys at 146 sites in East Siberia (Yakutia), northwest Canada, and Alaska (USA) between 2021 and 2024; the sites were selected along a bioclimatic gradient to represent the circumboreal region. From each point cloud derived from a LiDAR transect, we extracted at least four reference plots (each 20 m × 20 m), chosen on the basis of the maximum tree heights within the plots, to systematically sample the apparent tree density gradient. We manually segmented the identifiable trees within each reference plot point cloud, yielding 16,530 individual trees in total. Following segmentation, we trained four randomForest classifiers to predict the species of every segmented tree. The predicted tree species are <italic>Picea mariana</italic> (Britton, Sterns &amp; Poggenb.), <italic>Picea sitchensis</italic> ((Bong.) Carrière), <italic>Picea glauca</italic> ((Moench) Voss), <italic>Pinus contorta</italic> (Douglas ex Loudon), <italic>Abies lasiocarpa</italic> ((Hook.) Nutt.), <italic>Larix laricina</italic> ((Du Roi) K.Koch), <italic>Betula papyrifera</italic> (Marshall), <italic>Betula neoalaskana</italic> ((Regel) Ashburner &amp; McAll.), <italic>Populus balsamifera</italic> (L.), <italic>Populus tremuloides</italic> (Michx.), <italic>Pinus sylvestris</italic> (L.), and <italic>Alnus glutinosa</italic> ((L.) Gaertn.). The data enable 3D spatial analysis of species distribution and stand structure across the circumboreal region. Furthermore, they can serve as training data for artificial intelligence (AI) applications and thereby help improve our understanding of boreal forest vegetation reorganization in response to significant global warming.</p>
</abstract>
<counts><page-count count="31"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>