23 Nov 2021
Status: this preprint is currently under review for the journal ESSD.

SiDroForest: A comprehensive forest inventory of Siberian boreal forest investigations including drone-based point clouds, individually labelled trees, synthetically generated tree crowns and Sentinel-2 labelled image patches

Femke van Geffen1,2, Birgit Heim1, Frederic Brieger1,3, Rongwei Geng1,4,5, Iuliia A. Shevtsova1,2, Luise Schulte1,2, Simone M. Stuenzi1,6, Nadine Bernhardt1,7, Elena I. Troeva8, Luidmila A. Pestryakova9, Evgenij S. Zakharov8,9, Bringfried Pflug10, Ulrike Herzschuh1,2,11, and Stefan Kruse1
  • 1Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research (AWI), Research Unit Potsdam, Germany
  • 2University of Potsdam, Institute of Biochemistry and Biology, Potsdam, Germany
  • 3Carleton University, Department of Geography and Environmental Studies, Ottawa, Canada
  • 4Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
  • 5University of Chinese Academy of Sciences, Beijing, China
  • 6Humboldt-Universität zu Berlin, Geography Department, Unter den Linden, Berlin, Germany
  • 7Julius Kühn-Institut Bundesforschungsinstitut für Kulturpflanzen, Quedlinburg, Germany
  • 8Institute for Biological Problems of the Cryolithozone, Russian Academy of Sciences, Siberian Branch, Yakutsk, Russia
  • 9North-Eastern Federal University of Yakutsk, Institute of Natural Sciences, Yakutsk, Russia
  • 10German Aerospace Center (DLR), Berlin, Germany
  • 11University of Potsdam, Institute of Environmental Science and Geography, Potsdam, Germany

Abstract. This data collection is an attempt to remedy the scarcity of tree-level forest structure data in the circum-boreal region, whilst also providing adjusted and labelled tree-level and vegetation-plot-level data for machine learning and upscaling applications. Publicly available comprehensive datasets on tree-level forest structure are rare because governmental agencies, the public sector and private actors are all involved in, and influence, the availability of such datasets.

We present datasets of vegetation composition and tree- and plot-level forest structure for two important vegetation transition zones in Siberia, Russia: the summergreen–evergreen transition zone in central Yakutia and the tundra–taiga transition zone in Chukotka (NE Siberia). The SiDroForest collection consists mainly of unmanned aerial vehicle (UAV) and field data collected at 64 vegetation plots during fieldwork performed jointly by the Alfred Wegener Institute for Polar and Marine Research (AWI) and the North-Eastern Federal University of Yakutsk (NEFU) during the Chukotka 2018 expedition to Siberia.

The data collection consists of four separate datasets. The fieldwork locations are the anchors that bind these data types together: every record is tied to the vegetation plot at which it was collected.
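Because the vegetation plot is the shared key, combining records from the different datasets amounts to grouping them by plot identifier. The following is a minimal sketch, assuming each record carries a hypothetical `plot_id` field; the field names and the identifiers `P01`/`P02` are illustrative only and are not the published schema.

```python
from collections import defaultdict


def group_by_plot(**datasets):
    """Group records from several datasets under their shared plot ID.

    Each keyword argument maps a dataset name to a list of dicts.
    Every dict is assumed (hypothetically) to have a "plot_id" key.
    """
    plots = defaultdict(lambda: defaultdict(list))
    for name, records in datasets.items():
        for record in records:
            plots[record["plot_id"]][name].append(record)
    # Convert the nested defaultdicts into plain dicts for readability
    return {pid: dict(by_dataset) for pid, by_dataset in plots.items()}


# Hypothetical toy records, not actual SiDroForest entries
uav = [{"plot_id": "P01", "product": "CHM"}, {"plot_id": "P02", "product": "CHM"}]
trees = [{"plot_id": "P01", "tree_id": 1}, {"plot_id": "P01", "tree_id": 2}]

linked = group_by_plot(uav=uav, trees=trees)
print(sorted(linked))               # ['P01', 'P02']
print(len(linked["P01"]["trees"]))  # 2
```

Plots that appear in only one dataset (here `P02`) are kept, so gaps in coverage between datasets remain visible rather than being silently dropped.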

i) The first dataset (Kruse et al., 2021) provides UAV-borne data products covering the 64 vegetation plots surveyed during fieldwork, including structure-from-motion (SfM) point clouds and point-cloud products such as the Digital Elevation Model (DEM), Canopy Height Model (CHM), Digital Surface Model (DSM) and Digital Terrain Model (DTM), constructed from Red Green Blue (RGB) and Red Green Near Infrared (RGN) orthomosaics. Forest structure and vegetation composition data are crucial for assessing whether a forest acts as a carbon sink under changing climate conditions, and fieldwork and UAV products can provide such data in detail.
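A canopy height model is commonly derived as the per-cell difference between the surface model (top of canopy) and the terrain model (bare ground), CHM = DSM − DTM. The sketch below illustrates that arithmetic on two small grids of equal shape. It is a generic illustration rather than the exact processing used for the SiDroForest products; the toy elevations are made up, and setting small negative differences to zero is an assumption.

```python
def canopy_height_model(dsm, dtm, clip_negative=True):
    """Return CHM = DSM - DTM, computed cell by cell.

    dsm, dtm: equally shaped 2-D lists of elevations in metres.
    When clip_negative is True, negative differences (e.g. from
    interpolation noise where the DTM lies above the DSM) become 0.
    """
    if len(dsm) != len(dtm) or any(len(a) != len(b) for a, b in zip(dsm, dtm)):
        raise ValueError("DSM and DTM must have the same shape")
    chm = []
    for dsm_row, dtm_row in zip(dsm, dtm):
        row = [s - t for s, t in zip(dsm_row, dtm_row)]
        if clip_negative:
            row = [max(h, 0.0) for h in row]
        chm.append(row)
    return chm


# Toy 2 x 2 grids (hypothetical values, metres above sea level)
dsm = [[110.0, 112.5], [108.0, 109.5]]
dtm = [[100.0, 100.5], [108.2, 101.5]]
print(canopy_height_model(dsm, dtm))  # [[10.0, 12.0], [0.0, 8.0]]
```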

ii) The second dataset (van Geffen et al., 2021c) contains spatial data in the form of point and polygon shapefiles of 872 labelled individual trees and shrubs that were recorded during fieldwork at the same vegetation plots, with information on tree height, crown diameter and species. These labelled point and polygon shapefiles of individual trees and shrubs were generated on, and are positioned on, the UAV RGB orthoimages. The ID number of each individual links to the information collected during the expedition, such as tree height, crown diameter and vitality, which is provided in table format. The dataset can be used to link individual trees to the SfM point clouds, providing unique insights into the vegetation composition, and it also allows future monitoring of the individual trees and of the recorded vegetation plots as a whole.

iii) The third dataset (van Geffen et al., 2021a) contains a synthesis of 10 000 generated images and masks in the common objects in context (COCO) format, showing the tree crowns of two larch species (Larix gmelinii and Larix cajanderi) automatically extracted from the RGB UAV images. The synthetic dataset was created specifically to detect Siberian larch species.
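The COCO format stores a dataset as a single JSON document with `images`, `annotations` and `categories` lists; each annotation points to an image through `image_id` and to a class through `category_id`. A minimal sketch that counts annotated crowns per image and per category follows. The JSON content is a hypothetical toy example, and the category name `larch` and the file names are assumptions, not the published label set.

```python
import json
from collections import Counter

# Hypothetical toy COCO document: two images, three crown annotations.
# Real instance annotations would also carry a "segmentation" mask;
# it is omitted here, and "area" is simply the box width * height.
coco_text = """
{
  "images": [
    {"id": 1, "file_name": "synthetic_0001.png", "width": 256, "height": 256},
    {"id": 2, "file_name": "synthetic_0002.png", "width": 256, "height": 256}
  ],
  "categories": [{"id": 1, "name": "larch"}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [12, 30, 40, 38], "area": 1520, "iscrowd": 0},
    {"id": 2, "image_id": 1, "category_id": 1, "bbox": [90, 64, 35, 33], "area": 1155, "iscrowd": 0},
    {"id": 3, "image_id": 2, "category_id": 1, "bbox": [50, 50, 60, 58], "area": 3480, "iscrowd": 0}
  ]
}
"""


def count_crowns(coco):
    """Count annotations per image file name and per category name."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    per_image = Counter(files[a["image_id"]] for a in coco["annotations"])
    per_category = Counter(names[a["category_id"]] for a in coco["annotations"])
    return per_image, per_category


per_image, per_category = count_crowns(json.loads(coco_text))
print(per_image["synthetic_0001.png"])  # 2
print(per_category["larch"])            # 3
```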

iv) Tree-level forest-structure datasets are rarely publicly available for Siberia, and even fewer ready-to-use tree- and plot-level data exist for machine learning approaches, for example in optimised data formats containing annotated vegetation categories. The fourth dataset (van Geffen et al., 2021b) contains Sentinel-2 Level-2 bottom-of-atmosphere labelled image patches with seasonal information and annotated vegetation categories covering the vegetation plots. It was created to provide a small, ready-to-use validation and training dataset for various vegetation-related machine-learning tasks.
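For a classification task, per-patch labels usually have to be mapped to integer class indices before training. The sketch below assumes a hypothetical label file in which each patch name maps to a list of vegetation category strings; the schema, patch names and category names are illustrative only, and the actual JSON structure should be taken from the dataset documentation. It builds a stable (sorted) class index and a multi-hot target vector per patch, which covers both single-label and multi-label patches.

```python
import json

# Hypothetical label file content (not the published schema)
labels_text = """
{
  "patch_0001": ["forest", "shrubland"],
  "patch_0002": ["tundra"],
  "patch_0003": ["forest"]
}
"""


def encode_labels(label_map):
    """Return (class_names, {patch: multi-hot list}) with a sorted class order."""
    class_names = sorted({c for cats in label_map.values() for c in cats})
    index = {name: i for i, name in enumerate(class_names)}
    targets = {}
    for patch, cats in label_map.items():
        vector = [0] * len(class_names)
        for c in cats:
            vector[index[c]] = 1
        targets[patch] = vector
    return class_names, targets


class_names, targets = encode_labels(json.loads(labels_text))
print(class_names)            # ['forest', 'shrubland', 'tundra']
print(targets["patch_0001"])  # [1, 1, 0]
```

Sorting the class names keeps the index stable across runs, so a trained model's output positions map to the same categories regardless of the order in which patches appear in the file.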

The SiDroForest data collection serves a variety of user communities. First, the UAV-derived top-of-canopy structure information, the orthomosaics and the detailed vegetation information in the labelled dataset provide detailed information on forest type, structure and composition for scientific communities with ecological and biological applications. The detailed land cover and vegetation structure information in the first two datasets is useful for generating and validating land cover products in radar and optical remote sensing. In addition to providing information on the forest structure and vegetation composition of the vegetation plots, parts of the SiDroForest collection are prepared for use as training and validation data for machine learning. For example, the synthetic tree crown dataset is generated from the raw UAV images and optimised for use in neural networks. Furthermore, the fourth SiDroForest dataset contains standardised Sentinel-2 labelled image patches, with JSON labels, that provide training data on vegetation class categories for machine learning classification. The SiDroForest data collection serves as a basis for adding data collected during future expeditions by the Alfred Wegener Institute, creating a larger dataset in the upcoming years that can provide unique insights into remote, hard-to-reach boreal regions of Siberia.


Status: final response (author comments only)

  • RC1: 'Comment on essd-2021-281', Anonymous Referee #1, 04 Jan 2022
  • RC2: 'Comment on essd-2021-281', Anonymous Referee #2, 14 Mar 2022
  • RC3: 'Comment on essd-2021-281', Anonymous Referee #3, 23 Mar 2022


Datasets

SiDroForest: Orthomosaics, SfM point clouds and products from aerial image data of expedition vegetation plots in 2018 in Central Yakutia and Chukotka, Siberia. Kruse, Stefan; Farkas, Luka; Brieger, Frederic; Geng, Rongwei; Heim, Birgit; Pestryakova, Luidmila A; Zakharov, Evgenii S; Herzschuh, Ulrike; van Geffen, Femke


Total article views: 623 (including HTML, PDF, and XML), calculated since 23 Nov 2021
  • HTML: 476
  • PDF: 126
  • XML: 21
  • Total: 623
  • BibTeX: 18
  • EndNote: 18

Viewed (geographical distribution): 588 total article views (including HTML, PDF, and XML), of which 588 have a defined geography and 0 have an unknown origin.
Latest update: 17 May 2022
Short summary
SiDroForest is an attempt to remedy the scarcity of vegetation data in the circumpolar region, whilst providing adjusted and labelled data for machine learning and upscaling applications. SiDroForest contains four datasets, comprising SfM point clouds, individually labelled trees, synthetic tree crowns and labelled Sentinel-2 patches, that provide insights into the vegetation composition and forest structure of two important vegetation transition zones in Siberia, Russia.