https://doi.org/10.5194/essd-2025-613
05 Nov 2025
Status: this preprint is currently under review for the journal ESSD.

GlobalGeoTree: A Multi-Granular Vision-Language Dataset for Global Tree Species Classification

Yang Mu, Zhitong Xiong, Yi Wang, Muhammad Shahzad, Franz Essl, Holger Kreft, Mark van Kleunen, and Xiao Xiang Zhu

Abstract. Global tree species mapping using remote sensing data is vital for biodiversity monitoring, forest management, and ecological research. However, progress in this field has been constrained by the scarcity of large-scale, labeled datasets. To address this, we introduce GlobalGeoTree – a comprehensive global dataset for tree species classification. GlobalGeoTree comprises 6.3 million geolocated tree occurrences, spanning 275 families, 2,734 genera, and 21,001 species across the hierarchical taxonomic levels. Each sample is paired with Sentinel-2 image time series and 27 auxiliary environmental variables, encompassing bioclimatic, geographic, and soil data. The dataset is partitioned into GlobalGeoTree-6M, a large subset for model pretraining, and curated evaluation subsets, primarily GlobalGeoTree-10kEval, a benchmark for zero-shot and few-shot classification. To demonstrate the utility of the dataset, we introduce a baseline model, GeoTreeCLIP, which leverages paired remote sensing data and taxonomic text labels within a vision-language framework pretrained on GlobalGeoTree-6M. Experimental results show that GeoTreeCLIP achieves substantial improvements in zero- and few-shot classification on GlobalGeoTree-10kEval over existing advanced models. By making the dataset, models, and code publicly available, we aim to establish a benchmark to advance tree species classification and foster innovation in biodiversity research and ecological applications. The code is publicly available at https://github.com/MUYang99/GlobalGeoTree, and the GlobalGeoTree dataset is available at https://huggingface.co/datasets/yann111/GlobalGeoTree.
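As a rough illustration of the zero-shot setting mentioned in the abstract, the sketch below ranks candidate taxonomic text labels by their cosine similarity to an image embedding, which is the general pattern used by CLIP-style vision-language models. It is not the GeoTreeCLIP implementation: the function names, the "family / genus / species" label format, the 3-dimensional toy embeddings, and the temperature of 0.07 are all assumptions made for this example. In practice the embeddings would come from trained image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=0.07):
    """Temperature-scaled softmax; the temperature value is an assumed default."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings):
    """Return (label, probability) pairs sorted from most to least likely."""
    labels = list(label_embeddings)
    sims = [cosine(image_embedding, label_embeddings[name]) for name in labels]
    probs = softmax(sims)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for text-encoder outputs.
# Each label spells out family / genus / species to mirror the taxonomic hierarchy.
label_embeddings = {
    "Fagaceae / Quercus / Quercus robur": [0.9, 0.1, 0.0],
    "Pinaceae / Pinus / Pinus sylvestris": [0.1, 0.9, 0.1],
    "Betulaceae / Betula / Betula pendula": [0.0, 0.2, 0.9],
}

# Hypothetical toy embedding standing in for an image-encoder output.
image_embedding = [0.8, 0.2, 0.1]

ranking = zero_shot_classify(image_embedding, label_embeddings)
best_label, best_prob = ranking[0]
print(best_label)
```

No labeled training examples for the candidate classes are needed at inference time in this setup; adding or removing a candidate species only changes the set of text labels being compared.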

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 12 Dec 2025)

Data sets

GlobalGeoTree: A Multi-Granular Vision-Language Dataset for Global Tree Species Classification. Y. Mu et al. https://doi.org/10.15468/dd.9qxqyy

Model code and software

GlobalGeoTree: A Multi-Granular Vision-Language Dataset for Global Tree Species Classification. Y. Mu et al. https://github.com/MUYang99/GlobalGeoTree

Latest update: 05 Nov 2025
Short summary
To better protect our planet's forests, we need to know what trees are where. We created GlobalGeoTree, a massive public dataset linking 6.3 million tree locations worldwide with satellite data. This dataset helps computers learn to identify tree species from space, supporting biodiversity monitoring and climate action. Our baseline model shows this is a promising path to understanding global forests.