https://doi.org/10.5194/essd-18-1379-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
GlobalGeoTree: a multi-granular vision-language dataset for global tree species classification
Download
- Final revised paper (published on 24 Feb 2026)
- Preprint (discussion started on 05 Nov 2025)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on essd-2025-613', Anonymous Referee #1, 14 Nov 2025
- RC2: 'Comment on essd-2025-613', Anonymous Referee #2, 29 Nov 2025
- AC1: 'Comment on essd-2025-613', Xiao Xiang Zhu, 14 Jan 2026
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Xiao Xiang Zhu on behalf of the Authors (14 Jan 2026)
Author's response
Author's tracked changes
Manuscript
ED: Publish as is (25 Jan 2026) by Birgit Heim
AR by Xiao Xiang Zhu on behalf of the Authors (28 Jan 2026)
Manuscript
General Comments:
The authors present a global multi-modal dataset for tree species classification, integrating diverse data sources and offering both a large-scale pretraining dataset and a separate evaluation set. They also propose GeoTreeCLIP, a model that leverages hierarchical label structures and demonstrates improvements over baseline methods. The experimental setup is comprehensive, including comparisons with CLIP-style models and supervised learning approaches. All code and data are publicly available.
Specific Comments:
1. Dataset Construction:
1.1 The authors use the JRC Forest Cover Map v1 for filtering. Given that version 2 has been publicly released with documented improvements, is there a reason for not using the updated version?
1.2 The GlobalGeoTree-10kEval set includes 90 species out of over 21,000. Could the authors clarify the selection criteria? Were any sampling or filtering strategies applied to ensure the reliability of the evaluation set, particularly given the inclusion of citizen science sources like iNaturalist?
1.3 While the evaluation set is constructed as a separate test set, there appears to be no explicit validation process to assess its quality. Given the integration of heterogeneous data sources, some form of validation (manual or automated) would greatly enhance the trustworthiness and utility of the dataset.
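To make this suggestion concrete: even a lightweight automated check would help, for example resolving each species label against the GBIF taxonomic backbone and keeping only research-grade iNaturalist observations. A minimal sketch of such a filter, assuming a pandas DataFrame with hypothetical species, source, and quality_grade columns (pygbif's name_backbone is an existing endpoint; the schema here is illustrative, not the dataset's actual one):
```python
import pandas as pd
from pygbif import species as gbif_species  # pip install pygbif

def validate_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only records whose label resolves exactly in the GBIF backbone,
    and only research-grade iNaturalist records. Column names ('species',
    'source', 'quality_grade') are placeholders, not the released schema."""
    # Drop iNaturalist records that are not flagged research-grade.
    inat = df["source"].eq("iNaturalist")
    df = df[~inat | df["quality_grade"].eq("research")]

    # Resolve each unique label once against the GBIF taxonomic backbone.
    exact = {}
    for name in df["species"].unique():
        match = gbif_species.name_backbone(name=name, rank="species")
        exact[name] = match.get("matchType") == "EXACT"
    return df[df["species"].map(exact)]
```
Reporting how many records such a check removes, per source, would already give readers a useful quality signal.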
2. Model (GeoTreeCLIP):
2.1 The authors attribute the performance improvements of GeoTreeCLIP to domain-specific pretraining. However, it’s difficult to isolate the effect of pretraining alone, as the baseline models lack temporal fusion and may differ in how auxiliary data are handled. A more controlled ablation, or at least a discussion of these confounds, would strengthen this claim.
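One way to make the ablation controlled would be to hold the GeoTreeCLIP architecture fixed and vary only initialization, temporal fusion, and auxiliary inputs in a full factorial grid. A minimal sketch (factor names are illustrative, not taken from the paper):
```python
from itertools import product

# Factorial ablation grid: hold the architecture fixed and vary only the
# three factors whose effects are currently confounded in the comparison.
factors = {
    "init": ["domain_pretrained", "clip_weights", "random"],
    "temporal_fusion": [True, False],
    "auxiliary_data": [True, False],
}

configs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
for cfg in configs:
    print(cfg)  # each of the 12 configs is trained and evaluated identically
```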
3. Evaluation Metrics and Reporting:
3.1 The paper mentions addressing class imbalance by grouping species into frequent, common, and rare categories. However, results are not reported per group. Including group-specific performance would align with common practice in imbalanced classification tasks (see the sketch below).
3.2 Given the global scope of the dataset and the known regional biases, regional performance breakdowns would be informative and important for understanding model generalizability.
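Both breakdowns reduce to the same grouped-accuracy computation, so reporting them should be inexpensive. A minimal sketch with pandas, assuming per-sample predictions and hypothetical frequency_tier and region columns (none of these names come from the paper):
```python
import pandas as pd

def grouped_accuracy(results: pd.DataFrame, by: str) -> pd.Series:
    """Top-1 accuracy per group. 'y_true', 'y_pred', and the grouping
    columns are placeholder names, not the paper's actual outputs."""
    correct = results["y_true"].eq(results["y_pred"])
    return correct.groupby(results[by]).mean()

# Usage (hypothetical file and columns):
# results = pd.read_csv("predictions.csv")
# print(grouped_accuracy(results, by="frequency_tier"))  # frequent / common / rare
# print(grouped_accuracy(results, by="region"))          # e.g. continental breakdown
```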
Additional comments:
The authors’ effort in assembling such a large-scale, publicly available dataset and developing a strong benchmark model is highly appreciated. However, since this is a data description paper, the dataset itself should be the focal point. At present, the lack of validation for the dataset is a significant limitation. While the work offers valuable contributions to machine learning research, particularly within benchmark or workshop tracks at venues like CVPR or NeurIPS, it may not yet meet the expectations of a journal like ESSD, which prioritizes data quality.