https://doi.org/10.5194/essd-2025-710
29 Jan 2026
Status: this preprint is currently under review for the journal ESSD.

BuildingSense: a new multimodal building function classification dataset

Pengxiang Su, Ruifei Chen, Heng Xu, Wei Huang, Xinling Deng, Songnian Li, Wanglin Yan, Hangbin Wu, and Chun Liu

Abstract. Building function describes how a building is used. Access to this information is essential for urban research, including studies of urban morphology, the urban environment, and human activity patterns. Existing building function classification methods face two major bottlenecks: (1) poor model interpretability and (2) inadequate multimodal feature fusion. Large models, with their strong interpretability and efficient multimodal data fusion capabilities, offer promising potential for addressing these bottlenecks, but they remain limited in processing multimodal spatial datasets, and their performance in building function classification is therefore unknown. To the best of our knowledge, multimodal building function classification datasets are lacking, which makes it difficult to evaluate that performance. Meanwhile, prevailing building function categorization schemes remain coarse, which limits their ability to support finer-grained urban research. To bridge these gaps, we constructed BuildingSense, a novel multimodal and fine-grained dataset for building function classification. Using BuildingSense, we evaluated four state-of-the-art large models in terms of both classification outcomes and reasoning processes. The results demonstrate that large models can effectively comprehend multimodal spatial data, challenging the conventional view. Building on this, we identify three key directions for future research: (1) build a categorized inference example database, (2) develop cost-effective classification models, and (3) quantify the confidence of model outputs. Our findings not only provide insights for developing subsequent large model-based classification methods but also contribute to the advancement of multimodal fusion-based classification methods. The dataset and code for this paper can be accessed at https://doi.org/10.6084/m9.figshare.30645776.v2 (Su et al., 2025a).

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 07 Mar 2026)


Data sets

BuildingSense-A multimodal building function classification dataset. Pengxiang Su, Runfei Chen, Heng Xu, Wei Huang, Xinling Deng, Wanglin Yan, Songnian Li, Hangbin Wu, Chun Liu. https://figshare.com/s/dc6aada5afa0d620a79f

Model code and software

BuildingSense-A multimodal building function classification dataset. Pengxiang Su, Runfei Chen, Heng Xu, Wei Huang, Xinling Deng, Wanglin Yan, Songnian Li, Hangbin Wu, Chun Liu. https://figshare.com/s/dc6aada5afa0d620a79f


Latest update: 30 Jan 2026
Short summary
Access to building function information is essential for urban research. We reviewed recent work and identified three limitations: few open-source datasets, coarse building function categories, and poor model interpretability combined with inadequate multimodal feature fusion. We therefore created BuildingSense, a dataset with fine-grained categories and multimodal data, and showed that large models can improve the interpretability of classification results. We also outline three directions for enhancing their performance.