Articles | Volume 17, issue 3
https://doi.org/10.5194/essd-17-1245-2025
Data description paper | 24 Mar 2025

ChatEarthNet: a global-scale image–text dataset empowering vision–language geo-foundation models

Zhenghang Yuan, Zhitong Xiong, Lichao Mou, and Xiao Xiang Zhu

Short summary
ChatEarthNet is an image–text dataset that provides high-quality, detailed natural language descriptions for global-scale satellite data. It consists of 163 488 image–text pairs with captions generated by ChatGPT-3.5 and an additional 10 000 image–text pairs with captions generated by ChatGPT-4V(ision). This dataset has significant potential for training and evaluating vision–language geo-foundation models in remote sensing.