https://doi.org/10.5194/essd-2026-57
25 Feb 2026
Status: this preprint is currently under review for the journal ESSD.

Democratizing planetary-scale analysis: An ultra-lightweight Earth embedding database for accurate and flexible global land monitoring

Shuang Chen, Jie Wang, Shuai Yuan, Jiayang Li, Yu Xia, Yuanhong Liao, Junbo Wei, Jincheng Yuan, Xiaoqing Xu, Xiaolin Zhu, Peng Zhu, Hongsheng Zhang, Yuyu Zhou, Haohuan Fu, Huabing Huang, Bin Chen, Fan Dai, and Peng Gong

Abstract. The rapid evolution of satellite-borne Earth Observation (EO) systems has transformed terrestrial monitoring and produced comprehensive petabyte-scale archives. However, the computational resources and storage volumes required for global-scale analysis put it out of reach for many research teams, hindering broader scientific adoption and the execution of planetary-scale studies. To address these barriers, we present the Embedded Seamless Data (ESD), an ultra-lightweight, 30-m global Earth embedding database spanning the 25-year period from 2000 to 2024.

By transforming high-dimensional, multi-sensor observations from the Landsat series (5, 7, 8, and 9) and MODIS Terra into information-dense, quantized latent vectors, ESD distils essential geophysical and semantic features into a unified latent space. Using the ESDNet architecture and Finite Scalar Quantization (FSQ), the dataset achieves an approximately 340-fold reduction in data volume compared to raw daily archives. This compression allows the entire global land surface for a single year to be stored in approximately 2.4 TB, enabling decadal-scale global analysis on standard local workstations.
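To make the quantization step concrete, the following is a minimal sketch of the general FSQ idea: each latent dimension is squashed into a bounded range and rounded to one of a small number of integer levels, so a whole vector maps to a single integer code. It is not the ESDNet implementation. The function names, the restriction to odd level counts, and the example level configuration are assumptions made for illustration; the abstract does not state the configuration used for ESD.

```python
import math


def fsq_quantize(z, levels):
    """Quantize one latent vector with a simplified FSQ rule.

    levels[i] is the number of quantization levels for dimension i.
    Only odd level counts are handled here (an assumption of this sketch),
    so the codes are the integers -half..half with half = (levels[i] - 1) / 2.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels < 1 or n_levels % 2 == 0:
            raise ValueError("this sketch only supports odd level counts")
        half = (n_levels - 1) // 2
        bounded = half * math.tanh(value)  # squashed into (-half, half)
        codes.append(int(round(bounded)))  # snapped to the nearest integer level
    return codes


def fsq_index(codes, levels):
    """Pack per-dimension codes into one integer (mixed-radix encoding)."""
    index = 0
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        index = index * n_levels + (code + half)  # shift code to 0..n_levels-1
    return index


def bits_per_vector(levels):
    """Bits needed to store one quantized vector: log2 of the codebook size."""
    return math.log2(math.prod(levels))


if __name__ == "__main__":
    hypothetical_levels = [5, 5, 5]  # illustrative only, not the ESD setting
    codes = fsq_quantize([0.0, 10.0, -10.0], hypothetical_levels)
    print(codes, fsq_index(codes, hypothetical_levels))
    print(bits_per_vector(hypothetical_levels))
```

With three dimensions of five levels each, the implicit codebook has 125 entries, so one vector needs just under 7 bits. Storing such small integer codes in place of floating-point reflectance stacks is the kind of saving that makes large compression ratios possible.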

Validation shows that ESD maintains high reconstructive fidelity to the original reflectance values across the spectral dimension, achieving a Mean Absolute Error (MAE) of 0.0130 (averaged over six spectral bands: Blue, Green, Red, NIR, SWIR1, and SWIR2), a Root Mean Square Error (RMSE) of 0.0179, and a Correlation Coefficient (CC) of 0.8543. By condensing the annual phenological cycle into 12 temporal latent steps, the embeddings provide inherent denoising effects and a semantically organized latent space that outperforms raw reflectance data in downstream land-cover classification: on globally distributed land cover sample sets, ESD reaches an overall accuracy of 79.74 %, compared with 76.92 % for raw sensor fusion data. With robust few-shot learning capabilities and longitudinal consistency across 25 years, the ESD product provides a versatile foundation for democratizing planetary-scale Earth system research and advancing next-generation geospatial artificial intelligence.
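For readers who want to reproduce this kind of fidelity check on their own samples, here is a small sketch of the three reported metrics. It assumes that CC denotes the Pearson correlation coefficient, which the abstract does not state explicitly, and the function names and sample values are made up for illustration; they are not ESD data.

```python
import math


def mae(reference, reconstructed):
    """Mean Absolute Error between two equal-length sequences."""
    return sum(abs(r - p) for r, p in zip(reference, reconstructed)) / len(reference)


def rmse(reference, reconstructed):
    """Root Mean Square Error between two equal-length sequences."""
    squared = sum((r - p) ** 2 for r, p in zip(reference, reconstructed))
    return math.sqrt(squared / len(reference))


def pearson_cc(reference, reconstructed):
    """Pearson correlation coefficient (assumed meaning of CC in this sketch)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(reconstructed) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, reconstructed))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in reconstructed)
    if var_r == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_r * var_p)


if __name__ == "__main__":
    # Made-up reflectance values for one band, for illustration only.
    original = [0.10, 0.20, 0.30, 0.40]
    decoded = [0.10, 0.25, 0.30, 0.35]
    print(mae(original, decoded))
    print(rmse(original, decoded))
    print(pearson_cc(original, decoded))
```

A band-averaged figure such as the reported MAE would then be the mean of the per-band values over the six bands, assuming each band contributes equally.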

Competing interests: At least one of the (co-)authors is a member of the editorial board of Earth System Science Data.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 03 Apr 2026)


Data sets

Embedded Seamless Data: An ultra-lightweight Earth embedding database for accurate and flexible global land monitoring Shuang Chen https://doi.org/10.12436/iEarth.0000.20251229.000064.v1

Latest update: 25 Feb 2026
Short summary
Monitoring our planet with satellites produces massive datasets that are too large for many researchers to handle. We created a new global database that uses AI to condense 25 years of Landsat and MODIS observations into a highly efficient format (analysis-ready embedding vectors). By reducing data size by over 340 times while maintaining high accuracy, we allow global studies to be run on a standard PC. This makes planetary research accessible to everyone and helps us better track environmental changes over time.