Democratizing planetary-scale analysis: An ultra-lightweight Earth embedding database for accurate and flexible global land monitoring
Abstract. The rapid evolution of satellite-borne Earth Observation (EO) systems has transformed terrestrial monitoring, yielding comprehensive petabyte-scale archives. However, the computational resources and storage volumes required for global-scale analysis put such work out of reach for many research teams, hindering broader scientific adoption and the execution of planetary-scale studies. To address these barriers, we present Embedded Seamless Data (ESD), an ultra-lightweight, 30-m global Earth embedding database spanning the 25-year period from 2000 to 2024.
By transforming high-dimensional, multi-sensor observations from the Landsat series (5, 7, 8, and 9) and MODIS Terra into information-dense, quantized latent vectors, ESD distils essential geophysical and semantic features into a unified latent space. Using the ESDNet architecture and Finite Scalar Quantization (FSQ), the dataset achieves a ~340-fold reduction in data volume compared to raw daily archives. This compression allows the entire global land surface for a single year to be stored in approximately 2.4 TB, enabling decadal-scale global analysis on standard local workstations.
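To illustrate the general idea behind Finite Scalar Quantization, the following is a minimal sketch and not the ESDNet implementation. It assumes the common FSQ formulation in which each latent channel is bounded with tanh and rounded to a small set of integer levels, so that the codebook is implicit. The function names, the restriction to odd level counts, and the level configuration in the usage note are all hypothetical choices made for illustration; the abstract does not state the configuration used for ESD.

```python
import math


def fsq_quantize(z, levels):
    """Quantize each latent channel to one of levels[i] integer values.

    Assumption: odd level counts only, so the grid is symmetric around 0
    and no half-step offset is needed.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch handles odd level counts only")
        half = n_levels // 2
        bounded = half * math.tanh(value)  # squash into (-half, half)
        codes.append(round(bounded))       # snap to the nearest integer level
    return codes


def codes_to_index(codes, levels):
    """Pack per-channel codes into one integer (mixed-radix encoding)."""
    index = 0
    for code, n_levels in zip(codes, levels):
        index = index * n_levels + (code + n_levels // 2)
    return index


def index_to_codes(index, levels):
    """Invert codes_to_index, recovering the per-channel codes."""
    codes = []
    for n_levels in reversed(levels):
        index, digit = divmod(index, n_levels)
        codes.append(digit - n_levels // 2)
    return codes[::-1]
```

For example, with the hypothetical configuration `levels = [5, 5, 5]`, every latent vector maps to one of 5 × 5 × 5 = 125 codes, so a pixel's quantized embedding fits in a single small integer rather than three floating-point values. Storing such indices instead of raw reflectance is one way a quantized embedding can reduce data volume.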
Validation shows that ESD maintains high reconstructive fidelity to the original reflectance values across the spectral dimension, achieving a Mean Absolute Error (MAE) of 0.0130 (averaged over six spectral bands: Blue, Green, Red, NIR, SWIR1, and SWIR2), a Root Mean Square Error (RMSE) of 0.0179, and a Correlation Coefficient (CC) of 0.8543. By condensing the annual phenological cycle into 12 temporal latent steps, the embeddings provide inherent denoising effects and a semantically organized latent space that outperforms raw reflectance data in downstream land-cover classification, achieving an overall accuracy of 79.74 % compared with 76.92 % for raw sensor fusion data on globally distributed land cover sample sets. With robust few-shot learning capabilities and longitudinal consistency across 25 years, the ESD product provides a versatile foundation for democratizing planetary-scale Earth system research and advancing next-generation geospatial artificial intelligence.
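The three fidelity metrics quoted above can be computed as in the following minimal sketch. It assumes that CC denotes the Pearson correlation coefficient, which the abstract does not specify, and that reference and reconstructed reflectances are supplied as flat, equally long sequences. The function name and the sample values in the usage note are hypothetical and are not data from the paper.

```python
import math


def reconstruction_metrics(reference, reconstructed):
    """Return (MAE, RMSE, Pearson CC) for two equally long sequences."""
    n = len(reference)
    if n != len(reconstructed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    # Per-sample reconstruction errors.
    errors = [rec - ref for ref, rec in zip(reference, reconstructed)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation: covariance over the product of standard deviations.
    mean_ref = sum(reference) / n
    mean_rec = sum(reconstructed) / n
    cov = sum((a - mean_ref) * (b - mean_rec)
              for a, b in zip(reference, reconstructed))
    var_ref = sum((a - mean_ref) ** 2 for a in reference)
    var_rec = sum((b - mean_rec) ** 2 for b in reconstructed)
    if var_ref == 0 or var_rec == 0:
        raise ValueError("correlation is undefined for constant input")
    cc = cov / math.sqrt(var_ref * var_rec)
    return mae, rmse, cc
```

Calling `reconstruction_metrics([0.1, 0.2, 0.3, 0.4], [0.12, 0.18, 0.33, 0.39])` on these made-up reflectances gives an MAE of 0.02. To mirror the band-averaged figures in the abstract, one would presumably evaluate each of the six bands separately and average the results, although the exact averaging procedure is not described here.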
Competing interests: At least one of the (co-)authors is a member of the editorial board of Earth System Science Data.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.