https://doi.org/10.5194/essd-2026-306
08 May 2026
Status: this preprint is currently under review for the journal ESSD.

A long-term consistent socioeconomic dataset of Chinese cities generated by Bayesian spatiotemporal modeling with multi-source Earth observations

Zhangying Tang, Xianteng Tang, Lingfeng Liao, Guoqiang Yan, Zhenyan Wang, Yuju Wu, Mingyu Xie, Yumeng Zhang, Chengwu Wang, Zhoufeng Wang, Yangting Zeng, Chao Song, and Jay Pan

Abstract. Within the Healthy Cities and Sustainable Development Goals (SDGs) agendas, socioeconomic data are fundamental for tracking regional development. China, however, lacks a complete, long-term subnational socioeconomic dataset because of severe spatiotemporal missingness in official statistical yearbooks. We compiled 35 official socioeconomic indicators for 366 Chinese cities from 2000 to 2021, incorporated remote-sensing-derived covariates as auxiliary information, and applied a Bayesian spatiotemporal interacting varying intercepts (BSTIVI) model to capture the spatial, temporal, and coupled spatiotemporal dependence of the target variables. Model performance was evaluated using global Bayesian criteria and cross-validation, while local error distributions and temporal trends were visualized to examine imputation outcomes. Based on the completed dataset, we further derived a composite development index using entropy weighting and assessed spatial inequality with the Gini coefficient, the coefficient of variation, and hotspot analysis. The results show that BSTIVI achieved a markedly better fit than traditional multiple linear regression (MLR). In cross-validation, 32 of 35 indicators achieved R² ≥ 0.95, and RMSE and MAE remained low. The resulting data product showed strong imputation performance in both the spatial and temporal dimensions. Analyses of the completed dataset revealed marked spatial inequality and clustering in urban socioeconomic development across China during 2000–2021. We ultimately produced the first long-term city-level socioeconomic dataset for China, comprising 35 indicators and one composite index, with Bayesian credible intervals for imputed values. This study provides both a new city-level data resource for China and a transferable framework for imputing missing subnational socioeconomic data worldwide, thereby supporting Earth system research and SDG implementation.
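
As a rough illustration of the entropy weighting and inequality measures named in the abstract, the following minimal sketch uses the common textbook forms of the entropy weight method, the Gini coefficient, and the coefficient of variation. It is not the authors' implementation: the function names, the min-max normalization step, and the four-city toy values are all assumptions made for illustration, and the preprint may define these quantities differently.

```python
import math

def min_max(column):
    """Rescale one indicator to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def entropy_weights(columns):
    """Entropy weight method: indicators whose values are more unevenly
    spread across cities (lower entropy) receive larger weights."""
    n = len(columns[0])
    divergences = []
    for col in columns:
        total = sum(col)
        if total == 0:
            divergences.append(0.0)
            continue
        props = [v / total for v in col]
        # Convention: 0 * ln(0) is treated as 0, so zero shares are skipped.
        entropy = -sum(p * math.log(p) for p in props if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    d_sum = sum(divergences)
    if d_sum == 0:
        return [1.0 / len(columns)] * len(columns)
    return [d / d_sum for d in divergences]

def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    ranked = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * ranked / (n * total) - (n + 1) / n

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean

# Hypothetical toy data: two indicators observed in four cities.
raw = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.5, 11.0, 50.0]]
normalized = [min_max(col) for col in raw]
weights = entropy_weights(normalized)

# Composite index per city: weighted sum of the normalized indicators.
composite = [
    sum(w * col[i] for w, col in zip(weights, normalized))
    for i in range(len(raw[0]))
]
```

In this toy case the second indicator is concentrated in a single city, so it has lower entropy and receives the larger weight. Calling `gini(composite)` or `coefficient_of_variation(composite)` then summarizes how unequal the resulting index is across the four hypothetical cities.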

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 14 Jun 2026)


Data sets

City-Level Socioeconomic Indicators and Composite Development Index for China, 2000-2021. Zhangying Tang, Xianteng Tang, Lingfeng Liao, Guoqiang Yan, Zhenyan Wang, Yuju Wu, Mingyu Xie, Yumeng Zhang, Chengwu Wang, Zhoufeng Wang, Yangting Zeng, Chao Song, and Jay Pan. https://doi.org/10.5281/zenodo.18217116

Latest update: 08 May 2026
Short summary
China’s city statistics often have major gaps, making it hard to track long-term socioeconomic change. We combined official records with satellite observations to create a public dataset for 366 cities from 2000 to 2021, covering 35 socioeconomic indicators and uncertainty estimates. The dataset reveals strong regional inequality and provides a new resource for research on urbanization, sustainability, and city development.