06 Feb 2024
Status: this preprint is currently under review for the journal ESSD.

A 100-m gridded population dataset of China’s seventh census using ensemble learning and geospatial big data

Yuehong Chen, Congcong Xu, Yong Ge, Xiaoxiang Zhang, and Ya'nan Zhou

Abstract. China has undergone rapid urbanization and internal migration in recent years, and up-to-date gridded population datasets are essential for diverse applications. Existing datasets for China, however, are either outdated or fail to incorporate the seventh national population census, conducted in 2020. In this study, we develop a novel population downscaling approach that leverages stacking ensemble learning and geospatial big data to produce up-to-date population grids at 100-m resolution for China from the seventh census data at both county and town levels. The proposed approach employs random forest, XGBoost, and LightGBM as base models for stacking ensemble learning and delineates inhabited areas from geospatial big data to enhance the gridded population estimation. Experimental results demonstrate that the proposed approach fits better than any of the individual base models. Moreover, evaluation on an out-of-sample town-level test set indicates that the estimated gridded population dataset (R² = 0.8936) is more accurate than the existing WorldPop (R² = 0.7427) and LandScan (R² = 0.7165) products for China in 2020. Furthermore, with the inhabited-area enhancement, the spatial distribution of the population grids appears more plausible than that of the two existing products. Hence, the proposed population downscaling approach provides a valuable option for producing gridded population datasets. The estimated 100-m gridded population dataset of China is expected to support future applications and is publicly available (Chen et al., 2024).
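The abstract does not spell out how model output is turned into per-cell counts, so the following is only a minimal sketch of a generic volume-preserving downscaling step, not the authors' implementation. It assumes that a model (for example the stacking ensemble) yields a non-negative weight per 100-m cell, that an inhabited-area mask zeroes out uninhabited cells, and that each census unit's total is then split in proportion to the remaining weights. The function names `disaggregate` and `r_squared`, and all numbers, are hypothetical. The R² helper uses the coefficient-of-determination definition, which may differ from the definition used in the paper.

```python
def disaggregate(census_total, weights, inhabited):
    """Split one census unit's population across its grid cells.

    weights   -- non-negative per-cell scores (hypothetical model output)
    inhabited -- per-cell booleans; cells flagged False receive zero people
    """
    # Apply the inhabited-area mask before normalising the weights.
    masked = [w if keep else 0.0 for w, keep in zip(weights, inhabited)]
    total_weight = sum(masked)
    if total_weight == 0:
        # Assumed fallback: split evenly over inhabited cells,
        # or over all cells if none is flagged as inhabited.
        masked = [1.0 if keep else 0.0 for keep in inhabited]
        if sum(masked) == 0:
            masked = [1.0] * len(weights)
        total_weight = sum(masked)
    # Proportional allocation keeps the cell sum equal to the census total.
    return [census_total * w / total_weight for w in masked]


def r_squared(observed, predicted):
    """Coefficient of determination between census and aggregated estimates."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical county of 1000 people covering four cells, one uninhabited.
grid = disaggregate(1000, [0.5, 1.5, 2.0, 1.0], [True, True, False, True])
```

In a town-level check of the kind the abstract describes, the cell estimates would be summed within each held-out town and compared against the town's census count, for instance with `r_squared`.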

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on essd-2023-541', Anonymous Referee #1, 06 Apr 2024
  • CC1: 'Comment on essd-2023-541', Lingling Li, 09 Apr 2024
  • RC2: 'Comment on essd-2023-541', Anonymous Referee #2, 28 Apr 2024


Total article views: 1,363 (including HTML, PDF, and XML)
  • HTML: 1,030
  • PDF: 277
  • XML: 56
  • Total: 1,363
  • BibTeX: 39
  • EndNote: 43
Latest update: 20 May 2024
Short summary
Population data is crucial for studying human-nature interactions. Gridded population data can overcome the limitations of census data reported in irregular administrative units. In China, rapid urbanization calls for timely and accurate population grids. However, existing datasets for China are either outdated or do not incorporate the latest census data. Hence, a novel approach was developed to disaggregate China’s seventh census data into 100-m population grids. The resulting dataset outperformed the existing LandScan and WorldPop datasets.