A 100-m gridded population dataset of China’s seventh census using ensemble learning and geospatial big data
Abstract. China has undergone rapid urbanization and internal migration in recent years, so up-to-date gridded population datasets are essential for diverse applications. Existing datasets for China, however, are either outdated or do not incorporate the seventh national population census, conducted in 2020. In this study, we develop a population downscaling approach that leverages stacking ensemble learning and geospatial big data to produce up-to-date population grids at 100-m resolution for China from the seventh census data at both county and town levels. The approach uses random forest, XGBoost, and LightGBM as base models for stacking ensemble learning and delineates inhabited areas from geospatial big data to enhance the gridded population estimation. Experimental results show that the stacking ensemble fits better than any of its individual base models. On an out-of-sample town-level test set, the estimated gridded population dataset (R² = 0.8936) is more accurate than the existing WorldPop (R² = 0.7427) and LandScan (R² = 0.7165) products for China in 2020. Furthermore, with the inhabited-area enhancement, the spatial distribution of the population grids appears more plausible than in the two existing products. The proposed population downscaling approach therefore provides a valuable option for producing gridded population datasets. The estimated 100-m gridded population dataset of China is expected to support future applications and is publicly available at https://figshare.com/s/d9dd5f9bb1a7f4fd3734 (Chen et al., 2024).
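The abstract does not spell out how model output is converted into per-cell counts. The following is a minimal sketch only, assuming a dasymetric-style allocation in which each grid cell receives a share of its census unit's total in proportion to a predicted weight, and cells outside the inhabited areas receive nothing. The function name `downscale`, the weights, and the mask values are hypothetical and are not taken from the paper.

```python
def downscale(unit_total, weights, inhabited):
    """Distribute one census unit's population over its grid cells.

    unit_total: census population of the unit (e.g. a county or town).
    weights:    per-cell weights, assumed here to come from a fitted model.
    inhabited:  per-cell booleans, True where the cell is inhabited.
    """
    # Zero out the weight of every cell outside the inhabited areas.
    masked = [w if keep else 0.0 for w, keep in zip(weights, inhabited)]
    total_weight = sum(masked)
    if total_weight == 0:
        # No inhabited cell carries weight, so nothing can be allocated.
        return [0.0] * len(weights)
    # Proportional allocation: cell counts sum back to the census total.
    return [unit_total * w / total_weight for w in masked]


# Hypothetical unit with four cells; the third cell is uninhabited.
cells = downscale(1000.0, [2.0, 1.0, 5.0, 1.0], [True, True, False, True])
print(cells)
```

Under this assumption the cell values always sum to the census total of the unit, so errors at the grid level cannot change the county- or town-level counts.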