Tracking County-level Cooking Emissions and Their Drivers in China from 1990 to 2021 by Ensemble Machine Learning
Abstract. Cooking emissions are a significant source of PM2.5 and pose considerable public health risks because of their high toxicity and proximity to densely populated areas. Despite their importance, China still lacks an accurate, long-term, high-resolution national cooking emission inventory, primarily because high-quality activity level data are difficult to obtain over extended periods and at fine spatial scales. Here, we address these limitations by using machine learning to predict activity levels and then estimate emissions.
Specifically, we develop an ensemble of four machine learning algorithms, namely Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Multilayer Perceptron Neural Network (MLP), and Deep Neural Network (DNN), to predict cooking activity levels across Chinese counties from statistical indicators related to population, economy, and the catering industry. The ensemble model generalizes and transfers well (R² = 0.892–0.989), outperforming both traditional statistical models and the individual machine learning models. Unlike previous inventories, which rely on simple proxy data such as population for calculation and downscaling, our inventory calculates cooking emissions directly at the county level, providing more accurate emission estimates and spatial distributions. Furthermore, we incorporate critical but previously missing toxic pollutants, such as ultrafine particles (UFPs) and polycyclic aromatic hydrocarbons (PAHs), into the national cooking emission inventory. The result is China's first county-level cooking emission inventory, spanning 1990 to 2021, with high spatial resolution and wide pollutant coverage.
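The abstract does not state how the four learners are combined, so the following is only a minimal sketch of one common scheme, weighted averaging of per-model predictions, together with the R² score used to report skill. The function names, the equal default weights, and the toy numbers are illustrative assumptions and are not taken from the study.

```python
from typing import Dict, List, Optional


def ensemble_predict(
    model_preds: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Combine per-model predictions by weighted averaging.

    Assumption: plain weighted averaging. The study may combine its
    learners differently (for example by stacking).
    """
    names = list(model_preds)
    if not names:
        raise ValueError("need at least one model")
    n = len(model_preds[names[0]])
    if any(len(model_preds[m]) != n for m in names):
        raise ValueError("all models must predict the same samples")
    if weights is None:
        weights = {m: 1.0 for m in names}  # equal weights by default
    total = sum(weights[m] for m in names)
    return [
        sum(weights[m] * model_preds[m][i] for m in names) / total
        for i in range(n)
    ]


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical activity levels for four held-out counties (toy values).
observed = [10.0, 20.0, 30.0, 40.0]
predictions = {
    "RF": [12.0, 18.0, 33.0, 38.0],
    "XGBoost": [9.0, 21.0, 29.0, 41.0],
    "MLP": [11.0, 22.0, 27.0, 43.0],
    "DNN": [8.0, 19.0, 31.0, 38.0],
}

combined = ensemble_predict(predictions)
scores = {name: r_squared(observed, p) for name, p in predictions.items()}
scores["ensemble"] = r_squared(observed, combined)
```

In this toy case the individual errors partly cancel, so the averaged prediction scores at least as well as any single learner. That is the usual motivation for ensembling, although real data will not cancel so neatly.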
According to our inventory, in 2021, China’s total cooking emissions of organics in the full volatility range, PM2.5, UFPs, and PAHs were 997 kt, 408 kt, 6.50 × 10²⁵ particles, and 15.8 kt, respectively. From 1990 to 2021, emissions of these pollutants increased by over 65 %, and their spatiotemporal trends were affected to varying degrees by external factors, such as population migration, economic development, pollution control policies, and the pandemic, in different periods. Using the SHapley Additive exPlanations (SHAP) algorithm, we further analyze how key driving factors, such as urbanization rate, population, and local emission factors, contribute to emission changes. Notably, the driver analysis reveals that existing control measures are insufficient to curb the rapid growth of emissions, so enhanced controls are needed. Regarding control strategies, our county-level inventory shows that 62.3 % of China’s organic emissions are concentrated in 30 % of the counties, which are densely populated and occupy only 14.4 % of the national land area. Prioritizing control in these areas would therefore be an efficient and targeted strategy. Our research provides crucial data and insights for understanding the impact of cooking emissions on air pollution and health, aiding policy development. Our long-term, high-resolution emission datasets are publicly available at https://doi.org/10.6084/m9.figshare.26085487 (Li et al., 2025).
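The concentration statistic quoted above (the share of emissions and of land area held by a given fraction of counties) can be illustrated with a short sketch. It assumes counties are ranked by total emissions, which the abstract does not specify. The function name and the ten-county toy data are hypothetical, not values from the published dataset.

```python
from typing import List, Tuple


def top_share(
    emissions: List[float],
    land_areas: List[float],
    top_fraction: float = 0.30,
) -> Tuple[float, float]:
    """Return (emission share, land-area share) of the top-emitting counties.

    Assumption: counties are ranked by total emissions; the study may
    rank by another metric, such as emission intensity.
    """
    if not emissions or len(emissions) != len(land_areas):
        raise ValueError("emissions and land_areas must be non-empty and aligned")
    # County indices sorted from highest to lowest emissions.
    order = sorted(range(len(emissions)), key=lambda i: emissions[i], reverse=True)
    k = max(1, round(top_fraction * len(order)))
    top = order[:k]
    emission_share = sum(emissions[i] for i in top) / sum(emissions)
    area_share = sum(land_areas[i] for i in top) / sum(land_areas)
    return emission_share, area_share


# Toy example: ten hypothetical counties (arbitrary units).
toy_emissions = [40.0, 25.0, 15.0, 5.0, 4.0, 3.0, 3.0, 2.0, 2.0, 1.0]
toy_areas = [1.0, 2.0, 2.0, 10.0, 10.0, 15.0, 15.0, 15.0, 15.0, 15.0]

emission_share, area_share = top_share(toy_emissions, toy_areas)
```

Applied to a real county table, the same calculation would show how much of the national total a control program covers when it targets only the highest-emitting counties.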