CLRD-GLPS: A Long-term Seasonal Dataset of Ruminant Livestock Distribution in China's Grazing Production Systems (2000–2021) Using Stacking-based Interpretable Machine Learning
Abstract. Understanding the spatiotemporal distribution of grazing livestock is crucial for assessing the sustainability of livestock systems, managing animal diseases, mitigating climate change risks, and controlling greenhouse gas emissions. In China, grazing ruminants are predominantly distributed across the vast grasslands of semi-humid and alpine regions. However, existing gridded livestock distribution datasets neither distinguish grazing from other livestock production systems nor account for long-term and seasonal dynamics simultaneously. This study introduces CLRD-GLPS, a comprehensive dataset mapping the distribution of ruminant livestock in China's grazing livestock production systems from 2000 to 2021. To address the limitations of existing datasets, our approach integrates interpretable machine learning methods to separate grazing livestock from total livestock populations and to generate seasonal grazing pastures with dynamic grazing suitability masks. We developed a stacking-based ensemble methodology that improves predictive performance while providing insight into distribution mechanisms. In 5-fold cross-validation, the stacking ensemble models performed robustly, with R² values ranging from 0.909 to 0.967 for cattle and from 0.874 to 0.914 for sheep and goats. Validation showed that CLRD-GLPS is highly accurate across multiple spatial scales. At the county level, the dataset agreed strongly with census data and effectively captured the distribution of grazing livestock. City-level validation also showed strong agreement (R² = 0.691–0.881), and grid-level validation against independent observations yielded R² = 0.79, supporting the accuracy of the fine-resolution predictions. The CLRD-GLPS dataset provides essential information for understanding grazing ruminant dynamics and developing targeted livestock management policies. Furthermore, our methodological framework offers a template for creating similar livestock distribution datasets for other regions and livestock production systems.
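The stacking ensemble and 5-fold cross-validation summarized above can be illustrated with a minimal pure-Python sketch. It assumes two hypothetical base learners (ordinary least squares and k-nearest neighbours) and a linear meta-learner fitted on their out-of-fold predictions. The synthetic covariates and response are placeholders, not the study's predictors, base models, or census data, so the resulting scores say nothing about the accuracy of CLRD-GLPS.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ybar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - ybar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ols(X, y):
    """Linear regression with intercept, solved via normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for j in range(c, p):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    coef = [0.0] * p
    for c in range(p - 1, -1, -1):
        coef[c] = (b[c] - sum(A[c][j] * coef[j] for j in range(c + 1, p))) / A[c][c]
    return lambda x: coef[0] + sum(w * v for w, v in zip(coef[1:], x))


def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor (mean of the k closest targets)."""
    data = list(zip(X, y))

    def predict(x):
        near = sorted(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))[:k]
        return mean(t for _, t in near)

    return predict


def fit_stacking(X, y, base_fitters, k=5):
    """Hypothetical stacking: meta-learner trained on out-of-fold base predictions."""
    n = len(y)
    oof = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held = set(fold)
        tr = [i for i in range(n) if i not in held]
        for m, fit in enumerate(base_fitters):
            model = fit([X[i] for i in tr], [y[i] for i in tr])
            for i in fold:
                oof[i][m] = model(X[i])
    meta = fit_ols(oof, y)
    bases = [fit(X, y) for fit in base_fitters]  # refit base learners on all data
    return lambda x: meta([b(x) for b in bases])


def cross_val_r2(X, y, fitter, k=5, seed=1):
    """Outer k-fold cross-validation returning one R² per held-out fold."""
    scores = []
    for fold in kfold_indices(len(y), k, seed):
        held = set(fold)
        tr = [i for i in range(len(y)) if i not in held]
        model = fitter([X[i] for i in tr], [y[i] for i in tr])
        scores.append(r2_score([y[i] for i in fold], [model(X[i]) for i in fold]))
    return scores


# Synthetic placeholder data (NOT livestock data): two covariates, nonlinear response.
rng = random.Random(42)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
y = [3 * a + b * b + rng.gauss(0, 0.1) for a, b in X]
scores = cross_val_r2(X, y, lambda Xt, yt: fit_stacking(Xt, yt, [fit_ols, fit_knn]))
print([round(s, 3) for s in scores])
```

Training the meta-learner on out-of-fold rather than in-sample base predictions keeps it from simply rewarding whichever base learner overfits most, which is the main reason stacking is usually paired with an inner cross-validation loop.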