https://doi.org/10.5194/essd-2026-284
12 May 2026
Status: this preprint is currently under review for the journal ESSD.

NortheastChinaMaizeYield10m: A 10-m Resolution Maize Yield Dataset for Northeast China (2019–2024) Generated via a Mechanistically Interpretable, Label-free Framework

Jingbo Hu, Xin Du, Qiangzi Li, Yuan Zhang, Hongyan Wang, Jiansong Luo, Jingyuan Xu, Yachao Zhao, Zhaoming Zhang, Yong Dong, and Yunqi Shen

Abstract. In the face of escalating global food demand and increasing climate variability, precise and granular crop yield monitoring is indispensable for maintaining regional agricultural stability. However, current deep learning approaches for yield estimation are severely constrained by their heavy reliance on massive in situ labeled data, which limits their application in data-scarce regions. Furthermore, these models often overlook the essential temporal evolution logic of yield formation and lack a systematic discussion regarding the contribution patterns of different feature dimensions, resulting in a black-box nature of the underlying model mechanisms. To address these bottlenecks, this study proposes a label-free maize yield estimation framework that couples mechanistic models with deep learning. The framework’s core strength lies in a physiologically complete simulation database, using the WOFOST model to exhaustively cover 30 years of climate variability and habitat combinations across Northeast China (1.24 × 10⁶ km²). A Gated Recurrent Unit (GRU) network was then introduced for end-to-end modeling, accurately capturing the energy accumulation trajectory from vegetative to reproductive growth. Validation against 458 independent ground points (2022–2024) demonstrated robust generalization with an R² of 0.69, an RMSE of 1.21 t/ha, and an RRMSE of 13.71 %, despite using no ground data for training. Our analysis revealed that integrating photosynthetic intensity (LAImean), duration (LAD) and peak features (LAImax) across growth stages is critical for accuracy, while omitting early-stage features significantly impairs the model's ability to capture cumulative growth effects. Furthermore, the model successfully captured the spatiotemporal yield anomalies caused by the 2023 typhoon and flooding events. Ultimately, this study generated a 10-m resolution maize yield dataset (2019–2024) for Northeast China. 
The dataset exhibits consistent interannual stability, with the Root Mean Square Error (RMSE) ranging from 7.98 % to 22.21 % and the coefficient of determination (R²) remaining above 0.44 at the county level. By deeply coupling mechanistic simulation with data mining, this dataset provides detailed support for optimizing agricultural production and guiding farming practices. The Northeast China Maize Yield 10-m dataset is openly available at https://zenodo.org/records/19547014 (Hu et al., 2026).
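
For readers who want to reproduce the kind of accuracy statistics quoted in the abstract (R², RMSE, RRMSE), the following is a minimal sketch and not the authors' code. It assumes that R² is the coefficient of determination (1 − SS_res/SS_tot) and that RRMSE is the RMSE divided by the mean observed yield; the abstract does not state either definition. The function name `regression_metrics` and the sample yields are hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """Return R^2, RMSE and relative RMSE (percent) for paired yield values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    mean_obs = sum(observed) / n
    # Sum of squared residuals between observed and predicted yields.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    # Assumption: RRMSE is RMSE normalised by the mean observed yield.
    rrmse_percent = 100.0 * rmse / mean_obs
    # Assumption: R^2 is the coefficient of determination, 1 - SS_res / SS_tot.
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "rmse": rmse, "rrmse_percent": rrmse_percent}

# Made-up yields in t/ha, purely for illustration (not values from the dataset).
observed = [8.0, 9.0, 10.0, 9.0]
predicted = [8.5, 8.5, 10.5, 8.5]
metrics = regression_metrics(observed, predicted)
```

Other conventions exist, for example R² as the squared Pearson correlation or RRMSE normalised by the observed range, so the definitions in the full paper should be checked before comparing numbers.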

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 18 Jun 2026)


Data sets

NortheastChinaMaizeYield10m: A 10-m Resolution Maize Yield Dataset for Northeast China (2019–2024) Generated via a Mechanistically Interpretable, Label-free Framework. Jingbo Hu. https://zenodo.org/records/19547014

Latest update: 13 May 2026
Short summary
Accurate crop yield data is vital for food security but often hard to get due to a lack of field records. We developed a new way to map maize yields by combining plant growth simulations with satellite images, requiring no ground data for training. We produced a high-resolution map of Northeast China from 2019 to 2024. This tool successfully identified crop losses from major floods, helping to guide farming decisions and disaster relief without the need for expensive field surveys.