the Creative Commons Attribution 4.0 License.
CY-Bench: A comprehensive benchmark dataset for sub-national crop yield forecasting
Abstract. In-season, pre-harvest crop yield forecasts are essential for enhancing transparency in commodity markets and improving food security. They play a key role in increasing resilience to climate change and extreme events and thus contribute to the United Nations’ Sustainable Development Goal 2 of zero hunger. Pre-harvest crop yield forecasting is a complex task, as several interacting factors contribute to yield formation, including in-season weather variability, extreme events, long-term climate change, soil, pests, diseases and farm management decisions. Several modeling approaches have been employed to capture complex interactions among such predictors and crop yields. Prior research on in-season, pre-harvest crop yield forecasting has primarily been case-study based, which makes it difficult to compare modeling approaches and measure progress systematically. To address this gap, we introduce CY-Bench (Crop Yield Benchmark), a comprehensive dataset and benchmark to forecast maize and wheat yields at a global scale. CY-Bench was conceptualized and developed within the Machine Learning team of the Agricultural Model Intercomparison and Improvement Project (AgML) in collaboration with agronomists, climate scientists, and machine learning researchers. It features publicly available sub-national yield statistics and relevant predictors (such as weather data, soil characteristics, and remote sensing indicators) that have been pre-processed, standardized, and harmonized across spatio-temporal scales.
With CY-Bench, we aim to: (i) establish a standardized framework for developing and evaluating data-driven models across diverse farming systems in more than 25 countries across six continents; (ii) enable robust and reproducible model comparisons that address real-world operational challenges; (iii) provide an openly accessible dataset to the Earth system science and machine learning communities, facilitating research on time series forecasting, domain adaptation, and online learning. The dataset (https://doi.org/10.5281/zenodo.11502142; Paudel et al., 2025a) and accompanying code (https://github.com/WUR-AI/AgML-CY-Bench; Paudel et al., 2025b) are openly available to support the continued development of advanced data-driven models for crop yield forecasting and thereby enhance decision-making on food security.
Status: open (until 18 Apr 2025)
AC1: 'Correction to Author List', Michiel Kallenberg, 13 Mar 2025
Due to an oversight, three contributing authors were inadvertently omitted from the author list. Their names and affiliations are as follows:
- Dainius Masiliūnas, Wageningen University & Research
- Allard de Wit, Wageningen University & Research
- Maximilian Zachow, Technical University of Munich
They will be included in the next revision of the manuscript.
Citation: https://doi.org/10.5194/essd-2025-83-AC1
Data sets
CY-Bench: A comprehensive benchmark dataset for subnational crop yield forecasting. D. Paudel et al. https://doi.org/10.5281/zenodo.11502142
Model code and software
CY-Bench: A comprehensive benchmark dataset for subnational crop yield forecasting. D. Paudel et al. https://github.com/WUR-AI/AgML-CY-Bench/