https://doi.org/10.5194/essd-2025-83
12 Mar 2025
Status: this preprint is currently under review for the journal ESSD.

CY-Bench: A comprehensive benchmark dataset for sub-national crop yield forecasting

Dilli Paudel, Michiel Kallenberg, Stella Ofori-Ampofo, Hilmy Baja, Ron van Bree, Aike Potze, Pratishtha Poudel, Abdelrahman Saleh, Weston Anderson, Malte von Bloh, Andres Castellano, Oumnia Ennaji, Raed Hamed, Rahel Laudien, Donghoon Lee, Inti Luna, Michele Meroni, Janet Mumo Mutuku, Siyabusa Mkuhlani, Jonathan Richetti, Alex C. Ruane, Ritvik Sahajpal, Guanyuan Shai, Vasileios Sitokonstantinou, Rogério de Souza Nóia Júnior, Amit Kumar Srivastava, Robert Strong, Lily-belle Sweet, Petar Vojnovic, and Ioannis N. Athanasiadis

Abstract. In-season, pre-harvest crop yield forecasts are essential for enhancing transparency in commodity markets and improving food security. They play a key role in increasing resilience to climate change and extreme events, and thus contribute to the United Nations’ Sustainable Development Goal 2 (Zero Hunger). Pre-harvest crop yield forecasting is a complex task because many interacting factors contribute to yield formation, including in-season weather variability, extreme events, long-term climate change, soil, pests, diseases, and farm management decisions. Several modeling approaches have been used to capture the complex interactions between such predictors and crop yields. However, prior research on in-season, pre-harvest crop yield forecasting has been primarily case-study based, which makes it difficult to compare modeling approaches and measure progress systematically. To address this gap, we introduce CY-Bench (Crop Yield Benchmark), a comprehensive dataset and benchmark for forecasting maize and wheat yields at a global scale. CY-Bench was conceptualized and developed within the Machine Learning team (AgML) of the Agricultural Model Intercomparison and Improvement Project (AgMIP), in collaboration with agronomists, climate scientists, and machine learning researchers. It features publicly available sub-national yield statistics and relevant predictors (such as weather data, soil characteristics, and remote sensing indicators) that have been pre-processed, standardized, and harmonized across spatio-temporal scales.
With CY-Bench, we aim to: (i) establish a standardized framework for developing and evaluating data-driven models across diverse farming systems in more than 25 countries on six continents; (ii) enable robust and reproducible model comparisons that address real-world operational challenges; and (iii) provide an openly accessible dataset to the Earth system science and machine learning communities, facilitating research on time series forecasting, domain adaptation, and online learning. The dataset (https://doi.org/10.5281/zenodo.11502142; Paudel et al., 2025a) and accompanying code (https://github.com/WUR-AI/AgML-CY-Bench; Paudel et al., 2025b) are openly available to support the continuous development of advanced data-driven models for crop yield forecasting and to enhance decision-making on food security.


Discussion status: open (until 18 Apr 2025)

  • AC1 (author comment): 'Correction to Author List', Michiel Kallenberg, 13 Mar 2025

Data sets

CY-Bench: A comprehensive benchmark dataset for subnational crop yield forecasting D. Paudel et al. https://doi.org/10.5281/zenodo.11502142

Model code and software

CY-Bench: A comprehensive benchmark dataset for subnational crop yield forecasting D. Paudel et al. https://github.com/WUR-AI/AgML-CY-Bench/


Latest update: 13 Mar 2025
Short summary
Improving crop yield predictions is crucial for food security. Prior research relied on case studies, making it hard to compare methods and track progress. We introduce CY-Bench, a global dataset for forecasting maize and wheat yields across diverse farming systems in over 25 countries. It includes standardized weather, soil, and satellite data, curated by a diverse group of experts. CY-Bench supports the development of better forecasting tools that help decision-makers plan for global food security.