the Creative Commons Attribution 4.0 License.
GGCP10: A Global Gridded Crop Production Dataset at 10km Resolution from 2010 to 2020
Xingli Qin
Bingfang Wu
Hongwei Zeng
Miao Zhang
Fuyou Tian
Abstract. Spatial-temporal distribution information on global crop production is crucial for studying global food security and promoting sustainable agricultural development. However, the currently available datasets on this subject have coarse resolution and discontinuous time spans. To tackle these problems, we integrated multiple data sources, including statistical data, gridded production data, agroclimatic indicator data, agronomic indicator data, global land surface satellite products and ground data, to develop a data-driven crop production spatial allocation model, and generated the first global, temporally continuous, 10 km resolution gridded production dataset of four major crops (maize, wheat, rice and soybean) from 2010 to 2020 (Global gridded crop production dataset at 10 km, GGCP10). A set of data-driven models was trained on an agro-ecological-zone basis to achieve accurate predictions of crop production for different agricultural regions. Cross-validation results demonstrate the performance of the models. The accuracy and reliability of GGCP10 have been evaluated from various perspectives using gridded, survey and statistical data. GGCP10 can reveal the spatial-temporal distribution patterns of global crop production and contribute to understanding the mechanisms driving changes in crop production. GGCP10 provides crucial data support for research on global food security and sustainable agricultural development. The GGCP10 dataset is available on Harvard Dataverse: https://doi.org/10.7910/DVN/G1HBNK (Qin et al., 2023).
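The zone-wise training and cross-validation summarized in the abstract can be illustrated with a minimal sketch. It is not the GGCP10 code: it assumes each training sample is a (zone, indicator, production) tuple, and it substitutes a one-variable least-squares line for the paper's machine learning models so that it runs with the Python standard library alone. The functions `fit_line`, `kfold_rmse` and `train_by_zone` are hypothetical names.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean
from typing import Dict, Hashable, Iterable, List, Tuple

Line = Tuple[float, float]  # (intercept, slope)


def fit_line(xs: List[float], ys: List[float]) -> Line:
    """Ordinary least squares for y = a + b*x with a single indicator (illustrative stand-in)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # flat line if the indicator has no spread
    return my - slope * mx, slope


def kfold_rmse(xs: List[float], ys: List[float], k: int) -> float:
    """k-fold cross-validated RMSE of fit_line within one zone."""
    n = len(xs)
    if n < 2 or k < 2:
        raise ValueError("need at least 2 samples and k >= 2")
    squared_errors: List[float] = []
    for fold in range(k):
        # Round-robin fold assignment (no shuffling, so results are reproducible).
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
    return sqrt(mean(squared_errors))


def train_by_zone(
    samples: Iterable[Tuple[Hashable, float, float]], k: int = 3
) -> Dict[Hashable, Tuple[Line, float]]:
    """samples: hypothetical (zone_id, indicator, production) tuples.

    Fits one model per zone on all of that zone's samples and reports its
    k-fold cross-validated RMSE alongside the coefficients."""
    groups: Dict[Hashable, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for zone, x, y in samples:
        groups[zone][0].append(x)
        groups[zone][1].append(y)
    return {
        zone: (fit_line(xs, ys), kfold_rmse(xs, ys, k))
        for zone, (xs, ys) in groups.items()
    }
```

Grouping first and then fitting per group is what separates zone-specific models from a single global model; a real pipeline would use many indicators, a more flexible learner and randomized folds.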
Xingli Qin et al.
Status: open (until 24 Dec 2023)
RC1: 'Comment on essd-2023-346', Anonymous Referee #1, 07 Nov 2023
The spatial distribution of crops is critical information for food security, agricultural development and investment decisions, sustainable agricultural development, etc. There have been multiple attempts by different teams around the world to produce global crop maps, yet so far few attempts have been made to produce time-series global crop maps. The GGCP10 dataset focuses on maize, wheat, rice and soybean and covers the years 2010 to 2020, making it the first temporally continuous gridded dataset of crop production at the global scale. The dataset was constructed using a data-driven spatial production allocation model that incorporates multiple source datasets. The use of various data sources, including FAO statistical data, GAEZ+ 2015 annual crop data and others, provides a robust foundation for the study. The model inputs were rigorously examined through pre-processing and consistency checks to ensure data accuracy and reliability. The incorporation of machine learning techniques for predicting crop yields and production is a forward-looking approach; these techniques have demonstrated solid performance in recent years. Combining information from multiple sources, including climate, soil and topographic data, is a commendable strategy for predicting crop production accurately.
However, I do have serious concerns about the paper. My first concern is that the whole modelling approach, the production model in particular (see Section 2.2.3 Data-driven Model Training), implicitly assumes that biophysical parameters alone can determine crop production. In other words, it assumes that the driving factors behind the huge spatial heterogeneity of crop productivity (or production, if crop area is counted) are mainly biophysical parameters such as soil, AEZ zones, various vegetation indices and climate variables (the multi-source indicators XI(i,j) in their model, Line 225). Any breeder or agricultural economist would tell you that this is not true. Socio-economic factors such as crop seeds/varieties, crop management, fertilizer and pesticide use are the major driving forces of crop productivity (and therefore of crop production). This is why, for example, the maize yield on a large estate farm in Zambia can be several times higher than that of a subsistence maize farmer next door, just a few hundred meters away! Of course, collecting data for these parameters on a global scale is much harder, if possible at all. Without these critical inputs, estimating crop yields spatially is a huge challenge.

My second concern is that this is a data description paper, and yet it lacks a critical dataset: global sub-national crop statistics. Its major statistical data source is FAOSTAT data at the country level, which is too coarse for a gridded product. Crop type mapping is too complex and too dynamic to be modelled without actual sub-national statistics. For example, farmers may decide to reduce their maize area and plant more rice in the current season if they expect more rain, or simply because they believe the maize price will go down next year. Any modelling approach, however sophisticated, will struggle to capture that without the actual data.
The paper itself places heavy emphasis on its modelling approach while neglecting the time-consuming effort of collecting crop data for the four crops (maize, rice, wheat and soybean). I would say the latter is much more critical, particularly considering that the ESSD journal is, to quote, “for the publication of articles on original research data (sets), furthering the reuse of high-quality data of benefit to Earth system sciences”.
In addition, I have the following minor issues:
- The paper could benefit from more transparency regarding data preprocessing steps, such as how data clipping based on crop phenology is conducted and how missing or corrupted data are handled. For example, in Lines 179-183, where does CA(i,t) come from? How is CA(i,t) divided into CA(i,j)? This is not clear at all. I think (though I am not 100% sure, as I had a hard time understanding this section) that “reference year” in Line 184 should be “target year”. After reading the section multiple times, I still don't know how the harvested area is estimated at the pixel level. I consider myself an expert; imagine how an ordinary reader would feel!
- Model Selection: The paper mentions the selection of machine learning models but lacks specific details about the criteria used for model selection. Providing more insight into the model selection process would enhance the paper's transparency.
- Data Limitations: While the paper discusses data limitations briefly, a more thorough exploration of potential data limitations, such as inaccuracies in remote sensing data or potential biases, would provide a more comprehensive view.
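The question above about dividing CA(i,t) into CA(i,j) concerns a top-down allocation step. One common approach, shown here only as an illustrative assumption and not as the authors' method, splits each statistical unit's reported total among its pixels in proportion to a per-pixel weight (for example a cropland fraction or pixel count), so that the pixel values sum back to the reported total. The unit and pixel identifiers and the functions `allocate_total` and `allocate_units` are hypothetical.

```python
from typing import Dict, Hashable, List, Mapping


def allocate_total(total: float, weights: List[float]) -> List[float]:
    """Split one unit-level total among pixels in proportion to non-negative weights.

    The returned values sum to `total` (up to floating-point rounding)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    weight_sum = sum(weights)
    if weight_sum == 0:
        raise ValueError("at least one pixel needs a positive weight")
    return [total * w / weight_sum for w in weights]


def allocate_units(
    unit_totals: Mapping[Hashable, float],
    pixel_weights: Mapping[Hashable, Mapping[Hashable, float]],
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Apply allocate_total to every statistical unit for one year.

    unit_totals:   {unit_id: reported total, e.g. harvested area}
    pixel_weights: {unit_id: {pixel_id: weight}} for the pixels inside each unit
    """
    result: Dict[Hashable, Dict[Hashable, float]] = {}
    for unit, total in unit_totals.items():
        pixels = list(pixel_weights[unit])
        shares = allocate_total(total, [pixel_weights[unit][p] for p in pixels])
        result[unit] = dict(zip(pixels, shares))
    return result


if __name__ == "__main__":
    areas = allocate_units(
        {"unit_A": 1000.0, "unit_B": 60.0},
        {"unit_A": {"p1": 5, "p2": 3, "p3": 0, "p4": 2}, "unit_B": {"p5": 1, "p6": 2}},
    )
    print(areas["unit_A"])  # {'p1': 500.0, 'p2': 300.0, 'p3': 0.0, 'p4': 200.0}
    print(areas["unit_B"])  # {'p5': 20.0, 'p6': 40.0}
```

Raising an error when all weights in a unit are zero, rather than silently splitting the total evenly, makes gaps in the weighting layer visible; an operational pipeline might handle that case differently.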
Citation: https://doi.org/10.5194/essd-2023-346-RC1
Data sets
GGCP10: A Global Gridded Crop Production Dataset at 10km Resolution from 2010 to 2020 Xingli Qin, Bingfang Wu, Hongwei Zeng, Miao Zhang, and Fuyou Tian https://doi.org/10.7910/DVN/G1HBNK
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
483 | 162 | 9 | 654 | 4 | 11 |