the Creative Commons Attribution 4.0 License.
GGCP10: A Global Gridded Crop Production Dataset at 10km Resolution from 2010 to 2020
Abstract. Spatial-temporal distribution information on global crop production is crucial for studying global food security and promoting sustainable agricultural development. However, the presently available datasets related to this subject are characterized by coarse resolution and discontinuous time spans. To tackle these problems, we have integrated multiple data sources, including statistical data, gridded production data, agroclimatic indicator data, agronomic indicator data, global land surface satellite products and ground data, to develop a data-driven crop production spatial allocation model, and generated the first global temporally continuous 10 km resolution gridded production dataset of four major crops (maize, wheat, rice and soybean) from 2010 to 2020 (Global gridded crop production dataset at 10 km, GGCP10). A set of data-driven models was trained based on agro-ecological zones to achieve accurate predictions of crop production for different agricultural regions. The performance of the models is demonstrated by the cross-validation results. The accuracy and reliability of GGCP10 have been evaluated from various perspectives using gridded, survey and statistical data. GGCP10 can reveal the spatial-temporal distribution patterns of global crop production and contribute to the understanding of the mechanisms driving changes in crop production. GGCP10 provides crucial data support for research on global food security and sustainable agricultural development. The GGCP10 dataset is available on Harvard Dataverse: https://doi.org/10.7910/DVN/G1HBNK (Qin et al., 2023).
This preprint has been withdrawn.
Interactive discussion
Status: closed
-
RC1: 'Comment on essd-2023-346', Anonymous Referee #1, 07 Nov 2023
Spatial distribution of crops is critical information for food security, agriculture development and investment decisions, sustainable agricultural development etc. There have been multiple attempts, by different teams in the world, to produce global crop maps. And yet so far few attempts have been made to produce time series global crop maps. The GGCP10 dataset focuses on maize, wheat, rice, and soybeans and covers the years 2010 to 2020, the first temporally continuous, gridded dataset of crop production at the global scale. The dataset was constructed using a data-driven spatial production allocation model that incorporated multiple source datasets. The use of various data sources, including FAO statistical data, GAEZ+ 2015 annual crop data, and other sources, demonstrates a robust foundation for the study. This model was rigorously examined through pre-processing and consistency checks to ensure data accuracy and reliability. The incorporation of machine learning techniques for predicting crop yields and production is a forward-looking approach. These techniques have demonstrated solid performance in recent years. The approach of combining information from multiple sources, including climate, soil, and topographic data, is a commendable strategy for predicting crop production accurately.
However, I do have serious concerns about the paper. My first concern is that the whole modelling approach, the production model in particular (see Section 2.2.3 Data-driven Model Training), implicitly assumes that biophysical parameters alone could determine crop production. In other words, the modelling approach assumes that the driving factors for the huge spatial heterogeneity of crop productivity (or production if crop area is counted) are mainly biophysical parameters such as soil, AEZ zones, various vegetation indices and climate variables (the multi-source indicators XI(i,j) shown in their model, Line 225). Any breeder or agricultural economist would tell you that this is not true. Socio-economic factors such as crop seeds/varieties, crop management, fertilizer and pesticide are the major driving forces in crop productivity (and so crop production). This is why, for example, the maize yield on a large estate farm in Zambia could be a few times higher than that of a subsistence maize farmer next door, just a few hundred meters away! Of course, collecting data for these parameters on a global scale is much harder, if possible at all. Without these critical inputs, estimating crop yields spatially is a huge challenge.
My second concern is that the paper is a data description paper and yet it misses the critical dataset: global sub-national crop statistics. Their major statistical data source is the FAOSTAT data at country level, which is too coarse for the gridded product. Crop type mapping is too complex and too dynamic to be modeled without actual sub-national statistics. For example, farmers may decide to reduce their maize area and plant more rice in the current season if they expect more rain in the coming season, or simply because they believe the maize price will go down next year. Any fancy modelling approach would struggle to capture that without the actual data.
The paper itself places a lot of emphasis on the modelling approach while ignoring the time-consuming effort of collecting crop data for the four crops (maize, rice, wheat and soybean). I would say the latter is much more critical, in particular considering that the ESSD journal is, and I quote, “for the publication of articles on original research data (sets), furthering the reuse of high-quality data of benefit to Earth system sciences”.
In addition, I have the following minor issues:
- The paper could benefit from more transparency regarding data preprocessing steps, such as how data clipping based on crop phenology is conducted and how missing or corrupted data are handled. For example, in Lines 179-183, where does CA(i,t) come from? How is CA(i,t) divided into CA(i,j)? This is not clear at all. I think (I am not 100% sure, as I have a hard time understanding this section) that “reference year” at Line 184 should be “target year”. After reading the section multiple times, I still don’t know how the harvested area is estimated at the pixel level. I consider myself an expert; imagine how an ordinary reader would feel!
- Model Selection: The paper mentions the selection of machine learning models but lacks specific details about the criteria used for model selection. Providing more insight into the model selection process would enhance the paper's transparency.
- Data Limitations: While the paper discusses data limitations briefly, a more thorough exploration of potential data limitations, such as inaccuracies in remote sensing data or potential biases, would provide a more comprehensive view.
Citation: https://doi.org/10.5194/essd-2023-346-RC1
CC5: 'Reply on RC1', Qinghan Dong, 19 Feb 2024
It is a discussion, not a reply, as I am not a team member of the authors' institution.
Your comments are judicious, and I can only agree with all the points you raise. From my experience, crop yield is mostly determined, as you mentioned, by the seed cultivar as well as chemical fertilizer application, and most often the application amount should be used as a predominant explanatory variable in modeling. However, if the authors were trying to use the early greenness of cropping fields to predict the harvest yield, that would also be sound from a scientific point of view. Actually, remote-sensing-based methodology, together with agro-meteorology, is widely applied to crop yield forecasting, especially in mature agricultural production systems (North and South America, Europe), where advanced seed technology is widely accessible.
Citation: https://doi.org/10.5194/essd-2023-346-CC5
AC1: 'Reply on RC1', Xingli Qin, 19 Mar 2024
We greatly appreciate the reviewer's positive feedback and insightful comments on our work. Your valuable suggestions have helped us to further refine the manuscript and better highlight the significance and innovation of this research. We have carefully considered each point you raised and provided detailed responses in the supplement. Please kindly refer to our point-by-point responses for more information on how we have addressed your comments and suggestions.
-
CC1: 'Comment on essd-2023-346', Vishal Mishra, 30 Jan 2024
This is a great study. This dataset is based on multi-source data production and the experimental results also demonstrate its high reliability. It shows the spatial distribution of crop production over a long time series, which helps to evaluate the impact of global warming and land use change on food yield.
It is suggested that the authors further explore the application prospects of this dataset, in particular in assessing the potential for achieving the zero hunger goal of sustainable development in Africa.
Citation: https://doi.org/10.5194/essd-2023-346-CC1
CC4: 'Reply on CC1', Qinghan Dong, 19 Feb 2024
It is a discussion, not a reply, as I am not a team member of the authors' institution.
Your comments are judicious, and I can only agree with all the points you raise. From my experience, crop yield is mostly determined, as you mentioned, by the seed cultivar as well as chemical fertilizer application, and most often the application amount should be used as a predominant explanatory variable in modeling. However, if the authors were trying to use the early greenness of cropping fields to predict the harvest yield, that would also be sound from a scientific point of view. Actually, remote-sensing-based methodology, together with agro-meteorology, is widely applied to crop yield forecasting, especially in mature agricultural production systems (North and South America, Europe), where advanced seed technology is widely accessible.
Citation: https://doi.org/10.5194/essd-2023-346-CC4
AC4: 'Reply on CC1', Xingli Qin, 19 Mar 2024
Thank you very much for your endorsement and encouragement of our research work. As you pointed out, the GGCP10 dataset, which we constructed based on multi-source data, reveals the spatio-temporal distribution patterns of crop production over a long time series, which is of great value for assessing the impact of global change on food production.
We strongly agree with your suggestion to further explore the application prospects of this dataset, especially in assessing the potential for achieving the zero hunger goal of sustainable development in Africa. Indeed, the four crops covered by this dataset (maize, wheat, rice, and soybean) play an important role in food production and consumption in many African countries. Using the spatio-temporal dynamics of crop production provided by GGCP10, we can analyze the current status, limiting factors, and yield enhancement potentials of food production in different regions of Africa, and propose countermeasures and suggestions for improving regional food security. In addition, combining GGCP10 with data on land use change, population growth and other factors in Africa can help to assess future food demand gaps and resource and environmental pressures under different scenarios, thereby exploring path options for achieving a balance between socio-economic development and food security. We will elaborate on the above issues in the discussion section of the revised manuscript.
Citation: https://doi.org/10.5194/essd-2023-346-AC4
-
CC2: 'Comment on essd-2023-346', Asfaw Kebede Kassa, 07 Feb 2024
This is very important research, which will advance the efforts made by different researchers to predict crop production at different scales. Most researchers and practitioners are badly in need of continuous temporal and spatial information related to crop production. Research efforts with better spatial accuracy and temporal resolution are vital for work on food security.
The application of machine learning is currently becoming a decisive approach in science because of its better predictions. But when training on data, unless actual representative ground data are used, machine learning will not bring the intended result. Therefore, my worry in this research concerns the use of crop yield data from the ground. Some countries have different agro-climatic zones due to their topographic variations, and their crop yields vary accordingly from district to district within the country. But the statistical data used in this research are FAO crop yield data, which give a single yield for one crop per country (example: country X will have maize yield y in year Z). Because there are huge crop yield differences from district to district within a single country due to different factors, taking FAO (average) crop yield data for model training may not bring the intended prediction accuracy at 10 km resolution on the ground unless it is validated using ground observations. It would be good to make this major issue clear in this excellent dataset initiative.
This research will give an indication of the total yield a country may have for a specific crop. Efforts were made to validate the dataset using observations in India, the USA and China. It would be good to see the dataset validated for food-insecure areas of the world (e.g. Africa) using ground observations instead of FAOSTAT, at least for major crops in Africa such as maize and wheat; this would increase the reliability and accuracy of the dataset. In addition to the factors used in this research, agricultural inputs (agronomic and physical inputs) can bring yield differences; it would be wise to clarify the effect of these in your model or to treat it as a gap for future improvement.
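The concern about national averages can be made concrete with a small worked example. The district areas and yields below are hypothetical, chosen only to show that an area-weighted national mean can reproduce the national production total exactly while misrepresenting every district; nothing here reflects the authors' model or real FAO figures.

```python
def national_mean_yield(districts):
    """Area-weighted mean yield (t/ha) over (area_ha, yield_t_per_ha) pairs."""
    total_area = sum(area for area, _ in districts)
    total_production = sum(area * y for area, y in districts)
    return total_production / total_area


# Hypothetical districts of one country: (harvested area in ha, yield in t/ha).
districts = [(1000.0, 1.0), (1000.0, 3.0), (2000.0, 6.0)]

mean_yield = national_mean_yield(districts)

# Error made in each district if the single national yield is assumed everywhere.
district_errors = [mean_yield - y for _, y in districts]

# National production implied by the mean versus the true district sum.
production_from_mean = mean_yield * sum(area for area, _ in districts)
true_production = sum(area * y for area, y in districts)

print(mean_yield, district_errors, production_from_mean, true_production)
```

In this sketch the national total is matched, yet the lowest-yielding district is overestimated and the highest-yielding one is underestimated, which is why validation against sub-national ground observations matters.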
Citation: https://doi.org/10.5194/essd-2023-346-CC2
AC5: 'Reply on CC2', Xingli Qin, 19 Mar 2024
-
CC3: 'Comment on essd-2023-346', Qinghan Dong, 19 Feb 2024
Line 09: It would be good to pay more attention to English grammar, for example on Line 09.
Line 11: It would be good to use more precise remote sensing terminology, for example the term "discontinuous time series" instead of "discontinuous time spans".
Citation: https://doi.org/10.5194/essd-2023-346-CC3
AC3: 'Reply on CC3', Xingli Qin, 19 Mar 2024
Thank you very much for your valuable comments on the details of the article.
Regarding the grammatical issue in line 9, we will carefully review it again and correct any grammatical errors and inappropriate expressions to ensure that the language expression of the article conforms to English grammar norms.
Regarding the expression of "discontinuous time series" in line 11, we completely agree with your point of view. "Time series" is indeed a more accurate and professional term in the field of remote sensing. We will adopt your suggestion and change "discontinuous time spans" to "discontinuous time series" in the revised manuscript to improve the academic standardization of the article.
In addition to the two points you mentioned, we will also conduct a thorough check of the entire text and invite native English-speaking peers to review and polish the revised manuscript, striving to achieve accurate, concise, and standardized language expression. Thank you again for your careful reading and pertinent suggestions.
Citation: https://doi.org/10.5194/essd-2023-346-AC3
-
RC2: 'Comment on essd-2023-346', Anonymous Referee #2, 20 Feb 2024
Qin et al. used multiple data sources, including statistical data, gridded production data, agroclimatic indicator data, agronomic indicator data, global land surface satellite products and ground data, to develop a data-driven crop production spatial allocation model, and generated the global 10km resolution gridded production dataset of four major crops (maize, wheat, rice and soybean) from 2010 to 2020. Basically, this topic is necessary. However, this study has several serious issues.
First, the method for generating the production map is not robust. The authors used an existing production map as the reference and trained a machine learning model to allocate statistical production to grids. The method is not innovative, and the importance analysis of the input features suggests it is unreasonable (see my comment below). In principle, there is no reason to believe the machine learning method works here, because the planting area of a given pixel depends on farmers’ activities and cannot be predicted at all. Therefore, I do not believe this method can work for generating production maps globally or regionally.
Second, the harvested area map used by the authors is too simple to indicate spatial and temporal variations. As we know, the most important feature for production is harvested area. However, the authors assumed a fixed ratio to extrapolate the harvested area from a given year to other years. I do not think the harvested area map can reproduce the spatial and temporal changes; it is still static, as in the previous study. And anyone can easily generate a production map based on this assumption without harvested area as an input.
Third, the writing still needs a lot of work. I am often confused about what exactly the authors mean, and in many places the authors omit necessary details, which makes the manuscript hard to follow.
There are also many minor concerns, listed below.
Section 2.1.9: the authors should introduce the indicators first. I do not understand what cumulative potential biomass means, or whether it is a satellite-based observation or a predicted variable. The authors provide a reference that is a website in Chinese, and readers cannot find an accurate definition there. The same problem is also found in Section 2.1.10, for example with VCIx. By the way, there are many places with this kind of unclear writing, which makes the reading very hard.
Line 167: it is very difficult to understand what you mean here.
Line 169-171: again, I am confused and did not understand the meaning. You may show the correlation between GAEZ and CropWatch.
Section 2.2.1: the method for estimating harvested area is not robust at all. The authors assumed that all crops change at the same rate as the total cultivation area and used the FAO statistical area (national level) to estimate the harvested area of each grid. This is obviously a wrong method, and it may induce large biases among different years. If the authors made this kind of estimate for harvested area, why did they not just estimate production directly? Besides, the writing of the method section needs improvement; it would be good to introduce the principle first and then write out the algorithm, which would be easier for readers.
Line 214: a typical writing error: ‘First, the time series data are clipped by crop phenology to obtain the data corresponding to the crop growth period.’ How can the time series datasets be clipped by crop phenology? Are they separated into different seasons? Or are various crop types separated according to their own phenology? However, these datasets also include location, terrain and soil according to Table 1. How are these features clipped, and why?
Table 1: is the title of the table just ‘input features’? What does ‘dimensions’ mean? What does the total dimension indicate?
Line 222: ‘these correlations are largely consistent within local regions’: what do you mean here? It is meaningless to correlate production and harvested area, since the correlation is obvious.
Line 279: you should not use the comparative degree in a sentence when you are not actually comparing two things. The same grammar errors occur in the sentences that closely follow.
Line 284: ‘regions --- have ---’ is not a good expression. I strongly suggest the authors rewrite the manuscript.
Line 290 and 295: this is a very messy and redundant paragraph. In particular, in the results section the authors discuss many potential implications that should be put in the discussion section; these implications are also mentioned in the introduction and discussion sections.
Line 305: this paragraph introduces the method and should be put in the method section; there is no need to introduce common information that readers already know.
Figures (fig. 2 and 3 at least) should show up after the main text that mentioned them.
Line 309: I cannot read these numbers in fig. 3.
Line 350-360: you may consider improving the writing here.
Section 3.3: this section is the best place to examine the reasonableness of the method. All four figures show that location plays the most important role in simulating production, which is unreasonable. Why did the authors select location as an input feature? And why is location important for simulating production? These results make me doubt this method. I really hope the authors are careful here: a wrong method may make the validation look not too bad but will produce a totally wrong regional or global distribution. Besides, the satellite-based features should play an important role, but they do not in this study. Most of the explanation in this section does not make sense. For example, in the paragraph at line 380, most of the explanations would also hold for other crops.
3.4 Comparing with Existing Datasets: I don’t think this dataset is consistent with other existing datasets. For example, fig. 9 shows large differences compared to SPAM2010 for all four crops. Systematic differences can also be found in fig. 10.
Citation: https://doi.org/10.5194/essd-2023-346-RC2
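The fixed-ratio assumption that Referee #2 criticizes can be illustrated with a minimal sketch. It assumes, as the comment on Section 2.2.1 describes, that each grid cell's reference-year harvested area is multiplied by the ratio of national totals between the target and reference years. The function names and numbers are hypothetical and are not taken from the manuscript or from FAOSTAT.

```python
import math


def scale_harvested_area(ref_grid, national_ref, national_target):
    """Scale each cell's reference-year harvested area by the national ratio."""
    if national_ref <= 0:
        raise ValueError("reference-year national total must be positive")
    ratio = national_target / national_ref
    return [cell * ratio for cell in ref_grid]


def spatial_shares(grid):
    """Fraction of the national total held by each cell."""
    total = sum(grid)
    return [cell / total for cell in grid]


def shares_unchanged(grid_a, grid_b):
    """True if both grids distribute their totals identically across cells."""
    return all(
        math.isclose(a, b)
        for a, b in zip(spatial_shares(grid_a), spatial_shares(grid_b))
    )


# Hypothetical harvested area (ha) in three cells of one country.
ref_grid = [100.0, 300.0, 600.0]

# Hypothetical national totals: 1000 ha in the reference year, 1200 ha in the target year.
target_grid = scale_harvested_area(ref_grid, national_ref=1000.0, national_target=1200.0)

print(target_grid, shares_unchanged(ref_grid, target_grid))
```

Under this assumption every cell keeps the same share of the national total in every year, which is the sense in which the referee calls the resulting harvested area map static.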
AC2: 'Reply on RC2', Xingli Qin, 19 Mar 2024
We would like to thank the reviewer for the constructive suggestions, which help to significantly improve the research and the quality of this work. We have provided our detailed point-by-point responses to address all the issues raised. Please review the detailed responses in the supplement.
-
AC2: 'Reply on RC2', Xingli Qin, 19 Mar 2024
Interactive discussion
Status: closed
-
RC1: 'Comment on essd-2023-346', Anonymous Referee #1, 07 Nov 2023
Spatial distribution of crops is critical information for food security, agriculture development and investment decisions, sustainable agricultural development etc. There have been multiple attempts, by different teams in the world, to produce global crop maps. And yet so far few attempts have been made to produce time series global crop maps. The GGCP10 dataset focuses on maize, wheat, rice, and soybeans and covers the years 2010 to 2020, the first temporally continuous, gridded dataset of crop production at the global scale. The dataset was constructed using a data-driven spatial production allocation model that incorporated multiple source datasets. The use of various data sources, including FAO statistical data, GAEZ+ 2015 annual crop data, and other sources, demonstrates a robust foundation for the study. This model was rigorously examined through pre-processing and consistency checks to ensure data accuracy and reliability. The incorporation of machine learning techniques for predicting crop yields and production is a forward-looking approach. These techniques have demonstrated solid performance in recent years. The approach of combining information from multiple sources, including climate, soil, and topographic data, is a commendable strategy for predicting crop production accurately.
However I do have serious concerns about the paper. My first concern is that their whole modelling approach, production model in particular (see Section 2.2.3 Data-driven Model Training), implicitly assumes that the biophysical parameters alone could determine the crop production. In other words, their modelling approach assumes that the driven factors for the huge spatial heterogeneity of crop productivity (or production if crop area is counted) are mainly those biophysical parameters such as soil, AEZ zones, various vegetation indices, climate variables (multi-source indicators XI(i,j) as shown in their model, Line 225). Any breeders or agricultural economists would tell you that this is not true. Social economic factors such as crop seeds/varieties, crop management, fertilizer, pesticide are the major driven force in crop productivity (and so crop production). This is why, for example, the maize yield in a large estate farm in Zambia could be a few times higher than that of a subsistence maize farmer next door – just a few hundred meters away! Of course collecting the data for these parameters on a global scale is much harder, if possible at all. Without the inputs of these critical parameters, estimating crop yields spatially is a huge challenge. My second concern is that the paper is a data description paper and yet it misses the critical dataset: a global sub-national crop statistics data. Their major statistical data source is the FAOSTATA data at country level, which is too coarse for the gridded product. Crop type mapping is too complex and too dynamic to be able to be modeled without the actual sub-national statistics. For example, farmers may decide to reduce their maize area and instead plant more rice in the current season if they expect more rain in the coming season or simply they believe the maize price will go down next year. Any fancy modelling approach is difficult to capture that without the actual data. 
The paper itself emphasizes a lot on their modelling approach while ignoring the time-consuming effort of collecting crop data for the four crops (maize, rice, wheat and soybean). I would say the latter is much more critical, in particular considering that the ESSD journal is, which I quote, “for the publication of articles on original research data (sets), furthering the reuse of high-quality data of benefit to Earth system sciences”.
In addition, I have the following minor issues:
- The paper could benefit from more transparency regarding data preprocessing steps, such as how data clipping based on crop phenology is conducted and how missing or corrupted data are handled. For example, Line 179-183, Where does CA(i,t) come from? How to divide CA(i, t) into CA(i,j) ? Not clear at al. I think (I am not 100% sure as I have a hard time to understand this section) “reference year” at Line 184 should be “target year”. After reading the section multiple times, I still don’t know how the harvested area is estimated at the pixel level. I considered myself as an expert, imagine how an ordinary reader would feel!
- Model Selection: The paper mentions the selection of machine learning models but lacks specific details about the criteria used for model selection. Providing more insight into the model selection process would enhance the paper's transparency.
- Data Limitations: While the paper discusses data limitations briefly, a more thorough exploration of potential data limitations, such as inaccuracies in remote sensing data or potential biases, would provide a more comprehensive view.
Citation: https://doi.org/10.5194/essd-2023-346-RC1 -
CC5: 'Reply on RC1', Qinghan Dong, 19 Feb 2024
t is a discussion not a reply, as I am not a team member of the authors' institution.
Your comments are actually judicious, I can only agree with all your points mentioned in your comments. From my experience, the crop yield as mostly determined , as you mentioned, by seed cultiva as well as the chemical fertilizer application. and most often the application amount should be used as a predominant explicative variable in modeling. However, if authors were trying to use the early greenness from cropping fields to predict the harvest yield, that would be also sound from scientific point of view. Actually, the remote sensing based methodology, together with agro-meteorology, is widely applied to crop yield forecasting, especially in agricultural mature production systems (North, South America, Europe), where advanced seed technology is widely accessible.
Citation: https://doi.org/10.5194/essd-2023-346-CC5 -
AC1: 'Reply on RC1', Xingli Qin, 19 Mar 2024
We greatly appreciate the reviewer's positive feedback and insightful comments on our work. Your valuable suggestions have helped us to further refine the manuscript and better highlight the significance and innovation of this research. We have carefully considered each point you raised and provided detailed responses in the supplement. Please kindly refer to our point-by-point responses for more information on how we have addressed your comments and suggestions.
-
CC1: 'Comment on essd-2023-346', Vishal Mishra, 30 Jan 2024
This is a great study. This dataset is based on multi-source data production and the experimental results also demonstrate its high reliability. It shows the spatial distribution of crop production over a long time series, which helps to evaluate the impact of global warming and land use change on food yield.
It is suggested that the author should further explore the application prospects of this dataset, in particular in assessing the potential for achieving the zero hunger goal of sustainable development in Africa.Citation: https://doi.org/10.5194/essd-2023-346-CC1 -
CC4: 'Reply on CC1', Qinghan Dong, 19 Feb 2024
It is a discussion not a reply, as I am not a team member of the authors' institution.
Your comments are actually judicious, I can only agree with all your points mentioned in your comments. From my experience, the crop yield as mostly determined , as you mentioned, by seed cultiva as well as the chemical fertilizer application. and most often the application amount should be used as a predominant explicative variable in modeling. However, if authors were trying to use the early greenness from cropping fields to predict the harvest yield, that would be also sound from scientific point of view. Actually, the remote sensing based methodology, together with agro-meteorology, is widely applied to crop yield forecasting, especially in agricultural mature production systems (North, South America, Europe), where advanced seed technology is widely accessible.
Citation: https://doi.org/10.5194/essd-2023-346-CC4 -
AC4: 'Reply on CC1', Xingli Qin, 19 Mar 2024
Thank you very much for your endorsement and encouragement of our research work. As you pointed out, the GGCP10 dataset, which we constructed based on multi-source data, reveals the spatio-temporal distribution patterns of crop production over a long time series, which is of great value for assessing the impact of global change on food production.
We strongly agree with your suggestion to further explore the application prospects of this dataset, especially in assessing the potential for achieving the zero hunger goal of sustainable development in Africa. Indeed, the four crops covered by this dataset (maize, wheat, rice, and soybean) play an important role in food production and consumption in many African countries. Using the spatio-temporal dynamics of crop production provided by GGCP10, we can analyze the current status, limiting factors, and yield enhancement potentials of food production in different regions of Africa, and propose countermeasures and suggestions for improving regional food security. In addition, by combining GGCP10 with data on land use change, population growth and other factors in Africa, it can also help to assess future food demand gaps and resource and environmental pressures under different scenarios, thereby exploring path options for achieving a balance between socio-economic development and food security.We will elaborate on the above issues in the discussion section of the revised manuscript.
Citation: https://doi.org/10.5194/essd-2023-346-AC4
-
CC4: 'Reply on CC1', Qinghan Dong, 19 Feb 2024
-
CC2: 'Comment on essd-2023-346', Asfaw Kebede Kassa, 07 Feb 2024
This is very important research, which will advance the efforts made by different researchers to predict crop production at different scales. Most researchers and practitioners are desperately looking for continuous temporal and spatial information on crop production. Research efforts such as this one, with better spatial accuracy and temporal resolution, are vital for work on food security.
Machine learning is becoming a decisive approach in science because of its better predictive performance. However, unless representative ground data are used for training, machine learning will not deliver the intended result. My concern in this research is therefore the crop yield data used as ground reference. Some countries span different agro-climatic zones because of their topographic variation, and their crop yields vary accordingly from district to district. The statistical data used in this research, however, are FAO crop yield data, which give a single yield per crop and country (for example, country X has maize yield y in year Z). Because yields differ greatly between districts within a single country for various reasons, training the model on FAO (average) crop yield data may not deliver the intended prediction accuracy at 10 km resolution on the ground unless the result is validated against ground observations. It would be good to make this major issue clear in this excellent dataset initiative.
This research will give an indication of the total yield a country may have for specific crops. Efforts were made to validate the dataset using observations in India, the USA and China. It would be good to see the dataset also validated for food-insecure areas of the world (e.g. Africa) using ground observations instead of FAOSTAT, at least for major African crops such as maize and wheat; this would increase the reliability and accuracy of the dataset. In addition to the factors used in this research, agricultural inputs (agronomic and physical inputs) can also cause yield differences. It would be wise to clarify the effect of these inputs in your model, or to treat them as a gap for future improvement.
Citation: https://doi.org/10.5194/essd-2023-346-CC2
-
AC5: 'Reply on CC2', Xingli Qin, 19 Mar 2024
-
CC3: 'Comment on essd-2023-346', Qinghan Dong, 19 Feb 2024
Line 09: It would be good to pay more attention to English grammar, for example on Line 09.
Line 11: It would be good to use more precise remote sensing terminology, for example the term "discontinuous time series" instead of "discontinuous time spans".
Citation: https://doi.org/10.5194/essd-2023-346-CC3
-
AC3: 'Reply on CC3', Xingli Qin, 19 Mar 2024
Thank you very much for your valuable comments on the details of the article.
Regarding the grammatical issue in line 9, we will carefully review it again and correct any grammatical errors and inappropriate expressions to ensure that the language expression of the article conforms to English grammar norms.
Regarding the expression of "discontinuous time series" in line 11, we completely agree with your point of view. "Time series" is indeed a more accurate and professional term in the field of remote sensing. We will adopt your suggestion and change "discontinuous time spans" to "discontinuous time series" in the revised manuscript to improve the academic standardization of the article.
In addition to the two points you mentioned, we will also conduct a thorough check of the entire text and invite native English-speaking peers to review and polish the revised manuscript, striving to achieve accurate, concise, and standardized language expression. Thank you again for your careful reading and pertinent suggestions.
Citation: https://doi.org/10.5194/essd-2023-346-AC3
-
RC2: 'Comment on essd-2023-346', Anonymous Referee #2, 20 Feb 2024
Qin et al. used multiple data sources, including statistical data, gridded production data, agroclimatic indicator data, agronomic indicator data, global land surface satellite products and ground data, to develop a data-driven crop production spatial allocation model, and generated the global 10km resolution gridded production dataset of four major crops (maize, wheat, rice and soybean) from 2010 to 2020. Basically, this topic is necessary. However, this study has several serious issues.
First, the method for generating the production map is not robust. The authors used an existing production map as the reference and trained a machine learning model to allocate statistical production to grid cells. The method is not innovative, and it appears unreasonable given the importance analysis of the input features (see my comment below). In principle, there is no reason to believe that the machine learning method works here, because the planting area of a given pixel depends on farmers’ activities and cannot be predicted at all. Therefore, I do not believe this method can work for generating production maps globally or regionally.
Second, the harvested area map used by the authors is too simple to capture spatial and temporal variations. As we know, the most important feature for production is harvested area. However, the authors assumed a fixed ratio to extrapolate the harvested area from a given year to other years. I do not think this harvested area map can reproduce spatial and temporal changes; it remains static, as in the previous study. Anyone could easily generate a production map from this assumption without using harvested area as an input.
Third, the writing still needs a lot of work. I am often confused about what exactly the authors mean, and in many places the authors omit necessary details, which makes the manuscript hard to follow.
There are also many minor concerns, listed below.
Section 2.1.9: the authors should introduce the indicators first. I do not understand what cumulative potential biomass means, or whether it is a satellite-based observation or a predicted variable. The reference the authors provide is a website in Chinese, where readers cannot find an accurate definition. The same problem occurs in Section 2.1.10, for example with VCIx. There are many places with this kind of unclear writing, which makes the manuscript very hard to read.
Line 167: it is very difficult to understand what you mean here.
Lines 169-171: again, I am confused and do not understand the meaning. You may show the correlation between GAEZ and CropWatch.
Section 2.2.1: the method for estimating harvested area is not robust at all. The authors assumed that all crops change at the same rate as the total cultivated area and used FAO statistical area (national level) to estimate the harvested area of each grid cell. This is an obviously flawed method, which may introduce large biases between years. If the authors estimate harvested area in this way, why not simply estimate production directly? In addition, the writing of the method section needs improvement; it would be easier for readers if the principle were introduced first and the algorithm written out afterwards.
Line 214: a typical writing error: ‘First, the time series data are clipped by crop phenology to obtain the data corresponding to the crop growth period.’ How can the time series datasets be clipped by crop phenology? Are they separated into different seasons, or are the crop types separated according to their own phenology? Moreover, according to Table 1 these datasets also include location, terrain and soil. How are these features clipped, and why?
Table 1: is the table title just ‘input features’? What does ‘dimensions’ mean? What does the total dimension indicate?
Line 222: ‘these correlations are largely consistent within local regions’: what do you mean here? It is meaningless to correlate production and harvested area, since the correlation is obvious.
Line 279: you should not use the comparative degree in a sentence when you are not actually comparing two things. The same grammatical error appears in the sentences that immediately follow.
Line 284: ‘regions --- have ---’ is not a good expression. I strongly suggest that the authors rewrite the manuscript.
Lines 290 and 295: these paragraphs are very messy and redundant. In particular, in the results section the authors discuss many potential implications that belong in the discussion section; these implications are also already mentioned in the introduction and discussion sections.
Line 305: this paragraph introduces the method and should be moved to the method section. There is also no need to introduce common knowledge that readers already have.
Figures (fig. 2 and 3 at least) should show up after the main text that mentioned them.
Line 309: I cannot read these numbers in fig. 3.
Lines 350-360: please consider improving the writing here.
Section 3.3: this section is the best place to examine whether the method is reasonable. All four figures show that location plays the most important role in simulating production, which is unreasonable. Why did the authors select location as an input feature, and why is location important for simulating production? These results make me doubt the method. I really hope the authors will be careful here: a wrong method may produce validation results that are not too bad but still yield a totally wrong regional or global distribution. Besides, the satellite-based features should play an important role, but they do not in this study. Most of the explanations in this section do not make sense. For example, in the paragraph at line 380, most of the explanations would also apply to other crops.
Section 3.4, Comparing with Existing Datasets: I do not think this dataset is consistent with other existing datasets. For example, Fig. 9 shows large differences from SPAM2010 for all four crops. Systematic differences can also be seen in Fig. 10.
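As an illustration of the concern raised above about Section 2.2.1, the fixed-ratio assumption can be sketched in a few lines. This is not the authors' code: the function name `scale_harvested_area`, the 2x2 grid and all area values are hypothetical, and the sketch only assumes that every cell of a base-year harvested-area grid is multiplied by the ratio of national statistical areas.

```python
def scale_harvested_area(base_grid, national_area_base, national_area_target):
    """Scale each cell of a base-year grid by one national area ratio (hypothetical sketch)."""
    if national_area_base <= 0:
        raise ValueError("base-year national area must be positive")
    ratio = national_area_target / national_area_base
    return [[cell * ratio for cell in row] for row in base_grid]


# Hypothetical base-year harvested area (ha) for a country covered by four grid cells.
base_grid = [[100.0, 50.0], [0.0, 50.0]]
base_total = sum(sum(row) for row in base_grid)

# Hypothetical national statistics: 200 ha in the base year, 220 ha in the target year.
scaled = scale_harvested_area(base_grid, national_area_base=200.0, national_area_target=220.0)
scaled_total = sum(sum(row) for row in scaled)

# Share of the national area held by each cell, before and after scaling.
shares_base = [cell / base_total for row in base_grid for cell in row]
shares_scaled = [cell / scaled_total for row in scaled for cell in row]
```

Under this assumption the national total follows the statistics, but each cell keeps its base-year share and a cell with no harvested area stays empty, which is the static spatial pattern the referee objects to.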
Citation: https://doi.org/10.5194/essd-2023-346-RC2
-
AC2: 'Reply on RC2', Xingli Qin, 19 Mar 2024
We would like to thank the reviewer for the constructive suggestions, which help to significantly improve the research and the quality of this work. We have provided our detailed point-by-point responses to address all the issues raised. Please review the detailed responses in the supplement.
Data sets
GGCP10: A Global Gridded Crop Production Dataset at 10km Resolution from 2010 to 2020 Xingli Qin, Bingfang Wu, Hongwei Zeng, Miao Zhang, and Fuyou Tian https://doi.org/10.7910/DVN/G1HBNK
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
1,418 | 449 | 76 | 1,943 | 59 | 72 |