CROPGRIDS: A global geo-referenced dataset of 173 crops circa 2020

Tang, Fiona H. M.; Nguyen, Thu Ha; Conchedda, Giulia; Casse, Leon; Tubiello, Francesco N.; Maggi, Federico

doi:https://doi.org/10.5194/essd-2023-130

20 Apr 2023

Status: this preprint has been withdrawn by the authors.

Fiona H. M. Tang, Thu Ha Nguyen, Giulia Conchedda, Leon Casse, Francesco N. Tubiello, and Federico Maggi

Abstract. Despite recent advancements in cloud processing and modelling and the increasing availability of high spectral- and temporal-resolution satellite imagery, mapping the spatial distribution of crop types remains a challenging task. Here, we present CROPGRIDS – a comprehensive global, geo-referenced dataset providing information on areas for 173 crops circa the year 2020, at a resolution of 0.05° (~5.55 km at the equator). It represents a major update of the Monfreda et al. (2008) dataset, the most widely used geospatial dataset previously available, covering 175 crops with reference year 2000 at 10 km spatial resolution. CROPGRIDS updates Monfreda et al. (2008) through the careful evaluation of 26 published gridded datasets covering more recent crop information at regional, national, and global levels, largely over the period 2015–2020. The new product successfully updates the area extent for 80 of the 175 crops originally covered, representing an update to 1.2 billion hectares of crop area (i.e., 81 % of the total cropland included in CROPGRIDS). CROPGRIDS carries forward the crop type maps originally in Monfreda et al. (2008) for 93 crops as more recent information for these crops is not available. We compared CROPGRIDS harvested area of individual crops against independent national and subnational data from 36 National Statistical Offices (NSOs), national-level crop area data for more than 180 countries and territories from FAOSTAT, as well as geospatially, against a newly available high-resolution (30 m) cropland agreement map (Tubiello et al., 2023). Results indicated robustness against the available independent information, with CROPGRIDS world total harvested and crop areas around 1.5 billion hectares. To the best of our knowledge, CROPGRIDS represents the most comprehensive update of previous work on the subject area, offering a new benchmark of global gridded harvested and crop area data for the year circa 2020.
The CROPGRIDS dataset can be downloaded at https://doi.org/10.6084/m9.figshare.22491997 (Tang et al., 2023).

Received: 03 Apr 2023 – Discussion started: 20 Apr 2023

Competing interests: At least one of the (co-)authors is a member of the editorial board of Earth System Science Data. The peer review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Interactive discussion

Status: closed

RC1:
'Comment on essd-2023-130', Anonymous Referee #1, 16 May 2023

See the attached file

Citation: https://doi.org/10.5194/essd-2023-130-RC1
- AC1:
  'Reply on RC1', Fiona Tang, 30 Jun 2023
  We thank the Reviewer for his/her positive comments. Please see below our point-by-point responses with Reviewer's comments in normal font and our responses in bold.
  My first concern is that the authors mixed all the crop type maps from different years between 2000 and 2020. The underlying assumption is that crop distribution does not change or change little over this long period and so you could use the crop in a certain location/pixel in previous year to represent the same crop in the latest year circa 2020. This is in fact not true. While cropland may not change from year to year, crop types vary from year to year due to market conditions, climate forecasting (e.g. farmers decide what to plant in their fields by looking into the expected prices and climate forecasting), and even crop rotations or fallow. 20 years is a long time. This is exactly why crop type mapping is much harder than cropland extent mapping.
  
  We agree with the Reviewer that mapping crop types remains challenging. While we sought to provide the best possible update to Monfreda et al. (2008) from data available since 2000, we also explicitly accounted for the year of release of those data, giving preference to the most recent available years through selection via the endogenous data quality provided in Eq. (1). This procedure penalized datasets older than 2015, making them less likely to be selected and hence integrated into CROPGRIDS. In addition, 24 out of the 27 selected datasets referred to the period between 2015 and 2020.
  This method resulted in more than 62% of the cropland area included in CROPGRIDS having a reference year between 2015 and 2020, therefore reducing the issue highlighted in the review where possible. Results were also validated against FAOSTAT time series to gain insight into changes that occurred between 2000 and 2020 (see Section 3.1 in the submitted manuscript).
  Finally, we measured the uncertainty associated with our selection procedure (see Section 2.3 in the submitted manuscript) and the selection outcome was not substantially affected by the use of different weights, meaning that CROPGRIDS represents the best possible outcome of combinations of available datasets from recent years.
  We nonetheless thank the Reviewer for pointing out this limitation, which is now explicitly acknowledged in the revised manuscript in lines 319–322.
  Related to the above mixture is that they mix and match the crops together. Admittedly, the authors try to harmonize “crop names in the input datasets, including performing aggregations where needed, to correspond to the crop names in MRF, thus ensuring internal consistency and alignment with FAO crop classifications (Supplementary Table 1)” (page 5 Line 97-100). In fact, these different crop maps are created for different purposes at different times, and their definition of crops, particularly aggregated ones such as beans, pulse, or even millet, may vary from one dataset to another. When the authors treat them as the same, they are mixing apples with oranges!
  
  We agree with the Reviewer on the difficulty of matching different crop maps even when they refer to the same species. The same issue arises at a higher level, for instance when consolidating "cropland" from different land cover datasets. There is nonetheless a significant need to consolidate the existing information, as the alternative would be no information at all. This is shown by the fact that the MRF map circa 2000 was the only basic product that existed prior to our work, which is unacceptable.
  At the same time, the methods we developed aim at minimizing the definitional bias inherent in the nature of this work. We have explicitly engaged with FAO experts, who contribute as co-authors, to minimize possible mismatching errors. We used the MRF and FAO crop classifications, including the Indicative Crop Classification (ICC) of the World Programme for the Census of Agriculture, as the foundation for tagging crops, matching crop names or, when this was not possible, species groupings, ensuring consistency across different datasets. As an example, "coffee" in both MRF and FAO encompasses various types of coffee plants within this category. Consequently, when presented with individual maps of different coffee plants, such as arabica and robusta coffee from datasets like SPAM, we combined the two maps and represented them as "coffee."
  It's important to note that we did not aggregate crops from different years, ensuring that each dataset remained intact within its respective time frame.
  We respectfully disagree with the Reviewer that this aggregation equates to mixing "apples and oranges." On the contrary, the harmonization of definitions underlying our methods and the engagement of FAO statistical experts enabled meaningful comparisons across diverse datasets.
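  The coffee example above can be illustrated with a minimal sketch. It assumes each input crop layer is a NumPy array on a common grid; the name map, layer names, and the function `aggregate_layers` are hypothetical and are not taken from the CROPGRIDS code or from Supplementary Table 1.

```python
import numpy as np

# Hypothetical mapping from source-dataset crop names to a target crop name.
NAME_MAP = {
    "arabica coffee": "coffee",
    "robusta coffee": "coffee",
    "maize": "maize",
}

def aggregate_layers(layers, name_map):
    """Sum all source layers that map to the same target crop name."""
    out = {}
    for source_name, grid in layers.items():
        target = name_map[source_name]
        # Adding to 0.0 creates a new array, so input layers are not modified.
        out[target] = out.get(target, 0.0) + grid
    return out

# Toy areas (ha) on a 2 x 2 grid, all from the same source dataset and year.
layers = {
    "arabica coffee": np.array([[1.0, 0.0], [2.0, 0.0]]),
    "robusta coffee": np.array([[0.5, 3.0], [0.0, 0.0]]),
    "maize": np.array([[4.0, 4.0], [4.0, 4.0]]),
}
harmonized = aggregate_layers(layers, NAME_MAP)
```

  In this sketch, aggregation happens only within a single source dataset, which mirrors the statement that crops from different years were not combined.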
  In crop type mapping, farming system is critical. While CROPGRIDS does address multiple cropping for certain crops, it neglects mixed or sequential (e.g. winter wheat followed by summer rice, or mixed beans and maize) cropping systems. Furthermore, irrigation is another complexity. The absence of information on these systems can lead to inaccuracies in harvested and crop area estimates, potentially resulting in underestimations or overestimations. Recognizing this limitation is crucial, but it also emphasizes the need for further research and data collection to address these gaps in knowledge.
  
  CROPGRIDS incorporates sequential cropping when there is sufficient data or information available. For instance, the CHNMZWHRI dataset explicitly accounts for sequential cropping. While we acknowledge that mapping specific cropping systems would be valuable, it falls outside the scope of this manuscript.
  Regarding irrigation, some of the input datasets do utilize irrigation as prior information to enhance crop mapping accuracy, such as MRF, SPAM2010, and GAEZ+2015. However, we concur with the Reviewer that absence of accurate irrigation data poses limitations contributing to uncertainties.
  In our revised manuscript, we addressed uncertainties from lack of data regarding cropping practices and irrigation in lines 313–315. We appreciate the Reviewer's suggestion to emphasize the need for additional data collection, and this aspect was newly highlighted in the revised manuscript in lines 316–318.
  For crop type mapping, the most critical input is the ground-truthing data for satellite-based mapping, sampling points for survey methods, and sub-national crop statistics data for modelling-based methods. CROPGRIDS doesn’t add or collect any new input data (the authors collected independent data sourced from National Statistical Offices (NSOs) but only for validation). It assembles and integrates the existing gridded datasets. This makes it all the more important, even necessary, for this paper to show new innovations to justify publication in ESSD, a data journal.
  
  We appreciate and share the Reviewer’s overall point about the importance of ground truth as a source of information for validation. Nonetheless, we underline that the purpose of this work was to re-analyse the existing information to create a novel, comprehensive dataset on crop type distribution, given the current lack of data. To this end, the validation against individual national statistical offices provides a solid foundation for the value of this work even in the absence of ground-truth validation, while usefully pointing to the areas and crops for which validation datasets are needed to move the science forward.
  We have inserted some of the above points in the conclusions of the revised manuscript in lines 354–356.
  Perhaps a minor issue: circa 2020 is misleading. Normally circa 2020 implies a reference year around 2020, with similar years of data before and after 2020. In fact, the majority of the global datasets in this paper are before 2018 and only a few country maps are for Years 2019 and 2020.
  
  Our view, having discussed this very issue during manuscript development, is that any year within the range provided by the datasets selected to build CROPGRIDS would be arbitrary, so that the use of "circa 2020" best describes the product, pointing after all to the year of the most recent dataset, which is 2020.
  
  Citation: https://doi.org/10.5194/essd-2023-130-AC1
RC2:
'Comment on essd-2023-130', Anonymous Referee #2, 29 May 2023

Spatially explicit crop distribution maps are important for biogeochemical cycle modelling. This study produced a dataset with 173 crop maps using a data harmonization approach. I found the data useful, but the details of the approaches used were not clear. For the top-down method, a commonly seen challenge is to appropriately define the allocation priority. This often happens when allocating a few types of crops to a region while lacking sub-national distribution information. The authors have described the details of the rules built in using the data (i.e. the ranking of the data). However, it is unclear what rules were applied in allocating each of the crop areas to maps. In other words, which crop type was given the priority to be allocated first? The sequence matters because when a specific crop is allocated in a grid, other crop types have to be allocated to the remaining available area in the grid. When the grid is fully occupied, other candidate types need to be allocated to other grids. Moreover, there is also the case when a grid can potentially be allocated to many types of crops, but the total area of these crops exceeds the available cropland area in the grid. Then, the allocation sequence also matters. Therefore, the sequence of the crop to be allocated greatly determines product reliability. I think this is the most challenging work in data harmonization and these mechanisms need to be clarified.
Other suggestions:
1. The author claimed that the data was updated from Monfreda et al. (2008) dataset. Does it mean that the cropland area was the same as the dataset? How about the cropland expansion/abandonment from 2000 to 2020?
2. I checked the data and found that "no data" and "no crop" are both 0. Better to use a fixed value to flag the no data area.

Citation: https://doi.org/10.5194/essd-2023-130-RC2
- AC2:
  'Reply on RC2', Fiona Tang, 30 Jun 2023
  We thank the Reviewer for his/her positive comments. Please see below our point-by-point responses with Reviewer's comments in normal font and our responses in bold.
  Spatially explicit crop distribution maps are important for biogeochemical cycle modelling. This study produced a dataset with 173 crop maps using a data harmonization approach. I found the data useful, but the details of the approaches used were not clear. For the top-down method, a commonly seen challenge is to appropriately define the allocation priority. This often happens when allocating a few types of crops to a region while lacking sub-national distribution information. The authors have described the details of the rules built in using the data (i.e. the ranking of the data). However, it is unclear what rules were applied in allocating each of the crop areas to maps. In other words, which crop type was given the priority to be allocated first? The sequence matters because when a specific crop is allocated in a grid, other crop types have to be allocated to the remaining available area in the grid. When the grid is fully occupied, other candidate types need to be allocated to other grids. Moreover, there is also the case when a grid can potentially be allocated to many types of crops, but the total area of these crops exceeds the available cropland area in the grid. Then, the allocation sequence also matters. Therefore, the sequence of the crop to be allocated greatly determines product reliability. I think this is the most challenging work in data harmonization and these mechanisms need to be clarified.
  We appreciate the positive comment from the Reviewer. We would like to clarify that neither sequential allocation of crops to grid cells nor prioritization of specific crops is employed in the construction of CROPGRIDS. Rather, during the harmonization step (see Section 2.2.1 in the manuscript), we first checked that the harvested area (HA) of each crop provided in each dataset does not exceed three times the corresponding crop area (CA) and that CA does not exceed the grid cell area (GA). If CA > GA then CA is scaled down to match the grid cell area (CA = GA). Likewise, if HA > 3*CA, then we scaled HA down to HA = 3*CA. After this harmonization process, we next chose the georeferenced dataset that best represents the harvested and crop areas of each crop type in parallel.
  After this step, we double checked that the conditions CA ≤ GA and that HA ≤ 3*CA used before were always satisfied. Additionally, we also checked that the sum of CA across all crops is smaller than or equal to GA. We did not find instances in which this last condition was not satisfied. We have clarified the dataset selection and this verification step in the revised manuscript in line 166 – 167 and line 170 – 173.
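  The capping rules and the verification step described above can be sketched as follows. This is only an illustration under stated assumptions: areas are NumPy arrays in the same units, HA is capped using the already capped CA, and the function names `harmonize` and `check_cell_totals` are hypothetical rather than taken from the CROPGRIDS code.

```python
import numpy as np

def harmonize(ca, ha, ga, max_ratio=3.0):
    """Cap crop area (CA) at grid-cell area (GA), then cap harvested area (HA).

    Assumption: the HA cap uses the capped CA, i.e. HA <= max_ratio * min(CA, GA).
    """
    ca_capped = np.minimum(ca, ga)
    ha_capped = np.minimum(ha, max_ratio * ca_capped)
    return ca_capped, ha_capped

def check_cell_totals(ca_by_crop, ga):
    """Return a boolean array that is True where the sum of CA over crops <= GA.

    ca_by_crop has shape (n_crops, n_cells); ga has shape (n_cells,).
    """
    return ca_by_crop.sum(axis=0) <= ga

# Toy example with three grid cells of 100 ha each.
ga = np.array([100.0, 100.0, 100.0])
ca = np.array([50.0, 120.0, 80.0])    # second cell violates CA <= GA
ha = np.array([200.0, 150.0, 100.0])  # first cell violates HA <= 3 * CA
ca_h, ha_h = harmonize(ca, ha, ga)

# Verification across two crops after harmonization.
other_crop_ca = np.array([30.0, 0.0, 10.0])
ok = check_cell_totals(np.stack([ca_h, other_crop_ca]), ga)
```

  Because each crop is capped independently, no allocation order between crops is involved; the sum check across crops is a separate verification rather than a correction.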
  Other suggestions:
  The author claimed that the data was updated from Monfreda et al. (2008) dataset. Does it mean that the cropland area was the same as the dataset? How about the cropland expansion/abandonment from 2000 to 2020?
  
  CROPGRIDS is an update of Monfreda et al. (2008), but it does not preserve the same cropland area as in Monfreda et al. (2008). Rather, it takes into account the changes of cropland over the past two decades. In fact, CROPGRIDS recorded a total harvested area of 1.54 billion ha globally, while Monfreda et al. (2008) estimated around 1.26 billion ha. We also compared CROPGRIDS against the most recent cropland map (Tubiello et al., 2023), and both datasets were in good agreement (see Section 3.3 in the manuscript).
  References:
  Monfreda, C., Ramankutty, N., and Foley, J. A.: Farming the planet: 2. Geographic distribution of crop areas, yields, physiological types, and net primary production in the year 2000, Glob. Biogeochem. Cycles, 22, GB1022, https://doi.org/10.1029/2007GB002947, 2008.
  Tubiello, F. N., Conchedda, G., Casse, L., Hao, P., De Santis, G., and Chen, Z.: A new cropland area database by country circa 2020, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2023-211, in review, 2023.
  I checked the data and found that "no data" and "no crop" are both 0. Better to use a fixed value to flag the no data area.
  
  We thank the Reviewer for pointing this out. We have now marked the “not assessed” as “-1” and “ocean/water” as “-2”. We have updated the .nc files and metadata files distributed in the figshare link. Note that the .nc files also include the legend of the values.
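  For users of the updated files, a minimal sketch of how the two flag values might be handled before computing totals is shown below. It assumes the grid has already been read into a NumPy array; the variable names and the function `mask_flags` are hypothetical, and only the flag values -1 and -2 come from the reply above.

```python
import numpy as np

NOT_ASSESSED = -1.0  # "not assessed" flag, as stated in the reply
OCEAN_WATER = -2.0   # "ocean/water" flag, as stated in the reply

def mask_flags(area):
    """Return a float copy of the grid with flagged cells set to NaN."""
    out = np.asarray(area, dtype=float).copy()
    out[(out == NOT_ASSESSED) | (out == OCEAN_WATER)] = np.nan
    return out

# Toy grid: true zeros ("no crop") stay 0, flagged cells become NaN.
grid = np.array([[0.0, 12.5, -1.0],
                 [-2.0, 3.0, 0.0]])
masked = mask_flags(grid)
total_area = np.nansum(masked)  # sums only assessed land cells
```

  Summing the raw grid without masking would subtract the flag values from the total, so masking first is the safer order of operations.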
  
  Citation: https://doi.org/10.5194/essd-2023-130-AC2
RC3:
'Comment on essd-2023-130', Anonymous Referee #3, 18 Jun 2023
Review for ESSD-2023-130
Title: CROPGRIDS: A global geo-referenced dataset of 173 crops circa 2020

This study updated 80 crops out of the 173 crops in the Monfreda et al. (2008) crop dataset to circa 2020 by collecting and using the 26 recently published crop datasets at multiple scales. This developed CROPGRIDS dataset was carefully evaluated and compared with FAOSTAT and other independent national and subnational statistical datasets. The methods of data selection and development were clearly illustrated, and the major biases and uncertainty of this data were also pointed out. This crop dataset can be very valuable for communities with its updated information on global gridded harvested and crop area. I think the manuscript can eventually be accepted, but I still have some questions and suggestions.
The endogenous and exogenous data quality indicators were used to choose the best-fit input datasets for each crop and country. Instead of choosing this best-fit dataset, have you ever considered combining the advantages of different types of datasets? The survey data has the advantage of high accuracy in total amount, while remote sensing data may have the advantage of reflecting spatial variability of cropland distribution. Using the amount of survey data and spatial patterns of remote sensing data may be better than using either of the single dataset.

The comparison between CROPGRIDS and FAOSTAT (Fig 2) showed that the CROPGRIDS performed much better in major crops in large countries (crops with higher harvested area), but not well in crops with lower harvested area. The R² used here is mostly determined by the high values. Can you provide another indicator to represent the overall differences between two datasets, e.g. NSE?

When selecting input datasets, geospatial coverage for at least one country is one of the criteria. There are huge differences in cropland areas of different countries. For some large countries, e.g. China, and India, the sub-national data can also be very critical. Using sub-national data may help reduce the biases and uncertainty within these countries.

Have you compared the harvest area of cropland with the HYDE dataset?
Citation: https://doi.org/10.5194/essd-2023-130-RC3
- AC3:
  'Reply on RC3', Fiona Tang, 30 Jun 2023
  We thank the Reviewer for his/her positive comments. Please see below our point-by-point responses with Reviewer's comments in normal font and our responses in bold.
  The endogenous and exogenous data quality indicators were used to choose the best-fit input datasets for each crop and country. Instead of choosing this best-fit dataset, have you ever considered combining the advantages of different types of datasets? The survey data has the advantage of high accuracy in total amount, while remote sensing data may have the advantage of reflecting spatial variability of cropland distribution. Using the amount of survey data and spatial patterns of remote sensing data may be better than using either of the single dataset.
  
  We fully understand the recommendation by the Reviewer, and we have in fact tested different ways to combine the information. Eventually, we clearly found that combining data such as averaging layers from multiple sources introduced substantial biases, mostly because of the spatial nature of the datasets. For example, you may have one dataset providing a spatial distribution and a second dataset providing a different spatial distribution with no overlapping. In this case, averaging would produce a much more extended geospatial covering, but with half the cultivated area. This would produce grid cells with area towards the lower bound, with this being a strong bias. We tested other averaging, such as weighted averages by dataset quality, but this also resulted in strong biases. We therefore concluded that the best option was to retain one dataset only when multiple were available. Thanks for reasoning around this.
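  The averaging bias described above can be reproduced with a toy case. The numbers are invented for illustration only: two hypothetical datasets place the same total crop area in non-overlapping cells.

```python
import numpy as np

# Two hypothetical datasets: same total area (20 ha), no spatial overlap.
dataset_a = np.array([10.0, 10.0, 0.0, 0.0])
dataset_b = np.array([0.0, 0.0, 10.0, 10.0])

averaged = (dataset_a + dataset_b) / 2.0

extent_a = np.count_nonzero(dataset_a)       # cells with crop in dataset A
extent_avg = np.count_nonzero(averaged)      # cells with crop after averaging
max_cell_a = dataset_a.max()                 # largest per-cell area in A
max_cell_avg = averaged.max()                # largest per-cell area after averaging
```

  In this toy case the averaged layer covers twice as many cells as either input, while every cropped cell holds half the area, which is the spreading effect the reply refers to.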
  The comparison between CROPGRIDS and FAOSTAT (Fig 2) showed that the CROPGRIDS performed much better in major crops in large countries (crops with higher harvested area), but not well in crops with lower harvested area. The R² used here is mostly determined by the high values. Can you provide another indicator to represent the overall differences between two datasets, e.g. NSE?
  
  We thank the Reviewer for pointing this out. We have now included in Fig 2 the normalized root mean squared error (NRMSE) to quantify the overall differences between the two datasets.
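  To make the metrics in this exchange concrete, the sketch below computes R², NRMSE, and the Nash–Sutcliffe efficiency (NSE) on invented numbers. The reply does not state how the RMSE is normalized; dividing by the mean of the reference values is an assumption made here, and the function names are hypothetical.

```python
import numpy as np

def r_squared(sim, obs):
    """Squared Pearson correlation between simulated and reference values."""
    return np.corrcoef(sim, obs)[0, 1] ** 2

def nrmse(sim, obs):
    """RMSE divided by the mean of the reference values (assumed normalization)."""
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    return rmse / np.mean(obs)

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of obs about its mean."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

# Invented harvested areas: three small crops with large relative errors
# and one very large crop that matches exactly.
obs = np.array([1.0, 2.0, 3.0, 1000.0])
sim = np.array([3.0, 1.0, 6.0, 1000.0])

r2 = r_squared(sim, obs)
err = nrmse(sim, obs)
eff = nse(sim, obs)
```

  With these numbers R² stays very close to 1 even though the small crops are off by a factor of two or more, which illustrates the Reviewer's point that R² is dominated by the high values.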
  When selecting input datasets, geospatial coverage for at least one country is one of the criteria. There are huge differences in cropland areas of different countries. For some large countries, e.g. China, and India, the sub-national data can also be very critical. Using sub-national data may help reduce the biases and uncertainty within these countries.
  
  We selected only datasets that provide georeferenced information covering at least one country with at least a resolution of 0.083 degree (about 10 km at the equator), and hence, all input datasets include sub-national level data.
  Have you compared the harvest area of cropland with the HYDE dataset?
  
  We have not compared the harvested area with the HYDE dataset as HYDE provides land use data until 2015, whereas CROPGRIDS has a reference year of 2020. Instead, we compared against the new cropland agreement map (CAM) that is also circa 2020.
  
  Citation: https://doi.org/10.5194/essd-2023-130-AC3

Interactive discussion

Status: closed

RC1:
'Comment on essd-2023-130', Anonymous Referee #1, 16 May 2023

See the attached file

Citation: https://doi.org/10.5194/essd-2023-130-RC1
- AC1:
  'Reply on RC1', Fiona Tang, 30 Jun 2023
  We thank the Reviewer for his/her positive comments. Please see below our point-by-point responses with Reviewer's comments in normal font and our responses in bold.
  My first concern is that the authors mixed all the crop type maps from different years between 2000 and 2020. The underlying assumption is that crop distribution does not change or change little over this long period and so you could use the crop in a certain location/pixel in previous year to represent the same crop in the latest year circa 2020. This is in fact not true. While cropland may not change from year to year, crop types vary from year to year due to market conditions, climate forecasting (e.g. farmers decide what to plant in their fields by looking into the expected prices and climate forecasting), and even crop rotations or fallow. 20 years is a long time. This is exactly why crop type mapping is much harder than cropland extent mapping.
  
  We agree with the Reviewer about the fact that mapping crop type remains challenging. While we sought to provide the best possible update to Monfreda et al. (2008) from available data since 2000, at the same time we explicitly accounted for the year of release of those data—giving preference to the most recent available years, through selection via the endogenous data quality provided in Eq. (1). This procedure penalized datasets older than 2015, making it less likely to be selected and hence integrated into CROPGRIDS. In addition, 24 out of the 27 selected datasets referred to the period between 2015 and 2020.
  This method resulted in more than 62% of cropland area included in CROPGRIDS having a reference year between 2015 and 2020, therefore reducing the very issue highlighted in the review where possible. Results were also validated against FAOSTAT time series to gain insight into changes occurred between 2000 and 2020 (see Section 3.1 in submitted manuscript).
  Finally, we measured the uncertainty associated with our selection procedure (see Section 2.3 in submitted manuscript) and the selection outcome was not substantially affected by the use of different weights, meaning that CROPGRIDS represents the best possible outcome of combinations of available dataset from recent years.
  We nonetheless appreciate the Reviewer for pointing out this limitation, which is now explicitly acknowledged in the revised manuscript in line 319 – 322.
  Related to the above mixture is that they mix and match the crops together. Admittedly, the authors try to harmonize “crop names in the input datasets, including performing aggregations where needed, to correspond to the crop names in MRF, thus ensuring internal consistency and alignment with FAO crop classifications (Supplementary Table 1)” (page 5 Line 97-100). In fact, these different crop maps are created for different purposes at different times, and their definition of crops, particularly aggregated ones such as beans, pulse, or even millet, may vary from one dataset to another. When the authors treat them as the same, they are mixing apples with oranges!
  
  We agree with the reviewer on the difficulty to map different crop maps even when referring to the same species. The same issue is relevant even at higher level, for instance when consolidating ‘’cropland’’ from different land cover datasets. There is nonetheless a significant need to consolidate the existing information, as the alternative would be no information at all—as testified by the fact that the MRF map circa 2000 is still the only basic product that existed prior to our work, and this is unacceptable.
  At the same time the methods we developed aim at minimizing the definitional bias inherent in the nature of this work. We have explicitly engaged with FAO experts who contribute as co-authors to minimize possible mismatching errors. We used the MRF and FAO crop classifications, including the Indicative Crop Classification (ICC) of the World Programme for the Census of Agriculture, as the foundation for tagging crops, matching crop names or when this was not possible, species groupings, ensuring consistency across different datasets. As an example, "coffee" in both MRF and FAO encompasses various types of coffee plants within this category. Consequently, when presented with individual maps of different coffee plants, such as arabica and robusta coffee from datasets like SPAM, we combined the two maps and represent them as "coffee."
  It's important to note that we did not aggregate crops from different years, ensuring that each dataset remained intact within its respective time frame.
  We respectfully disagree with the Reviewer that this aggregation equates to mixing "apples and oranges." On the contrary, the harmonization of definitions underlying our methods and the engagement of FAO statistical experts enabled for meaningful comparisons across diverse datasets.
  In crop type mapping, farming system is critical. While CROPGRIDS does address multiple cropping for certain crops, it neglects mixed or sequential (e.g. winter wheat followed by summer rice, or mixed beans and maize) cropping systems. Furthermore irrigation is another complexity. The absence of information on these systems can lead to inaccuracies in harvested and crop area estimates, potentially resulting in underestimations or overestimations. Recognizing this limitation is crucial, but it also emphasizes the need for further research and data collection to address these gaps in knowledge
  
  CROPGRIDS incorporates sequential cropping when there is sufficient data or information available. For instance, the CHNMZWHRI dataset explicitly accounts for sequential cropping. While we acknowledge that mapping specific cropping systems would be valuable, it falls outside the scope of this manuscript.
  Regarding irrigation, some of the input datasets do utilize irrigation as prior information to enhance crop mapping accuracy, such as MRF, SPAM2010, and GAEZ+2015. However, we concur with the Reviewer that absence of accurate irrigation data poses limitations contributing to uncertainties.
  In our revised manuscript, we addressed uncertainties from lack of data regarding cropping practices and irrigation in line 313 – 315. We appreciate the Reviewer's suggestion to emphasize the need for additional data collection, and this aspect was newly highlighted in the revised manuscript in line 316 - 318.
  For crop type mapping, the most critical input is the groundtruthing data for satellite-based mapping, sampling points for survey method, and sub-national crop statistics data for modellingbased method. CROPGIRDS doesn’t add or collect any new input data (they collected independent data sourced from National Statistical Offices (NSOs) but only for validation). It assembles and integrates the existing gridded datasets. This puts extra importance or even necessity for this paper to show new innovations to be justified to be published on ESSD – a data journal.
  
  We appreciate and share the Reviewer’s overall point about the importance of ground truth as a source of information for validation. Nonetheless, we underline that the purpose of this work was to re-analyse existing information to create a novel, comprehensive dataset on the distribution of crop types, given the current lack of data. To this end, the validation against data from individual national statistical offices provides a solid foundation for the value of this work even in the absence of ground-truth validation, while usefully pointing to the areas and crops for which validation datasets are needed to move the science forward.
  We have inserted some of the above points in the conclusions of the revised manuscript in lines 354–356.
  Perhaps a minor issue: circa 2020 is misleading. Normally, circa 2020 implies a reference year around 2020, with similar numbers of data years before and after 2020. In fact, the majority of the global datasets in this paper predate 2018, and only a few country maps are for 2019 and 2020.
  
  Our view, having discussed this very issue during manuscript development, is that any single year chosen from within the range covered by the datasets used to build CROPGRIDS would be arbitrary. The label “circa 2020” therefore best describes the product, pointing to the year of the most recent dataset, which is 2020.
  
  Citation: https://doi.org/10.5194/essd-2023-130-AC1
RC2:
'Comment on essd-2023-130', Anonymous Referee #2, 29 May 2023

Spatial-explicit crop distribution maps are important for biogeochemical cycle modelings. This study produced a dataset with 173 crop maps using a data harmonization approach. I found the data useful, but the details of the approaches used were not clear. For the top-down method, a commonly seen challenge is to appropriately define the allocation priority. This often happens when allocating a few types of crops to a region while lacking sub-national distribution information. The authors have described the details of the rules built in using the data (i.e. the ranking of the data). However, it is unclear what rules were applied in allocating each of the crop areas to maps. In other words, which crop type was given the priority to be allocated first? The sequence matters because when a specific crop was allocated in a grid, other crop types will have to be allocated to the rest available area in the grid. When the grid is fully occupied, other candidate types need to be allocated to other grids. Moreover, there is also the case when a grid can potentially be allocated to many types of crops, but the total area of these crops exceeds the available cropland area in the grid. Then, the allocation sequence also matters. Therefore, the sequence of the crop to be allocated greatly determines product reliability. I think this is the most challenging work in data harmonization and these mechanisms need to be clarified.
Other suggestions:
1. The author claimed that the data was updated from Monfreda et al. (2008) dataset. Does it mean that the cropland area was the same as the dataset? How about the cropland expansion/abandonment from 2000 to 2020?
2. I checked the data and found that "no data" and "no crop" are both 0. Better to use a fixed value to flag the no data area.

Citation: https://doi.org/10.5194/essd-2023-130-RC2
- AC2:
  'Reply on RC2', Fiona Tang, 30 Jun 2023
  We thank the Reviewer for his/her positive comments. Please see below our point-by-point responses with Reviewer's comments in normal font and our responses in bold.
  %%%%
  Spatial-explicit crop distribution maps are important for biogeochemical cycle modelings. This study produced a dataset with 173 crop maps using a data harmonization approach. I found the data useful, but the details of the approaches used were not clear. For the top-down method, a commonly seen challenge is to appropriately define the allocation priority. This often happens when allocating a few types of crops to a region while lacking sub-national distribution information. The authors have described the details of the rules built in using the data (i.e. the ranking of the data). However, it is unclear what rules were applied in allocating each of the crop areas to maps. In other words, which crop type was given the priority to be allocated first? The sequence matters because when a specific crop was allocated in a grid, other crop types will have to be allocated to the rest available area in the grid. When the grid is fully occupied, other candidate types need to be allocated to other grids. Moreover, there is also the case when a grid can potentially be allocated to many types of crops, but the total area of these crops exceeds the available cropland area in the grid. Then, the allocation sequence also matters. Therefore, the sequence of the crop to be allocated greatly determines product reliability. I think this is the most challenging work in data harmonization and these mechanisms need to be clarified.
  We appreciate the positive comment from the Reviewer. We would like to clarify that neither sequential allocation of crops to grid cells nor prioritization of specific crops is employed in the construction of CROPGRIDS. Rather, during the harmonization step (see Section 2.2.1 in the manuscript), we first checked that the harvested area (HA) of each crop provided in each dataset does not exceed three times the corresponding crop area (CA) and that CA does not exceed the grid cell area (GA). If CA > GA then CA is scaled down to match the grid cell area (CA = GA). Likewise, if HA > 3*CA, then we scaled HA down to HA = 3*CA. After this harmonization process, we next chose the georeferenced dataset that best represents the harvested and crop areas of each crop type in parallel.
  After this step, we double-checked that the conditions CA ≤ GA and HA ≤ 3*CA used before were always satisfied. Additionally, we checked that the sum of CA across all crops is smaller than or equal to GA. We did not find any instances in which this last condition was not satisfied. We have clarified the dataset selection and this verification step in the revised manuscript in lines 166–167 and 170–173.
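  The two caps and the verification check described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the per-cell scalar interface, and the toy numbers are assumptions, and we assume the HA cap is applied to the already-capped CA so that HA ≤ 3*CA holds afterwards. Only the rules CA ≤ GA, HA ≤ 3*CA, and sum of CA ≤ GA come from the reply.

```python
def harmonize_cell(ha, ca, ga):
    """Apply the two caps to one crop in one grid cell (areas in hectares)."""
    ca = min(ca, ga)      # if CA > GA, set CA = GA
    ha = min(ha, 3 * ca)  # if HA > 3*CA, set HA = 3*CA (uses the capped CA)
    return ha, ca


def cell_total_ok(ca_by_crop, ga):
    """Verification only: summed crop area must fit inside the grid cell."""
    return sum(ca_by_crop) <= ga


# Toy 100 ha grid cell with two hypothetical crops, each given as (HA, CA).
ga = 100.0
raw = {"crop_a": (150.0, 40.0), "crop_b": (70.0, 20.0)}
harmonized = {name: harmonize_cell(ha, ca, ga) for name, (ha, ca) in raw.items()}

# The sum check is a test on the result, not a rescaling step.
fits = cell_total_ok([ca for _, ca in harmonized.values()], ga)
```

  Note that the sum check does not modify any values; in line with the reply, it only reports whether the selected layers are jointly consistent with the grid cell area.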
  Other suggestions:
  The author claimed that the data was updated from Monfreda et al. (2008) dataset. Does it mean that the cropland area was the same as the dataset? How about the cropland expansion/abandonment from 2000 to 2020?
  
  CROPGRIDS is an update of Monfreda et al. (2008), but it does not preserve the same cropland area as in Monfreda et al. (2008). Rather, it takes into account the changes of cropland over the past two decades. In fact, CROPGRIDS recorded a total harvested area of 1.54 billion ha globally, while Monfreda et al. (2008) estimated around 1.26 billion ha. We also compared CROPGRIDS against the most recent cropland map (Tubiello et al., 2023), and both datasets were in good agreement (see Section 3.3 in the manuscript).
  References:
  Monfreda, C., Ramankutty, N., and Foley, J. A.: Farming the planet: 2. Geographic distribution of crop areas, yields, physiological types, and net primary production in the year 2000, Glob. Biogeochem. Cycles, 22, GB1022, https://doi.org/10.1029/2007GB002947, 2008.
  Tubiello, F. N., Conchedda, G., Casse, L., Hao, P., De Santis, G., and Chen, Z.: A new cropland area database by country circa 2020, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2023-211, in review, 2023.
  I checked the data and found that "no data" and "no crop" are both 0. Better to use a fixed value to flag the no data area.
  
  We thank the Reviewer for pointing this out. We have now marked “not assessed” cells as “-1” and “ocean/water” cells as “-2”. We have updated the .nc files and metadata files distributed via the figshare link. Note that the .nc files also include a legend of these values.
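  For users of the files, the practical consequence is that the flag values must be excluded before areas are summed. The sketch below illustrates this on a plain list of cell values; the constant names, the helper `valid_areas`, and the toy values are assumptions, and the actual variable names and legend should be taken from the .nc files themselves.

```python
NOT_ASSESSED = -1  # flag for "not assessed", as stated in the reply
OCEAN_WATER = -2   # flag for "ocean/water", as stated in the reply


def valid_areas(values):
    """Yield real area values (including true zeros), skipping flagged cells."""
    for v in values:
        if v not in (NOT_ASSESSED, OCEAN_WATER):
            yield v


# Toy row of cells: two true zeros ("no crop"), two flagged cells, two crop areas.
cells = [0.0, 12.5, -1, -2, 3.0, 0.0]

naive_total = sum(cells)                # wrong: the flags are added as areas
total_area = sum(valid_areas(cells))    # flags excluded
n_no_crop = sum(1 for v in valid_areas(cells) if v == 0)
```

  Keeping zeros while dropping the flags preserves the distinction the Reviewer asked for between "no crop" and "no data".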
  
  Citation: https://doi.org/10.5194/essd-2023-130-AC2
RC3:
'Comment on essd-2023-130', Anonymous Referee #3, 18 Jun 2023
Review for ESSD-2023-130
Title: CROPGRIDS: A global geo-referenced dataset of 173 crops circa 2020

This study updated 80 crops out of the 173 crops in the Monfreda et al. (2008) crop dataset to circa 2020 by collecting and using the 26 recently published crop datasets at multiple scales. This developed CROPGRIDS dataset was carefully evaluated and compared with FAOSTAT and other independent national and subnational statistical datasets. The methods of data selection and development were clearly illustrated, and the major biases and uncertainty of this data were also pointed out. This crop dataset can be very valuable for communities with its updated information on global gridded harvested and crop area. I think the manuscript can eventually be accepted, but I still have some questions and suggestions.
The endogenous and exogenous data quality indicators were used to choose the best-fit input datasets for each crop and country. Instead of choosing this best-fit dataset, have you ever considered combining the advantages of different types of datasets? The survey data has the advantage of high accuracy in total amount, while remote sensing data may have the advantage of reflecting spatial variability of cropland distribution. Using the amount of survey data and spatial patterns of remote sensing data may be better than using either of the single dataset.

The comparison between CROPGRIDS and FAOSTAT (Fig 2) showed that the CROPGRIDS performed much better in major crops in large countries (crops with higher harvested area), but not well in crops with lower harvested area. The R² used here is mostly determined by the high values. Can you provide another indicator to represent the overall differences between two datasets, e.g. NSE?

When selecting input datasets, geospatial coverage for at least one country is one of the criteria. There are huge differences in cropland areas of different countries. For some large countries, e.g. China, and India, the sub-national data can also be very critical. Using sub-national data may help reduce the biases and uncertainty within these countries.

Have you compared the harvest area of cropland with the HYDE dataset?
Citation: https://doi.org/10.5194/essd-2023-130-RC3
- AC3:
  'Reply on RC3', Fiona Tang, 30 Jun 2023
  We thank the Reviewer for his/her positive comments. Please see below our point-by-point responses with Reviewer's comments in normal font and our responses in bold.
  %%%%
  The endogenous and exogenous data quality indicators were used to choose the best-fit input datasets for each crop and country. Instead of choosing this best-fit dataset, have you ever considered combining the advantages of different types of datasets? The survey data has the advantage of high accuracy in total amount, while remote sensing data may have the advantage of reflecting spatial variability of cropland distribution. Using the amount of survey data and spatial patterns of remote sensing data may be better than using either of the single dataset.
  
  We fully understand the Reviewer's recommendation, and we did in fact test different ways of combining the information. We found that combining data, for example by averaging layers from multiple sources, introduced substantial biases, mostly because of the spatial nature of the datasets. For example, one dataset may provide a spatial distribution while a second provides a different spatial distribution with no overlap. In this case, averaging would produce a much larger geospatial extent, but with half the cultivated area. The resulting grid cells would have areas pushed towards the lower bound, which is a strong bias. We tested other schemes, such as averages weighted by dataset quality, but these also resulted in strong biases. We therefore concluded that the best option was to retain only one dataset when multiple were available. We thank the Reviewer for raising this point.
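  The non-overlap case can be illustrated with a toy example. On our reading of the reply, averaging two layers that place the same crop in different cells doubles the mapped extent while halving the area in every occupied cell. The layer values and helper names below are hypothetical and are not taken from any CROPGRIDS input.

```python
def average_layers(a, b):
    """Cell-by-cell mean of two gridded area layers of equal length."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def extent(layer):
    """Number of cells with any crop area."""
    return sum(1 for v in layer if v > 0)


# Two hypothetical datasets mapping the same crop to disjoint cells (ha per cell).
ds_a = [80.0, 80.0, 0.0, 0.0]
ds_b = [0.0, 0.0, 80.0, 80.0]

avg = average_layers(ds_a, ds_b)

extent_before = extent(ds_a)   # cells occupied in a single dataset
extent_after = extent(avg)     # cells occupied after averaging
peak_before = max(ds_a)        # per-cell area in a single dataset
peak_after = max(avg)          # per-cell area after averaging
```

  In this toy case the averaged layer occupies four cells instead of two, each with 40 ha instead of 80 ha, which is the kind of dilution towards low per-cell areas that the reply describes.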
  The comparison between CROPGRIDS and FAOSTAT (Fig 2) showed that the CROPGRIDS performed much better in major crops in large countries (crops with higher harvested area), but not well in crops with lower harvested area. The R² used here is mostly determined by the high values. Can you provide another indicator to represent the overall differences between two datasets, e.g. NSE?
  
  We thank the Reviewer for pointing this out. We have now included in Fig 2 the normalized root mean squared error (NRMSE) to quantify the overall differences between the two datasets.
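  For reference, the two indicators mentioned in this exchange can be computed as in the sketch below. The reply does not state how the NRMSE in Fig 2 is normalized, so dividing the RMSE by the mean of the reference values is an assumption here; NSE (Nash-Sutcliffe efficiency) is the indicator the Reviewer suggested and is shown only for comparison. The numbers are toy values, not CROPGRIDS or FAOSTAT data.

```python
import math


def nrmse(est, ref):
    """RMSE divided by the mean of the reference values (assumed normalization)."""
    mse = sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)
    return math.sqrt(mse) / (sum(ref) / len(ref))


def nse(est, ref):
    """Nash-Sutcliffe efficiency: 1 minus error variance over reference variance."""
    ref_mean = sum(ref) / len(ref)
    num = sum((e - r) ** 2 for e, r in zip(est, ref))
    den = sum((r - ref_mean) ** 2 for r in ref)
    return 1 - num / den


# Toy harvested areas (arbitrary units): reference statistics vs. gridded estimate.
ref = [1.0, 2.0, 3.0, 4.0]
est = [1.0, 2.0, 3.0, 6.0]

print(f"NRMSE = {nrmse(est, ref):.2f}, NSE = {nse(est, ref):.2f}")
```

  Both indicators are built from squared errors, so large values still carry more weight than small ones; they differ from R² mainly in penalizing systematic bias rather than only scatter around a fitted line.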
  When selecting input datasets, geospatial coverage for at least one country is one of the criteria. There are huge differences in cropland areas of different countries. For some large countries, e.g. China, and India, the sub-national data can also be very critical. Using sub-national data may help reduce the biases and uncertainty within these countries.
  
  We selected only datasets that provide georeferenced information covering at least one country at a resolution of 0.083° (about 10 km at the equator) or finer; hence, all input datasets include sub-national data.
  Have you compared the harvest area of cropland with the HYDE dataset?
  
  We have not compared the harvested area with the HYDE dataset because HYDE provides land use data only until 2015, whereas CROPGRIDS has a reference year of 2020. Instead, we compared against the new cropland agreement map (CAM), which is also circa 2020.
  
  Citation: https://doi.org/10.5194/essd-2023-130-AC3

Supplement

https://doi.org/10.5194/essd-2023-130-supplement

Data sets

CROPGRIDS Fiona H. M. Tang, Thu Ha Nguyen, Giulia Conchedda, Leon Casse, Francesco N. Tubiello, and Federico Maggi https://doi.org/10.6084/m9.figshare.22491997

Viewed

Total article views: 4,447 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
3,122	1,226	99	4,447	345	86	114

Views and downloads (calculated since 20 Apr 2023)

Month	HTML	PDF	XML	Total
Apr 2023	280	96	8	384
May 2023	252	48	5	305
Jun 2023	212	69	5	286
Jul 2023	191	42	7	240
Aug 2023	116	46	3	165
Sep 2023	132	64	7	203
Oct 2023	103	76	2	181
Nov 2023	46	34	1	81
Dec 2023	65	39	4	108
Jan 2024	98	57	1	156
Feb 2024	94	42	2	138
Mar 2024	110	64	6	180
Apr 2024	103	44	8	155
May 2024	81	44	6	131
Jun 2024	70	21	5	96
Jul 2024	51	27	6	84
Aug 2024	38	16	4	58
Sep 2024	56	24	1	81
Oct 2024	41	22	1	64
Nov 2024	37	33	0	70
Dec 2024	41	20	0	61
Jan 2025	59	22	2	83
Feb 2025	50	32	0	82
Mar 2025	63	35	1	99
Apr 2025	90	17	3	110
May 2025	70	33	1	104
Jun 2025	63	23	0	86
Jul 2025	62	22	2	86
Aug 2025	106	40	1	147
Sep 2025	305	27	7	339
Oct 2025	37	47	0	84

Viewed (geographical distribution)

Total article views: 4,358 (including HTML, PDF, and XML) Thereof 4,358 with geography defined and 0 with unknown origin.

Cited

Latest update: 18 Oct 2025

Short summary

CROPGRIDS is a comprehensive global, geo-referenced dataset that provides information on harvested and crop areas of 173 crops circa the year 2020. This new product provides more recent crop type information for 80 crops, covering about 1.2 billion hectares of crop area globally. CROPGRIDS will facilitate global-scale assessments in various disciplines, including agriculture and resource management, food systems, environmental impact and sustainability analyses, and agroeconomics.


Cited

3 citations as recorded by crossref.