the Creative Commons Attribution 4.0 License.
EuroCrops v2.0: Multi-annual harmonized parcel level crop type data linked to European Union-wide survey, statistical and Earth Observation products
Abstract. As part of the Common Agricultural Policy (CAP) of the European Union (EU), farmers make annual declarations of the agricultural activities for which they receive subsidies. The declarations include the crops they grow at parcel level, referred to as Geo-Spatial Application (GSA) data. Paying Agencies (PA) of every EU Member State (MS) use specific crop classifications in their native language, and not all provide access to the GSA data. In the past, the EuroCrops initiative harmonized openly available GSA data for a single year (2021) using the Hierarchical Crop and Agriculture Taxonomy (HCAT), but multiple years are available depending on the country. Harmonizing a time series of farmers' crop declarations at parcel level would allow for comparative spatiotemporal analysis across the EU, the development of indicators that can be used for CAP and other policy monitoring purposes, and would provide data for training and validation of remotely sensed products. Here we have collected the GSA crop type declarations and parcel geometries that are publicly available from 18 PAs, the administrative bodies managing GSA data, for a minimum of three years. We have then harmonized the GSA data using HCAT v.4, a new version developed as part of this work. The data set includes nearly 47 million parcels covering 21 Mha. To facilitate integration and interoperability of the GSA data with other EU data sets containing spatial information on crops, we harmonized the crop classes used in the following data sets with HCAT v.4: 1) LUCAS, 2) the Integrated Farm Statistics/Farm Structure Survey, 3) the Farm Accountancy Data Network (FADN), and 4) the classification systems of the Copernicus High Resolution Layer on Crop Types. 
To demonstrate the potential of the multiannual, harmonized data set presented in this paper, the GSA data were aggregated to NUTS 2 regions and compared with Eurostat statistics on crop areas. The comparison shows good correspondence for many crops, highlights the crops and countries where agreement is weaker, and offers possible reasons for the discrepancies. The data can also be used for mapping crop rotations, and a map showing maize monoculture illustrates this application. Farmers' declarations will increasingly become available as MS are required to publish these under the High-Value Dataset regulation. The EuroCrops v2.0 data set is registered and publicly available under the DOI https://doi.org/10.2905/b9fb9e67-78a9-4327-9d59-39a928d812d3.
Status: open (extended)
- RC1: 'Comment on essd-2025-752', Anonymous Referee #1, 25 Feb 2026
- RC2: 'Comment on essd-2025-752', Anonymous Referee #2, 03 Mar 2026
Reviewer’s Comments
Earth System Science Data Manuscript essd-2025-752
“EuroCrops v2.0: Multi-annual harmonized parcel level crop type data linked to European Union-wide survey, statistical and Earth Observation products”
This manuscript presents EuroCrops v2.0, a multi-annual harmonized dataset of parcel-level crop declarations derived from publicly available Geo-Spatial Application (GSA) data across 18 Paying Agencies in the European Union. The dataset is harmonized using an updated Hierarchical Crop and Agriculture Taxonomy (HCAT4), which introduces additional dimensions for seasonality and usage, and is linked to multiple European statistical and Earth Observation classification systems (AGRIPROD, LUCAS, IFS/FSS, FADN, HRL Crop Types). The manuscript fits well within the scope of Earth System Science Data. It describes a large, policy-relevant, open-access geospatial dataset with clear potential for reuse in Earth observation, agricultural monitoring, sustainability assessment, and CAP-related evaluation. The extension from a single-year dataset (EuroCrops v1) to a multi-annual harmonized dataset represents a significant and valuable advance. The manuscript is generally well structured, and the authors demonstrate considerable effort in data cleaning, semantic harmonization, and validation. The comparison with Eurostat statistics and gridded IFS data provides useful validation evidence. The discussion appropriately acknowledges limitations.
However, several aspects require clarification or strengthening, particularly regarding reproducibility of the harmonization workflow, quantification of methodological impacts (e.g., stacking and rasterization), and documentation of uncertainty and metadata.
Major Comments
1. Handling of Multiple Crops per Parcel
The manuscript describes rules for handling multiple crops listed within a single parcel label (e.g., hierarchy-based grouping, majority crop assignment, first-mentioned crop). However:
- The proportion of parcels affected by multi-crop labels is not reported.
- The potential bias introduced by assigning the “first mentioned” crop is not assessed.
- Country-level differences in this issue are not quantified. Given that some countries (e.g., Portugal, Ireland) exhibit specific declaration complexities, this could introduce systematic distortions.
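The share of parcels affected could be reported with a few lines of code. The following is a minimal sketch, not the authors' implementation: the separator pattern, the function names, and the `(country_code, crop_label)` input layout are all assumptions made for illustration, and real declarations would need country-specific rules.

```python
import re
from collections import defaultdict

# Assumed separators that may signal several crops in one declared label.
# This pattern is hypothetical and would need tuning per country.
MULTI_CROP_PATTERN = re.compile(r"\s*(?:,|;|/|\+|&|\band\b)\s*", flags=re.IGNORECASE)


def split_label(label):
    """Split a (translated) crop label into its component crop names."""
    return [part for part in MULTI_CROP_PATTERN.split(label.strip()) if part]


def first_mentioned(label):
    """Return the first crop named in a label, or None for an empty label."""
    parts = split_label(label)
    return parts[0] if parts else None


def multi_crop_share(records):
    """Return, per country, the fraction of parcels whose label lists several crops.

    `records` is an iterable of (country_code, crop_label) pairs.
    """
    totals = defaultdict(int)
    multi = defaultdict(int)
    for country, label in records:
        totals[country] += 1
        if len(split_label(label)) > 1:
            multi[country] += 1
    return {country: multi[country] / totals[country] for country in totals}
```

Reporting this fraction per country and year, together with the area involved, would show where the "first mentioned" rule matters most.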
2. Reproducibility of the Harmonization Workflow
The harmonization process is conceptually well described, including translation, semantic verification, matching to HCAT4, and iterative refinement. However, several components lack sufficient detail for full reproducibility. The manuscript indicates that crop names were translated using Google Translate, DeepL, OPUS-MT, and, where necessary, ChatGPT. Additionally, a large language model (LLM) was used to detect semantic discrepancies in crop code definitions over time. For ESSD, automated or semi-automated semantic decisions that affect taxonomy assignment should be reproducible. The manuscript should clarify:
- Which LLM(s) and versions were used?
- Were deterministic settings applied?
- What prompts or criteria were used to detect semantic discrepancies?
- How were disagreements between translation tools resolved?
- What proportion of entries required manual intervention?
Without these details, it is difficult for future users to replicate or audit the harmonization procedure.
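One possible way to make these decisions auditable is to store a small provenance record alongside each harmonized entry. The sketch below is an illustration under stated assumptions: the class name, every field name, and the tool labels are hypothetical, and nothing here reflects how the authors actually logged their workflow.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class HarmonizationRecord:
    """Hypothetical provenance record for one source crop label."""
    source_label: str
    translations: dict          # tool name -> English translation
    hcat_code: str              # assigned taxonomy code (placeholder values here)
    llm_name: str = ""          # model used for semantic checks, if any
    llm_version: str = ""
    temperature: float = 0.0    # 0.0 as a stand-in for "deterministic settings"
    prompt: str = ""
    manually_corrected: bool = False

    def tools_disagree(self):
        """True when the translation tools returned different normalized strings."""
        normalized = {text.strip().lower() for text in self.translations.values()}
        return len(normalized) > 1

    def to_json(self):
        """Serialize the record, adding a prompt hash and the disagreement flag."""
        payload = asdict(self)
        payload["prompt_sha256"] = hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()
        payload["tools_disagree"] = self.tools_disagree()
        return json.dumps(payload, sort_keys=True, ensure_ascii=False)


def manual_share(records):
    """Fraction of records flagged as manually corrected."""
    records = list(records)
    if not records:
        return 0.0
    return sum(r.manually_corrected for r in records) / len(records)
```

Publishing such records, or an equivalent table, would answer the questions on model versions, settings, disagreements, and manual intervention in one place.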
3. Uncertainty and Quality Indicators
The manuscript presents validation against Eurostat and IFS data, including R² and NSE metrics. This is commendable and strengthens confidence in the dataset. However, the dataset itself does not appear to provide:
- Harmonization confidence indicators,
- Translation reliability scores,
- Flags for manually corrected entries,
- Country-level reliability summaries.
Given the complexity of semantic harmonization across languages and classification systems, some form of quality metadata would enhance transparency.
4. Metadata, FAIR Compliance, and Versioning
The dataset is openly available (with DOI), and code is provided via GitHub. This is highly positive. However, the manuscript would benefit from:
- A clear data dictionary describing all attributes in the geoparquet files,
- Explicit CRS documentation,
- Description of file naming conventions,
- A versioning and update policy (e.g., semantic versioning, expected update frequency),
- Clarification on long-term repository archiving (e.g., Zenodo snapshot of code).
Minor Comments
- Abstract: Consider explicitly stating the temporal coverage range (e.g., 2008–2023, where applicable) for clarity.
- Consistency of terminology: Ensure consistent referencing of HCAT versions (HCAT2, HCAT3, HCAT4).
- CRS specification: Explicitly define the coordinate reference systems used for storage and visualization.
- Validation summary: A concise summary table of validation statistics per crop class would enhance readability.
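A per-crop summary table of this kind could be generated as sketched below. The sketch assumes that R² is the squared Pearson correlation and that NSE is the standard Nash-Sutcliffe efficiency, with Eurostat areas as the reference; the manuscript's exact definitions may differ. The function names and the input layout are hypothetical.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def r_squared(observed, simulated):
    """Squared Pearson correlation between the two series."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov * cov / (var_obs * var_sim)


def summary_table(areas_by_crop):
    """Build rows of (crop, n_regions, R2, NSE), sorted by crop name.

    `areas_by_crop` maps a crop class to a list of
    (reference_area, gsa_area) pairs, one pair per NUTS 2 region.
    """
    rows = []
    for crop, pairs in sorted(areas_by_crop.items()):
        reference = [pair[0] for pair in pairs]
        gsa = [pair[1] for pair in pairs]
        rows.append((crop, len(pairs),
                     round(r_squared(reference, gsa), 3),
                     round(nse(reference, gsa), 3)))
    return rows
```

Reporting both metrics side by side is useful because a systematic bias leaves R² high while lowering NSE, so the table would separate crops that are well correlated but over- or under-declared from crops that genuinely agree.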
Citation: https://doi.org/10.5194/essd-2025-752-RC2
Data sets
EuroCropsV2 geodata Martin Claverie https://doi.org/10.2905/b9fb9e67-78a9-4327-9d59-39a928d812d3
Model code and software
EuroCropsV2 Scripts Martin Claverie and Momchil Yordanov https://github.com/Martincccc/EuroCropsV2
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 255 | 136 | 18 | 409 | 49 | 21 | 28 |
I recommend minor revision. The manuscript describes EuroCrops v2.0, a multi-annual harmonized parcel-level crop-type dataset built from publicly available EU Geo-Spatial Application (GSA) declarations (18 Paying Agencies; minimum three consecutive years including 2021), harmonized via the updated HCAT v4 taxonomy and linked to key EU statistical/survey and EO classification systems. The paper is timely and valuable for CAP-related analysis, crop-rotation studies, and as training/validation data for EO products; the processing chain and interoperability ambition are strong.
Handling missing crop codes: you gap-fill codes where possible and otherwise create new unique codes (starting at 10001) “in a random order.” Please clarify the randomization (seed; stability across releases) and provide a quick summary of how frequent this situation is by country/year (e.g., Estonia/Ireland name-only years).
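To illustrate what a stable assignment could look like, here is a minimal sketch. It is not the authors' procedure: the function name, the `seed` and `existing` parameters, and the idea of carrying forward a registry from the previous release are assumptions; only the starting value 10001 comes from the manuscript.

```python
import random

FIRST_NEW_CODE = 10001  # starting value quoted in the manuscript


def assign_codes(crop_names, seed=None, existing=None):
    """Assign unique integer codes to crop names that lack an official code.

    existing: optional dict (name -> code) from a previous release; these
              codes are kept unchanged so identifiers stay stable.
    seed:     if given, new names are shuffled with a seeded generator,
              which makes a "random order" reproducible; if None, new
              names are processed alphabetically.
    """
    codes = dict(existing or {})
    new_names = sorted(set(crop_names) - set(codes))
    if seed is not None:
        random.Random(seed).shuffle(new_names)
    # Continue after the highest code already in use, never below 10001.
    next_code = max(max(codes.values(), default=0), FIRST_NEW_CODE - 1) + 1
    for name in new_names:
        codes[name] = next_code
        next_code += 1
    return codes
```

With a fixed seed, two runs on the same names give identical codes, and passing the previous release as `existing` ensures that names added later never renumber earlier ones.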
Translation + matching quality: you translate via multiple MT APIs (Google/DeepL/OPUS-MT and sometimes ChatGPT), then manually check disagreements, and match via weighted Levenshtein plus rules; you note translation/matching challenges (e.g., ~58% for Finnish, ~51% matched for Brandenburg). Please add a small, systematic quality assessment so users understand typical error modes (spelling mistakes, colloquial names, mixtures).
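For readers unfamiliar with the matching step, a weighted Levenshtein distance can be sketched as follows. The cost parameters, the `max_distance` threshold, and the `best_match` helper are assumptions for illustration; the weights and rules actually used in the manuscript are not reproduced here.

```python
def weighted_levenshtein(a, b, insert_cost=1.0, delete_cost=1.0, substitute_cost=1.0):
    """Edit distance between strings a and b with per-operation costs."""
    # previous[j] holds the cost of turning the processed prefix of a into b[:j].
    previous = [j * insert_cost for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        current = [i * delete_cost]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + delete_cost,        # drop char_a
                current[j - 1] + insert_cost,     # add char_b
                previous[j - 1] + (0.0 if char_a == char_b else substitute_cost),
            ))
        previous = current
    return previous[-1]


def best_match(name, candidates, max_distance=2.0, **costs):
    """Return (candidate, distance) for the closest taxonomy name.

    The candidate is None when there are no candidates or when the closest
    one is farther away than max_distance (a hypothetical acceptance threshold).
    """
    if not candidates:
        return None, float("inf")
    scored = [(weighted_levenshtein(name.lower(), c.lower(), **costs), c)
              for c in candidates]
    distance, candidate = min(scored)
    if distance > max_distance:
        return None, distance
    return candidate, distance
```

Tabulating the distances of accepted and rejected matches per country would give the systematic assessment requested above, since spelling mistakes tend to yield small distances while colloquial names and mixtures yield large ones.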
Optional, to extend the validation side: you may consider briefly positioning EuroCrops alongside global gridded crop-area products such as CROPGRIDS (173 crops circa 2020) as complementary yet solid benchmarks at coarser resolution (useful for sanity-checking aggregated totals, not as parcel-level substitutes):
https://www.nature.com/articles/s41597-024-03247-7
https://openknowledge.fao.org/items/ebc60e53-2b29-4173-b1fa-2320669b1312
Overall, I see these as documentation and reproducibility refinements rather than methodological blockers.