the Creative Commons Attribution 4.0 License.
Advancements in LUCAS Copernicus 2022: Enhancing Earth Observation with Comprehensive In-Situ Data on EU Land Cover and Use
Abstract. The Land Use/Cover Area frame Survey (LUCAS) of the European Union (EU) presents a rich resource for detailed understanding of land cover and use, making it invaluable for Earth Observation (EO) applications. This manuscript discusses the recent advancements and improvements in the LUCAS Copernicus module, particularly the data collection process of 2022, its protocol simplifications, and geometry definitions compared to the 2018 survey and data. With approximately 150,000 polygons collected in 2022, an increase from 60,000 in 2018, the LUCAS Copernicus 2022 data provides a unique and comprehensive in-situ dataset for EO applications. The protocol simplification also facilitates a faster and more efficient data collection process. In 2022, there are 137,966 polygons generated, out of the original 149,408 LUCAS Copernicus points, which means 92.3 % of the points were actually surveyed. The data holds 82 land cover classes for the Copernicus module LUCAS level 3 legend (88 classes). For land use the data holds 40 classes, along with 18 classes of land use types. The dataset is available here for download (PID: http://data.europa.eu/89h/e3fe3cd0-44db-470e-8769-172a8b9e8874). The paper further elaborates on the implications of these enhancements and the need for continuous harmonisation to ensure semantic consistency and temporal usability of data across different periods. Moreover, it calls for additional studies exploring the potential of the collected data, especially in the context of remote sensing and computer vision. The manuscript ends with a discussion on future data usage and dissemination strategies.
Status: open (extended)
RC1: 'Comment on essd-2023-494', Kristof van Tricht, 01 Mar 2024
Manuscript summary
This paper presents the 2022 version of the LUCAS Copernicus dataset. It explains the differences with the 2018 version, most notably the significant increase in the number of surveyed points and the shapes of the polygons, which are now significantly larger. Some basic statistics on the new dataset are also provided. The main dataset is provided as a GPKG file. The authors conclude that further harmonization is needed in order to guarantee the semantic consistency of the coding and legend, as well as the temporal inter-usability of both the 2018 and 2022 data.
Review summary
I should start by mentioning that the value of the LUCAS Copernicus dataset(s) cannot be overstated. There's a tremendous amount of work and dedication that goes into the whole workflow of visiting points, interpreting land cover, making the necessary observations and finally processing everything into a consistent polygon-based dataset. The result is one of the most influential in situ datasets that can be used by the Earth Observation community and an example to other countries/continents. The highly anticipated 2022 dataset will be of significant value to the community, and the exciting increase in both the number of observations and the size of the polygons will be received with acclaim.
Next to the unquestionable value of the dataset, the paper itself is generally well written. However, here and there it lacks some detail that I found essential to fully understand the nature of the dataset, required to make the bridge to EO applications. Therefore, I think some minor revisions are required to add some more detail, after which I would be happy to recommend this paper and dataset for publication in ESSD. My comments and questions for clarification can be found below.
Comments
L8-9: confusing sentence where first 82 land cover classes are mentioned, and then 88 classes. Probably different things but it could use some rephrasing for clarity.
L49: is this homogeneity interpreted also on the ortho-photo when deciding on the landcover class?
L62: could the authors explain the rationale behind the polygons? What is the aim of providing homogenous polygons on top of the (pure) observation of the point itself if e.g. only the respective Sentinel 10m or 20m pixel is confirmed to be homogenous? L99 might provide a clue but it would be good to explain this rationale.
L65: this is a bit confusing as the minimum area to execute Copernicus module is reported to be 25m² while the MMU is about 79m². Where does this difference come from?
L69-77: this section is not entirely clear to me. “The position they have reached” can deviate from the theoretical LUCAS point. Why and by how much? What does “cannot reach” mean? Why is a “linear feature narrower than 3m” the exception when no Copernicus-relevant information can be recorded? How many are the “a few meters” that a surveyor can move?
L98-99: What is the aim of the quasi-circular polygon shape? Downstream applications will likely have to process the polygons further in order to be able to have e.g. pure Sentinel pixels and not a mix at the borders of the polygon where another LC could start.
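The kind of further processing meant here can be illustrated with a minimal sketch. It assumes, purely for illustration, that a polygon is approximated by a perfect circle and that the pixel grid is aligned to multiples of the pixel size; the function names and the 10 m default are hypothetical choices, not part of the dataset or the manuscript. Because a circle is convex, a square pixel lies entirely inside it exactly when all four corners do.

```python
import math


def pixel_fully_inside_circle(min_x, min_y, pixel_size, cx, cy, radius):
    """Return True if all four corners of a square pixel lie within the circle."""
    corners = [
        (min_x, min_y),
        (min_x + pixel_size, min_y),
        (min_x, min_y + pixel_size),
        (min_x + pixel_size, min_y + pixel_size),
    ]
    return all(math.hypot(x - cx, y - cy) <= radius for x, y in corners)


def pure_pixels(cx, cy, radius, pixel_size=10.0):
    """List lower-left corners of grid pixels lying entirely inside the circle.

    Assumption: the grid origin is (0, 0) and pixels are axis-aligned squares.
    """
    first_col = math.floor((cx - radius) / pixel_size)
    last_col = math.ceil((cx + radius) / pixel_size)
    first_row = math.floor((cy - radius) / pixel_size)
    last_row = math.ceil((cy + radius) / pixel_size)
    kept = []
    for col in range(first_col, last_col):
        for row in range(first_row, last_row):
            x, y = col * pixel_size, row * pixel_size
            if pixel_fully_inside_circle(x, y, pixel_size, cx, cy, radius):
                kept.append((x, y))
    return kept


# Toy example: a 15 m radius circle centred on a grid node, and a 5 m one.
large = pure_pixels(0.0, 0.0, 15.0)
small = pure_pixels(0.0, 0.0, 5.0)
```

In this toy setting the smaller circle contains no complete 10 m pixel at all, which is the situation the comment is concerned about at polygon borders.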
Sect. 5.3: where does this preliminary assessment come from? Can the authors elaborate a bit more?
L117-118: does this mean future versions of the dataset are possible? How will versioning of the dataset in that case be treated?
L118: Could the authors elaborate a bit more on the (planned) compatibility between 2018 and 2022 survey? e.g. what are at the moment legend inconsistencies between 2018 and 2022 and how are users advised to cope with such difference?
Sect. 7: Are other dissemination methods considered in the future as well? Such as upload to Zenodo with DOI, upload to Google Earth Engine for fast uptake by the community, ...
L130-131: link only works when copy-pasting manually; hyperlink behind the text does not.
Figures
Figure 1: things such as the legend are too small in the current version and therefore not readable.
Figure 2: in (a), what is the size of the resulting polygon? Does this match Sentinel-1 or Sentinel-2 data? In (b), the polygon seems to contain a mix of grass, bare and built-up. Aren’t the polygons supposed to contain one homogeneous land cover/use type?
Figure 3: The table shown in the figure should be explained in the caption.
Figure 5: Why is the minimum value 0, while L65 states the Copernicus module is not executed for areas smaller than 25m²? The text on top of this figure should also be revisited. E.g. the far right part is truncated. Is such a large precision for these numbers required? Some rounding would probably increase readability.
Figure A1: caption should contain a bit more information to be able to interpret what exactly is shown
DATA
I checked the GPKG file and have a question after a quick look: some polygons contain hardly any data (not even landcover), e.g. point_id 38543138 has almost all “null” attributes while clearly being located in arable land. Why is that? In fact 2686 polygons have “null” in their “survey_lc1” attribute. Is this to be expected?
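A check of this kind can be sketched with Python's standard library alone, since a GeoPackage file is an SQLite database. The example below runs against a small in-memory toy table rather than the real file: the table name `toy_layer` and its rows are hypothetical, and only the column names `point_id` and `survey_lc1` are taken from the comment above. The actual layer name in the GPKG file is not given here and would have to be looked up.

```python
import sqlite3


def count_null(conn, table, column):
    """Count rows where the given column is NULL.

    Identifiers cannot be bound as SQL parameters, so they are quoted
    explicitly; only pass trusted table and column names.
    """
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    return conn.execute(query).fetchone()[0]


# Toy stand-in for the attribute table of the layer (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_layer (point_id INTEGER, survey_lc1 TEXT)")
conn.executemany(
    "INSERT INTO toy_layer VALUES (?, ?)",
    [(1, "B11"), (2, None), (3, "E20"), (4, None)],
)

null_count = count_null(conn, "toy_layer", "survey_lc1")
total_count = conn.execute("SELECT COUNT(*) FROM toy_layer").fetchone()[0]
```

For the real dataset, one would instead pass the path of the downloaded GPKG file to `sqlite3.connect` and substitute the layer's table name.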
Citation: https://doi.org/10.5194/essd-2023-494-RC1
CC1: 'Comment on essd-2023-494', Babak Ghassemi, 17 Jun 2024
This paper explores the Land Use/Cover Area frame Survey (LUCAS) of the European Union, focusing on the 2022 LUCAS Copernicus module. The number of polygons increased from 60,000 in 2018 to approximately 150,000 in 2022 due to streamlined data collection protocols and refined geometry definitions. The dataset contains 88 land cover (LC) and 40 land use classes. The paper discusses the benefits of Earth Observation applications, the importance of semantic consistency, and future studies in remote sensing and computer vision, concluding with strategies for data usage and dissemination.
Here are some comments based on the existing pre-print paper as well as available data on:
https://data.jrc.ec.europa.eu/dataset/e3fe3cd0-44db-470e-8769-172a8b9e8874
Figure 3. Evaluating the provided dataset, there are 2,686 samples with null land cover attributes. Considering this issue, I would suggest either modifying the LC class count in the figure or updating the null values in the dataset.
Line 79 and Figure 4. There are 98 unique values in the lc1 attribute of the dataset, of which 88 are mentioned here. The following classes are missing: A3, C1, D1, D2, E1, E2, E3, F1, F2, and F3. Would it be possible to explain this difference?
Additionally, if this figure represents the unique labels of Level-3 LC classes, why does it contain values whose labels do not have three characters (such as A, A1, B, …) and which are therefore possibly not related to this category? When classifying at Level-3, can these classes be used as unique classes?
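The two checks described in this comment can be sketched as follows. This is a minimal illustration on toy code lists, not on the actual attribute values; it assumes, as the comment implies, that the legend level of a code can be read from its length (one character for level 1, two for level 2, three for level 3). The function names and the example codes are hypothetical.

```python
from collections import defaultdict


def group_by_level(codes):
    """Group land cover codes by assumed legend level (the code length)."""
    levels = defaultdict(set)
    for code in codes:
        levels[len(code)].add(code)
    return dict(levels)


def missing_from(reported, observed):
    """Return codes present in the data but absent from the reported list."""
    return sorted(set(observed) - set(reported))


# Toy lists standing in for the unique lc1 values and the figure's classes.
observed_codes = ["A", "A1", "A11", "B", "B11", "E1"]
reported_codes = ["A", "A1", "A11", "B", "B11"]

levels = group_by_level(observed_codes)
missing = missing_from(reported_codes, observed_codes)
not_level_3 = sorted(c for c in observed_codes if len(c) != 3)
```

Applied to the real `lc1` values, `missing` would list the classes absent from the figure and `not_level_3` the codes that are coarser than Level-3.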
Line 80. As shown in Figure 5, the mean area value is 0.35475. Therefore, this value should be rounded to 0.35 ha instead of 0.34 ha.
Section 5.2. Some 21,207 of the 137,966 available polygons have an area below 100 m2, which means they cover less than one Sentinel-2 (or Sentinel-1) pixel. For instance, pointid = 49264422, supposedly an apple tree, has an area of around 17.38 m2, and inspecting the approximate location in Google Maps, it looks more like grass or arable land. Therefore, guidance on how to deal with this issue, as well as on the reliability of these polygons, would be helpful.
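One simple way users might handle this, pending guidance from the authors, is to flag polygons smaller than a single 10 m pixel before using them as reference data. The sketch below is only an illustration under stated assumptions: the record layout and the `area_m2` field name are hypothetical, the second toy record is invented, and the 100 m2 threshold corresponds to one 10 m by 10 m pixel. The counts used for the share are the ones quoted in the comment above.

```python
PIXEL_AREA_M2 = 10 * 10  # area of one 10 m x 10 m pixel


def split_by_area(polygons, min_area_m2=PIXEL_AREA_M2):
    """Split records into (large_enough, too_small) by polygon area."""
    large_enough = [p for p in polygons if p["area_m2"] >= min_area_m2]
    too_small = [p for p in polygons if p["area_m2"] < min_area_m2]
    return large_enough, too_small


# Toy records; the first uses the point and area quoted in the comment,
# the second is a made-up example of a larger polygon.
polygons = [
    {"point_id": 49264422, "area_m2": 17.38},
    {"point_id": 1, "area_m2": 3500.0},
]
kept, flagged = split_by_area(polygons)

# Share of sub-pixel polygons, from the counts reported in the comment.
share_percent = 21207 / 137966 * 100
```

Whether flagged polygons should be dropped, or kept only for applications at finer resolution, is a choice left to the user in this sketch.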
Line 129. The dataset only contains 113 attribute columns instead of the 117 mentioned here.
Citation: https://doi.org/10.5194/essd-2023-494-CC1
Data sets
LUCAS Copernicus 2022 European Commission, Joint Research Centre (JRC) http://data.europa.eu/89h/e3fe3cd0-44db-470e-8769-172a8b9e8874
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
552 | 102 | 30 | 684 | 29 | 26 |