the Creative Commons Attribution 4.0 License.
A global reference data set for land cover mapping at 10 m resolution
Abstract. This paper presents a unique global reference data set for land cover mapping at a 10 m resolution, aligned with Sentinel-2 imagery for the year 2015. It contains more than 16.5 million data records at a 10 m resolution (or 165 K data records at 100 m) and information on 12 different land cover classes. The data set was collected by a group of experts through visual interpretation of very high resolution imagery (e.g., from Google Maps, Microsoft Bing, ESRI World), along with other sources of information provided in the Geo-Wiki platform (e.g., Normalized Difference Vegetation Index time series, Sentinel-2 image time series, geo-tagged photographs, and street view imagery). To ensure high quality and consistency among the experts that collected the data, regular coordination meetings took place, there were regular quality checks of expert submissions, and comparison with regional land cover maps was undertaken. This extensive reference land cover data set can be used in various applications, e.g., land cover analysis, including mapping and quality verification, ecosystems mapping and modelling, and biodiversity and cropland studies, among others. The data set is available for download at https://zenodo.org/records/14871660.
Status: open (until 01 Oct 2025)
CC1: 'Comment on essd-2025-468', Meine van Noordwijk, 25 Aug 2025
reply
It is refreshing to see that this paper refers to 'tree cover' as the directly observable land cover characteristic and avoids the term 'forest', which combines land cover and institutional ('non-agriculture') criteria in common definitions (FAO, EUDR). Part of the tree cover will be forest, some agroforestry, some monocultural stands of tree crops, and some urban trees. Criteria beyond remote sensing observations will be needed to make the further distinctions.
It would be good if the reasons for this use of terminology were mentioned explicitly. As it stands, row 4 in Table 1 is the only place the word 'forest' is used (an oversight in editing?). Yet the text suggests that papers such as Buchhorn et al. 2020a were used for comparison, where some of the classes are called 'forest'. So please add further policy relevance to the paper through such clarifications, as there is definitional confusion in major policies.
Citation: https://doi.org/10.5194/essd-2025-468-CC1
RC1: 'Comment on essd-2025-468', Anonymous Referee #1, 25 Aug 2025
reply
Summary
This paper introduces a global reference land cover dataset at 10 m resolution based on Sentinel-2 imagery from 2015, containing over 16.5 million data records across 12 land cover classes. The dataset was created through expert visual interpretation of high-resolution imagery (e.g., Google Maps, Bing, ESRI World) along with additional sources from the Geo-Wiki platform such as NDVI time series, Sentinel-2 time series, and geo-tagged photos. The dataset is publicly available via Zenodo and supports applications in land cover analysis, ecosystem modeling, biodiversity, and cropland studies.
Major comments
The paper is well written and concise, the methodology is rigorous and sound, the study will contribute to the land cover and land use change community. I see great potential for publication in ESSD. However, there are several shortcomings and clarifications that I strongly suggest the authors address prior to publication. For example, it’s unclear from the manuscript how misclassification was determined and how quality of reference data was assessed (see specific comments below).
Minor comments
- Table 1 – the term subpixel is not defined. Is a pixel 100 m and a subpixel 10 m? Please clearly define the term in the table. I see the definition in the text on line 73. Note that "subpixels" is spelled inconsistently throughout the text – sometimes as sub-pixels and other times as subpixel.
- Line 22 – can you elaborate on how this can be used for biodiversity? It’s not obvious to the reader.
- Line 82 – I think the authors mean Google Earth Pro and not Google Earth Engine as Streetview and historical imagery are available on Google Earth Pro.
- Section 2.3 – can you make it explicit that the visual interpretation was done for the year 2015? Could you also elaborate on how you used the land cover maps – I am assuming they were used as ancillary evidence and were not sufficient on their own for labelers to decide? Otherwise, the labels might be reproducing errors in existing land cover datasets. Out of curiosity, was each sample interpreted once?
- Line 93 – how many interpreters were trained? Was this done through a crowdsourcing campaign or were the labelers employees at IIASA/university etc?
- Line 95 – it’s unclear if the group of experts is separate from the interpreters trained. Is that a subset of everyone trained? Did experts serve a different function such as reviewing interpreted labels or they were interpreters themselves.
- Line 109 – how was it determined that they were misclassifications? Did a second interpreter check (agree/disagree)?
- Figure 2 – indicates wetlands as herbaceous while Table 1 defines wetlands as either herbaceous or woody. Can you clarify the discrepancy?
- Section 3.2 – could the data be used for a fractional cover classification? Maybe you could list that as a use case as well. Usage in bullet point #2 goes against the good practice for accuracy assessment/validation that has now been widely accepted by the remote sensing community. Maybe instead you could suggest the data be used for a statistical cross validation during the model refinement stages of analysis. https://www.sciencedirect.com/science/article/abs/pii/S0034425714000704
- Lines 146-151 – According to Table 1, "subpixels were classified as trees when trees fall in the center of a subpixel (10 m x 10 m)", while in this portion of the manuscript you say "In such cases, tree cover was not the dominant class within individual pixels, yet we still needed to label some of them as 'trees' to match the overall percentage." Three things: 1) those two statements are in contradiction; 2) it's unclear what "to match the overall percentage" means; 3) up to this point the impression was that the labeling was done at 10 m resolution, and now it appears it was done at 100 m and matched down to 10 m somehow. Can you please clarify? How was 65% cover estimated, if not by determining how many of the 100 10 m subpixels had a tree at their center? The statement at the end seems to contradict the definition in the table.
- Line 155 – still unclear to me how misclassification was determined. See comment above – was it misclassification relative to another interpreter or expert? This is important as you claim that this is a high-quality dataset but it’s not exactly clear what metrics were used to determine quality.
Zenodo comments
- The difference between validation_id and sampleid is unclear from the description. Seems like they are the same thing.
- Based on Table 1 I thought unique_id would be a value between 1 and 13, however, the values are totally different (e.g., 3027, 3024) and not described anywhere. I think it will be useful to keep these consistent to help users plug this dataset directly into their analyses (which usually require numerical values for classes).
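To illustrate the point: with the current codes, users have to build and maintain a remapping themselves before the dataset can feed a typical classification workflow. A minimal sketch of what that looks like follows; the column name `unique_id` and the codes 3024/3027 are taken from the points above, but the assignment of codes to sequential class numbers here is purely hypothetical and not documented anywhere on Zenodo.

```python
import pandas as pd

# Hypothetical remapping: which raw code corresponds to which of the
# 12 classes in Table 1 is NOT documented, so these pairs are placeholders.
code_to_class = {
    3024: 1,  # placeholder assignment
    3027: 2,  # placeholder assignment
}

def remap_classes(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Derive a sequential class_id column from the raw unique_id codes."""
    out = df.copy()
    out["class_id"] = out["unique_id"].map(mapping)
    if out["class_id"].isna().any():
        raise ValueError("Unmapped class codes present; extend the mapping.")
    return out

# Tiny frame standing in for records read from the Zenodo CSV
records = pd.DataFrame({"unique_id": [3024, 3027, 3024]})
remapped = remap_classes(records, code_to_class)
print(remapped["class_id"].tolist())  # [1, 2, 1]
```

Documenting the code-to-class correspondence (or shipping sequential codes directly) would make this step unnecessary.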
RC2: 'Comment on essd-2025-468', Anonymous Referee #2, 28 Aug 2025
reply
Comment on "A global reference data set for land cover mapping at 10 m resolution" by Lesiv et al., ESSD
General comments:
This is a short and interesting manuscript describing a reference dataset at 10 m resolution for land cover and land use mapping. The authors trained experts to judge the land cover type of 10 m sub-pixels from high-resolution images, geo-tagged photos, and other scientific datasets from multiple sources, and compiled the results into a product with global coverage. This product is an important and valuable training/validation data source for fine-resolution land cover or biodiversity mapping. However, before I can recommend the paper for publication in ESSD, I have the following concerns about how the authors present their work.
Specific comments:
- It is interesting that you provide burnt areas as another land use cover type, but I'm curious about how you can use this information for other studies. Can you provide some examples?
- I have a concern about a potential time-frame mismatch across the different products you used. When judging one location, it is possible that the multiple image or scientific data products were obtained in different years, and the land cover type at that exact location might have changed (e.g., due to urban expansion, deforestation, etc.). I assume this is one of the uncertainty sources that required experts to take particular care before making a decision. How did you handle this?
- I would strongly recommend that the authors provide a quantitative evaluation of the accuracy of the data product, as accuracy evaluation is one of the requirements for publication in ESSD.
Technical corrections:
Line 64: Where is the start location (lon - lat) of your global systematic sample?
Line 67: "in areas with low classification accuracy" – which reference data did you use to determine the classification accuracy? The assessment information from intermediate versions of the CGLS-LC100 land cover map? Please clarify.
Line 68: What is "the initial training data set"? The data produced in step (1)? This needs clarification.
Line 80: "(NDVI) time series derived from Google Earth Engine (GEE)". Which satellite products did you use to calculate NDVI?
Line 81: "a time series of Sentinel 2 images that can be retrieved from Sentinel hub" – the same question here: which S2 product did you use? Or did you create a true-colour composite?
Line 95: "18 land cover experts" might be good to acknowledge them if they agree, since they're authors of the dataset.
Line 109: "Thus, out of 100 interpretations that were checked, an interpreter could have made up to 5 to 10 misclassifications, which were mainly random mistakes." Did you strategically design the distribution of tasks so that, for example, 10% of the dataset was assigned to two or more interpreters at the same time for accuracy-assessment purposes?
Figure 2. I see that desert regions are marked as snow and ice. Is this a typo, or does it indicate "missing data" over these regions? If the latter, please use another colour to represent regions with "no data". Please clarify and revise the figure.
Line 140: "for ecosystem mapping and complex modelling of biodiversity". Can you provide more details about how to map biodiversity?
Citation: https://doi.org/10.5194/essd-2025-468-RC2
Data sets
Global land cover data set at 10m for 2015 (Geo-Wiki) Myroslava Lesiv https://doi.org/10.5281/zenodo.14871660
Viewed

| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 636 | 149 | 7 | 792 | 8 | 15 |