Validation samples for the Land Cover Map of Europe 2017

Jenerowicz-Sanikowska, Małgorzata; Krätzschmar, Elke; Schauer, Peter; Gromny, Ewa; Malinowski, Radek; Krupiński, Michał; Lewiński, Stanisław; Rybicki, Marcin; Wojtkowski, Cezary

doi:10.5194/essd-2025-811

Preprints

https://doi.org/10.5194/essd-2025-811

07 Apr 2026

Status: this preprint is currently under review for the journal ESSD.

Abstract. Accuracy assessment is an integral part of the production of land cover/land use maps. The process requires a good-quality validation dataset for the qualitative and quantitative evaluation of the generated products. This paper describes the development of the validation dataset that was used for the accuracy assessment of the Land Cover Map of Europe 2017 in the context of the Sentinel-2 Global Land Cover project. Sample selection was based on a two-step stratified random sampling process. In the first step, validation sites (Sentinel-2 tiles) were selected randomly and in proportion to the area covered by each country. In the second step, validation sites were stratified with the CORINE Land Cover dataset, which enabled the proportional selection of land cover classes. The selected samples were visually inspected by experts, who categorised them into 13 classes. This resulted in a large set of 52,024 samples spread over Europe. The final dataset can be used to validate European land cover products at continental scale, and may also be included in larger (e.g. global) datasets or used for country-based studies.

Received: 30 Dec 2025 – Discussion started: 07 Apr 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (extended)

RC1: 'Comment on essd-2025-811', Anonymous Referee #1, 12 May 2026

This manuscript presents a land cover and land use validation dataset developed for Europe. A high-quality validation dataset is important for map quality assessment and for model improvement. The dataset provides extensive land cover validation points, coordinated at pan-European scale. To support efficient use of the dataset, I recommend the following improvements.
I recommend that the authors revise their terminology and adopt the terms recommended by the good practice guidance on land cover and change accuracy assessment by Tyukavina et al. (2025). In particular, Section 2.1 of that guidance offers valuable advice on definitions and terminology. I recommend this because of some inconsistencies in the manuscript. One example is “samples”: in geospatial applications, a sample is a collection of sampling units or sampling points. The manuscript should use the singular “sample” for the whole collection and “sample units” when referring to a number of individual units (so not “samples”). Accordingly, I suggest the title “LULC validation dataset of Europe 2017”.
The quality (thematic accuracy) of the labels is not mentioned in the manuscript. Please provide this information if it is available. #48 mentions further verification by a team of independent specialists. Please provide more detail, as quality information is critical for reference datasets.
Although proportional allocation was used as the basis for both stages, small countries and minority land cover types were treated separately. This means that not all sample units have the same inclusion probability, as they would under simple random sampling. In the usage section, I recommend providing the estimators (accuracy assessment and area estimation formulas). In addition, the land use and land cover map/stratification labels that were used for stratification should be included in the reference data records, together with a statement of whether the stratification map is openly accessible (so that interested users can calculate the inclusion probabilities). In general, a stratified sample requires the strata weights/areas, so that area-weighted accuracy estimates can account for unequal inclusion probabilities.
Heterogeneous sample points were removed from the dataset. Please make this clear in the abstract, because it limits how well the dataset represents heterogeneous landscapes, which can be significant in Europe.
#11 – “two-step stratified random sampling”: later, two-stage cluster sampling is mentioned. Which one is correct? Please adopt a consistent term.
#74 – What layers were selected and why?
#67 – What do the generalization criteria mean here?
Figure 4: Please indicate what the “other” class is. It is not present in Table 2.
Validation sites and tiles are both mentioned in places. I understand that these are the PSUs (primary sampling units). Please make this consistent.
#145-150 – Please clarify whether one aggregated image per year or individual chips from different dates were used for interpretation.

Citation: https://doi.org/10.5194/essd-2025-811-RC1
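
The area-weighted accuracy estimate requested in the referee comment above can be illustrated with a minimal sketch. It is not taken from the manuscript: it assumes a plain one-stage stratified random sample (the first-stage selection of tiles is ignored), uses the textbook stratified estimator of a proportion, and all names and numbers (`stratified_overall_accuracy`, the three strata, their weights and counts) are hypothetical.

```python
from math import sqrt


def stratified_overall_accuracy(strata):
    """Estimate overall accuracy from a stratified random sample.

    strata maps a stratum name to (weight, n_units, n_correct), where
    weight is the stratum's share of the total mapped area (weights sum
    to 1), n_units is the number of sample units in the stratum, and
    n_correct is how many of them agree with the map label.
    Returns (accuracy, standard_error).
    """
    total_weight = sum(weight for weight, _, _ in strata.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")

    accuracy = 0.0
    variance = 0.0
    for weight, n_units, n_correct in strata.values():
        if n_units < 2:
            raise ValueError("each stratum needs at least two sample units")
        p = n_correct / n_units  # within-stratum proportion correct
        accuracy += weight * p   # area-weighted contribution
        # Variance of a stratified proportion, without finite
        # population correction.
        variance += weight ** 2 * p * (1.0 - p) / (n_units - 1)
    return accuracy, sqrt(variance)


# Hypothetical strata: (area weight, sample units, correct units).
example = {
    "forest": (0.5, 100, 90),
    "cropland": (0.3, 50, 40),
    "urban": (0.2, 50, 45),
}
accuracy, std_error = stratified_overall_accuracy(example)

# Unweighted share of correct units, for comparison only.
naive = sum(c for _, _, c in example.values()) / sum(
    n for _, n, _ in example.values()
)
```

With these made-up numbers the unweighted share of correct units differs from the area-weighted estimate, which is why the strata weights need to accompany the reference data when inclusion probabilities are unequal.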

Data sets

Validation dataset for Land Cover Map of Europe 2017 [dataset] M. Jenerowicz et al. https://doi.pangaea.de/10.1594/PANGAEA.934197

Viewed

Total article views: 1,252 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
322	907	23	1,252	19	24

Views and downloads (calculated since 07 Apr 2026)

Month	HTML	PDF	XML	Total
Apr 2026	178	52	17	247
May 2026	138	853	5	996
Jun 2026	6	2	1	9

Viewed (geographical distribution)

Total article views: 1,252 (including HTML, PDF, and XML) Thereof 1,252 with geography defined and 0 with unknown origin.

Latest update: 08 Jun 2026

Short summary

We present the validation dataset created within the Sentinel-2 Global Land Cover project. The dataset was developed to support accuracy assessment of a pan-European land cover database at both continental and country levels. The outcome is a large dataset of over 50,000 samples in 13 land cover/land use classes, which represent different climatic regions and conditions.
