Long-term Land Cover Dataset of the Mongolian Plateau Based on Multi-source Data and Rich Sample Annotations

Wang, Juanle; Li, Kai; Hong, Mengmeng; Shao, Yating; Sun, Zhichen; Liu, Meng; Li, Fengjiao; Su, Yuhui; Jia, Qilin; Liu, Yaping; Liu, Jiazhuo; Jiang, Jiawei; Ochir, Altansukh; Davaasuren, Davaadorj; Xu, Mengqiong; Sun, Yamin; Sun, Yifei; Huang, Shaopu; Zou, Weihao; Han, Tengfei; Sun, Feiran

doi:10.5194/essd-2024-237

Preprint | 10 Jul 2024 | https://doi.org/10.5194/essd-2024-237

Status: this discussion paper is a preprint. It has been under review for the journal Earth System Science Data (ESSD). The manuscript was not accepted for further review after discussion.

Abstract. The Mongolian Plateau (MP), with its unique geographical landscape and cultural features, plays an important role in the safety of the regional ecological environment and sustainable development in North Asia. Land-cover data form the basis of MP studies; however, global-scale data products often fail to reflect regional characteristics. Despite improvements in data resolution, the level of detail remains insufficient for MP studies. This study aimed to improve the land cover classification system for the MP. Based on multi-source data and extensive sample annotations, we constructed a high-quality land cover dataset for the MP and analyzed regional land cover changes. We used Landsat 5 and 8 images as primary data, supplemented by NASADEM, nighttime light data, ESA WorldCover, and other multi-source data for auxiliary analysis. The MP was divided into 13 major land cover categories: forest, shrub, meadow steppe, real steppe, desert steppe, wetland, water, cropland, built areas, bare area, desert, sand, and ice. For these categories, we manually labeled the samples for seven periods: 1990, 1995, 2000, 2005, 2010, 2015, and 2020. We constructed a set of image features and used a random forest model to train and predict land-cover data from 1990 to 2020. The overall accuracy of the land-cover products was 83.9 %, with a kappa coefficient of 0.817. Analysis of land cover changes over the past 30 years revealed that 65.04 % of land cover types in the MP remained unchanged from 1990 to 2020. The sandy area decreased annually, with a significant trend of bare areas transitioning to vegetation. The land-cover dataset produced in this study includes land-cover data and labeled samples for seven periods, exhibiting high application potential. It can be used for research on land-cover changes and related resource and environmental issues in the MP, providing data to support ecological protection and natural resource management in the region.
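The change analysis summarized in the abstract (the share of land cover that stayed unchanged, and transitions such as bare area to vegetation) can be illustrated with a minimal sketch. It assumes two co-registered maps flattened into equal-length label sequences; the function name `change_summary` and the toy labels are hypothetical and are not taken from the dataset or the authors' workflow.

```python
from collections import Counter


def change_summary(labels_start, labels_end):
    """Return (unchanged_fraction, transition_counts) for two label sequences.

    Both sequences must describe the same pixels in the same order.
    """
    if len(labels_start) != len(labels_end):
        raise ValueError("both maps must cover the same pixels")
    if not labels_start:
        raise ValueError("maps must not be empty")
    # Count every (class at start, class at end) pair.
    transitions = Counter(zip(labels_start, labels_end))
    # Pixels on the diagonal of the transition table kept their class.
    unchanged = sum(n for (a, b), n in transitions.items() if a == b)
    return unchanged / len(labels_start), transitions


# Toy labels (hypothetical, for illustration only).
map_1990 = ["sand", "bare area", "bare area", "forest", "real steppe", "water"]
map_2020 = ["sand", "real steppe", "bare area", "forest", "real steppe", "water"]

frac, trans = change_summary(map_1990, map_2020)
print(f"unchanged fraction: {frac:.3f}")
print("bare area -> real steppe:", trans[("bare area", "real steppe")])
```

On real rasters the same counting would typically be done on arrays rather than Python lists, but the definition of "unchanged" is the same.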

Received: 15 Jun 2024 – Discussion started: 10 Jul 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on essd-2024-237', Anonymous Referee #1, 06 Aug 2024
The LUCC Implementation Strategy is of great importance to the environmental change research community as well as to decision-makers at the local, regional and global levels. Over the last decades, many organizations and scholars around the world have discussed land use classification and have proposed various land use classification systems according to diverse standards. Which existing classification system is used in this article? If the classification system was established by the authors, what are its principles? Can the authors present a unified understanding for other colleagues and decision-makers? This study conducted land use mapping on the Mongolian Plateau based on Landsat data and night light data. The Mongolian Plateau has relatively few dynamic land cover datasets, so this work could provide a data basis for local communities. However, the introduction of this study is not focused and the limitations of this research in Mongolia are not well introduced. In addition, the lack of a detailed introduction of the data and methods makes the reliability of the research results questionable. Therefore, I regret that this study is not suitable for publication in ESSD. My concerns are listed as follows:
Major concerns:
With regard to the classification system and the Landsat data, how is the mixed-pixel problem solved? For example, how are transitional areas between desert steppe, sand, bare land and so on identified?

How is the night-time light data used in the method? Night-time light data is closely related to economic development. It may explain the built-up area of key regions. What about rural areas?

A dataset showing that Mongolia has no desert is unacceptable. The definition of bare land needs to be revised.

It is suggested to establish a unified sample database, considering the samples with unchanged land use types in different periods, which will help us to expand the dataset in the same way.

Other specific concerns:
Lines 25-26: Why is there not enough detail when the resolution is improved? The abstract does not clearly describe what gaps the data in this study fills.

Lines 33-34: Only one classification accuracy evaluation result is obtained for multiple periods of data?

Lines 41-50: The introduction of the research background is not focused enough. As a land use classification study, readers would like to know more about the unique characteristics of land use changes on the Mongolian Plateau, what ecological and environmental impacts they have, and why monitoring is urgently needed.

Lines 51-85: In this section, the authors introduce some research on the Mongolian Plateau related to land use, but these studies are not sufficiently relevant to the gap in current research summarized in this section. In addition, some existing land use research in Mongolia is not mentioned, and it is recommended to comprehensively review the current land use research in Mongolia.

Lines 83-85: The research method should be introduced in detail as a separate paragraph, introducing the current mainstream methods and the advantages of this research method.

Lines 86-94: Similar to the previous question, it is not focused enough. What are the difficulties in collecting samples in Mongolia? Why use visual interpretation samples? These are the questions that readers are more concerned about.

Study area: The study area section needs to properly introduce the necessity of conducting land use research on the Mongolian Plateau. For example, is it a fragile ecosystem? An ecosystem undergoing drastic changes? In addition, the text labels in Figure 1 are hard to distinguish from the base map colors, and the map contains too little information. It is recommended to add field photos of each land use type.

Lines 112-115: How were existing data used to assist the annotation of sample points? The current description is too general. Why use night light data? Night light data often contributes substantially to urban cover classification in developed areas, but Ulaanbaatar is the only large city in Mongolia.

Lines 123-124: What is the reference for this classification system? The classification system determines whether the research results are reliable. I suggest that the author introduce this part in detail.

Table 1: Based on the data used in this study, are categories 3, 4, 5, and 6 easily distinguishable? The same question applies to categories 10, 11, and 12. In addition, the current version of the category description is too general, and some quantitative descriptions need to be given with reference to the IGBP classification system. For example, what is the vegetation coverage and height above which it is classified as forest and shrub?

Figure 2: Why is there no mention of research methods?

Line 139: As far as I know, the images of the Mongolian Plateau during the growing season are also easily affected by clouds. How did the author overcome this problem? Can the images of all periods be displayed in the form of an appendix to facilitate readers' judgment?

Section 3.2: It is recommended to integrate this part into the data source section.

Line 185: Will such random division result in a lack of validation samples to verify the true accuracy for land use categories with smaller sample sizes?

Line 191: Why choose the random forest method? Why set the parameters in this way?

Lines 195-200: What are the differences between these indicators? Is it necessary to use so many indicators?
Citation: https://doi.org/10.5194/essd-2024-237-RC1
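Referee #1's question about how the accuracy indicators differ can be made concrete with a small worked example. The sketch below assumes the indicators include the overall accuracy and kappa coefficient reported in the abstract; producer's and user's accuracy are added as commonly reported per-class measures and are an assumption, since the indicators on the referenced manuscript lines are not reproduced here. The confusion matrix is a toy example, not data from the study.

```python
def accuracy_metrics(cm):
    """Compute accuracy indicators from a square confusion matrix.

    cm[i][j] = number of validation samples whose reference class is i
    and whose predicted class is j.
    Returns (overall_accuracy, kappa, producers_accuracy, users_accuracy).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    row_tot = [sum(cm[i]) for i in range(k)]                        # reference totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted totals
    correct = sum(cm[i][i] for i in range(k))

    # Overall accuracy: share of all samples classified correctly.
    oa = correct / total
    # Expected agreement by chance, from the marginal totals.
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    # Kappa: agreement beyond chance, rescaled to at most 1.
    kappa = (oa - pe) / (1 - pe) if pe != 1 else float("nan")
    # Producer's accuracy (recall) and user's accuracy (precision) per class.
    producers = [cm[i][i] / row_tot[i] if row_tot[i] else float("nan") for i in range(k)]
    users = [cm[i][i] / col_tot[i] if col_tot[i] else float("nan") for i in range(k)]
    return oa, kappa, producers, users


# Toy two-class matrix (hypothetical values).
oa, kappa, producers, users = accuracy_metrics([[45, 5], [10, 40]])
print(f"OA = {oa:.2f}, kappa = {kappa:.2f}")  # OA = 0.85, kappa = 0.70
```

Overall accuracy and kappa summarize the whole map in one number each, whereas the per-class values show which categories drive the errors, which is where small or easily confused classes become visible.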
RC2:
'Comment on essd-2024-237', Anonymous Referee #2, 12 Aug 2024
The authors have developed a time series of land use maps for the Mongolian Plateau, which is important for land use-related studies of this region. However, several significant shortcomings render this study unsuitable for publication in ESSD, as detailed below:
Major concerns:
Innovation in methodology: The authors have previously published several articles on land use mapping in this region, covering most of the same time span and spatial extent. The methodology used in this study appears to be largely similar to that of those previous papers. Although the authors have addressed the knowledge gap in the introduction, I am not convinced that they have made a significant improvement. This study focuses solely on dataset development without offering any application or new insights, which may not meet the innovation requirements for publication in ESSD.

Data reliability: Another significant concern is dataset validation. The authors split the sampling points into training and testing sets for model fitting and evaluation. However, relying on visually interpreted sampling points as the primary data source is not a robust approach. Additionally, the prediction performance was relatively poor for certain land types and specific years, raising concerns about the dataset’s accuracy. Relying solely on 10% of the sampling points for validation is insufficient. I would recommend that the authors use regional inventories as a reference to strengthen the reliability of the dataset. Moreover, studies focusing on similar regions using different data sources could provide additional validation benchmarks.

Weak discussion: The Results and Discussion section lacks depth and citation support, diminishing the paper’s overall quality. I do not see any discussions presented. Several interesting questions remain unexplored. For example, why does the model’s performance vary across different land uses and years? What factors drive the spatial distribution of land use types in your analysis? What might cause the transition between different land use types and how do these impacts vary over different years? Where are these transitions most concentrated and what are the underlying reasons? Addressing these questions might enhance the paper’s value.

Specific comments:
Line 32: Since this dataset is not an annual time series, I suggest changing “periods” to “years”.
Line 57: The phrase “historical land cover and environmental data of areas” is unclear. Consider rephrasing for clarity.
Line 57: “This study” is misleading here. Change it to “Their study”.
Line 69: The citations can be merged.
Several abbreviations are used without explanation at first mention. Though some are very familiar in GIS community, it is still necessary to provide explanation for better readability.
Figure 1: I suggest inserting a global map with the MP to show the location of this region. In addition, some region names on the map are difficult to read and should be modified.
There is no mention of the product resolution and how different data sources were harmonized. This information is important for readers to understand the dataset.
Line 144: The interpolation used to generate cloud free images needs to be detailed.
Section 3.3.2: I am confused about the modeling process. From my understanding, one pixel contains multiple label values, which represent different land use types. Line 190 mentions that a relationship between pixel value and label value was established. Does that mean the same pixel value was used for every land use type within that pixel, or is the pixel here at a finer resolution, so that each pixel corresponds to a single land use type?
Figure 6: I suggest changing the unit of y-axis to Million km². It’s up to the authors to determine if the current unit works better in your context.
I suggest adding spatial maps of climate, topography, and other relevant factors to help readers better understand this region’s environmental conditions.
The classification for different land use types is confusing. The current descriptions do not sufficiently differentiate them, especially between bare area, desert, and sand.
Citation: https://doi.org/10.5194/essd-2024-237-RC2
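Both referees question whether a random split with about 10% of the samples held out leaves enough validation points for rare classes. One common way to address this is a stratified split that holds out samples per class. The sketch below is illustrative only and is not the authors' procedure; the function name `stratified_split`, the `min_test_per_class` parameter, and the toy sample counts are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.1, min_test_per_class=1, seed=0):
    """Split (sample_id, class_label) pairs into train and test lists per class.

    Each class contributes roughly test_fraction of its samples to the test
    set, but never fewer than min_test_per_class when it has at least two
    samples, so that rare classes are still validated.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)
        n_test = max(min_test_per_class, round(len(group) * test_fraction))
        # Always leave at least one sample for training; a class with a
        # single sample goes entirely to training.
        n_test = min(n_test, len(group) - 1) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy sample set (hypothetical counts): one common class, one rare class.
samples = [(i, "real steppe") for i in range(90)] + [(100 + i, "ice") for i in range(5)]
train, test = stratified_split(samples)
test_counts = {label: sum(1 for _, c in test if c == label) for label in ("real steppe", "ice")}
print(len(train), len(test), test_counts)
```

With a purely random 10% draw, a class with five samples can easily end up with no validation points at all; the per-class minimum removes that failure mode, although a single validation point still gives only a very coarse accuracy estimate.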

Data sets

Long-term Land Cover Dataset of the Mongolian Plateau Based on Multi-source Data and Rich Sample Annotations
Juanle Wang, Kai Li, Mengmeng Hong, Yating Shao, Zhichen Sun, Meng Liu, Fengjiao Li, Yuhui Su, Qilin Jia, Yaping Liu, Jiazhuo Liu, Jiawei Jiang, Altansukh Ochir, Davaadorj Davaasuren, Mengqiong Xu, Yamin Sun, Yifei Sun, Shaopu Huang, Weihao Zou, Tengfei Han, and Feiran Sun
https://www.scidb.cn/en/s/3AfYjm

Viewed

Total article views: 1,929 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,339	489	101	1,929	62	105

Views and downloads (calculated since 10 Jul 2024)

Month	HTML	PDF	XML	Total
Jul 2024	148	15	9	172
Aug 2024	128	47	47	222
Sep 2024	50	10	1	61
Oct 2024	45	8	2	55
Nov 2024	34	14	0	48
Dec 2024	46	14	1	61
Jan 2025	48	26	4	78
Feb 2025	37	17	3	57
Mar 2025	29	16	2	47
Apr 2025	19	24	0	43
May 2025	34	25	2	61
Jun 2025	36	21	0	57
Jul 2025	39	29	1	69
Aug 2025	78	47	0	125
Sep 2025	267	13	1	281
Oct 2025	28	15	1	44
Nov 2025	27	36	2	65
Dec 2025	71	36	2	109
Jan 2026	49	23	13	85
Feb 2026	71	36	4	111
Mar 2026	55	17	6	78

Viewed (geographical distribution)

Total article views: 1,912 (including HTML, PDF, and XML) Thereof 1,912 with geography defined and 0 with unknown origin.

Latest update: 26 Mar 2026

Short summary

This study presents a high-quality land cover dataset for the Mongolian Plateau (MP), built from multi-source data to better capture regional geographical and environmental characteristics. The dataset covers 1990 to 2020 and achieves an overall accuracy of 83.9 %. It reveals that 65.04 % of land cover types remained unchanged over the past 30 years, with a clear trend of bare areas transitioning to vegetation. This dataset can support research on natural resources management and ecological protection in the MP.

