TPLake-MED: A Monthly Extent Dataset for Lakes on the Tibetan Plateau

Zhao, Siyu; Zhao, Xiang; Zhao, Jiacheng; Zhang, Xin; Liu, Xingyu; Yao, Chengzhi

doi:10.5194/essd-2025-649

13 Nov 2025

Status: a revised version of this preprint is currently under review for the journal ESSD.

Abstract. Lakes on the Tibetan Plateau have expanded markedly over recent decades, reflecting complex interactions between the regional water cycle and the cryosphere. Whereas annual datasets capture long-term trends, they often overlook short-term hydrological responses and seasonal transitions that are resolved by monthly observations. Consequently, a systematic understanding of intra-annual lake variability remains limited, largely because most existing datasets are designed for interannual scales, which makes monthly variations and seasonal patterns difficult to characterise. These limitations hinder investigations into the driving mechanisms and complicate assessments of climate-change impacts. To address this gap, we utilised Google Earth Engine (GEE) and the MODIS Surface Reflectance product MOD09A1 (500 m) to construct a monthly vector boundary dataset for lakes larger than 10 km² across the Tibetan Plateau for 2000–2024. Within this dataset, the number of lakes larger than 50 km² ranged from 142 to 175, and the number of smaller lakes (10–50 km²) varied between 232 and 260 across the study period. A random forest classifier based on spectral indices was developed and validated with 533 balanced water/non-water samples, achieving an overall accuracy of 93.21 % and an F1 score of 0.927. To enhance spatial precision, we implemented a boundary optimisation workflow integrating filtering, morphological operations, and geometric rectification, thereby improving agreement between extracted and actual lake extents. Aggregate lake area on the Plateau increased at 34.91 km² per year, and typically reached its annual maximum in September or October. The relative monthly rate of area change showed higher values in the west, lower in the east, and stronger variability centrally; for individual lakes the maximum monthly relative change reached 28.43 % from 2000 to 2024. In addition, smaller lakes were more sensitive to environmental change than larger lakes.
To our knowledge, this is the first monthly resolution vector dataset of Tibetan Plateau lakes that couples multi-temporal classification with morphological optimisation. The dataset provides critical support for climate-change research, ecological conservation, and policy formulation, and is publicly available at https://doi.org/10.12443/BNU.RSEC.TPLake-MED20251028.

Received: 29 Oct 2025 – Discussion started: 13 Nov 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1: 'Comment on essd-2025-649', Anonymous Referee #1, 02 Jan 2026

The topic is interesting; the authors found an indirect way to quantify the direct effects of glacier melting by determining the lake areas of the Tibetan Plateau. Although the main idea is not new and there are previous studies on the topic, the MS has potential. The dataset is built from satellite images and provides better insight into the changing lake areas, with higher spatial and temporal resolution than the existing data.
My first note is that TP, used here for Tibetan Plateau, usually refers to True Positive in remote sensing (and TP in that sense also appears later in the MS), so I suggest finding another abbreviation, e.g., TiP or something else short and informative.
Introduction
Regarding the introduction, I do not think it is a good idea to start with the Tibetan Plateau, because it is not the only area in the world with high mountains and glaciers that are affected by climate change and global warming. A wider approach, showing readers the global character of lake formation and glacier melting, would therefore be beneficial. I suggest adding a new first paragraph with this global approach and finding a way to cite the researchers who have contributed earlier knowledge and studies.
The term “Roof of the World” needs to be mentioned only once, yet both the Introduction and the Study area description start with it.
Study area and Methods
Square kilometer can be abbreviated as km².
In Fig. 1, I did not see any lakes of 10–50 km². A better method, color, background color, or transparency should be chosen; as it stands, these lakes are mentioned but cannot be seen.
I do not see the point of CART when RF is used for classification. Hundreds of decision trees are certainly better than a single one; as far as I can see, it makes no sense to apply them together. The description provided for CART is rather vague and does not explain its significance. Furthermore, the descriptions of SVM and RF (and CART) are very short and do not help the reader understand the methods (although I agree that extensive descriptions are not welcome). The main thing missing is an explanation of the hyperparameter tuning: which hyperparameters were taken into consideration, and how were their values determined? Section 3.2.2 helps, but is still not enough.
I do not see the point of presenting the equations for Precision, Recall and F1, plus RMSE, ubRMSE, Bias, R², and MAPE. As far as I can see, all papers include these equations, but I never see the reason. If these are shown, why are the equations of RF, SVM, or CART not shown?
I would suggest avoiding direct citations such as in P8L206; instead, cite the illustrations in brackets and interpret them, in all instances.
Fig. 3 is a part of Fig. 2. I do not think it is a good idea to replicate the same figure in parts; refer to Fig. 2 instead.
Results
The results are correctly presented; the figures help the understanding and serve as good background.
Discussion
The section “Comparison with other products” is a good addition, but a wider discussion is needed. As I noted for the introduction, a global outlook would make the paper more popular, even if the comparison is not that direct.
https://global-surface-water.appspot.com/map
https://www.hydrosheds.org/
https://www.arcgis.com/apps/mapviewer/index.html?webmap=5d65be95ccc341d587896a81794021bf (JRC)
https://essd.copernicus.org/articles/17/2277/2025/
These are just some options, but they may give ideas for a comparison with the existing datasets. I see the difference, but please point it out directly and discuss the similarities and differences, as is required in all scientific papers.
It may be my fault, but I did not find a download link to see the data itself, so at this stage I have no direct impression of the quality of the dataset.

Citation: https://doi.org/10.5194/essd-2025-649-RC1
- AC1: 'Reply on RC1', Siyu Zhao, 15 Feb 2026
  
  We sincerely appreciate the reviewers’ thorough and insightful comments, which have helped improve the quality of the manuscript. We have carefully addressed all the suggestions and incorporated them into the revision. Please find our detailed responses in the attached file.
  
  Citation: https://doi.org/10.5194/essd-2025-649-AC1
- AC2: 'Reply on RC1', Siyu Zhao, 15 Feb 2026
  
  Publisher’s note: this comment is a copy of AC1 and its content was therefore removed on 16 February 2026.
  
  Citation: https://doi.org/10.5194/essd-2025-649-AC2
RC2: 'Comment on essd-2025-649', Yingkui Li, 25 Feb 2026

This manuscript presents a timely dataset of monthly area changes for lakes larger than 10 km² on the Tibetan Plateau from 2000 to 2024, derived from the 500 m MODIS surface reflectance product. The authors extracted lake boundaries using a random forest classifier, combined with subsequent filtering and morphological post-processing. This publicly available dataset will provide a valuable foundation for understanding the dynamics of lake changes, their climatic and other environmental drivers, and their impacts on ecological systems and infrastructure security. However, I do have some comments and concerns regarding the current manuscript, and some of the issues need to be addressed before it can be accepted for publication.
My major comments are listed below:
1. Over the past two decades, numerous studies have focused on lakes on the Tibetan Plateau, including research on lake boundary extraction, change patterns and their driving factors, and the development of lake inventories. However, the current manuscript appears to cite only a few of these recent studies. I recommend that the authors conduct a more comprehensive literature review of lake-related research on the Tibetan Plateau from approximately the last 20 years. A selection of relevant studies that I have been involved in is provided below for the authors' consideration (but the authors should also review other related studies).
Liao, J., Shen, G., Li, Y., 2013. Lake variations in response to climate change in the Tibetan Plateau in the past 40 years. International Journal of Digital Earth 6, 534–549. https://doi.org/10.1080/17538947.2012.656290
Li, Y., Liao, J., Guo, H., Liu, Z., Shen, G., 2014. Patterns and Potential Drivers of Dramatic Changes in Tibetan Lakes, 1972–2010. PLoS ONE 9, e111890. https://doi.org/10.1371/journal.pone.0111890
Zhang, J., Hu, Q., Li, Y., Li, H., Li, J., 2021. Area, lake-level and volume variations of typical lakes on the Tibetan Plateau and their response to climate change, 1972–2019. Geo-spatial Information Science 24, 458–473. https://doi.org/10.1080/10095020.2021.1940318
2. The authors' proposed lake extraction method is based on a random forest classifier applied to MODIS surface reflectance data, followed by filtering and morphological post-processing. While the method achieves good overall accuracy, it remains susceptible to issues related to cloud cover and topographic or cloud shadows. To mitigate the impact of cloud cover, the authors developed pre-processing steps that fill cloud masks using imagery from other time periods. However, it remains unclear how the method addresses topographic and cloud shadows. The authors note that previous studies have incorporated DEMs and derived terrain factors to help resolve this issue, yet they did not integrate such terrain information into their own approach. It would be interesting to know the rationale behind this decision—specifically, why terrain features were not considered and whether the authors explored their potential to further improve classification accuracy in shadow-affected areas.
3. All methods developed or compared in this manuscript (random forest, SVM, and CART) are pixel-based classification approaches. However, object-based image analysis (OBIA) may be more effective for lake extraction, particularly with the recent development of deep learning-based image segmentation models, such as UNet and DeepLabV3+. Unlike pixel-based methods, OBIA segments images into homogeneous objects rather than classifying individual pixels, which has the advantages of better preserving lake integrity, reducing salt-and-pepper noise, and incorporating spatial context and shape information. It would be interesting to know whether the authors considered OBIA or deep learning-based segmentation methods and their rationale for selecting a pixel-based approach for lake extraction.
4. The authors used the 500 m MODIS dataset for lake classification. It is important to provide a quantitative estimate of the uncertainty introduced by this spatial resolution on lake boundary delineation and area calculations. For example, what is the impact of a one-pixel (or half-pixel) shift along the lake boundary on the estimated area for different lake sizes, particularly for smaller lakes near the 10 km² threshold? Such an uncertainty analysis would help readers better understand the magnitude of lake changes over time and assess whether observed variations are significant compared to the inherent limitations of the data due to the coarse resolution. I recommend that the authors include this uncertainty assessment to strengthen the interpretation of lake-changing trends.
5. The authors used a selection of very large lakes as examples to illustrate lake boundaries and comparisons with other datasets. As expected, the impacts of spatial resolution, cloud cover, and topographic shadows on these large lakes are relatively minor, and the boundaries are consequently of high quality. However, this does not demonstrate the method's performance for smaller lakes, which are more sensitive to these sources of error. For these small lakes, even minor misclassifications along the boundaries can result in substantial relative errors in area estimates. Therefore, I recommend that the authors provide a more systematic accuracy assessment stratified by lake size groups (e.g., 10–50 km², 50–100 km², 100–500 km², etc.). This would allow readers to understand how classification performance varies with lake size and to assess the reliability of the dataset across its full range, particularly for the smallest lakes where uncertainty is expected to be highest.
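
As a rough illustration of comments 4 and 5, the sketch below estimates how a half-pixel or one-pixel shoreline shift changes the area of lakes of different sizes. It is a back-of-the-envelope model and not the authors' method: each lake is assumed to be a perfect circle, and the names `pixel_count` and `relative_area_change` are hypothetical.

```python
import math

PIXEL_SIZE_KM = 0.5  # nominal MOD09A1 pixel edge (500 m)


def pixel_count(area_km2, pixel_km=PIXEL_SIZE_KM):
    """Number of pixels needed to cover a given area."""
    return area_km2 / pixel_km ** 2


def relative_area_change(area_km2, shift_km):
    """Relative area change when the shoreline moves outward by shift_km.

    Assumption: the lake is a perfect circle, which is an idealisation.
    """
    radius = math.sqrt(area_km2 / math.pi)
    shifted_area = math.pi * (radius + shift_km) ** 2
    return (shifted_area - area_km2) / area_km2


if __name__ == "__main__":
    # Size classes follow the groups suggested in comment 5.
    for area in (10, 50, 100, 500):
        half = relative_area_change(area, PIXEL_SIZE_KM / 2)
        full = relative_area_change(area, PIXEL_SIZE_KM)
        print(f"{area:>4} km2 ({pixel_count(area):.0f} px): "
              f"half-pixel {half:.1%}, one-pixel {full:.1%}")
```

Because a circle has the shortest perimeter for a given area, real lakes with irregular shorelines would likely show larger relative changes than this idealised estimate, which is why a size-stratified assessment matters most near the 10 km² threshold.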

Some of my detailed comments are listed below:
Line 21: The overall accuracy of 93.21% and F1 score of 0.927. I believe that the overall accuracy of 93.21% is based on the confusion matrix of the lake and non-lake classification. It is meaningless as a measure of lake-boundary accuracy if the samples are imbalanced, for example when a large portion of the classification area is non-lake. I suggest that the authors focus on the lake-specific metrics, such as precision, recall, and F1 score, in the abstract. I assume the F1 score of 0.927 is for lakes only, which is more meaningful. The abstract should also include the other metrics, such as precision and recall.
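
The concern about overall accuracy under class imbalance can be shown with a small worked example. The confusion-matrix counts below are invented for illustration and are not taken from the manuscript; `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Overall accuracy plus water-class precision, recall and F1."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical scene: 1000 pixels, of which only 50 are water.
# The classifier finds half of the water pixels and raises 5 false alarms.
metrics = binary_metrics(tp=25, fp=5, fn=25, tn=945)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

In this made-up case the overall accuracy stays high because non-water pixels dominate, while the water-class recall and F1 reveal that half of the water pixels were missed.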

Line 42-52: See my general comments about the literature review

Line 54: What is a water body index-based approach? This needs to be explained. Maybe the authors can explain it in the literature review part (different methods to map lake boundaries).

Line 55: NDWI, MNDWI, AWEI: These terms need to be defined when first used.
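
For readers unfamiliar with these terms: NDWI is the Normalized Difference Water Index, MNDWI the Modified NDWI, and AWEI the Automated Water Extraction Index. A minimal sketch of the two normalized-difference indices follows, using their commonly cited band combinations (green and near-infrared for NDWI, green and shortwave-infrared for MNDWI). The function names and reflectance values are illustrative assumptions, and the manuscript's exact band choices may differ.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def ndwi(green, nir):
    """Normalized Difference Water Index from green and NIR reflectance."""
    return normalized_difference(green, nir)


def mndwi(green, swir):
    """Modified NDWI, replacing the NIR band with a SWIR band."""
    return normalized_difference(green, swir)


if __name__ == "__main__":
    # Illustrative reflectances (assumed values, not from the manuscript).
    water = {"green": 0.08, "nir": 0.02, "swir": 0.01}
    land = {"green": 0.10, "nir": 0.30, "swir": 0.25}
    for label, px in (("water", water), ("land", land)):
        print(label,
              round(ndwi(px["green"], px["nir"]), 3),
              round(mndwi(px["green"], px["swir"]), 3))
```

Water absorbs strongly in the near- and shortwave-infrared, so both indices tend to be positive over open water and negative over land; an index-based approach classifies water by thresholding such values.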

Line 53-62: It remains unclear to me how machine learning methods can better handle cloud cover and topographic shadow when relying solely on satellite imagery. I suspect that the improved performance of these newer approaches stems from the incorporation of terrain features, rather than from the machine learning techniques themselves. While machine learning can enhance classification accuracy in general, it is unlikely to effectively mitigate issues related to cloud cover and terrain shadow without the integration of ancillary data such as DEM and other terrain derivatives. The authors should clarify the logic of this part.

Line 62-75: There are numerous inconsistencies in the citation format throughout the manuscript. For example, some citations include the year (e.g., Wang et al., 2023), while others list only the author names without the year (e.g., Li et al. and Liu et al.). Please carefully review the entire manuscript to ensure all citations are formatted consistently according to the journal's guidelines.

Line 82: The authors argue that “existing datasets emphasize large lakes, with insufficient coverage of small and medium lakes”. However, their manuscript also only focuses on large lakes. It is better to revise the logic of this part.

Line 100: This sentence repeats the same sentence already presented in the introduction. Remove it to avoid redundancy.

Line 118: Define JRC and IoU. For the dataset, it is better to provide a reference or a website link.
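
For context, JRC usually denotes the European Commission's Joint Research Centre, and IoU stands for intersection over union: the area shared by two lake extents divided by the area covered by either. A minimal sketch on binary raster masks is given below; the function name `iou` and the toy masks are assumptions for illustration only.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two binary masks (lists of rows)."""
    a = {(r, c) for r, row in enumerate(mask_a)
         for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b)
         for c, v in enumerate(row) if v}
    union = a | b
    # Two empty masks are treated as identical (assumed convention).
    return len(a & b) / len(union) if union else 1.0


# Toy 2 x 3 masks: 1 = water, 0 = non-water.
extracted = [[1, 1, 0],
             [1, 1, 0]]
reference = [[0, 1, 1],
             [0, 1, 1]]

if __name__ == "__main__":
    print(f"IoU = {iou(extracted, reference):.3f}")
```

An IoU of 1 means the two extents coincide exactly, and 0 means they do not overlap at all.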

Line 262: What is the uncertainty associated with lake boundaries extracted from the 500 m spatial resolution MODIS dataset? The authors should discuss how the relatively coarse resolution may affect the accuracy of lake area estimates, particularly for smaller lakes near the 10 km² threshold (about 40 pixels).

Line 274-275: The observation that lakes reach their maximum extent in September and October may also reflect a lag effect, as rainfall or meltwater from glaciers and permafrost may require time to flow into the lakes.

Line 290-295: Rather than simply describing the relationship between lake changes and climate factors, I recommend that the authors conduct a more detailed statistical analysis to quantify these relationships and strengthen the discussion of the mechanisms driving lake change.

Line 315-317: What is the difference between “in the Plateau interior” and “across the interior”?

Section 4.2: I recommend reorganizing the Results section to present the accuracy assessment and uncertainty analysis before the other results. This change will allow readers to evaluate the reliability of the dataset before understanding the patterns and trends of lake changes.

Line 361-362: All lakes shown in Figure 8 are the largest lakes on the Tibetan Plateau, which are likely to yield better classification results and be less sensitive to issues such as cloud cover and topographic shadows. However, I believe the key test of the method's robustness lies in its performance on smaller lakes near the 10 km² threshold. These lakes comprise only 40 or more pixels and are inherently more vulnerable to misclassification caused by clouds, shadows, and mixed pixels.

Line 365: “low solar elevation”? Do you mean “topographic shadows”?

Line 371: What are seasonal climate changes? Maybe just seasonal changes. Also, Selin Co is larger than Nam Co now, so it should be a large lake, not a medium lake.

Line 384: The comparison with other datasets using only 11 large lakes is not enough. As expected, the differences for these large lakes are relatively small, which does not provide a rigorous test of the method's performance. It will be essential to compare some small lakes.

Citation: https://doi.org/10.5194/essd-2025-649-RC2
- AC3: 'Reply on RC2', Siyu Zhao, 16 Mar 2026
  
  We sincerely appreciate the reviewers' thorough and insightful comments, which have helped improve the quality of the manuscript. We have carefully addressed all the suggestions and incorporated them into the revision. Please find our detailed responses in the attached file.
  
  Citation: https://doi.org/10.5194/essd-2025-649-AC3

Data sets

Monthly lake area changes larger than 10 km² on the Tibetan Plateau (2000–2024) Siyu Zhao et al. https://doi.org/10.12443/BNU.RSEC.TPLake-MED20251028

Viewed

Total article views: 859 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
422	369	68	859	30	40

Views and downloads (calculated since 13 Nov 2025)

Month	HTML	PDF	XML	Total
Nov 2025	203	66	14	283
Dec 2025	46	78	16	140
Jan 2026	75	96	10	181
Feb 2026	47	39	15	101
Mar 2026	51	90	13	154

Latest update: 26 Mar 2026

Short summary

We constructed a monthly vector boundary dataset (2000–2024) for lakes ≥10 km² on the Tibetan Plateau using Google Earth Engine and MODIS data. A spectral-index random forest (93.21 % accuracy) and post-processing enhanced boundary precision. The dataset (TPLake-MED) shows steady lake expansion (~ 34.91 km² per year) with peak area in September/October. Monthly changes are more significant in the west, and smaller lakes are more sensitive, offering insights for climate and ecosystem management.