TPLake-MED: A Monthly Extent Dataset for Lakes on the Tibetan Plateau
Abstract. Lakes on the Tibetan Plateau have expanded markedly over recent decades, reflecting complex interactions between the regional water cycle and the cryosphere. Whereas annual datasets capture long-term trends, they often overlook short-term hydrological responses and seasonal transitions that are resolved by monthly observations. Consequently, a systematic understanding of intra-annual lake variability remains limited, largely because most existing datasets are designed for interannual scales, which makes monthly variations and seasonal patterns difficult to characterise. These limitations hinder investigations into the driving mechanisms and complicate assessments of climate-change impacts. To address this gap, we utilised Google Earth Engine (GEE) and the MODIS Surface Reflectance product MOD09A1 (500 m) to construct a monthly vector boundary dataset for lakes larger than 10 km2 across the Tibetan Plateau for 2000–2024. Within this dataset, the number of lakes larger than 50 km2 ranged from 142 to 175, and the number of smaller lakes (10–50 km2) varied between 232 and 260 across the study period. A random forest classifier based on spectral indices was developed and validated with 533 balanced water/non-water samples, achieving an overall accuracy of 93.21 % and an F1 score of 0.927. To enhance spatial precision, we implemented a boundary optimisation workflow integrating filtering, morphological operations, and geometric rectification, thereby improving agreement between extracted and actual lake extents. Aggregate lake area on the Plateau increased at 34.91 km2 per year, and typically reached its annual maximum in September or October. The relative monthly rate of area change showed higher values in the west, lower in the east, and stronger variability centrally; for individual lakes the maximum monthly relative change reached 28.43 % from 2000 to 2024. In addition, smaller lakes were more sensitive to environmental change than larger lakes.
To our knowledge, this is the first monthly-resolution vector dataset of Tibetan Plateau lakes that couples multi-temporal classification with morphological optimisation. The dataset provides critical support for climate-change research, ecological conservation, and policy formulation, and is publicly available at https://doi.org/10.12443/BNU.RSEC.TPLake-MED20251028.
The topic is interesting: the authors found an indirect way to quantify the direct effects of glacier melting by determining the lake areas of the Tibetan Plateau. Although the main idea is not new and there are previous studies on the topic, the MS has potential. The dataset is built from satellite images and provides better insight into changing lake areas, with higher spatial and temporal resolution than the existing data.
My first note is that TP, used here for Tibetan Plateau, usually refers to True Positive in remote sensing (and TP also appears in the MS later), so I suggest finding another abbreviation, e.g., TiP or something similarly short and informative.
Introduction
Regarding the introduction, I do not think it is a good idea to start with the Tibetan Plateau, because it is not the only area in the world with high mountains and glaciers that suffer from climate change and global warming. A wider approach, showing readers the global character of lake formation and glacier melting, would therefore be beneficial. I suggest adding a new first paragraph with this global approach and citing the researchers who have contributed preliminary knowledge and studies.
The term ”Roof of the World” needs to be mentioned only once, yet both the Introduction and the Study area description start with it.
Study area and Methods
Square kilometer can be abbreviated as km2
In Fig. 1, I could not see any lakes of 10–50 km2. A better method, color, background color or transparency should be chosen; as it stands, these lakes are mentioned but cannot be seen.
I do not see the point of CART when RF is used for classification. Hundreds of decision trees are certainly better than a single one, so as far as I can see it makes no sense to apply them together. The description provided for CART is rather vague and does not explain its significance. Furthermore, the descriptions of SVM and RF (and CART) are very short and do not help in understanding the methods (although I agree that extensive descriptions are not welcome). The main thing missing is an explanation of the hyperparameter tuning: which hyperparameters were taken into consideration, and how were their values determined? Section 3.2.2 helps with this, but it is still not enough.
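To illustrate the kind of tuning record I have in mind, here is a minimal sketch of an exhaustive grid search. Everything in it is an assumption made for illustration: the hyperparameter names and candidate values are hypothetical, and `toy_score` is only a stand-in for the cross-validated accuracy the authors would compute on their own samples. It is not the authors' procedure.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (score, params) pairs, best first."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(params), params))
    # Sort on the score only, highest first.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Hypothetical candidate values for a random forest (assumed, not from the MS).
rf_grid = {
    "number_of_trees": [50, 100, 200],
    "variables_per_split": [2, 3, 4],
    "min_leaf_population": [1, 5],
}


def toy_score(params):
    """Stand-in for cross-validated accuracy; replace with a real evaluation."""
    return (
        -abs(params["number_of_trees"] - 100) / 1000
        - abs(params["variables_per_split"] - 3) / 10
        - params["min_leaf_population"] / 100
    )


results = grid_search(rf_grid, toy_score)
best_score, best_params = results[0]
print(len(results), best_params)
```

Reporting the full `results` list (for example as a supplementary table), rather than only the selected setting, would let readers judge how sensitive the classification is to each hyperparameter.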
I do not see the point of presenting the equations of Precision, Recall and F1, plus RMSE, ubRMSE, Bias, R2 and MAPE. As far as I see, all papers include these equations, but I never see the reason. If these are shown, why are the equations of RF, SVM or CART not shown?
I would suggest avoiding direct citations such as the one in P8L206; instead, cite the illustrations in brackets and interpret them, in all instances.
Fig. 3 is a part of Fig. 2. I do not think it is a good idea to replicate the same figure in parts; refer to Fig. 2 instead.
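For reference, these metrics are standard and compact. The sketch below assumes the usual textbook definitions: counts of true positives, false positives and false negatives for the classification scores; ubRMSE as the square root of RMSE squared minus Bias squared; R2 as the coefficient of determination. The variants used in the MS may differ (R2, for instance, is sometimes the squared correlation), and the function and variable names are my own.

```python
import math


def classification_scores(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def area_errors(reference, estimate):
    """Error metrics between reference and estimated lake areas (same units, non-zero reference)."""
    n = len(reference)
    diffs = [e - r for r, e in zip(reference, estimate)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Unbiased RMSE: the RMSE with the mean error removed (clamped against rounding).
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    mape = 100 * sum(abs(d) / abs(r) for d, r in zip(diffs, reference)) / n
    mean_ref = sum(reference) / n
    ss_res = sum(d * d for d in diffs)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    r2 = 1 - ss_res / ss_tot
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "mape": mape, "r2": r2}


# Made-up numbers, purely to exercise the functions.
scores = classification_scores(tp=8, fp=2, fn=2)
errors = area_errors([100.0, 200.0, 300.0], [110.0, 210.0, 310.0])
print(scores, errors)
```

Since the definitions fit in a few lines, a short table or a citation would serve readers as well as the full set of equations.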
Results
The results are correctly presented; the figures aid understanding and serve as good background.
Discussion
The section “Comparison with other products” is a good point, but a wider discussion is needed. As I noted for the introduction, a global outlook would make the paper more popular, even if the comparison is not that direct.
https://global-surface-water.appspot.com/map
https://www.hydrosheds.org/
https://www.arcgis.com/apps/mapviewer/index.html?webmap=5d65be95ccc341d587896a81794021bf (JRC)
https://essd.copernicus.org/articles/17/2277/2025/
These are just some options, but they may give ideas for a comparison with existing datasets. I see the difference, but please point it out directly and discuss the similarities and differences, as is required in all scientific papers.
It may be my fault, but I did not find a download link to the data itself, so at this stage I have no direct impression of the quality of the dataset.