Spatial Patterns of Sandy Beaches in China and Risk Analysis of Human Infrastructure Squeeze Based on Multi-Source Data and Ensemble Learning
Abstract. Sandy beaches provide essential ecological and economic services, but their functions are increasingly threatened by human activities. Analyzing the spatial distribution of China's sandy beaches and the impacts of human activities offers valuable insights for coastal resource management and ecological protection. However, remote sensing technologies face challenges such as limited data sources and tidal influences, which affect recognition accuracy. Therefore, integrating multi-source remote sensing data and reducing the impact of tidal fluctuations to improve recognition accuracy remains a key challenge. This study proposes an innovative approach utilizing multi-source data and an ensemble learning model to identify sandy beaches in China (2016–2023). By integrating Sentinel-1/2 satellite data, terrain data, and nighttime light data, along with spectral, terrain, texture, and polarization features, sandy beaches were identified across multiple years, and the results were consolidated into a single-year dataset to analyze spatial patterns and risks from human infrastructure squeeze. (1) High-precision classification identified 2984 sandy beaches in China, covering a total area of 260.70 km2. Guangdong had the largest number, area, and perimeter, while Shanghai had the widest sandy beaches. (2) In Fujian, Guangdong, and Taiwan, the identified sandy beaches covered 149.68 km2, with perimeters of 5155.91 km and widths of 49.50 m, 32.83 m, and 50.70 m, respectively. These results were significantly better than those from reference datasets. (3) From 1990 to 2023, the area at risk from human infrastructure squeeze increased from 109.95 km2 to 245.58 km2, a rise of 135.63 km2, with the most significant increase occurring between 1990 and 2000. Guangdong and Fujian showed growth rates of 1.05 km2/year and 0.73 km2/year, respectively. This study provides an up-to-date dataset on China's sandy beaches. 
It assesses their spatial patterns and human impact risks, contributing to research and policy for the sustainable development of coastal zones (https://doi.org/10.5281/zenodo.15307240, Meng et al., 2025).
I have carefully read the manuscript entitled "Spatial Patterns of Sandy Beaches in China and Risk Analysis of Human Infrastructure Squeeze Based on Multi-Source Data and Ensemble Learning" by Jie Meng et al. and tested the associated datasets available on Zenodo. It presents original information on sandy beach locations throughout China in the form of a shapefile produced using ensemble learning and multi-source data, from which various beach spatial characteristics (e.g., number, width, area) can be derived and the risk of coastal squeeze can be assessed. This is not the first time sandy beaches have been mapped in China using satellite imagery (the authors mention existing datasets and compare against them), yet it is the first time ensemble learning has been used, which according to the authors improved detection accuracy. In addition, whereas previous studies used Sentinel-2 data only, this study also includes Sentinel-1 and Google Earth imagery. I found the manuscript well written and illustrated, although additional careful proofreading by the authors could have avoided several mistakes, such as identical section and figure titles, repetitions, and typos. Figure captions in particular must be improved. I attach an annotated version of the manuscript listing a number of technical corrections and specific comments that the authors should take into account when preparing their revision.
My main comment relates to the evaluation of the data produced and concerns the truth data used for the evaluation, as well as the quality metrics selected. It is not clear how you produced the truth data (testing set) and why it differs from your Dataset 1, which was also produced through visual interpretation (since the same method was used, the results should be the same, yet Table 6 indicates this is not the case). I think this requires further explanation in the manuscript. You mention that 14,694 features were labelled for testing. However, these span 5 land cover classes, in proportions that vary from year to year. This means that, in general, fewer than 400 sandy beaches were used to evaluate your dataset for each year, which is far fewer than 14,694!
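The order of magnitude behind this estimate can be checked with a short calculation. This is only a back-of-envelope sketch: it assumes the labelled features cover the eight years 2016 to 2023 given in the abstract and are split evenly across classes and years, which the manuscript does not state (the actual proportions vary from year to year). The function name is illustrative.

```python
# Back-of-envelope check of the per-class, per-year test sample size.
TOTAL_LABELLED = 14_694  # labelled test features, as quoted from the manuscript
N_CLASSES = 5            # land cover classes, as quoted from the manuscript
N_YEARS = 8              # ASSUMPTION: 2016-2023 inclusive, as in the abstract


def mean_samples_per_class_year(total: int, n_classes: int, n_years: int) -> float:
    """Average number of test features per class and per year.

    ASSUMPTION: an even split across classes and years; the real split
    is uneven, so individual years may lie above or below this value.
    """
    return total / (n_classes * n_years)


per_cell = mean_samples_per_class_year(TOTAL_LABELLED, N_CLASSES, N_YEARS)
print(f"about {per_cell:.0f} features per class per year on average")
```

Under these assumptions the average falls below 400 per class and year, which is the scale of the concern raised here; the authors' actual per-year counts for the sandy beach class would settle the question.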
Concerning quality metrics, why were they chosen, and are they really complementary to each other? Tables 5 and 6 suggest similar interpretations can be made for all metrics as results do not differ significantly. Additional information should be provided to guide readers in how to interpret these quality metrics and the results obtained.
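To illustrate what complementary metrics would look like, the sketch below computes several commonly used accuracy measures from a single-class confusion matrix. The metric set (precision, recall, F1, overall accuracy, Cohen's kappa) is my assumption and may not match the selection in Tables 5 and 6, and the counts are hypothetical, chosen only to show a case where the metrics diverge.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy metrics for one class (e.g., sandy beach vs. everything else)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)          # share of detections that are correct
    recall = tp / (tp + fn)             # share of true beaches that are detected
    f1 = 2 * precision * recall / (precision + recall)
    overall_accuracy = (tp + tn) / total
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
    }


# HYPOTHETICAL counts: a rare class with many missed detections.
skewed = binary_metrics(tp=60, fp=5, fn=40, tn=895)
for name, value in skewed.items():
    print(f"{name:>16}: {value:.3f}")
```

In this hypothetical case overall accuracy remains high because the background class dominates, while recall is much lower. If all metrics in the manuscript move together, it would help to explain what each one adds for a rare class such as sandy beaches.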
Likewise, the authors evaluate other (independent) datasets (some produced by the authors themselves, e.g., Dataset 1), which they call reference datasets. I think "reference" is misused in this context, as it generally implies that the data were used for validation (truth data), which is not the case here.
Additional information could be provided and discussed regarding the reliability and capabilities of the machine learning method used. What is the minimum beach size that can be detected? Looking at the dataset, it seems small pocket beaches can remain undetected. In contrast, elongated features (some artificial) may be erroneously detected as sandy beaches. What is the impact of tidal range (potentially variable across China), and of images being acquired at different tide levels, on data consistency, and how can this be improved? It is said that using annual averages (composite images) helps mitigate this issue, but it is not clear how, particularly as Figure 2 shows a heterogeneous spatial distribution of satellite images; with regular revisits, images are certainly obtained at different tide levels throughout the country. Systematic biases could arise from this in the spatial characteristics deduced from the dataset (e.g., beach width and area). Was this mitigated in this study, and if so, how?
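The possible size of such a bias can be gauged with simple geometry. On an idealized planar beach face, a difference in water level shifts the waterline horizontally by the level difference divided by the beach slope. The sketch below assumes that planar geometry, and the tide differences and slopes are hypothetical values, not taken from the manuscript.

```python
def waterline_shift(delta_tide_m: float, beach_slope: float) -> float:
    """Horizontal waterline displacement (m) on a planar beach face.

    delta_tide_m: difference in water level between two acquisitions (m)
    beach_slope:  tangent of the beach-face angle (dimensionless)
    ASSUMPTION: a planar beach face; real profiles are more complex.
    """
    if beach_slope <= 0:
        raise ValueError("beach_slope must be positive")
    return delta_tide_m / beach_slope


# HYPOTHETICAL scenarios: (water-level difference in m, beach slope).
for delta, slope in [(0.5, 0.10), (1.0, 0.05), (2.0, 0.05)]:
    shift = waterline_shift(delta, slope)
    print(f"level difference {delta} m, slope {slope}: shift of {shift:.1f} m")
```

Shifts of this order would not be negligible next to the mean widths of 32.83 m to 50.70 m reported in the abstract, which is why the tide level represented by each annual composite should be documented.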
The temporal change in coastal squeeze is assessed over the period 1990-2024, yet beach distribution data outside 2016-2024 are not available. Thus, which data did you use for sandy beach spatial coverage? Currently, this is not explained in the text.
In conclusion, although I see potential in the manuscript, I recommend that moderate to major revisions be undertaken before it can be fully considered for publication in ESSD. I hope you will find my comments useful in preparing your revised manuscript.