the Creative Commons Attribution 4.0 License.
Spatial Patterns of Sandy Beaches in China and Risk Analysis of Human Infrastructure Squeeze Based on Multi-Source Data and Ensemble Learning
Abstract. Sandy beaches provide essential ecological and economic services, but their functions are increasingly threatened by human activities. Analyzing the spatial distribution of China's sandy beaches and the impacts of human activities offers valuable insights for coastal resource management and ecological protection. However, remote sensing technologies face challenges such as limited data sources and tidal influences, which affect recognition accuracy. Therefore, integrating multi-source remote sensing data and reducing the impact of tidal fluctuations to improve recognition accuracy remains a key challenge. This study proposes an innovative approach utilizing multi-source data and an ensemble learning model to identify sandy beaches in China (2016–2023). By integrating Sentinel-1/2 satellite data, terrain data, and nighttime light data, along with spectral, terrain, texture, and polarization features, sandy beaches were identified across multiple years, and the results were consolidated into a single-year dataset to analyze spatial patterns and risks from human infrastructure squeeze. (1) High-precision classification identified 2984 sandy beaches in China, covering a total area of 260.70 km². Guangdong had the largest number, area, and perimeter, while Shanghai had the widest sandy beaches. (2) In Fujian, Guangdong, and Taiwan, the identified sandy beaches covered 149.68 km², with perimeters of 5155.91 km and widths of 49.50 m, 32.83 m, and 50.70 m, respectively. These results were significantly better than those from reference datasets. (3) From 1990 to 2023, the area at risk from human infrastructure squeeze increased from 109.95 km² to 245.58 km², a rise of 135.63 km², with the most significant increase occurring between 1990 and 2000. Guangdong and Fujian showed growth rates of 1.05 km²/year and 0.73 km²/year, respectively. This study provides an up-to-date dataset on China's sandy beaches. It assesses their spatial patterns and human impact risks, contributing to research and policy for the sustainable development of coastal zones (https://doi.org/10.5281/zenodo.15307240, Meng et al., 2025).
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-264', Anonymous Referee #1, 21 Aug 2025
- CC1: 'Comment on: Spatial Patterns of Sandy Beaches in China and Risk Analysis of Human Infrastructure Squeeze Based on Multi-Source Data and Ensemble Learning', Xuecheng Zhou, 06 Oct 2025
This study makes a valuable contribution to coastal resource research by addressing long-standing challenges in beach identification and human impact assessment. Its most obvious strength is the innovative integration of multi-source remote sensing data and ensemble learning, which effectively overcomes the limitations of traditional single-source or single-model methods. By combining Sentinel-1/2 images, terrain data, and nighttime light data, and extracting four categories of features (spectral, terrain, texture, and polarization), the study constructs a stacking ensemble model that integrates RF, SVM, CART, and GBDT. The consistently high accuracy from 2016 to 2023, together with comparisons against three reference datasets, confirms the robustness of the method, particularly in reducing the misclassification of bare land and urban areas, a common issue in existing research. The resulting 10 m resolution beach dataset and regional pattern analysis fill a gap in long-term nationwide beach monitoring, providing a reliable data foundation for coastal ecological management.
This study also provides practical insights into the risk of human infrastructure squeeze. By establishing a 100 m buffer zone and analyzing impervious surface data from 1990 to 2023, it quantifies the increase in at-risk areas and the regional differences. Linking risk trends with economic factors further reveals the coupling between coastal urbanization and beach degradation, providing targeted guidance for policy-making, such as prioritizing protection in high-risk areas like Shandong and Guangdong.
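The buffer-based squeeze measure mentioned here can be illustrated on a toy raster. This is a minimal sketch, not the authors' implementation: the grid, the 10 m cell size, the Euclidean buffer measured between cell centres, and the function name `squeeze_area_m2` are all assumptions made for illustration.

```python
import math

def squeeze_area_m2(beach, impervious, cell_size_m=10.0, buffer_m=100.0):
    """Area of impervious cells lying within buffer_m of any beach cell.

    beach and impervious are sets of (row, col) cells on the same grid.
    Distances are measured between cell centres (an illustrative simplification).
    """
    radius_cells = buffer_m / cell_size_m
    at_risk = set()
    for (r, c) in impervious:
        for (br, bc) in beach:
            if math.hypot(r - br, c - bc) <= radius_cells:
                at_risk.add((r, c))
                break
    return len(at_risk) * cell_size_m ** 2

# Hypothetical scene: a beach strip along row 0, built-up cells inland.
beach = {(0, c) for c in range(20)}
impervious_1990 = {(15, 5), (12, 8)}  # both farther than 100 m from the beach
impervious_2023 = impervious_1990 | {(4, 3), (9, 10), (10, 19)}

area_1990 = squeeze_area_m2(beach, impervious_1990)
area_2023 = squeeze_area_m2(beach, impervious_2023)
```

Comparing `area_1990` with `area_2023` gives the change in at-risk area for this toy scene. A real analysis would presumably use projected vector or raster data and a GIS library rather than a brute-force distance loop.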
However, several aspects deserve improvement. First, tidal disturbances have not been fully resolved. Although multi-year data can alleviate tidal effects, the lack of tidal phase matching may introduce spatial inconsistency in beach extraction. Future work could integrate tidal prediction models or in situ tide gauge data to select time-series images with consistent tidal conditions, improving spatiotemporal accuracy. Second, the assessment of infrastructure squeeze is relatively simple: relying solely on impervious surface expansion and buffer zone analysis cannot capture dynamic and detailed impacts. Higher-resolution data and multi-criteria models would improve the granularity of risk attribution.
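The idea of selecting images with consistent tidal conditions can be sketched as a simple filter. The example below assumes that a tide level has already been obtained for each acquisition time (from a tide model or a gauge). The `Scene` class, the scene identifiers, the tide values, and the 0.3 m tolerance are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    scene_id: str        # hypothetical identifier
    tide_level_m: float  # water level at acquisition time (assumed known)

def select_tide_consistent(scenes, reference_level_m, tolerance_m=0.3):
    """Keep scenes whose tide level is within tolerance_m of the reference level."""
    return [s for s in scenes
            if abs(s.tide_level_m - reference_level_m) <= tolerance_m]

scenes = [
    Scene("A", -0.9),
    Scene("B", 0.1),
    Scene("C", 0.25),
    Scene("D", 1.4),
    Scene("E", -0.2),
]
selected = select_tide_consistent(scenes, reference_level_m=0.0, tolerance_m=0.3)
selected_ids = [s.scene_id for s in selected]
```

Only the retained scenes would then enter the annual composite. The trade-off is fewer usable images per year, which matters most where cloud cover already limits coverage.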
In addition, the interpretability of the ensemble model could be enhanced. The study did not analyze feature importance or the complementary or redundant relationships among the base models. Adding SHAP values or a permutation importance analysis would clarify the contribution of each feature and help optimize the model structure, reducing computational cost without sacrificing accuracy.
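Permutation importance, one of the two options named here, can be sketched in a few lines of plain Python. The data, the two features, and `toy_model` below are hypothetical stand-ins. A real analysis would apply the same procedure to the trained ensemble and the actual feature stack.

```python
import random

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical data: feature 0 stands in for an informative feature,
# feature 1 for a feature the model ignores.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

def toy_model(row):
    # Stand-in for a trained classifier; it only looks at feature 0.
    return int(row[0] > 0.5)

importances = permutation_importance(toy_model, X, y)
```

A feature whose shuffling leaves accuracy unchanged (feature 1 here) is a candidate for removal, which is how such an analysis could reduce computational cost.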
Overall, this study establishes a solid benchmark for coastal beach research, balancing methodological rigor and practical value. Addressing the aforementioned limitations will further enhance its scientific impact and practicality for sustainable coastal area management.
- RC2: 'Comment on essd-2025-264', Anonymous Referee #2, 06 Oct 2025
This study establishes a coastal sandy beach recognition framework for China based on multi-source remote sensing data and a stacking ensemble learning approach, producing a nationwide dataset with 10 m resolution from 2016 to 2023. The research demonstrates significant innovation and practical value in terms of data, methodology, results, and applications: the fusion of multi-source data effectively enhances classification reliability, the ensemble learning framework substantially improves accuracy, and high-resolution national-scale mapping has been achieved, revealing the spatiotemporal evolution of human infrastructure encroachment risks from 1990 to 2023. This would provide robust scientific support for coastal ecological conservation and sustainable management. The overall structure of the manuscript is logical, with adequate support from figures and data, and it meets the requirements for journal publication. In general, the work holds considerable academic value and practical significance, and I recommend acceptance after minor revisions to further improve rigor and readability.
Specific comments:
- L161: The window size used for gray-level co-occurrence matrix (GLCM) texture feature extraction (e.g., 3×3, 5×5) is not reported. Please specify and explain the choice, as window size has a substantial impact on metrics such as Entropy and ASM.
- L304: The overall English fluency is good; however, some minor errors in singular/plural forms and article usage remain. For example, in the Section 5.2 sentence “Stacking not only ensures competitive accuracy but also offers strong applicability…”, the term “applicability” could be replaced with “generalizability”. A final proofreading or use of a grammar-checking tool is recommended.
- In Figures 1, 2, 5, 6, 10, and 13, the blue regions represent the ocean. However, adjacent countries bordering China are not labeled, which may cause confusion. It is recommended to label neighboring countries to avoid potential misinterpretation.
- In Figure 7, consider including Sentinel-2 true-color imagery for comparison with the classification results, thereby enhancing the intuitiveness and persuasiveness of the figure.
- Several figure captions contain redundancies or unclear phrasing. Please review and refine them to ensure clarity and conciseness.
- The minimum detectable sandy beach size is not explicitly addressed. Please clarify this aspect in the discussion.
- The conclusion primarily emphasizes the value of the study, but provides limited discussion of methodological limitations and future perspectives. Please expand this section accordingly.
- Inconsistent use of Chinese and English punctuation marks is observed. Please standardize formatting throughout the manuscript.
- The conclusion would benefit from elaborating on the transferability of the method, such as its applicability to other countries or to datasets with different spatial resolutions.
- Some English references do not follow the journal’s formatting guidelines regarding author abbreviations (e.g., use of “and” vs. “&”). Please check and revise.
- For Table 3, please clarify whether the model parameters were optimized through hyperparameter tuning or adopted as default values.
- L35–40: The statement that “traditional field surveys are inefficient” could be strengthened by citing 1–2 recent field-based studies (within the past three years), to demonstrate coverage of the latest research progress.
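The GLCM comment (L161) above notes that window size strongly affects Entropy and ASM. The sketch below illustrates that sensitivity on a tiny, hypothetical image of quantized gray levels, using a single horizontal pixel offset. It is a plain-Python illustration under those assumptions, not the manuscript's feature-extraction pipeline.

```python
import math
from collections import Counter

def glcm_stats(window, dx=1, dy=0):
    """Entropy and ASM of the co-occurrence matrix for one pixel offset."""
    pairs = Counter()
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(window[r][c], window[r2][c2])] += 1
    total = sum(pairs.values())
    probs = [n / total for n in pairs.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    asm = sum(p * p for p in probs)
    return entropy, asm

def centered_window(image, row, col, size):
    """Square window of odd size centred on (row, col); must fit in the image."""
    half = size // 2
    return [line[col - half: col + half + 1]
            for line in image[row - half: row + half + 1]]

# Hypothetical 7x7 image: a uniform 3x3 patch surrounded by varied gray levels.
image = [
    [0, 1, 2, 3, 0, 1, 2],
    [1, 2, 3, 0, 1, 2, 3],
    [2, 3, 1, 1, 1, 0, 1],
    [3, 0, 1, 1, 1, 1, 2],
    [0, 1, 1, 1, 1, 2, 3],
    [1, 2, 3, 0, 1, 2, 3],
    [2, 3, 0, 1, 2, 3, 0],
]

entropy_3, asm_3 = glcm_stats(centered_window(image, 3, 3, 3))
entropy_5, asm_5 = glcm_stats(centered_window(image, 3, 3, 5))
```

For the same centre pixel, the 3×3 window sees only the uniform patch (zero entropy, ASM of 1), while the 5×5 window mixes in neighbouring textures. On narrow features such as beaches, a larger window would therefore blend beach texture with adjacent water or land.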
Citation: https://doi.org/10.5194/essd-2025-264-RC2
Data sets
Spatial Patterns of Sandy Beaches in China and Risk Analysis of Human Infrastructure Squeeze Based on Multi-Source Data and Ensemble Learning Jie Meng, Duanyang Xu, Zexing Tao, Quansheng Ge https://zenodo.org/records/15307240
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
1,280 | 69 | 39 | 1,388 | 23 | 31 | 42 |
I have carefully read the manuscript entitled « Spatial Patterns of Sandy Beaches in China and Risk Analysis of Human Infrastructure Squeeze Based on Multi-Source Data and Ensemble Learning » by Jie Meng et al. and tested the associated datasets available on Zenodo. It presents original information on sandy beach locations throughout China in the form of a shapefile produced using ensemble learning and multi-source data, from which various beach spatial characteristics (e.g., number, width, area) and the risk of coastal squeeze can be assessed and analyzed. This is not the first time sandy beaches have been mapped in China using satellite imagery (existing datasets are mentioned and compared by the authors), yet it is the first time ensemble learning has been used, which according to the authors improved detection accuracy. Besides, whereas previous studies used Sentinel-2 data only, Sentinel-1 and Google Earth imagery were also included in this study. I found the manuscript well written and illustrated, although additional careful proofreading by the authors could have avoided several mistakes such as identical section and figure titles, repetitions, and typos. Figure captions in particular must be improved. I attach an annotated version of the manuscript listing a number of technical corrections and specific comments that the authors should take into account while preparing their revision.
My main comment relates to the evaluation of the data produced and concerns the truth data used for the evaluation, as well as the quality metrics selected. It is not clear how you produced the truth data (testing set) and why it differs from your Dataset 1, which was also produced through visual interpretation (since the same method was used, the results should be the same; however, Table 6 indicates this is not the case). I think this requires further explanation in the manuscript. You mention that 14,694 features were labelled for testing. However, this spans five land cover classes whose proportions vary from year to year. This means that, in general, fewer than 400 sandy beaches were used to evaluate your dataset for each year, which is far fewer than 14,694!
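The order of magnitude of this estimate can be checked with a rough calculation. The sketch assumes the labelled features are split evenly across the five classes and across the eight mapping years 2016–2023. The comment notes that the actual split varies from year to year, so this is only an approximation.

```python
total_features = 14_694        # labelled test features cited in the comment
n_classes = 5                  # land cover classes
n_years = 2023 - 2016 + 1      # mapping period 2016-2023, inclusive

# Even-split assumption: same number of features per class and per year.
per_class_per_year = total_features / (n_classes * n_years)
```

Under the even-split assumption, `per_class_per_year` is about 367, consistent with the figure of fewer than 400 sandy beach samples per year.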
Concerning quality metrics, why were they chosen, and are they really complementary to each other? Tables 5 and 6 suggest similar interpretations can be made for all metrics as results do not differ significantly. Additional information should be provided to guide readers in how to interpret these quality metrics and the results obtained.
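Whether metrics are complementary can be seen by computing several of them from one confusion matrix. The sketch below uses overall accuracy, Cohen's kappa, and the F1 score for a rare class. Whether these match the metrics in Tables 5 and 6 is an assumption, and the counts are hypothetical.

```python
def classification_metrics(cm, positive=0):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    overall_accuracy = sum(cm[i][i] for i in range(k)) / n
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (overall_accuracy - expected) / (1 - expected)
    tp = cm[positive][positive]
    recall = tp / row_tot[positive]      # producer's accuracy
    precision = tp / col_tot[positive]   # user's accuracy
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts: class 0 = sandy beach (rare), class 1 = everything else.
cm = [[90, 10],
      [5, 895]]
metrics = classification_metrics(cm)
```

With these counts, overall accuracy is 0.985 while kappa and the beach-class F1 are both lower (about 0.91 and 0.92). When one class dominates, overall accuracy alone can look better than the per-class performance, which is one reason to explain how each metric should be read.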
Likewise, the authors evaluate other (independent) datasets (some produced by the authors themselves, e.g., Dataset 1), which they name reference datasets. I think « reference » is misused in this context, as it generally implies that the data were used for validation (truth data), which is not the case here.
Additional information could be provided and discussed regarding the limitations and capabilities of the machine learning method used. What is the minimum beach size that can be detected? Looking at the dataset, it seems small pocket beaches can remain undetected. In contrast, elongated features (some artificial) may be erroneously detected as sandy beaches. What is the impact of tide range (potentially variable across China), and of images being acquired at different tide levels, on data consistency, and how can this be improved? It is said that using annual averages (composite images) helps mitigate these issues, but it is not clear how, particularly as Figure 2 shows a heterogeneous spatial distribution of satellite images, and with regular revisits, images are certainly obtained at different tide levels throughout the country. Systematic biases could arise from this in the spatial characteristics deduced from the dataset (e.g., beach width and area). Was this mitigated in this study, and if so, how?
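The size of the possible tidal bias can be gauged with a simple geometric estimate. On a planar beach face, a difference in water level moves the waterline horizontally by the level difference divided by the slope. The planar-beach assumption, the 1 m tide difference, and the slope of 0.05 are hypothetical values chosen only for illustration.

```python
def waterline_shift_m(tide_difference_m, beach_slope):
    """Horizontal waterline displacement on a planar beach face.

    beach_slope is rise over run (tan of the beach-face angle).
    """
    return tide_difference_m / beach_slope

# Hypothetical values: 1 m difference in tide level, beach slope of 0.05.
shift = waterline_shift_m(tide_difference_m=1.0, beach_slope=0.05)
```

Under these assumptions the waterline moves about 20 m, a large fraction of the 32.83 m to 50.70 m widths reported in the abstract, so tide level at acquisition time could plausibly affect the derived widths and areas.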
The temporal change in coastal squeeze is assessed over the period 1990-2024, yet beach distribution data outside 2016-2024 are not available. Thus, which data did you use for sandy beach spatial coverage? Currently, this is not explained in the text.
In conclusion, although I see potential in the manuscript, I recommend that moderate to major revisions be undertaken before it can be fully considered for eventual publication in ESSD. I hope you will find my comments useful for preparing your revised manuscript.