the Creative Commons Attribution 4.0 License.
ASM-SS: The First Quasi-Global High Spatial Resolution Coastal Storm Surge Dataset Reconstructed from Tide Gauge Records
Abstract. Storm surges (SSs) cause massive loss of life and property in coastal areas each year. High spatial resolution and long-term SS records are the basis for deepening our understanding of this disaster. However, such global or quasi-global scale information could only be simulated by global numerical models until now due to the sparse and uneven distribution of tide gauge stations. In this paper, the all-site modeling framework for the data-driven model was implemented on a quasi-global scale within areas severely affected by SSs caused by tropical and extratropical cyclones. Compared to single-site modeling data-driven models, it can provide SS information for ungauged points. Compared to numerical models, it can reconstruct long-term SSs faster with fewer computational resources. We generated the first high spatial resolution (every 10 km per station along the coastline) hourly SS dataset ASM-SS (all-site modeling storm surge) within 45° S to 45° N, whose record length is over 80 years from 1940 to 2020. Assessments indicate that for 95th extreme SSs, the precision of this model (medians of correlation coefficients, root mean square errors, and mean biases are 0.66, 9 cm, and -4.4 cm, respectively) is slightly better than that of the state-of-the-art global hydrodynamic model (medians are 0.58, 10.8 cm, and -4.3 cm); for annual maximum SSs, our model is more stable than the numerical model with overall root mean square error and coefficient of determination optimizing by around 23.1 % and 14.8 %, respectively. This dataset could provide possible alternative support for coastal communities to estimate return levels of extremes, analyze variations (intensity, frequency, and trend) of SSs, and other relevant applications. The ASM-SS dataset is available at https://doi.org/10.5281/zenodo.13293595 (Yang et al., 2024a).
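The three statistics quoted in the abstract (correlation coefficient, root mean square error, and mean bias for extreme surges) can be illustrated with a minimal sketch. It assumes that "95th extreme SSs" means hours on which the observed surge is at or above the 95th percentile of the tide-gauge record, and that bias is defined as model minus observation; the function name `extreme_metrics` and its arguments are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def extreme_metrics(obs, mod, q=95.0):
    """Correlation, RMSE and mean bias for observed extremes.

    Assumption: extremes are time steps where the observed surge is
    at or above the q-th percentile of the observed series.
    """
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)

    # Keep only time steps where both series have data.
    valid = ~(np.isnan(obs) | np.isnan(mod))
    obs, mod = obs[valid], mod[valid]

    # Select the extremes from the observed record.
    threshold = np.percentile(obs, q)
    sel = obs >= threshold
    o, m = obs[sel], mod[sel]

    corr = float(np.corrcoef(o, m)[0, 1])         # Pearson correlation
    rmse = float(np.sqrt(np.mean((m - o) ** 2)))  # root mean square error
    bias = float(np.mean(m - o))                  # model minus observation
    return corr, rmse, bias
```

Under this sign convention a negative bias, as reported for both models, would mean that the extremes are underestimated on average.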
Status: final response (author comments only)
RC1: 'Comment on essd-2024-350', Anonymous Referee #1, 04 Sep 2024
Review of the paper “ASM-SS: The First Quasi-Global High Spatial Resolution Coastal Storm Surge Dataset Reconstructed from Tide Gauge Records” by Lianjun Yang, Taoyong Jin, and Weiping Jiang
Recommendation – a major revision is required before the paper can be published.
The paper describes a new dataset of storm-surges. The methodology is well described and written. However, a main issue needs to be addressed before the paper can be recommended for publication. The purpose of the dataset development is not well grounded and presented.
For example, section 3.3 opens with:
“The comparison with TGs indicated that ASM is slightly better than GTSM in the research area. Moreover, the significant advantages of the ASM data-driven model are: (1) compared to single-site data-driven models, it can provide SS information for ungauged points; (2) compared to numerical models, it can reconstruct long-term SSs faster.”
These are advantages of the model, whereas the paper should focus on the benefits of the dataset. One can argue that a single-site data-driven model can provide higher accuracy, and that numerical models can also give results at ungauged points and with higher resolution than the newly developed dataset. In that case, who are the target users of the dataset?
The paper should be revised to clarify the advantages and novelty of the dataset. Here are some points to consider:
- One of the key features is the time span. Does the accuracy change over time? It must vary as it is directly affected by the ERA5 inputs.
- The discussion of spatial variability in accuracy needs elaboration. The boxed continental statistics look nice but do not address all the differences shown. For example, performance along the U.S. and Canadian coasts is much better than in Central America. That could be attributed to the climatology, forcing fields, or TG availability.
- Adding a measure of accuracy/quality for each output point, based on the authors’ evaluations, would be of great help to users.
- Excluding most of Europe devalues the significance of the dataset as it is supposed to be “quasi-global”.
Specific Remarks:
Lines 46-47: “Changes in SSs also need to be discussed. Unlike the widely agreed impact of sea level rise on ESLs, the contribution from changes in SSs remains controversial.”
The sentence is unclear and confusing. Please, rewrite it.
Line 57: “There are three main ways to obtain high-frequency...”. Listing the three ways together is problematic, as data-driven models depend on TG observations. Additionally, the authors’ data-driven model also depends on a numerical model. It would be better to introduce the concept of the data-driven model afterwards.
Line 66: “However, numerical models are relatively time-consuming, especially for the global simulation.” Computation time is not the main concern for dataset users. What are the disadvantages of numerical models in terms of their performance?
Line 85: “Therefore, the ASM provides an opportunity to obtain global long-term and high spatial coverage SSs simply and efficiently.” This paragraph emphasizes the effectiveness of the model rather than ensuring that data-driven models provide reliable results. Please rewrite it.
Line 95: The section opens by describing the inputs to the model. It would be better to start by explaining the methodology in general.
Line 98: https://doi.org/10.1002/qj.4803 should also be referenced since you use the extended version of ERA5.
Line 131: “every 20 km per coastal station.” The choice of the 10 km resolution in the developed dataset is unclear and not explained. The numerical model resolution is 2.5 and the data-driven model resolution is 20 km, yet the dataset is provided at 10 km. Please justify this choice.
Line 137: “coastal stations with a 10 km resolution”. Using the term “station” here is confusing. Please find a different name for the model output points (maybe nodes).
Line 175: “In addition, as mentioned in the introduction, our analysis here did not focus on the equatorial region (~6°S to ~6°N), the South Atlantic, and the southeastern Pacific.”
If these areas are part of the published dataset, they must be evaluated as well. It is suggested to present the evaluation of the entire domain as the primary result and the limited evaluations in the appendix.
Line 180: The Fig. 3 caption says “Model evaluation at tide gauges..”. Since several models are involved in the work, the naming should be consistent to avoid confusion.
Line 214: “the significant advantages of the ASM data-driven model are: (1) compared to single-site data-driven models, it can provide SS information for ungauged points; (2) compared to numerical models, it can reconstruct long-term SSs faster. In this section” As mentioned above, these advantages relate first and foremost to the model itself rather than to the developed dataset and why it should be preferred over similar datasets.
Appendix A: Either add text to the appendix or move the figures to the sections which mention them.
Overall, the authors have shown that the new dataset has potential, but the paper requires significant improvement before it can be published.
Citation: https://doi.org/10.5194/essd-2024-350-RC1
- AC1: 'Response to Reviewer #1 Comments', Lianjun Yang, 04 Nov 2024
RC2: 'Comment on essd-2024-350', Anonymous Referee #2, 08 Oct 2024
The manuscript uses machine learning methods to establish relationships between tide gauge measurements and several atmospheric and oceanic variables, generating a global coastal storm surge dataset at 10 km spatial resolution. Overall, the generated dataset is of substantial application value, and the validation results show strong performance, particularly in the reconstruction of extreme values—a known challenge for AI models. The topic aligns well with the aims of ESSD. However, there are several key areas that require attention to ensure the manuscript is clear, methodologically sound, and accessible to readers.
Majors:
- The discussion of previous studies in the introduction lacks depth. The authors list previous studies without effectively explaining how the current work advances the field. To strengthen this section, the introduction should focus more on the existing gaps in storm surge modeling and how the proposed dataset addresses those shortcomings. The classification of storm surge research is overly simplified. The four categories mentioned in the second paragraph overlap and include one another. Moreover, the machine learning approach presented in this paper is described as separate from AI-based methods, though it clearly falls within that domain as a regression model. The difference between this approach and single-site models is primarily in the inputs used, such as geographic and temporal variables, but the fundamental methodology remains similar. A more refined categorization would provide better context for the reader.
- The description of the model’s methodology lacks sufficient detail on its innovations. For instance, the choice of specific atmospheric and oceanic variables from ERA5 should be justified, and the process of integrating geographical and temporal variables requires further explanation. How were these inputs pre-processed to allow for prediction across any coastal location or time? This is a key aspect of the model and should be clarified. Although more detailed explanations may have been presented in the authors' previous publications, it is still important to concisely convey these methodological details in this data-focused paper to ensure readers can fully understand the process without referring to other sources.
- One of the key strengths of the model is its superior performance in predicting extreme storm surge events compared to numerical models. However, the reasons behind this superior performance are not fully explored. A deeper analysis of why the machine learning model performs better than numerical models in extreme cases, particularly considering that AI models often struggle with extremes, would add significant value.
- While the manuscript provides a thorough discussion of the spatial performance of the dataset, it lacks an analysis of the model’s temporal performance. How does the model perform over the 1940–2020 period? Are there periods when the model is more or less accurate? Providing this temporal analysis would add an important dimension to the validation results.
- Figure 1 shows several tide gauge stations in South America and West Africa with long records, yet these regions are not featured in the validation results. The authors should explain why results from these areas were excluded from the analysis.
- The manuscript suffers from imprecise language and grammatical errors. Phrases like "coastline having complicated shapes" (line 41) and "internal climate variability" (line 49) are vague and not commonly used in geoscience literature. Additionally, phrases such as "numerical models are based on shallow water equations" (line 65) overly simplify the complexity of these models. Grammatical issues such as "until now" (line 9) and "will" (line 89) create ambiguity and should be corrected for clarity. Moreover, the manuscript contains an excessive number of speculative terms such as "some," "might," "may," and "slightly better." Scientific writing should avoid this level of uncertainty when possible, and more precise language should be used. Where quantifiable data are available, the authors should provide specific numbers to reduce ambiguity.
- The authors should ensure that the data description fully complies with the journal's requirements. Additional details about the structure and usage of the dataset may be necessary for ESSD’s standards.
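The temporal analysis requested above (whether accuracy changes over the 1940–2020 period) could be sketched as a per-decade error summary. This is only an illustration under stated assumptions: the function name `rmse_by_decade`, the decade binning, and the choice of RMSE as the metric are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def rmse_by_decade(years, obs, mod):
    """RMSE of modelled vs. observed surge, grouped by decade.

    years: calendar year of each sample (e.g. 1987)
    obs, mod: observed and modelled surge in the same units
    Returns a dict mapping the decade start year to the RMSE.
    """
    years = np.asarray(years)
    err = np.asarray(mod, dtype=float) - np.asarray(obs, dtype=float)

    # 1987 -> 1980, 1940 -> 1940, and so on.
    decades = (years // 10) * 10

    result = {}
    for d in np.unique(decades):
        e = err[decades == d]
        e = e[~np.isnan(e)]  # skip gaps in the tide-gauge record
        if e.size:
            result[int(d)] = float(np.sqrt(np.mean(e ** 2)))
    return result
```

A summary of this kind, computed at tide gauges with long records, would show whether errors are larger in earlier decades, when the reanalysis forcing is expected to be less well constrained by observations.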
Minors:
- The choice of an hourly temporal resolution for the dataset is not fully explained. The authors should provide a rationale for this decision, especially considering the implications for data volume and usability.
- The mention of "small phase shifts" (line 120) lacks context. The origin of these phase shifts and their impact on the results should be discussed in detail.
- Units such as cm/m should be standardized across the manuscript. Similarly, decimal precision should be consistent for a more professional and coherent presentation of the data.
- The gray lines in the figures (presumed to be tropical cyclone paths) should be explicitly described, and their inclusion justified. What purpose do these lines serve, and how do they enhance the understanding of the storm surge dataset?
- The same color bar is used for multiple metrics, which can create confusion. I recommend using separate color bars for each metric to avoid misinterpretation.
- The use of "surge" as a variable name in the NetCDF files is problematic, as it refers to a physical phenomenon rather than a dataset variable. I recommend choosing a more precise name that clearly describes the data field.
- Line 62: add a space in “abovementioned”.
Another comment:
The manuscript emphasizes the computational inefficiency of numerical models, but fails to acknowledge that AI models, particularly those involving extensive preprocessing, ground truth acquisition, and training, can also be computationally expensive. Large/big AI models often require substantial computing power. A more balanced comparison of the computational demands of AI models versus numerical models would provide a fairer perspective on the advantages and limitations of each approach.
Summary:
Overall, this manuscript presents a highly valuable and timely contribution to the field of storm surge modeling. The application of machine learning to generate a global, high-resolution dataset fills an important gap in coastal hazard prediction, especially for regions lacking sufficient observational data. The dataset’s strong performance in reconstructing extreme values, combined with its spatial resolution, demonstrates its potential for numerous applications in coastal risk management and scientific research. While there are areas that could benefit from further clarification and refinement, particularly in terms of methodological transparency and computational comparisons, the work is commendable. It reflects a significant step forward in leveraging AI for oceanographic data analysis, and with some improvements, it will undoubtedly become a highly valuable resource for the community.
Citation: https://doi.org/10.5194/essd-2024-350-RC2
- AC2: 'Response to Reviewer #2 Comments', Lianjun Yang, 04 Nov 2024
Data sets
ASM-SS: The First Quasi-Global High Spatial Resolution Coastal Storm Surge Dataset Reconstructed from Tide Gauge Records Lianjun Yang, Taoyong Jin, and Weiping Jiang https://doi.org/10.5281/zenodo.13293595