the Creative Commons Attribution 4.0 License.
OpenSWI: A Massive-Scale Benchmark Dataset for Surface Wave Dispersion Curve Inversion
Abstract. Surface wave dispersion curve inversion plays a critical role in both shallow geophysical exploration and deep geological studies, yet it remains hindered by sensitivity to initial models, susceptibility to local minima, and low computational efficiency. Recently, data-driven deep learning methods, inspired by their success in computer vision and natural language processing, have shown promising potential to overcome these challenges. However, the lack of large-scale and diverse benchmark datasets remains a major obstacle to the development and evaluation of such methods. To address this gap, we introduce OpenSWI, a comprehensive benchmark dataset generated through the Surface Wave Inversion Dataset Preparation (SWIDP) pipeline. OpenSWI comprises two synthetic datasets tailored to different research scales and application scenarios, namely OpenSWI-shallow and OpenSWI-deep, as well as an AI-ready real-world dataset for generalization evaluation, OpenSWI-real. OpenSWI-shallow is derived from the 2-D geological model dataset OpenFWI, containing over 22 million 1-D velocity profiles paired with their fundamental-mode phase and group velocity dispersion curves, spanning a broad spectrum of shallow geological structures (e.g., flat layers, faults, folds, and realistic stratigraphy). OpenSWI-deep is built from 14 global and regional 3-D geological models, comprising approximately 1.26 million high-fidelity 1-D velocity-dispersion data pairs for deep earth studies. OpenSWI-real, compiled from open-source projects, contains two sets of observed dispersion curves and their corresponding 1-D reference models, serving as a benchmark for evaluating the generalization of deep learning models. To demonstrate the utility of OpenSWI, we trained deep learning models on OpenSWI-shallow and OpenSWI-deep, and evaluated them on OpenSWI-real. 
The results show strong agreement between the predicted and reference velocity models, confirming the diversity and representativeness of the OpenSWI dataset. To facilitate the advancement of intelligent surface wave dispersion curve inversion techniques, we release the OpenSWI dataset (https://doi.org/10.5281/zenodo.16874111) and the SWIDP toolbox along with associated resources (https://doi.org/10.5281/zenodo.16884901), providing open resources to support the research community.
Status: final response (author comments only)
RC1: 'Comment on essd-2025-502', Filippo Gatti, 12 Jan 2026
- AC1: 'Reply on RC1', Feng Liu, 13 Mar 2026
RC2: 'Comment on essd-2025-502', Anonymous Referee #2, 02 Mar 2026
General Comments:
Liu et al. construct OpenSWI, a comprehensive benchmark dataset designed for surface wave dispersion curve inversion, comprising three subsets: OpenSWI-shallow, OpenSWI-deep, and OpenSWI-real. These datasets effectively address the growing need for large-scale and diverse training resources to facilitate AI-based inversion techniques in both shallow and deep geophysical applications. The manuscript presents a systematic and geologically informed workflow for dataset construction, generating a large number of velocity-dispersion curve pairs from multiple publicly available synthetic and real models. In addition, the authors develop a unified quality control and standardization process, together with several effective data augmentation strategies, to build a massive and structurally diverse dataset. Finally, the authors validate the feasibility and effectiveness of the datasets by testing on multiple real-world observation datasets.
Overall, this work is timely and potentially impactful. The scale of the dataset and the effort toward open-source release are commendable, and the proposed workflow provides a reproducible foundation for future dataset expansion. Nevertheless, several aspects of the data processing, forward modeling details, model training design, and overall presentation would benefit from further clarification and refinement. Addressing these issues would improve the clarity, methodological rigor, and reliability of the benchmark dataset for future applications.
Specific comments:
- Page 5, Lines 111-112, and Figures 2, 4: The authors mention that artifacts (e.g., zero or abnormal values) are corrected through interpolation or single-point removal during the quality control process. In Figure 2, the anomalous low-velocity point appears to be a numerical artifact introduced during interpolation after fault insertion, which may indeed be non-physical in the context of a normal fault setting. However, in the Flat–Fault and Fold–Fault models shown in Figure 4, some geological scenarios may involve reverse faulting or locally overturned strata. In such cases, localized low-velocity anomalies or sharp velocity inversions could be geologically reasonable rather than numerical artifacts. How does the quality control process distinguish between numerical artifacts and geologically meaningful velocity inversions? Please clarify.
- Page 7, Lines 141-144: The explanation of the procedures applied for depths <120 km and ≥120 km is unclear and potentially misleading. Although the manuscript states that Brocher’s empirical formulas are less applicable at depths ≥120 km, Brocher’s empirical relationship still appears to be used to compute ρ after deriving Vp from Vs based on a constant Vp/Vs assumption. Please clarify this workflow. Besides, the manuscript adopts a fixed Vp/Vs value of 1.79 for all depths below 120 km. Could this assumption reduce the variability, diversity, or realism of the dataset? Furthermore, might the use of different parameter conversion procedures above and below 120 km introduce an artificial discontinuity at this boundary?
- In the model training, the authors adopt MSE as the loss function for training the inversion model. Have alternative loss functions been evaluated, such as MAE or smoothed MAE (Huber loss)? Since MSE tends to promote smoother predictions, could this potentially affect the preservation of boundaries with sharp velocity discontinuities?
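For reference, the two-branch conversion questioned in the Brocher comment above can be sketched as follows. The polynomial coefficients are the published Brocher (2005) regression fits; the 120 km split and the fixed Vp/Vs = 1.79 follow the manuscript only as paraphrased in the comment, and all function names are illustrative:

```python
# Hedged sketch of the two-branch parameter conversion discussed above.
# Coefficients are from Brocher (2005); the 120 km boundary and the
# fixed Vp/Vs = 1.79 follow the manuscript as described in the comment.

def vp_from_vs_brocher(vs):
    """Brocher (2005) regression fit; Vs in km/s, valid for 0 < Vs < 4.5 km/s."""
    return 0.9409 + 2.0947 * vs - 0.8206 * vs**2 + 0.2683 * vs**3 - 0.0251 * vs**4

def rho_from_vp_nafe_drake(vp):
    """Nafe-Drake curve as fit by Brocher (2005); Vp in km/s, rho in g/cm^3."""
    return (1.6612 * vp - 0.4721 * vp**2 + 0.0671 * vp**3
            - 0.0043 * vp**4 + 0.000106 * vp**5)

def convert(vs, depth_km, vp_vs_deep=1.79):
    """Vs -> (Vp, rho), branching at the 120 km boundary as the comment describes.

    Note that rho is taken from the same Nafe-Drake fit on both branches,
    which is exactly the point the comment asks the authors to clarify.
    """
    vp = vp_from_vs_brocher(vs) if depth_km < 120.0 else vp_vs_deep * vs
    return vp, rho_from_vp_nafe_drake(vp)
```

One can check that the deep branch produces a jump relative to the shallow branch whenever the Brocher-implied Vp/Vs at 120 km differs from 1.79, which is the artificial discontinuity the comment warns about.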
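The loss alternatives raised in the comment above can be illustrated with a minimal NumPy sketch; the toy residuals and the Huber delta are illustrative values, not taken from the manuscript:

```python
import numpy as np

# Minimal illustration of the loss functions mentioned in the comment.
# The delta value and the toy arrays below are illustrative assumptions.

def mse(pred, target):
    return np.mean((pred - target) ** 2)

def mae(pred, target):
    return np.mean(np.abs(pred - target))

def huber(pred, target, delta=1.0):
    """Smoothed MAE: quadratic for small residuals, linear beyond delta."""
    r = np.abs(pred - target)
    quadratic = 0.5 * r**2
    linear = delta * (r - 0.5 * delta)
    return np.mean(np.where(r <= delta, quadratic, linear))

# A single large residual (e.g., a missed sharp velocity jump) dominates
# MSE far more than MAE or Huber, which is one reason MSE-trained models
# tend to smooth sharp discontinuities.
pred = np.array([3.0, 3.0, 3.0, 6.0])
target = np.array([3.0, 3.0, 3.0, 3.0])
```

On this toy example the single outlier residual contributes quadratically to MSE but only linearly to MAE and Huber, making the comment's concern about over-smoothed boundaries concrete.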
Technical comments:
- Page-3, Line 56: The citation format “...of the researchers Merrifield et al. (2022)” is inappropriate.
- Page-5, Lines 128-129: Please provide a detailed description of the de-duplication procedure. It would be helpful if the authors could clarify whether the de-duplication was implemented during the profile extraction stage (e.g., by applying a spatial sampling interval), or performed after extraction using a quantitative similarity criterion.
- Page-8, Lines 175-176: The expression “they provide deep learning models with ...” is somewhat informal, e.g., “they provide .... samples for model training, ...”
- Page-18, Lines 314-316: Please avoid using single-sentence paragraphs.
- Page-22, Lines 368-369: The described learning rate decay intervals (20 and 200 epochs) appear inconsistent with the corresponding figure 10, which seems to show decay at approximately 40 and 500 epochs. Please clarify.
- Page-23, Lines 399-401: Please avoid using single-sentence paragraphs.
- Figure 1 caption: The description “white box” appears inconsistent with the figure, as the box appears closer to gray instead of white.
- Figure 2: Please add the axis scale (with units) for the density curves, as currently only the velocity scale is shown.
- Figure 7: The central global map shows several gray dots. Could the authors clarify whether these are meaningful markers or possible visualization artifacts (e.g., due to low image resolution)?
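Regarding the de-duplication question raised above (Page-5, Lines 128-129), a post-extraction quantitative similarity criterion, one of the two options the comment asks about, might look like the following sketch. The RMS metric, the threshold, and all names are hypothetical, not the authors' procedure:

```python
import numpy as np

# Hypothetical sketch of post-extraction de-duplication of 1-D velocity
# profiles using a quantitative similarity criterion. The RMS distance
# metric and the tolerance `tol` (km/s) are illustrative assumptions.

def deduplicate(profiles, tol=0.05):
    """Greedily keep a profile only if its RMS distance to every
    already-kept profile exceeds `tol`."""
    kept = []
    for p in profiles:
        if all(np.sqrt(np.mean((p - q) ** 2)) > tol for q in kept):
            kept.append(p)
    return kept
```

A sketch like this would make explicit whether near-duplicate profiles (e.g., neighbors extracted from the same 2-D model) survive into the final dataset.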
Citation: https://doi.org/10.5194/essd-2025-502-RC2
- AC2: 'Reply on RC2', Feng Liu, 13 Mar 2026
Data sets
OpenSWI-dataset Feng Liu https://doi.org/10.5281/zenodo.16874111
Model code and software
OpenSWI-toolbox Feng Liu https://doi.org/10.5281/zenodo.16884901
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 532 | 506 | 40 | 1,078 | 36 | 44 |
The size and the extent of the proposed database are remarkable and certainly of interest for the community. However, there are a few issues that must be addressed before publication:
- extracting 1-D profiles from the same 3-D geology, while adding some random fluctuation, seems to create a bias in the dataset (the profiles are close to each other and all describe the same large geological structures).
- too little information is provided, even in the appendix, about the DDPM. In particular, how viable is it to expand the dataset with a diffusion model: does the DDPM reproduce the same statistics? How many iterations are needed to infer new samples? How diverse are those samples? Unless the DDPM has some novel feature, I think its role in this paper is rather marginal and can be overlooked; otherwise, it should be expanded to highlight its importance.
- what is the highest frequency that the geological models can propagate?
- are the random perturbations introduced by the authors consistent with the natural uncertainty? What about small-scale heterogeneity, which is well known to have a specific 3-D correlation structure? Why did the authors not include this in their dataset?
- The authors overlooked one major dataset, published in this journal in 2024, which provides 30,000 ground motion simulations including complex randomized geology:
Lehmann, F.; Gatti, F.; Bertin, M.; Clouteau, D. Synthetic Ground Motions in Heterogeneous Geologies from Various Sources: The HEMEW^S-3D Database. Earth Syst. Sci. Data 2024, 16 (9), 3949–3972. https://doi.org/10.5194/essd-16-3949-2024.
This database spans ~10×10 km² per sample and is constructed with minimal bias. Given that the dataset provides (geology, time-histories) pairs, it would be interesting to benchmark the proposed model out-of-distribution, which is the most difficult aspect of benchmarking a new ML model.
- The transformer architecture presented in the paper seems a little too advanced for such a simple task (dispersion curves vs. 1-D geological profile). It is necessary to benchmark it against existing alternative deep learning models in order to consider it a reliable alternative.
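The point above about correlated small-scale heterogeneity can be made concrete with a minimal sketch that draws depth-correlated (rather than pointwise-independent) perturbations for a 1-D profile. The exponential covariance model and all parameter values are illustrative assumptions, not taken from the manuscript:

```python
import numpy as np

# Hedged sketch of depth-correlated velocity perturbations, as opposed to
# independent pointwise noise. The exponential covariance C(h) =
# sigma^2 * exp(-|h| / corr_len) and every parameter value are
# illustrative choices for this example only.

def correlated_perturbation(n, dz, corr_len, sigma, rng):
    """Draw one correlated Gaussian perturbation for an n-point 1-D profile.

    n        : number of depth samples
    dz       : depth spacing (km)
    corr_len : correlation length (km)
    sigma    : standard deviation of the perturbation (km/s)
    """
    z = np.arange(n) * dz
    cov = sigma**2 * np.exp(-np.abs(z[:, None] - z[None, :]) / corr_len)
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n))  # jitter for stability
    return L @ rng.standard_normal(n)

rng = np.random.default_rng(0)
dv = correlated_perturbation(n=50, dz=0.1, corr_len=1.0, sigma=0.05, rng=rng)
```

Perturbations drawn this way vary smoothly over the correlation length instead of flipping sign at every depth sample, which is closer to the heterogeneity structure the comment refers to.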