Articles | Volume 18, issue 4
https://doi.org/10.5194/essd-18-2769-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
OpenSWI: a massive-scale benchmark dataset for surface wave dispersion curve inversion
Download
- Final revised paper (published on 21 Apr 2026)
- Preprint (discussion started on 05 Nov 2025)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on essd-2025-502', Filippo Gatti, 12 Jan 2026
  - AC1: 'Reply on RC1', Feng Liu, 13 Mar 2026
- RC2: 'Comment on essd-2025-502', Anonymous Referee #2, 02 Mar 2026
  - AC2: 'Reply on RC2', Feng Liu, 13 Mar 2026
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Feng Liu on behalf of the Authors (16 Mar 2026)
EF by Polina Shvedko (17 Mar 2026)
ED: Referee Nomination & Report Request started (17 Mar 2026) by Andrea Rovida
RR by Anonymous Referee #2 (28 Mar 2026)
RR by Filippo Gatti (10 Apr 2026)
ED: Publish as is (10 Apr 2026) by Andrea Rovida
AR by Feng Liu on behalf of the Authors (13 Apr 2026)
The size and extent of the proposed database are remarkable and certainly of interest to the community. However, there are a few issues that must be addressed before publication:
- Extracting 1D profiles from the same 3D geology, while adding some random fluctuation, seems to bias the dataset: the profiles are close to each other and all describe the same large geological structures.
- Too little information is provided, even in the appendix, about the DDPM, in particular on how viable it is to expand the dataset with a diffusion model. Does the DDPM reproduce the same statistics? How many iterations are needed to infer new samples? How diverse are those samples? Unless the DDPM has some novel feature, I think its role in this paper is rather marginal and it could be left out. Otherwise, its description should be expanded to highlight its importance.
- What is the highest frequency that the geological models can propagate?
- Are the random perturbations introduced by the authors consistent with natural uncertainty? What about small-scale heterogeneity, which is well known to have a specific 3D correlation structure? Why did the authors not include this in their dataset?
- The authors overlooked one major dataset, published in this journal in 2024, which provides 30000 ground motion simulations including complex randomized geology:
Lehmann, F.; Gatti, F.; Bertin, M.; Clouteau, D. Synthetic Ground Motions in Heterogeneous Geologies from Various Sources: The HEMEW^S-3D Database. Earth Syst. Sci. Data 2024, 16 (9), 3949–3972. https://doi.org/10.5194/essd-16-3949-2024.
This database spans a ~10 × 10 km² area for each sample and is constructed with minimal bias. Given that the dataset provides (geology, time-history) pairs, it would be interesting to benchmark the proposed model out-of-distribution, which is the most difficult aspect of benchmarking a new ML model.
- The transformer architecture presented in the paper seems a little too advanced for such a simple dataset (dispersion curves vs. 1D geological profiles). It needs to be benchmarked against existing alternative deep learning models before it can be considered a reliable alternative.
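
For illustration, the DDPM questions in the second item (same statistics? sample diversity?) could be checked with something like the sketch below. It is only a minimal example under stated assumptions: the function names are hypothetical, the profiles are assumed to be stored as arrays of shape (n_profiles, n_depths), and the data here are random placeholders, not OpenSWI profiles or DDPM outputs.

```python
import numpy as np


def depthwise_stats_gap(reference, generated):
    """Largest relative gap in per-depth mean and std between two profile sets.

    Both arrays are assumed to have shape (n_profiles, n_depths).
    """
    ref_mean, gen_mean = reference.mean(axis=0), generated.mean(axis=0)
    ref_std, gen_std = reference.std(axis=0), generated.std(axis=0)
    mean_gap = np.max(np.abs(gen_mean - ref_mean) / np.abs(ref_mean))
    std_gap = np.max(np.abs(gen_std - ref_std) / ref_std)
    return mean_gap, std_gap


def mean_pairwise_distance(profiles):
    """Average RMS difference over all distinct profile pairs (a crude diversity score)."""
    diff = profiles[:, None, :] - profiles[None, :, :]
    rms = np.sqrt((diff ** 2).mean(axis=-1))
    n = len(profiles)
    return rms.sum() / (n * (n - 1))  # the diagonal is zero, so it drops out


# Placeholder data only: random profiles standing in for dataset and generated samples.
rng = np.random.default_rng(0)
depth_trend = np.linspace(2.0, 4.5, 50)  # assumed velocity trend with depth
reference = depth_trend + 0.2 * rng.standard_normal((200, 50))
generated = depth_trend + 0.2 * rng.standard_normal((200, 50))
collapsed = np.tile(generated[:1], (200, 1))  # caricature of a generator with no diversity

mean_gap, std_gap = depthwise_stats_gap(reference, generated)
div_ref = mean_pairwise_distance(reference)
div_collapsed = mean_pairwise_distance(collapsed)
print(mean_gap, std_gap, div_ref, div_collapsed)
```

A generated set whose per-depth statistics match the reference but whose pairwise distance is much smaller than the reference's would point to limited diversity. A fuller comparison would also look at the joint (depth-to-depth) correlation, which this sketch ignores.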