https://doi.org/10.5194/essd-2025-502
05 Nov 2025
Status: this preprint is currently under review for the journal ESSD.

OpenSWI: A Massive-Scale Benchmark Dataset for Surface Wave Dispersion Curve Inversion

Feng Liu, Sijie Zhao, Xinyu Gu, Fenghua Ling, Peiqin Zhuang, Yaxing Li, Rui Su, Lihua Fang, Lianqing Zhou, Jianping Huang, and Lei Bai

Abstract. Surface wave dispersion curve inversion plays a critical role in both shallow geophysical exploration and deep geological studies, yet it remains hindered by sensitivity to initial models, susceptibility to local minima, and low computational efficiency. Recently, data-driven deep learning methods, inspired by their success in computer vision and natural language processing, have shown promising potential to overcome these challenges. However, the lack of large-scale and diverse benchmark datasets remains a major obstacle to the development and evaluation of such methods. To address this gap, we introduce OpenSWI, a comprehensive benchmark dataset generated through the Surface Wave Inversion Dataset Preparation (SWIDP) pipeline. OpenSWI comprises two synthetic datasets tailored to different research scales and application scenarios, namely OpenSWI-shallow and OpenSWI-deep, as well as an AI-ready real-world dataset for generalization evaluation, OpenSWI-real. OpenSWI-shallow is derived from the 2-D geological model dataset OpenFWI, containing over 22 million 1-D velocity profiles paired with their fundamental-mode phase and group velocity dispersion curves, spanning a broad spectrum of shallow geological structures (e.g., flat layers, faults, folds, and realistic stratigraphy). OpenSWI-deep is built from 14 global and regional 3-D geological models, comprising approximately 1.26 million high-fidelity 1-D velocity-dispersion data pairs for deep earth studies. OpenSWI-real, compiled from open-source projects, contains two sets of observed dispersion curves and their corresponding 1-D reference models, serving as a benchmark for evaluating the generalization of deep learning models. To demonstrate the utility of OpenSWI, we trained deep learning models on OpenSWI-shallow and OpenSWI-deep, and evaluated them on OpenSWI-real. 
The results show strong agreement between the predicted and reference velocity models, confirming the diversity and representativeness of the OpenSWI dataset. To facilitate the advancement of intelligent surface wave dispersion curve inversion techniques, we release the OpenSWI dataset (https://doi.org/10.5281/zenodo.16874111) and the SWIDP toolbox along with associated resources (https://doi.org/10.5281/zenodo.16884901), providing open resources to support the research community.
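Because each velocity profile in the dataset is paired with both phase and group velocity dispersion curves, it may help to recall how the two are related. The following is a minimal sketch of the standard relation U = c / (1 + (T / c) dc/dT), where c is phase velocity, U is group velocity, and T is period. It is not part of the SWIDP toolbox: the function name `group_from_phase` and the toy phase-velocity curve are assumptions made purely for illustration, and the numerical derivative is only an approximation.

```python
import numpy as np


def group_from_phase(periods, phase_vel):
    """Approximate group velocity U(T) from phase velocity c(T).

    Uses U = c / (1 + (T / c) * dc/dT), with dc/dT estimated by
    finite differences. Hypothetical helper, not an SWIDP function.
    """
    periods = np.asarray(periods, dtype=float)
    phase_vel = np.asarray(phase_vel, dtype=float)
    # Finite-difference derivative of phase velocity with respect to period.
    dc_dT = np.gradient(phase_vel, periods)
    return phase_vel / (1.0 + (periods / phase_vel) * dc_dT)


# Toy phase-velocity curve (km/s) that increases with period (s).
# These values are invented for illustration, not taken from OpenSWI.
periods = np.linspace(1.0, 50.0, 200)
phase_vel = 3.0 + 1.0 * (1.0 - np.exp(-periods / 20.0))
group_vel = group_from_phase(periods, phase_vel)
```

For a curve like this one, where phase velocity increases with period, the resulting group velocity is lower than the phase velocity at every period. A check of this kind could serve as a quick sanity test on paired curves before training.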

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 12 Dec 2025)


Data sets

OpenSWI-dataset Feng Liu https://doi.org/10.5281/zenodo.16874111

Model code and software

OpenSWI-toolbox Feng Liu https://doi.org/10.5281/zenodo.16884901

Latest update: 05 Nov 2025
Short summary
We introduce a large and diverse dataset that supports the development of machine learning methods for studying Earth structures through surface wave dispersion curves. Existing research has been limited by the absence of such benchmark data. Our dataset includes both computer-generated and real-world examples, allowing models to be tested and compared in a consistent way. By making these resources openly available, we aim to advance research on the shallow and deep Earth.