the Creative Commons Attribution 4.0 License.
OneDZ: A Global Detrital Zircon Database and Implications for Constructing Giant Geoscience Database
Abstract. With continuous improvements in analytical methods, the volume of detrital zircon U-Pb geochronology and Lu-Hf isotopic data has doubled, and detrital zircon research has become the Earth science field most closely integrated with big data methods. However, effectively constructing giant geoscience databases remains a challenge. Here, we present OneDZ, a comprehensive global database of detrital zircon U-Pb geochronology and Lu-Hf isotopes, which covers diverse samples and records their data source, location, stratigraphy, depositional age, and various elemental and isotopic information. OneDZ also records regional, stratigraphic, and lithological information so that users can quickly access relevant data. Compared with existing zircon databases, OneDZ compiles U-Pb records for 1,925,687 detrital zircon grains and Lu-Hf records for 275,971 detrital zircon grains from 275,971 publications. Furthermore, the construction of OneDZ leverages artificial intelligence (AI) and programming scripts, and it offers insights into managing large-scale unstructured data in the geosciences. This paper further discusses perspectives on applying big data methods to zircon-related research. The database exemplifies the power of big data in the Earth sciences and provides a platform for investigating zircon data in deep time. It serves as a springboard for research, offering new insights into Earth's past, present, and future. The database (Li and Hu, 2025) is freely available via Zenodo at https://zenodo.org/records/15522949. All code used in this research is available at https://github.com/KeranLi/Global-Detrital-Zircon. The OneDZ web platform is accessible at https://dedc.geoscience.cn/onedz/.
Status: open (until 08 Aug 2025)
RC1: 'Comment on essd-2025-157', Anonymous Referee #1, 03 Jul 2025
General Comments
The OneDZ manuscript presents a modern approach to compiling detrital zircon U-Pb geochronology and Lu-Hf isotopic data. It greatly improves the efficiency of data management and curation by using AI and Python scripts to automate data validation and cleaning.
The need for such a compilation is evident, as reflected by the large download count of the archival dataset that accompanies this manuscript.
While great effort was put into creating the protocols, scripts and backend of OneDZ, its functionality as a standalone browser tool requires further development.
Specific Comments
User-serving components
There seems to be some missing functionality on the OneDZ user interface, e.g. https://dedc.geoscience.cn/onedz/HomePage.html returns a 404 error and the other two menu links are inactive.
When performing a coordinate search on the OneDZ user interface, there seems to be a certificate or mixed-content issue: HTTP API requests are blocked on the HTTPS domain, which prevents data download. This should be fixed, and checked as part of regular maintenance, if the database frontend is intended as a community resource.
The table data extraction tool in DeepShovel seems to have better accuracy than most commercially available OCR products.
Navicat is commercial software. If the intention is to “enhance user-friendliness” while providing an accessible interface, consider using open-source software (e.g. DBeaver).
Manuscript comments
The paper does not refer anywhere to other existing databases such as EarthBank (https://ausgeochem.auscope.org.au/map) or Geochron (https://www.geochron.org/geochronsearch.php), which provide similar products with more user-friendly interfaces.
The authors state that “Although almost no previous research summarized the difficulties in collecting data sources”. There is an extensive body of literature on this topic; one recent example is https://doi.org/10.3390/rs16091484.
The authors mention “To ensure accessibility and inclusivity, Chinese-language papers on detrital zircons have been meticulously translated into English.” This is a major effort that is highly welcomed by the international community. To further ensure accessibility and inclusiveness of data access, consider translating the menus and buttons of the user interface as well.
Regarding the documented “spatial skew” of the OneDZ dataset – a side-by-side comparison of OneDZ sample distribution maps with AusGeochem/EarthBank (a global compilation that began as a nationally focused effort) reveals that curatorial priorities also contribute to regional data availability.
How is the discordance ratio defined in the database, and was it calculated in the same way for all papers? See https://doi.org/10.1016/j.earscirev.2019.102899 for a discussion.
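To illustrate why the definition matters, the sketch below computes two commonly used relative discordance measures for the same grain: one comparing the 206Pb/238U and 207Pb/206Pb ages, and one comparing the 206Pb/238U and 207Pb/235U ages. The grain ages and the 5 % acceptance cutoff are hypothetical values chosen for illustration, not taken from OneDZ or from the cited paper.

```python
def discordance_68_76(t68, t76):
    """Percent discordance comparing 206Pb/238U and 207Pb/206Pb ages (Ma)."""
    return (1.0 - t68 / t76) * 100.0


def discordance_68_75(t68, t75):
    """Percent discordance comparing 206Pb/238U and 207Pb/235U ages (Ma)."""
    return (1.0 - t68 / t75) * 100.0


# Hypothetical grain (ages in Ma), not taken from OneDZ
t68, t75, t76 = 950.0, 980.0, 1050.0
cutoff = 5.0  # hypothetical acceptance threshold, in percent

for name, d in [("206/238 vs 207/206", discordance_68_76(t68, t76)),
                ("206/238 vs 207/235", discordance_68_75(t68, t75))]:
    verdict = "rejected" if abs(d) > cutoff else "accepted"
    print(f"{name}: {d:.1f}% -> {verdict}")
```

With these numbers the grain is 9.5 % discordant under the first definition (rejected) but only 3.1 % under the second (accepted), so a compilation that mixes definitions across source papers would filter grains inconsistently.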
Since this contribution is focused on an SQL database, the most useful figure would be a database schema with tables and keys and relationships noted.
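As a rough illustration of what such a figure could document, the sketch below builds a minimal relational schema in SQLite from Python. The table and column names (publications, samples, zircon_analyses) are hypothetical and do not describe the actual OneDZ schema; the point is that primary keys, foreign keys, and one-to-many relationships (one publication to many samples, one sample to many grain analyses) become explicit and can be listed programmatically.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the actual OneDZ tables.
SCHEMA = """
CREATE TABLE publications (
    pub_id      INTEGER PRIMARY KEY,
    doi         TEXT UNIQUE,
    year        INTEGER
);
CREATE TABLE samples (
    sample_id   INTEGER PRIMARY KEY,
    pub_id      INTEGER NOT NULL REFERENCES publications(pub_id),
    latitude    REAL,
    longitude   REAL,
    lithology   TEXT,
    strat_unit  TEXT
);
CREATE TABLE zircon_analyses (
    analysis_id INTEGER PRIMARY KEY,
    sample_id   INTEGER NOT NULL REFERENCES samples(sample_id),
    age_68_ma   REAL,          -- 206Pb/238U age
    age_76_ma   REAL,          -- 207Pb/206Pb age
    best_age_ma REAL,
    epsilon_hf  REAL           -- NULL when no Lu-Hf data exist
);
"""


def build(conn):
    """Create the tables; SQLite enforces foreign keys only when enabled."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
build(conn)

# List each foreign-key relationship as child.column -> parent.column
for table in ("samples", "zircon_analyses"):
    for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
        # row = (id, seq, parent_table, from_col, to_col, on_update, on_delete, match)
        print(f"{table}.{row[3]} -> {row[2]}.{row[4]}")
```

A schema figure would convey the same information visually; the sketch shows that it can also be generated directly from the database so that it stays in sync with the SQL files.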
“Class-2 and Class-3 types provide a more nuanced classification based on grain size” - Class-2 seems to provide a classification based on lithology (conglomerate, sandstone, mudstone, etc.).
Please clarify if publication Best Ages are what users have access to in the database.
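The question matters because a “best age” is itself a convention. A common rule reports the 206Pb/238U age for younger grains and the 207Pb/206Pb age for older ones, but the switch-over age varies between studies. The minimal sketch below uses hypothetical cutoffs of 1000 and 1500 Ma to show that the same grain can receive different best ages depending on the rule its source publication applied.

```python
def best_age(t68, t76, cutoff_ma=1000.0):
    """Return a 'best' age (Ma) under a simple cutoff rule.

    Uses the 206Pb/238U age (t68) when it is younger than cutoff_ma,
    otherwise the 207Pb/206Pb age (t76). The default cutoff is an
    illustrative assumption, not a value taken from OneDZ.
    """
    return t68 if t68 < cutoff_ma else t76


# Hypothetical grain: 206Pb/238U = 1200 Ma, 207Pb/206Pb = 1230 Ma
for cutoff in (1000.0, 1500.0):
    print(cutoff, best_age(1200.0, 1230.0, cutoff))
```

Under the 1000 Ma cutoff this grain's best age is 1230 Ma; under the 1500 Ma cutoff it is 1200 Ma. Recomputing best ages from the stored isotopic ages with one stated rule is one way to avoid this inconsistency.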
Please make the code used for the two resampling methods and for SMOTE (mentioned around lines 255 and 395 of the preprint, respectively) available on GitHub and in the supplement.
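For reference, SMOTE (Synthetic Minority Over-sampling Technique) creates synthetic minority-class samples by interpolating between a sample and one of its k nearest minority-class neighbours. The sketch below is a minimal pure-Python version written for illustration only; it is not the authors' code, and the two unnamed resampling methods are not reproduced. Well-tested implementations exist, e.g. the SMOTE class in the imbalanced-learn package.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: return n_new synthetic minority-class samples.

    Each synthetic point is x + lam * (nn - x), where x is a random minority
    sample, nn is one of its k nearest minority neighbours (Euclidean
    distance) and lam is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest neighbours, excluding itself
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Four hypothetical minority-class points with two features each
X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(X_min, n_new=6, k=2)
print(len(synthetic))
```

Because every synthetic point lies on a segment between two real minority samples, publishing the exact feature set, k, and random seed is what makes such resampled results reproducible.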
The term “Paleo globality” is not frequently used in the Earth sciences. Consider rewording it as “paleo reconstruction of spatial distribution” (or equivalent) to avoid confusing readers.
“Therefore, the evaluation results based on OneDZ, the world's largest detrital zircon database, indicate that the global scope of zircon big data research needs further assessment.” It would be useful to specify what types of assessment you have in mind, e.g. which present-day areas require more sampling. Comparisons with other databases would also be useful.
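One simple form such an assessment could take is to bin sample coordinates into latitude-longitude cells and count samples per cell; empty or sparse cells flag candidate regions for further sampling. The sketch below uses hypothetical 10° equal-angle cells and made-up coordinates. Equal-angle cells shrink in area toward the poles, so an equal-area grid would be preferable for a real assessment.

```python
from collections import Counter


def cell_of(lat, lon, size_deg=10.0):
    """Return the (row, col) index of the equal-angle grid cell containing a point."""
    row = int((lat + 90.0) // size_deg)
    col = int((lon + 180.0) // size_deg)
    # Points exactly on the north pole or at +180 longitude go into the last cell
    n_rows, n_cols = int(180 / size_deg), int(360 / size_deg)
    return min(row, n_rows - 1), min(col, n_cols - 1)


def coverage(samples, size_deg=10.0):
    """Count samples per grid cell; samples is an iterable of (lat, lon) pairs."""
    return Counter(cell_of(lat, lon, size_deg) for lat, lon in samples)


# Hypothetical sample locations (latitude, longitude in decimal degrees)
samples = [(30.5, 104.1), (31.2, 103.7), (35.0, 105.0), (-25.3, 131.0)]
counts = coverage(samples)
n_cells = int(180 / 10.0) * int(360 / 10.0)
print(f"{len(counts)} of {n_cells} cells contain samples")
print(counts.most_common(1))
```

Comparing such per-cell counts between OneDZ and another compilation would also make the side-by-side database comparison quantitative.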
“The impact of data sparsity is controlled by the 2 σ error”. While the errors might help with outlier identification, they do not control data sparsity. Consider rewording this sentence.
Technical Corrections
Technical corrections and recommendations for the preprint are provided as comments in the attached PDF file.
Zenodo Dataset
The organization of the Zenodo archival dataset is confusing. The first version of the dataset contains SQL files without any description. The description of version v2 then strongly recommends using these SQL files, but they are not in the v2 file list. To improve the findability of key files, the SQL files should be added to v2, or at least a note should clarify that they must be downloaded from v1. The warnings in notes 1-3, while pertinent, are not specific to this dataset. Since there are known and systematic errors, they should be documented specifically (e.g. which Chinese, Latin, and Arabic characters have not been converted correctly) and/or fixed, either with Excel macros or AI cleaning. Documenting the cleaning process of the transformed dataset would be an important contribution to the community at large and would help improve LLMs, which also struggle with these types of data transformation.
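As one example of how such systematic errors could be detected and documented, the sketch below targets a single common failure mode: UTF-8 text decoded as Windows-1252 (cp1252) or Latin-1, which turns “é” into “Ã©”. Whether this is the failure mode in the OneDZ files is an assumption, and the function names are hypothetical. Dedicated libraries such as ftfy handle many more cases.

```python
def repair_mojibake(text):
    """Undo one round of UTF-8 bytes mis-decoded as cp1252 or Latin-1.

    Returns the repaired string, or the input unchanged if re-encoding and
    decoding as UTF-8 fails (i.e. the text does not show this error).
    """
    for wrong_codec in ("cp1252", "latin-1"):
        try:
            return text.encode(wrong_codec).decode("utf-8")
        except UnicodeError:  # covers UnicodeEncodeError and UnicodeDecodeError
            continue
    return text


def document_repairs(cells):
    """Return (original, repaired) pairs for every cell the repair changed."""
    report = []
    for cell in cells:
        fixed = repair_mojibake(cell)
        if fixed != cell:
            report.append((cell, fixed))
    return report


# Simulate the error on "石英" (quartz): UTF-8 bytes read back as cp1252
quartz_garbled = "石英".encode("utf-8").decode("cp1252")
cells = ["Ã©", quartz_garbled, "石英", "plain"]  # two garbled, two correct
for original, fixed in document_repairs(cells):
    print(f"{original!r} -> {fixed!r}")
```

The (original, repaired) report is the kind of record that would let the Zenodo notes list exactly which characters were affected, rather than giving a general warning.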
Supplementary material
Some of the GitHub Python scripts contain the same header block stating “This module is mainly designed to remove duplicate samples”, even modules that have other functions, e.g. latitude and longitude estimation. Accurate code documentation is essential for reusability.
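A lightweight check of this kind can be automated. The sketch below is a hypothetical helper, not part of the repository: it reads each module's docstring with Python's standard ast module and reports docstrings shared by more than one file, which would catch copy-pasted headers like the one described above. The module names in the example are made up.

```python
import ast
from collections import defaultdict
from pathlib import Path


def shared_docstrings(sources):
    """Map each module docstring used by 2+ modules to the list of module names.

    sources maps a module name to its source code as a string.
    """
    by_doc = defaultdict(list)
    for name, code in sources.items():
        doc = ast.get_docstring(ast.parse(code))
        if doc:
            by_doc[doc].append(name)
    return {doc: names for doc, names in by_doc.items() if len(names) > 1}


def load_sources(folder):
    """Read every .py file in a folder (hypothetical repository layout)."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.py")}


# In-memory example with made-up module names
sources = {
    "dedupe.py": '"""This module is mainly designed to remove duplicate samples."""\n',
    "latlon.py": '"""This module is mainly designed to remove duplicate samples."""\n',
    "ages.py": '"""Compute best ages."""\n',
}
for doc, names in shared_docstrings(sources).items():
    print(sorted(names), "share:", doc)
```

Running such a check against the repository (e.g. `shared_docstrings(load_sources("path/to/scripts"))`) before each release would keep module headers accurate.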
Congratulations to the authors for their sustained efforts and thoughtful considerations in improving access to geoscience data.
Data sets
OneDZ: A Global Detrital Zircon Database and Implications for Constructing Giant Geoscience Database. Keran Li. https://zenodo.org/records/15522949
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote
---|---|---|---|---|---|---
320 | 81 | 18 | 419 | 25 | 5 | 13