the Creative Commons Attribution 4.0 License.
OneDZ: A Global Detrital Zircon Database and Implications for Constructing Giant Geoscience Database
Abstract. The amount of detrital zircon U-Pb geochronology data and Lu-Hf isotopic data has doubled with the continuous improvement of testing methods, and detrital zircon research has become the field of Earth science most closely integrated with big data methods. However, how to construct giant geoscience databases effectively remains a challenge. Here, we present OneDZ, a comprehensive global detrital zircon U-Pb geochronology and Lu-Hf isotope database, which includes diverse samples with data source, location, stratigraphy, depositional age, and various elemental and isotopic information. OneDZ records the corresponding regional, stratigraphic and lithological information to give users quick access. Compared with existing zircon databases, OneDZ compiles 1,925,687 detrital zircon U-Pb grain records and 275,971 detrital zircon Lu-Hf grain records from 275,971 publications. Furthermore, the construction of OneDZ leverages artificial intelligence (AI) and programming scripts and offers insights into managing large-scale unstructured data in geosciences. This paper further discusses the prospects for applying big data methods in zircon-related research. The database exemplifies the power of big data in Earth sciences, providing a platform for investigating zircon data in deep time. It serves as a springboard for research, offering new insights into Earth's past, present, and future. The database (Li and Hu, 2025) is freely available via Zenodo at https://zenodo.org/records/15522949. All code snippets in this research are accessible via https://github.com/KeranLi/Global-Detrital-Zircon. The OneDZ web platform is accessible via https://dedc.geoscience.cn/onedz/.
Status: final response (author comments only)
-
RC1: 'Comment on essd-2025-157', Anonymous Referee #1, 03 Jul 2025
General Comments
The OneDZ manuscript presents a modern approach to compiling detrital zircon U-Pb geochronology and Lu-Hf isotopic data. It greatly improves the efficiency of data management and curation by using AI and Python scripts to automate data validation and cleaning.
The need for such a compilation is evident, as reflected by the large download count of the archival dataset that accompanies this manuscript.
While great effort was put into creating the protocols, scripts and backend of OneDZ, its functionality as a standalone browser tool requires further development.
Specific Comments
User-serving components
There seems to be some missing functionality on the OneDZ user interface, e.g. https://dedc.geoscience.cn/onedz/HomePage.html returns a 404 error and the other two menu links are inactive.
When performing a coordinate search on the OneDZ user interface there seems to be a certificate issue blocking HTTP API requests over the HTTPS domain and preventing data download. This should be addressed and part of regular maintenance if the database frontend is intended as a community resource.
The table data extraction tool in DeepShovel seems to have better accuracy than most commercially available OCR products.
Navicat is commercial software. If the intention is to “enhance user-friendliness” while providing an accessible interface, consider using open-source software (e.g. DBeaver).
Manuscript comments
The paper makes no reference to other existing databases such as EarthBank (https://ausgeochem.auscope.org.au/map) or Geochron (https://www.geochron.org/geochronsearch.php), which provide a similar product with more user-friendly interfaces.
The authors stated that “Although almost no previous research summarized the difficulties in collecting data sources”. There is an extensive body of literature on this topic, here is just a recent example https://doi.org/10.3390/rs16091484.
The authors mention “To ensure accessibility and inclusivity, Chinese-language papers on detrital zircons have been meticulously translated into English.” This is a major effort that is highly welcomed by the international community. To further ensure accessibility and inclusiveness of data access, consider translating the menus and buttons of the user interface as well.
Regarding the documented “spatial skew” of the OneDZ dataset – a side-by-side comparison of OneDZ sample distribution maps with AusGeochem/EarthBank (a global compilation that began as a nationally focused effort) reveals that curatorial priorities also contribute to regional data availability.
How is the discordance ratio defined in the database and was it calculated for all papers in the same way? See https://doi.org/10.1016/j.earscirev.2019.102899 for discussion.
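For illustration only, one widely used convention expresses discordance as the relative difference between the 206Pb/238U and 207Pb/206Pb ages. The sketch below assumes that definition; the function names and the 10% threshold are hypothetical and are not taken from the OneDZ manuscript, which may define the ratio differently.

```python
def discordance_percent(age_206_238: float, age_207_206: float) -> float:
    """Percent discordance as 100 * (1 - t(206Pb/238U) / t(207Pb/206Pb)).

    Assumed convention for illustration; ages are in Ma and the
    207Pb/206Pb age must be positive.
    """
    if age_207_206 <= 0:
        raise ValueError("207Pb/206Pb age must be positive")
    return 100.0 * (1.0 - age_206_238 / age_207_206)


def is_concordant(age_206_238: float, age_207_206: float,
                  threshold: float = 10.0) -> bool:
    """True when the absolute discordance is within the (hypothetical) threshold."""
    return abs(discordance_percent(age_206_238, age_207_206)) <= threshold
```

If source papers used different formulas or thresholds, recomputing the value from the published ages with a single function such as this would be one way to make the column comparable across the compilation.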
Since this contribution is focused on an SQL database, the most useful figure would be a database schema with tables and keys and relationships noted.
“Class-2 and Class-3 types provide a more nuanced classification based on grain size” - Class-2 seems to provide a classification based on lithology (conglomerate, sandstone, mudstone, etc.).
Please clarify if publication Best Ages are what users have access to in the database.
Please make the code used for the two resampling methods and for SMOTE available on GitHub or in the supplements, and refer to it around rows 255 and 395 of the preprint, respectively.
The term “Paleo globality” is not frequently used in Earth Sciences. Consider rewording to paleo reconstruction of spatial distribution (or equivalent) to avoid reader confusion.
“Therefore, the evaluation results based on OneDZ, the world's largest detrital zircon database, indicate that the global scope of zircon big data research needs further assessment.” It would be useful to postulate what types of assessment you are implying e.g. which current day areas require more sampling. Comparisons with other databases seem useful as well.
“The impact of data sparsity is controlled by the 2 σ error.” While the errors might help with outlier identification, they do not control data sparsity. Consider rewording this sentence.
Technical Corrections
Pre-Print Technical Corrections and recommendations are presented as comments in the attached pdf file.
Zenodo Dataset
The organization of the Zenodo archival dataset is confusing. The first version of the dataset contains SQL files without any description. The description of version v2 then strongly recommends using the SQL files, but they are not present in its file list. To improve findability of key files, the SQL files should be added to v2, or at least a note should clarify that they must be downloaded from v1. The warnings in notes 1-3, while pertinent, are not very specific to this dataset. Since there are known and systematic errors, they should be specifically documented (e.g. which Chinese, Latin and Arabic characters have not been converted correctly) and/or fixed, either with Excel macros or AI cleaning. Documenting the cleaning process of the transformed dataset would be an important contribution for the community at large and could help improve LLMs, which also struggle with these types of data transformations.
Supplementary material
Some of the GitHub Python scripts contain the same header block, which states “This module is mainly designed to remove duplicate samples”, even for modules that have other functions, e.g. latitude and longitude estimation. Accurate code documentation is essential for reusability.
Congratulations to the authors for their sustained efforts and thoughtful considerations in improving access to geoscience data.
-
AC1: 'Reply on RC1', Keran Li, 01 Aug 2025
We are deeply appreciative of the detailed and constructive feedback provided by the reviewers for our manuscript titled “OneDZ: A Global Detrital Zircon Database and Implications for Constructing Giant Geoscience Database.” Your insights have been invaluable in guiding us to enhance both the functionality of our database and the clarity of our manuscript. We have meticulously reviewed each comment and have implemented comprehensive revisions to address the concerns raised.
Specifically, we have made substantial progress in improving the OneDZ user interface. The initial issues with the web platform, including the 404 errors and inactive links, have been resolved through a server migration to a more robust infrastructure. The new web platform is now accessible at https://www.onedz.top/, and we have enriched its functionalities, including the development of a contribution data module that is currently under testing. Additionally, we have addressed the certificate issues that were previously blocking HTTP API requests, ensuring smoother data downloads and interactions. We have also taken steps to improve the international accessibility of our database by incorporating automated translation tools and recommending open-source software like DBeaver for database interactions.
In the manuscript, we have made several critical revisions to enhance its accuracy and completeness. We have expanded the introduction to include references to similar databases such as EarthBank and Geochron, acknowledging their contributions and differentiating our work. We have also revised the section on discordance ratio calculation to provide a clearer methodology based on Andersen et al. (2019). Furthermore, we have added a comprehensive Entity-Relationship (ER) diagram to illustrate the database schema and have detailed the resampling methods used in our study, making the code available on GitHub for transparency and reproducibility.
We are committed to ongoing improvements and are actively working on further enhancements to the OneDZ database. We believe that the revisions we have made address the reviewers' concerns comprehensively and will significantly benefit the scientific community. We are grateful for the opportunity to improve our work and look forward to any additional suggestions that may help us achieve our goal of creating a dynamic and accessible global detrital zircon database.
-
AC2: 'Comment on essd-2025-157', Keran Li, 01 Aug 2025
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-157/essd-2025-157-AC2-supplement.pdf
-
RC2: 'Comment on essd-2025-157', Bryant Ware, 05 Aug 2025
General Comments
This OneDZ manuscript outlines a modern approach to constructing a database through leveraging AI and Python scripts to compile detrital zircon U-Pb geochronology and Lu-Hf isotopic data. Through utilising such technologies for data compilation, validation, and cleaning, as well as open access to the Python scripts, the efficiency of data management and curation can be significantly enhanced.
As with reviewer one, I recognise and appreciate the substantial effort that has gone into developing the protocols, scripts, and backend infrastructure for OneDZ. However, I also have trouble using the standalone browser as it is currently implemented. Additionally, the manuscript itself requires considerable revision to improve clarity, coherence, and overall presentation of the work. However, once revised, I believe this work could make a valuable contribution to the research community, not only those working with detrital zircon U-Pb and Lu-Hf data, but also those developing or aiming to develop large geochemical databases, through its demonstration of modern technological applications and the useful insights it offers into database construction.
Specific Comments
Navigating to the downloads tab of https://www.onedz.top/, I am unable to get the search and download buttons to function. It is unclear to me how to enter the longitude (with an east or west suffix, or with negative and positive numbers?). Nothing appears to happen when I press the ‘search’ button, and I then receive a “下载失败:Failed to fetch” (“Download failed: Failed to fetch”) error message when I press the ‘download’ button.
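One common way to remove this kind of ambiguity is to accept either form and normalise to signed decimal degrees. The sketch below assumes the usual convention (east positive, west negative); whether the OneDZ search form follows it is not stated, and `parse_longitude` is a hypothetical helper rather than part of the platform.

```python
def parse_longitude(text: str) -> float:
    """Convert '120.5E', '75.25 W', or '-75.25' to signed decimal degrees.

    Assumes east is positive and west is negative (hypothetical helper).
    """
    value = text.strip().upper()
    sign = 1.0
    if value.endswith(("E", "W")):
        if value.endswith("W"):
            sign = -1.0
        value = value[:-1].strip()
        if value.startswith(("-", "+")):
            raise ValueError("use either a sign or a hemisphere letter, not both")
    degrees = sign * float(value)
    if not -180.0 <= degrees <= 180.0:
        raise ValueError(f"longitude out of range: {degrees}")
    return degrees
```

Stating the accepted format next to the input field, and rejecting out-of-range values with a clear message, would address the same question without any parsing at all.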
I unfortunately do not have the skill set to check the SQL files.
Looking through some of the .csv files, there appears to be an issue with the age and uncertainty columns (e.g., Published 206Pb/238U age (Ma), Published 206Pb/238U 1σ uncert., Published 206Pb/238U 2σ uncert.). The listed age is the same as the 1 sigma uncertainty, and the 2 sigma uncertainty is the number from the preceding two cells doubled (e.g., 1769 1769 3538). The ‘best age’ and ‘best age uncert.’ appear to be correct.
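The pattern described here can be screened for automatically. The following sketch flags rows where the 1σ uncertainty equals the age and the 2σ uncertainty equals twice that value; the short column keys (`age`, `unc_1s`, `unc_2s`) and the sample rows are hypothetical stand-ins for the actual CSV headers and data.

```python
import csv
import io
import math


def suspicious_rows(rows, age="age", unc_1s="unc_1s", unc_2s="unc_2s"):
    """Yield row indices where 1-sigma == age and 2-sigma == 2 * age.

    Column keys are hypothetical; adapt them to the real CSV headers.
    """
    for index, row in enumerate(rows):
        try:
            a, s1, s2 = float(row[age]), float(row[unc_1s]), float(row[unc_2s])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric cells
        if math.isclose(a, s1) and math.isclose(s2, 2 * a):
            yield index


# Made-up sample: the first row mirrors the "1769 1769 3538" pattern.
sample = io.StringIO("age,unc_1s,unc_2s\n1769,1769,3538\n1769,12,24\n")
flagged = list(suspicious_rows(csv.DictReader(sample)))
```

Running such a check over each exported file would show whether the problem affects every row or only records from particular sources.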
The corrections made already to the Zenodo files following reviewer one’s comments improve the understanding and thus accessibility of the data that can be downloaded there, thank you.
Technical Corrections
Please see the annotated PDF for specific, technical, and some general comments/recommendations on the manuscript.
-
AC3: 'Reply on RC2', Keran Li, 07 Aug 2025
Thank you for your positive feedback. We did not anticipate that our modest attempts would attract such significant attention. However, we must acknowledge that, as sedimentologists without a background in computer science or databases, the challenges we encountered during our learning and development process far exceeded our initial expectations. This has led us to continually push the boundaries of our knowledge and explore how earth science researchers can rapidly construct large-scale databases in the context of the rapid development of computer science, particularly artificial intelligence. Nevertheless, we recognize that there are still many areas for improvement and that our database is far from perfect. We hope to have the opportunity to continuously demonstrate our process of creating new tools and enhancing the quality of our database, providing new references for future research.
We appreciate your careful review. Our current testing suggests that there may be two potential causes: (1) The query range is too broad, resulting in low retrieval efficiency. (2) Under the constraints of network bandwidth, the number of repeated requests is too low. In response to these potential issues, we are optimizing the retrieval methods of MySQL to improve retrieval efficiency. We are also optimizing the backend to accommodate network conditions by increasing the number of requests. Updates on our progress will be posted in the News section.
In fact, we referred to the book *A First Course in Statistics* to convert the errors of 1σ and 2σ. The 1σ error typically refers to the probability that data falls within one standard deviation (σ) of the mean (μ) in a normal distribution, which is 68.27%. This means that if the data follows a normal distribution, approximately 68.27% of the data values will fall within the interval (μ - σ, μ + σ). The 2σ error refers to the probability that data falls within two standard deviations of the mean, which is 95.45%. In other words, approximately 95.45% of the data values will fall within the interval (μ - 2σ, μ + 2σ). From a probabilistic standpoint, these two are not equivalent, but the twofold relationship is based on the assumption that the results approximately follow a Gaussian distribution. Considering that the actual situation may not meet the Gaussian distribution, we used the 5th, 25th, 50th, and 95th percentiles for approximate estimation. The 2σ error is estimated by the difference between the 95th and 5th percentiles, while the 1σ error is estimated by the difference between the 75th and 25th percentiles. This is why there may or may not be a twofold relationship between the two.
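As a numerical reference for the percentile ranges mentioned above, the sketch below uses Python's standard-library `statistics.NormalDist` to compute how wide the 25th to 75th and 5th to 95th percentile ranges are, in units of σ, for an ideal Gaussian. It is an independent illustration under that Gaussian assumption, not the authors' code, and the helper name `percentile_range` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def percentile_range(lower: float, upper: float) -> float:
    """Width of the [lower, upper] percentile range in units of sigma."""
    return standard_normal.inv_cdf(upper / 100) - standard_normal.inv_cdf(lower / 100)


iqr_in_sigma = percentile_range(25, 75)    # interquartile range, about 1.35 sigma
p5_p95_in_sigma = percentile_range(5, 95)  # about 3.29 sigma
ratio = p5_p95_in_sigma / iqr_in_sigma     # about 2.44 for a Gaussian
```

For an ideal Gaussian the two ranges differ by a factor of roughly 2.44, which is consistent with the reply's remark that a strict twofold relationship need not hold.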
We will update the manuscript to address the remaining comments. (I have not yet found where to upload a revised manuscript, and the discussion system advises against updating it at this stage. I am still waiting for instructions on how to upload it, which I expect will come after the discussion ends.) Thank you once again for your careful review.
Citation: https://doi.org/10.5194/essd-2025-157-AC3
Data sets
OneDZ: A Global Detrital Zircon Database and Implications for Constructing Giant Geoscience Database Keran Li https://zenodo.org/records/15522949
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
398 | 128 | 26 | 552 | 36 | 8 | 20 |