the Creative Commons Attribution 4.0 License.
The 2024 Release of the Global Heat Flow Database (GHFDB): Quality Assessment, Metadata Standards, and a Century of Geothermal Data
Abstract. The Global Heat Flow Database is a comprehensive compilation of published heat-flow measurements dating back to the 1950s. The International Heat Flow Commission first released the database in 1963. Recent activities within the World Heat Flow Database Project (funded by the German Research Foundation, DFG) and Task Force VIII of the International Lithosphere Program (ILP) have focused on (1) developing a new, modern digital data infrastructure with integrated quality control of the data, (2) creating a new dedicated metadata scheme for reporting heat-flow data, (3) conducting a comprehensive review of the original literature to supplement the original metadata according to the new scheme, and (4) thoroughly adding new measurements from the literature. As a result, the 2024 release presents a substantial update, with the number of heat-flow observations increasing from 58,302 data points in 2012 to 91,182 in 2024, while the number of literature sources increased from 572 to 1,586 documents. A key part of this process was the introduction of a new, comprehensive metadata scheme and the development of the GHFDB Data Template, which facilitates the structured and detailed reporting of heat-flow observations in accordance with the new scheme. The GHFDB Data Template captures methodological details, uncertainty estimates, and contextual information, forming the basis for a newly implemented, multi-dimensional quality-assessment system. The improved data submission workflow is now supported by the option of obtaining a digital object identifier (DOI), which makes newly submitted data citable in the literature, as is increasingly required by journals. This service encourages direct contributions from researchers and ensures transparency, attribution, and long-term data stewardship by the partner repository GFZ Data Services.
The new heat flow database release marks a significant step towards establishing a global, quality-assured data infrastructure and lays the foundation for more reliable, reusable, and interoperable heat-flow datasets across scientific disciplines.
Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. However, Dr. Kirsten Elger is a member of the editorial board of the journal.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: open (until 11 Sep 2025)
CC1: 'Comment on essd-2025-341', Philippe Marbaix, 10 Aug 2025
Dear authors and editor
I had a look at this paper from the perspective of data structure and would like to ask a few questions in all modesty – some may be naïve because I do not know the specifics of this dataset, which has been compiled over decades and will likely benefit from the upgrade and continued development. I hope these questions will contribute usefully to the open discussion and to the preparation of the final version of the paper (at least from the viewpoint of clarity, if other readers have the same questions).
On the database and its structure
A preliminary question to make sure that I am not misunderstanding the intention: when the paper refers to “the database” is it the dataset referred to in the assets – and only that, not an underlying structure which the dataset would be exported from?
Assuming that the database is the provided dataset, I am wondering why it is entirely “flat”: there is only one data table, while section 3.1 of the paper presents a “hierarchical structure”. I understand that there is a concept of parent (mostly sites) and child (mostly measurements) in the database, but all records (rows in the data table) appear to contain detailed information about a child and a parent simultaneously. In lines 203-204, “parent entry” looks a bit like a misnomer: unless I missed something, there are no specific parent entries (unique records which would each be devoted to defining a specific parent). Instead, information about a parent appears to be duplicated in all of its child rows. Duplication is something that one usually tries to avoid in a database, as it may lead to inconsistencies. In principle, it would be possible to have a table for sites with a parent ID, and a table of measurements each referring to the parent – a very classical “foreign key” in the context of databases. It may well be that I missed something, as this is obvious in the context of SQL databases, which the paper refers to in section 7. It would still be possible to export the data in an Excel workbook while keeping a (more) structured approach, with each table in a separate Excel sheet. Related questions are:
- Was the flat format adopted to continue a legacy practice, or for other reasons?
- Wouldn’t a more structured format help in further developing the database, to conveniently describe situations where the measurements at a given site are somehow grouped, e.g. because there are several measurements which only differ by their depth, date, or method?
- What is meant by “chosen (as the) parent entry” in line 367? How can we determine that a row/entry is a “parent” in the dataset?
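To make the parent/child idea above concrete, here is a minimal sketch of how a flat table could be split into a site table and a measurement table linked by a foreign key. The column names (`site_name`, `lat`, `lon`, `depth_m`, `q_mW_m2`) and the function `split_flat_table` are hypothetical and chosen for illustration only; they are not the field names of the GHFDB export.

```python
# Hypothetical site-level columns; the real export uses its own field names.
SITE_COLUMNS = ("site_name", "lat", "lon")


def split_flat_table(rows, site_columns=SITE_COLUMNS):
    """Split flat rows into (sites, measurements) linked by a 'site_id' key."""
    sites = {}          # maps the tuple of site values to the site record
    measurements = []   # child rows, each pointing at its parent via site_id
    for row in rows:
        key = tuple(row[col] for col in site_columns)
        if key not in sites:
            site = {"site_id": len(sites) + 1}
            site.update({col: row[col] for col in site_columns})
            sites[key] = site
        # Keep only the measurement-level fields and add the foreign key.
        child = {k: v for k, v in row.items() if k not in site_columns}
        child["site_id"] = sites[key]["site_id"]
        measurements.append(child)
    return list(sites.values()), measurements


# Made-up example rows: two measurements at site A, one at site B.
flat_rows = [
    {"site_name": "A", "lat": 10.0, "lon": 20.0, "depth_m": 100, "q_mW_m2": 55.0},
    {"site_name": "A", "lat": 10.0, "lon": 20.0, "depth_m": 200, "q_mW_m2": 57.5},
    {"site_name": "B", "lat": -5.0, "lon": 30.0, "depth_m": 150, "q_mW_m2": 62.0},
]
sites, measurements = split_flat_table(flat_rows)
```

In this sketch each site is stored once, and the two tables could be written to separate sheets of the same workbook.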
Potential uses for climate studies
While a database of heat flow has relevance for studies related to climate and climate change by providing boundary conditions for heat fluxes, I am wondering whether the database is built with the intention of being, now or in the future, relevant for the study of past climate. For example, Hopcroft and Gallagher (2023) refer to an “IHFC database” to assess climate change over the last 500 years using geothermal data. Are there several “IHFC databases”, with the one described in this paper focusing on background (steady-state) geothermal flow, and some other database devoted to more detailed flow and temperature vertical profiles?
(this partly links to a previous question – depending on the intention, the structure of the database may benefit from enabling depth profiles; I thought that vertical profiles were not the intention, but then I wondered about the role of the “Digital borehole” feature in the online GHFDB - https://portal.heatflow.world/explore/ )
Technical aspect of the file provided in the “Data sets” section (The Global Heat Flow Database: Release 2024):
The data is provided as an Excel file only. When opened, this file generates a message asking whether to update data. The Data > Workbook links function of Excel reveals that some cells indeed have links to other workbooks (outside the dataset):
Z:\WG\PROJEKTE\P_HeatFlow\databases\_DB_IHFC_Update_2025\Release_2024\Popov_etal._2021.xlsx (+ 2 other files)
This should not be present in a released dataset (it is not convenient, suggesting that something is missing from the provided files, etc.).
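Such links can also be checked without opening Excel. The following is a minimal sketch, assuming the file is a standard `.xlsx` (a ZIP package in which external workbook links are stored under `xl/externalLinks/`); the function name `external_link_targets` is introduced here for illustration only.

```python
import xml.etree.ElementTree as ET
import zipfile


def external_link_targets(xlsx_path):
    """Return the targets of external workbook links found in an .xlsx file."""
    targets = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            # Relationship parts of external links hold the linked file paths.
            if name.startswith("xl/externalLinks/_rels/") and name.endswith(".rels"):
                root = ET.fromstring(archive.read(name))
                for element in root.iter():
                    target = element.get("Target")
                    if target is not None:
                        targets.append(target)
    return targets
```

An empty list returned for the released file would indicate that no external workbook links remain.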
References
In the introduction, I think that the sentences in lines 38 – 40 mentioning permafrost, climate change, and oceanography would benefit from being supported by specific references.
Section 9 about applications and limitations of the database would also benefit from references to the literature, especially the part on climate change in lines 866 to 869 (and in relation to my earlier question, which links to the intended uses and limitations).
Reference in this comment:
Hopcroft, P. O. and Gallagher, K.: Global Variability in Multi‐Century Ground Warming Inferred From Geothermal Data, Geophysical Research Letters, https://doi.org/10.1029/2023GL104631, 2023.
Citation: https://doi.org/10.5194/essd-2025-341-CC1
AC2: 'Reply on CC1', Florian Neumann, 18 Aug 2025
Thanks for your feedback.
A preliminary question to make sure that I am not misunderstanding the intention: when the paper refers to “the database” is it the dataset referred to in the assets – and only that, not an underlying structure which the dataset would be exported from?
In this paper, “the database” refers to the released dataset.
Was the flat format adopted to continue a legacy practice, or for other reasons?
Wouldn’t a more structured format help in further developing the database, to conveniently describe situations where the measurements at a given site are somehow grouped, e.g. because there are several measurements which only differ by their depth, date, or method?
What is meant by “chosen (as the) parent entry” in line 367? How can we determine that a row/entry is a “parent” in the dataset?
The heat-flow community has used flat table exports since the 1960s, and many users still expect and work with single-table CSV/Excel formats. A flat structure avoids the need for users to join tables and ensures compatibility with common spreadsheet and GIS workflows.
Internally, the curation system does treat sites and measurements hierarchically. In the export, “parent” information is repeated in each measurement row so the dataset remains self-contained.
Terminology (“chosen parent entry”): For sites with multiple measurements, one record is designated as the “parent” row that defines the site metadata. However, the export contains no separate parent-only rows. This is intended to keep the data transparent for end users.
We recognize that a more structured (relational) representation could reduce duplication and better represent groupings of measurements, but for now the flat format maximizes accessibility and interoperability.
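Because parent information is repeated in every measurement row, one way to guard against the inconsistencies mentioned in the comment is to verify that all rows of a parent agree on the parent-level fields. A minimal sketch follows; the column names (`parent_id`, `lat`, `lon`) and the function `find_inconsistent_parents` are hypothetical and not taken from the GHFDB export.

```python
from collections import defaultdict


def find_inconsistent_parents(rows, parent_key, parent_fields):
    """Map each parent ID to the parent-level fields whose values disagree."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for field in parent_fields:
            seen[row[parent_key]][field].add(row[field])
    conflicts = {}
    for parent, fields in seen.items():
        differing = sorted(f for f, values in fields.items() if len(values) > 1)
        if differing:
            conflicts[parent] = differing
    return conflicts


# Made-up rows: the two rows of parent S2 disagree on latitude.
example_rows = [
    {"parent_id": "S1", "lat": 10.0, "lon": 20.0, "q": 55.0},
    {"parent_id": "S1", "lat": 10.0, "lon": 20.0, "q": 57.5},
    {"parent_id": "S2", "lat": -5.0, "lon": 30.0, "q": 62.0},
    {"parent_id": "S2", "lat": -5.5, "lon": 30.0, "q": 61.0},
]
conflicts = find_inconsistent_parents(example_rows, "parent_id", ["lat", "lon"])
```

A check of this kind could run at export time so that the flat file stays self-contained without duplicated fields drifting apart.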
Potential uses for climate studies
This is already in the making: we aim to create a dataset with temperature-versus-depth data as well as relevant thermal petrophysical parameters.
Technical aspect of the file provided in the “Data sets” section (The Global Heat Flow Database: Release 2024):
Thanks, this will be changed.
References
Thank you for this suggestion. We agree that the mentioned passages would benefit from additional references.
Citation: https://doi.org/10.5194/essd-2025-341-AC2
AC1: 'Comment on essd-2025-341', Florian Neumann, 18 Aug 2025
Citation: https://doi.org/10.5194/essd-2025-341-AC1
Data sets
The Global Heat Flow Database: Release 2024 Global Heat Flow Data Assessment Group https://doi.org/10.5880/fidgeo.2024.014
Model code and software
Heat Flow Quality Analysis Toolbox Saman F. Chishti et al. https://doi.org/10.5880/fidgeo.2025.043
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
268 | 43 | 31 | 342 | 21 | 16 | 17 |