The 2024 Release of the Global Heat Flow Database (GHFDB): Quality Assessment, Metadata Standards, and a Century of Geothermal Data

Neumann, Florian; Norden, Ben; Balkan-Pazvantoğlu, Elif; Elbarbary, Samah; Petrunin, Alexey G.; Elger, Kirsten; Jennings, Samuel; Frenzel, Simone; Fuchs, Sven

doi:10.5194/essd-2025-341

17 Jul 2025

Status: a revised version of this preprint is currently under review for the journal ESSD.

Abstract. The Global Heat Flow Database is a comprehensive compilation of published heat-flow measurements dating back to the 1950s. The International Heat Flow Commission first released the database in 1963. Recent activities within the World Heat Flow Database Project (funded by the German Research Foundation, DFG) and Task Force VIII of the International Lithosphere Program (ILP) have focused on (1) developing a new, modern digital data infrastructure with integrated quality control of the data, (2) creating a dedicated metadata scheme for reporting heat-flow data, (3) conducting a comprehensive review of the original literature to supplement the original metadata according to the new scheme, and (4) systematically adding new measurements from the literature. As a result, the 2024 release is a substantial update: the number of heat-flow observations increased from 58,302 data points in 2012 to 91,182 in 2024, while the number of literature sources increased from 572 to 1,586 documents. A key part of this process was the introduction of a new, comprehensive metadata scheme and the development of the GHFDB Data Template, which facilitates structured and detailed reporting of heat-flow observations in accordance with the new scheme. The GHFDB Data Template captures methodological details, uncertainty estimates, and contextual information, forming the basis for a newly implemented, multi-dimensional quality-assessment system. The improved data-submission workflow now offers the option of obtaining a digital object identifier (DOI), which makes newly submitted data citable in the literature, as journals increasingly require. This service encourages direct contributions from researchers and ensures transparency, attribution, and long-term data stewardship by the partner repository GFZ Data Services. 
The new heat flow database release marks a significant step towards establishing a global, quality-assured data infrastructure and lays the foundation for more reliable, reusable, and interoperable heat-flow datasets across scientific disciplines.

Received: 10 Jun 2025 – Discussion started: 17 Jul 2025

Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. However, Dr. Kirsten Elger is a member of the editorial board of the journal.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

CC1:
'Comment on essd-2025-341', Philippe Marbaix, 10 Aug 2025
Dear authors and editor,
I had a look at this paper from the perspective of data structure and would like to ask a few questions in all modesty – some may be naïve because I do not know the specifics of this dataset, which has been compiled over decades and will likely benefit from the upgrade and continued development. I hope these questions will contribute usefully to the open discussion and to the preparation of the final version of the paper (at least from the viewpoint of clarity, if other readers have the same questions).
On the database and its structure
A preliminary question to make sure that I am not misunderstanding the intention: when the paper refers to “the database” is it the dataset referred to in the assets – and only that, not an underlying structure which the dataset would be exported from?
Assuming that the database is the provided dataset, I am wondering why it is entirely “flat”: there is only one data table, while section 3.1 of the paper presents a “hierarchical structure”. I understand that there is a concept of parent (mostly sites) and child (mostly measurements) in the database, but all records (rows in the data table) appear to contain detailed information about a child and a parent simultaneously. In lines 203-204, “parent entry” looks a bit like a misnomer: unless I missed something, there are no specific parent entries (unique records each devoted to defining a specific parent). Instead, information about a parent appears to be duplicated in all of its child rows. Duplication is something that one usually tries to avoid in a database, as it may theoretically lead to inconsistencies. In principle, it would be possible to have a table for sites with a parent ID, and a table of measurements each referring to the parent – a very classical “foreign key” in the context of databases. It may well be that I missed something, as this is obvious in the context of SQL databases, which the paper refers to in section 7. It would still be possible to export the data in an Excel workbook while keeping a (more) structured approach, with each table in its own Excel sheet. Related questions are:
Was the flat format adopted to continue a legacy practice, or for other reasons?

Wouldn’t a more structured format help in further developing the database, to conveniently describe situations where the measurements at a given site are grouped in some way, e.g. because there are several measurements which only differ by their depth, date, or method?

What is meant by “chosen (as the) parent entry” in line 367? How can we determine that a row/entry is a “parent” in the dataset?
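The relational alternative raised in this comment can be sketched in a few lines of Python. This is a minimal illustration only: the column names (`site_id`, `site_name`, `lat`, `lon`, `meas_id`, `q_mW_m2`, `depth_top_m`) are hypothetical and do not correspond to the actual GHFDB fields. The sketch splits a flat export into a site table and a measurement table linked by `site_id` (the foreign key), rejects rows whose duplicated site metadata disagree, and rebuilds the flat, self-contained layout by joining the two tables again.

```python
from typing import Any

# Hypothetical field split (not the GHFDB schema): site-level (parent)
# fields versus measurement-level (child) fields.
SITE_FIELDS = ("site_id", "site_name", "lat", "lon")
MEASUREMENT_FIELDS = ("meas_id", "site_id", "q_mW_m2", "depth_top_m")

Row = dict[str, Any]


def normalize(rows: list[Row]) -> tuple[dict[str, Row], list[Row]]:
    """Split flat rows into a site table keyed by site_id and a measurement
    table in which site_id acts as a foreign key into the site table."""
    sites: dict[str, Row] = {}
    measurements: list[Row] = []
    for row in rows:
        site = {k: row[k] for k in SITE_FIELDS}
        # The first row seen for a site defines its metadata.
        existing = sites.setdefault(site["site_id"], site)
        # Every later copy of the parent data must agree with it; a mismatch
        # is the kind of inconsistency that duplication makes possible.
        if existing != site:
            raise ValueError(f"conflicting site metadata for {site['site_id']}")
        measurements.append({k: row[k] for k in MEASUREMENT_FIELDS})
    return sites, measurements


def denormalize(sites: dict[str, Row], measurements: list[Row]) -> list[Row]:
    """Rebuild the flat export by joining each measurement to its site."""
    return [{**sites[m["site_id"]], **m} for m in measurements]


# Illustrative rows only: two measurements at one hypothetical site.
flat = [
    {"site_id": "S1", "site_name": "Borehole A", "lat": 52.4, "lon": 13.1,
     "meas_id": "M1", "q_mW_m2": 71.0, "depth_top_m": 100.0},
    {"site_id": "S1", "site_name": "Borehole A", "lat": 52.4, "lon": 13.1,
     "meas_id": "M2", "q_mW_m2": 68.5, "depth_top_m": 450.0},
]
sites, meas = normalize(flat)
print(len(sites), len(meas))  # 1 2
```

In this sketch the site metadata is stored once, while `denormalize` recreates the single-table form that spreadsheet and GIS users expect, so both layouts can coexist: a structured internal store and a flat export.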

Potential uses for climate studies
While a database of heat flow has relevance for studies related to climate and climate change by providing boundary conditions for heat fluxes, I am wondering whether the database is intended to be relevant, now or in the future, for the study of past climate. For example, Hopcroft and Gallagher (2023) refer to an “IHFC database” to assess climate change over the last 500 years using geothermal data. Are there several “IHFC databases”, with the one described in this paper focusing on background (steady-state) geothermal flow, and another devoted to more detailed vertical profiles of heat flow and temperature?
(this partly links to a previous question – depending on the intention, the structure of the database may benefit from enabling depth profiles; I thought that vertical profiles were not the intention, but then I wondered about the role of the “Digital borehole” feature in the online GHFDB - https://portal.heatflow.world/explore/ )
Technical aspect of the file provided in the “Data sets” section (The Global Heat Flow Database: Release 2024):
The data is provided as an Excel file only. When opened, this file displays a message asking whether to update linked data. The Data > Workbook links function of Excel reveals that some cells indeed link to other workbooks (outside the dataset):
Z:\WG\PROJEKTE\P_HeatFlow\databases\_DB_IHFC_Update_2025\Release_2024\Popov_etal._2021.xlsx (+ 2 other files)
This should not be present in a released dataset (it is not convenient, suggesting that something is missing from the provided files, etc.).
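A released workbook can be checked for such leftover links before publication. The sketch below relies on the fact that an `.xlsx` file is a ZIP package in which Excel stores external-link parts under `xl/externalLinks/`, with each link's target recorded in a relationship file under `xl/externalLinks/_rels/`. It is a minimal check using only the Python standard library, not a GHFDB or GFZ tool; the function name `external_link_targets` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of Open Packaging Conventions relationship files (*.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_link_targets(xlsx) -> list[str]:
    """Return the targets of external workbook links found in an .xlsx file.

    `xlsx` may be a path or a binary file-like object. An empty list means
    that no external-link relationship parts were found.
    """
    targets: list[str] = []
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            # Each external link has a relationship part such as
            # xl/externalLinks/_rels/externalLink1.xml.rels.
            if name.startswith("xl/externalLinks/_rels/") and name.endswith(".rels"):
                root = ET.fromstring(zf.read(name))
                for rel in root.iter(f"{REL_NS}Relationship"):
                    target = rel.get("Target")
                    if target:
                        targets.append(target)
    return targets
```

For example, a non-empty result from `external_link_targets("release.xlsx")` (hypothetical file name) would flag the workbook before upload; the listed targets are the paths that Excel would offer to update.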
References
In the introduction, I think that the sentences in lines 38 – 40 mentioning permafrost, climate change, and oceanography would benefit from being supported by specific references.
Section 9, on applications and limitations of the database, would also benefit from references to the literature, especially the part on climate change in lines 866 to 869 (this also relates to my earlier question about the intended uses and limitations).
Reference in this comment:
Hopcroft, P. O. and Gallagher, K.: Global Variability in Multi‐Century Ground Warming Inferred From Geothermal Data, Geophysical Research Letters, https://doi.org/10.1029/2023GL104631, 2023.
Citation: https://doi.org/10.5194/essd-2025-341-CC1
- AC2: 'Reply on CC1', Florian Neumann, 18 Aug 2025
  
  Thanks for your feedback.
  A preliminary question to make sure that I am not misunderstanding the intention: when the paper refers to “the database” is it the dataset referred to in the assets – and only that, not an underlying structure which the dataset would be exported from?
  In this paper, “the database” refers to the released dataset.
  Was the flat format adopted to continue a legacy practice, or for other reasons?
  
  Wouldn’t a more structured format help in further developing the database, to conveniently describe situations where the measurements at a given site are grouped in some way, e.g. because there are several measurements which only differ by their depth, date, or method? What is meant by “chosen (as the) parent entry” in line 367? How can we determine that a row/entry is a “parent” in the dataset?
  The heat-flow community has used flat table exports since the 1960s and many users still expect and work with single-table CSV/Excel formats. A flat structure avoids the need for users to join tables, and ensures compatibility with common spreadsheet and GIS workflows.
  Internally, the curation system does treat sites and measurements hierarchically. In the export, “parent” information is repeated in each measurement row so the dataset remains self-contained.
  Terminology (“chosen parent entry”): for sites with multiple measurements, one record is designated as the “parent” row that defines the site metadata. In the export, however, there are no separate parent-only rows; this keeps the dataset transparent for end users.
  We recognize that a more structured (relational) representation could reduce duplication and better represent groupings of measurements, but for now the flat format maximizes accessibility and interoperability.
  Potential uses for climate studies
  This is already in progress: we aim to create a dataset with temperature-versus-depth data as well as relevant thermal petrophysical parameters.
  Technical aspect of the file provided in the “Data sets” section (The Global Heat Flow Database: Release 2024):
  Thanks, this will be changed.
  References
  Thank you for this suggestion. We agree that the mentioned passages would benefit from additional references.
  
  Citation: https://doi.org/10.5194/essd-2025-341-AC2
AC1: 'Comment on essd-2025-341', Florian Neumann, 18 Aug 2025

Citation: https://doi.org/10.5194/essd-2025-341-AC1
RC1:
'Comment on essd-2025-341', Andrew Goodwillie, 10 Sep 2025
A) General comments

Most practitioners in the geosciences will know that field programs are expensive and often challenging. So, it is essential to maximise the costly investment in data collection through the careful and responsible stewardship of data resulting from those field programs. In that regard, this paper on the Global Heat Flow Database (GHFDB) by Neumann et al. is exemplary. I reviewed this paper with interest since the importance of making available a robust compilation of legacy data cannot be overstated.
For members of the heat flow community the 2024 GHFDB release was surely much anticipated. The GHFDB team should be congratulated for embarking in 2019 on what is clearly a transformational update to a valuable community resource. The compilation effort alone - to unearth and evaluate data from so many disparate sources - is most inspiring.
The paper describes a major update to the Global Heat Flow Database. The update includes an impressive range of improvements:
A substantial increase in available carefully-evaluated data values,

A new statistically-based data quality assessment regimen,

The enhancement of a standardised metadata template,

The adoption of a flexible parent-child database schema and controlled vocabulary,

A new capability to assign DOIs for directly-submitted data so that contributors receive suitable citation credit,

And, the goal of introducing a new data portal.

In addition to the database content update, the paper describes the enhanced GHFDB Data Template that was developed in conjunction with the user community. The far-sighted desire of the GHFDB team to make it easier for future data contributors to submit robust, reliable metadata through this template will be critical in making future GHFDB updates less onerous.
Overall, the preprint is well-written and logically structured, and it offers suitable contextual description to help users of the GHFDB 2024 release understand both the quality and limitations of the data.

B) Specific comments
Perhaps the most obvious missing aspect of the paper is a clear, compelling justification of the database update effort. To address this, Section 9.4 should be moved to the Introduction and each example should be bolstered with suitable references from the scientific literature. In addition, the case would be strengthened if the authors were able to show metrics quantifying research that was made possible only by the existence of the GHFDB.

The involvement of the heat flow user community is mentioned in a few places. Community buy-in is essential for community trust in, and wide-scale adoption of, a compilation effort such as GHFDB. The paper would benefit from an expanded discussion describing how the GHFDB team engaged with the community.

Section 8.2 (lines 531-572) is largely a copy of Section 4. This duplication of text is both unnecessary and confusing.

Section 9.3 presents maps of a global interpolation of the heat flow data. Presumably the maps will be made available on the new data portal. The inclusion of standard deviation maps is useful, but the user may better understand the sparseness and weighting of data measurements if masked maps representing different standard deviation values were available. Ideally, the user would be able to select a mask level interactively, for example to mask out everything with a standard deviation greater than, say, 5 or 10 mW/m². (Similar masked maps of global bathymetry instantly reveal the poor ship-track data coverage even though the unmasked maps imply complete coverage. An example of a masking function, for bathymetry data, is available on the GMRT MapTool web interface.)
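The masking suggested here amounts to hiding every grid node whose interpolation standard deviation exceeds a chosen threshold. A minimal sketch in plain Python, assuming the interpolated heat flow and its standard deviation are available as two grids of equal shape, both in mW/m²; the function names and toy values are hypothetical and not part of the GHFDB portal.

```python
from typing import Optional

Grid = list[list[float]]
MaskedGrid = list[list[Optional[float]]]


def mask_by_std(values: Grid, std: Grid, threshold: float) -> MaskedGrid:
    """Hide grid nodes whose standard deviation exceeds `threshold`.

    Masked nodes are returned as None so that plotting code can leave them
    blank instead of showing a poorly constrained interpolated value.
    """
    if len(values) != len(std) or any(len(v) != len(s) for v, s in zip(values, std)):
        raise ValueError("value and standard-deviation grids must have the same shape")
    return [
        [v if s <= threshold else None for v, s in zip(vrow, srow)]
        for vrow, srow in zip(values, std)
    ]


def retained_fraction(masked: MaskedGrid) -> float:
    """Fraction of grid nodes that survive the mask."""
    cells = [c for row in masked for c in row]
    return sum(c is not None for c in cells) / len(cells)


# Toy 2 x 2 grids in mW/m^2 (illustrative numbers only).
q = [[60.0, 72.5], [85.0, 55.0]]
sd = [[3.0, 12.0], [7.5, 4.0]]
for limit in (5.0, 10.0):
    masked = mask_by_std(q, sd, limit)
    print(limit, masked, retained_fraction(masked))
```

An interactive portal could expose `threshold` as a user-selectable value; tightening it shrinks the retained area and makes the sparse coverage of the underlying measurements directly visible, much like the bathymetry example.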

In Section 6, first paragraph, we are told that journals and publishers increasingly require authors to deposit data in domain-specific research repositories. I disagree with that statement. From my own experiences of working with data repositories, authors are increasingly using generic automated storage sites such as Zenodo simply to obtain an instant DOI to satisfy a journal editor.

Section 7 discusses the development of a new, one-stop-shop digital data portal. I see this as potentially one of the most important long-term aspects of the GHFDB effort since, if designed well, the portal could quickly become the primary interface between users and the database, with the GHFDB serving as the back end. As such, instead of relegating the discussion of the portal to Section 7, the authors could give it more emphasis by moving it higher up in the paper.

C) Technical corrections
In section 3.1, first paragraph, there are two mentions of "core site". To avoid confusion with the sites of drill cores or sediment gravity/piston/push cores, perhaps replace the word "core" with "fundamental" or "basic".

Throughout the paper, there are many typos, rogue punctuation marks, and grammatical issues including missing or extra words, and inconsistencies (e.g. of "heat flow" versus "heat-flow"). On my first reading, I noted almost 30 instances. The Section 10 Summary is especially riddled with typos and grammatical errors.

D) Summary
This compelling paper marks a watershed moment for the heat flow community. It should be published once the above points have been addressed.
Citation: https://doi.org/10.5194/essd-2025-341-RC1
RC2:
'Comment on essd-2025-341', Anonymous Referee #2, 31 Dec 2025

The manuscript provides a comprehensive description of the structure and content of the 2024 release of the Global Heat Flow Database. While some of the material has appeared in earlier publications (e.g. the metadata structure of the GHFDB and the quality assurance scheme), its repetition here is worthwhile to save the reader tedious cross-referencing. The simple statistical analyses and kriged maps of the 2024 GHFDB dataset provide readers and researchers with a rapid global overview of the current data quality and geographic distribution, providing value by highlighting geographic regions where future research might focus to fill in data gaps.
My main concern with the manuscript is a lack of clarity about whose work (past, present and future) is being reported. The authors are affiliated with five different research groups distributed across three organizations. Many references to "we" in Sections 2 and 10 are vague about who "we" is. Does "we" refer only to the authors (which in some instances would imply the authors are arguably claiming credit for work done by others), or does "we" mean that the authors are speaking on behalf of their institutions including support staff and co-researchers? Please clarify this early in the manuscript.

Citation: https://doi.org/10.5194/essd-2025-341-RC2
- AC3: 'Reply on RC2', Florian Neumann, 03 Jan 2026
  
  Thank you for this constructive comment. We appreciate the reviewer’s concern regarding the clarity of authorship and attribution.
  We acknowledge that the use of “we” in Sections 2 and 10 may be ambiguous and could be interpreted in different ways. During the revision process, we will clarify explicitly, early in the manuscript, whom “we” refers to. Where activities, results, or developments are the outcome of broader institutional efforts, collaborations, or prior work by others, we will revise the text to attribute these contributions more precisely and avoid any unintended implication of exclusive authorship or credit.
  Regards,
  Florian Neumann
  
  Citation: https://doi.org/10.5194/essd-2025-341-AC3

Supplement

https://doi.org/10.5194/essd-2025-341-supplement

Data sets

The Global Heat Flow Database: Release 2024, Global Heat Flow Data Assessment Group, https://doi.org/10.5880/fidgeo.2024.014

Model code and software

Heat Flow Quality Analysis Toolbox, Saman F. Chishti et al., https://doi.org/10.5880/fidgeo.2025.043

Viewed

Total article views: 2,582 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,901	611	70	2,582	118	75	108

Views and downloads (calculated since 17 Jul 2025)

Month	HTML	PDF	XML	Total
Jul 2025	107	20	19	146
Aug 2025	248	26	12	286
Sep 2025	844	20	10	874
Oct 2025	101	44	3	148
Nov 2025	112	70	2	184
Dec 2025	104	125	5	234
Jan 2026	137	100	10	247
Feb 2026	104	102	5	211
Mar 2026	131	91	4	226
Apr 2026	13	13	0	26

Viewed (geographical distribution)

Total article views: 2,514 (including HTML, PDF, and XML), thereof 2,514 with geography defined and 0 with unknown origin.

Latest update: 16 Apr 2026

Short summary

The Global Heat Flow Database grew from 58,302 data points in 2012 to 91,182 in 2024, with enhanced quality assessments. Despite this, gaps in data and methodological details persist, especially in underrepresented regions. The database is crucial for geophysical, geothermal, and environmental research, offering valuable insights into Earth's thermal processes.

