P3-PetroPhysical Property Database – a global compilation of lab measured rock properties

Petrophysical properties are key to populating local and/or regional numerical models and to interpreting results from geophysical investigation methods. Searching for rock property values measured on samples from a specific rock unit at a specific location can be very time-consuming, given that such data are spread across diverse compilations, that the number of publications on new measurements is continuously growing, and that data are of heterogeneous quality. Profiting from existing laboratory data to populate numerical models or interpret geophysical surveys at specific locations or for individual reservoir units is often hampered if information on the sample location, petrography, stratigraphy, measuring method and conditions is sparse or not documented. Within the framework of the EC funded project IMAGE (Integrated Methods for Advanced Geothermal Exploration, EU grant agreement No. 608553), an open-access database of lab measured petrophysical properties has been developed (Bär et al., 2019: P3 Database, http://dx.doi.org/10.5880/GFZ.4.8.2019.P3). The goal of this hierarchical database is to provide easily accessible information on physical rock properties relevant for geothermal exploration and reservoir characterization in a single compilation. Collected data include ‘classical’ petrophysical, thermophysical and mechanical properties and, in addition, electrical conductivity and magnetic susceptibility. Each measured value is complemented by relevant meta-information such as the corresponding sample location, petrographic description, chronostratigraphic age, if available, and original citation.
The original stratigraphic and petrographic descriptions are transferred to standardized catalogues following a hierarchical structure ensuring inter-comparability for statistical analysis (Bär et al., 2019: P3 Petrography, http://dx.doi.org/10.5880/GFZ.4.8.2019.P3.p; Bär et al., 2019: P3 Stratigraphy, http://dx.doi.org/10.5880/GFZ.4.8.2019.P3.s). In addition, information on the experimental setup (methods) and the measurement conditions is listed for quality control. Thus, rock properties can directly be related to in-situ conditions to derive specific parameters relevant for simulating subsurface processes or interpreting geophysical data. We describe the structure, content and status quo of the database and discuss its limitations and advantages for the end-user.

• Have you done any QC for duplicate entries? This can often be an issue when compiling various sources.
• You suggest a number of extensions and inclusions in the manuscript. How long will this database be supported into the future? On what kind of time scale would the online portal discussed be seen? I think this will do a lot for its longevity and accessibility.
• Line 20, page 2: close bracket.
• Figure 1 resolution, although not of particular importance, is too low. This may just be a result of the draft version though?
• Line 15, page 3: I'd suggest moving the extensive bracketed section to the end of the sentence rather than 4 words from the end.
• Line 6, page 4: Suggest reword to: "This shall ensure a reduction in bias introduced by ..." or "This shall reduce bias introduced by ...".
• Line 14, page 11: I find this sentence very confusing; I think it needs rewriting entirely. Something along the lines of "In addition to the primary option of manual database quality control whereby/through ..., an automatic process of quality control was implemented" or similar. You may even want to delete this paragraph entirely, as the second paragraph with some modification seems sufficient as an opening statement for this subheading.
Comment on calculation type 2 for radiogenic heat production:
Thank you very much for this valuable comment. We are currently discussing how to include radiogenic heat production, and whether to include it at all, in future versions of P³. With the newest publication of Gard et al. (2019) in ESSD and their 'Global whole-rock geochemical database' with more than 1.3 million entries worldwide, which also includes the calculation of the radiogenic heat production based on the geochemistry, the values included in P³ might be obsolete in the future. Up to now, however, it is beyond us to check for double entries between P³ and Gard et al. (2019), and we would leave it to the experienced researcher which database to refer to. Referring to the manual quality check of Ashwal1987, it will always be possible that some entries in P³ are not fully correct, since errors such as the one presented here can only be found manually and not with the semi-automatic quality control we implemented. Additionally, we needed to compile different specific methods into the more general methods described in the P³ Readme. We have checked this entry and would be happy to be notified by readers and users of our database about any other obvious error, to enhance the quality of future releases of P³.

Comment on compressed database version for download:
We have created a .zip compressed version of the P³ Excel version, which is now also available for download with the submission of the revised manuscript. Additionally, we corrected the CSV version of the database to use ";" as separator and "." instead of "," as decimal marker. The PDF version of the database has been removed according to the comments of both referees.
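As an illustration of the file convention described above (";" as column separator, "." as decimal marker), the following minimal Python sketch parses such a file with the standard library. The column names and values in the excerpt are hypothetical placeholders and are not taken from P³.

```python
import csv
import io

# Hypothetical excerpt; the real P³ column names and values may differ.
SAMPLE_CSV = (
    "sample_id;porosity;thermal_conductivity\n"
    "Example2019_1;0.12;2.45\n"
    "Example2019_2;;3.10\n"
)

def to_float(cell):
    """Convert a cell using '.' as decimal marker; empty cells become None."""
    cell = cell.strip()
    return float(cell) if cell else None

def read_rows(text):
    """Parse ';'-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))

rows = read_rows(SAMPLE_CSV)
porosities = [to_float(row["porosity"]) for row in rows]
```

Because the decimal marker is a dot, no locale-specific conversion is needed; empty cells are mapped to `None` here rather than to zero so that missing measurements are not mistaken for values.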

Comment on QC for double entries:
We only quality checked duplicate entries by documenting both primary and secondary sources and additionally by geographic comparison of sample locations. So far, we are quite certain that, if at all, only a small number of duplicate entries might be included, e.g. where the coordinates were not given in sufficient detail in the references.
It might well be that different authors sampled the same location. Such samples were not considered as duplicates.

Comment on suggested future extensions and inclusions of P³ and future support:
The database in its current version will be permanently available and accessible via the DOI and GFZ data services. The future development and extension of the database, as almost everywhere in the scientific world, strongly depend on ongoing project funding and successful research proposals in the future. Funding for extensions is currently available until the end of 2022. The online portal is planned to be developed within the next four years. A research proposal for a project where this will be developed is currently under review.
We additionally would like to thank the anonymous referee #1 for pointing out some typos and sentences with problematic grammar; we have changed these in the revised version of the manuscript.

Anonymous Referee #2
Received and published: 2 April 2020
In this paper the authors describe the P3 petrophysical database they developed with the support of an EU funded project, IMAGE. How the data were collected and organised and the content of the database are well described in the manuscript.
This topic is of interest for the part of the scientific community seeking a quick search for petrophysical rock properties for different purposes. Moreover, the authors report on pre-upload data selection criteria and provide an interesting way to classify the content data in terms of quality.
However, there are some general questions, and others more closely related to the text and consequently to the dataset, that have to be clarified:
- Did the authors consider the possibility of using the GeoSciML model of geological features to build the database? GeoSciML is a recognised international standard framework aligned with INSPIRE. GeoSciML is useful for basic data exchange and it allows the model to be easily extended to address more complex scenarios. Did the authors consider a possible INSPIRE compliance of the database? If yes, how? If no, why?
- The dataset presented by the authors follows almost all the FAIR (Findable Accessible Interoperable Reusable) requirements (i.e., I checked the dataset with this online tool: https://www.ands-nectar-rds.org.au/fair-tool). It is worthwhile that this important fact is mentioned in the text.
- Regarding the datasets provided together with the manuscript, I would put more emphasis on the interoperable file format. Although a txt file is provided (which is readable by different computer machines), it would be formally better to have the database in a comma separated format (.csv) as specified in the file "Reeadme_P__Petrophysical_Property_Database_V1_Release_2019.pdf" available in the repository. The pdf version can be avoided because it is too large a file and in my case it wasn't visible.
- In the appended file named "Reeadme_P__Petrophysical_Property_Database_V1_Release_2019.pdf", downloadable from the repository reported in the manuscript (ftp://datapub.gfz-potsdam.de/download/10.5880.GFZ.4.8.2019.P3.s/), section 4 'File format' mentions that the dataset is published as a comma separated ASCII file (csv, MS-DOS) with columns delimited by ";". However, there is no csv file in the linked repository (cf. point above). There is a txt file where columns are separated by TABs. In the same txt file (but also in the xls spreadsheet) the decimal marker seems to be a comma "," and not the dot "." as stated. Please check this and adjust accordingly.
- Page 4, lines 24 to 35: The authors explain the choice to use a flat file instead of a relational database, with the pros and cons. They built the database following a relational database system (line 27), but this statement is not clear. I recommend rewording in order to clarify that point. What does the sentence at line 27 mean? Did the authors create an Entity-Relation (ER) model? The ER model is highly recommended even if a flat-file database is developed, so it would be good to add the ER model to the paper, describing at least the cardinalities among the different entities. At line 30 the authors describe the positive aspects of having a relational database. Among them are: i) data is uniquely stored just once and ii) eliminating data duplication, which can be referred to the important property named "referential integrity constraint". How do the authors guarantee this property? Did they perform any check on the xls file to guarantee the database consistency?
- Page 7, line 7: It is mentioned that the latitude/longitude coordinates are UTM based. The UTM system is a projected system that implies planar coordinates (i.e., easting and northing). In the database the fields related to coordinates report geographical coordinates as decimal degrees. It is not clear which coordinate system is finally used or why there is this difference between the text and the database. This has to be sorted out. Besides that, it would be good to explain also how the coordinates retrieved from literature were treated if samples had different coordinate typology.
Thank you very much for this question. We did indeed consider the possibility of using the GeoSciML model of geological features to build the database. However, the initial database of our own measurements was already based on the petrographic and stratigraphic classification schemes as published by Bär et al. (2019). The reason for that is the direct link to the well database of the federal Geological Survey of Hessen, Germany, where the majority of our initial samples originated from. Additionally, these classifications with their internal hierarchical structure allow for a much more detailed classification compared to the GeoSciML petrographic terms. For the next release of the P³ database, however, we will prioritize implementing a direct link of our classifications to the GeoSciML geological features to allow for the proposed future compliance with INSPIRE. Since for both classifications, our own and the GeoSciML, the terms are defined by geological vocabularies, such a link is possible but will have no direct impact on the main contents of P³, namely the petrophysical properties. For single entries, however, the conversion would have to be corrected on a case-by-case basis, as different vocabularies might not match one to one.
Referee comment 2: It is worthwhile to mention the important fact in the text, that the dataset presented by the authors, follows almost all the FAIR (Findable Accessible Interoperable Reusable) requirements (...).
Thank you for this comment. We fully agree and adapted the text accordingly.
Thank you very much for this comment. We will remove the pdf file of P³ since both referees have mentioned that it is not usable. Additionally, we have provided a new .csv file as interoperable file format with ";" as separator as specified in the P³ Readme. We have removed the TAB separated txt file and checked and implemented the correct use of the decimal marker as dot "." instead of ",".
Thank you very much for this comment. We reworded the sentence in order to clarify this point. We did create an Entity-Relation (ER) model already during the development of P³ but did not change the file format for this first release from Excel to another one. In the revised manuscript, we added an Entity-Relationship Diagram as Appendix.
We checked for duplicate entries manually and adjusted the paragraph in the manuscript accordingly: 'The database was developed in flat-file format using Microsoft Excel to keep it as simple and easy to handle as possible, even for the inexperienced user. While other database structures are in comparison much more efficient, their database management schemes may render it too difficult for users not familiar with SQL to recover the desired data. However, the internal design of P³ with multiple sub-entities and tables is structured following a relational database management system (RDBMS, Codd, 1970) with an Entity-Relation (ER) model (see Appendix 1), so that it could easily be transferred to e.g. the well-established structured query language (SQL, Chamberlin and Boyce, 1974). Following this ERM the database could easily be organised into multiple tables using the names of the tables as unique keys as links to other sub-tables. The main advantages of a relational database over a flat-file format are that data is uniquely stored just once, eliminating data duplication, as well as performance increases due to greater memory efficiency and easy filtering and rapid queries (Gard et al., 2019).
However, the current flat-file structure allows for easy modification and extension as new requirements emerge, for example by adding, at later stages, more sub-tables for newly developed property measurements that do not fit any of the already included properties. On the other hand, filtering and quality control to ensure that data is entered into the database only once and that no duplicates exist had to be done manually. In our case data duplicates were removed by checking the coordinates of each data point with a radius of uncertainty of 1 km and, if necessary, manually removing every double entry identified.'
Referee comment 6: Page 7, line 7: It is mentioned that the latitude/longitude coordinates are UTM based. The UTM system is a projected system that implies planar coordinates (i.e., easting and northing). In the database the fields related to coordinates report geographical coordinates as decimal degrees. It is not clear which coordinate system is finally used or why there is this difference between the text and the database. This has to be sorted out. Besides that, it would be good to explain also how the coordinates retrieved from literature were treated if samples had different coordinate typology.
Thank you very much for this valuable comment. Indeed the manuscript text is misleading at this point. It was planned in an earlier version to use UTM. However, in the final version we used latitude/longitude in decimal degrees with a WGS84 reference system. For a conversion from literature, we used either Google Earth (Web Mercator Projection) or ArcGIS to allocate a latitude/longitude value in decimal degrees and the associated uncertainty to each data point. We are aware that this 'Google maps method' is not accurate, but exact geographic information is quite often not provided in the literature used for this compilation. Typically in this case, only location names are provided. For all literature where both the exact coordinates and the reference system were given, or where the location of the data points was given on a georeferenced map with the required information on the coordinate system used, we used ArcGIS to digitize the position of the individual samples. Therein, we used the same geographic projection as given in the original literature and either included the points as tabular values or georeferenced the given maps accordingly and picked the points on the maps. Afterwards, the resulting coordinates were transferred to decimal degrees in the WGS84 reference with the transformation as suggested by ArcGIS. However, we have not documented in the database the exact coordinate transformation used in each case.
We additionally would like to thank the anonymous referee #2 for pointing out some typos and sentences with problematic grammar as well as for suggestions for improvements of the figures; we have changed these in the revised version of the manuscript.
Last referee comment: In the conclusions and perspectives the plan to develop a publicly accessible web-based interface is highly recommended for the future and should be prioritized, because the high number of rows (more than 75,000) and columns (around 300) doesn't make the database so easily query-able and browsable, even in the Excel software package.
Thank you for the remark. We will prioritize this work in the future as recommended. Please refer here to our answers to the comments of the first anonymous referee: 'The database in its current version will be permanently available and accessible via the DOI and GFZ data services. The future development and extension of the database, as almost everywhere in the scientific world, strongly depend on ongoing project funding and successful research proposals in the future. Funding for extensions is currently available until the end of 2022. The online portal is planned to be developed within the next four years.'
Keywords: relational database, rock physical properties, laboratory measurements, global data compilation.

Introduction
The characterisation and utilisation of subsurface reservoirs generally rely on applying geophysical investigation methods and/or numerical simulation codes, both requiring, in turn, knowledge of physical rock properties at depth. The strategy of populating numerical models with petrophysical properties can differ. For local-scale models, laboratory data from individual samples collected from the geological unit of interest may exist. In this case, this direct information should be used together with sophisticated (physical and empirical) laws to populate the entire geological unit. For regional and continental-scale models, in contrast, parameters have to be generalised with respect to the spatial and physical variability of the investigated lithological units.
Individual rock types or petrographies typically exhibit a great variability in related properties due to heterogeneous mineral compositions, variable textures and differing porosity distribution (Schön, 2015). Existing rock property compilations illustrate both this high variability and the different purposes of such databases (e.g. Cermak and Rybach, 1982, Clark, 1966, Landolt-Börnstein, PetroMod, Schön, 2004, Mortimer, 2005, Hantschel and Kauerauf, 2009, Lilios and Exadaktylos, 2011, Descamps et al., 2013, Aretz et al., 2015). Since such compilations are mostly published with limited meta-information, it is difficult to extract data for formations of interest. This is aggravated by additional limitations such as the focused coverage of certain rock types or geographic areas (e.g. Germany). In addition, different compilations do not provide a homogenised set of meta-information. Furthermore, exploration data availability often depends on national legislation. In some countries industrial resource exploration data, including petrophysical properties measured on cores of deep wells, may become public after a certain time period and are then usually incorporated in national information systems. In other cases exploration data remain confidential for longer time periods or even indefinitely, resulting in scarce data availability for the respective countries. Due to the current publication policy of international research institutions, where a high number of peer-reviewed publications becomes more and more important for the individual scientific career, the amount of petrophysical data recorded worldwide has increased dramatically. These data, however, are dispersed across many hundreds of publications in many different geoscientific journals. Given the rate of newly published property data combined with the multitude of publishing journals, countries and authors, the search for and collection of data can be incredibly time-consuming.
Recent studies show that domain experts spend nearly 80% of their working hours on collecting, cleansing and managing their domain-specific data (CrowdFlower, 2016). An effective, comprehensive collection, collation and dissemination of these data is deemed critical to promote rapid, creative and accurate research (Gard et al., 2019).
To facilitate (i) efficient search for and research on measured rock physical properties, (ii) further evaluation of the property data using complementing meta-information, and (iii) adequate property generalisation for specific units, a comprehensive database was developed within the framework of the EC funded project IMAGE (Integrated Methods for Advanced Geothermal Exploration, Grant Agreement No. 608553). The aim of this database is to compile, store and publicly provide petrophysical property data from published laboratory test results on rock samples of any kind, including as much meta-information as possible. So far, literature data relevant for the IMAGE project and laboratory data collected during the IMAGE project were fed into this novel PetroPhysical Property Database (P³). Here, we present the current state of P³ and release

Contents and Structure of the Database
P³ is publicly accessible and contains physical rock properties measured in laboratory experiments. It is licensed under a Creative Commons (CC-BY 4.0) license and its structure follows the FAIR guiding principles for scientific data management and stewardship (Wilkinson et al., 2016). All data are selected to represent the characteristic scale of rock samples of a few centimetres to decimetres, depending on the measurement methods for the different properties (as described by numerous norming institutions or committees, e.g. the International Society for Rock Mechanics and Rock Engineering (ISRM), the European Committee for Standardization (CEN), the International Organization for Standardization (ISO), the American Society for Testing and Materials (ASTM International) and many more). Within P³ we aimed at homogenising measurement method descriptions to increase the inter-comparability between individual reported values. Larger-scale data from geophysical well logging, hydraulic well testing, integrating geophysical methods or other field-scale measurements, which integrate over larger rock volumes or several rock types, are not yet included in the database (Figure 1). This shall reduce bias introduced by heterogeneities within larger geobodies including open or partly open discontinuities like fissures, fractures, bedding or schistosity. In addition, judged on the basis of the lithological description, we did not include data from very small-scale samples, where the volume of interest is likely smaller than the minimum representative elementary volume (REV) (e.g. Ringrose and Bentley, 2015) for the investigated rock type. The full range of the scale-dependency of petrophysical properties as described in previous studies (e.g. Enge et al., 2007) is thus not yet reflected by the database but is planned to be incorporated in future versions.
To ensure that source data are publicly available to researchers, only data from scientific publications (books or peer-reviewed journals) or proceedings (e.g. IGA Geothermal Papers/Conference Database) as well as published research reports (e.g. dissertations or publicly available student theses, project reports) were included in P³. The database only contains measurements with a minimum amount of meta-information to allow for reasonable interpretations, generalisations, or simulations based on the collected data. The minimum associated meta-information is the reference to the data origin (citation) and information about the petrography to allow for a classification according to a certain lithotype. If available, additional meta-data were included, such as the sampling location (potentially including its type, e.g. outcrop, abandoned or active quarry, vertical or deviated well), the affiliation to a registered sample set (e.g. International Geo Sample Number (IGSN), cf. Devaraju et al., 2016, Lehnert et al., 2006), stratigraphy, sample dimensions, measurement method or device and measurement conditions (pressure, temperature, stress) including degree of saturation and type of saturating fluid. Conversion of published values to SI units, as well as correction of minor errors in published data or omissions from previous databases as they are identified, is an ongoing process during the data curation.
The database was developed in flat-file format using Microsoft Excel to keep it as simple and easy to handle as possible, even for the inexperienced user. While other database structures are in comparison much more efficient, their database management schemes may render it too difficult for users not familiar with SQL to recover the desired data. However, the internal design of P³ with multiple sub-entities and tables is structured following a relational database management system (RDBMS, Codd, 1970) with an Entity-Relation (ER) model (see Appendix 2), so that it could easily be transferred to e.g. the well-established structured query language (SQL, Chamberlin and Boyce, 1974). Following this ERM the database could easily be organised into multiple tables using the names of the tables as unique keys as links to other sub-tables.
The main advantages of a relational database over a flat-file format are that data is uniquely stored just once, eliminating data duplication, as well as performance increases due to greater memory efficiency and easy filtering and rapid queries (Gard et al., 2019). However, the current flat-file structure allows for easy modification and extension as new requirements emerge, for example by adding, at later stages, more sub-tables for newly developed property measurements that do not fit any of the already included properties. On the other hand, filtering and quality control to ensure that data is entered into the database only once and that no duplicates exist had to be done manually. In our case data duplicates were removed by checking the coordinates of each data point with a radius of uncertainty of 1 km and, if necessary, manually removing every double entry identified.
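The coordinate-based duplicate check described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used for P³ (which was manual): the great-circle distance is computed with the haversine formula on a spherical Earth of mean radius 6371 km, and the record fields `sample_id`, `lat` and `lon` are hypothetical names. The function only flags candidate pairs, since different authors may legitimately have sampled the same location; the final decision remains a manual one.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def duplicate_candidates(records, radius_km=1.0):
    """Return pairs of sample IDs whose locations lie within radius_km of each other."""
    pairs = []
    for a, b in combinations(records, 2):
        if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= radius_km:
            pairs.append((a["sample_id"], b["sample_id"]))
    return pairs

# Hypothetical records: the first two lie about 0.56 km apart, the third about 111 km away.
records = [
    {"sample_id": "ExampleA2001_1", "lat": 50.000, "lon": 8.000},
    {"sample_id": "ExampleB2010_1", "lat": 50.005, "lon": 8.000},
    {"sample_id": "ExampleC2015_1", "lat": 51.000, "lon": 8.000},
]
candidates = duplicate_candidates(records)
```

The pairwise comparison scales quadratically with the number of records; for a compilation of this size a spatial index or a coarse grid pre-filter would be a natural refinement.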
Following the minimum requirements, the database is structured into three main sections or super-entities (Figure 2), which are sets of data tables (described in more detail in the following parts of the paper). The first, named 'meta-information', contains all meta-information on the sample including the sampling location, the sample type and dimensions as well as information on its petrography and stratigraphy, and thus acts as primary table for unique sample identification. The second section or super-entity contains the measured property value(s) of the unique rock samples. This section is sub-grouped into thermophysical properties, 'classical' petrophysical properties, mechanical properties as well as electrical and magnetic properties, and fields for property-specific remarks. Finally, the third section or super-entity, named 'quality control', includes all information relevant for the quality assessment of each data record (property measurement of the unique samples). Here, especially information on the measurement conditions (methodology, pressure and temperature conditions, degree of saturation etc.) is documented and used for the implemented semi-automatic quality control and assessment.
The first super-entity 'meta-information' consists of five tables or entities: sample ID, reference, sampling location, sample information, petrography and stratigraphy. A description of each of these tables is included in the following sub-chapters. The tables for petrography and stratigraphy are available separately. The super-entity 'rock properties' contains 28 separate sub-tables for all properties included so far in the database, each following a similar internal structure (see chapter 2.4). For many samples, measurements of multiple properties were available and included in the database, which results in multiple documentation of the 'meta-information' of these samples in the current file structure. The super-entity 'quality control' contains two tables or entities, the first one for documentation of the measurement conditions and the second one for the automated quality assessment of the entries (see chapter 2.5).
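To illustrate how such a structure could be carried over to a relational database, the following minimal sketch uses Python's built-in `sqlite3` module. The table and column names are hypothetical and much reduced compared to P³; the sketch only shows the principle that the meta-information of a sample is stored once and referenced by each property measurement through the sample ID, with the foreign key enforcing referential integrity.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with one meta-information table and one property table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(
        """
        CREATE TABLE sample (
            sample_id     TEXT PRIMARY KEY,   -- hypothetical key, e.g. author + year + number
            petrography   TEXT,
            location_name TEXT
        );
        CREATE TABLE porosity (
            id        INTEGER PRIMARY KEY,
            sample_id TEXT NOT NULL REFERENCES sample(sample_id),
            value     REAL
        );
        """
    )
    return conn

conn = build_demo_db()
conn.execute("INSERT INTO sample VALUES ('Example2019_1', 'sandstone', 'Fontainebleau')")
conn.execute("INSERT INTO porosity (sample_id, value) VALUES ('Example2019_1', 0.12)")

# Join the measurement back to its meta-information via the sample ID.
joined = conn.execute(
    "SELECT s.petrography, p.value FROM sample s JOIN porosity p USING (sample_id)"
).fetchall()
```

With this layout, a measurement that refers to a sample ID missing from the `sample` table is rejected by the database, which is the kind of consistency check that has to be done by hand in a flat file.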

Sample Information
To distinguish measurements of different properties on a single sample or of the same properties performed at varying measurement conditions, every measurement is listed in a separate row. To group measurement data from individual samples, every sample receives a unique sample ID, which acts as the primary key of each record and links multiple measurements conducted on a single rock sample. The sample ID consists of the surname of the first author and the year of publication, together with a sequential number for the particular rock sample presented in the respective publication. In case of several references per author and year, an additional letter (a, b, …) is introduced after the year.
is linked to an accompanying reference database, compatible with all major reference management tools (e.g. EndNote, Citavi, BibTeX, JabRef, etc.), which contains the full information (co-authors, full title, journal, volume, pages, etc.) on the reference.
The references are abbreviated in a BibTeX key according to the terminology used for individual samples. Where possible, only primary references are given. In case the primary reference is unavailable, while the data point is published as part of a review (or the like), a secondary reference was introduced. Additionally, the date of input and the name of the person who generated the entry in the database (the editor, listed among the contributors in the team list in chapter 6) are documented.
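The sample ID scheme described in this section (surname of the first author, year of publication, an optional letter for several references per author and year, and a sequential sample number) can be sketched as a small helper. The exact formatting used in P³, in particular the separator before the sequential number, is not specified here; the underscore below is an assumption made for illustration only, as are the function and parameter names.

```python
def make_sample_id(surname, year, sequence, letter=""):
    """Build a sample ID from first-author surname, year, optional letter and sample number.

    The underscore separator is an assumed convention, not taken from P³.
    """
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    if letter and not (len(letter) == 1 and letter.isalpha() and letter.islower()):
        raise ValueError("letter must be a single lowercase character such as 'a' or 'b'")
    return f"{surname}{year}{letter}_{sequence}"

# Hypothetical examples
first_id = make_sample_id("Example", 2019, 1)
second_ref_id = make_sample_id("Example", 2019, 3, letter="b")
```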

Sampling Location
The sub-section 'sampling location' contains all relevant information on the location where a sample was obtained. Generally, rock samples can be taken from an outcrop, a quarry or a well. If the sampling location is not given as outcrop, quarry or well, and no exact coordinates are given in the corresponding publication, the location type "area" is selected. Furthermore, for every location type, a name, a country and a state are given (e.g. location type: outcrop, location name: Fontainebleau, location country: France, location state/department: Seine-et-Marne).

Location Coordinates
The location coordinates describe the latitude and longitude of the sampling point at the surface in decimal degrees, based on the UTM system (Universal Transverse Mercator) with the reference system WGS84. A further entry is the elevation, given in metres above sea level (m a.s.l.). In the case of a core sample taken from a well, the latitude and longitude of the wellhead are given. In the case of an area with an undefined sampling point, e.g. "sample from the Rhenish Massif", a midpoint of this geological province has been assessed and a radius of uncertainty (in km) for the sampling location is estimated. For elongated areas (e.g. the Red Sea, the Upper Rhine Graben etc.) the choice of a circular radius of uncertainty artificially increases the uncertainty. The introduction of polygons to define an area is under discussion for future releases of the database. If no information is given for the location, the longitude and latitude are noted as 999 to avoid wrong map displays, and half the circumference of the Earth is used as uncertainty.
To convert the sample coordinates retrieved from the literature, we used either Google Earth (Web Mercator projection) or ArcGIS to assign a latitude/longitude value in decimal degrees and a rough estimate of the associated uncertainty to each data point. We are aware that this 'Google maps method' is not accurate, but exact geographic information is quite often not provided in the literature used for this compilation; most commonly, only location names or maps are provided. For all literature data points where both the exact coordinates and the reference system were given, or where the location was shown on a georeferenced map with the required information on the coordinate system used, we used ArcGIS for the transformation.
In these cases, we used the same geographic projection as given in the original literature and either imported the points as tabular values or georeferenced the given maps accordingly and picked the points on the maps. Afterwards, the resulting coordinates were converted to decimal degrees in the WGS84 reference system using the transformation method suggested by ArcGIS for the specific projected coordinate system. We have not documented the exact coordinate transformation used in each case.
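The convention for entries without any location information (coordinates noted as 999, uncertainty set to half the circumference of the Earth) can be sketched as follows. This is an illustrative sketch only: the record layout, the function names and the mean Earth radius of 6371 km are assumptions and do not reflect the actual column names of P³.

```python
import math

MISSING_COORD = 999.0        # sentinel for unknown latitude/longitude (from the text)
EARTH_RADIUS_KM = 6371.0     # assumed mean Earth radius
HALF_CIRCUMFERENCE_KM = math.pi * EARTH_RADIUS_KM


def location_record(lat=None, lon=None, uncertainty_km=None):
    """Return a location entry; fall back to the sentinel if coordinates are unknown."""
    if lat is None or lon is None:
        return {"lat": MISSING_COORD, "lon": MISSING_COORD,
                "uncertainty_km": HALF_CIRCUMFERENCE_KM}
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates must be decimal degrees (WGS84)")
    return {"lat": lat, "lon": lon, "uncertainty_km": uncertainty_km}


def has_location(record):
    """True if the entry can be plotted on a map."""
    return record["lat"] != MISSING_COORD and record["lon"] != MISSING_COORD


unknown = location_record()
known = location_record(48.40, 2.70, uncertainty_km=0.1)
print(has_location(unknown), has_location(known))
```

Filtering with a check such as `has_location` before plotting keeps the sentinel values from appearing as spurious points on a map.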

Original Sample ID
To allow for reviewing original publications, the originally given sample identification numbers or names are documented in addition to the P³ sample ID. This makes it easier to find a specific sample in a publication, which might subsequently have been used for further measurements or more detailed descriptions by other authors or by individual users of the database.

International Geo Sample Number
The International Geo Sample Number (IGSN, cf. Devaraju et al., 2016, Lehnert et al., 2006) is a unique identifier for samples and specimens collected from the natural environment (http://www.igsn.org/). In order to enable locating, identifying, and well as the IGSN database in order to ensure access to more meta-information like sampling methods, project related information, etc., currently not implemented in P³. As described by Strong et al. (2016) the adoption of IGSNs will ensure compatibility and interoperability with other international databases, including the promotion of standard methods to locate, identify and cite physical samples.

Sample Type
Samples can have different shapes, which are particularly relevant for the measurement technique. Core samples have different characteristics than rock blocks, drill cuttings, etc.; P³ therefore reserves a separate column for the sample type.

Sample Dimensions [m]
Together with the sample type, information on the sample's length, height, width and, for cores, diameter is documented if available, all given in metres. If the rock property "density" is measured for a sample whose dimensions are given, sample volume and mass can be calculated as well. This additional information, together with the petrography, is essential to evaluate whether a sample reaches a Representative Elementary Volume (REV) or not.
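As a simple illustration of how volume and mass follow from the documented dimensions and density, consider the sketch below. It assumes idealised geometries (a right circular cylinder for cores, a cuboid for blocks) and SI units; the function names are hypothetical.

```python
import math


def core_volume_m3(diameter_m: float, length_m: float) -> float:
    """Volume of a cylindrical core sample in cubic metres."""
    return math.pi * (diameter_m / 2.0) ** 2 * length_m


def block_volume_m3(length_m: float, width_m: float, height_m: float) -> float:
    """Volume of a cuboid rock block in cubic metres."""
    return length_m * width_m * height_m


def sample_mass_kg(volume_m3: float, density_kg_m3: float) -> float:
    """Mass from volume and bulk density."""
    return volume_m3 * density_kg_m3


# Core plug of 40 mm diameter and 80 mm length with a bulk density of 2650 kg/m^3
volume = core_volume_m3(0.04, 0.08)
print(volume, sample_mass_kg(volume, 2650.0))
```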

Sample Coordinates
For several samples taken at a single sampling location (e.g. a large outcrop or quarry), individual sample coordinates (longitude, latitude and elevation) are given where available. For samples from a cored well, the depth of the sample is additionally given in measured depth (MD) and, if available, in true vertical depth (TVD) referenced to the ground level (i.e. metres below ground level, m b.g.l.). If data on the geometry of deviated wells are available, the sample location can be entered either relative to the wellhead or with its exact location and elevation (with respect to sea level).

Petrography or Rock Type
The petrography or rock type classification scheme is defined in a complementary database (Bär et al., 2019: P³ - Petrography, http://dx.doi.org/10.5880/GFZ.4.8.2019.P3.p) published directly together with P³. Its internal structure is based on a hierarchical subdivision of rock types, where the rock description generally becomes more detailed with increasing rank of petrographic classification (based on the well database of the Geological Survey of Hessen, Germany: Hessisches Landesamt für Naturschutz, Umwelt und Geologie (HLNUG)). This hierarchical subdivision is based on international conventions (e.g. Bates and Jackson, 1987, Gillespie and Styles, 1999, Robertson, 1999, Hallsworth and Knox, 1999, Bas and Streckeisen, 1991, Schmid, 1981, Fisher and Smith, 1991). Furthermore, the classification corresponds to the subdivision provided by existing property data compilations such as Hantschel and Kauerauf (2009), Schön (2011), Rybach (1984 and . Petrographic classifications from rank 1 to rank 4 can usually be identified from macroscopic descriptions of well logs, cores and geological mapping (Figure 3). The petrographic classifications from rank 5 to rank 9 require additional information on the texture or grain size, the modal composition or the geochemistry etc., which can usually only be acquired by microscopic or comparable special investigations. Overall, there are nine ranks covering a total of 1494 petrographies. The petrographic classification of a sample in P³ is based on the sample description within the original literature reference. A petrographic ID and a corresponding petrographic parent ID directly correlate the different classifications and their ranks (Table 1). This allows, for example, all petrographies of higher rank to be merged under a corresponding general term of lower rank, so that the associated physical rock property values can be analysed statistically across petrographic definition boundaries (Figure 3).
In P³, the petrographic ID, the petrographic parent ID and the simplified petrographic term are documented. Additionally, the original petrographic description from the primary reference can be presented for each sample if available. Details on the texture, homogeneity, layering and consolidation state of the sample, the direction of measurement with regard to internal structural features (such as bedding etc.) and the degree of alteration or weathering can be documented together with specific remarks.
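The parent-ID mechanism described above lends itself to a simple roll-up of detailed petrographies to a coarser rank. The following sketch illustrates the idea with a tiny, entirely hypothetical excerpt of such a hierarchy; the IDs, terms and ranks shown here are invented for illustration and are not the entries of the P³ petrography catalogue.

```python
# Hypothetical excerpt: petrographic ID -> (parent ID, rank, simplified term)
PETRO = {
    1:    (None, 1, "sedimentary rock"),
    10:   (1,    2, "clastic sedimentary rock"),
    100:  (10,   3, "sandstone"),
    101:  (10,   3, "claystone"),
    1000: (100,  4, "quartz sandstone"),
}


def ancestor_at_rank(petro_id, rank):
    """Follow parent IDs upwards until the requested rank is reached.

    Returns None if the entry is already coarser than the requested rank.
    """
    while petro_id is not None:
        parent_id, current_rank, _ = PETRO[petro_id]
        if current_rank == rank:
            return petro_id
        if current_rank < rank:
            return None
        petro_id = parent_id
    return None


def group_by_rank(samples, rank):
    """Group (petrographic ID, value) pairs under their ancestor term of the given rank."""
    groups = {}
    for petro_id, value in samples:
        ancestor = ancestor_at_rank(petro_id, rank)
        if ancestor is not None:
            groups.setdefault(PETRO[ancestor][2], []).append(value)
    return groups


# Hypothetical porosity values attached to petrographic IDs
samples = [(1000, 0.12), (100, 0.18), (101, 0.05)]
print(group_by_rank(samples, 3))
print(group_by_rank(samples, 2))
```

Entries that are classified more coarsely than the requested rank are left out of the grouping, since they cannot be assigned unambiguously to one of the finer terms.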

Stratigraphy
The stratigraphy of each sample is entered into the database in two complementary ways. The first is to use the definitions of the international chronostratigraphic chart of the IUGS v2016/04 (Cohen et al., 2013, updated) (Table 2). In addition, a more detailed description of the local stratigraphic unit can be documented if provided in the primary reference.

PetroPhysical Properties
The properties included in P³ can be grouped into 'classical' petrophysical properties, thermophysical properties, mechanical properties as well as electrical and magnetic properties (Figure 2). Overall, 28 different rock properties are included so far and documented in separate sub-tables of the database following a similar internal structure. Based on the original reference, the measurement is given as a value, which, if available, is complemented by a standard deviation, a minimum and maximum value and the number of measurements. Thus, it is possible to include either single measurements or mean values while still offering the opportunity of statistical evaluation by incorporating the number of measurements corresponding to a mean value. The measurement method is documented as well, which is important for statistical analysis and comparability of the results of different methods. In particular, the type of method might have a large impact on the quality and device-specific error of any measurement. Finally, specific remarks can be made for each value separately.

Quality Control
In addition to the primary option of manual quality control, which is enabled by providing the information on the original data source, an automatic quality control was implemented in P³. For this purpose, minimum requirements for a value to be included in the database were defined, as already described in section 2. To provide a quality estimate for each data entry in terms of the provided meta-information, a set of key criteria is automatically analysed: (i) the uncertainty of the geographic location, (ii) the rank of petrographic classification, (iii) the rank of stratigraphic classification, (iv) the completeness of information on measurement conditions and (v) the statistical type of a value (e.g. single value, mean value etc.). For each key criterion, four different quality classes (excellent = 1, average = 2, poor = 3 and minimum = 4) are defined and translated into numerical quality indices (qi, Table 3). A bulk quality index is calculated as the arithmetic mean of the quality indices of the different criteria, where values < 1.5 are considered excellent, values ≥ 1.5 and < 2.5 average, values ≥ 2.5 and ≤ 3.5 poor, and values > 3.5 only meet the minimum requirements.
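The calculation of the bulk quality index can be sketched in a few lines of Python. The thresholds follow the text; the dictionary keys, the function names and the decision to skip criteria without an index (None) are assumptions made for this illustration and are not necessarily how P³ implements it.

```python
from statistics import mean


def bulk_quality_index(indices):
    """Arithmetic mean of the quality indices (1-4) of the individual criteria.

    Criteria without an index (None) are skipped; this handling is an assumption.
    """
    rated = [qi for qi in indices.values() if qi is not None]
    if not rated:
        raise ValueError("at least one quality index is required")
    return mean(rated)


def quality_label(bulk_index):
    """Translate a bulk quality index into the quality classes named in the text."""
    if bulk_index < 1.5:
        return "excellent"
    if bulk_index < 2.5:
        return "average"
    if bulk_index <= 3.5:
        return "poor"
    return "minimum"


# Hypothetical data entry
entry = {"location": 1, "petrography": 2, "stratigraphy": 4,
         "conditions": 3, "parameter": 1}
bulk = bulk_quality_index(entry)
print(bulk, quality_label(bulk))
```

Since the bulk index is a plain mean, a well-documented location and petrography can mask poorly documented measurement conditions, so the individual indices remain worth inspecting.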

Geographic Uncertainty
Concerning the location of the sample, a geodetic accuracy of less than 100 m is considered to be excellent quality, which should always be the case for outcrop samples or drill cores. If the information on the location only contains a description of a geological unit in a certain region or area, the size of this area is considered for the definition of the quality indices.
If the location can be constrained to a region with a radius of less than 1 km, the quality is considered average, whereas a radius of uncertainty between 1 km and 100 km is considered poor. A larger radius of uncertainty corresponds to quality class 4.
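The thresholds above translate directly into a small classification function. The sketch below assumes that the uncertainty is stored as a radius in kilometres; the function name and the treatment of the exact boundary values are illustrative choices.

```python
def geographic_quality(uncertainty_km: float) -> int:
    """Quality index for the location, derived from the radius of uncertainty in km."""
    if uncertainty_km < 0:
        raise ValueError("radius of uncertainty cannot be negative")
    if uncertainty_km < 0.1:       # better than 100 m
        return 1                   # excellent
    if uncertainty_km < 1.0:       # less than 1 km
        return 2                   # average
    if uncertainty_km <= 100.0:    # between 1 km and 100 km
        return 3                   # poor
    return 4                       # minimum requirements only


for radius in (0.05, 0.5, 50.0, 20015.0):
    print(radius, geographic_quality(radius))
```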

Petrography or Rock Type
If the original petrographic or lithological description allows for the allocation of a petrographic term with a rank of 6 or higher, the quality is considered excellent. For a rank of 5 it is considered average, because these petrographic terms usually allow for a distinction of petrographies as used for reservoir- or site-scale geological models. For a rank of ≤ 4 the quality is considered poor (compare Figure 3 and Table 1). To enter the database at all, the petrographic description of a sample has to allow for the allocation of a petrographic term of rank ≥ 2. This classification at least allows for a distinction of petrographies on a level used for continental-scale geological models.
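A minimal sketch of this rank-based rating is given below. It follows the prose only; Table 3 may define the lowest admissible rank as a separate class, which is not reproduced here. The function name and the use of None for samples that do not meet the minimum requirement are assumptions.

```python
def petrography_quality(rank: int):
    """Quality index from the rank (1-9) of the assigned petrographic term.

    Returns None if the rank is below the minimum requirement (rank >= 2),
    i.e. the sample would not enter the database.
    """
    if not 1 <= rank <= 9:
        raise ValueError("petrographic rank must be between 1 and 9")
    if rank < 2:
        return None
    if rank >= 6:
        return 1   # excellent
    if rank == 5:
        return 2   # average
    return 3       # poor (ranks 2 to 4)


print([petrography_quality(r) for r in range(1, 10)])
```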

Stratigraphy
Concerning the stratigraphy of the sample, (i) information on the chronostratigraphic Stage or Age is considered to be excellent, (ii) information on the stratigraphic Series or Epoch is defined as average and (iii) if only the chronostratigraphic System or Period is given, it is considered poor. To enter the database, there is no minimum requirement for the information on the stratigraphic age, since (i) stratigraphy does not directly control physical properties and (ii) scientific users might retrospectively derive stratigraphic information from the sampling location in combination with the petrography of the sample and additional information such as geological maps.

Measurement Conditions
For every data point, the measurement conditions can be entered. These are the temperature (K), pressure (Pa), saturating fluid and degree of saturation (%) and, for the mechanical properties, additional information on the ambient stress field, σ1, σ2, σ3 (MPa), and the pore pressure of the sample (MPa). For the sonic velocities (vp and vs) the frequency of the sonic pulse and, for the uniaxial compressive strength and related mechanical properties, the strain rate can be given as additional measurement conditions. The quality assessment of the measurement conditions is based on both the measurement conditions and the measurement device, the latter being needed to quantify the specific measurement error typical for a certain method. Excellent quality is only assigned if information is available on all these points. If only the measurement device and the temperature and pressure conditions or the degree of saturation are available, the data quality is defined as average. If only the device, or the temperature and pressure conditions, or the degree of saturation is described in the original reference, the quality is considered to be poor.

Measurement Parameter
The last criterion for the quality control is the type of value representing the property. In general, single measurement values for a sample are ranked higher in quality than mean values of several measurements performed on a sample. Accordingly, single measurements are considered excellent and mean values average or poor. If the mean value is accompanied not only by the number of measurements used to calculate it, but also by the minimum and maximum as well as the standard deviation of this set of measurements, the quality is defined as average. In contrast, a mean value accompanied only by a number of measurements is defined as poor. Values resulting from an unspecified number of measurements are not considered for quality control but are still included in the database, with NA ("not available") in the respective column for the number of measurements, to enable the user to exclude these values from statistical analyses.
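These rules can be summarised in a short function. The sketch assumes that a single measurement corresponds to a number of measurements of 1 and represents "NA" as None; the argument names are hypothetical.

```python
def parameter_quality(n_measurements, has_min_max=False, has_std=False):
    """Quality index for the statistical type of a value.

    n_measurements: number of measurements behind the value, or None for "NA".
    Returns None if the number of measurements is unspecified (not rated).
    """
    if n_measurements is None:
        return None                  # excluded from quality control
    if n_measurements < 1:
        raise ValueError("number of measurements must be >= 1")
    if n_measurements == 1:
        return 1                     # single measurement: excellent
    if has_min_max and has_std:
        return 2                     # fully documented mean value: average
    return 3                         # mean value with number of measurements only: poor


print(parameter_quality(1))
print(parameter_quality(12, has_min_max=True, has_std=True))
print(parameter_quality(12))
print(parameter_quality(None))
```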

Status of the Database, Data Availability and Quality
Up to now, data that entered the database are either from published data collections, scientific papers, student's theses and

Figure 5: Locations of data points currently included in P³ for Europe (for references see Appendix 1). Topographic map is the ETOPO1 map (Amante and Eakins, 2009).
The number of data entries for the different petrographies shows that all main consolidated rock types are well represented. With 38,219 property measurements from sedimentary rocks, 25,261 from magmatic rocks, 9,235 from metamorphic rocks and 1,308 from unconsolidated rocks, petrographies usually considered as reservoir rocks are dominant, making up more than 75% of the data.
Since P³ was compiled to serve the goals of the IMAGE project and will always represent work in progress, its data entries are unevenly distributed among the different properties (Table 4) as well as regions. In its current version, the entries for some properties derive from only a few sources. For example, the radiogenic heat production values contained in the database have mainly been derived from the compilation of Vilà et al. (2010). This compilation, which is based on many secondary references, includes more than 2,100 representative U, Th and K concentrations from all over the world (originally published in 102 studies). Based on this chemical composition database, Vilà et al. (2010) calculated values of radiogenic heat production for a large variety of rock types. Of the original compilation of Vilà et al. (2010), we have incorporated into the database only those values that were associated with sufficient metadata and based on actual lab investigations rather than on spectral gamma ray and density data from borehole geophysical logs. Newer compilations on radiogenic heat production (e.g. Hasterock & Webb, 2017 and Hasterock et al., 2017) have not yet been included.
Concerning the data quality, both the bulk index and the five indices defined in Table 3 show a wide dispersion over all quality classes. The quality indices for the petrography, the geographic uncertainty and the measurement parameter mainly show values of 1 to 3, representing a good quality of input data documentation on average. Only the quality indices for the measurement conditions and the stratigraphy, where index values of 3 and 4 are dominant, show that the documentation of these metadata is not satisfactory for a large share of the compiled data.

Discussion
The current status of the database already shows many of the benefits that such a compilation brings along, but also some limitations, which have to be addressed in future amendments. The defined minimum requirements for a datum to be integrated into P³ guarantee its usability in terms of statistical, spatial, petrographic and stratigraphic analyses. Since the database also contains multiple properties measured on a single sample, direct correlations with other data and properties are facilitated. This may help to identify new relationships (formal, causal or statistical correlations) and contribute to a better understanding of the limitations of generalisation or of the possibilities for upscaling approaches. The automatic quality assessment allows for a quick evaluation of a single datum within a group of selected entries. The possibility of correlating data also simplifies and accelerates the identification of key references for rock parameters in specific regions, for specific rock types or for stratigraphic units. Furthermore, the database allows the dependency of property values on the corresponding measurement conditions to be analysed systematically. Thus, the most important added value of P³ compared to existing databases is its dimension (a large number of entries corresponding to a large number of petrophysical properties) as well as the documented meta-information.
Despite all benefits, such a database can never be complete and is always prone to uncertainties. Identifying errors in original publications (in terms of property values and meta-information, e.g. sample preparation, accuracy of measurements, sampling bias, lab worker bias, measurement methods, reference standards and many more) is beyond the scope of this compilation. In addition, data-input errors and errors concerning the interpretation or the petrographic and stratigraphic classification cannot be excluded.
We assume that the quality check of the original publications and the data therein has already been done by skilled reviewers or editors of the corresponding scientific journals or theses. In addition, the quality indices developed as part of P³ allow the user to quickly evaluate the quality of each data point and thus help with the decision whether the original reference should be re-assessed or not.
Additionally, P³ includes values generated with different established or newly developed measurement methods, delivering data of different quality and uncertainty. Hence, data comparability is not necessarily guaranteed, and a statistical assessment can only be representative if these effects are considered. Due to the documentation of the original source, however, the related detailed information of a chosen sample set can be verified if necessary. For subsequent applications, such as modelling, the spatial distribution of the data has to be considered as well as the origin of the samples. Due to diverse effects (such as temperature, pressure, weathering, diagenetic history, etc.), properties measured on outcrop analogue samples might differ considerably from those of the same formation at in-situ conditions within a deep reservoir. It remains for the experienced user to evaluate whether the tabulated datum is applicable and whether sufficient meta-information is given. In case of doubt, users are referred to the original publications.

Conclusions and Perspectives
We developed the P³ database of petrophysical rock properties measured on rock samples in various laboratories. P³ is designed to be as transparent and useful for various purposes as possible through the integration of extensive meta-information (including the original reference) for each data point. The database already comprises a great variety of properties, petrographies, stratigraphies etc. from samples investigated all over the world. In this first release, 75,573 data points from 316 publications were included. The current compilation of samples mainly reflects the project goals of the geothermal project IMAGE (van Wees et al., 2015), while P³ is certainly applicable in various geoscientific fields focusing on subsurface utilisation (e.g. oil and gas, CCS, hydrogeology, subsurface storage of radioactive waste etc.). The collected data will help researchers and users, particularly in the early stages of new geothermal or any other projects, to make a first assessment of the subsurface rock properties. This will help in planning future exploration needs and, in areas where the existing data density is sufficient, even support direct modelling or exploitation projects. Additionally, the database will help to improve local and regional geoscientific studies with different foci on the utilisation of the subsurface.
Compiling the data from various sources, however, has shown that the general documentation of measured petrophysical properties is very heterogeneous and that the minimum requirements defined for P³ were often not fulfilled. We therefore appeal to the reviewers and editors of scientific journals to ensure that any publication containing original measurements of petrophysical properties comes along with all the helpful and necessary meta-information as described here. Only if these requirements are fulfilled is a published dataset of added value for the scientific community and usable for subsequent investigations or applications.

Disclaimer
The authors declare that they have no conflict of interest.

Acknowledgements
The research leading to these results has received funding from the European Community's Seventh Framework Programme under EU grant agreement No. 608553 (Project IMAGE). We thank all contributors and our external cooperation partners for their support and their work in filling the database with valuable data and in providing valuable reports or publications to be included in the compilation.