The PetroPhysical Property Database (P3) – a global compilation of lab-measured rock properties

Abstract. Petrophysical properties are key to populating local and/or
regional numerical models and to interpreting results from geophysical
investigation methods. Searching for rock property values measured on
samples from a specific rock unit at a specific location might become a very
time-consuming challenge given that such data are spread across diverse
compilations and that the number of publications on new measurements is
continuously growing and data are of heterogeneous quality. Profiting from
existing laboratory data to populate numerical models or interpret
geophysical surveys at specific locations or for individual reservoir units
is often hampered if information on the sample location, petrography,
stratigraphy, measuring method and conditions is sparse or not documented. Within the framework of the EC-funded project IMAGE (Integrated Methods for
Advanced Geothermal Exploration, EU grant agreement no. 608553), an
open-access database of lab-measured petrophysical properties has been
developed (Bär et al., 2017, 2019b: P3 – database,
https://doi.org/10.5880/GFZ.4.8.2019.P3). The goal of this hierarchical
database is to provide easily accessible information on physical rock
properties relevant for geothermal exploration and reservoir
characterisation in a single compilation. Collected data include classical
petrophysical, thermophysical, and mechanical properties as well as electrical conductivity and magnetic susceptibility. Each measured value is
complemented by relevant meta-information such as the corresponding sample
location, petrographic description, chronostratigraphic age, if available,
and original citation. The original stratigraphic and petrographic
descriptions are transferred to standardised catalogues following a
hierarchical structure ensuring inter-comparability for statistical analysis
(Bär and Mielke, 2019: P3 – petrography,
https://doi.org/10.5880/GFZ.4.8.2019.P3.p; Bär et al., 2018, 2019a:
P3 – stratigraphy,
https://doi.org/10.5880/GFZ.4.8.2019.P3.s). In addition, information on
the experimental setup (methods) and the measurement conditions is listed
for quality control. Thus, rock properties can directly be related to
in situ conditions to derive specific parameters relevant for simulating
subsurface processes or interpreting geophysical data. We describe the structure, content and status quo of the database and
discuss its limitations and advantages for the end user.



Introduction
The characterisation and utilisation of subsurface reservoirs generally rely on geophysical investigation methods and/or numerical simulation codes, both of which in turn require knowledge of physical rock properties at depth. Strategies for populating numerical models with petrophysical properties differ. For local-scale models, laboratory data from individual samples collected from the geological unit of interest may exist. In this case, this direct information should be used together with sophisticated (physical and empirical) laws to populate the entire geological unit. For regional and continental-scale models, in contrast, parameters have to be generalised with respect to the spatial and physical variability of the investigated lithological units.
In addition, different compilations do not provide a homogenised set of meta-information. Furthermore, the availability of exploration data often depends on national legislation. In some countries, industrial resource exploration data, including petrophysical properties measured on cores of deep wells, may become public after a certain period and are then usually incorporated into national information systems. In other cases, exploration data remain confidential for longer periods or even indefinitely, resulting in scarce data availability for the respective countries.
Due to the current publication policy of international research institutions, where a high number of peer-reviewed publications has become increasingly important for individual scientific careers, the amount of petrophysical data recorded worldwide has increased dramatically. These data, however, are dispersed across many hundreds of publications in many different geoscientific journals. Given the rate at which new property data are published and the multitude of journals, countries and authors involved, searching for and collecting data can be extremely time-consuming. Recent studies show that domain experts spend nearly 80 % of their working hours collecting, cleansing and managing their domain-specific data (CrowdFlower, 2016). Effective, comprehensive collection, collation and dissemination of these data are therefore deemed critical to promote rapid, creative and accurate research (Gard et al., 2019).
To facilitate (i) efficient searching for and research on measured rock physical properties, (ii) further evaluation of the property data using complementary meta-information, and (iii) adequate property generalisation for specific units, a comprehensive database was developed within the framework of the EC-funded project IMAGE (Integrated Methods for Advanced Geothermal Exploration, grant agreement no. 608553). The aim of this database is to compile, store and publicly provide petrophysical property data from published laboratory test results on rock samples of any kind, including as much meta-information as possible. So far, literature data relevant for the IMAGE project and laboratory data collected during the IMAGE project have been fed into this novel PetroPhysical Property Database (P3). Here, we present the current state of P3 and release version 1.0 in Excel format (Bär et al., 2019b: P3 – database, https://doi.org/10.5880/GFZ.4.8.2019.P3).
2 Contents and structure of the database
P3 is publicly accessible and contains physical rock properties measured in laboratory experiments. It is licensed under a Creative Commons (CC-BY 4.0) license, and its structure follows the FAIR guiding principles for scientific data management and stewardship (Wilkinson et al., 2016). All data are selected to represent the characteristic scale of rock samples, a few centimetres to decimetres depending on the measurement method, as described for the different properties by numerous standardisation bodies and committees such as the International Society for Rock Mechanics and Rock Engineering (ISRM), the European Committee for Standardization (CEN), the International Organization for Standardization (ISO) and the American Society for Testing and Materials (ASTM International), among others. Within P3 we aimed at homogenising measurement method descriptions to increase the inter-comparability of individual reported values. Larger-scale data from geophysical well logging, hydraulic well testing, integrating geophysical methods or other field-scale measurements, which integrate over larger rock volumes or several rock types, have not yet been included in the database (Fig. 1). This is intended to reduce bias introduced by heterogeneities within larger geobodies, including open or partly open discontinuities such as fissures, fractures, bedding or schistosity. In addition, judging from the lithological description, we did not include data from very small samples where the volume of interest is likely smaller than the minimum representative elementary volume (REV) (e.g. Ringrose and Bentley, 2015) for the investigated rock type. The full range of the scale dependency of petrophysical properties described in previous studies (e.g. Enge et al., 2007; Jahn et al., 2008; Howell et al., 2014; Rühaak et al., 2015) is thus not yet reflected by the database but is planned to be incorporated in future versions.
To ensure that source data are publicly available to researchers, only data from scientific publications (books or peer-reviewed journals) or proceedings (e.g. IGA Geothermal Papers/Conference Database) as well as published research reports (e.g. dissertations, publicly available student theses, or project reports) were included in P3. The database only contains measurements with a minimum amount of meta-information to allow for reasonable interpretations, generalisations or simulations based on the collected data. The minimum associated meta-information is the reference to the data origin (citation) and information about the petrography that allows for a classification according to a certain lithotype. If available, additional meta-data were included, such as the sampling location (potentially including its type, e.g. outcrop, abandoned or active quarry, vertical or deviated well), the affiliation to a registered sample set (e.g. International Geo Sample Number, IGSN; cf. Devaraju et al., 2016; Lehnert et al., 2006), stratigraphy, sample dimensions, measurement method or device, and measurement conditions (pressure, temperature, stress) including the degree of saturation and the type of saturating fluid. Converting published values to SI units and correcting minor errors in published data or omissions in previous databases, as they are identified, is an ongoing part of data curation.
The database was developed in a flat-file format using Microsoft Excel to keep it as simple as possible and easy to handle, even for inexperienced users. While other database structures are much more efficient, their database management schemes may make it too difficult for users not familiar with SQL to retrieve the desired data. However, the internal design of P3, with multiple sub-entities and tables, follows a relational database management system (RDBMS; Codd, 1970) with an entity-relationship (ER) model (see Appendix B), so that it could easily be transferred to, for example, the well-established structured query language (SQL; Chamberlin and Boyce, 1974). Following this ER model, the database could easily be organised into multiple tables, using the table names as unique keys and as links to other sub-tables. The main advantages of a relational database over a flat-file format are that each datum is stored only once, eliminating data duplication, as well as performance gains due to greater memory efficiency, easy filtering and rapid queries (Gard et al., 2019). On the other hand, the current flat-file structure allows for easy modification and extension as new requirements emerge; for example, measurements not fitting any of the already included properties could be added at later stages by adding sub-tables for the new properties. However, filtering and quality control to ensure that data are entered into the database only once and that no duplicates exist had to be done manually. In our case, data duplicates were removed by checking the coordinates of each data point within a radius of uncertainty of 1 km and, if necessary, manually removing every duplicate entry identified.
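The duplicate check described above can be sketched as follows. This is a minimal illustration, not the P3 curation workflow: it assumes a spherical Earth (haversine distance), treats records of the same property lying within 1 km of each other as candidate duplicates, and leaves the final decision to manual review, as in the text. The record layout, function names and sample IDs are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)
MISSING = 999.0           # placeholder P3 uses for unknown latitude/longitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def candidate_duplicates(records, radius_km=1.0):
    """Return index pairs (i, j) of records for the same property lying within
    radius_km of each other. These are candidates for manual review, not
    automatic deletion."""
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        if r1["property"] != r2["property"]:
            continue
        # Records without a known location cannot be compared spatially.
        if MISSING in (r1["lat"], r1["lon"], r2["lat"], r2["lon"]):
            continue
        if haversine_km(r1["lat"], r1["lon"], r2["lat"], r2["lon"]) <= radius_km:
            pairs.append((i, j))
    return pairs


records = [  # hypothetical entries
    {"sample_id": "Example2000_1", "property": "thermal_conductivity", "lat": 48.400, "lon": 2.700},
    {"sample_id": "Example2005_3", "property": "thermal_conductivity", "lat": 48.405, "lon": 2.700},
    {"sample_id": "Example2005_4", "property": "porosity", "lat": 48.400, "lon": 2.700},
    {"sample_id": "Example2010_1", "property": "porosity", "lat": MISSING, "lon": MISSING},
]
print(candidate_duplicates(records))  # [(0, 1)]
```

The first two records lie about 0.56 km apart and report the same property, so only they are flagged; the record with the 999 placeholder is skipped.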
Following the minimum requirements, the database is structured into three main sections or super-entities (Fig. 2), which are sets of data tables (described in more detail in the following parts of the paper). The first, named "meta-information", contains all meta-information on the sample, including the sampling location, the sample type and dimensions, as well as information on its petrography and stratigraphy, and thus acts as a primary table for unique sample identification. The second section or super-entity contains the measured property value(s) of the unique rock samples. This section is subgrouped into thermophysical properties, classical petrophysical properties, mechanical properties, and electrical and magnetic properties, together with fields for property-specific remarks. Finally, the third section or super-entity, named "quality control", includes all information relevant for the quality assessment of each data record (property measurement on a unique sample). Here, information on the measurement conditions in particular (methodology, pressure and temperature conditions, degree of saturation) is documented and used for the implemented semi-automatic quality control and assessment.
The first super-entity, meta-information, consists of five tables or entities: sample ID, reference, sampling location, sample information, petrography and stratigraphy. Each of these tables is described in the following subsections. The tables for petrography and stratigraphy are available separately. The super-entity "rock properties" contains 28 separate sub-tables for all properties included in the database so far, each following a similar internal structure (see Sect. 2.4). For many samples, measurements of multiple properties were available and included in the database, which means that the meta-information of these samples is documented multiple times in the current file structure. The super-entity "quality control" contains two tables or entities: the first documents the measurement conditions and the second holds the automated quality assessment of the entries (see Sect. 2.5).
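To illustrate how this layout could map onto the relational ER model, the following sketch uses Python's built-in `sqlite3` module. The table and column names and the example values are hypothetical and cover only one meta-information table and two of the 28 property sub-tables; the point is that the meta-information of a sample is stored once and joined to each property via the sample ID, instead of being repeated as in the flat file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Meta-information: one row per unique sample (primary key = P3-style sample ID)
CREATE TABLE sample (
    sample_id       TEXT PRIMARY KEY,
    reference_key   TEXT NOT NULL,
    latitude        REAL,
    longitude       REAL,
    petrography_id  INTEGER NOT NULL,
    stratigraphy_id INTEGER
);
-- Rock properties: one sub-table per property, linked by sample_id
CREATE TABLE porosity (
    record_id      INTEGER PRIMARY KEY,
    sample_id      TEXT NOT NULL REFERENCES sample(sample_id),
    value          REAL NOT NULL,
    n_measurements INTEGER
);
CREATE TABLE thermal_conductivity (
    record_id      INTEGER PRIMARY KEY,
    sample_id      TEXT NOT NULL REFERENCES sample(sample_id),
    value          REAL NOT NULL,
    n_measurements INTEGER
);
""")

# The sample's meta-information is stored once, however many properties are measured.
conn.execute("INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)",
             ("Example2000_1", "Example2000", 48.4, 2.7, 101, None))
conn.execute("INSERT INTO porosity (sample_id, value, n_measurements) VALUES (?, ?, ?)",
             ("Example2000_1", 0.08, 3))
conn.execute("INSERT INTO thermal_conductivity (sample_id, value, n_measurements) VALUES (?, ?, ?)",
             ("Example2000_1", 3.1, 1))

rows = conn.execute("""
    SELECT s.sample_id, p.value AS porosity, t.value AS thermal_conductivity
    FROM sample AS s
    JOIN porosity AS p ON p.sample_id = s.sample_id
    JOIN thermal_conductivity AS t ON t.sample_id = s.sample_id
""").fetchall()
print(rows)  # [('Example2000_1', 0.08, 3.1)]
```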

Sample information
To distinguish measurements of different properties on a single sample or of the same properties performed at varying measurement conditions, every measurement is listed in a separate row. To group measurement data from individual samples, every sample receives a unique sample ID, which acts as the primary key of each record and links multiple measurements conducted on a single rock sample. The sample ID consists of the surname of the first author and the year of publication, together with a sequential number for the particular rock sample presented in the respective publication. In the case of several references per author and year, an additional letter (a, b, . . . ) is introduced after the year.
For example, Fourier1822_1 stands for sample 1 within a publication of Fourier (1822). In the case of more than one publication per year, Fourier1822a_1 would represent sample 1 within the publication Fourier (1822a). The sample ID is linked to an accompanying reference database, compatible with all major reference management tools (e.g. EndNote, Citavi, BibTeX, and JabRef), which contains the full reference information (co-authors, full title, journal, volume, pages, etc.). The references are abbreviated as BibTeX keys following the terminology used for individual samples. Wherever possible, only primary references are given. Where the primary reference is unavailable but the data point is published as part of a review (or similar), a secondary reference is given instead.
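The sample ID convention can be expressed as a small set of helper functions. This is a sketch under the assumption that surnames consist of letters only (names with spaces or hyphens would need an extra rule that the text does not specify); the function names are hypothetical. The reference key is taken to be the sample ID without its sample number, following the statement that the BibTeX keys use the same terminology as the sample IDs.

```python
import re

# Surname (letters only, Unicode allowed), 4-digit year, optional lower-case
# letter for several publications per author and year, then "_" and a sample number.
SAMPLE_ID = re.compile(r"^(?P<author>[^\W\d_]+)(?P<year>\d{4})(?P<suffix>[a-z]?)_(?P<number>\d+)$")


def make_sample_id(surname, year, number, suffix=""):
    """Build an ID such as 'Fourier1822_1' or 'Fourier1822a_1'."""
    return f"{surname}{year}{suffix}_{number}"


def parse_sample_id(sample_id):
    """Split an ID into (surname, year, suffix, sample number)."""
    m = SAMPLE_ID.match(sample_id)
    if m is None:
        raise ValueError(f"not a P3-style sample ID: {sample_id!r}")
    return m["author"], int(m["year"]), m["suffix"], int(m["number"])


def reference_key(sample_id):
    """Reference key shared by all samples of one publication, e.g. 'Fourier1822a'."""
    return sample_id.rsplit("_", 1)[0]


print(make_sample_id("Fourier", 1822, 1, "a"))  # Fourier1822a_1
print(parse_sample_id("Fourier1822a_1"))        # ('Fourier', 1822, 'a', 1)
print(reference_key("Fourier1822a_1"))          # Fourier1822a
```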
Additionally, the date of input and the name of the person who generated the entry into the database (the editor, listed as contributors in Appendix A, team list) are documented.

Sampling location
The subsection "sampling location" in the P3 database contains all relevant information on the location where a sample was obtained. Generally, rock samples are collected from an outcrop, a quarry or a well. If the publication neither identifies the sampling location as an outcrop, quarry or well nor gives exact coordinates, the location type "area" is selected. Furthermore, for every location type, a name, a country and a state are given (e.g. location type: outcrop, location name: Fontainebleau, location country: France, location state/department: Seine-et-Marne).

Location coordinates
The location coordinates give the latitude and longitude of the sampling point at the surface in decimal degrees, referenced to WGS84. A further entry is the elevation in metres above sea level (m a.s.l.). For a core sample taken from a well, the latitude and longitude of the wellhead are given. For an area with an undefined sampling point, e.g. a sample from the Rhenish Massif, the midpoint of this geological province was determined and a radius of uncertainty (in km) for the sampling location was estimated. For elongated areas (e.g. the Red Sea and the Upper Rhine Graben), the choice of a circular radius of uncertainty artificially increases the uncertainty. Defining areas by polygons is being considered for future releases of the database. If no information is given for the location, the longitude and latitude are set to 999 to avoid incorrect map displays, and half the circumference of the Earth is used as the uncertainty.

Figure 2. Schematic structure of P3 illustrating the three sections or super-entities: meta-information, rock properties and quality control. Different input parameters (small font) are grouped according to the entities or property sub-tables (italics) they belong to.
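A minimal sketch of the placeholder convention for unknown locations follows. It assumes a spherical Earth with a mean radius of 6371 km, which gives roughly 20 015 km for half the circumference (an equatorial circumference would give about 20 037 km); the text does not state which value P3 uses. The function names and the example area radius are hypothetical.

```python
import math

MISSING_COORD = 999.0          # placeholder for unknown latitude/longitude
EARTH_MEAN_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius
# Half of Earth's circumference, used as the uncertainty of an unknown location.
MAX_UNCERTAINTY_KM = math.pi * EARTH_MEAN_RADIUS_KM


def location_entry(lat=None, lon=None, uncertainty_km=None):
    """Build a location record in WGS84 decimal degrees."""
    if lat is None or lon is None:
        return {"lat": MISSING_COORD, "lon": MISSING_COORD, "uncertainty_km": MAX_UNCERTAINTY_KM}
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude must be WGS84 decimal degrees")
    return {"lat": lat, "lon": lon, "uncertainty_km": uncertainty_km}


def is_mappable(entry):
    """True if the entry can be plotted on a map (i.e. is not the 999 placeholder)."""
    return entry["lat"] != MISSING_COORD and entry["lon"] != MISSING_COORD


unknown = location_entry()
area = location_entry(50.5, 7.5, uncertainty_km=60.0)  # hypothetical area midpoint and radius
print(is_mappable(unknown), round(unknown["uncertainty_km"]))  # False 20015
print(is_mappable(area))                                       # True
```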
To convert the sample coordinates retrieved from the literature, we used either Google Earth (Web Mercator projection) or the Geographic Information System (GIS) software ArcGIS to allocate a latitude-longitude value in decimal degrees and a rough estimate of the associated uncertainty to each data point. Exact geographic information is often not provided in the literature used for this compilation; most commonly, only location names or maps are given. For all literature data points where both the exact coordinates and the reference system were given, or where the location was shown on a georeferenced map with the required information on the coordinate system used, we used ArcGIS for the transformation. Therein, we used the same geographic projection as given in the original literature and either imported the points as tabular values or georeferenced the given maps accordingly and picked the points on the maps. Afterwards, the resulting coordinates were converted to decimal degrees in the WGS84 reference system using the transformation method suggested by ArcGIS for the specific projected coordinate system.
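Many sources quote coordinates in degrees, minutes and seconds rather than decimal degrees. The sketch below only converts that notation; it does not change the datum or projection, which still requires a GIS transformation as described above. The function name is hypothetical.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    Southern and western hemispheres (or a negative degree value) give a
    negative result. The datum is not changed: this is a notation change only.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must lie in [0, 60)")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    negative = hemisphere.upper() in ("S", "W") or degrees < 0
    return -value if negative else value


print(dms_to_decimal(48, 24, 0, "N"))  # 48.4
print(dms_to_decimal(2, 30, 0, "W"))   # -2.5
```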

Original sample ID
To allow for reviewing the original publications, the originally assigned sample identification numbers or names are documented in addition to the P3 sample ID. This makes it easier to search for a specific sample in a publication, which other authors or individual users of the database might subsequently have used for further measurements or more detailed descriptions.

International Geo Sample Number
The International Geo Sample Number (IGSN; cf. Devaraju et al., 2016; Lehnert et al., 2006) is a unique identifier for samples and specimens collected from the natural environment (http://www.igsn.org/, last access: 14 August 2020). To enable locating, identifying and citing physical samples, the IGSN was listed if available. Furthermore, these entries allow for cross-linking the P3 and IGSN databases to provide access to further meta-information, such as sampling methods and project-related information, currently not implemented in P3. As described by Strong et al. (2016), the adoption of IGSNs will ensure compatibility and interoperability with other international databases, including the promotion of standard methods to locate, identify and cite physical samples.

Sample type
Samples can have different shapes, which are particularly relevant for the measurement technique. Core samples have different characteristics than rock blocks or drill cuttings, so P3 reserves a separate column for the sample type.

Sample dimensions (m)
Together with the sample type, information on the sample's length, height and width, or diameter for cores, all given in metres, is documented if available. If the rock property "density" is measured for a sample with given dimensions, the sample volume and mass can be calculated as well. This additional information, together with the petrography, was essential to evaluate whether a sample reaches a representative elementary volume (REV).
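The volume and mass calculation mentioned above is simple geometry. The sketch below assumes a right circular cylinder for cores and a rectangular block otherwise, with SI inputs; the function names, plug size and density in the example are hypothetical.

```python
import math


def cylinder_volume_m3(diameter_m, length_m):
    """Volume of a cylindrical core plug."""
    return math.pi * (diameter_m / 2.0) ** 2 * length_m


def block_volume_m3(length_m, width_m, height_m):
    """Volume of a rectangular rock block."""
    return length_m * width_m * height_m


def mass_kg(density_kg_m3, volume_m3):
    """Sample mass implied by a measured (bulk) density and the sample volume."""
    return density_kg_m3 * volume_m3


v = cylinder_volume_m3(0.025, 0.05)  # hypothetical 25 mm x 50 mm plug
print(f"{v:.3e} m3, {mass_kg(2650.0, v) * 1000:.1f} g")  # 2.454e-05 m3, 65.0 g
```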

Sample coordinates
For several samples taken at a single sampling location (e.g. a large outcrop or quarry), individual sample coordinates (longitude, latitude and elevation) may be given. For samples from a cored well, the depth of the sample is additionally given as measured depth (MD) and, if available, as true vertical depth (TVD), referenced to ground level (i.e. metres below ground level, m b.g.l.). If data on the geometry of deviated wells are available, the sample location can optionally be entered either relative to the wellhead or as its exact location and elevation (with respect to sea level).
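For orientation only, the relation between MD, TVD and sample elevation can be sketched for the simplest case of a straight borehole with constant inclination, with MD assumed to be measured from ground level. Real deviated wells require directional-survey data (e.g. a minimum-curvature calculation), which this sketch does not attempt; the function names and numbers are hypothetical.

```python
import math


def tvd_straight_well(md_m, inclination_deg):
    """TVD for a straight borehole with constant inclination from vertical.

    Assumes MD is measured from ground level along a straight path.
    """
    if not 0.0 <= inclination_deg < 90.0:
        raise ValueError("inclination must be in [0, 90) degrees")
    return md_m * math.cos(math.radians(inclination_deg))


def sample_elevation_m_asl(ground_elevation_m_asl, tvd_m_bgl):
    """Elevation of the sample relative to sea level (negative = below sea level)."""
    return ground_elevation_m_asl - tvd_m_bgl


tvd = tvd_straight_well(2000.0, 60.0)  # hypothetical well: about 1000 m TVD
print(round(tvd, 3), sample_elevation_m_asl(120.0, round(tvd, 3)))  # 1000.0 -880.0
```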

Petrography or rock type
The petrography or rock type classification scheme is defined in a complementary database (Bär and Mielke, 2019: P3 – petrography, https://doi.org/10.5880/GFZ.4.8.2019.P3.p) published together with P3. Its internal structure is based on a hierarchical subdivision of rock types, in which the rock description generally becomes more detailed with increasing rank of petrographic classification (based on the well database of the Geological Survey of Hesse, Germany: Hessisches Landesamt für Naturschutz, Umwelt und Geologie, HLNUG). This hierarchical subdivision is based on international conventions (e.g. Bates and Jackson, 1987; Gillespie and Styles, 1999; Robertson, 1999; Hallsworth and Knox, 1999; Le Bas and Streckeisen, 1991; Schmid, 1981; Fisher and Smith, 1991). Furthermore, the classification corresponds to the subdivision used by existing property data compilations such as Hantschel and Kauerauf (2009), Schön (2011), and Clauser and Huenges (1995a, b).
Petrographic classifications from rank 1 to rank 4 can usually be identified from macroscopic descriptions of well logs, cores and geological mapping (Fig. 3). The petrographic classifications from rank 5 to rank 9 require additional information on the texture or grain size, the modal composition, or the geochemistry, which can usually only be acquired by microscopic or comparable specialised investigations. Overall, nine ranks cover a total of 1494 petrographies. The petrographic classification of a sample in P3 is based on the sample description in the original literature reference. A petrographic ID and a corresponding petrographic parent ID directly link the different classifications and their ranks (Table 1). This allows, for example, all petrographies of higher rank to be merged into a corresponding general term of lower rank and the associated physical rock property values to be analysed statistically across petrographic definition boundaries (Fig. 3).
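The parent-ID mechanism can be sketched as a walk up the hierarchy until a term of the requested rank is reached, after which property values can be pooled per general term. The catalogue excerpt below (IDs, terms and their ranks) and the function names are hypothetical and do not reproduce the P3 petrography catalogue or Table 1.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical excerpt of a ranked catalogue: id -> (term, rank, parent_id).
PETROGRAPHY = {
    1: ("sedimentary rock", 1, None),
    2: ("clastic sedimentary rock", 2, 1),
    3: ("sandstone", 3, 2),
    4: ("arkose", 4, 3),
}


def generalise(petro_id, max_rank):
    """Walk up the parent IDs until the term's rank is <= max_rank."""
    current = petro_id
    while current is not None:
        _term, rank, parent = PETROGRAPHY[current]
        if rank <= max_rank:
            return current
        current = parent
    raise KeyError(f"no ancestor of {petro_id} at rank <= {max_rank}")


def mean_by_term(samples, max_rank):
    """Group (petrography_id, value) pairs by generalised term and average them."""
    groups = defaultdict(list)
    for petro_id, value in samples:
        groups[PETROGRAPHY[generalise(petro_id, max_rank)][0]].append(value)
    return {term: mean(values) for term, values in groups.items()}


samples = [(4, 2.9), (3, 3.3), (2, 2.0)]  # hypothetical (ID, property value) pairs
print(mean_by_term(samples, max_rank=3))
```

Here the arkose value is merged into "sandstone", while the sample described only as a clastic sedimentary rock cannot be refined and stays at rank 2.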
In P3, the petrographic ID, the petrographic parent ID and the simplified petrographic term are documented. Additionally, the original petrographic description from the primary reference is given for each sample if available. Details on the texture, homogeneity, layering and consolidation state of the sample, the direction of measurement with regard to internal structural features (such as bedding), and the degree of alteration or weathering can be documented together with specific remarks.

Stratigraphy
The stratigraphy of each sample was entered into the database in two complementary ways. The first uses the definitions of the international chronostratigraphic chart of the IUGS v2016/04 (Cohen et al., 2013, updated) in accordance with international standardisation. These chronostratigraphic units are also compiled in a complementary database (Bär et al., 2019a: P3 – stratigraphy, https://doi.org/10.5880/GFZ.4.8.2019.P3.s) to ensure that formations of a certain age are connected to the corresponding stratigraphic epoch, period or erathem. Thus, the chronostratigraphic units are directly correlated to each other by their stratigraphic ID and stratigraphic parent ID, allowing for statistical analysis of the properties of certain stratigraphic units (Table 2). The second is a more detailed description of the local stratigraphic unit, which can be documented if provided in the primary reference.
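Selecting all samples that belong to a given chronostratigraphic unit, including its subdivisions, follows the same parent-ID idea. The unit names below follow the international chronostratigraphic chart, but the numeric IDs, the function name and the samples are hypothetical.

```python
# Hypothetical IDs; unit names follow the international chronostratigraphic chart.
STRATIGRAPHY = {  # id -> (unit, parent_id)
    10: ("Mesozoic", None),
    20: ("Triassic", 10),
    21: ("Lower Triassic", 20),
    22: ("Induan", 21),
    30: ("Jurassic", 10),
}


def is_within(strat_id, unit_id):
    """True if strat_id equals unit_id or is one of its descendants."""
    current = strat_id
    while current is not None:
        if current == unit_id:
            return True
        current = STRATIGRAPHY[current][1]
    return False


samples = [  # hypothetical (sample ID, stratigraphic ID or None, porosity)
    ("Example2000_1", 22, 0.12),
    ("Example2000_2", 21, 0.18),
    ("Example2001_1", 30, 0.05),
    ("Example2002_1", None, 0.20),  # no stratigraphy given
]
triassic = [sid for sid, strat, _ in samples if is_within(strat, 20)]
print(triassic)  # ['Example2000_1', 'Example2000_2']
```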

Petrophysical properties
The properties included in P3 can be grouped into classical petrophysical properties, thermophysical properties, mechanical properties, and electrical and magnetic properties (Fig. 2). Overall, 28 different rock properties have been included so far, documented in separate sub-tables of the database that follow a similar internal structure. Based on the original reference, the measurement is given as a value, which, if available, is complemented by a standard deviation, minimum and maximum values, and the number of measurements. Thus, it is possible to include either single measurements or mean values while still allowing statistical evaluation by incorporating the number of measurements corresponding to a mean value. Furthermore, the measurement method for each property value is given using a common nomenclature documented in the supplementary report (Bär et al., 2019b: P3 – data description, https://doi.org/10.5880/GFZ.4.8.2019.P3). This is important for statistical analysis and for the comparability of results obtained with different methods. In particular, the type of method may have a large impact on the quality and device-specific error of any measurement. Finally, specific remarks can be made for each value separately.
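Because the number of measurements is stored with each mean, several reported means can be combined into an overall mean weighted by n and, where standard deviations are given, into an overall sample standard deviation. The sketch below uses the standard pooling formula (within-entry plus between-entry sums of squares) and assumes that reported standard deviations are sample standard deviations; entries with an unknown n (NA) are skipped. The function name is hypothetical.

```python
import math


def pool(entries):
    """Combine (mean, n, std) entries into an overall mean, count and sample SD.

    Entries with n = None (NA) are skipped. std is the sample standard deviation
    of each entry and may be None for single measurements (n = 1).
    """
    usable = [(m, n, s) for m, n, s in entries if n is not None and n > 0]
    if not usable:
        raise ValueError("no entries with a known number of measurements")
    total_n = sum(n for _, n, _ in usable)
    grand_mean = sum(m * n for m, n, _ in usable) / total_n
    # The SD cannot be reconstructed if a multi-measurement entry lacks its std.
    if total_n < 2 or any(s is None and n > 1 for _, n, s in usable):
        return grand_mean, total_n, None
    # Within-entry plus between-entry sums of squares.
    ss = sum((n - 1) * (s or 0.0) ** 2 + n * (m - grand_mean) ** 2 for m, n, s in usable)
    return grand_mean, total_n, math.sqrt(ss / (total_n - 1))


# Equivalent to raw values [1, 3] (mean 2, SD sqrt(2)) plus [5], i.e. [1, 3, 5];
# the third entry has NA for n and is ignored.
print(pool([(2.0, 2, math.sqrt(2.0)), (5.0, 1, None), (9.0, None, None)]))  # approx. (3.0, 3, 2.0)
```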

Quality control
In addition to manual quality control, which is primarily enabled by documenting the original data source, an automatic quality control process was implemented in P3. To this end, minimum requirements for a value to be included in the database were defined, as already described in Sect. 2.
To provide a quality estimate for each data entry in terms of provided meta-information, a set of key criteria is automatically analysed: (i) uncertainty of the geographic location, (ii) the rank of petrographic classification, (iii) the rank of stratigraphic classification, (iv) the completeness of information on measurement conditions, and (v) the statistical

Geographic uncertainty
Concerning the location of the sample, a geodetic accuracy of better than 100 m is considered excellent, which should always be the case for outcrop samples or drill cores. If the location information only consists of a description of a geological unit in a certain region or area, the size of this area is used to define the quality index. If the location can be constrained to a region with a radius of less than 1 km, the quality is considered average, whereas a radius of uncertainty between 1 and 100 km is considered poor. A larger radius of uncertainty is assigned to quality class 4.
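The geographic criterion can be written as a threshold function. The numbering 1 = excellent, 2 = average, 3 = poor, 4 for larger radii is an assumption consistent with the text, the handling of values exactly at 100 m and 1 km follows one reading of the stated ranges, and the function name is hypothetical.

```python
def geographic_quality(uncertainty_km):
    """Quality class for the radius of location uncertainty (km).

    Assumed numbering: 1 = excellent, 2 = average, 3 = poor, 4 = larger radius.
    """
    if uncertainty_km < 0.1:     # better than 100 m
        return 1
    if uncertainty_km < 1.0:     # constrained to less than 1 km
        return 2
    if uncertainty_km <= 100.0:  # between 1 and 100 km
        return 3
    return 4                     # larger radius of uncertainty


print([geographic_quality(r) for r in (0.05, 0.5, 25.0, 20015.0)])  # [1, 2, 3, 4]
```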

Petrography or rock type
If the original petrographic or lithological description allows for the allocation of a petrographic term with a rank of 6 or higher, the quality is considered excellent; for a rank of 5 it is considered average, because these petrographic terms usually allow for a distinction of petrographies as used for reservoir- or site-scale geological models. For a rank of ≤ 4 the quality is considered poor (compare Fig. 3 and Table 1). To enter the database at all, the petrographic description of a sample has to allow for the allocation of a petrographic term of rank ≥ 2. This classification at least allows for a distinction of petrographies at a level used for continental-scale geological models.
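The petrographic criterion maps the rank of the allocated term to a quality class and rejects ranks below the admission threshold. In this sketch the numbering 1 = excellent, 2 = average, 3 = poor is an assumption, and the function name is hypothetical.

```python
def petrography_quality(rank):
    """Quality class from the rank of the allocated petrographic term.

    Assumed numbering: 1 = excellent, 2 = average, 3 = poor.
    """
    if rank < 2:
        raise ValueError("rank < 2 does not meet the minimum requirement for P3")
    if rank >= 6:
        return 1
    if rank == 5:
        return 2
    return 3  # ranks 2 to 4


print([petrography_quality(r) for r in (9, 6, 5, 4, 2)])  # [1, 1, 2, 3, 3]
```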

Stratigraphy
Concerning the stratigraphy of the sample, (i) information on the chronostratigraphic stage or age is considered to be excellent; (ii) information on the stratigraphic series or epoch is defined as average; and (iii) if only the chronostratigraphic system or period is given, it is considered poor. To enter the database, there is no minimum requirement for the information on the stratigraphic age, since (i) stratigraphy does not directly control physical properties and (ii) scientific users might retrospectively derive stratigraphic information from the sampling location in combination with the petrography of the sample and additional information such as geological maps.
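The stratigraphic criterion can be sketched as a lookup table that accepts both chronostratigraphic and the equivalent geochronologic terms. The numbering 1 = excellent, 2 = average, 3 = poor is assumed, and assigning class 4 to coarser or missing information is a further assumption, since the text admits such records without defining their class. The function name is hypothetical.

```python
# Chronostratigraphic (and equivalent geochronologic) level -> quality class.
LEVEL_QUALITY = {
    "stage": 1, "age": 1,      # excellent
    "series": 2, "epoch": 2,   # average
    "system": 3, "period": 3,  # poor
}


def stratigraphy_quality(level):
    """Quality class for the finest stratigraphic level given for a sample.

    Returning 4 for missing or coarser information (e.g. erathem/era) is an
    assumption; such records are still admitted to the database.
    """
    if level is None:
        return 4
    return LEVEL_QUALITY.get(level.strip().lower(), 4)


print([stratigraphy_quality(x) for x in ("Stage", "epoch", "system", "erathem", None)])  # [1, 2, 3, 4, 4]
```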

Measurement conditions
For every data point, the measurement conditions can be entered. These are the temperature (K), pressure (Pa), saturating fluid and degree of saturation (%), and, for the mechanical properties, additional information on the ambient stress field, σ1, σ2, σ3 (MPa), and the pore pressure of the sample (MPa). For the sonic velocities (vp and vs), the frequency of the sonic pulse and, for the uniaxial compressive strength and related mechanical properties, the strain rate can be given as additional measurement conditions. The quality assessment considers both the measurement conditions and the measurement device, which is needed to quantify the specific measurement error typical of a certain method. Excellent quality is only assigned if information on all of these points is available. If only the measurement device and either the temperature and pressure conditions or the degree of saturation are available, the data quality is defined as average. If only one of the device, the temperature and pressure conditions, or the degree of saturation is described in the original reference, the quality is considered poor.
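One possible reading of these rules is sketched below. It is a simplification: only the device, temperature, pressure and degree of saturation are checked, the other conditions listed above are ignored, a combination of P-T and saturation without a device is treated as poor, and class 4 for missing information is assumed. The class numbering (1 = excellent, 2 = average, 3 = poor) and the function name are also assumptions.

```python
def conditions_quality(device=None, temperature_k=None, pressure_pa=None, saturation_pct=None):
    """Quality class from the documented measurement device and conditions.

    Simplified: saturating fluid, stresses, pulse frequency and strain rate are
    ignored, and class 4 for "nothing documented" is an assumption.
    """
    has_device = bool(device)
    has_pt = temperature_k is not None and pressure_pa is not None
    has_sat = saturation_pct is not None
    if has_device and has_pt and has_sat:
        return 1  # excellent: all information available
    if has_device and (has_pt or has_sat):
        return 2  # average: device plus P-T conditions or saturation
    if has_device or has_pt or has_sat:
        return 3  # poor: only one item (or P-T and saturation without device)
    return 4


print(conditions_quality("hypothetical device", 293.15, 101325.0, 100.0))  # 1
print(conditions_quality("hypothetical device", saturation_pct=0.0))        # 2
print(conditions_quality(temperature_k=293.15, pressure_pa=101325.0))       # 3
```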

Measurement parameter
The last criterion for quality control is the type of value representing the property. In general, single measurement values for a sample are ranked higher in quality than mean values of several measurements performed on a sample. Accordingly, single measurements are considered excellent, and mean values are considered average or poor. If the mean value is accompanied not only by the number of measurements used to calculate it but also by the minimum, maximum and standard deviation of this set of measurements, the quality is defined as average. In contrast, a mean value accompanied only by the number of measurements is defined as poor. Values resulting from an unspecified number of measurements are not considered in the quality control but are still included in the database, with NA (not available) in the column for the number of measurements, to enable users to exclude these values in statistical analyses.
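The measurement-parameter rule can be sketched as follows, returning None for NA. Treating a mean that reports only some of minimum, maximum and standard deviation as poor is an assumption, as are the class numbering (1 = excellent, 2 = average, 3 = poor) and the function name.

```python
def parameter_quality(n, minimum=None, maximum=None, std=None):
    """Quality class from the type of value reported.

    Returns None (NA) when the number of measurements is unknown.
    """
    if n is None:
        return None  # NA: not used in quality control
    if n == 1:
        return 1     # single measurement: excellent
    if None not in (minimum, maximum, std):
        return 2     # mean with n, min, max and std: average
    return 3         # mean with n only (or incomplete statistics): poor


print(parameter_quality(1), parameter_quality(5, 2.1, 2.9, 0.3), parameter_quality(5), parameter_quality(None))
# 1 2 3 None
```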

Status of the database, data availability and quality
Up to now, the data entered into the database originate from published data collections, scientific papers, student theses, scientific projects and technical reports (316 references altogether; see Appendix A). So far, 75 573 data points from all over the world (Figs. 4 and 5) have been collected. The data are not evenly distributed around the globe but show a strong dominance of samples from central Europe and the United States. This reflects the original purpose of the IMAGE project as well as the public availability of existing data.
The number of data entries for different petrographies shows that all main consolidated rock types are well represented. With 38 219 property measurements from sedimentary rocks, 25 261 from magmatic rocks, 9235 from metamorphic rocks and 1308 from unconsolidated rocks, petrographies usually considered as reservoir rocks are dominant, making up more than 75 % of the data.
Since P3 was compiled to serve the goals of the IMAGE project and will always be a work in progress, its data entries are unevenly distributed among the different properties (Table 4) as well as among regions. In its current version, the entries for some properties derive from only a few sources. For example, the radiogenic heat production values contained in the database have mainly been derived from the compilation of Vilà et al. (2010). This compilation, which is based on many secondary references, includes more than 2100 representative U, Th and K concentrations from all over the world (originally published in 102 studies). Based on this chemical composition database, Vilà et al. (2010) calculated radiogenic heat production values for a large variety of rock types. From this compilation, we have incorporated into the database only those values that were associated with sufficient metadata and based on actual laboratory investigations rather than on spectral gamma ray and density data from borehole geophysical logs. Newer compilations on radiogenic heat production (e.g. Hasterok and Webb, 2017; Hasterok et al., 2018) have not yet been included.
Concerning data quality, both the bulk quality index and the five individual indices defined in Table 3 are widely dispersed over all quality classes. The indices for petrography, geographic uncertainty and measurement parameter mainly take values of 1 to 3, indicating that the input data are, on average, well documented in these respects. Only for the measurement conditions and the stratigraphy, where index values of 3 and 4 dominate, is the documentation of metadata unsatisfactory for a large share of the compiled data.
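The distribution described above can be tallied per index with a few lines of code. The sketch assumes that each entry carries its five quality indices as integers, with lower values meaning better documentation; the key names and the records are hypothetical and do not reproduce the actual P3 column names or the definition of the bulk index.

```python
from collections import Counter

# Hypothetical keys for the five individual quality indices named in the text.
INDICES = (
    "petrography",
    "geographic_uncertainty",
    "measurement_parameter",
    "measurement_conditions",
    "stratigraphy",
)


def qi_distribution(records):
    """Percentage of records falling into each quality class, per index."""
    result = {}
    for name in INDICES:
        counts = Counter(rec[name] for rec in records)
        total = sum(counts.values())
        result[name] = {cls: 100 * n / total for cls, n in sorted(counts.items())}
    return result


def dominant_classes(dist, top=2):
    """The `top` most frequent quality classes for each index."""
    return {
        name: sorted(shares, key=shares.get, reverse=True)[:top]
        for name, shares in dist.items()
    }


# Hypothetical records (lower values = better documentation).
records = [
    {"petrography": 1, "geographic_uncertainty": 2, "measurement_parameter": 1,
     "measurement_conditions": 4, "stratigraphy": 3},
    {"petrography": 2, "geographic_uncertainty": 1, "measurement_parameter": 1,
     "measurement_conditions": 3, "stratigraphy": 4},
    {"petrography": 1, "geographic_uncertainty": 3, "measurement_parameter": 2,
     "measurement_conditions": 4, "stratigraphy": 4},
    {"petrography": 3, "geographic_uncertainty": 2, "measurement_parameter": 1,
     "measurement_conditions": 4, "stratigraphy": 3},
]

dist = qi_distribution(records)
dom = dominant_classes(dist)
print(dist["measurement_conditions"], dom["measurement_conditions"])
```

Reporting the dominant classes per index, rather than a single average, preserves the contrast between well-documented aspects (e.g. petrography) and poorly documented ones (e.g. measurement conditions).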

Discussion
In its current state, the database already demonstrates many benefits that such a compilation brings, but also some limitations that have to be addressed in future amendments. The minimum requirements defined for a datum to be integrated into P3 guarantee its usability for statistical, spatial, petrographic and stratigraphic analyses. Since the database also contains multiple properties measured on a single sample, direct correlations between properties are facilitated. This may help to identify new relationships (formal, causal or statistical correlations) and contribute to a better understanding of the limits of generalisation and the possibilities for upscaling approaches. The automatic quality assessment allows for a quick evaluation of a single datum within a group of selected entries. The possibility of correlating data also simplifies and accelerates the identification of key references for rock parameters in specific regions, for specific rock types or for stratigraphic units. Furthermore, the database allows a systematic analysis of the dependence of property values on the corresponding measurement conditions. Overall, the most important added value of P3 compared to existing databases is its dimension (a large number of entries covering a large number of petrophysical properties) as well as the documented meta-information.
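Because several properties may be measured on the same sample, pairs of properties can be matched through the sample ID and then correlated. The sketch below assumes a long-format table with one row per measured value; the field names (`sample_id`, `property`, `value`), the property names and the values are hypothetical, and Pearson's coefficient is just one possible measure of association.

```python
from collections import defaultdict
from math import sqrt


def paired_values(entries, prop_x, prop_y):
    """Pair two properties measured on the same sample (matched by sample ID).

    If a sample has several values for one property, the last one wins.
    """
    by_sample = defaultdict(dict)
    for e in entries:
        by_sample[e["sample_id"]][e["property"]] = e["value"]
    return [(p[prop_x], p[prop_y]) for p in by_sample.values()
            if prop_x in p and prop_y in p]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in one of the properties")
    return sxy / sqrt(sxx * syy)


# Hypothetical entries: porosity in %, bulk density in kg/m³; S4 lacks a density value.
entries = [
    {"sample_id": "S1", "property": "porosity", "value": 5.0},
    {"sample_id": "S1", "property": "bulk_density", "value": 2600.0},
    {"sample_id": "S2", "property": "porosity", "value": 10.0},
    {"sample_id": "S2", "property": "bulk_density", "value": 2500.0},
    {"sample_id": "S3", "property": "porosity", "value": 20.0},
    {"sample_id": "S3", "property": "bulk_density", "value": 2300.0},
    {"sample_id": "S4", "property": "porosity", "value": 15.0},
]

pairs = paired_values(entries, "porosity", "bulk_density")
r = pearson(pairs)
print(len(pairs), round(r, 3))
```

Matching on the sample ID, rather than on location or rock type alone, ensures that only values measured on the same physical specimen are compared.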
Despite all these benefits, such a database can never be complete and is always prone to uncertainties. Identifying errors in the original publications (in terms of property values and meta-information, e.g. sample preparation, accuracy of measurements, sampling bias, lab worker bias, measurement methods, reference standards and many more) is beyond the scope of this compilation. In addition, errors in data input, in interpretation, or in the petrographic and stratigraphic classification cannot be excluded. We assume that the original publications and the data therein have already been quality-checked by skilled reviewers or editors of the corresponding scientific journals and theses. In addition, the quality indices developed as part of P3 allow users to quickly evaluate the quality of each data point and thus help them decide whether the original reference should be reassessed.
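One way to use the quality indices for this decision is to flag every reference for which at least one index reaches a chosen threshold. The sketch assumes, as in the description of the index distribution, that lower values mean better documentation; the threshold of 4, the field names and the reference labels are hypothetical choices for illustration.

```python
def flag_for_reassessment(entries, threshold=4):
    """Map each reference to the index names that reach `threshold` in any of its entries.

    Assumes lower index values mean better documentation; the default
    threshold is a hypothetical choice, not a P3 convention.
    """
    flagged = {}
    for e in entries:
        poor = {name for name, qi in e["qi"].items() if qi >= threshold}
        if poor:
            flagged.setdefault(e["reference"], set()).update(poor)
    return flagged


# Hypothetical entries with a subset of their quality indices.
entries = [
    {"reference": "ref_A", "qi": {"petrography": 1, "stratigraphy": 4}},
    {"reference": "ref_B", "qi": {"petrography": 2, "stratigraphy": 2}},
    {"reference": "ref_C", "qi": {"measurement_conditions": 4, "stratigraphy": 4}},
    {"reference": "ref_A", "qi": {"petrography": 1, "stratigraphy": 3}},
]

flagged = flag_for_reassessment(entries)
for ref, names in sorted(flagged.items()):
    print(ref, sorted(names))
```

Returning the offending index names, not just the reference, tells the user which part of the original publication (e.g. the stratigraphic assignment) is worth rechecking.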
Figure 6.
Overview of the quality indices (qi) distribution of the P3 input data quality assessment. For the definition of the quality indices see Table 3.

Additionally, P3 includes values obtained with different established or newly developed measurement methods, which deliver data of different quality and uncertainty. Hence, data comparability is not necessarily guaranteed, and a statistical assessment can only be representative if these effects are considered. Because the original source is documented, however, the detailed information for a chosen sample set can be verified if necessary. For subsequent applications such as modelling, both the spatial distribution of the data and the origin of the samples have to be considered. Due to diverse effects (such as temperature, pressure, weathering and diagenetic history), properties measured on outcrop analogue samples might differ considerably from those of the same formation under in situ conditions in a deep reservoir. It is up to the experienced user to evaluate whether the tabulated datum is applicable and whether sufficient meta-information is given. In case of doubt, users are referred to the original publications.
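To keep statistics representative when methods and conditions differ, values can be grouped by measurement method and condition before summarising, instead of being pooled. The sketch below shows this stratification; the method and condition labels, the field names and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarise_by_method(entries):
    """Group values by (method, condition) before computing statistics,
    so that results from different methods or conditions are not pooled."""
    groups = defaultdict(list)
    for e in entries:
        groups[(e["method"], e["condition"])].append(e["value"])
    return {key: {"n": len(v), "mean": mean(v), "median": median(v)}
            for key, v in groups.items()}


# Hypothetical thermal conductivity values in W/(m K).
entries = [
    {"method": "method_A", "condition": "dry", "value": 2.0},
    {"method": "method_A", "condition": "dry", "value": 2.2},
    {"method": "method_A", "condition": "saturated", "value": 2.8},
    {"method": "method_B", "condition": "dry", "value": 2.4},
    {"method": "method_B", "condition": "dry", "value": 2.6},
    {"method": "method_B", "condition": "dry", "value": 2.5},
]

summary = summarise_by_method(entries)
for key, stats in sorted(summary.items()):
    print(key, stats)
```

Keeping the groups separate makes differences between methods or between dry and saturated measurements visible, which a single pooled mean would hide.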

Data availability
The Excel version of the P3 database (Bär et al., 2019b: P3  The stratigraphic classification table (Stratigraphy), which is also included in P3 (Bär et al., 2019a: Stratigraphic classification

Conclusions and perspectives
We developed the P3 database of petrophysical properties measured on rock samples in various laboratories. P3 is designed to be as transparent as possible and useful for various purposes through the integration of multiple sources of meta-information (including the original reference) for each data point. The database already comprises a great variety of properties, petrographies, stratigraphies, etc. from samples investigated all over the world. In this first release, 75 573 data points from 316 publications were included. The current compilation mainly reflects the goals of the geothermal project IMAGE (van Wees et al., 2015), but P3 is certainly applicable in various geoscientific fields focusing on subsurface utilisation (e.g. oil and gas, carbon capture and storage (CCS), hydrogeology, and subsurface storage of radioactive waste). The collected data will help researchers and users, particularly in the early stages of new geothermal or other projects, to make a first assessment of subsurface rock properties. This will help in planning future exploration and, in areas where the existing data density is sufficient, even support direct modelling or exploitation projects. Additionally, the database will help improve local and regional geoscientific studies with other foci on subsurface utilisation. A first release of this database (Bär et al., 2019b: P3 – database, https://doi.org/10.5880/GFZ.4.8.2019.P3), including a report and a reference list of all included publications, is available as supplementary data to this publication.
Compiling data from various sources, however, has shown that the documentation of measured petrophysical properties is generally very heterogeneous and that the minimum requirements defined for P3 were often not fulfilled. We therefore appeal to the reviewers and editors of scientific journals to ensure that any publication containing original measurements of petrophysical properties is accompanied by all the helpful and necessary meta-information described here. Only if these requirements are fulfilled is a published dataset of added value to the scientific community and usable for subsequent investigations or applications.
Since a database like P3 can never be complete, further extension based on not-yet-considered publications, newly published data or our own measurements is foreseen. Furthermore, we hope both to collaborate with the authors of existing compilations in order to combine them into a single, more useful system and to support the use of this version of P3 by other database initiatives as a supplement to their own records. We plan to develop a publicly accessible web-based interface that will allow external users to perform specific queries on petrophysical properties. In addition, such queries shall be possible through a web-based geographic information system, which may be connected to additional information such as worldwide geological maps (e.g. OneGeology, http://www.onegeology.org, last access: 14 August 2020). With this system, external users shall be given the opportunity to contribute to the database, thereby simplifying access to their measurements and potentially improving their visibility considerably. The database will thus be continuously updated and newly released by the editors at certain stages. For this purpose, the database will be implemented in a relational database management system (RDBMS) following the third normal form (3NF) according to Codd (1970) and Maier (1983). This reduces the volume of stored data by eliminating multiple storage of the same information for a sampling location and will strongly increase the flexibility, durability and applicability of the database, especially for SQL-experienced users. It will also facilitate linking P3 to similar databases in the future.
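The benefit of normalisation can be illustrated with a small relational sketch: location, petrography, stratigraphy and property definitions are each stored once and referenced by ID, so several measurements on one sample do not repeat the location. The schema below is a hypothetical, simplified example loosely following 3NF; it does not reproduce the actual P3 structure shown in Fig. B1, and all table names, column names and values are assumptions for illustration. It uses Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical, simplified schema: shared information lives in its own table
# and is referenced by ID instead of being repeated in every measurement row.
SCHEMA = """
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL
);
CREATE TABLE petrography (
    petrography_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE stratigraphy (
    stratigraphy_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE property (
    property_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    unit TEXT NOT NULL  -- unit depends on the property, so it is stored here once
);
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL REFERENCES location(location_id),
    petrography_id INTEGER REFERENCES petrography(petrography_id),
    stratigraphy_id INTEGER REFERENCES stratigraphy(stratigraphy_id),
    reference TEXT NOT NULL
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    property_id INTEGER NOT NULL REFERENCES property(property_id),
    value REAL NOT NULL
);
"""

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript(SCHEMA)

# Hypothetical content: one location and one sample with three measured properties.
con.execute("INSERT INTO location VALUES (1, 49.87, 8.65)")
con.execute("INSERT INTO petrography VALUES (1, 'sandstone')")
con.execute("INSERT INTO stratigraphy VALUES (1, 'Permian')")
con.executemany("INSERT INTO property VALUES (?, ?, ?)", [
    (1, "porosity", "%"),
    (2, "bulk_density", "g/cm3"),
    (3, "thermal_conductivity", "W/(m K)"),
])
con.execute("INSERT INTO sample VALUES (1, 1, 1, 1, 'ref_A')")
con.executemany(
    "INSERT INTO measurement (sample_id, property_id, value) VALUES (1, ?, ?)",
    [(1, 12.0), (2, 2.35), (3, 2.9)],
)

# Reassemble a flat view with joins; the location row exists only once.
rows = con.execute("""
    SELECT pr.name, m.value, pr.unit, pe.name, l.latitude, l.longitude
    FROM measurement m
    JOIN property pr ON pr.property_id = m.property_id
    JOIN sample s ON s.sample_id = m.sample_id
    JOIN petrography pe ON pe.petrography_id = s.petrography_id
    JOIN location l ON l.location_id = s.location_id
    ORDER BY pr.name
""").fetchall()
for row in rows:
    print(row)
```

Correcting a sample's coordinates then means updating a single `location` row, and every joined measurement picks up the change.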

Appendix A: List of references for Figs. 4 and 5
Appendix B
Figure B1. Relational structure of the PetroPhysical Property Database P3 as an entity-relationship diagram (ERD). Sub-entities are linked through the sample ID, the petrographic ID or the stratigraphic ID.
Author contributions. KB, TR and JB defined the scope of this work, designed and structured the database, and defined the criteria for the quality control. Furthermore, they collected and reviewed most of the referenced literature and supervised the team that reviewed and collected the literature, transferred the data from the literature or their own sources into the database, and restructured the database where needed. KB performed the majority of the quality control. KB prepared the manuscript with significant contributions from TR and JB and managed it during the review process.
Competing interests. The authors declare that they have no conflict of interest.