Interactive comment on “Retrospect and prospect of a section-based stratigraphic and palaeontological database – Geobiodiversity Database”

GENERAL COMMENTS This article has the potential to provide a useful and welcome introduction to the history, structure, content, functionality and analytical tools, and proposed future of the Geobiodiversity Database, a database with huge research potential that remains much less well known and used within the international geoscience community than the US-based Paleobiology Database. The current version of the manuscript does not, however, fully achieve its goals. It is lacking in details in some parts, particularly those dealing with historical aspects, and at times difficult to follow and somewhat repetitive. In this review I first provide some major comments on

The GBDB became the formal database of the International Commission on Stratigraphy in August 2012 at the 34th International Geological Congress in Brisbane, Australia. As a result, a major new goal of the GBDB is to integrate stratigraphic standards (e.g. the GSSPs) with a comprehensive and authoritative web-based stratigraphic information service for geoscientists, educators and the public worldwide.
Since 2011, early Paleozoic stratigraphic and palaeontological records, especially those of the Ordovician and Silurian periods, have been quantitatively analyzed and a series of scientific findings has been achieved. The related research themes include the Ordovician and Silurian palaeogeography and tectonic evolution of South China (Chen et al., 2012; 2014b; 2017a), the spatio-temporal pattern of the Ordovician and Silurian marine organisms from China 2017b; Zhang et al., 2014a; 2016), and the Paleozoic palaeogeographic evolution of South China (Chen et al., 2018; Zhang et al., 2014b; Hou et al., 2020). Recently, nearly all GBDB data on Paleozoic marine organisms were used to analyze the evolution of biodiversity. Although all of the data came from China, the Paleozoic geological sections of China cover several palaeocontinents and might therefore reflect global biodiversity change.
In 2017, the GBDB became a data partner of the British Geological Survey (BGS) and started to digitize its fossil and stratigraphic data and to build datasets for the BGS. This is a time-consuming task that the GBDB data entry team is still carrying out, because the BGS has amassed and houses about 3 million fossils gathered over more than 150 years at thousands of sites across the British Isles.
At the end of 2018, the head of the GBDB, Dr. Fan J.X., left the NIGP, CAS, and Dr. Xu H.H. took over the GBDB. In addition to the data collection, processing and visualization carried out by the GBDB group during 2007-2018, data on fossil terrestrial organisms, such as insects and plants, were entered into the GBDB, and the database and website were optimized and redesigned. Furthermore, the new GBDB working team pays more attention to data analysis, and a professional artificial intelligence working group is joining for this purpose. The GBDB is ushering in a new start.

The data of the Geobiodiversity Database
The Geobiodiversity Database (GBDB) was designed as a stratigraphic and palaeontological database with a geological section-based input format. This means that data entry clerks or any scientific users must enter the metadata into the GBDB according to geological sections or virtual sections. Every metadata record contains all geological information of a geological section, including its basic units (beds or layers), sediment color, lithology, thickness, horizon, locality, palaeo-block, geological age, biostratigraphy, geochemistry, palaeoecology, radioisotopic age, fossil collections and any available original information on the rock specimens or fossil samples obtained during fieldwork. An individual geological section is normally subdivided into dozens of basic units when it is entered into the GBDB. Such information-rich geological section records can be readily found in the stratigraphic and palaeontological literature. However, many palaeontological descriptions or reports lack a detailed stratigraphic description. The GBDB includes these records as virtual sections, which cover only a small portion of the whole section, for example a single bed or collection. Borehole core records are also entered into the GBDB as virtual sections (Figure 1).
https://doi.org/10.5194/essd-2020-164 Preprint. Discussion started: 17 July 2020. © Author(s) 2020. CC BY 4.0 License.
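The section-based layout described above can be pictured with a small data model. The following Python sketch is illustrative only: the class and field names (`Section`, `Unit`, `FossilCollection`, `is_virtual`, `thickness_m`) and all values are assumptions made for this example and do not reflect the actual GBDB schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FossilCollection:
    """Fossils collected from one unit (hypothetical structure)."""
    taxa: List[str] = field(default_factory=list)


@dataclass
class Unit:
    """A basic unit (bed or layer) of a section."""
    number: int
    lithology: str
    thickness_m: Optional[float] = None  # None when the source gives no thickness
    collections: List[FossilCollection] = field(default_factory=list)


@dataclass
class Section:
    """A geological section, or a virtual section holding only part of one."""
    name: str
    locality: str
    units: List[Unit] = field(default_factory=list)
    is_virtual: bool = False  # True for records without a full stratigraphic log

    def total_thickness(self) -> float:
        """Sum the thickness of all units that report one."""
        return sum(u.thickness_m for u in self.units if u.thickness_m is not None)


# A measured section with two beds (invented values).
measured = Section(
    name="Example section",
    locality="Example locality",
    units=[
        Unit(1, "limestone", 2.5, [FossilCollection(["Taxon a"])]),
        Unit(2, "shale", 1.5),
    ],
)

# A virtual section: a single fossil-bearing bed reported without a full log.
virtual = Section(
    name="Example report",
    locality="Example locality",
    units=[Unit(1, "mudstone", None, [FossilCollection(["Taxon b"])])],
    is_virtual=True,
)
```

Modelling a virtual section as an ordinary section with a flag, rather than as a separate type, keeps one code path for both kinds of record; whether the GBDB does this internally is not stated in the text.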
The stratigraphic data in the GBDB are based on those published in the Chinese literature since the 1920s. By November 2019, all stratigraphic horizons and nearly all published geological sections could be browsed in the GBDB (Figures 2, 3). It is noteworthy that the GBDB holds no direct records of fossil occurrences; instead, occurrences are linked to the stratigraphic records of the database and included in its palaeontological data. The palaeontological data come from the fossil collections of individual geological sections and borehole cores, and include taxonomy (species, genus, family, order, class and division), major group, synonymy (opinion data) and description (key features) (Figure 1). Although the GBDB is geological section-based, fossil occurrences can be output from it, so it is compatible with fossil occurrence-based databases. Most fossil collections and occurrences of all sections from China are included in the GBDB (Figure 3). Subsequent authors amended a portion of the fossil taxa from these sections in further studies; as a result, the GBDB also holds plenty of opinion data.
Since 2017, the GBDB has recorded data on the Global Boundary Stratotype Sections and Points (GSSPs) of the International Commission on Stratigraphy, including detailed information on each GSSP and some panoramas and three-dimensional scans of individual GSSPs.
Since August 2017, the British Geological Survey (BGS) and the GBDB have collaborated on stratigraphic and palaeontological data processing. The GBDB data team helps to digitize the geological reports from the BGS archive and to build separate datasets for it. This work is ongoing, and much data sorting, cleaning and checking remains to be done.
Since 2019, the GBDB has included borehole core data from petroleum companies, such as the China National Offshore Oil Corporation and the China National Petroleum Corporation.
In brief, as many stratigraphic and palaeontological records as possible are collected from the original geological publications. Since its establishment, the GBDB data team has deliberately collected and included stratigraphic and palaeontological data from the Chinese literature. The detailed statistics are given here for the first time (Table 1) (see Xu, 2020).

Newly-added data in the GBDB
For a long time, the study of biodiversity evolution was based on fossil records of marine organisms. For example, the earliest quantitative analysis of biodiversity through geological time, which identified the five mass extinctions (Raup and Sepkoski, 1982), and a series of related studies were based on family- or genus-level fossil records of marine organisms (Jablonski, 1994; Rong et al., 2006; Alroy et al., 2008).
Quantitative studies based on fossil records of terrestrial organisms are relatively few, although there have been a number of palaeontological studies of terrestrial organisms. There have been quantitative studies of plant diversity in the Silurian and Devonian periods, which were significant for early plant evolution and diversification (Xiong et al., 2013), and of plant diversity change across the Permian-Triassic boundary (Xiong and Wang, 2007), the time of the greatest mass extinction in geological history, which wiped out over 95% of marine organisms (Jablonski, 1994). Both plant diversity studies used fossil record data from South China and listed the data as supplementary material to the published papers. It took the authors of the two studies a few years to complete the data collection, even though the data came only from the South China palaeo-block.
An inconvenient fact is that databases of fossil terrestrial organisms are not as good as those of marine ones. For this reason, the GBDB spent over a year building the related database and deliberately collecting the related data. The GBDB now has dedicated databases for fossil terrestrial organisms. So far, the fossil plant record dataset has collected 738 Devonian plant species occurrences from localities worldwide and thousands of Mesozoic plant species occurrences from China. These data will be included in the GBDB after data formatting and cleaning.

Fossil insect records in the GBDB
Insect fossil records in the GBDB were few because few accurate insect fossil occurrences or collections were recorded with geological section descriptions. Additionally, many insect fossils were found in amber rather than in lithological horizons. As a result, a number of fossil insect studies were carried out without detailed stratigraphic descriptions, and fossil insect occurrences and collections are not closely tied to their lithological horizons. The insect fossil records in the GBDB increased greatly after it took over the international fossil insect database of the International Palaeoentomological Society, EDNA (https://fossilinsectdatabase.co.uk/), which holds details of the holotypes of all fossil insects in the world.
EDNA was named after Edna Clifford, who started recording new species on a card index system. It was designed as an update of Handlirsch's 1906-1908 "Die fossilen Insekten und die Phylogenie der rezenten Formen", which listed all the then known fossil insect species. Handlirsch recorded 5 160 species in 1906. The database is detailed in its contents: it records taxonomic information, synonym details and references for every species (including the page number where it is introduced), and, for holotypes, site details, stratigraphic information and geological details. All the data have been obtained from exhaustive literature searches.
EDNA aims to be a complete, fully interactive list of all the species of insects named from the fossil record, with the site, geological age and reference for each holotype. Updating and checking will be ongoing, and the data available will be greatly improved if details of omissions and errors are sent to the administrator for incorporation. The data come from an exhaustive literature search, and the 2019 edition contains 28 439 species names (including synonyms) extracted from 5218 references (Figure 3d). The data are held in 38 fields, all of which are searchable, independently or in combination, and the output can contain any one or more as required.
Fields include: generic and specific names, citation, subfamily, family, superfamily, division, suborder and order; author, title, journal, date of publication, and page on which the species is first described; time data including stage, epoch, subperiod, period and era, and age (range) in millions of years; bed, member, formation and group; site name, nearest feature (town, river, etc.), county, state, country and continent (Figure 4). For all taxonomic ranks, citations can be included and both junior and senior synonyms displayed. Natural History Museum London Library call numbers are also included.
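To make the idea of fields that are searchable "independently or in combination", with a selectable output, more concrete, here is a minimal Python sketch. It is not EDNA's implementation: the function `search`, the record layout and every value are hypothetical placeholders, and only a handful of the 38 fields are represented.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]

# Invented placeholder records; the keys are a small subset of the fields listed above.
RECORDS: List[Record] = [
    {"genus": "Genus_a", "species": "alpha", "order": "Order_x",
     "period": "Jurassic", "country": "Country_1"},
    {"genus": "Genus_b", "species": "beta", "order": "Order_x",
     "period": "Cretaceous", "country": "Country_1"},
    {"genus": "Genus_c", "species": "gamma", "order": "Order_y",
     "period": "Jurassic", "country": "Country_2"},
]


def search(records: Iterable[Record], columns: List[str], **criteria: str) -> List[Record]:
    """Return the requested columns of every record that matches all criteria."""
    hits = []
    for rec in records:
        # A record is kept only if every given field equals the requested value.
        if all(rec.get(name) == value for name, value in criteria.items()):
            hits.append({col: rec.get(col, "") for col in columns})
    return hits


# Combine two search fields and output only two columns.
jurassic_x = search(RECORDS, ["genus", "species"], order="Order_x", period="Jurassic")
print(jurassic_x)  # [{'genus': 'Genus_a', 'species': 'alpha'}]
```

Passing a single keyword searches one field on its own; passing several narrows the result to records that satisfy all of them.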

Database comparisons and discussions
The section-based Geobiodiversity Database differs from the fossil occurrence-based Paleobiology Database (PBDB), which was founded in 1998 and has become the largest palaeobiological database. Its data include fossil taxa, collections, opinions (palaeobiological views from different authors) and related publications. The data volume of the PBDB is larger than that of the GBDB (Table 1). The most noticeable difference is that the PBDB has little information about geological sections, whereas the GBDB is known for its large number of geological sections. [...] or internal reports. This explains why the GBDB has more references than the PBDB (Table 1).
As mentioned above, the GBDB is geological section-based; every record is subdivided into detailed parts when it is entered into the database. Fossil occurrence and collection data can also be exported from the GBDB, just like those in the PBDB.
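The step from a section-based record to occurrence-style output can be sketched as a simple flattening of units and their fossils. The Python example below is an assumption-laden illustration: the dictionary layout, the function `to_occurrences` and all values are invented here and are not the GBDB export routine.

```python
from typing import Dict, List, Tuple

# Hypothetical section record: units listed from base to top, each with its fossils.
section = {
    "section": "Example section",
    "units": [
        {"unit": 1, "lithology": "limestone", "taxa": ["Taxon a", "Taxon b"]},
        {"unit": 2, "lithology": "shale", "taxa": []},
        {"unit": 3, "lithology": "limestone", "taxa": ["Taxon a"]},
    ],
}


def to_occurrences(sec: Dict) -> List[Tuple[str, int, str]]:
    """Flatten one section into (section, unit, taxon) occurrence rows."""
    return [
        (sec["section"], unit["unit"], taxon)
        for unit in sec["units"]
        for taxon in unit["taxa"]
    ]


occurrences = to_occurrences(section)
for row in occurrences:
    print(row)
```

Each taxon found in a unit yields one occurrence row, so a barren unit contributes nothing, while the stratigraphic position of every occurrence remains recoverable from its unit number.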
Nevertheless, the number of fossil taxa recorded in the GBDB is about 30% of that in the PBDB, whilst the number of fossil occurrence records in the GBDB is about 40% of that in the PBDB (Table 1). One reason is that the two databases have different histories: the PBDB was founded in 1998 and the GBDB in 2007. The second reason for the difference in data quantity is that for a long time the GBDB focused on stratigraphic records rather than on fossils alone, and palaeontological information was entered as a complement to individual stratigraphic data (Figure 1).
The stratigraphic study and recording in the GBDB are reminiscent of Macrostrat (https://macrostrat.org/), which is a platform for the aggregation and distribution of geological data relevant to the spatial and temporal distribution of sedimentary, igneous, and metamorphic rocks as well as data extracted from them. Macrostrat aims to become a community resource for the addition, editing, and distribution of new stratigraphic, lithologic, environmental, and economic data. By November 2019,

Problems, improvements and prospects
The GBDB website went online in 2007 and has received few updates since then. According to feedback from GBDB users, the existing problems of the website are as follows.
1) The website was developed using .NET Framework 2.0, which is out of date. As a result, the interface and layout of the website are not easy to update; furthermore, the website sometimes loses pages or does not respond to queries.
2) The data volume of the individual datasets is neither visible nor searchable.
3) The data query is not user-friendly: only geographic and horizon terms can be used as search keywords.
4) Data are not readily accessible. Only registered users have access to the data, and new registrations require activation by the web administrator.
5) Data download is not convenient or user-friendly. The download process involves several steps: selecting data into an extra dataset and then exporting the data from that dataset.
6) The data format is not well compatible with that of other databases.
Additionally, the GBDB has no backup mechanism in use, so its data are at risk. Updating and improving the GBDB and its website are necessary for the data to be widely used.
We comprehensively updated the server and the website of the GBDB, making the database a safe data bank and the website a new and user-friendly portal. The new website has an optimized data input and output, search engine and data examination system.
After the first step of data entry, the raw data are checked by registered authorizers. This check aims to ensure that the data conform to the publication rather than to the authorizer's own point of view. Only checked records go into the database. Data entry follows this procedure regardless of who enters the data.
In the previous version of the GBDB website, registered users who wanted to download data needed to search for and select certain lithologic units and build a temporary dataset. Only the data in this dataset could be downloaded, which was inconvenient. The new website simplifies this process: no temporary dataset is required, and any user can search for and download the data of interest directly.
Additionally, the types of output data are compatible with occurrence-based data. GBDB users can obtain both section-based stratigraphic data and fossil occurrence data.
Data export formats include regular spreadsheets, such as Excel and CSV, and machine-readable JSON files. A dedicated spreadsheet form is designed for the geological sections in the GBDB. Its structure better matches the geological column and can readily be output as a graph.
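As an illustration of the CSV and JSON export formats mentioned above, the following Python sketch serializes the same rows both ways using only the standard library. The column names and values are invented for the example, and the functions `to_csv` and `to_json` are hypothetical; the actual GBDB export layout is not described in the text.

```python
import csv
import io
import json
from typing import Dict, List

# Invented occurrence-style rows; column names are assumptions for this sketch.
rows: List[Dict[str, object]] = [
    {"section": "Example section", "unit": 1, "lithology": "limestone", "taxon": "Taxon a"},
    {"section": "Example section", "unit": 3, "lithology": "limestone", "taxon": "Taxon a"},
]


def to_csv(records: List[Dict[str, object]]) -> str:
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records: List[Dict[str, object]]) -> str:
    """Serialize records as an indented JSON array."""
    return json.dumps(records, indent=2)


csv_text = to_csv(rows)
json_text = to_json(rows)
print(csv_text)
print(json_text)
```

CSV suits spreadsheet tools, whereas JSON preserves value types such as the integer unit numbers, which is one reason a service might offer both.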
The updates and new features of the GBDB 2.0β also include:
1) Data visualization has been developed. All data are plotted on the world map on the homepage, which also displays the total data volume in the upper right corner. The view is centered on the map of China, and the map can be zoomed in or out using the mouse scroll wheel.