The Reading Palaeofire database: an expanded global resource to document changes in fire regimes from sedimentary charcoal records

Sedimentary charcoal records are widely used to reconstruct regional changes in fire regimes through time in the geological past. Existing global compilations are not geographically comprehensive and do not provide consistent metadata for all sites. Furthermore, the age models provided for these records are not harmonised and many are based on older calibrations of the radiocarbon ages. These issues limit the use of existing compilations for research into past fire regimes. Here, we present an expanded database of charcoal records, accompanied by new age models based on recalibration of radiocarbon ages using INTCAL20 and Bayesian age-modelling software. We document the structure and contents of the database, the construction of the age models, and the quality control measures applied. We also record the expansion of geographical coverage relative to previous charcoal compilations and the expansion of metadata that can be used to inform analyses. This first version of the Reading Palaeofire Database contains 1676 records (entities) from 1480 sites worldwide. The database is available from https://doi.org/10.17864/1947.000345.


Introduction
Wildfires have major impacts on terrestrial ecosystems (Bond et al., 2005; Bowman et al., 2016; He et al., 2019; Lasslop et al., 2020), the global carbon cycle (Li et al., 2014; Arora and Melton, 2018; Pellegrini et al., 2018; Lasslop et al., 2019), atmospheric chemistry (van der Werf et al., 2010; Voulgarakis and Field, 2015; Sokolik et al., 2019) and climate (Randerson et al., 2006; Li et al., 2017; Harrison et al., 2018; Liu et al., 2019). Although the climatic, vegetation and anthropogenic controls on wildfires are relatively well understood (e.g. Harrison et al., 2010; Bistinas et al., 2014; Knorr et al., 2016; Forkel et al., 2017; Li et al., 2019), recent years have seen wildfires occurring in regions where they were historically rare (e.g. northern Alaska, Greenland, northern Scandinavia: Evangeliou et al., 2019; Hayasaka, 2021) and an increase in fire frequency and severity in more fire-prone regions (e.g. California, the circum-Mediterranean, eastern Australia: Abatzoglou and Williams, 2016; Dutta et al., 2016; Williams et al., 2019; Nolan et al., 2020). It is useful to look at the pre-industrial era (conventionally defined as pre-1850 CE) to understand whether these events are atypical. The pre-industrial past also provides an opportunity to characterise fire regimes before anthropogenic influences, both in terms of ignitions and fire suppression, became important.
Ice-core records provide a global picture of changes in wildfire in the geologic past (Rubino et al., 2016). However, wildfires exhibit considerable local to regional variability because of the spatial heterogeneity of the various factors controlling their occurrence and intensity (Bistinas et al., 2014; Andela et al., 2019; Forkel et al., 2019). Thus, it is useful to use information that can provide a picture of regional changes through time.
Charcoal, preserved in lake, peat or marine sediments, can provide a picture of such changes (Clark and Patterson, 1997; Conedera et al., 2009). The wildfire regime can be characterised from sedimentary charcoal records through total charcoal abundance per unit of sediment, which can be considered as a measure of the total biomass burned (e.g. Marlon et al., 2006), or by the presence of peaks in charcoal accumulation which, in records with sufficiently high temporal resolution, can indicate individual episodes of fire (e.g. Power et al., 2006).
The Global Palaeofire Working Group (GPWG) was established in 2006 to coordinate the compilation and analysis of charcoal data globally, through the construction of the Global Charcoal Database (GCD: Power et al., 2008). The GPWG was initiated by the International Geosphere-Biosphere Programme (IGBP) Fast-Track Initiative on Fire and subsequently recognised as a working group of the Past Global Changes (PAGES) Project in 2008. There have now been several iterations of the GCD (Power et al., 2008; Power et al., 2010; Daniau et al., 2012; Blarquez et al., 2014; Marlon et al., 2016), which since 2020 has been managed by the International Palaeofire Network as the Global Palaeofire Database (GPD; https://paleofire.org). The GCD has been used to examine changes in fire regimes over the past two millennia (Marlon et al., 2008), during the current interglacial (Marlon et al., 2013), on glacial-interglacial timescales (Power et al., 2008; Daniau et al., 2012; Williams et al., 2015) and in response to rapid climate changes (Marlon et al., 2009; Daniau et al., 2010), as well as to examine regional fire histories (e.g. Mooney et al., 2011; Vannière et al., 2011; Marlon et al., 2012; Power et al., 2013; Feurdean et al., 2020). However, there are a number of limitations to the use of the GCD for analyses of palaeofire regimes.
Firstly, the database does not include many recently published records and needs to be updated. Secondly, there are inconsistencies among the various versions of the database, including duplicated and/or missing sites, differences in the metadata included for each site or record, and missing metadata and dating information for some sites or records. Perhaps most crucially, the age models included in the database were made at different times, using different radiocarbon calibration curves and different age-modelling methods. The disparities between the archived age models preclude a detailed comparison of changes in wildfire regimes across regions.
Here, we present an expanded database of charcoal records (the Reading Palaeofire Database, RPD), accompanied by new age models based on recalibration of radiocarbon ages using INTCAL20 (Reimer et al., 2020) and a consistent Bayesian approach to age-model construction (BACON: Blaauw et al., 2021). However, we have retained the original age models for all the sites for comparison and to allow the user to choose a preferred age model. The RPD is designed to facilitate regional analyses of fire history; it is not designed as a permanent repository. We document the structure and contents of the database, the construction of the new age models, the expanded metadata available, and the quality control measures applied to check the data entry. We also document the expansion of the geographic and temporal coverage, and of the availability of metadata, relative to previous GCD compilations.

Compilation of data
The database contains sedimentary charcoal records, metadata to facilitate the interpretation of these records, and information on the dates used to construct the original age model for each record. Some records were obtained from the GCD. There are multiple versions of the GCD, which differ in terms of the sites and the types of metadata included. We compared the GCDv3 (Marlon et al., 2016), GCDv4 (Blarquez, 2018) and GCD webpage (http://paleofire.org) versions and extracted a single unique version of each site and entity across the three versions. Where sites or entities were duplicated in different versions of the GCD, we used the latest version. Missing metadata and dating information for these records were obtained from the literature or from the original data providers. Some sites in the GCD were represented by both concentration data and the same data expressed as influx (i.e. concentration per year) from the same samples; because influx calculations are time dependent, we have only retained concentration data for such sites to allow for future improvements to age models. Influx can be easily computed using data available in the RPD. We also removed duplicates where the GCD contained both raw data and concentration data from the same entity. We extracted published charcoal records that do not appear in any version of the GCD from public repositories, specifically PANGAEA (https://www.pangaea.de/), the NOAA National Centers for Environmental Information (https://www.ncdc.noaa.gov/data-access/paleoclimatology-data), the Neotoma Paleoecology Database (https://www.neotomadb.org/), the European Pollen Database (http://www.europeanpollendatabase.net/index.php) and the Arctic Data Center (https://arcticdata.io/catalog/); if these records were also in the GCD we replaced the GCD version. Additional charcoal data, dating information and metadata were provided directly by the authors.
All the records in the current version of the database are listed in the Supplementary Information (SI Table 1).
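
The remark that influx can be computed from data held in the RPD can be illustrated with a minimal sketch. It assumes that influx is obtained by multiplying concentration (per cm³ of sediment) by a sedimentation rate (cm per year), and that the rate at each sample is approximated from the depth and age differences to its neighbouring samples. The function name and this neighbour-based rate estimate are illustrative assumptions, not the procedure used by the authors.

```python
def influx_from_concentration(depths_cm, ages_yr_bp, concentrations):
    """Return charcoal influx (units per cm^2 per year) for each sample.

    Assumptions (illustrative only): concentration is expressed per cm^3,
    samples are ordered from top to bottom, and the sedimentation rate at a
    sample is estimated from the depth/age difference between its neighbours.
    """
    n = len(depths_cm)
    if not (n == len(ages_yr_bp) == len(concentrations)) or n < 2:
        raise ValueError("need at least two samples with matching lengths")

    influx = []
    for i in range(n):
        lo = max(i - 1, 0)        # neighbour above (or the sample itself at the top)
        hi = min(i + 1, n - 1)    # neighbour below (or the sample itself at the base)
        d_depth = depths_cm[hi] - depths_cm[lo]
        d_age = ages_yr_bp[hi] - ages_yr_bp[lo]
        if d_age <= 0:
            raise ValueError("ages must increase with depth")
        sed_rate = d_depth / d_age  # cm per year
        influx.append(concentrations[i] * sed_rate)
    return influx
```

Because the ages enter the calculation directly, any revision of the age model changes the influx values, which is the reason given above for storing concentration rather than influx.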

Structure of the database
The data are stored in a relational database (MySQL), which consists of 10 linked tables, specifically "site", "entity", "sample", "date_info", "unit", "entity_link_publication", "publication", "chronology", "age_model", and "model_name". Figure 1 shows the relationships between these tables. A description of the structure and content of each of the tables is given below, and more detailed information about individual fields is given in the Supplementary Material (SI Table 2). [...] which is the primary key of another table and ensures that there is a link between these tables.

Site metadata (table name: site)
A site is defined as the hydrological basin from which charcoal records have been obtained (Table 1). There may be several charcoal records from the same site, for example where charcoal records have been obtained on central and marginal cores from the same lake or where there is a lake core and additional cores from peatlands and/or terrestrial deposits (e.g. small hollows, soils) within the same hydrological basin. A site may therefore be linked to several charcoal records, where each record is treated as a separate entity. The site table contains basic metadata about the basin, including site ID, site name, latitude, longitude, elevation, site type, and maximum water depth. The site names are expressed without diacritics to facilitate database querying and subsequent analyses in programming languages that do not handle these characters. Latitude and longitude are given in decimal degrees, truncated to six decimal places since this gives an accuracy of <1 m at the equator. Broad categories of site type are differentiated (e.g. terrestrial, lacustrine, marine), with subdivisions according to geomorphic origin (e.g. lakes are recorded according to whether they are fluvial, glacial or volcanic in origin). In addition to coastal salt marshes and estuaries, we include a generic coastal category for all types of sites that lie within the coastal zone and whose hydrology may therefore have been affected by changes in sea level. Wherever possible, the size of the basin and the catchment are recorded (in km²), but if accurate quantified information is not available the basin and catchment size are recorded by size classes. The site table also contains information on whether the lake or peatland is hydrologically closed or has inflows and outflows, which can affect the source, quantity and preservation of charcoal in the sediments.
A complete listing of the sites and entities in the RPD is given in Table S1. A list of the valid choices for fields that are selected from a pre-defined list (e.g. site type) is given in Table S2.
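
The statement that six decimal places correspond to less than 1 m at the equator can be checked with a short calculation. The sketch below assumes the WGS84 equatorial radius and a simple spherical arc-length approximation; the function names are illustrative and are not part of the RPD tooling.

```python
import math

EARTH_EQUATORIAL_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis (assumed here)


def metres_per_degree_longitude(latitude_deg):
    """Approximate ground length of one degree of longitude at a latitude."""
    return (math.radians(1.0) * EARTH_EQUATORIAL_RADIUS_M
            * math.cos(math.radians(latitude_deg)))


def truncate(value, places=6):
    """Truncate (not round) a coordinate to a fixed number of decimal places."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


# Ground distance represented by the sixth decimal place at the equator.
step_m = 1e-6 * metres_per_degree_longitude(0.0)
```

Under these assumptions `step_m` is roughly 0.11 m, so the largest error introduced by truncation is well within the 1 m figure quoted above, and it becomes smaller for longitude away from the equator.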

Entity metadata (table name: entity)
This table provides metadata for each individual entity (Table 2). In addition to distinguishing multiple cores from the same basin as separate entities, we also treat different size classes of charcoal from the same core as separate entities when these data are available. However, we have removed duplicates where the same record was expressed in different ways (e.g. as both raw counts and concentration, or as concentration and influx) to avoid confusion and mistakes when subsequently processing these data. The RPD contains raw data wherever possible, concentration data when the raw data are not available, and influx data only if neither is available. When specific cores were given distinctive names in the original publication or by the original author, we include this information in the entity name for ease of cross-referencing. The entity metadata include information that can be used to interpret the charcoal records, including depositional context, core location, measurement method, and measurement unit. There is no standard measurement unit for charcoal; there are >100 different units employed in the database. For convenience, there is a link table to the measurement units (table name: unit). In addition, the entity table provides the source from which the charcoal data were obtained, including whether these data are from a version of the GCD, a data repository or were provided by the original author, and an indication of when the record was last updated. A list of the valid choices for fields that are selected from a pre-defined list (e.g. depositional context) is given in Table S2. A list of the charcoal measurement units currently in use in the RPD is given in Table S3.
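
The one-to-many link between a site and its entities can be sketched as follows. This is only an illustration: it uses an in-memory SQLite database as a stand-in for MySQL, and the column names (`id_site`, `id_entity`, `site_name`, `entity_name`) and the example rows are hypothetical rather than taken from the RPD schema documented in SI Table 2.

```python
import sqlite3

# In-memory stand-in for the MySQL database; all names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (
    id_site   INTEGER PRIMARY KEY,
    site_name TEXT NOT NULL,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE entity (
    id_entity   INTEGER PRIMARY KEY,
    id_site     INTEGER NOT NULL REFERENCES site(id_site),  -- foreign key
    entity_name TEXT NOT NULL
);
INSERT INTO site VALUES (1, 'Example Lake', 51.441427, -0.941000);
INSERT INTO entity VALUES (10, 1, 'Example Lake central core');
INSERT INTO entity VALUES (11, 1, 'Example Lake marginal core');
""")

# Count the entities attached to each site through the shared key.
rows = conn.execute("""
    SELECT s.site_name, COUNT(e.id_entity)
    FROM site AS s
    JOIN entity AS e ON e.id_site = s.id_site
    GROUP BY s.id_site
""").fetchall()
```

Here a single site row is linked to two entity rows, mirroring the case of central and marginal cores from the same lake.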

Sample metadata and data (table name: sample)
The sample table provides information on the average depth in the core or profile and the thickness of the sample on which charcoal was measured. The thickness measurements relate to the total thickness of the charcoal sample and provide an indication of whether the sampling was contiguous downcore. The sample table also provides information on the sample volume and the quantity of charcoal present. The charcoal measurement units have been standardised by converting units expressed as multiples (e.g. fragments x100) back to whole numbers and by converting units expressed in mg or kg to g. As a result, the values in the RPD may appear to differ from published values.
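
The two kinds of unit standardisation described above can be sketched as simple conversions. The sketch assumes that a value reported in a unit such as "fragments x100" is multiplied by the stated factor to recover whole numbers, and that the mass unit refers to the charcoal quantity itself (the numerator); both helper names are hypothetical.

```python
# Conversion factors from the reported mass unit to grams.
MASS_TO_GRAMS = {"mg": 1e-3, "g": 1.0, "kg": 1e3}


def expand_multiple(value, multiplier):
    """Convert a value reported in units of 'x multiplier' to whole numbers.

    Example: 2.5 in 'fragments x100' is interpreted as 250 fragments
    (an assumed reading of the convention).
    """
    return value * multiplier


def mass_to_grams(value, unit):
    """Convert a charcoal mass expressed in mg, g or kg to g."""
    try:
        return value * MASS_TO_GRAMS[unit]
    except KeyError:
        raise ValueError(f"unsupported mass unit: {unit!r}") from None
```

For example, `expand_multiple(2.5, 100)` gives 250 and `mass_to_grams(250, "mg")` gives 0.25 g, which shows how a stored value can differ numerically from the published one while describing the same measurement.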

Dating information (table name: date_info)
This table provides information about the dates available for each entity that can be used to construct an age model. We include information about the age of the core top for records that were known to be actively accumulating sediment at the time of collection. In addition to radiometric dates, we include information about the presence of tephras (either dated at the site or independently dated elsewhere) and stratigraphic events that can be used to establish correlative ages (e.g. changes in the pollen assemblage that are dated in other cores from the region, or evidence of known fires in the catchment). Wherever possible the name of a tephra is given, to facilitate the use of subsequent and more accurate estimates of its age. Similarly, the basis for correlative dates is given, again to facilitate the use of updated estimates of the age of the event. Radiocarbon ages are given in radiocarbon years, but all other ages are given in calendar years BP using 1950 CE as the reference zero date. Error estimates are given for radiometric ages and wherever possible for calendar ages. We provide an indication of whether a specific date was used in the original age model for the entity, and an explanation for why specific dates were rejected, since this can be a guide as to whether the dates should be incorporated in the construction of new age models. A list of the valid choices for fields that are selected from a pre-defined list (e.g. material dated) is given in Table S2.
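
The calendar-age convention (years BP relative to 1950 CE) can be made concrete with a small conversion sketch. The function names are illustrative, and the sketch handles CE years only; it does not deal with the absence of a year zero when converting BCE dates.

```python
BP_REFERENCE_YEAR_CE = 1950  # reference zero date stated in the text


def ce_to_cal_bp(year_ce):
    """Convert a calendar year CE to calendar years BP (before 1950 CE)."""
    return BP_REFERENCE_YEAR_CE - year_ce


def cal_bp_to_ce(age_bp):
    """Convert calendar years BP back to a calendar year CE."""
    return BP_REFERENCE_YEAR_CE - age_bp
```

Under this convention a core top collected in 2015 CE has an age of -65 cal BP, so negative ages are expected for records that were still accumulating sediment when sampled.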

Publication information (table name: publication)
This

New age model information (table name: age_model)
This table contains information about the age models that have been constructed for this version of the database using the INTCAL20 calibration curve (Reimer et al., 2020)

Construction of new age models
The original age models for the charcoal records were made at different times, using different radiocarbon calibration curves and different age-modelling methods. We standardised the age modelling, using RBacon (Blaauw and Christen, 2011; Blaauw et al., 2021) to construct new Bayesian age-depth models in the ageR package (Villegas-Diaz et al., 2021). The ageR package provides functions that facilitate the supervised creation of multiple age models for many cores and different data sources, including databases and comma- and tab-separated files. The INTCAL20 Northern Hemisphere calibration curve (Reimer et al., 2020) and the SHCAL20 Southern Hemisphere calibration curve (Hogg et al., 2020) were used for entities between the latitudes of 90°N and 15°N and between 15°S and 90°S respectively. Entities in equatorial latitudes (15°N to 15°S) used a 50:50 mixed calibration curve to account for north-south air-mass mixing following Hogg et al. (2020), and radiocarbon ages from marine entities were calibrated using the Marine20 calibration curve (Heaton et al., 2020).
To estimate the optimum age-modelling scenarios based upon the date and sample information for each entity, multiple RBacon age models were run using different prior accumulation rate (acc.mean) and thickness values. Prior accumulation rate values were selected using an initial linear regression of the ages in each entity; this initial value was then increased and decreased sequentially, up to twice and down to half the initial value. As an example, if the initial accumulation rate value selected from the linear regression was 20 yr/cm, age models would be run using values of 10, 15, 20, 30 and 40 yr/cm. In cases where the regional accumulation rate was known, the upper and lower values of the accumulation rate scenarios were manually constrained.
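
The curve-selection rule and the accumulation-rate scenarios described above can be sketched as follows. This is not the ageR implementation: the function names and returned labels are hypothetical, the treatment of entities lying exactly at 15° (assigned here to the mixed curve) is an assumption, and the scaling factors are inferred from the single worked example of 20 yr/cm.

```python
def select_calibration_curve(latitude_deg, is_marine=False):
    """Return a descriptive label for the calibration curve of an entity.

    The labels are illustrative strings, not RBacon arguments.
    """
    if is_marine:
        return "Marine20"
    if latitude_deg > 15:
        return "IntCal20"
    if latitude_deg < -15:
        return "SHCal20"
    # Equatorial band, 15°N to 15°S (boundaries included by assumption).
    return "mixed 50:50 IntCal20/SHCal20"


def accumulation_rate_scenarios(initial_rate_yr_per_cm):
    """Prior accumulation rates (yr/cm) from half to twice the initial value.

    The factors reproduce the example in the text; other spacings between
    the half and double bounds would be equally consistent with it.
    """
    factors = (0.5, 0.75, 1.0, 1.5, 2.0)
    return [initial_rate_yr_per_cm * f for f in factors]
```

With an initial rate of 20 yr/cm the second function returns 10, 15, 20, 30 and 40 yr/cm, matching the example given above.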
The range of prior thicknesses used in the models was calculated by increasing and decreasing the RBacon default thickness value (5 cm) up to a value one eighth of the overall length of the core. For a 400 cm core, for example, the thickness scenarios would be 5, 10, 15, 20, 25, 30, 35, 40, 45 [...] and have also been included in the age models run in ageR. In instances where the sedimentation rates were different above and below a hiatus, separate age models were run before and after the non-deposition period to account for these variations (Blaauw and Christen, 2011).
A three-step procedure was used to select the best model for each entity. First, an optimum model was selected by ageR, using the lowest quantified area between the prior and posterior accumulation rate distribution curves (Supplementary Figure 1). Second, this selection was checked manually by comparing the distance between the estimated ages and the age controls to check the accuracy of the model interpolation. Finally, the age model was visually inspected to ensure that the final interpolation accurately represented the date information and did not show abrupt shifts in accumulation rates or changes at the dated depths. If the ageR model selection was deemed to be erroneous or inaccurate, the next suitable model with the lowest area between the prior and posterior curves, which accurately represented the distribution of dates in the sequence, was selected (Supplementary Figure 2).
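
The thickness scenarios and the first, automated selection step can be sketched under explicit assumptions. The 5 cm increment is inferred from the values listed above, and the upper bound follows the one-eighth rule; the area between the prior and posterior curves is approximated with a trapezoidal rule on a shared grid. The function names are hypothetical and this is not the code used in ageR.

```python
def thickness_scenarios(core_length_cm, step_cm=5):
    """Thickness values from the 5 cm default up to one eighth of core length."""
    upper = core_length_cm // 8
    return list(range(step_cm, upper + 1, step_cm))


def area_between_curves(x, prior, posterior):
    """Trapezoidal estimate of the area between two curves sampled on x."""
    total = 0.0
    for i in range(1, len(x)):
        left = abs(prior[i - 1] - posterior[i - 1])
        right = abs(prior[i] - posterior[i])
        total += 0.5 * (left + right) * (x[i] - x[i - 1])
    return total


def select_best_scenario(x, prior_by_scenario, posterior_by_scenario):
    """Return the scenario whose posterior lies closest to its prior."""
    return min(
        prior_by_scenario,
        key=lambda name: area_between_curves(
            x, prior_by_scenario[name], posterior_by_scenario[name]
        ),
    )
```

Under this reading, a 400 cm core would give thickness values in 5 cm steps up to 50 cm. The automated choice is only a starting point, since the manual and visual checks described above can override it.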

Quality control
Individual records in the RPD were compiled either by the original authors or from published and open-access material by specialists in the collection and interpretation of charcoal records. Records that were obtained from published and open-access material were cross-checked against publications or with the original authors of those publications whenever possible. Null values for metadata fields were identified during the initial checking procedure, and checks were made with the data contributors to determine whether these genuinely corresponded to missing information. In the database, null values are reserved for fields where the required information is not applicable, for example water depth for terrestrial sites or laboratory sample numbers for correlative dates. We distinguish fields where information could be available but was never recorded or has subsequently been lost (represented by -999999), and fields where we were unable to obtain this information but it could be included in subsequent updates of the database (represented by -777777). We also distinguish fields where specific metadata are not applicable (represented by -888888), for example basin size for a marine core or water depth for a terrestrial small hollow.
Prior to entry in the database, the records were automatically checked using specially designed database scripts (in R) to ensure that the entries to individual fields were in the format expected (e.g. text, decimal numeric, positive integers) or were selected from the pre-defined lists provided for specific fields. Checks were also performed to find duplicated rows (e.g. duplicated sampling depths within the same entity).
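
The missing-value codes and the duplicate check described above can be sketched as follows. The authors' checks were written in R; this is an independent illustration, and the function names and the `(entity_id, depth)` input layout are assumptions.

```python
# Sentinel codes quoted in the text and their stated meanings.
MISSING_CODES = {
    -999999: "never recorded or subsequently lost",
    -777777: "not yet obtained; may be added in a later update",
    -888888: "not applicable",
}


def describe_missing(value):
    """Return the meaning of a sentinel code, or None for ordinary values."""
    return MISSING_CODES.get(value)


def duplicated_depths(samples):
    """Return sorted (entity_id, depth) pairs that occur more than once.

    `samples` is an iterable of (entity_id, depth) tuples (assumed layout).
    """
    seen = set()
    duplicates = set()
    for key in samples:
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)
```

Users of the database would typically need to filter out the three sentinel codes before any numerical analysis, since they would otherwise be read as very large negative measurements.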

Overview of database contents
This first version of the RPD contains 1676 individual charcoal records from 1480 sites worldwide. This represents a 128% increase compared to the number of records in version 3 of the Global Charcoal Database (GCDv3: Marlon et al., 2016; 736 records), a 79% increase compared to version 4 (Blarquez, 2018; 935 records) and a 36% increase compared to the online version of the GCD (1232 records). The RPD includes 840 records that are not available in any version of the GCD, and provides updated or corrected information for a further 485 records that were included in the GCD. Raw data are available for 14% of the entities and concentration for 67% of the entities; influx based on the original age models is given for 16% of the entities. The original age models for 67 (4%) of the records included in the RPD were derived solely by layer counting, U/Th or Pb dates, or isotopic correlation and therefore are already expressed in calendar ages. However, we have provided new age models for 22 of these records (33%), where the dates or correlation points were specified, using the supervised age modelling procedure for consistency. New age models have been created for 807 (50%) of the remaining charcoal records where the original chronology was based on radiometric dating. The geographic coverage of the RPD (Figure 2) is biased towards the northern extratropics. [...] LGM.
Information about site type (Figure 4a) is included in the database because this could influence whether the charcoal is of local origin or represents a more regional palaeofire signal. For example, records from small forest hollows provide a very local signal of fire activity and records from peat bogs most likely sample fires on the peatland itself, whereas records from lakes could provide both local and regional fire signals. More than half (55%) of the records in the RPD are derived from lakes (811 entities).
Records from peatlands are also well represented (471 entities, 32%). Basin size, particularly in the case of lakes, influences the source area for charcoal particles transported by wind. However, the existence of inflows and outflows to the system can also affect the charcoal record. Quantitative information is now available for more than half of the lake sites (Figure 4b), and most (691 sites, 81%) of the records (Figure 4c) are from relatively small lakes (<1 km²). A quarter of the charcoal records from lakes (Figure 4d) [...] R package used to create the new age models is available from https://github.com/special-uor/ageR (Villegas-Diaz et al., 2021).

Conclusions
The Reading Palaeofire Database (RPD) is an effort to improve the coverage of charcoal records that can be used to investigate palaeofire regimes. New age models have been developed for 48% of the records to take account of recent improvements in radiocarbon calibration and age modelling methods. In addition to expanded coverage and improved age models, considerable effort has been made to include metadata and quality control information to allow the selection of records appropriate to address specific questions and to document potential sources of uncertainty in the interpretation of the records. The first version of the RPD contains 1676 individual charcoal records (entities) from 1480 sites worldwide. Geographic coverage is best for the northern extratropics, but coverage elsewhere is good except for semi-arid and arid regions. Temporal coverage is good for the past 2000 years, the Holocene and back to the LGM, and there is a reasonable number of longer records. The database is publicly available, both as an SQL database and as csv files.
We would like to thank our many colleagues from the PAGES Global Palaeofire Working Group for their contributions to the construction of the Global Charcoal Database which provided the starting point for the current compilation, and our colleagues from the Leverhulme Centre for Wildfires, Environment and Society for discussions of the use of palaeodata to reconstruct past fire regimes. We thank Manfred Rösch for providing information on dating for several sites. We also thank Dan Gavin and Jack Williams for helpful reviews of the original manuscript.