The PALMOD 130k marine palaeoclimate data synthesis version 2

Jonkers, Lukas; Hollstein, Martina; Siccha, Michael; Kucera, Michal

doi:10.5194/essd-2025-599

Preprints

https://doi.org/10.5194/essd-2025-599

09 Oct 2025

Status: a revised version of this preprint was accepted for the journal ESSD and is expected to appear here in due course.

Abstract. Palaeoclimate data hold the unique promise of providing a long-term perspective on climate change and as such can serve as an important benchmark for climate models. However, palaeoclimate data have generally been archived with insufficient standardisation and metadata to allow for transparent and consistent uncertainty assessment in an automated way. Thanks to improved computation capacity, transient palaeoclimate simulations are now possible, calling for data products containing multi-parameter time series rather than information on a single parameter for a single time slice. To confront transient simulations that span the last glacial-interglacial cycle with palaeoclimate data, we have compiled a multi-parameter marine palaeoclimate data synthesis that contains time series spanning 0 to 130,000 years ago. Jonkers et al. (2020) published the first version of the PALMOD 130k marine palaeoclimate data synthesis and described the data synthesis strategy and the contents and format of the data product in detail. Here we present a major update of the data product that markedly increases both the spatial and temporal coverage. Version 2 of the synthesis contains 2,286 time series of eight palaeoclimate parameters from 475 individual sites, each associated with rich metadata, age–depth model ensembles, and information to refine and update the chronologies. Version 2 contains 468 time series of benthic foraminifera δ¹⁸O; 357 of benthic foraminifera δ¹³C; 423 of near sea surface temperature; 482 and 273 of planktonic foraminifera δ¹⁸O and δ¹³C; and 128, 111 and 44 of carbonate, organic carbon and biogenic silica content, respectively. Compared to version 1, all radiocarbon ages have been recalibrated and the age-depth models updated.
In addition, near sea surface temperature estimates based on planktonic foraminifera Mg/Ca and on UK37' have been recalculated using a single calibration, thus ensuring global comparability and comprehensive assessment of their uncertainty. The data product is available in two formats (R and LiPD), facilitating use across different software and operating systems, and can be downloaded at https://doi.pangaea.de/10.1594/PANGAEA.984602 (Jonkers et al., 2025b). This data descriptor presents our updating methodology, describes the contents and format of the data product in detail, and concludes with recommendations on palaeodata stewardship to increase the reusability of such data.

Received: 30 Sep 2025 – Discussion started: 09 Oct 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on essd-2025-599', Anonymous Referee #1, 11 Nov 2025

Jonkers and co-authors give an update on the PALMOD data base of marine paleotracers for the last 130 kyr. This is a necessary and significant update for the PALMOD data base. The previous version was limited to d13C and d18O from benthic foraminifera; Jonkers et al. now include planktic foraminifera stable isotopes, and other proxies such as Mg/Ca and carbonate and biogenic silica content. They also include temperature reconstructions for each of the sites. The authors claim to have merged age models with proxy data, assigning an age value to each downcore sample. This is a useful update, which saves users from the necessity to interpolate the age models to the data depth scales.

I am worried by the presentation of the data. The authors chose two formats: R and LiPD files. However, the way they have structured the data makes it very hard to look at it at a glance. For R (.RDS) files, R needs to be used. I am an advanced Python programmer, and I wasn't able to quickly access the data. In both R and LiPD, the data sets of each coring site were saved without column names, and I find no easy way to see a depth, age, proxy list on screen. I thought LiPD would be easier, since I know that inside a LiPD folder the data are saved as .csv; however, the problem is the same: there is no reference in each file to tell me what I am seeing. Of course, it could be that I am not knowledgeable enough to open these files. But I am myself a data person, so if I had a problem accessing the data base, it is reasonable to think that many other users will have issues too.
I recommend that the authors re-include netCDF files for each coring site as part of the data base (they were included in the first version of the PALMOD database). These files are more universal than LiPD and R files, and readable by different software types. In addition, I recommend that the authors produce a more human-readable version of the data base. This could just be the csv files inside the LiPD directories, provided each file carries explicit information on what each column is. Otherwise, I am sad to say that this important data product will be useful for just a very few R and LiPD experts.
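
For readers who run into the same obstacle, the following is a minimal sketch of how the header-less CSV tables could be given column labels. It is not the authors' code and it does not use the PyLiPD API. It assumes that a LiPD file is a zip archive holding one JSON-LD metadata file plus CSV tables, and that each `measurementTable` entry in the metadata lists its columns with `number`, `variableName` and `units` fields. The function names, the toy archive and all values in it are hypothetical, and ensemble tables (where one entry may span many columns) are not handled.

```python
import csv
import io
import json
import zipfile


def labelled_tables(lipd_bytes):
    """Return {csv filename: (header, rows)} for the measurement tables."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(lipd_bytes)) as archive:
        names = archive.namelist()
        # Assumed layout: one JSON-LD metadata file plus header-less CSVs.
        meta_name = next(n for n in names if n.endswith(".jsonld"))
        meta = json.loads(archive.read(meta_name).decode("utf-8"))
        for section in ("paleoData", "chronData"):
            for entry in meta.get(section, []):
                for table in entry.get("measurementTable", []):
                    # "number" is assumed to be the 1-based CSV column index.
                    columns = sorted(table["columns"], key=lambda c: c["number"])
                    header = [
                        f'{c["variableName"]} [{c.get("units", "unitless")}]'
                        for c in columns
                    ]
                    csv_name = next(
                        n for n in names if n.endswith(table["filename"])
                    )
                    text = archive.read(csv_name).decode("utf-8")
                    rows = list(csv.reader(io.StringIO(text)))
                    tables[table["filename"]] = (header, rows)
    return tables


def make_toy_archive():
    """Build a tiny synthetic archive with made-up values (illustration only)."""
    meta = {
        "dataSetName": "toy",
        "paleoData": [{
            "measurementTable": [{
                "filename": "toy.paleo0measurement0.csv",
                "columns": [
                    {"number": 3, "variableName": "d18O", "units": "permil"},
                    {"number": 1, "variableName": "depth", "units": "cm"},
                    {"number": 2, "variableName": "age", "units": "kyr BP"},
                ],
            }]
        }],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("bag/data/metadata.jsonld", json.dumps(meta))
        archive.writestr(
            "bag/data/toy.paleo0measurement0.csv", "10,1.2,3.4\n20,2.5,3.1\n"
        )
    return buffer.getvalue()


# Print each table as a labelled depth, age, proxy listing.
for filename, (header, rows) in labelled_tables(make_toy_archive()).items():
    print(filename)
    print(", ".join(header))
    for row in rows:
        print(", ".join(row))
```

For a real file, the bytes would come from something like `open(path, "rb").read()`; whether the PALMOD files follow exactly this layout should be checked against the metadata before relying on the labels.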

Citation: https://doi.org/10.5194/essd-2025-599-RC1
- CC1:
  'Reply on RC1', Julien Emile-Geay, 29 Nov 2025
  
  Re: LiPD, perhaps this would help? https://pylipd.readthedocs.io/en/latest/. Also see examples of how to load the data into Pyleoclim for analysis here: https://linked.earth/PyleoTutorials/notebooks/L0_loading_to_series.html#loading-a-single-lipd-file
  If, as you contend, the data tables are not properly formatted, then it might not. But otherwise, this will save all Python users a lot of trouble with this dataset.
  Best,
  J.E.G.
  
  Citation: https://doi.org/10.5194/essd-2025-599-CC1
  - AC2: 'Reply on CC1', Lukas Jonkers, 30 Jan 2026
    
    We would like to thank Julien Emile-Geay for pointing us to the new online documentation on how to handle LiPD files using Python. We will update the manuscript accordingly.
    
    Citation: https://doi.org/10.5194/essd-2025-599-AC2
- AC1: 'Reply on RC1', Lukas Jonkers, 30 Jan 2026
  
  We have provided our response in the attached pdf
  
  Citation: https://doi.org/10.5194/essd-2025-599-AC1
RC2:
'Comment on essd-2025-599', Anonymous Referee #2, 02 Jan 2026

The article presents a major update to the first PALMOD synthesis, including many primary data compiled here for the first time. The manuscript is very well structured and the figures are very informative. I recommend publication after the following comments have been addressed.
Major Comment:
For the purpose of uncertainty propagation, it is tremendously useful that PALMOD 2 includes age-model ensembles, as well as “1000 ensemble time series of seawater temperature” whenever possible. However, a question arises as to how these two sources of uncertainty should be combined. Are all paths through the depth-age-SST space physically realizable, or do some result in unrealistically abrupt SST changes, or even reversals compared to what is observed in the raw data (for instance, Mg/Ca vs depth)? I do not know of a standard solution to this problem, but there are existing strategies to address it (e.g. Khider et al., 2017; doi:10.1002/2016PA003057); I believe it is worth raising here to ensure the best possible use of the data. It would be useful to add a figure showing the reconstructed SST history in one core, displaying quantiles of its distribution (e.g. 5%, 25%, 50%, 75%, 95%) through time, and sharing the associated code so that users can easily repeat this relatively rare analysis.
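
To make the request concrete, here is a minimal sketch of one naive way to obtain such quantiles. It is not the authors' method and not the approach of Khider et al. (2017). It assumes that age-model members and SST members can be paired at random, i.e. that the two error sources are independent, which is exactly the assumption questioned above. The ensembles are synthetic and all names and numbers are hypothetical.

```python
import bisect
import random
import statistics


def interpolate(x, y, x_new):
    """Linear interpolation of y(x) at x_new; x must be strictly increasing."""
    if x_new < x[0] or x_new > x[-1]:
        return None  # no extrapolation beyond the dated interval
    i = bisect.bisect_left(x, x_new)
    if x[i] == x_new:
        return y[i]
    frac = (x_new - x[i - 1]) / (x[i] - x[i - 1])
    return y[i - 1] + frac * (y[i] - y[i - 1])


def ensemble_quantiles(age_members, sst_members, time_grid, n_draws=500, seed=0):
    """Pair random age and SST members, then summarise SST on a time grid."""
    rng = random.Random(seed)
    probs = (5, 25, 50, 75, 95)
    # Assumption: age and SST errors are independent, so any pairing is allowed.
    draws = [
        (rng.choice(age_members), rng.choice(sst_members)) for _ in range(n_draws)
    ]
    summary = []
    for t in time_grid:
        values = []
        for ages, sst in draws:
            value = interpolate(ages, sst, t)
            if value is not None:
                values.append(value)
        if len(values) < 2:
            summary.append((t, None))
            continue
        # 99 cut points; index p - 1 holds the p-th percentile.
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        summary.append((t, {p: cuts[p - 1] for p in probs}))
    return summary


def toy_ensembles(n_depths=30, n_members=200, seed=1):
    """Synthetic age (kyr) and SST (deg C) ensembles, one value per depth."""
    rng = random.Random(seed)
    age_members, sst_members = [], []
    for _ in range(n_members):
        age, ages = 0.0, []
        for _ in range(n_depths):
            age += rng.uniform(0.2, 0.6)  # positive steps keep ages monotonic
            ages.append(age)
        age_members.append(ages)
        offset = rng.gauss(0.0, 0.5)  # error shared along the whole core
        sst_members.append(
            [20.0 - 0.1 * k + offset + rng.gauss(0.0, 0.3) for k in range(n_depths)]
        )
    return age_members, sst_members


age_members, sst_members = toy_ensembles()
for t, q in ensemble_quantiles(age_members, sst_members, [1.0, 2.0, 3.0, 4.0, 5.0]):
    print(t, {p: round(v, 2) for p, v in q.items()})
```

Random pairing treats every combination of an age path and an SST path as equally plausible, so it cannot rule out the unrealistically abrupt changes mentioned above; it only shows how the requested quantile bands could be computed once a pairing strategy has been chosen.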
Minor Comments:
L113: “the updated radiocarbon curve”. We learn later in the text that it is IntCal20; it would be logical to cite it here.
L140: “estimates of their uncertainty” —> estimates of uncertainty
L227: “associated manuscript” -> original publications (there may be more than one, and “manuscript” typically refers to a pre-publication form)
L249: the reference to “convergence” will be unfamiliar to many readers; it may be worth explaining in more detail.
Table 1:
- “standardised parameter name” —> what standard vocabulary was used here? If none was available or relevant, how did the authors choose to standardise?
- “CalibrationUncertainty”: does this refer to 1sigma, 2sigma, or something else?
L273: “priority climate-relevant variables” —> who decided on such priorities?
L329: “14 time series shorter than 1 kyr are not shown.” —> why are such series included at all? Is there any scientific use to be made of them in the context of this compilation?
L359: “The majority … is based” -> The majority … ARE based
L363: “LDI”: this is a new acronym to me. Would it be useful to include a citation?
L376: “Data is freely available” —> Data ARE freely available.
L378/9: “We explicitly encourage users to also cite the original data when using this data product. “ Please provide an explicit example or two here, as I am not sure everyone would get the hint otherwise.
L412/3: “To allow ..” -> this sentence is incomplete. It implies a second clause that never comes. Please rewrite.
L414: “different calibration (schemes)” -> please remove parentheses.
L418: “only in the form as reported” -> only in the form reported (no “as”)
L444: “modelling strategies like for instance done in the “ -> modelling strategies like for instance used for the …
L451/2: Note that for Python users, the relevant resource is now the PyLiPD package (https://pylipd.readthedocs.io/en/latest/), and associated tutorials (https://linked.earth/pylipdTutorials/intro.html).
L456/7: “Users are encouraged to report those to the lead author so they can be corrected”. Can you perhaps add a line about the process for reporting errors (github issues?), the versioning scheme for corrections, and how they will be released?
Question for the editor: does ESSD have a mechanism for issuing dataset corrections distinct from the standard Copernicus author correction?
L464: “metadata and chronology data is” -> metadata and chronology data are
L474: “invariantly” -> invariably
L481: “The lack of a standardised vocabulary and consistent ontology” -> what about the LinkedEarth ontology (https://linked.earth/ontology/), which is aligned to the LiPD scheme?
Re: “The lack of a standardised vocabulary”, the PaST Thesaurus qualifies in my book as a “standardised vocabulary”. Perhaps the authors mean that the two major data repositories (WDS-Paleo and PANGAEA) have yet to adopt the same vocabulary? If so, I agree that it is worth calling out, in hopes that they work more closely together (and the rest of the community) to make that happen.

Citation: https://doi.org/10.5194/essd-2025-599-RC2
- AC3: 'Reply on RC2', Lukas Jonkers, 30 Jan 2026
  
  Please see our reply to reviewer 2 in the attached pdf
  
  Citation: https://doi.org/10.5194/essd-2025-599-AC3


Viewed

Total article views: 775 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
450	285	40	775	39	50

Views and downloads (calculated since 09 Oct 2025)

Month	HTML	PDF	XML	Total
Oct 2025	198	39	6	243
Nov 2025	78	72	12	162
Dec 2025	57	81	10	148
Jan 2026	99	86	12	197
Feb 2026	18	7	0	25

Viewed (geographical distribution)

Total article views: 753 (including HTML, PDF, and XML) Thereof 753 with geography defined and 0 with unknown origin.

Latest update: 07 Feb 2026

Short summary

Palaeoclimate data can provide a long-term perspective on climate change and serve as a benchmark for climate models. We have compiled a multi-parameter marine palaeoclimate data synthesis that contains time series spanning 0 to 130,000 years ago with rich metadata to facilitate reuse. We describe the methodology, contents and format of version 2 of this synthesis that presents a large increase in spatiotemporal coverage. We end with recommendations for palaeodata stewardship.

