the Creative Commons Attribution 4.0 License.
A geospatial inventory dataset of study sites in a Korean Quaternary paleoecology database
Abstract. Ecological insights beyond human-observable time scales are derived from geologically preserved records in lake and wetland sediments around the world. Nonetheless, significant regional data gaps persist in global syntheses of these records as regional open data practices are still emerging. South Korean Quaternary paleoecology data remain underrepresented in these global efforts, despite a growing body of relevant research. Here, we organize an inventory of 328 paleoecological study sites (72 paleo-sites for sediment records and 256 surface sites for surface pollen samples) in South Korea, compiled from 66 research articles published between 2003 and 2023. We have structured three datasets related to this inventory: (1) Publication Metadata, which provides citation details of the 66 articles; (2) Site Inventory, which contains geospatial information, depositional environments, chronological ranges, proxies, and indexed publications; and (3) Chron-Depth Collection, which includes chronological details (dating methods, age, and depth points) for each site. The sites span latitudes from 33.2508° N to 33.4808° N and longitudes from 126.1486° E to 129.2132° E, with elevations from -156 m to 1867.5 m. Sediment samples were collected by coring or trenching from six depositional environments: Open-coastal zone, Estuary, Lagoon, River, Volcanic cone, and Others. A total of 784 chronological controls (14C, OSL, and U-Th) were analyzed from 72 sediment records, with the majority based on radiocarbon dating. Pollen, diatoms, grain-size analysis, and geochemical markers have been extensively used as paleoenvironmental proxies, with multiproxy analyses becoming increasingly common in recent studies. To enhance accessibility, we have developed GeoEcoKorea, an online platform that archives raw data from the compiled studies or, where the data are available elsewhere, links to them through our metadata, site inventory, and chron-depth datasets.
This initiative seeks to establish more data sharing agreements with domestic researchers by promoting the collaborative benefits of findable, accessible, interoperable, and reusable (FAIR) data.
Status: open (until 18 May 2025)
RC1: 'Comment on essd-2025-130', John Williams, 07 May 2025
This is a useful summary of a major data compilation of paleoecological data in Korea. To my knowledge, this is the first such compilation of these data. The data compilation is somewhat small in volume and spatial scope, but it represents an important step forward in building a culture of data sharing and FAIR data in Korea. Papers like this are really helpful for showing the data now available and as a starting point for future community-building efforts, building open databases, and conducting large-scale scientific research.
- Of the three datasets built (see Fig. 2), only one of them has a detailed description of variable names, in Table 1. The other two datasets need a similarly detailed table. Ideally these two new tables would be put in the main text, but if there’s not enough space, the two new tables could go into supplementary information.
Specific comments about the data schema shown in Figure 2 and Table 1:
- ‘Chron-Depth Collection’ is an awkward name… recommend renaming it to ‘Age-Depth Data’ or ‘Geochronology Data’.
- The Chron-Depth Collection table lacks a field for the depth of each age control, which is essential data/metadata; for age-depth models to be built or rebuilt, each age control must have a depth and time coordinate.
- The Chron-Depth Collection table also should include a field indicating whether a date is in radiocarbon years or calendar/calibrated years. Ideally should also indicate whether the age datum is 1950 CE (which is the standard for radiocarbon years) or something else.
- In Site Inventory, what does ‘Year’ store and is this information redundant with the ‘Year’ field in the Publication table? A standard principle of good database design is that each piece of information should be stored in only one location.
- In the Site Inventory, it is a little strange, or at least not internally consistent to store age as Oldest, Youngest endpoints, while depth is stored as an interval. Ideally Depth would also be stored as Top, Bottom endpoints.
- Elevation (m): Elevation above sea level?
- Recommend renaming ‘Specifications on environment’ to ‘Site Description’
- For Sample type, recommend renaming ‘Surface pollen’ to ‘Surface sample’ so that other kinds of surface data can be stored in the database.
- For Sample type, note that what this database calls ‘Trench’, Neotoma calls ‘Section’. ‘Section’ is a little more general, because ‘Trench’ implies a dug trench, while ‘Section’ can be any stratigraphic section or outcrop, e.g. including trenches, riverbanks, or rock outcrops. See also comment on L179 below.
- Could probably rename ‘Written language’ as ‘Language’. Also, there seems to be a confusion here, because ‘Language’ in Fig. 2 is part of the Publication table, but Table 1 is listed as describing the variables in the Site table.
- Table 1 lists a category called ‘Proxy’ and fields including Pollen, Diatom, etc. However, these fields aren’t shown in Figure 2’s entity relationship diagram. In general, Table 1 and Figure 2 should be checked to ensure that there is a 1:1 match between all variable names shown in each location.
- The paper needs to be very careful in how it handles and reports ages, because it is using at least three different time scales, creating major potential for confusion. First, it uses the standard AD/BC (or CE/BCE) timescale for information such as publication year. Second, it uses the radiocarbon timescale when reporting radiocarbon dates, which are expressed in years before present, uses 1950 CE as ‘present’ and has a non-linear relationship to the calendar-year timescale. Third, the paper uses the calendar-year timescale for OSL and U/Th dates, which also are expressed in years before present and can use various years as ‘present’.
- For all variables that store year, the paper should clearly establish what time scale is being used.
- For all geochronological datasets, the dataset should formally store the datum used for ‘present’ or the paper should establish that in all cases, ‘present’ is treated as 1950 CE.
- All radiocarbon dates should be reported in their original radiocarbon years with 1-sigma error. The use of the radiocarbon timescale should be explicitly noted as a metadata field. The paper optionally could also include calendar-year conversions for each radiocarbon date as additional fields, along with information about the calibration curve and program used to convert from radiocarbon years to calendar years.
- For all other geochronological dates, the use of the calendar-year timescale should be explicitly noted as a metadata field.
- In the Summary, this paper does a good job of describing how this site inventory could be used for future work. That noted, I think the paper would be strengthened if its closing provided more of a forward-looking vision and potential future directions, in at least three ways. Papers like this can set the tone for a community and inspire others.
- Communicate more of a scientific vision (this could be done in both Intro and Conclusions/Summary). What kinds of research questions can be enabled when large open paleoecological databases are available? Cite more of the recent papers that have used large paleoecological datasets and discuss how these approaches could be used to advance questions in Korean paleoecology
- I know that the politics of data sharing and openness can be sensitive, but I think it would still be good to add a gentle call to authors to contribute their data and thereby further enhance the database by obtaining and adding the actual proxy datasets. This call follows naturally from the scientific vision… what scientific research questions are enabled by the inventory in its current form? What new questions would be possible if all datasets were also available?
- It would also be good to talk about the longer-term vision / plan for the GEK. One option is to keep it as a standalone database; another would be for it to join Neotoma as a Constituent Database. There are pros and cons to each option. A standalone database allows for more autonomy and local control but raises questions of sustainability; small databases find it hard to persist over time. You could point to other regional / national efforts such as the LAPD, APD, and Chinese Pollen Database as various options. The first two have been merged into Neotoma and the latter has remained a standalone database. The Japanese Pollen Database is somewhat in between, with some but not all of these data in Neotoma. You could discuss options without reaching a firm decision on which to pursue, so as to leave options open. Could also here talk about the importance of data governance and making sure that people feel like they have a voice in how their data are curated… this could e.g. lead to a working group, South Korean Pollen Council, or other group… describe options with invitations for others to join this effort. Williams et al. (2018) provides a good description of Neotoma data governance. Note of course that the authors have a much better sense of academic culture and norms in South Korea, so are best positioned to figure out how best to gently advance community-oriented data governance, while not pushing too hard.
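The schema and timescale recommendations above (a depth for every age control, an explicit timescale, and a stored datum for ‘present’) can be illustrated with a minimal sketch. All field names, constants, and checks here are hypothetical and are not taken from the manuscript or from Neotoma; they only show one way such metadata could be stored and validated.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for the two timescales discussed in the comments above.
RADIOCARBON = "14C yr BP"   # uncalibrated radiocarbon years
CALENDAR = "cal yr BP"      # calendar years before a stated datum
RADIOCARBON_DATUM_CE = 1950  # 'present' on the radiocarbon timescale


@dataclass
class AgeControl:
    """One dated level in a record (hypothetical field names)."""
    site_id: str
    depth_cm: float       # depth coordinate of the age control
    age: float            # time coordinate, in years before the datum
    error_1sigma: float   # 1-sigma uncertainty, in years
    method: str           # e.g. "14C", "OSL", "U-Th"
    timescale: str        # RADIOCARBON or CALENDAR
    datum_year_ce: int    # calendar year treated as 'present'

    def problems(self) -> List[str]:
        """Return a list of metadata inconsistencies (empty if none)."""
        issues = []
        if self.timescale not in (RADIOCARBON, CALENDAR):
            issues.append("unknown timescale")
        if self.timescale == RADIOCARBON and self.method != "14C":
            issues.append("only 14C dates can be in radiocarbon years")
        if (self.timescale == RADIOCARBON
                and self.datum_year_ce != RADIOCARBON_DATUM_CE):
            issues.append("radiocarbon years use 1950 CE as present")
        return issues

    def age_before_1950(self) -> float:
        """Re-express a calendar-year age relative to 1950 CE."""
        if self.timescale != CALENDAR:
            raise ValueError("only calendar-year ages can be re-based")
        return self.age - (self.datum_year_ce - RADIOCARBON_DATUM_CE)
```

With this layout, a hypothetical OSL age of 10,000 years measured relative to 2010 CE would be re-based as `AgeControl("S2", 250.0, 10000.0, 500.0, "OSL", CALENDAR, 2010).age_before_1950()`, which subtracts the 60-year offset between the two datums.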
See line-by-line comments below and attached PDF for more detailed comments. My edits to the PDF are fairly extensive but are mainly focused on readability and brevity. I’ve tried to keep substantive changes out of the PDF edits, but have flagged below in line-by-line comments any wording edits that might create a substantive change in tone or emphasis.
LINE-BY-LINE COMMENTS
L24-25: This closing sentence in the abstract about data sharing initiatives is good, but it would also be good to talk about scientific vision and future questions / opportunities that are being enabled.
L50: Should also add mention of open-source statistical packages, e.g. those developed by Mottl et al. for rate-of-change analyses or Simpson for hierarchical GAMs.
L53 and elsewhere: note that ‘assumptions’ is often being used when ‘analyses’ or other wording would be better. Usually a scientist makes assumptions in the absence of data… the assumptions are precursors to formulation of hypotheses and the analysis of data. Much of the paper is focusing on this later stage of hypothesis testing and analysis.
L62: See edits to PDF, have revised text to clarify how regional databases can affiliate with Neotoma and advantages.
L64: See proposed rewording
L85 and Fig 1:
*Clarify what the blue bands represent, i.e. they are intended to highlight Korea and the absence of data
*Possibly could truncate the map and associated panel B at 65S and 85N given the absence of sites at these latitudes
L115 and Fig 2:
*typo in the second table: ‘depositionoanal’
L134-139: This paragraph needs to be written for a broad ESSD audience, who won’t e.g. know what the space-for-time substitution method is. Could cite Chevalier et al. (2020). Also ‘paleo-site’ here and elsewhere reads as jargony, suggest instead ‘fossil site’ or ‘paleoecological record’. And recommend here and elsewhere ‘surface sample’ instead of ‘surface site’ or ‘surface pollen’.
L150: What is meant by ‘distinct geographic coordinates’?
L164 and elsewhere: Avoid using a hyphen to indicate numeric ranges, because it can be easily confused for a negative sign. Rewrite here to ‘51 to 1,305 m’
Fig 6 / L215 and elsewhere: Change ‘Others’ to ‘Other’ when using it as a categorical name.
L172: Note that what the paper calls a sample (surface pollen, core, trench), Neotoma calls a collection unit. In Neotoma’s naming system, a site (such as a lake) can have multiple collection units (e.g. multiple cores per lake) and multiple samples per core, distributed by depth. Understood that this paper’s data schema doesn’t have to closely align with Neotoma… still, aligning language now will avoid much confusion later.
L179: Given these (useful) definitions, I’d argue that ‘Section’ or ‘Stratigraphic Section’ should be used instead of ‘Trench’. ‘Trench’ usually implies a ditch dug by human action. The manuscript describes instead naturally occurring outcrops, which aren’t trenches.
L191 and elsewhere: edit ‘Open-coastal zone’ to ‘Open Coastal Zone’ and edit ‘Volcanic cone’ to ‘Volcanic Cone’, capitalizing these names because you are proposing them as formally named categories.
L191-197: Can remove all quotation marks, because the capitalized names will be enough to establish these as defined entities.
L200: Given that so many samples are soil samples, recommend adding ‘Soil’ as a category and reducing the scope of ‘Other’.
L213/Fig. 7: Typo ‘coatal’
L222: Entering ages and depths as zero for the surface samples is not a good idea, because no measurements actually exist. Setting ages to zero is a particular problem, because in the radiocarbon timescale, 0 = 1950 CE, which would be an incorrect age for these samples. For depth, recommend leaving blank, and for age, recommend using the sampling year.
L222-223: Somewhere in this paper, state explicitly a) that you are using the radiocarbon timescale for all ages and b) remind ESSD reader that on this timescale, 1950 CE = 0. Also, because some time-related variables use the AD/BC calendar year (e.g. for publications) while others use the radiocarbon timescale (e.g. for ages), be very explicit about which systems are being used for which variables.
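To make the point about 1950 CE = 0 concrete, here is a small sketch of the arithmetic for surface samples. The function name is hypothetical; the only fact relied on is that ‘present’ is 1950 CE on the BP scale.

```python
# Year treated as 'present' (0 BP) on the radiocarbon timescale.
BP_DATUM_CE = 1950


def sampling_year_to_bp(year_ce: int) -> int:
    """Convert a sampling year (CE) to years before 1950 CE.

    Years after 1950 CE come out negative, so a modern surface
    sample never has an age of exactly 0 unless it was collected
    in 1950.
    """
    return BP_DATUM_CE - year_ce


print(sampling_year_to_bp(2023))  # -73
print(sampling_year_to_bp(1950))  # 0
```

A surface sample collected in 2023 CE therefore sits at -73 on the BP scale, which is why storing its age as 0 would place it in 1950 CE.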
L226: The statement that 96% of sites use only 14C dating doesn’t appear to match Fig. 8B, where the proportion looks more like 85%.
L230 / Fig 8: I don’t really understand the histogram plot shown in Fig. 8B. How do the three histograms differ from each other? They appear to contain overlapping information.
L236-237: I don’t understand what this parenthetical about -73 years is trying to convey.
L238 & Fig. 9: Recommend renaming ‘last glacial-interglacial cycle’ to ‘Pleistocene and Holocene’. The last glacial-interglacial cycle is usually reserved for sites that span the last 100,000 years, i.e. a full glacial-interglacial cycle. This renaming would also better align the other two categories ‘Pleistocene-only’ and ‘Holocene-only’
L275 & Fig 9:
*typo: ‘Interglcacial’
*x-axis title in Panel B is incorrect, it should indicate thousands of years ago, not years ago
*Panel D: Check the math used to calculate the number of dates per 1 cm depth interval. The reported values for the Pleistocene sites (anywhere from 1 date per cm to 2.5 dates per cm) are unimaginably high.
*The use of multiple scale breaks in the x-axis is non-traditional. Recommend either just one scale break or perhaps experimenting with a log scale.
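The Panel D check on dates per centimetre can be sketched as follows. The function name and the example core (12 dates over 1,000 cm) are made-up illustrations, not values from the manuscript; the sketch assumes depth is stored as top and bottom endpoints in centimetres.

```python
def dates_per_cm(n_dates: int, top_cm: float, bottom_cm: float) -> float:
    """Dating density: number of age controls per cm of dated interval."""
    length_cm = bottom_cm - top_cm
    if length_cm <= 0:
        raise ValueError("bottom depth must be greater than top depth")
    return n_dates / length_cm


# Hypothetical 10 m (1,000 cm) core with 12 age controls.
density = dates_per_cm(12, 0.0, 1000.0)
print(density)

# Number of dates the same core would need to reach 1 date per cm.
print(1.0 * 1000.0)
```

Under these assumed numbers the density is on the order of 0.01 dates per cm, and reaching 1 date per cm would take 1,000 dates on a single core, which is the scale mismatch the comment on Panel D points to.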
L280 & Fig 10:
*Panel D legend: delete word ‘record’
*Panel D: the colors in the legend don’t appear to match the colors used in the map
L292-293: Per comments above, because radiocarbon dates and U/Th use different timescales, need XX
L300 and Fig. 11:
*What does ‘pMC’ stand for?
*1950 what? CE?
*Note that this figure does a good job of clearly differentiating calendar years vs. 14C years by indicating 14C kyr BP for some plots and kyr BP for others. This model should be adopted throughout ms. Be sure to define kyr abbreviation somewhere.
L311: Please report the Pearson correlation coefficient and significance test for this regression.