Worldwide version-controlled database of glacier thickness observations

Although worldwide inventories of glacier area have been coordinated internationally for several decades, a similar effort for glacier ice thicknesses was only initiated in 2013. Here, we present the third version of the Glacier Thickness Database (GlaThiDa v3), which


Introduction
A central challenge of glaciology is assessing the distribution and total ice volume of the world's glaciers. Increasingly detailed and globally complete inventories of the world's glaciers (WGMS and NSIDC, 2012; GLIMS and NSIDC, 2018; RGI Consortium, 2017) have been compiled with great effort over the last few decades. However, these inventories have been limited to glacier extent and surface elevation. The Glacier Thickness Database (GlaThiDa), launched by the World Glacier Monitoring Service (WGMS, https://wgms.ch, last access: 9 November 2020) and supported by the International Association of Cryospheric Sciences (IACS) Working Group on Glacier Ice Thickness Estimation (https://cryosphericsciences.org/activities/ice-thickness, last access: 9 November 2020), complements these existing efforts by compiling and publishing a freely accessible database of glacier thickness observations (Gärtner-Roer et al., 2014).
Knowing the thickness of glacier ice is critical for predicting the rate and timing of glacier retreat and disappearance, the subsequent effects on local and regional hydrologic cycles and global sea level, and the associated environmental and social impacts. As the only worldwide repository of its kind, GlaThiDa plays an important role in local, regional, and global studies of glaciers, glacier ice volumes, and their potential sea-level rise contributions (e.g., Thorlaksson, 2017; Farinotti et al., 2017, 2019; Meyer et al., 2018; Fischer, 2018; Ayala et al., 2019; Werder et al., 2020). The Ice Thickness Models Intercomparison eXperiment (ITMIX; Farinotti et al., 2017) only used GlaThiDa v1 (WGMS, 2014) to calibrate one of the participating models, but it helped garner support and data for GlaThiDa v2 (WGMS, 2016). GlaThiDa v2 was subsequently used to calibrate all participating models and evaluate model performance for an ensemble-based estimate of the thicknesses of all glaciers on Earth (Farinotti et al., 2019).
GlaThiDa v3 represents a major step forward for the database. We have more than doubled the spatial coverage and more than quadrupled the number of observations relative to v2, released in 2016, adding 3 million thickness measurements either submitted by researchers (46 % of new measurements) or imported from the IceBridge Data Portal (54 %, https://nsidc.org/data/icebridge, last access: 28 August 2019). In addition to summarizing the spatial and temporal coverage of the database, we present a case study on how simple open-source metadata formats and software tools can be used to implement modern data management practices. In the sections that follow, we describe a development environment for data, based on universal text-based file formats and the distributed version control system git (Chacon and Straub, 2014), that maximizes data access and interoperability, automatically tracks and archives every change made to the dataset, continuously validates the structure and contents of the data, and facilitates dialogue with (and bug reports by) data users. A monospace font is used throughout the manuscript for software packages (e.g., git), files (e.g., datapackage.json), database tables (e.g., T), database table fields (e.g., POINT_LAT), and code samples (e.g., Fig. 3).

Methods and data
2.1 Data sources

Data compilation
Since the release of GlaThiDa v1 in 2014 (Gärtner-Roer et al., 2014), which focused on gathering glacier mean and maximum thickness estimates from published literature, a large number of thickness measurements have been submitted by members of the research community in response to two calls for data, one for version 2 in 2016 and another for version 3 in 2018. In all, researchers from institutions in Europe, North and South America, Oceania, and Asia have contributed data from Antarctica, Africa (Kenya, Tanzania), Asia (China, Georgia, India, Kazakhstan, Kyrgyzstan, Mongolia, Nepal, Russia, Tajikistan), Europe (Austria, Germany, Greenland, Svalbard, Iceland, Italy, Norway, Sweden, Switzerland), Oceania (New Zealand), North America (Canada, United States), and South America (Bolivia, Chile, Colombia, Peru).
Alongside these data submissions, airborne glacier thickness profiles collected by NASA's (National Aeronautics and Space Administration) Operation IceBridge were retrieved from the corresponding National Snow and Ice Data Center (NSIDC) data portals. Since these campaigns primarily focused on the Greenland and Antarctic ice sheets, only measurements within Randolph Glacier Inventory (RGI 6.0) glacier outlines (RGI Consortium, 2017) were included in GlaThiDa. These replace the IceBridge data included in GlaThiDa v1 in 2014, which had been selected using RGI 3.2 glacier outlines (RGI Consortium, 2013).
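Keeping only the IceBridge points that fall inside glacier outlines is, at its core, a point-in-polygon test. The sketch below assumes each outline is a single ring of (x, y) vertices in a projected coordinate system and applies the even-odd (ray casting) rule; the function names are hypothetical, and real RGI outlines (multipolygons with holes, millions of points) would call for a GIS library and a spatial index instead.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Even-odd rule: count edge crossings of a ray cast from pt towards +x."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_within_outline(points: List[Point], ring: List[Point]) -> List[Point]:
    """Keep only the survey points that fall inside a glacier outline."""
    return [p for p in points if point_in_polygon(p, ring)]


# Toy square "outline" (km) and three survey points; values are made up
outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
survey = [(1.0, 1.0), (5.0, 2.0), (3.5, 3.9)]
print(points_within_outline(survey, outline))  # [(1.0, 1.0), (3.5, 3.9)]
```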

Measurement methods
The surveys in GlaThiDa span the history of glacier ice thickness measurement and thus include several different survey methods, summarized in Table 1 and illustrated in Fig. 1. The most direct methods involve excavating or drilling through the ice. Although these typically produce a precise measurement, they do so only for a single point and at great expense of time and money; correspondingly, they account for only 0.35 % of surveys in GlaThiDa (here, a "survey" roughly represents one measurement campaign on one glacier). The drilling surveys added in v3 (compiled by Fürst et al., 2018a, Table S3) were carried out in Svalbard from the 1970s through the 1990s to map the thermal structure of glaciers (e.g., Jania et al., 1996) or to extract paleoclimate records (e.g., Kotlyakov et al., 2004). No drilling surveys were added in v2.
More often, ice thickness is measured indirectly using geophysical methods. For example, seismic soundings use the propagation properties of elastic waves to determine the structure of the subsurface (described in Susstrunk, 1951). Although common from the 1950s through the 1980s, seismic surveys are expensive and time-consuming to carry out; correspondingly, they account for only 0.84 % of surveys in GlaThiDa. No seismic surveys were added in v3; of those already in the database, the majority were carried out in the Austrian Alps up until the 1970s (reviewed in Aric and Brückl, 2001). Only one seismic survey, of Tasman Glacier, New Zealand, from 1971 (Anderton, 1975), was added in v2.
The most common geophysical method is radar (i.e., radio detection and ranging, also known as "radio-echo sounding" or "ground-penetrating radar"), which is based on the transmission, reflection, and subsequent detection of radio waves (reviewed in Schroeder et al., 2020). Radar measurements can be collected quickly, from the ice surface or from airborne platforms, and account for 98.44 % of surveys in GlaThiDa. Of the radar surveys added in v3, a majority are provided by NASA Operation IceBridge (Koenig et al., 2010), which sponsored a series of airborne radar platforms: the Multichannel Coherent Radar Depth Sounder (MCoRDS; Paden et al., 2011, 2018; Shi et al., 2010), High-Capability Airborne Radar Sounder (HiCARS; Blankenship et al., 2017a, b), Pathfinder Advanced Radar Ice Sounder (PARIS; Raney, 2010), and Warm Ice Sounding Explorer (WISE; Rignot et al., 2013a, b). Other large additions in v3 include terrestrial and aerial radar surveys in Svalbard (compiled by Fürst et al., 2018a, Table S2) from the early 1980s (e.g., Dowdeswell et al., 1984, 1986) to the present day (e.g., Martín-Español et al., 2013; Navarro et al., 2014; Lindbäck et al., 2018), as well as helicopter-borne radar surveys in the Swiss Alps (Rutishauser et al., 2016). Large additions in v2 include extensive terrestrial radar surveys of glaciers in the Italian and Austrian Alps (Fischer et al., 2015a, b).
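Radar does not measure thickness directly but rather the two-way travel time of the echo from the glacier bed, which is converted to thickness using a radio-wave velocity in ice. A minimal sketch, assuming a uniform velocity of 0.168 m/ns (a value commonly used for glacier ice) and ignoring firn, water content, and off-nadir geometry; the function name is hypothetical.

```python
# Assumed radio-wave velocity in glacier ice (m/ns). About 0.168 m/ns is a common
# choice, but the appropriate value depends on ice temperature, water content, and firn.
V_ICE_M_PER_NS = 0.168


def radar_thickness(twt_ns: float, velocity: float = V_ICE_M_PER_NS) -> float:
    """Convert the two-way travel time (ns) of the bed echo into ice thickness (m)."""
    if twt_ns < 0:
        raise ValueError("travel time must be non-negative")
    # The wave travels down to the bed and back up again, hence the division by 2.
    return velocity * twt_ns / 2.0


print(radar_thickness(2000.0))  # about 168 m
```

Because thickness scales linearly with the assumed velocity, a relative error in velocity carries over one-to-one into a relative error in thickness.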
Geophysical techniques less commonly used for measuring glacier ice thickness include geoelectric (e.g., electrical resistivity tomography) and electromagnetic (e.g., magnetotellurics, controlled-source induction) methods, which map the subsurface by inverting for variations in electrical resistivity with depth. The only examples of these in the database (0.04 %), added in v1, are helicopter electromagnetic surveys of two Cascade Range volcanoes (Finn et al., 2012). The methods of the remaining surveys (0.33 %) are unknown because the original source either is not known or cannot be found. All studies included in GlaThiDa are acknowledged in the database; the studies cited above are provided only as examples.

Data package structure
Packaging of data is as important as the data themselves. This includes the physical representation of the data within files, the design of the metadata that describes the data, and the distribution of the data package to prospective users. Without proper packaging, data are much less likely to achieve their full potential. The approach described in the following sections implements (and extends) the FAIR guiding principles: scientific data should be Findable, Accessible, Interoperable, and Reusable for both machines and people (Wilkinson et al., 2016). Our implementation, built on simple text files, was designed to meet three criteria: (1) use widely supported, open, human- and machine-readable file formats to maximize interoperability and ease of use (Cerri and Fuggetta, 2007); (2) be compatible with line-based version control systems like mercurial and git, to automatically track and store changes (Blischak et al., 2016), facilitate collaboration between multiple authors (Ram, 2013), and continuously release new versions as the dataset evolves over time (Rauber et al., 2015); and (3) be described according to existing metadata standards, to facilitate data interpretation, validation, and future contributions (Fowler et al., 2017), as well as reuse by software applications like the Global Terrestrial Network for Glaciers (GTN-G) data browser (http://www.glims.org/maps/gtng, last access: 9 November 2020).

Data (data/*.csv)
The data are structured as three relational database tables ordered in increasing level of detail (Fig. 2). The first, overview table (T) contains information on the location, identity, and area of the surveyed glacier; the survey method used; and details about the authors and sources of the data. Glacier mean and maximum thickness, estimated from point measurements, are also included when available from data providers. The second table (TT) contains any mean and maximum thicknesses, estimated from point measurements, for surface elevation bands. Although rare, some ice thickness surveys are only available as surface elevation band estimates, their point measurements having been lost or never published. The third table (TTT) contains point thickness measurements. All tables include a survey identifier (GlaThiDa_ID, unique in T) that links entries between the tables, as well as a country code (POLITICAL_UNIT) and glacier name (GLACIER_NAME), which are replicated in TT and TTT as a convenience to users. Structural changes since GlaThiDa v1, described in the changelog (Sect. 2.2.4), have been limited to adding fields (e.g., PROFILE_ID in v3 to group point measurements by survey profile) and renaming fields (e.g., DEM_DATE to ELEVATION_DATE in v3 to clarify that provided surface elevations need not be from a digital elevation model).
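Because GlaThiDa_ID is shared across tables, point measurements can be linked to their survey overview with a simple lookup. The sketch below uses Python's standard csv module on small inline stand-ins for data/T.csv and data/TTT.csv; apart from GlaThiDa_ID, POLITICAL_UNIT, GLACIER_NAME, PROFILE_ID, and POINT_ID, the column name THICKNESS and all values are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for data/T.csv and data/TTT.csv (a single header line, then data).
# THICKNESS and all values are hypothetical.
T_CSV = """GlaThiDa_ID,POLITICAL_UNIT,GLACIER_NAME
1,CH,EXAMPLE GLACIER A
2,NO,EXAMPLE GLACIER B
"""
TTT_CSV = """GlaThiDa_ID,POLITICAL_UNIT,GLACIER_NAME,PROFILE_ID,POINT_ID,THICKNESS
1,CH,EXAMPLE GLACIER A,1,1,120
1,CH,EXAMPLE GLACIER A,1,2,135
2,NO,EXAMPLE GLACIER B,1,1,80
"""

# Survey overviews keyed by their identifier (unique in T)
surveys = {row["GlaThiDa_ID"]: row for row in csv.DictReader(io.StringIO(T_CSV))}

# Group point thicknesses by the survey they belong to
points_by_survey = defaultdict(list)
for row in csv.DictReader(io.StringIO(TTT_CSV)):
    points_by_survey[row["GlaThiDa_ID"]].append(float(row["THICKNESS"]))

for survey_id, thicknesses in sorted(points_by_survey.items()):
    name = surveys[survey_id]["GLACIER_NAME"]
    print(survey_id, name, len(thicknesses), max(thicknesses))
```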
Following FAIR principles, the three tables are stored as CSV (comma-separated values) files, a universally supported text format for representing tabular data. To maximize machine readability, the files do not contain any nondata content other than a single header line with field names. Data documentation is performed by a separate metadata file, described below.

Table 1. Number of glacier surveys and point measurements, interquartile range of point thicknesses, and full range of survey years by survey method. In the database, a "survey" roughly represents one measurement campaign on one glacier, and a "point" represents a single ice thickness measurement (as opposed to a spatial mean).

Method   Surveys   Points   Thickness (m)   Years

Figure 1. Field photographs illustrating different methods for measuring glacier thickness. (a) Ground-penetrating radar measurements on Johnsons Glacier, Antarctica, in January 2020. The white sledge contains a radar transmitter, receiver, shielded antennas, control unit, and recording system. A Global Navigation Satellite System receiver antenna is mounted to the sledge. Credit: Francisco Navarro. (b) Aerial ground-penetrating radar measurements over Hansbreen, Svalbard, in spring 2011 (Navarro et al., 2014). The radar transmitter, receiver, and antennas are mounted to a wooden frame hung from the helicopter. Credit: Antoine Kies. (c) Hot water drilling on Rhonegletscher, Switzerland, in August 2018 by ETH Zurich's Glacier Seismology Group. Although used mostly for characterizing bed conditions, the drillings measure ice thickness as a side product. Credit: Johannes Landmann. (d) Seismic reflection measurements on Grubengletscher, Switzerland, using a sledgehammer as a seismic energy source. The black seismic line connects geophones to the recording system. Credit: Bernd Kulessa.

Metadata (datapackage.json)
The structure and content of the data package are described in a single JSON (JavaScript Object Notation) file, datapackage.json, which conforms to the Frictionless Data Tabular Data Package specification. This file contains general metadata such as the package's name, version, description, and license; a list of contributors; and links to published source datasets. It also contains a detailed description of both the contents and the structure of the tabular data files. In practice, CSV files come in a large number of variants; by making the format and character encoding explicit, we help both software and human users avoid unnecessary guesswork (Fig. 3). Each table field is described in turn, including its name and description, the data type it represents (string, integer, or floating-point number), and any constraints on the values it can take (e.g., whether a value is required, falls within a numeric range, or matches a search pattern). In the example in Fig. 4, the description informs users that the field values are stored in the data files with "up to seven decimal places", while the pattern \-?[0-9]*(\.[0-9]{1,7})? (a regular expression conforming, as required by the Frictionless Data specification, to the XML Schema syntax; Biron and Malhotra, 2004) makes possible an automated test that this is indeed the case.
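As an illustration of how such a pattern enables an automated test, the sketch below defines a field descriptor in the style of a Frictionless Data table schema and checks raw CSV values against its pattern. The descriptor wording is hypothetical and may differ from GlaThiDa's actual datapackage.json. XML Schema patterns must match the entire value, so Python's re.fullmatch (rather than re.search) is the appropriate equivalent here.

```python
import json
import re

# Hypothetical field descriptor in the style of a Frictionless Data table schema
field = {
    "name": "POINT_LAT",
    "type": "number",
    "description": "Latitude (decimal degrees), stored with up to seven decimal places.",
    "constraints": {"required": True, "pattern": r"\-?[0-9]*(\.[0-9]{1,7})?"},
}

# XML Schema patterns are implicitly anchored, hence fullmatch below
PATTERN = re.compile(field["constraints"]["pattern"])


def matches_pattern(raw_value: str) -> bool:
    """Test a value as written in the CSV file, before conversion to a number."""
    return PATTERN.fullmatch(raw_value) is not None


print(json.dumps(field, indent=2))
for value in ["46.5123456", "-12.5", "46.51234567"]:
    print(value, matches_pattern(value))  # the last value has eight decimals
```

Note that this pattern also accepts an empty string; it is the separate "required" constraint that rejects missing values.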
Finally, relations within and between tables are defined following relational database nomenclature. A unique key is a field (or set of fields) whose values must be unique for each row in the table. Unique keys can be stored in other tables (where they are called "foreign keys") to link the tables together. In the example in Fig. 5, each row of table TTT (point measurements) is uniquely identified by the combination of a survey identifier (GlaThiDa_ID), profile identifier (PROFILE_ID), and point identifier (POINT_ID). Each of these point measurements is linked to the corresponding row in table T (survey overviews) by the value of its survey identifier (GlaThiDa_ID), along with the replicated fields for country code (POLITICAL_UNIT) and glacier name (GLACIER_NAME).
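The two kinds of relation can be checked with a few lines of code: a unique key is violated when a combination of key values occurs more than once, and a foreign key is violated when a child row points to a parent row that does not exist. A minimal sketch on made-up rows, using the key fields named above; the helper names are hypothetical and this is not the validator used for GlaThiDa.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def key_of(row: Row, fields: Sequence[str]) -> Tuple[str, ...]:
    """Extract the values of the key fields from one row."""
    return tuple(row[f] for f in fields)


def duplicate_keys(rows: List[Row], fields: Sequence[str]) -> List[Tuple[str, ...]]:
    """Key values that occur more than once, i.e. violations of a unique key."""
    counts = Counter(key_of(row, fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def missing_references(child: List[Row], parent: List[Row], fields: Sequence[str]) -> List[Tuple[str, ...]]:
    """Foreign-key values in child rows with no matching row in the parent table."""
    parent_keys = {key_of(row, fields) for row in parent}
    return [key_of(row, fields) for row in child if key_of(row, fields) not in parent_keys]


UNIQUE_TTT = ("GlaThiDa_ID", "PROFILE_ID", "POINT_ID")
FOREIGN_TTT_T = ("GlaThiDa_ID", "POLITICAL_UNIT", "GLACIER_NAME")

# Toy rows (values are made up): the second point repeats the first point's key,
# and the third point refers to a survey that is absent from T.
T = [{"GlaThiDa_ID": "1", "POLITICAL_UNIT": "CH", "GLACIER_NAME": "EXAMPLE A"}]
TTT = [
    {"GlaThiDa_ID": "1", "POLITICAL_UNIT": "CH", "GLACIER_NAME": "EXAMPLE A", "PROFILE_ID": "1", "POINT_ID": "1"},
    {"GlaThiDa_ID": "1", "POLITICAL_UNIT": "CH", "GLACIER_NAME": "EXAMPLE A", "PROFILE_ID": "1", "POINT_ID": "1"},
    {"GlaThiDa_ID": "2", "POLITICAL_UNIT": "NO", "GLACIER_NAME": "EXAMPLE B", "PROFILE_ID": "1", "POINT_ID": "1"},
]

print(duplicate_keys(TTT, UNIQUE_TTT))            # [('1', '1', '1')]
print(missing_references(TTT, T, FOREIGN_TTT_T))  # [('2', 'NO', 'EXAMPLE B')]
```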

Documentation (README.md)
The data package is fully described in datapackage.json, but the JSON format may not be familiar or welcoming to some users. Therefore, we automatically generate a more human-readable version from the contents of datapackage.json. The resulting README.md is a text file structured with Markdown, a widely supported markup language (Gruber, 2004). As a result, it is both easy to read as plain text and readily converted to other formats such as HTML (Hypertext Markup Language) or PDF (Portable Document Format). The choice of file formats increases user access while their shared JSON origin eliminates the risks and overhead associated with manually maintaining multiple files.
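Generating README.md from the metadata amounts to walking the JSON descriptor and emitting Markdown. The sketch below renders one Markdown table of fields per data resource from a tiny hypothetical descriptor; the layout is an assumption for illustration, not the template actually used for GlaThiDa. In practice, the dictionary would be loaded from datapackage.json with json.load.

```python
from typing import Any, Dict


def render_readme(package: Dict[str, Any]) -> str:
    """Render a minimal Markdown summary of a data package descriptor."""
    lines = [f"# {package.get('title', package['name'])}", "", package.get("description", ""), ""]
    for resource in package.get("resources", []):
        # One section and one Markdown table of fields per data file
        lines += [f"## {resource['name']}", "", "| name | type | description |", "|---|---|---|"]
        for field in resource.get("schema", {}).get("fields", []):
            lines.append(f"| {field['name']} | {field.get('type', 'any')} | {field.get('description', '')} |")
        lines.append("")
    return "\n".join(lines)


# Tiny hypothetical descriptor; a real one would come from datapackage.json
package = {
    "name": "glathida",
    "title": "Glacier Thickness Database",
    "description": "Worldwide glacier thickness observations.",
    "resources": [
        {"name": "T", "schema": {"fields": [
            {"name": "GlaThiDa_ID", "type": "integer", "description": "Survey identifier."},
            {"name": "GLACIER_NAME", "type": "string", "description": "Glacier name."},
        ]}},
    ],
}

print(render_readme(package))
```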

History (CHANGELOG.md)
All notable changes made to the data or metadata are recorded in a chronological list formatted in Markdown, CHANGELOG.md. This includes the update or removal of existing data records, additions of new data records, and changes to the file structure or data schema. The goal is to provide a variety of user groups with important information about the history of the dataset. Future maintainers can review past changes, developers can evaluate whether and how to update their processing chain based on structural changes, and users can discover what data have been added or updated since the last version.

Product development cycle
The Glacier Thickness Database (GlaThiDa) is a community effort that grows as more data are collected. For an evolving dataset like ours, the ability to revise and review collaboratively (to track changes and share those changes with others) is of great benefit to the communities that contribute to, maintain, and use the data. The development environment should therefore support the following activities: receive, review, and discuss issues with the dataset from and with the community; automatically track all changes made to the dataset by a distributed team of contributors; continuously validate the dataset as changes are made; and release new versions on a rolling basis, each archived with a unique DOI (digital object identifier) in a scientific data repository for distribution, citation, and safekeeping (Paskin, 2005).

Tooling
To achieve the goals listed above, we have adopted tools widely used in open-source software development and applied them to open-data development. In our case, the dataset is stored as a file repository managed by the distributed version control system git and hosted on GitLab (https://gitlab.com, last access: 9 November 2020), an open-source equivalent of GitHub (https://github.com, last access: 9 November 2020), the popular online platform for collaborative software development. The underlying git software tracks changes (or "commits"), while GitLab provides interactive tools for garnering and managing input from the community. "Issues" (which can be posted by anyone with a free account) track bug reports, feature requests, and other community dialogue. "Releases" tag a snapshot of the dataset, at any stage in the development cycle, as a numbered version. These snapshots can then be assigned a DOI and placed in a scientific data repository for citation and safekeeping. Version control systems like git are line based; that is, they track changes to text files on a line-by-line basis. Storing all data and metadata as text files, rather than binary files, allows us to automatically track all changes to the dataset. When fixes are made to existing data records, the change consists of the updated lines. When new records are added, the change consists of the appended lines. In this way, we avoid making a new copy of a file each time a change is made. Only the versions published for download are compressed to a binary format, to reduce bandwidth.
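To see why line-based tracking suits CSV files, consider a commit that corrects one record and appends another. The sketch uses Python's difflib.unified_diff as a stand-in for git's own diff (git uses its own diff implementation, but the unified output format is similar); the file contents are made up.

```python
import difflib

# Toy CSV contents before and after a commit (values are made up)
before = ["GlaThiDa_ID,GLACIER_NAME\n", "1,EXAMPLE A\n", "2,EXAMPLE B\n"]
after = ["GlaThiDa_ID,GLACIER_NAME\n", "1,EXAMPLE A\n", "2,EXAMPLE B (corrected)\n", "3,EXAMPLE C\n"]

diff = list(difflib.unified_diff(before, after, fromfile="T.csv", tofile="T.csv"))
print("".join(diff), end="")

# Only the corrected line and the appended line appear as changes;
# the unchanged lines are shown merely as context.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```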

Versioning
The project follows the Semantic Versioning Specification (Preston-Werner, 2013) for software, adapted for data. Given a version number major.minor.patch, the major version is incremented for new data, the minor version is incremented for changes to existing data, and the patch version is incremented for changes to metadata only.
Note that our versioning scheme does not communicate compatibility with downstream software dependencies, which is the primary purpose of semantic versioning. A proposed software-oriented alternative is to increment the major version for incompatible changes (e.g., field removed, field constraint made more restrictive), the minor version for backwards-compatible changes (e.g., data added, field constraint made less restrictive), and the patch version for backwards-compatible fixes (e.g., fix data errors, update field description). However, we believe our data-oriented versioning is better aligned with the needs of our users and contributors, who are primarily concerned with the addition of new data following each call for submissions.
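The data-oriented rule can be written as a small function. Resetting the lower components to zero follows the usual Semantic Versioning convention and is an assumption here; the change labels and version numbers are hypothetical.

```python
def bump_version(version: str, change: str) -> str:
    """Increment major.minor.patch under the data-oriented scheme.

    new data      -> major (lower components reset, as in Semantic Versioning)
    changed data  -> minor
    metadata only -> patch
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "new data":
        return f"{major + 1}.0.0"
    if change == "changed data":
        return f"{major}.{minor + 1}.0"
    if change == "metadata only":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("2.0.1", "new data"))  # 3.0.0
```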

Schema validation
A major benefit of describing the data with machine-readable metadata (i.e., datapackage.json) is the ability to automatically validate the data against this description. This includes relations between tables, uniqueness within tables, and whether field values match field types and constraints: for example, whether dates match the expected format (YYYYMMDD) and latitude and longitude are numbers within the allowed limits ([−90, 90] and [−180, 180]). Furthermore, by using a standard format for the metadata, we can automatically validate the metadata itself against this standard.
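The two single-field constraints mentioned above can be sketched as follows. In GlaThiDa these checks are derived from datapackage.json by validation software; the hand-written function below only illustrates the logic, and the field names SURVEY_DATE and POINT_LON are assumptions (only POINT_LAT is named in the text). The date is checked for format only, not calendar validity.

```python
import re
from typing import Dict, List

# SURVEY_DATE and POINT_LON are hypothetical field names used for illustration
DATE_FORMAT = re.compile(r"[0-9]{8}")  # YYYYMMDD, checked as a format only


def check_fields(row: Dict[str, str]) -> List[str]:
    """Return the constraint violations for one row (an empty list means valid)."""
    errors = []
    if DATE_FORMAT.fullmatch(row.get("SURVEY_DATE", "")) is None:
        errors.append("SURVEY_DATE does not match YYYYMMDD")
    for name, limit in (("POINT_LAT", 90.0), ("POINT_LON", 180.0)):
        try:
            value = float(row[name])
        except (KeyError, ValueError):
            errors.append(f"{name} is missing or not a number")
            continue
        if not -limit <= value <= limit:  # also rejects NaN
            errors.append(f"{name} = {value} outside [{-limit:g}, {limit:g}]")
    return errors


print(check_fields({"SURVEY_DATE": "20180815", "POINT_LAT": "46.5", "POINT_LON": "8.0"}))
print(check_fields({"SURVEY_DATE": "2018-08-15", "POINT_LAT": "95", "POINT_LON": "8.0"}))
```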
That said, all metadata standards have their limits; they cannot express all the possible constraints we may wish to impose on our data. In addition to tests of single fields, validation of GlaThiDa therefore includes tests across multiple fields (see Fig. 6 for an example).
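The specific cross-field tests are not listed in this section, so the following is a hypothetical example of the kind of rule a metadata standard cannot express on its own: a glacier's reported mean thickness should not exceed its reported maximum thickness. The field names MEAN_THICKNESS and MAX_THICKNESS are assumptions for illustration.

```python
from typing import Dict, Optional


def mean_not_above_max(row: Dict[str, str]) -> Optional[str]:
    """Hypothetical cross-field check on a row of table T.

    Returns an error message, or None if the row passes (or cannot be checked).
    """
    mean, maximum = row.get("MEAN_THICKNESS", ""), row.get("MAX_THICKNESS", "")
    if not mean or not maximum:
        return None  # nothing to compare when either value is missing
    if float(mean) > float(maximum):
        return f"MEAN_THICKNESS ({mean}) exceeds MAX_THICKNESS ({maximum})"
    return None


print(mean_not_above_max({"MEAN_THICKNESS": "320", "MAX_THICKNESS": "300"}))
```

In a continuous integration pipeline, checks like this one would typically be collected into a script that exits with a nonzero status when any row fails, so that the pipeline is marked as failed.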
Continuous integration (CI) is a standard software development practice wherein changes to the code are verified by automated tests to detect and fix issues as quickly as possible (Fowler, 2006). In our case, we use CI pipelines integrated into GitLab to automatically validate data and metadata whenever a change is made to the repository, catching issues early in the development cycle and, crucially, before the next release.

Spatial and temporal coverage
GlaThiDa v3 is the most comprehensive public database of glacier thickness measurements to date. We have added 3 million new thickness measurements relative to GlaThiDa v2, released in 2016 (Table 2). The new data, submitted by researchers or imported from the IceBridge data portal (https://nsidc.org/data/icebridge, last access: 28 August 2019), include glaciers in Antarctica, Alaska, Canada, China, Greenland, Kazakhstan, Norway, Svalbard, Switzerland, and Tanzania.

Spatial coverage
To evaluate the spatial coverage of GlaThiDa with respect to the world's roughly 217 000 glaciers (RGI Consortium, 2017), we assigned each survey to glaciers in the Randolph Glacier Inventory (RGI 6.0) by intersecting point measurements and nominal glacier centerpoints with RGI glacier outlines. The result is that 3054 RGI glaciers have thickness measurements in GlaThiDa (a large increase from 1133 RGI glaciers in version 2). Out of 5141 glacier surveys, only 11 (0.2 %) do not fall within an RGI glacier outline. Of these, most are for glaciers not included in RGI 6.0: specifically, very small glaciers in the European Alps (Blauschnee and Glacier de Tsarmine, Switzerland; Schwarzmilzferner, Austria) and glaciers that may be considered part of the Antarctic Ice Sheet (Lambert Glacier, Starbuck Glacier, and Scharffenbergbotnen). The remaining surveys do not intersect an RGI outline because either the corresponding RGI outline is incorrect (Nördlicher Schneeferner, Germany) or the glacier has retreated since the survey was conducted (Columbia Glacier, Alaska). Figure 7 shows the coverage of intersected RGI glaciers on a world map. Table 3 lists the number and total area of intersected RGI glaciers by glacier region (GTN-G, 2017). While the proportion of intersected glaciers in a region is at most 14 % (region 7: Svalbard and Jan Mayen), the proportional area of intersected glaciers is much higher, up to 77 % (again, for Svalbard and Jan Mayen), a result of larger glaciers being preferentially selected for measurement. The coverage in Svalbard is so high in v3 thanks to a recent regional compilation of available measurements (Martín-Español et al., 2015; Fürst et al., 2018b). The regions with the next best area coverage are those with substantial contributions from NASA Operation IceBridge: Arctic Canada, Antarctica, and Greenland. Despite these advances, large gaps persist in GlaThiDa, especially throughout Asia, the Russian Arctic, and the Andes.
Poor coverage in these regions necessarily limits the quality of local and global glacier volume assessments and predictions of future change. Future efforts should be aimed at increasing spatial coverage and regional representation, both by performing new measurements and by conducting literature surveys and calls for data in underrepresented languages and regions of the world.
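The gap between the count-based and area-based proportions reported above follows directly from preferential sampling of large glaciers. A toy illustration (the numbers are made up, not GlaThiDa statistics): in a region with one large surveyed glacier and nine small unsurveyed ones, only 10 % of glaciers are covered but nearly all of the glacier area is.

```python
from typing import List, Tuple


def coverage(glaciers: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (fraction of glaciers surveyed, fraction of glacier area surveyed).

    Each glacier is given as (area_km2, has_thickness_data).
    """
    n_surveyed = sum(1 for _, surveyed in glaciers if surveyed)
    area_total = sum(area for area, _ in glaciers)
    area_surveyed = sum(area for area, surveyed in glaciers if surveyed)
    return n_surveyed / len(glaciers), area_surveyed / area_total


# Toy region: one large surveyed glacier (500 km2) and nine small unsurveyed ones
region = [(500.0, True)] + [(1.0, False)] * 9
by_count, by_area = coverage(region)
print(by_count, round(by_area, 3))  # 0.1 0.982
```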
Overall, RGI glaciers with at least one thickness measurement account for 40 % (299 141 km²) of the global RGI area of 746 092 km² (RGI Consortium, 2017). A better measure of data coverage is the area of a glacier that lies within a certain distance of a thickness point measurement on the same glacier. Globally, 36 % of the area of all surveyed glaciers (102 030 km²), and 14 % of global glacier area, is within 1 km of a measurement. Although this represents a significant improvement over GlaThiDa v2 (6 %), it nevertheless means that the thickness of the vast majority of global glacier area must still be estimated through extrapolation, scaling methods (reviewed in Bahr et al., 2015), or models (reviewed in Farinotti et al., 2017).
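The within-1-km statistic can be approximated on a raster: represent the glacier by the centers of regular grid cells and count the cells whose center lies within 1 km of any point measurement. The sketch below is a brute-force illustration on a toy 4 km × 4 km glacier with a single straight profile; the geometry and grid resolution are assumptions, not the procedure used to compute the figures above, and a real computation would use projected coordinates and a spatial index.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fraction_within(cells: List[Point], points: List[Point], max_dist: float) -> float:
    """Fraction of glacier grid cells (cell centers, km) within max_dist of any point."""
    if not cells:
        return 0.0
    near = sum(
        1 for cx, cy in cells
        if any(math.hypot(cx - px, cy - py) <= max_dist for px, py in points)
    )
    return near / len(cells)


# Toy 4 km x 4 km glacier on a 0.5 km grid (64 cell centers)
step = 0.5
cells = [(step * (i + 0.5), step * (j + 0.5)) for i in range(8) for j in range(8)]
# One measurement profile along y = 2 km, with a point every 250 m
profile = [(k * 0.25, 2.0) for k in range(17)]
print(fraction_within(cells, profile, 1.0))  # 0.5
```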
Table 3. GlaThiDa coverage for each glacier region mapped in Fig. 7. The count and total area (km²) of RGI glacier outlines with at least one (point, elevation band, or glacier-wide) thickness are listed for GlaThiDa v2 (2016) and v3 (2019).

The spatial coverage of point thickness measurements varies greatly by glacier. While 14 % of glaciers in GlaThiDa with point measurements have more than 100 point measurements km⁻² on average, 50 % have fewer than 18 points km⁻² (Fig. 8). Although measurements are often sparse with respect to total glacier area, they tend to be well distributed across glacier surface elevations, since measurements are often collected along longitudinal profiles. When each glacier is divided into 100 m elevation bands (calculated from the surface elevations of point measurements in GlaThiDa and the minimum and maximum glacier surface elevations in RGI), 50 % of glaciers with point measurements have measurements in at least half of their elevation bands and 11 % have measurements in all of their elevation bands. Glaciers with few measurements that are nevertheless well distributed along their length can still be very useful for validating modeled ice thicknesses (Castellani, 2019), which are often computed along longitudinal ice-flow lines (reviewed in Farinotti et al., 2017).
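The elevation-band statistic can be sketched in the same spirit: split the range between a glacier's minimum and maximum surface elevation into 100 m bands and count the bands that contain at least one point measurement. Aligning band edges to multiples of 100 m is an assumption made for this sketch, and the elevations are made up.

```python
import math
from typing import List


def band_coverage(point_elevations: List[float], z_min: float, z_max: float, band: float = 100.0) -> float:
    """Fraction of elevation bands between z_min and z_max with at least one point.

    Band edges are assumed to lie at multiples of `band` (m).
    """
    bands = set(range(math.floor(z_min / band), math.floor(z_max / band) + 1))
    occupied = {math.floor(z / band) for z in point_elevations if z_min <= z <= z_max}
    return len(occupied & bands) / len(bands)


# Toy glacier spanning 2450 to 3120 m (8 bands), with points in 3 of them
print(band_coverage([2480.0, 2510.0, 2530.0, 2790.0], 2450.0, 3120.0))  # 0.375
```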

Temporal coverage
The glacier thickness surveys in GlaThiDa span the years 1935-2018 (Table 1). This wide range of survey dates allows glaciers with repeat surveys (such as the example in Fig. 9) to be compared over time. However, it also complicates regional and global studies, which must account for thickness measurements spanning multiple years. Ideally, modeled ice thicknesses are evaluated against measured ice thicknesses coincident in time with the measured glacier outlines, surface elevations, and other time-varying parameters (e.g., surface mass balance, rates of ice thickness change, and surface velocities) used to initialize the model (Farinotti et al., 2017). For analyses spanning many glaciers (or all of the world's glaciers), it is not possible to ensure that all these data are coincident in time. For example, the median survey year for RGI glacier outlines is 2002, a decade earlier than the 2012 median for GlaThiDa surveys (Fig. 10), and the offset between surveys in GlaThiDa and their spatially coincident RGI outlines is 11-17 years (interquartile range).

Figure 7. Map comparing GlaThiDa coverage to global glacier coverage according to the Randolph Glacier Inventory (RGI 6.0). Each grid cell represents 78.7 km × 78.7 km (roughly 1° × 1°) in a cylindrical equal-area map projection. Light blue cells contain GlaThiDa data from IceBridge, while the overlaying dark blue pixels contain GlaThiDa data from other sources. Numbered grey polygons correspond to the glacier regions (GTN-G, 2017) listed in Table 3. Made with Natural Earth country polygons (https://www.naturalearthdata.com, last access: 23 October 2020).

As for surface elevations, the majority (88 %) of point thickness measurements in GlaThiDa include corresponding surface elevations, 85 % of which were measured in the same year as the ice thickness. When available, temporally coincident ice thickness and surface elevation measurements can be used to calculate the elevation of the glacier bed (which, unlike the glacier surface, can be considered unchanged over decades to centuries) and thus the ice thickness relative to a glacier surface surveyed at any other time.
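The calculation is simple arithmetic: subtract the thickness from the coincident surface elevation to obtain the bed elevation, then subtract that bed elevation from a surface surveyed at another time. A minimal sketch, assuming both surfaces refer to the same horizontal position and vertical datum and that the bed has not changed; the values are made up.

```python
def bed_elevation(surface_elev: float, thickness: float) -> float:
    """Bed elevation (m) from temporally coincident surface elevation and thickness."""
    return surface_elev - thickness


def thickness_relative_to(other_surface_elev: float, bed_elev: float) -> float:
    """Thickness (m) under a surface surveyed at another time, assuming an unchanged bed."""
    return other_surface_elev - bed_elev


# Survey year: surface at 2850 m, thickness 180 m; a later, lower surface at 2820 m
bed = bed_elevation(2850.0, 180.0)
print(bed, thickness_relative_to(2820.0, bed))  # 2670.0 150.0
```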

Future additions
Our intention is for GlaThiDa to continue to grow and improve as errors are found and fixed, new measurements are made, and more data are found or submitted. Several datasets already published in open-data portals (e.g., https://data.npolar.no, last access: 25 March 2020; https://pangaea.de, last access: 25 March 2020; https://arcticdata.io, last access: 25 March 2020; https://nsidc.org, last access: 25 March 2020; and https://www.usap-dc.org, last access: 25 March 2020) are slated for inclusion in a future version. However, these account for only a small number of the glacier thickness measurements still missing from GlaThiDa. For example, 460 glacier surveys pulled from the literature for v1 (Gärtner-Roer et al., 2014) still lack the original point measurements from which the reported glacier-wide estimates were derived. An assessment of the Arctic data in GlaThiDa v2 by the Integrated Arctic Observation System (INTAROS, 2018) identified missing observations and concluded that pressure must continue to be placed on research groups to submit their data to the wider community. Ideally, all ice thickness surveys would be published in open-data portals and then added to GlaThiDa for complete coverage in a standard format.

… Dowdeswell et al. (1984, 1986), Björnsson et al. (1996), Lindbäck et al. (2018), Kristensen et al. (2008), and Paden et al. (2018).
To streamline data aggregation going forward, GlaThiDa may need to be restructured such that data are primarily organized by campaign or dataset rather than by glacier. The data tables were originally designed to accommodate mean glacier thicknesses pulled from the literature (Gärtner-Roer et al., 2014). Thus, each survey (i.e., each entry in table T, and thus each GlaThiDa_ID) is expected to contain measurements gathered on one visit to one glacier, even though each associated point (i.e., each entry in TTT) is also encoded with temporal and spatial coordinates. This data model complicates the addition of large campaigns and introduces confusing redundancy to the database. For example, the six datasets from Operation IceBridge had to be split across 4124 glacier surveys by date and by intersection with RGI glacier outlines.
Operation IceBridge, the source of 61 % of the thickness point measurements in GlaThiDa, is ending operations in 2020. The airborne mission was designed to avoid a gap in measurements between the ICESat satellite (2003-2009) and its successor ICESat-2, which was launched in 2018. However, because the ICESat satellites measure only surface elevation, not ice thickness, the end of IceBridge also ends a decade-long ice thickness campaign. In the absence of a successor to Operation IceBridge, future updates to GlaThiDa may not include as many new measurements as the latest version. Nevertheless, since RGI 6.0 does not include glaciers on the Antarctic Peninsula mainland (Huber et al., 2017) or in the McMurdo Dry Valleys (Frank Paul, personal communication, 2020), IceBridge data for those glaciers were not included in GlaThiDa v3; these remaining IceBridge data will be added in a future version.

Thickness uncertainties
The uncertainty of a glacier thickness measurement varies widely with the method used, the characteristics of the site, and the interpretation of the raw data (reviewed in Gärtner-Roer et al., 2014, Sect. 3.2). For example, sources of error for radar measurements (reviewed in Lapazaran et al., 2016a, Sect. 3.1) include the radio-wave velocity, the timing of the reflection, and migration (the processing step that relocates reflections arriving from the side, rather than from immediately below, to their true position), which can fail in or near steep terrain (Welch et al., 1998). Errors in the measurement position (e.g., due to the accuracy, placement, and movement of the GPS receiver) also translate to thickness errors proportional to the local thickness gradient (reviewed in Lapazaran et al., 2016a, Sect. 3.2). This is a larger issue for older data, especially those transformed from poorly defined coordinate systems or digitized from printed maps (e.g., Andreassen et al., 2015).
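Two of these error sources can be put into rough numbers. Since radar thickness is proportional to the assumed velocity (H = v t / 2), a relative velocity error gives the same relative thickness error, while a horizontal position error dx gives a thickness error of roughly |dH/dx| × dx. The sketch combines the two in quadrature, which assumes they are independent; the function name and input values are hypothetical, and it is not the error model of Lapazaran et al. (2016a).

```python
import math


def thickness_error(thickness: float, velocity: float, velocity_err: float,
                    thickness_gradient: float, position_err: float) -> float:
    """Rough thickness error (m) from radio-wave velocity and position errors.

    velocity term: H = v*t/2, so dH = H * dv / v
    position term: dH ~ |dH/dx| * dx (local thickness gradient times position error)
    Both terms are combined in quadrature, assuming they are independent.
    """
    from_velocity = thickness * velocity_err / velocity
    from_position = abs(thickness_gradient) * position_err
    return math.hypot(from_velocity, from_position)


# 200 m of ice, velocity 0.168 +/- 0.002 m/ns, gradient 0.1 m/m, 10 m position error
print(round(thickness_error(200.0, 0.168, 0.002, 0.1, 10.0), 2))
```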
The uncertainty of a spatially averaged thickness further varies with the adequacy of the interpolation (and extrapolation), the assumed glacier boundary, and, most importantly, the spatial coverage of the measurements (reviewed in Lapazaran et al., 2016b). Thickness measurements are typically acquired along sparse profiles, with coverage biased towards gentler terrain; they rarely, if ever, approximate a dense grid blanketing the whole glacier. From the law of error propagation, we would expect the error of a spatial mean to be smaller than the errors of the individual measurements it averages. In practice, however, the opposite is more commonly the case, since measurement errors are often partly spatially dependent rather than truly random (Martín-Español et al., 2016), and point coverage is often far from ideal.
A fraction of glacier thicknesses in GlaThiDa were published with uncertainty estimates: 26 % of mean glacier thicknesses in table T (drawn from about 35 studies, based on the listed references), 19 % of mean elevation band thicknesses in table TT (drawn from 4 studies), and 40 % of point thicknesses in table TTT (drawn from 51 studies). By computing percent uncertainties (100 % × uncertainty / thickness), we can compare the distribution of uncertainties by thickness type. We find that the reported uncertainties are significantly lower for point thicknesses than for the spatial means: an interquartile range of 3.1 %-5.5 % of the measured value for points versus 9.9 %-22.8 % and 20 %-50 % for glacier and elevation band means, respectively. The uncertainties reported by these studies may or may not be realistic. Nevertheless, the relatively high uncertainties reported for spatial means clearly indicate that, for these studies, interpolation errors outweighed any benefit gained from averaging the spatially independent errors in the point measurements.
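The percent-uncertainty comparison can be reproduced in outline with the standard library. This sketch assumes thicknesses and uncertainties arrive as parallel lists, with `None` marking values published without an uncertainty; the helper name and the sample values are hypothetical, not GlaThiDa data. Note that `statistics.quantiles` defaults to the "exclusive" method, and other quantile conventions give slightly different quartiles on small samples.

```python
import statistics


def percent_uncertainty_iqr(thicknesses, uncertainties):
    """Quartiles (Q1, Q3) of percent uncertainty, 100 * uncertainty / thickness.

    Values without a reported uncertainty (None) are skipped, as are
    non-positive thicknesses, which would make the ratio meaningless.
    """
    percents = [
        100.0 * u / t
        for t, u in zip(thicknesses, uncertainties)
        if u is not None and t > 0
    ]
    q1, _, q3 = statistics.quantiles(percents, n=4)  # default method="exclusive"
    return q1, q3


# Illustrative point thicknesses (m) and uncertainties (m); the last lacks one
thickness = [100, 200, 50, 400, 80, 150]
uncertainty = [5, 8, 3, 20, 4, None]
print(percent_uncertainty_iqr(thickness, uncertainty))
```

Running the same function separately on point, elevation band, and glacier-mean records gives the per-type interquartile ranges compared above.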
In practice, the statistical definition of the uncertainties reported in GlaThiDa is likely to vary considerably. For example, many studies do not specify whether a reported error is a standard deviation. Others may not provide a full error estimate at all but rather the "resolution of the measurements", as in the case of some IceBridge datasets (MCoRDS, HiCARS, and PARIS). As pointed out by Martín-Español et al. (2016), based in part on two studies included in GlaThiDa (Pettersson et al., 2011; Saintenoy et al., 2013), errors for spatial means ($e$) in the literature can vary by orders of magnitude between two extreme assumptions: (underestimate) local errors ($e_i$) are spatially independent, such that $e = \bar{e}_i / \sqrt{n}$ (where $\bar{e}_i$ is the mean of the local errors and $n$ is their number), or (overestimate) local errors are linearly dependent, such that $e = \bar{e}_i$ (i.e., the mean of the local errors is taken as the error of the mean). As a consequence, we intend to tighten reporting requirements and flag statistically nonconforming uncertainties in future versions of GlaThiDa.
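The gap between the two extreme assumptions can be made concrete with a short calculation. To interpolate between them, this sketch adds an assumption not taken from Martín-Español et al. (2016): all local errors are treated as equal to their mean $\bar{e}_i$ and share a single pairwise correlation $\rho$, so that the error of the mean is $e = \bar{e}_i \sqrt{(1 + (n-1)\rho)/n}$. Setting $\rho = 0$ recovers the independent case and $\rho = 1$ the fully dependent case. The helper name is hypothetical.

```python
import math


def error_of_mean(local_errors, rho):
    """Error of a spatial mean of n local errors with equal pairwise correlation rho.

    rho = 0 gives the independent case, mean(e_i) / sqrt(n);
    rho = 1 gives the fully dependent case, mean(e_i).
    """
    n = len(local_errors)
    e_bar = sum(local_errors) / n
    return e_bar * math.sqrt((1 + (n - 1) * rho) / n)


# 10,000 points, each with a 5 m error: the two extremes differ by sqrt(n) = 100
errors = [5.0] * 10_000
independent = error_of_mean(errors, rho=0.0)
dependent = error_of_mean(errors, rho=1.0)
print(f"independent: {independent:.2f} m, dependent: {dependent:.2f} m")
```

Because the ratio between the extremes grows as $\sqrt{n}$, the densely sampled surveys that dominate GlaThiDa are exactly where the choice of assumption matters most.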

Error detection
As recorded in the changelog, we fixed a large number of errors introduced in previous versions and in the initial compilation of the current version. Many were trivial to fix once discovered; others required reviewing published literature and datasets, checking original submissions, and, if necessary, corresponding with the data provider. Most errors were detected by the suite of automated validation tests described in Sect. 2.3.3. Checks of field-level constraints identified missing values in required fields, duplicate values in unique fields, out-of-range values in numeric fields, invalid characters in text fields, invalid values in enumerable fields, and future or nonexistent dates in date fields. Checks of table-level constraints identified duplicate values in unique keys (e.g., duplicate combinations of survey and point identifiers in TTT) and missing values in foreign keys (e.g., survey identifiers in TT and TTT missing in T). More complex tests identified missing values that were required by logical implication (e.g., thickness missing when thickness uncertainty provided), values that were invalid by logical implication (e.g., glacier identifier not present in the glacier database to which it refers), and values that were invalid by spatial implication (e.g., glacier coordinates outside assigned country or far from associated point measurements). Additional tests will need to be added, proactively as the data evolve and retroactively as unforeseen errors are introduced and later discovered.
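A few of these constraint types can be sketched as plain Python checks over rows represented as dictionaries. This is a hypothetical illustration, not the validation suite described in Sect. 2.3.3: the column names (modelled on GlaThiDa_ID), the `validate` function, and the sample rows are assumptions chosen to trigger one error of each kind.

```python
import datetime
from collections import Counter


def validate(surveys, points, today):
    """Check a few field-, table-, and logic-level constraints; return error messages."""
    errors = []
    survey_ids = {s["GlaThiDa_ID"] for s in surveys}
    # Field-level: survey dates must not lie in the future
    for s in surveys:
        if s["SURVEY_DATE"] > today:
            errors.append(f"survey {s['GlaThiDa_ID']}: future date")
    # Table-level: (survey, point) identifier pairs must be unique
    counts = Counter((p["GlaThiDa_ID"], p["POINT_ID"]) for p in points)
    errors += [f"duplicate key {k}" for k, c in counts.items() if c > 1]
    for i, p in enumerate(points):
        # Foreign key: every point must reference a survey in table T
        if p["GlaThiDa_ID"] not in survey_ids:
            errors.append(f"point row {i}: unknown survey {p['GlaThiDa_ID']}")
        # Logical implication: an uncertainty requires a thickness
        if p["THICKNESS_UNCERTAINTY"] is not None and p["THICKNESS"] is None:
            errors.append(f"point row {i}: uncertainty without thickness")
        # Range: thickness must be non-negative
        if p["THICKNESS"] is not None and p["THICKNESS"] < 0:
            errors.append(f"point row {i}: negative thickness")
    return errors


surveys = [
    {"GlaThiDa_ID": 1, "SURVEY_DATE": datetime.date(2010, 5, 1)},
    {"GlaThiDa_ID": 2, "SURVEY_DATE": datetime.date(2099, 1, 1)},
]
points = [
    {"GlaThiDa_ID": 1, "POINT_ID": 1, "THICKNESS": 100, "THICKNESS_UNCERTAINTY": 5},
    {"GlaThiDa_ID": 1, "POINT_ID": 1, "THICKNESS": 120, "THICKNESS_UNCERTAINTY": None},
    {"GlaThiDa_ID": 3, "POINT_ID": 1, "THICKNESS": 50, "THICKNESS_UNCERTAINTY": None},
    {"GlaThiDa_ID": 2, "POINT_ID": 2, "THICKNESS": None, "THICKNESS_UNCERTAINTY": 4},
]
errors = validate(surveys, points, today=datetime.date(2020, 11, 9))
for message in errors:
    print(message)
```

Keeping each constraint as a small, independent check makes it straightforward to append new tests retroactively when a previously unforeseen error is discovered.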

Data storage and version control
Every change, addition, and deletion made to a file in the data package is tracked by git. As the number of changes grows over time, the git repository will inevitably grow in size. Cloud-storage hosts place storage limits on free accounts: GitLab.com limits repositories to 10 GB compressed; GitHub.com limits repositories to 100 GB and each file to 100 MB uncompressed. To keep repositories small in the presence of large files, git-lfs (Git Large File Storage) was developed to track files in the repository while storing them externally (Carlson and Schneider, 2019). However, git-lfs stores a new copy of a tracked file, whether binary or text, for each change, no matter how small the change. Because our data are stored as text files in the repository itself, changes are stored incrementally, which imparts significant storage benefits. At the time of writing, the repository is only 47.9 MB compressed despite 121 changes, including 8 versions of TTT.csv (294 MB uncompressed, 42.3 MB compressed). Reaching the 10 GB storage limit on GitLab.com would require adding (or changing) roughly 1 billion point measurements. If this limit were ever reached, it could be lifted by migrating to a self-hosted GitLab installation.
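The storage advantage of incremental changes can be illustrated by comparing a compressed full copy of a table with a compressed line diff after a one-value edit. This is only a proxy: git does not literally store unified diffs (it packs objects using delta compression), and the compressed size of the whole file merely approximates the extra copy a git-lfs-tracked file would add. The table layout and the `make_csv` helper are hypothetical.

```python
import difflib
import zlib


def make_csv(n_rows, changed_row=None):
    """Build a toy point table; optionally bump the thickness in one row by 1 m."""
    rows = ["survey_id,point_id,thickness"]
    for i in range(n_rows):
        thickness = 100 + i % 50
        if i == changed_row:
            thickness += 1
        rows.append(f"1,{i},{thickness}")
    return "\n".join(rows) + "\n"


old = make_csv(10_000)
new = make_csv(10_000, changed_row=1234)

# Line-based delta between the two versions, with no context lines
delta = "".join(difflib.unified_diff(
    old.splitlines(keepends=True), new.splitlines(keepends=True), n=0))

full_copy = len(zlib.compress(new.encode()))  # cost of storing a whole new copy
incremental = len(zlib.compress(delta.encode()))  # cost of storing only the change
print(f"full copy: {full_copy} B, incremental: {incremental} B")
```

The full copy grows with the size of the table, whereas the incremental change grows only with the number of edited lines.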
Nevertheless, line-based version control systems like git are not optimized for tracking changes to tabular data. A change to a single cell is recorded as a change to the entire row, and swapping the order of two columns is recorded as a change to every row in the table (Fitzpatrick, 2013). Changes to tabular data can be described more compactly (and legibly) using specialized syntax (e.g., Tabular Diff Specification; Fitzpatrick, 2014), but this imparts no storage benefit unless the underlying version control system is rewritten to use such descriptions to store changes internally. Alternatively, many changes can be described as operations rather than as changes to file content, such as a log of Structured Query Language (SQL) commands to a relational database or the change history tracked by OpenRefine (Hirst, 2013). However, these would require specific software and a strict (and nonstandard) workflow to make changes to the data.
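The contrast between line-based and cell-based change tracking can be sketched by diffing the same edits both ways. This is a hypothetical illustration and not an implementation of the Tabular Diff Specification: `cell_diff` matches rows by a key column and cells by column name, and for brevity it reports only modified cells, not added or removed rows or columns.

```python
import csv
import difflib
import io


def read_table(text, key):
    """Parse CSV text into {key value: row dict}."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def cell_diff(old_text, new_text, key):
    """List (key, column, old, new) for cells that changed in rows present in both."""
    old, new = read_table(old_text, key), read_table(new_text, key)
    changes = []
    for k in old.keys() & new.keys():
        for column in old[k].keys() & new[k].keys():
            if old[k][column] != new[k][column]:
                changes.append((k, column, old[k][column], new[k][column]))
    return sorted(changes)


def changed_lines(old_text, new_text):
    """Count lines a line-based diff marks as added."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return sum(1 for line in diff if line.startswith("+ "))


old = "point_id,thickness,uncertainty\n1,100,5\n2,120,6\n"
swapped = "point_id,uncertainty,thickness\n1,5,100\n2,6,120\n"  # columns reordered
edited = "point_id,thickness,uncertainty\n1,100,5\n2,125,6\n"  # one cell changed

print(changed_lines(old, swapped), cell_diff(old, swapped, "point_id"))
print(changed_lines(old, edited), cell_diff(old, edited, "point_id"))
```

Reordering the columns rewrites every line of the file, yet no cell value changes; a single-cell edit rewrites one whole line, whereas the cell-based view reports only the one value.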

Conclusions
The Glacier Thickness Database (GlaThiDa) has been established as the international data repository for glacier ice thickness observations. Version 3 contains standardized data for roughly 3000 glaciers worldwide, collected from in situ and airborne measurements. Overall, 14 % of global glacier area is now within 1 km of a thickness measurement (located on the same glacier), although large regional gaps persist, especially in Asia, the Russian Arctic, and the Andes. Thanks to simple metadata formats and a development environment based on open-source software, GlaThiDa fulfills the FAIR principles (Findable, Accessible, Interoperable, and Reusable for both machines and people) and goes beyond them with automatic version control, continuous validation, and an interface for community dialogue. Hosted by the World Glacier Monitoring Service (WGMS), GlaThiDa will continue to serve the glaciological community as a trustworthy dataset.