the Creative Commons Attribution 4.0 License.
Global Water Vapor Stable Isotope Dataset
Abstract. Stable isotopes in atmospheric water vapor (reported as δ²H and δ¹⁸O relative to VSMOW) provide valuable constraints on moisture sources, transport, and phase-change fractionation. Yet available observations remain fragmented across platforms, regions, and time periods, and cross-study comparison is often hindered by inconsistent metadata, calibration reporting, and quality-control practices. Here we compile and harmonize a global near-surface water vapor isotope dataset from three sources: the WaterIsotopes Database (wiDB; http://wateriso.utah.edu/waterisotopes), PANGAEA (https://www.pangaea.de), and peer-reviewed literature. The dataset spans 1981–2021 and contains 87,138 records from 112 sites/platforms. We standardized coordinates to WGS84, timestamps to UTC when possible, isotope units to per mil (‰) in delta notation, and compiled measurement metadata (instrument, method, and model where explicitly reported; e.g., Picarro CRDS, LGR OA-ICOS, IRMS following cryogenic trapping, and satellite retrieval products). A transparent quality-control workflow was applied to identify duplicates, inconsistent metadata, and implausible or poorly documented values, while preserving traceability to original sources. The resulting product provides a consistent observational basis for model evaluation and for comparative studies of water vapor isotope variability across climates and observation strategies. The Global Water Vapor Stable Isotope Dataset is available at https://doi.org/10.6084/m9.figshare.30893984 (Zhu and Yang, 2025).
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-805', Anonymous Referee #1, 13 Apr 2026
- RC2: 'Comment on essd-2025-805', Anonymous Referee #2, 23 Apr 2026
This paper presents a database of the isotopic composition of near-surface water vapor. The initiative to create such a consistent and global database is very welcome.
I have identified significant errors in the database that are obvious to an isotope specialist. I am unable to check the accuracy of all the isotopic data reported in the database. It is absolutely essential that the authors review the database and provide more details about their methodology and the work done to ensure its accuracy. I suspect that the use of software designed to automatically read figures in published studies explains many of the errors in the sheet named “References data” in the Excel file.
In addition, I am concerned that the authors did not perform any quality control on the isotopic data reported in the database, which is in fact a simple data compilation (with wrong values in its current form) accompanied by the usual metadata.
Everything is explained and detailed in my review below, which stops at Section 3. I do not think there is any point in my going further with the review until the authors have checked the accuracy of the database.
As a consequence, I do not recommend publication of the database in its current form. I also have major, moderate and minor comments that should be addressed in a revised version of the manuscript.
Detecting errors in the database:
I was surprised by the outlier in deuterium excess shown in the lower panel of Figure 4 in the Sahel region. The red circle corresponds to a deuterium excess value higher than 90 per mil. Anyone experienced in analyzing near-surface water vapor isotope data is immediately drawn to this very high value.
I downloaded the Excel file available at https://doi.org/10.6084/m9.figshare.30893984.
I looked for that station in the database. I found the corresponding latitude and longitude in the “References data” sheet of the Excel file. It is the “Niamey” station listed in rows 4341 through 4399.
First, I did not understand why several isotopic values for this station appear in the database with the same date (in DD/MM/YY format): 3 values for 01/07/10, 17 for 01/09/10, 8 for 01/08/11, and so on. I noticed that many stations have this same significant problem in this sheet (Concordia station, EastGRIP ice core, Taipei, Liaodong Bay and so on). I also found the deuterium values of water vapor in the “delta_D” column for the Niamey station very unusual, while the oxygen-18 values lie in the expected range.
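A repeated-date check of this kind can be sketched in a few lines. This is only an illustration, not the authors' workflow: the record layout (one `site` and one `date` string per row) and the sample rows are hypothetical and are not taken from the Excel file.

```python
from collections import Counter

def repeated_dates(records, min_count=2):
    """Return {(site, date): n} for site/date pairs occurring at least min_count times."""
    counts = Counter((r["site"], r["date"]) for r in records)
    return {key: n for key, n in counts.items() if n >= min_count}

# Hypothetical rows for illustration only (not values from the dataset).
rows = [
    {"site": "A", "date": "01/07/10"},
    {"site": "A", "date": "01/07/10"},
    {"site": "A", "date": "02/07/10"},
    {"site": "B", "date": "01/07/10"},
]
print(repeated_dates(rows))
```

A non-empty result does not prove an error (sub-daily data stored with a date-only stamp would also trigger it), but it points to rows whose time information needs checking against the source.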
In any case, I calculated deuterium excess based on the deuterium and oxygen-18 values in this Excel file and indeed found values higher than 90 per mil for deuterium excess. So I went and read the referenced article: https://doi.org/10.1002/2013JD020968, hereafter Tremoy et al., 2014.
First, I was unable to find deuterium values in water vapor in Tremoy et al., 2014. I assumed the authors used both the oxygen-18 and the deuterium excess values in water vapor to derive the deuterium values. Why not? But the authors should say so.
Second, I noticed that the deuterium excess values in water vapor reported by Tremoy et al., 2014 begin in mid-2011 and vary between -10 and 25 per mil (Figure 1 in Tremoy et al., 2014). I therefore wondered how the database could contain deuterium values for the period from July 2010 to August 2011 (rows 4341 to 4360), as well as such high deuterium excess values.
My feeling is that the authors likely used software designed to automatically read figures such as Figure 1. Such software is probably not capable of distinguishing between deuterium values (labeled “D”) and deuterium excess values (labeled “d”). It likely also failed to distinguish the isotopic composition of rain from that of water vapor in Figure 1 of Tremoy et al., 2014. Consequently, I assume that the “delta_D” column in the database contains not the deuterium isotopic composition of water vapor but the deuterium excess values in rain.
I also guess that the software misunderstood the labels on the x-axis of Figure 1 in Tremoy et al., 2014, which would explain why several isotopic values are reported with similar dates.
I then wondered what oxygen-18 data had been included in the database. Indeed, Figure 1 in Tremoy et al., 2014 presents data averaged over 15 minutes and 24-hour moving averages. How did the software interpret these values? And so, what do the oxygen-18 values in the database correspond to?
I give this example to help the authors identify errors. I am waiting for the authors to explain in their response how they corrected the error I found at one station—an error that most likely occurred at other stations as well. Indeed, the authors must be very specific in their response and in the revised version of the manuscript regarding the method used to extract data from published figures. We need to have guarantees on the database accuracy.
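The check described above can be reproduced with the standard definition of deuterium excess, d = δD - 8 × δ¹⁸O, and its inverse. The sketch below is a minimal illustration: the function names, the plausibility bounds, and the sample values are assumptions chosen for the example, not values from the dataset or from Tremoy et al., 2014.

```python
def d_excess(delta_d, delta_18o):
    """Deuterium excess in per mil: d = deltaD - 8 * delta18O."""
    return delta_d - 8.0 * delta_18o

def delta_d_from_d_excess(d, delta_18o):
    """Invert the definition: deltaD = d + 8 * delta18O."""
    return d + 8.0 * delta_18o

def flag_d_excess(delta_d, delta_18o, low=-20.0, high=50.0):
    """True if d lies outside [low, high].

    The default bounds are hypothetical and should be set per site or region.
    """
    d = d_excess(delta_d, delta_18o)
    return d < low or d > high

# Hypothetical values for illustration only.
print(d_excess(-80.0, -11.0))       # 8.0
print(flag_d_excess(20.0, -10.0))   # True: d = 100 per mil
```

Applied to every row, a screen of this kind would surface values such as a deuterium excess above 90 per mil before they reach a figure.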
Major comments:
(1) The database presented here is by no means complete. This shortcoming makes the database presented here interesting but less useful than it could be. I recommend expanding the database in two ways:
(i) The authors set 2021 as the end date of the database. This choice seems arbitrary and also quite curious. The growth in near-surface water vapor isotope data is a recent phenomenon, driven in particular by the WS-CRDS technique. By making this choice, the authors have excluded a large number of studies published over the past five years.
(ii) It is difficult to understand why only the PANGAEA repository is considered. For example, the Zenodo repository is increasingly used by authors and should be included here. I recommend expanding the data search to include the most common repositories.
(2) After reading Section 2-2 (and Section 2-3), I understand that the database is actually a compilation of existing data, and that the authors did not perform any quality control on the isotopic measurements. For each measurement method, the authors mention important caveats and possible sources of uncertainty (also in Section 3-3). The authors should go further by proposing a data quality control process leading either to the exclusion of certain studies or to a ranking system using flags. The authors caution database users and encourage them to carry out this quality work themselves (Sections 2-2-4 and 3-3). However, it seems to me that the value of a database lies not only in the compilation of data but also in providing data whose quality has been assessed and quantified.
To illustrate my point, I refer for example to Section 2-2-1. The authors mention uncertainties that could be related to the humidity dependence of isotopic measurements and to instrument drift. This is absolutely correct: these issues need to be addressed and corrected, which is done in most studies. Where this is not the case, it is the responsibility of the authors of this manuscript to say so or to exclude these data. The same applies to studies that do not use appropriate tubing or do not heat it to prevent condensation: the authors of the database are required to flag or exclude these data. Finally, I do not understand what the authors mean by “calibration-scale transfer to VSMOW” as a potential source of uncertainty.
(3) It makes no sense to report isotopic values in the database with so many decimal places, given the measurement uncertainties.
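Comments (2) and (3) could translate into something like the following sketch: a per-record quality flag and rounding to a stated precision. Everything here is an assumption made for illustration. The three metadata fields, the 0/1/2 flag scale, and the number of decimals passed to `round_delta` are hypothetical and are not taken from the manuscript or the dataset.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical metadata fields, named after the issues raised in comment (2).
CHECKS = ("humidity_corrected", "drift_corrected", "heated_inlet")

def qc_flag(meta):
    """0 = all checks documented as done, 1 = some of them, 2 = none documented."""
    passed = sum(1 for key in CHECKS if meta.get(key) is True)
    if passed == len(CHECKS):
        return 0
    return 1 if passed > 0 else 2

def round_delta(value, decimals):
    """Round a delta value (given as a string) to the requested number of decimals."""
    quantum = Decimal(10) ** -decimals  # decimals=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)

# Hypothetical record for illustration only.
print(qc_flag({"humidity_corrected": True, "drift_corrected": False}))  # 1
print(round_delta("-95.123456", 1))  # -95.1
```

Passing values as strings avoids binary floating-point artefacts when rounding; the choice of decimals per isotope should follow the precision reported by each source study.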
Moderate comments:
(1) Much of the introduction promotes water stable isotopes as a useful tool for studying various issues related to the atmospheric water cycle at the local, regional, or global scale. This is certainly true and, as the authors note, further progress is needed to fully understand the processes controlling the isotopic composition of water vapor. A consistent and comprehensive database will certainly be useful and necessary for making progress, but it will not be sufficient on its own to identify and distinguish all the processes altering the isotopic composition of atmospheric water, given the advances made on this subject over the last decade (see for example the recent paper by Bailey et al. (2025), DOI 10.1088/2752-5295/ada17b). In addition, objective 2 stated at the end of the introduction is not met in the paper. Section 3-2 serves basically to check the consistency of the database with what we expect from previous studies in terms of temporal and spatial variations of the isotopic composition of water vapor, not to identify processes.
Thus, the introduction needs to be reworded accordingly starting from line 71.
(2) Section 3-1 is not very useful in its current form. The geographic distribution of the data is shown in Figure 1, and it is not particularly interesting to look at the data distribution among the three sources in Figure 3b (it is sufficient to mention this in the text). However, some information is missing in this section. The authors should mention the temporal resolution (hourly? daily? other?) of the 87,138 measurements. Information on the length of the time series is also missing and could be presented as a histogram in a new Figure 3b.
(3) Section 3-3 should be moved to the methods section. I do not think that the entire section is useful. Lines 310-312: it is difficult to speak of a “unified calibration” for a database that is actually a compilation of existing data, whose only merit is to gather all data in a single Excel file with the usual metadata.
(4) I recommend citing previous work carefully: several important and recent references are missing, and some are irrelevant (I provide a few examples below).
(5) Important work is needed to clarify figures and their captions.
- Figures 1 and 2 are not cited in the text.
- Figure 1: Please label the 3 categories of data as in the text: wiDB, literature and PANGAEA.
- Figure 2: The data process shown here is difficult to link with the text of Section 2. For example, how do the authors answer the step “Are the data correct?”
- Figure 3:
(i) Usually, time runs from left to right.
(ii) Figure 3b is not useful, see comment above.
- Figures 4 and 5 have to be redone after the database has been checked.
Figure 4:
(i) Please mention in the caption the lower panel relative to deuterium excess.
(ii) Please clarify in the caption how each circle is obtained. Is it an annual average calculated over the duration of each time series, taking into account only full years? How can you eliminate the potential long-term trend due to climate change on isotopic records when comparing older records (e.g., 1981–1983) with recent records (e.g., 2019–2021)?
(iii) Please increase the font size of the color bar and bound the latter with a lowest and a highest value instead of thresholds (> or <).
Figure 5 :
(i) Please mention the number of data used for each boxplot in the caption or on the figure.
(ii) How did the authors define the seasons? It is better to mention the months such as for example DJF or JJA.
(iii) How did the authors define “tropical”, “arid”, “temperate”, “continental” and “polar” regions? Please clarify in the caption.
(iv) Please state in the caption which statistical values are shown on the boxplots.
(v) Please add deuterium excess on the figure.
Minor comments line by line up to section 3-2:
- In several places, the semicolon should be replaced with a period.
- Abstract, line 25: Please mention the data frequency.
- Lines 52-55: You should also mention the recent overview by Bailey et al. (2025), DOI 10.1088/2752-5295/ada17b. In general, some important and recent references on water vapor stable isotopes are missing.
- Line 60: You should cite the recent work by Bong et al. (2025), DOI 10.1029/2025JD044985.
- Lines 60-63: Probably right but very subjective and vague. In addition, the reference is not appropriate because it refers to CO2 isotopes.
- Line 67: What do you mean by “re-evaporation”? Please use “evaporation of”.
- Lines 71-75: Please reformulate. A comprehensive database alone will not be sufficient to separate all effects that can change the isotopic composition of atmospheric water. Indeed, a consistent database will help but a hierarchy of atmospheric models will also be necessary.
- Lines 75-77: Unclear, please reformulate.
- Lines 78-80: Processes leading to high values of both the isotopic composition and the deuterium excess of water vapor (and consequently of rain) in arid low-latitude regions, as well as to lower values at high latitudes, are fairly well understood. Please reformulate.
- Line 105: What do you mean by “high-resolution”?
- Line 116: Please discuss the temporal resolution of the data set (hourly? daily?).
- Line 214: Please clarify and quantify what you mean by “implausible values”. For example, a deuterium excess value higher than 90 per mil in near-surface water vapor in the Sahel should have been identified with your method.
- Lines 246-247: Unclear. This is relevant to Section 2.
- Lines 246-250: I do not understand. Figure 4 shows the annual average of isotopic composition at each site, doesn’t it? (see also my comment about Figure 4 above).
Citation: https://doi.org/10.5194/essd-2025-805-RC2
Data sets
Global Water Vapor Stable Isotope Dataset Dongfei Yang et al. https://doi.org/10.6084/m9.figshare.30893984
Review of "Global Water Vapor Stable Isotope Dataset" Yang et al., submitted to ESSD
The authors describe a dataset of water vapour isotope measurements that they claim has been compiled from existing literature and published datasets. A brief check of the data therein, including publications to which this reviewer contributed, shows that, for example, precipitation data are being misrepresented as vapour data, and that other essential metadata have also been lost during the compilation process. While I think it is in general useful to work on homogenising and compiling existing datasets, the present study lacks scientific rigour, and even basic quality control, to an extent that in my opinion does not meet the requirements for the study to be publishable in ESSD. The accompanying dataset should be retracted or at least revised to exclude all erroneous data points. My detailed comments are included below.
In the introduction, the authors state that they provide compiled water vapour isotope data, but their dataset, apparently without noticing, lumps together precipitation and vapour measurements. Due to isotope fractionation between vapour and liquid, this error is so severe that the dataset becomes useless when the two data categories are mixed together, and will even lead to wrong conclusions if used by others. Examples from their dataset include, but are not limited to data points described in Chazette et al., (2021), as well as from the dataset described in the publication by Seidl et al. (2024).
The stable isotope values in the data file are often given with 5 or more digits, far beyond any reasonable measurement precision, in particular for the delta D where it is common to report with 1 digit precision. It is unclear how these data points were derived, but they are not provided in the referenced publications. Possibly the data points were digitised from publication figures? If so, this must be specified in the methods, but see below.
It is never explained exactly how the data from references were obtained, and from point (2) above, it seems that they were digitised from figures in existing publications. This reviewer finds it worrying that the authors did not simply contact the corresponding authors of the studies they wanted to compile data from, which should be the first step before considering digitising and re-publishing data from published papers. When data have indeed been digitised from existing publications, resulting in a different precision, questions of copyright compliance and data validity need to be addressed.
Traceability and crediting of scientific works: The authors claim to have used very many sites ("and 4,421 literature-derived records from 85 observation sites") to source their data, but the data set only contains data from 31 papers. It should be no problem space-wise, and would give due credit, to explicitly cite all these studies in a summary table in the main manuscript, for example by listing the time range and number of samples of each paper. The website waterisotope.org is just a portal, so here the authors should still cite the original sources of the studies that produced the data.
The data file that the authors compiled lacks essential metadata. Only a single time is given per data point, so it is unknown what time interval this data should represent. The description in the paper is inconsistent with what is provided in the data file. The authors state that they computed the d-excess, but this information is not included in the data tables. Also information about instrument type and method are not included in the data tables, contrary to what is said in the manuscript.
This reviewer notes that the set-up and figures in this paper are very similar to those of a study on surface water samples that was published last year in ESSD (Li et al., 2025). The similarity to that study would need to be pointed out and addressed in the manuscript.
Fig. 3a has a reversed time axis.
The table in the appendix contains the same errors as in the data set regarding the mix-up between vapour and precipitation data noted above.
Figure 2 contains a "Calibration and consistency" step - it is not explained what would be done here. All published data that is referenced were already calibrated, why should this happen at the end of merging the datasets?
Since the dataset lumps together vapour and precipitation data, Fig. 4 and Fig. 5 are meaningless. Fig. 4c shows a suspiciously high d-excess at geographical coordinates 0°N, 0°E, which is possibly an artefact of missing geographic coordinates.
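A coordinate screen that would catch a point plotted at 0°N, 0°E can be sketched as follows. This is an illustration under stated assumptions: the function name, the returned reason strings, and the sample coordinates are hypothetical, and the check only marks rows for manual inspection.

```python
def coordinate_issue(lat, lon):
    """Return a short reason string if the coordinates look suspect, else None."""
    if lat is None or lon is None:
        return "missing"
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return "out of range"
    if lat == 0.0 and lon == 0.0:
        # (0, 0) is a common placeholder when missing values are read as zero.
        return "zero coordinates"
    return None

# Hypothetical coordinates for illustration only.
for lat, lon in [(13.5, 2.1), (0.0, 0.0), (None, 2.1), (95.0, 10.0)]:
    print((lat, lon), coordinate_issue(lat, lon))
```

A genuine measurement at exactly 0°N, 0°E is possible in principle (for example from a ship), so such rows should be verified against the source rather than deleted automatically.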
References
Li, R., Zhu, G., Chen, L., Qi, X., Lu, S., Meng, G., Wang, Y., Li, W., Zheng, Z., Yang, J., and Gun, Y.: Global Stable Isotope Dataset for Surface Water, Earth Syst. Sci. Data, 17, 2135–2145, https://doi.org/10.5194/essd-17-2135-2025, 2025.