the Creative Commons Attribution 4.0 License.
A comprehensive 22-year global GNSS climate data record from 5085 stations
Abstract. This work presents a comprehensive global GNSS climate data record derived from 5085 stations, spanning the 22-year period 2000–2021. Generated through the GPAC-Repro campaign, the dataset utilises state-of-the-art processing methodologies and precise products from the International GNSS Service (IGS) Repro-3 initiative. The dataset includes high-quality hourly estimates of Zenith Total Delay (ZTD) and Precipitable Water Vapour (PWV), offering improved accuracy and spatiotemporal coverage. A rigorous data screening and quality assessment framework was implemented, including formal error detection, offset identification, and extensive cross-validation with the ERA5 reanalysis dataset, radiosonde profiles, and Very Long Baseline Interferometry (VLBI) measurements. Collectively, these efforts ensured the consistency, accuracy, and homogeneity of the dataset. In addition, diurnal, monthly, and annual variations in ZTD and PWV have been analysed to evaluate and demonstrate the dataset's feasibility for monitoring climate variability, atmospheric circulation, and weather extremes. The insights provided by the dataset address critical data gaps in global climate observing systems and provide a robust foundation for advancing climate research and applications. Representing a significant milestone in GNSS climatology, this dataset serves as a vital resource for the scientific community, supporting improved understanding of atmospheric processes and more effective responses to climate-related challenges.
Status: open (until 25 Aug 2025)
-
RC1: 'Comment on essd-2025-283', Anonymous Referee #1, 25 Jul 2025
General Comments
Overall I find that the dataset is of general interest to the community and should be published.
The first four sections are of the highest priority, whereas the "Further Analysis" section mainly describes applications for future users of the dataset. My attention is therefore focused on the first sections, to ensure a clear and understandable description and thereby increase the number of users of (and citations to) the dataset.
I have a small problem with the title because it gave me the impression that all 5085 stations have 22-year-long time series. In fact, the majority of the stations have significantly shorter time series. What about something like: "A global GPS climate data record for 5085 stations covering up to 22 years"?
Early in the manuscript you make clear that only GPS data are used. Later on, however, you very often refer to GNSS data. I think that is okay only when you refer to GNSS data in general terms, not when you discuss your dataset. Furthermore, in the summary section you may want to discuss the future use of GNSS (rather than just GPS), including its pros and cons.
Specific comments
line 79: You state that a shortcoming of previous studies is that the length of the time series is only around 10 years. This is certainly true, and climate scientists often use 30-year averages, which means that the time series in your dataset suffer from the same shortcoming. In fact, many of the sites in the dataset have time series of about 10 years or less. I think it is fair to state this. The decision to use 30-year averaging periods was taken by IMO (now WMO) at a congress in 1935. Also, at line 106 you give the false impression that all time series are 22 years.
line 89: later in the manuscript you also mention changes of hardware, such as antennas and radomes. It can be meaningful to mention these also here.
line 127: "Following a rigorous data screening process, 95 sites were excluded due to identified issues with the atmospheric results, leading to a final dataset comprising 5085 GNSS stations." It will be helpful to list these 95 sites, not in the manuscript but in the data archive, together with the reason for excluding each site. I searched for a specific IGS site with a long time series but could not find it among the 5085 included sites. Perhaps it is among the 95 excluded sites? In any case, it will be helpful for future users of GNSS PWV to be aware of problem sites.
line 162: Please motivate why you chose such a low cutoff angle. For example, when searching for trends, varying systematic errors are very important and should be reduced. Multipath is an effect which gets worse closer to the horizon, and this is especially true if the horizon mask changes over time.
line 165: "a 27-hour time window was adopted". How were these 27 hours defined? In the paper by Dousa et al. (which you refer to) I do not find this specific value, only that three days were combined and the atmospheric estimates from the middle day were then selected.
line 175+: Why did you choose this rather complicated procedure to derive the ZHD? Other studies I have seen use the Saastamoinen model, with the ground pressure and the site position (for the gravity correction) as input parameters, which I have assumed is sufficiently accurate for the ZWD retrieval. Did you examine whether there were any differences that motivate your choice?
Section 3.3: Why do you wait until this stage to remove unrealistic negative values of the PWV? You do not need reference data for this action.
When you remove outliers by comparing with reference data you assume that the reference data are correct. I think that needs to be discussed. From my point of view, one application of GPS/GNSS PWV is to use it as an independent dataset in order to identify problems in other datasets, such as the ERA5.
Table 2: The observing periods of the VLBI stations are much longer. I understand that you do not pick up data before the start of the GPS time series, but there should be data after 2018? Please explain.
line 420: How do you define a robust agreement? For which value of the STD is it no longer robust?
Section 4.3: It is not clear to me what action was taken after finding these changepoints. Did you modify the PWV time series in the data archive, or not? Also in this case, are you sure that some detections of changepoints are not due to problems with ERA5?
A related question is whether you searched for changepoints in the ZHD. If such changepoints exist they will indirectly cause a corresponding jump/offset in the PWV.
Technical Corrections
I find that the font size in all figures is unnecessarily small. The size could in general be, say, 50-100 % larger in order to improve readability.
line 11: GPAC is not explained
line 34: please explain "data gaps". Are the gaps temporal or spatial or both?
line 53: changes in signals --> changes in the arrival time of the signals
line 57: With used together --> When used together
Figure 2: use more distinct colours for the different symbols, both in the (a) and in the (b) graph. In the present version it is impossible to see the differences; only the distribution of sites on the globe is clear. Also, I wonder if "data integrity" is identical to "data completeness" in the summary file downloaded from the archive? If so, use the same expression in both places.
lines 137, 141: Because you use British English spelling for "vapour" it will be consistent to write "colour".
lines 139, 303, 309, 406, 420, 427, 468, 492, 499, 607: there shall be a space between value and unit according to SI standards.
Table 1: If you include the citations in the strategy column you can shorten the running text significantly.
lines 176, 190, 328: Units shall not be in italics
Eq. (4): it will be clearer if it is split into two equations. Furthermore, the cosine function shall not be written in italic font.
line 205: improve the contrast between font and background colour.
Figure 4: Also in this figure it is difficult to see the different colours.
line 239: These stations are referred to as MDO1, MDO2, and MDO3 in Figure 5 (not MGxx). Furthermore, when you show examples in the manuscript I think it would be informative to add where the stations are located. It will save the work of going into the data archive.
line 292: "with 95 problematic sites" indicates that these are included among the 5085 sites. Please rewrite.
Figure 11: This figure is also difficult to get any useful information from. Are all these radiosonde stations used? I mean, is there a GPS site close enough to each? Perhaps increase the size of the graph and make the VLBI symbols larger? Or delete the figure?
Figure 14: The quality should be improved so that it is clear where the GPS sites are located. Perhaps these graphs are not needed either? Everyone interested surely knows the topography of Hawaii and the Andes.
lines 406-407: Is this true solar time, local time, or UT?
line 464: You can add "radomes" here.
line 491: Equator --> equator
Figure 18: Improve the contrast between the symbols, e.g., light green for daily values and black for monthly values.
Figures 20 and 21: Colour only areas around the GPS sites, say with a radius of 10-20 km?
line 571: Northern Hemisphere --> northern hemisphere
Citation: https://doi.org/10.5194/essd-2025-283-RC1
RC2: 'Comment on essd-2025-283', Anonymous Referee #2, 09 Aug 2025
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-283/essd-2025-283-RC2-supplement.pdf
-
CC1: 'Comment on essd-2025-283', Olivier Bock, 13 Aug 2025
Thank you for presenting this new GNSS dataset, which holds potential for climate studies. I found the manuscript well structured and well written, and the methodology sound. However, I would like to share a few comments and questions regarding some of the methodological choices and propose suggestions to further enhance the QC/QA process and overall quality of your dataset.
1) Section 2.2 GNSS data processing: While you mention adhering to the highest standards in your study, it is worth noting that the analysis was conducted using Bernese GNSS Software version 5.2. Since 2022, version 5.4 has been available, introducing several improvements. These include enhanced observation (RINEX) quality control and preprocessing, improved ambiguity resolution for PPP, and updated tropospheric models such as VMF3. Considering these advancements might further strengthen the robustness and quality of your dataset.
2) Section 2.3 Retrieval of PWV: the calculation of ZHD from the numerical integration of refractivity is not recommended (Jones et al., 2020, chap. 5.4.2), especially when only 37 pressure levels are available, as is the case with ERA5. Instead, the Saastamoinen formula should be used with surface pressure, which can still be computed from ERA5 (with adequate interpolation between levels). This is the approach actually used by Haase et al. (2003) for their GNSS ZTD to IWV conversion. Note that they use only Eq. (2) for the integration of radiosonde profiles, which have many more vertical levels.
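For readers less familiar with the recommended alternative, a minimal sketch of the Saastamoinen ZHD from surface pressure might look as follows (illustrative only; the constants are the standard published ones, the function name and test values are mine):

```python
import math

def saastamoinen_zhd(p_hpa, lat_deg, h_m):
    """Saastamoinen zenith hydrostatic delay (m) from surface pressure (hPa),
    latitude (deg) and station height (m). The denominator is the usual
    gravity correction: 1 - 0.00266*cos(2*lat) - 0.00028*h_km."""
    f = 1.0 - 0.00266 * math.cos(2.0 * math.radians(lat_deg)) - 0.00028e-3 * h_m
    return 0.0022768 * p_hpa / f

# Standard sea-level pressure at 45 deg latitude gives a ZHD of roughly 2.31 m.
print(round(saastamoinen_zhd(1013.25, 45.0, 0.0), 3))
```

The only site-dependent input beyond pressure is the latitude/height pair entering the gravity correction, which is why surface pressure (here assumed to come from ERA5 with adequate vertical interpolation) is sufficient.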
3) Section 3.2 Screening based on GNSS-ZTD results only: proper credit should be given to Bock, 2020, who introduced the general approach and the methodology to choose range-check and outlier check limits for GNSS ZTD and formal errors which are actually followed here.
To clarify the purpose and usage of the formal errors in the screening process, it may be judicious to move Section 4.1 here.
4) Line 225 to 260: referring to systematic biases may be misleading here. Referring to the “consistency of ZTD estimated from collocated GNSS sites” as you do later on (Line 256) seems more precise.
I have one concern here with the impact of height differences. As you mention on line 238, "discrepancies … are often attributed to height differences". This effect is actually expected. A simple rule-of-thumb approach predicts a bias in ZHD of around 10 mm, and of a few mm in ZWD as well, for a 50 m height difference. Such biases can be avoided by applying a proper vertical correction such as described in Bock et al., 2022. Following this approach can be very valuable for detecting station-specific biases when several nearby stations are available.
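As a quick back-of-the-envelope check of this rule of thumb (a sketch under assumed values — a nominal scale height and sea-level pressure — not the Bock et al., 2022 correction itself):

```python
import math

H_SCALE = 8400.0   # assumed surface pressure scale height (m)
P0 = 1013.25       # assumed sea-level pressure (hPa)
DH = 50.0          # height difference between nearby sites (m)

# Barometric estimate of pressure at the higher site,
# then the Saastamoinen ZHD difference (0.0022768 m per hPa), in mm.
p_upper = P0 * math.exp(-DH / H_SCALE)
dzhd_mm = 0.0022768 * (P0 - p_upper) * 1000.0
print(f"ZHD difference over {DH:.0f} m: {dzhd_mm:.1f} mm")
```

This crude estimate lands at the centimetre level for 50 m, consistent with the ~10 mm figure quoted above, and shows why uncorrected height differences dominate collocated-site comparisons.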
Another concern is the impact of equipment changes, which can mask short-term site-specific biases when statistics are computed over long periods. This should be mentioned here; it also underlines the importance of the offset detection that is discussed later.
Line 235 to 250: the discussion employs the terms "bias", "deviations" and "differences"; are these referring to mean values? Please clarify. Separating the mean and the standard deviation of the differences, rather than using the RMS (which mixes both), would also give more insight into the nature of the differences.
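The decomposition behind this suggestion is exact: rms² = mean² + std² when the population standard deviation is used, so the RMS conflates the systematic part (mean) with the random part (std). A small numerical illustration with hypothetical difference values (not taken from the manuscript):

```python
import statistics

# Hypothetical GNSS-minus-reference differences (mm).
diffs = [2.1, 1.8, 2.5, 1.6, 2.0, 2.4]

mean = statistics.fmean(diffs)
std = statistics.pstdev(diffs)  # population std, so rms**2 == mean**2 + std**2 exactly
rms = (sum(d * d for d in diffs) / len(diffs)) ** 0.5

# The RMS mixes bias and scatter; reporting mean and std separately
# distinguishes a systematic offset from random noise.
assert abs(rms ** 2 - (mean ** 2 + std ** 2)) < 1e-12
print(f"mean={mean:.2f} mm, std={std:.2f} mm, rms={rms:.2f} mm")
```

Here the RMS is dominated by the bias, a fact invisible if only the RMS is reported.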
Line 260: “additional quality control” would better fit here in place of “additional screening”.
5) Section 3.3 Screening based on comparison with reference PWV data: eliminating the negative IWV values may not be sufficient and may actually be hiding a more general bias between GNSS and ERA5 (possibly with seasonal variation). To avoid this caveat, it is preferable to compare ZTD values (as also recommended in Jones et al., 2020, chap. 5.4.1). Then you would probably notice a bias and decide to remove the flawed stations or find that representativeness errors in ERA5 are the limitation.
6) Line 274-292: I don't understand the rationale of comparing GNSS PWV at a target station to ERA5 PWV at nearby stations. Here you mix two types of errors: GNSS vs. ERA5 (different data sources) and target vs. nearby (differences due to distance). If this procedure is inspired by Nguyen et al., 2024, it is actually not relevant here. Please clarify or correct.
Figure 8: this PWV difference series is really suspect. It looks like the ERA5 values in your GNSS – ERA5 differences are very close to zero. Please check.
7) Section 4.1 Formal errors in ZTD estimations: consider moving this Section to Section 3.2.
Give the % of the CDF corresponding to a formal error of 10 mm, which is the limit used for the range check in Section 3.2.
Line 299: “The formal errors of the estimated ZTD are known to play a key role in analysing the quality of GNSS”. Although formal errors may help in the QC/QA of GNSS ZTD estimates (Bock, 2020), it is an overstatement to say that they play a key role. Please moderate.
8) Section 4.2 Cross-comparison of PWV with external references
Line 232: the description of ERA5 (number of pressure levels, horizontal resolution, etc.) should be given earlier, e.g. in Section 2.3 when ERA5 is first introduced and used.
Line 383 and 405 + Line 229 (Section 3.2): explain why you chose three different collocation limits for these comparisons (GNSS vs GNSS, GNSS vs. VLBI, and GNSS vs. RS).
9) Units for ZTD and PWV comparisons: use mm for ZTD and kg/m² for PWV to avoid confusion.
I am wondering whether some of the published GNSS vs. VLBI comparisons are cited in the wrong unit (e.g. large biases and RMS values).
10) Section 4.3 Offset detection: my main concern here is that you applied the detection to a subset of stations only (less than 50% of all your stations). Considering the use of this dataset for climate studies (e.g. trend analysis), this is a serious drawback. Please explain why not all stations have been checked and whether you intend to further complete this.
Another couple of questions concern the confidence that can be placed in the PMTred method and the validation of the detected change-points. First, the number of change points found with this method seems a little underestimated compared to studies using other methods (Venema et al., 2012; Van Malderen et al., 2020; Nguyen et al., 2021). Second, the validation with GNSS metadata is not robust, as some equipment changes may be missing from the metadata, and not only equipment changes but also environmental changes are suspected to produce offsets. In addition, the numerous firmware changes are not expected to have a significant impact and instead lead to accepting many false detections. To overcome these limitations, it may be advisable to cross-compare the results from different detection methods and to implement a more robust validation method, e.g. based on multiple pairwise comparisons (Caussinus and Mestre, 2004; Menne and Williams, 2009; Nguyen et al., 2024).
Regarding alternative methods, Van Malderen et al., 2020, evaluated several of them in a similar context to the present study. Their benchmark study showed that PMTred, which is test-based, does not perform well with the type of data used here (GNSS minus reanalysis differences). The best methods are indeed based on penalized maximum likelihood. One of them was first published under the name GNSSseg by Quarello et al., 2022, and recently updated and renamed PMLseg by Nguyen et al., 2025. This method uses penalized maximum likelihood and was specially developed for the segmentation of GNSS minus reanalysis differences. The authors may find it interesting as an alternative, or simply for cross-checking their PMTred results.
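To give a flavour of this family of methods (a toy sketch, not the actual GNSSseg/PMLseg algorithms, which add noise modelling and principled penalty selection): penalized segmentation minimises a fit cost plus a penalty per change point, solved exactly by dynamic programming over piecewise-constant means.

```python
def segment_means(x, penalty):
    """Toy penalized least-squares segmentation of a 1-D series into
    piecewise-constant means: minimise residual sum of squares plus
    `penalty` per change point, by exact O(n^2) dynamic programming."""
    n = len(x)
    # Prefix sums of x and x^2 give O(1) segment cost evaluation.
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):  # residual sum of squares of x[i:j] around its mean
        m = (s[j] - s[i]) / (j - i)
        return (s2[j] - s2[i]) - (j - i) * m * m

    best = [0.0] * (n + 1)  # best[j]: optimal penalized cost of x[:j]
    last = [0] * (n + 1)    # last[j]: start index of the final segment
    for j in range(1, n + 1):
        best[j], last[j] = min(
            (best[i] + cost(i, j) + (penalty if i > 0 else 0.0), i)
            for i in range(j)
        )
    # Backtrack the change-point positions.
    cps, j = [], n
    while j > 0:
        if last[j] > 0:
            cps.append(last[j])
        j = last[j]
    return sorted(cps)

# A clean step of +1 at index 20 is recovered as a single change point.
print(segment_means([0.0] * 20 + [1.0] * 20, penalty=1.0))
```

The published methods differ mainly in how the penalty is calibrated and in handling heteroscedastic, autocorrelated noise, which is exactly where GNSS-minus-reanalysis series are difficult.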
11) Figures 20 and 21: the interpolation of 2D fields over the ocean looks very unrealistic. It may be preferable to mask the oceans in these figures.
Line 596: I was wondering if ENSO was not a more likely explanation for the interannual variability observed in these figures.
12) Section 7: Summary and Outlook: the text of this section seems to overstate a little the quality of the dataset given the mentioned limitations.
While the dataset could benefit from further homogenization and the use of the latest GNSS processing software, this work represents a valuable contribution that will enrich discussions and collaborations within the IAG community (e.g. within the ICCC joint working group C.8 on Optimal processing and homogenization of GNSS-PW climate data records).
References
Bock, O. (2020) Standardization of ZTD screening and IWV conversion, in: Advanced GNSS Tropospheric Products for Monitoring SevereWeather Events and Climate: COST Action ES1206 Final Action Dissemination Report, edited by Jones, J., Guerova, G., Douša, J., Dick, G., de Haan, S., Pottiaux, E., Bock, O., Pacione, R., and van Malderen, R., chap. 5, pp. 314–324, Springer International Publishing, https://doi.org/10.1007/978-3-030-13901-8_5.
Bock, O., Bosser, P., and Mears, C. (2022) An improved vertical correction method for the inter-comparison and inter-validation of integrated water vapour measurements, Atmos. Meas. Tech., 15, 5643–5665, https://doi.org/10.5194/amt-15-5643-2022
Caussinus, H. & Mestre, O. (2004) Detection and correction of artificial shifts in climate series. Journal of the Royal Statistical Society Series C: Applied Statistics, 53(3), 405–425. Available from: https://doi.org/10.1111/j.1467-9876.2004.05155.x
Jones, J., Guerova, G., Douša, J., Dick, G., de Haan, S., Pottiaux, E., Bock, O., Pacione, R., and van Malderen, R. (Eds.) (2020) Advanced GNSS Tropospheric Products for Monitoring Severe Weather Events and Climate: COST Action ES1206 Final Action Dissemination Report, Springer International Publishing, Cham, Switzerland, https://doi.org/10.1007/978-3-030-13901-8.
Menne, M.J. & Williams, C.N. (2009) Homogenization of temperature series via pairwise comparisons. Journal of Climate, 22(7), 1700–1717. Available from: https://doi.org/10.1175/2008jcli2263.1.
Nguyen, K. N., Bock, O., & Lebarbier, E. (2024). A statistical method for the attribution of change-points in segmented Integrated Water Vapor difference time series. International Journal of Climatology, 44(6), 2069–2086. https://doi.org/10.1002/joc.8441
Nguyen, K. N., Bock, O., & Lebarbier, E. (2025) PMLseg: an R package for the segmentation of univariate time series based on penalized maximum likelihood. https://github.com/khanhninhnguyen/PMLSeg/tree/main
Quarello, A., Bock, O., & Lebarbier, E. (2022) GNSSseg, a statistical method for the segmentation of daily GNSS IWV time series, Remote Sensing, 14, 3379, https://doi.org/10.3390/rs14143379
Van Malderen, R., E. Pottiaux, A. Klos, P. Domonkos, M. Elias, T. Ning, O. Bock, J. Guijarro, F. Alshawaf, M. Hoseini, A. Quarello, E. Lebarbier, B. Chimani, V. Tornatore, S. Zengin Kazancı, J. Bogusz (2020). Homogenizing GPS integrated water vapor time series: benchmarking break detection methods on synthetic datasets. Earth and Space Science, 7, e2020EA001121. https://doi.org/10.1029/2020EA001121
Venema, V., Mestre, O., Aguilar, E., Auer, I., Guijarro, J. A., Domonkos, P., et al. (2012). Benchmarking monthly homogenization algorithms. Climate of the Past, 8, 89–115. https://doi.org/10.5194/cp-8-89-2012
Citation: https://doi.org/10.5194/essd-2025-283-CC1
Data sets
A comprehensive 22-year global GNSS climate data record from 5085 stations Xiaoming Wang et al. https://www.pangaea.de/tok/3945654965e0ab80bb82b695dda9426b3e7b597c