the Creative Commons Attribution 4.0 License.
Development of a global in-situ daily temperature dataset preferentially for 0000–2400 UTC from 1981–2024
Abstract. Large amounts of sub-daily temperature data are shared globally through the Global Telecommunication System (GTS) in near-real time and through international data exchanges. However, converting these data into a global daily temperature dataset with a uniform definition – especially for daily maximum (Tmax) and minimum (Tmin) temperatures – has proven challenging due to the independent observation schedules across the world. To address this issue, we developed a new method that decomposes sub-daily Tmax and Tmin records from the Integrated Surface Database (ISD) into finer intervals, subsequently reaggregating them into daily Tmax and Tmin based on prospective dateline. This new method increased the global daily Tmax and Tmin counts by 64 % and 45 %, respectively, compared to the original method, which relied on either two consecutive Tmax/Tmin records over 12 hours or a single Tmax/Tmin record over 24 hours. The Global Land Base Dataset-First Estimate Daily Data (GLBD-FED) was established for the period from 1981 to 2024, following corrections for misrecorded Tmax and Tmin and quality control. GLBD-FED includes daily maximum (Tmax), average (Tave), and minimum temperature (Tmin) from approximately 17,000 global sites, with daily data amounts reaching about 10,000 entries per day in the current decade. When compared to the Global Summary of the Day (GSOD) dataset, GLBD-FED exhibits more temperate performance over the last 40 years, showing a slightly lower daily Tmax (around -0.3 °C) and a higher daily Tmin (around +0.3 °C), with a nearly identical daily Tave (approximately +0.1 °C). GLBD-FED identifies records that could or almost could represent the highest or lowest temperature in the 24-hour period as daily Tmax or Tmin, while GSOD selects the highest or lowest records within the 24 hours. This variance in definitions, combined with different preferences for meteorological and airport report sources, contributes to these observed biases.
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-771', Anonymous Referee #1, 19 Mar 2026
- RC2: 'Comment on essd-2025-771', Anonymous Referee #2, 23 Mar 2026
Development of a global in-situ daily temperature dataset preferentially for 0000-2400 UTC from 1981-2024
Su Yang, Panmao Zhai, Xuebin Zhang, Zijiang Zhou
ESSD-2025-771
This work covers the creation of a new dataset based on NOAA's Global Summary of the Day (GSOD) dataset. It also forms an assessment of some of the data quality issues within the GSOD itself, and so would be an important addition to the literature. However, as a data product, it is a shame that the GLBD-FED dataset (or at least, this version) will be static with the change in the source data for the GSOD from ISD to GHCNh.
Most of my comments request clarifications or more information, which are detailed below. There are also some minor spacing and punctuation issues, which should be picked up during proofing at the latest.
Line 59: You are correct that until August 2025, the GSOD did rely on the ISD as its input data (which in some cases hasn't been well understood, and led to these datasets being treated as independent). However, with the release of the GHCNh and termination of updates to the ISD in 2025, it is important to indicate at which point you obtained the GSOD, and that the current basis may be different to the version used in your study.
Line 68: Including references here which discuss these temporal representation issues in precipitation data could be of use to readers.
Line 71: perhaps point forward to where these challenges are discussed
Line 81: [Also relevant to lines 233-35, 295-99] Somewhere it may be useful to note that, even with hourly sampling, the subdaily temperature observations in the ISD (which are quasi-instantaneous) will more often than not miss the "true" Tmax and Tmin values, because these do not occur at the "top" of the hour. Hence, estimates of Tmax from the hourly data will mostly be lower, and estimates of Tmin mostly higher, than the true values.
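To illustrate the sampling point above, here is a minimal sketch that assumes an idealised sinusoidal diurnal cycle whose peak falls between observation times. The mean, amplitude, and peak hour are hypothetical values chosen for illustration only; nothing here is taken from the ISD or from the manuscript.

```python
import math

# Hypothetical idealised diurnal cycle (all three values are assumptions).
TRUE_MEAN = 15.0       # degrees C
TRUE_AMPLITUDE = 10.0  # degrees C
PEAK_HOUR = 13.5       # peak falls half-way between two hourly observations


def diurnal_temperature(hour):
    """Instantaneous temperature at a given hour of the idealised day."""
    phase = 2.0 * math.pi * (hour - PEAK_HOUR) / 24.0
    return TRUE_MEAN + TRUE_AMPLITUDE * math.cos(phase)


def sampled_extremes(step_hours):
    """Max and min of instantaneous samples taken every `step_hours`, starting at 00."""
    n_samples = int(round(24.0 / step_hours))
    samples = [diurnal_temperature(i * step_hours) for i in range(n_samples)]
    return max(samples), min(samples)


true_tmax = TRUE_MEAN + TRUE_AMPLITUDE
true_tmin = TRUE_MEAN - TRUE_AMPLITUDE
hourly_tmax, hourly_tmin = sampled_extremes(1.0)
three_hourly_tmax, three_hourly_tmin = sampled_extremes(3.0)

print(f"true      Tmax={true_tmax:.2f}  Tmin={true_tmin:.2f}")
print(f"hourly    Tmax={hourly_tmax:.2f}  Tmin={hourly_tmin:.2f}")
print(f"3-hourly  Tmax={three_hourly_tmax:.2f}  Tmin={three_hourly_tmin:.2f}")
```

Under these assumptions the sampled Tmax is never above the true Tmax and the sampled Tmin is never below the true Tmin, and the gap widens as the sampling interval grows. Real diurnal cycles are less smooth, so the actual underestimation may well be larger.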
Line 84: why did you choose this time range? There is a big uptick in the number of available stations in the ISD in 1973, so I'm surprised this wasn't used as the start year.
Line 89-90: To get an hourly mean temperature, measurements would need to be taken throughout the hour and then averaged. Although the ISD does contain some off-hour and sub-hourly data, I think this would only be available for a subset of the stations and period of record. Please add an explanation of how you obtained the hourly mean temperature from the ISD instantaneous measurements.
[I note from later sections that this is a 24h mean, which was not clear to me at this point. Please make it explicit here that this is a daily mean from the subdaily values.]
Line 92: given your analysis starts in 1980, why do you only show results from 2011 in Figure 1?
Line 94: Are you sure these are two separate 6-hourly cadences, offset by 3h, or could it be a combination of 6-hourly and 3-hourly data? I also suggest noting in this paragraph any selection criteria you use to calculate the daily mean (one observation in each quarter of the day, or all available observations in each quarter, ...).
Line 102: The plot suggests 0900 and 1200 show the peaks, please check.
Given these are max/min in 24h, we expect one value in each 24h period per station. As you show these as UTC combined with the likely observing schedule of reading max/min thermometers, how does the distribution of stations with latitude come into play in these plots?
Line 115: I presume that in panels b, c, the red and blue bars have been purposefully offset? If so, please add this to the caption.
Line 119: the URL doesn't resolve
Line 136: Is the case study representative of all the mis-recorded values in the ISD? I suggest you also indicate the number of stations affected, and potentially the fraction of days for those stations (this plot could go into the Appendix).
Line 166: I presume "Uplimit" is "Upper Limit" - please amend the plot if so. I'd also suggest you don't use red and green in combination given some forms of colour blindness can make these indistinguishable.
It also appears as if some red or green lines are missing, presumably because these have identical values. I think it could help if you can ensure that both lines are visible in these cases (e.g. using dashed lines).
In panel b there are no green lines at all and so it is hard to see what has changed with the correction.
The red lines (12h maximums) do not align with the hourly values, being always slightly above. If this is for presentational purposes, please state this in the caption and where relevant in the text, or give an explanation for these differences.
Line 176-81: I'm afraid I do not understand what this paragraph is trying to convey.
Line 184-88: I think this is saying you obtain daily averages from four hourly averages, one in each 6h quadrant of the day? This implies that these have to be regularly spaced. Perhaps note that hh can be *one of* 0000, 0100, ..., 0500 rather than sequentially all of them, which would give 6 values. Note that I think the end of the sequence should be 0500 rather than 0600, otherwise there is some double counting.
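My reading of this step can be sketched as follows. This is an interpretation under stated assumptions, not the authors' implementation: the function name, the dictionary input, and the rule of taking the first complete offset are all hypothetical.

```python
def daily_mean_from_quadrants(obs_by_hour):
    """Return (offset, mean) from four values spaced 6 h apart, or None.

    obs_by_hour maps a UTC hour (0-23) to a temperature. The offset hh is
    tried as ONE of 0, 1, ..., 5; an offset of 6 would reuse the hours of
    offset 0 (06, 12, 18, 24=00) and so double count.
    """
    for offset in range(6):
        hours = [offset + 6 * k for k in range(4)]  # one hour per 6 h quadrant
        if all(hour in obs_by_hour for hour in hours):
            values = [obs_by_hour[hour] for hour in hours]
            return offset, sum(values) / 4.0
    return None  # no complete, regularly spaced set of four values


# Hypothetical observations for illustration only.
main_synoptic = {0: 10.0, 6: 12.0, 12: 20.0, 18: 14.0}
intermediate = {3: 9.0, 9: 15.0, 15: 19.0, 21: 13.0}
incomplete = {0: 10.0, 6: 12.0, 18: 14.0}

print(daily_mean_from_quadrants(main_synoptic))
print(daily_mean_from_quadrants(intermediate))
print(daily_mean_from_quadrants(incomplete))
```

Restricting the offset to 0 through 5 is what avoids the double counting mentioned above. How the manuscript handles a day with a missing quadrant is not clear to me, so the sketch simply returns None in that case.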
Line 200: what do you mean by "less apparent"? What would have happened if the Tmin-15h start time was not aligned with the Tmin-24h start time?
Line 207: I can only see one dashed green arrow (I can't tell if this is the light or dark green one). Please correct the figure/caption.
Line 225: There are some clear geopolitical border effects visible in Fig 6 - so you could be more specific here.
Line 230: These figures will need to be large enough in layout so that the points can be seen. I suggest that the points are made slightly larger, and also that the colourscale used is sequential (e.g. viridis) and not a rainbow (www.climate-lab-book.ac.uk/2014/end-of-the-rainbow/, https://www.sciencedirect.com/science/article/pii/S0924271622002659)
Line 239: Perhaps replace "stuck" with "repeating"?
Line 243: How did you come up with these tests rather than any others? Can you give a simple summary of the additional temporal and spatial consistency tests here, as you have for the stuck value and inner consistency?
Line 248: Is this over the whole period of record, or per day? Please specify the time period over which the quality results are combined.
Line 267: Perhaps use a different example - this station moved by 450m (even with only a 20cm elevation change), but was only in the first position (72200699999) less than a single year. The GHCNh dataset which replaced the ISD may improve the linking of closely located stations in the merge process (https://www.ncei.noaa.gov/products/global-historical-climatology-network-hourly) as there is only a single station for this location (USW00054926).
Line 284 - b3 - there seems to be some quasi-periodic nature in the early part of the record for Tmin in the daily data volumes. Do you know why?
Line 285: Fig7 panels a1-3. Please use a sequential colourscale here (see comments against line 230)
Line 293-99: This paragraph has a number of spelling and grammar errors
Line 294: If nearly all records were included, why were some not?
Line 302-5: I think you could be more explicit here as these two comparisons almost sound as if they are the same, as my understanding is that the GSOD selects the highest/lowest records *from the hourly data* within 24 hours.
Line 311: Can you give an explanation for the periodicity for the first 15 years of the record in Tmax and Tmin?
Line 335 (Fig 9) - could the bar-plots please have some x-tick marks as the x-tick labels aren't quite aligned with the bar edges?
Line 352: See comments against lines 302-5 and line 81. I think it is important to be clear here that the GSOD selects the highest/lowest values from the recorded subdaily (hourly) observations.
Line 358: I think it would be good to include why you have selected SYNOP over METAR, and hence do the inverse of the GSOD.
Line 385: As for Figure 3/line 166, I suggest replacing "Downlimit" with something that more clearly indicates this is the daily Tmin value for these two datasets.
Line 392: Please give the country in which this station is located at this point in the text, as you have done for the other two so far.
Line 402: Were these likely, or potential, duplicated records? Some of the red points sit on the one-to-one line so may not be erroneous duplications. There could be weather conditions which do result in identical Tmax values (even if rare). Were there any repeated values for Tmax in the GLBD-FED data for this station? If so, then I suggest showing these, or stating that there were none. Clearly your plots and analysis suggest this is an issue which biases the GSOD data, but there are repeated GSOD values which lie on the line of equality, and a few points where GLBD-FED overestimates the Tmax, and discussing these could help give context to the GSOD biases.
Line 429: Please check the latitude and longitudes for 442920-99999. The ones presented place the station in Qingdao, Shandong Province (maybe Liuting, 548570-99999?), and the altitudes seem unlikely for Mongolia.
Line 434: It's not clear to me from the text and caption whether these two panels show subsampled GLBD-FED data - one extracting SYNOP, the other the METAR - or whether the METAR data has been specially extracted for panel b.
Line 439: Please give the station number in this case (as you have done with others).
Line 460: Please restate the averaging period at this point (see comment for line 89-90).
Line 466: The "original" method referred to here is what was used for GSOD? If so, please make this clear.
Line 468: Make it clear the increases were in the data counts, not the values.
Line 497-521: It could be useful to describe in words the behaviours that each test is looking to identify.
Citation: https://doi.org/10.5194/essd-2025-771-RC2
- RC3: 'Comment on essd-2025-771', Anonymous Referee #3, 26 Mar 2026
This paper presents the development of a global in situ daily temperature dataset from 1981-2024. Getting access to daily long-term station data is getting harder (despite open data sharing resolutions from WMO, for example) so the development of this dataset is very welcome. However, while I appreciate the effort that has gone in to developing GLBD-FED, I am struggling to understand its purpose. For monitoring (including for extremes) and climate change applications it might not be fit for purpose and to test that you would need to assess the homogeneity of the data or do some comparisons with ‘high quality’ temperature data among other things. Comparing GLBD-FED with GSOD tells you about potential biases with another (not very high-quality) product but doesn’t tell us anything really about the quality of GLBD-FED for the potential applications for which it could be used. For example, you mention ‘monitoring extreme weather’ and ‘detecting climate change’ in the introduction but could you really use GLBD-FED for these purposes? Also monitoring applications will be limited as the dataset is static and it was unclear if the authors’ intention was ultimately to operationalise the dataset and how that could be done given the resource intensive way in which the dataset seems to be evaluated. In saying this, I think it’s important that this dataset is developed and shared but the authors should clarify its value and purpose and perhaps include some basic evaluation on the quality of the dataset for end user applications.
Some specific comments:
L32: What do you mean by ‘temperate performance’?
L49: I believe the correct acronym is ‘National Meteorological and Hydrological Services (NMHSs)’
L49-54. There is strange referencing here. The start of your intro is about the importance of daily in situ temperature measurements but here you reference a mixture of global, regional, daily, monthly and precipitation datasets with many of them very old references. I think you really have a chance to say here that there are really very few daily global temperature datasets based on in situ data available. We have GHCN-Daily and e.g. BEST and HadGHCND (which are gridded) but most of the daily ‘observed’ global products are for (gridded) precipitation and that’s mainly because we have numerous satellite products available (e.g. GPCP, IMERG, CMORPH, TRMM, CHIRPS, PERSIANN, GSMap etc), and a few in situ-based products (e.g. GPCC, CPC, REGEN). This combined with improved accessibility of the data, is a major gap you are filling with GLBD-FED.
L64: I don’t know why you start to talk about precipitation here because you have not specifically mentioned it earlier and actually it would probably be best just to stick to talking about temperature datasets and gaps.
L73: OK so you’re going to produce a global daily temperature dataset but it is still unclear what applications it could be used for, especially since there are known quality issues with GSOD which you are using for your comparison data.
L85: ISD has now been superseded by the Global Historical Climatology Network hourly (GHCNh) dataset (https://www.ncei.noaa.gov/products/global-historical-climatology-network-hourly). Understandably it takes a long time to develop a dataset and likely GHCNh was not available at the time you commenced development of yours. However, this at least needs to be acknowledged somewhere as it would also affect the operationalization of GLBD-FED if that was indeed the intention.
L91: It’s unclear why you chose 2011-2024 to produce Figure 1. Does the figure change if you include different time periods/the whole dataset period?
L110: display weaker temporal regularity – weird phrasing
L119: The hyperlink doesn’t work for me.
L148: I really appreciate the effort here to actually try and correct erroneous data rather than simply flagging it. However, I assume that you flag that it has had a correction applied?
L168: Do you mean ‘upper limit of temperature’ here?
L183: I couldn’t work out how you deal with missing hours or whether this is a problem.
L225: So it does look as if you improve the data volume in America, Australia, western Europe, and southern Asia but for e.g. North America and Australia there are already high quality, long-term daily temperature records that are readily accessible so it’s unclear what value additional daily data in GLBD-FED would bring. This brings me back to my general comment about the ultimate purpose of the dataset.
L239: The spike test seems to have a fixed threshold which may not be appropriate for regions of very low or high variability. I guess in combination with the other tests this could be accounted for but I highlight it as a possible limit to the qc checks.
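To make the concern concrete, the sketch below contrasts a fixed-threshold spike check with one whose threshold scales with the series' own day-to-day variability. Both functions, the threshold of 10 °C, the multiplier, and the floor are hypothetical choices for illustration; they are not the tests described in the manuscript.

```python
import statistics


def flag_spikes_fixed(values, threshold):
    """Flag points that jump away from BOTH neighbours by more than threshold."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        d_prev = values[i] - values[i - 1]
        d_next = values[i] - values[i + 1]
        same_direction = d_prev * d_next > 0  # up-then-down or down-then-up
        if same_direction and abs(d_prev) > threshold and abs(d_next) > threshold:
            flags[i] = True
    return flags


def flag_spikes_scaled(values, multiplier, floor):
    """Same check, with the threshold tied to the median day-to-day change."""
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    threshold = max(multiplier * statistics.median(steps), floor)
    return flag_spikes_fixed(values, threshold)


# Hypothetical low-variability daily series with one suspicious value.
low_variability = [27.0, 27.2, 27.1, 30.0, 27.3, 27.2, 27.1]

fixed_flags = flag_spikes_fixed(low_variability, threshold=10.0)
scaled_flags = flag_spikes_scaled(low_variability, multiplier=5.0, floor=0.5)
print(fixed_flags)
print(scaled_flags)
```

In this made-up series the fixed threshold misses the jump while the scaled version flags it. The reverse risk, real large swings in high-variability regions being flagged by a fixed threshold, follows from the same logic.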
L293-299: I think this paragraph missed some copyediting e.g. “comparted”, “manifest”, “mean”, “remarkable”. Please check typos, grammar and style. In addition, why do you only compare “nearly all records” and not “all records”? Which records did you not compare and why?
Figure 8: I note a trend in the bias for Tave through time. Do you have an explanation for this? I also note what looks like a seasonal cycle in the biases for all temperature variables, which you don’t mention and which needs to be discussed.
Citation: https://doi.org/10.5194/essd-2025-771-RC3
Data sets
a global in-situ daily temperature dataset preferentially for 0000-2400 UTC from 1981-2024 Su Yang et al. https://doi.org/10.5281/zenodo.17895292
This is an interesting paper describing a data set of potential significant value. I think it has the makings of a publishable paper but needs some work before getting to that stage.
Although the authors were probably not aware of this when the dataset was being developed, it is unfortunate that both the ISD and GSOD datasets were retired in 2025 (ISD has been replaced by GHCN-Hourly, the replacement for GSOD is not yet online). As such this dataset will not be able to be updated in its current form – this is worth acknowledging, I think.
Major comments
Minor comments
L50-54 – this is a very long list of datasets, but none are particularly recent – suggest being more selective and focus on most recent versions.
L114 – is there any indication of the geographic distribution of different observation times?
L129-130 – needs a cross-reference to the appendix for specifics of the quality control methods used.
Section 4.1 – this looks like a fairly labour-intensive process, how practical is it to implement this across the network?
L184 – 'as discussed in section 3.1' – this matter is not really discussed there. There are some references which could be cited on methods for Tave calculation – see, for example, https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wcc.46 and references therein.
L227 – it is worth noting that some individual countries have a particularly large increase (e.g. Finland is obvious on the map for both Tmax and Tmin).
L245-250 – realistically with the size of the dataset this process would need to be automated – can it be confirmed in the text whether this has been done?
L267-270 – would this be connected with different national practices about whether or not new identifiers are assigned when a site moves?
Figure 7 – it looks like in the timeseries there are some individual months with a sharp drop in the number of available stations, is this correct, and if so is there any explanation for it?
L294-297 – how is the averaging done here – is it an arithmetic mean of stations or area-weighted in some way?
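The reason the question matters can be shown with a small sketch comparing a plain station mean with a simple gridded, cosine-latitude-weighted mean. The box size, the weighting scheme, and the station values are assumptions made up for illustration; I do not know which approach the manuscript uses.

```python
import math
from collections import defaultdict


def arithmetic_mean(stations):
    """Unweighted mean over (lat, lon, value) station tuples."""
    return sum(value for _, _, value in stations) / len(stations)


def area_weighted_mean(stations, box_deg=5.0):
    """Average stations within lat/lon boxes, then weight boxes by cos(latitude)."""
    boxes = defaultdict(list)
    for lat, lon, value in stations:
        key = (math.floor(lat / box_deg), math.floor(lon / box_deg))
        boxes[key].append(value)

    weighted_sum = 0.0
    weight_total = 0.0
    for (lat_index, _), values in boxes.items():
        centre_lat = (lat_index + 0.5) * box_deg
        weight = math.cos(math.radians(centre_lat))  # proxy for box area
        weighted_sum += weight * sum(values) / len(values)
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical biases (degrees C): three clustered mid-latitude stations
# and one isolated tropical station.
stations = [
    (52.1, 4.3, 0.6),
    (52.3, 4.8, 0.6),
    (52.9, 4.1, 0.6),
    (2.0, 103.0, 0.0),
]

plain = arithmetic_mean(stations)
weighted = area_weighted_mean(stations)
print(f"arithmetic mean:    {plain:.3f}")
print(f"area-weighted mean: {weighted:.3f}")
```

With these invented numbers the arithmetic mean is dominated by the dense cluster, while the gridded mean gives the isolated station comparable influence. Given the uneven station density in the dataset, the two choices could plausibly give different global bias figures.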
L297-299 – in principle GSOD could be used for extreme events research but in practice I am not aware of any serious climate research which does so (probably because people are well aware of the limitations of the GSOD data).
L397 – 'generally reports warmer temperatures' – this isn't quite correct, what is true is that it generally reports warmer temperatures when there is a difference, but many days (presumably those where GSOD is not double-counting) have zero difference.
Figure 8 – is there any explanation for the seasonal cycle in the early years of these time series?
Figure 9c – it may or may not be relevant to the Australian results here that the standard time for reporting Tmin nationally in eastern Australia is 2200 or 2300 UTC (depending on season), although international synoptic reporting of this is sometimes at 0000 UTC.
Table 2 – the elevations are incorrect here – the general Ulan Bator metropolitan area is at approximately 1300m elevation.