HydroGFD3.0: a 25 km global near real-time updated precipitation and temperature data set

HydroGFD (Hydrological Global Forcing Data) is a data set of bias adjusted reanalysis data for daily precipitation, and minimum, mean, and maximum temperature. It is mainly intended for large scale hydrological modeling, but is also suitable for other impact modeling. The data set has an almost global land area coverage, excluding the Antarctic continent, at a horizontal resolution of 0.25°, i.e. about 25 km. It is available for the complete ERA5 reanalysis time period; currently 1979 until five days ago. This period will be extended back to 1950 once the back catalogue of ERA5 is available. The historical period is adjusted using global gridded observational data sets, and to acquire real-time data, a collection of several reference data sets is used. Consistency in time is attempted by relying on a background climatology, and only making use of anomalies from the different data sets. Precipitation is adjusted for mean bias as well as the number of wet days in a month. The latter relies on a calibrated statistical method with input only of the monthly precipitation anomaly, such that no additional input data about the number of wet days is necessary. The daily mean temperature is adjusted toward the monthly mean of the observations, and the adjustment is applied to the 1 h time steps of the ERA5 reanalysis. Daily mean, minimum and maximum temperature are then calculated. The performance of the HydroGFD3 data set is on par with other similar products, although there are significant differences in different parts of the globe, especially where observations are uncertain. Further, HydroGFD3 tends to have higher precipitation extremes, partly due to its higher spatial resolution. In this paper, we present the methodology, evaluation results, and how to access the data set at doi:10.5281/zenodo.3871707.


Introduction
Precipitation (P) and temperature (T) are key driving parameters for many impact models, and there are now many observational data sets available. They differ regarding the spatio-temporal resolution, the historical coverage, and the data sources included in the product. However, very few data sets are continuously updated in near real-time. It is therefore challenging to find a product suitable for monitoring and initialization of forecasts for an impact model, i.e. a product that offers both a long historical period for calibration and validation, and real-time updates.

While most data sets now offer a rather long historical period, the real-time availability is a greater challenge. Merged satellite and gauge data sets such as CHIRPS (Funk et al., 2015a), CMORPH (Joyce et al., 2004), and PERSIANN-CDR (Ashouri et al., 2015) offer both high resolution and near real-time components, but are limited to the +/-50 or +/-60 degree latitude bands. Several data sets have made use of reanalysis data as a basis, adjusted using various gridded observational data sets (Weedon et al., 2011, 2014; Beck et al., 2017; Berg et al., 2018; Cucchi et al.). The advantage is that the reanalysis products are readily available with a large range of variables and output frequencies. Still, the downside with reanalysis products is that especially P is a model product and thereby suffers from model bias. Since the bias can be substantial, several methods have been developed to adjust reanalysis, using different methods and reference data sets.

A hydrological operational monitoring or forecast product has strong demands on availability and redundancy of the data flows. The data set HydroGFD1 (Berg et al., 2018) was constructed and made operational for initializations of the hydrological model HYPE (Lindström et al., 2010) for different set-ups across the globe. It offered near real-time updating of daily P and daily T (mean, minimum, and maximum), until the end of the last calendar month. The real-time components of HydroGFD1 were based on ERA-Interim reanalysis, extended by the ECMWF deterministic forecasts, adjusted using monthly mean P

In a final step, the three climatologies are harmonized by only retaining the grid points that are available consistently in all data sets and all months. This also leads to the final land-mask of the HydroGFD3 data set, for which adjusted data are produced.
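The harmonization step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each climatology is stored as a list of monthly grids flattened to one-dimensional lists, with `None` marking missing values; the function names `common_land_mask` and `apply_mask` are hypothetical and not part of the HydroGFD3 production code.

```python
def common_land_mask(climatologies):
    """Return True for grid points defined in every data set and every month.

    climatologies: list of data sets; each data set is a list of monthly
    grids, and each grid is a flat list with None for missing values
    (an assumed storage layout, chosen only for this sketch).
    """
    n_points = len(climatologies[0][0])
    return [
        all(month[i] is not None for clim in climatologies for month in clim)
        for i in range(n_points)
    ]


def apply_mask(grid, mask):
    """Blank out all grid points that are not part of the common mask."""
    return [value if keep else None for value, keep in zip(grid, mask)]


# Two toy "climatologies" with two months and three grid points each.
clim_a = [[1.0, 2.0, None], [1.0, 2.0, 3.0]]
clim_b = [[1.0, None, 3.0], [1.0, 2.0, 3.0]]
mask = common_land_mask([clim_a, clim_b])
```

In this toy case only the first grid point is retained, since each of the other two points is missing in at least one data set for at least one month.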

The method essentially relates the number of wet days, N_wet, to the monthly P anomaly, P_anom, using also the climatological wet day frequency (calculated from the cru wet days data set), N_wet^clim, as a predictor, and a tunable constant, k.

The production of the corrected data consists of the following steps.

For P, the scaling can cause very large values in some cases, e.g. when e5 severely underestimates the number of wet days. Therefore, P is limited to a maximum of 1500 mm/day, which is close to the highest observed record at that time scale.

... is used to fill the grid point value. However, if no defined data is found, the anomaly will be set to 0 for T and 1 for P; in other words, the adjustment will be toward the HydroGFD3 climatology.

Evaluation of the HydroGFD3 historical data set is presented for the mean climatology of P and T, as well as for regional ...

The historical period (1979-2016) is built on e5 corrected with the gpcch and cru data sets, for P and T, respectively. There is only one tier produced for this period. e5 will later be released back to 1950, and the HydroGFD3 historical data will then cover that period as well.

Since it does not make use of any observational data sets, it has received the internal file naming convention "none". For P, the number of wet days is also adjusted, according to the description in Section 3.3, using the reanalysis anomalies as a predictor.
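The neutral anomalies (0 for T, 1 for P) and the 1500 mm/day limit can be sketched as follows. The sketch assumes that T anomalies are additive and P anomalies multiplicative relative to the climatology, which is consistent with the neutral values stated in the text but is not spelled out in this excerpt. All function names are hypothetical.

```python
P_MAX_MM_PER_DAY = 1500.0  # upper limit for daily P stated in the text


def neutral_anomaly(variable):
    """Anomaly that leaves the climatology unchanged (0 for T, 1 for P)."""
    if variable == "T":
        return 0.0  # assumed additive anomaly
    if variable == "P":
        return 1.0  # assumed multiplicative anomaly
    raise ValueError("variable must be 'T' or 'P'")


def target_monthly_value(variable, climatology, anomaly=None):
    """Combine climatology and anomaly; fall back to the neutral anomaly."""
    if anomaly is None:
        anomaly = neutral_anomaly(variable)
    if variable == "T":
        return climatology + anomaly
    return climatology * anomaly


def scale_daily_precip(daily_p, target_monthly_mean):
    """Scale daily P toward a target monthly mean, then apply the cap."""
    mean = sum(daily_p) / len(daily_p)
    if mean == 0.0:
        return list(daily_p)  # no precipitation to scale
    factor = target_monthly_mean / mean
    return [min(p * factor, P_MAX_MM_PER_DAY) for p in daily_p]
```

With a missing anomaly, `target_monthly_value("P", 3.0)` simply returns the climatological value. A month with a single wet day and a very high target, e.g. `scale_daily_precip([0.0, 0.0, 1.0, 0.0], 500.0)`, has its wet day capped at 1500 mm/day.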

A product closer to real-time is possible, with the daily time step cpcp and cpct products being available with a two day latency, and e5t available at five day latency. The adjustment of the e5t data is then based on the latest available 30 days, synchronized between the data sets, and is therefore called "Trailing".

The HydroGFD3 data sets are updated at regular intervals. The "extended" period is updated each month, as new e5 and other data sets become available. Each tier works independently, and can therefore become available at different times. The "near real-time" period is updated at earliest five days into the new month, when e5t is available. By then, the cpcp and cpct products are generally available, but gpccf normally needs a few days more. Tier 3 needs no additional data sets, and is available together with e5t, but is produced at the calendar month timestep like the other products. The priority order is independent for each variable, and goes from Tier 1 to Tier 3.
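The priority order can be sketched as a simple selection per variable. The tier labels, the availability dictionary, and the final fallback label for uncorrected e5 data are assumptions made for this illustration; they do not reflect the actual naming in the production chain.

```python
TIER_ORDER = ("tier1", "tier2", "tier3")  # priority from Tier 1 to Tier 3
FALLBACK = "e5_uncorrected"  # hypothetical label for uncorrected e5 data


def select_source(available):
    """Return the highest-priority tier that is flagged as available."""
    for tier in TIER_ORDER:
        if available.get(tier, False):
            return tier
    return FALLBACK


def select_per_variable(availability):
    """Apply the priority order independently for each variable."""
    return {var: select_source(tiers) for var, tiers in availability.items()}


# Example: the Tier 1 reference for P is delayed, while T has Tier 1 ready.
choice = select_per_variable({
    "P": {"tier1": False, "tier2": True, "tier3": True},
    "T": {"tier1": True, "tier2": True, "tier3": True},
})
```

Because the selection runs separately for P and T, a delayed reference data set for one variable does not hold back the other.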

HydroGFD3 tends to have higher extremes than other data sets. This is partly a resolution effect due to the 0.25 degree resolution of HydroGFD3, and 0.5 degree of the other data sets used here. A coarser resolution will move all ... one will increase the extremes, and below one will decrease them. The baseline climatology therefore has an impact on the extremes. Also the wet-days calculation of HydroGFD3 can affect the results, and we find that the dry regions, e.g. SAH and MED, have more dry days in HydroGFD3 than in the other data sets. When e5 only gives few P days, while the observational anomaly is high, the scaling factor can become very large, and the only process to limit this is the upper limit of 1500 mm/day, which is seldom reached. The wfde5-gpcc, which has a similar methodology to HydroGFD3, still has lower extremes. Besides the above mentioned under-catch corrections, the lower extremes may be due to the upper threshold applied to each hour, as can be seen in the original wfde5-code in the CDS-catalogue (https://doi.org/10.24381/cds.20d54e34).

For T, the general shapes of the PDFs agree across all data sets and regions (Fig. 8). However, there are sometimes substantial differences between e5 and the observational data sets. Typically, e5 displays issues around 0 °C, which is common in global models and related to melting conditions. There are also seasonal offsets outside the range of the observations. HydroGFD3 remains fairly close to cpct and wfde5-cru in most cases. Orographic effects on T were not accounted for in this comparison, which can explain some of the differences in regions with varying orography such as TIB.

Temporal trends
To get an impression of the temporal trends, and to identify potential issues in the time series, we also investigate the time series as an average over the Giorgi regions. To emphasize differences between the data sets, we discuss mainly differences relative to a common reference, here chosen to be e5. In other words, we present the inverse bias of e5 compared to each observational source.
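A regional average and its difference to the e5 reference could be computed along the following lines. The cosine-of-latitude area weighting and the flat list layout are assumptions of this sketch; the excerpt does not state how the regional averages were formed.

```python
import math


def regional_mean(values, lats):
    """Area-weighted mean over grid points, skipping missing (None) values.

    Weights are cos(latitude), an assumed approximation of grid cell area
    on a regular latitude-longitude grid.
    """
    num = 0.0
    den = 0.0
    for value, lat in zip(values, lats):
        if value is None:
            continue
        weight = math.cos(math.radians(lat))
        num += weight * value
        den += weight
    return num / den if den > 0.0 else None


def difference_to_reference(series, reference):
    """Difference of a regional time series relative to the reference (e5)."""
    return [a - b for a, b in zip(series, reference)]
```

For example, `difference_to_reference(obs_series, e5_series)` with hypothetical regional series gives positive values where the observational source is above e5, i.e. the inverse of the e5 bias.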

Extending to near real-time
The near real-time products, in Fig. 3 called "trailing", use the daily updates of the cpcp and cpct observations. They are therefore subject to the quality of the cpc products, and the changes in time as discussed in the previous section. This product closely follows the HydroGFD3 results shown in Figs. 9 and 10, as the main version Tier 2 is also based on cpcp and cpct, but with corrections at calendar months.

In addition, the "none" products are also created with the trailing time window. These only replace the e5 climatology with that of HydroGFD3, and are the simplest form of correction of the mean. They act as the last failsafe option in the production chain, before defaulting to un-corrected e5 data. We do not present this product in the time series plots, since it would only constitute a constant annual cycle offset in comparison to e5.

Compared to similar data sets based on reanalysis, such as WFDE5 and MSWEP, HydroGFD3 differs in that it has its own climatological background, and performs the corrections based on anomalies of that same climatological time period. The reason for using this method is to be able to switch data sets closer to real-time, without "jumps" in the time series.

In effect, this leads to enlarging the tail of the distribution, e.g. in the MED and SAH regions in Fig. 7. It is possible to restrict the scaling by only allowing the scaling factor to be a few times the original value, but such restrictions would in turn impact the monthly mean. A potential method would be to "borrow" P from adjacent grid points on e5's excessive dry days, and thereby reduce the scaling factors. This topic is being investigated for future updates of the methodology.
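The trade-off between restricting the scaling factor and preserving the monthly mean can be illustrated with a toy example. This is only a sketch of the restriction discussed above, which is not part of the HydroGFD3 method; the function name and the numbers are hypothetical.

```python
def scale_with_factor_limit(daily_p, target_monthly_mean, max_factor):
    """Scale daily P toward a target monthly mean with a limited factor."""
    mean = sum(daily_p) / len(daily_p)
    if mean == 0.0:
        return list(daily_p)  # no precipitation to scale
    factor = min(target_monthly_mean / mean, max_factor)
    return [p * factor for p in daily_p]


# One wet day in a four-day "month": the unrestricted factor would be 10.
daily = [0.0, 0.0, 0.0, 2.0]
limited = scale_with_factor_limit(daily, 5.0, 3.0)
limited_mean = sum(limited) / len(limited)
```

Here the limit of 3 keeps the wet day at 6 mm instead of 20 mm, but the resulting mean of 1.5 mm/day falls well short of the 5 mm/day target, which is the impact on the monthly mean mentioned above.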

The regional analysis shows clearly that the observational data sets give substantially different results in some regions. Diverse results are more common in data sparse regions or in regions where data are not generally available to all data sets. It is therefore difficult to determine which is closer to the truth in a global assessment like this, and more detailed regional studies, ...

The HydroGFD3 methodology of correcting the e5 reanalysis model toward an observational reference, along with the resulting data sets, were presented. We conclude that the data sets compare well with existing similar data sets.

The main new features of HydroGFD3 are:

- Near real-time corrected data until five days before the present, i.e. following the continuously updated e5 + e5t time period.
- Temporal coverage from 1979, to be extended back to 1950 along with the extended e5 data expected during 2020.
- Multiple redundancy options to avoid halting production when single data sets are delayed.

The data is freely available for the period 1979-2019, and by subscription for the real-time products. See Section 8 for details.

The following years use instead cpct and gpccm reference data.

Real-time updates of the data set are available for a processing charge via subscriptions. Please make a request here: https://hypeweb.smhi.se/buy-water-services/data-subscription/ and make sure to mention the data set name "HydroGFD3".