A High-Accuracy Rainfall Dataset by Merging Multi-Satellites 1 and Dense Gauges over Southern Tibetan Plateau for 2014-2019 2 Warm Seasons

Abstract

However, all existing precipitation datasets show insufficient accuracy in high mountainous regions (Yilmaz et al., 2016;Derin et al., 2018;Derin et al., 2019;Anagnostou & Zhang, 2019), which hinders our understanding of climate and hydrological processes over these areas.This can be attributed to the complex physical nature of electromagnetic transmission and precipitation forming processes (Hong et al., 2007;Bitew & Gebremichael 2010;Dinku et al., 2010), and harsh environments in high mountains that lead to very limited deployment of in-situ rain gauges with insufficient representation of ground observations for training satellite-based precipitation retrieval algorithms.For instance, the Tibetan Plateau (TP) as the roof of the world is surrounded by imposing mountain ranges with an average elevation exceeding 4000 m.It generates several large rivers in Asia and provides invaluable freshwater resources for more than 1.4 billion people living downstream (Immerzeel et al., 2010).However, this vast plateau has very limited number of precipitation gauges across its 2.5 million km 2 area.The precipitation gauge network operated by China Meteorological Agency (CMA) contains only 86 gauges over the entire TP (Figure 1).These gauges are essential to correct satellite precipitation datasets.For example, GPM-IMERG 'Final' Run dataset uses Global Precipitation Climatology Centre (GPCC) database, GSMap_Gauge and CMORPH use NOAA Climate Prediction Centre (CPC) database.Although both GPCC and CPC databases received data through Global Telecommunication System (GTS), only part of the above-mentioned gauges in TP were utilized (Xie et al., 2007;Becker 2013).Previous evaluations over the TP indicated that most products present dependence on topography to varying degrees, and products adjusted by gauge observations shows better performance than satellite-only products (Gao et al.,2013;Lu et al., 2018).Therefore, a better spatial coverage of rain gauges is critical to correct satellite products in high mountains.
In 2014, the Ministry of Water Resources of China (MWR) launched the flash flood monitoring and alarming campaign.A large number of rain gauges is now accessible over the TP, especially in the southern TP.There are 440 new rain gauges totally involved in 6 years, and are available since 2014, independent of the existing CMA precipitation gauge network (Figure 1).These gauges provide measurements of precipitation in liquid phase (i.e., rainfall) at event time scale.A couple of recent studies have demonstrated the utility of this rain gauge network (Xu et al., 2017;He et al., 2017;Tian et al., 2018;Wang et al., 2020).For instance, Xu et al. (2017) evaluated the performance of TRMM and GPM and the dependence on topography and rainfall intensity based on the network.Their results demonstrated that the data quality of this dense gauge network is strictly controlled, serving as the currently highest gauge dense for satellite product evaluation on TP.Wang et al. (2020) used the gauge data to validate their reproduced precipitation dataset.However, there is not a merging product that assimilate the observations from this dense rain gauge network.This is apparently a unique opportunity to improve the performance of existing satellite-based precipitation datasets for its highest density and quality.
This study aims to provide a high-accuracy rainfall dataset by merging all available ground gauges and three good-quality satellite precipitation datasets over the southern TP for the warm seasons (June 10 th -October 31 st ) from 2014 to 2019.The remainder of this paper is organized as follows: Section 2 describes the study area and the source data.Section 3 provides details of the data merging method and the methods adopted to evaluate the quality of dataset.Results are presented in Section 4. The data availability and summary are provided in Section 5 and Section 6, respectively.

Southern Tibetan Plateau
Tibetan Plateau, known as the Asian water tower, mainly covers parts of China, India, Myanmar, Bhutan, Nepal and Pakistan.Tibetan Plateau, known as the Asian water tower, borders India, Myanmar, Bhutan and Nepal to the south and Pakistan to the west.
Various climate systems affect the plateau, including westerly winds in winter and the Indian monsoon in summer (Yao et al., 2012).Many Asian large rivers originate from this vast area, including the Yellow River, the Yangtze River, the Yarlung Zangbo River (YZR), Jinsha River (JR), Lancang River (LR), Salwen River (SR), Irrawaddy River (IR), Ganges River (GR), and Indus River (IDR).This study is focused on the southern part of TP (Figure 1), including the upper YZR Basin (YZRB) as a major basin.

Ground gauged rainfall
We combined two rain gauge networks managed by MWR and CMA to obtain a high-quality ground reference dataset up to date.The number of rain gauge is presented in Figure 1b, and varies across different years.The spatial distribution of all gauges is presented in Figure 1c.The gauges are mainly located in the middle reaches of YZRB and the east part of the study area.Despite the high density, we can see these rain gauges are not evenly distributed across the space.This makes satellite rainfall products over varying altitudes and aspects important.Daily rainfall observations during the warm seasons of 2014-2019 were accumulated from the original event scale measurements.
Total number of the CMA and MWR gauges ranges from 53 in 2015 to 377 in 2018, forming the densest rain gauge network up till now.
The CMA gauge data has been widely demonstrated as reliable and accurate in previous studies (Zhai et al., 2005;Su et al., 2020;He et al., 2020).Gauge data used in this study has been manufactured under strict quality control procedures, including (1) internal consistency check, (2) extreme values check (0~85mm/h), and (3) spatial consistency check (Ren et al., 2010).Rain gauges with erroneous values (e.g.enormously large values) were discarded from the entire records.Through strict quality control methods, the warm seasons from June 10th to October 31st were selected as the study period to maintain the high quality of outcome rainfall dataset.

Methodology
We used the Dynamic Bayesian Model Averaging (DBMA) method (Ma et al., 2017) to merge the satellite datasets with in-situ rain gauges.To evaluate the quality of the new dataset, we carried out statistical and hydrological evaluations and comparisons with CHIRPS and MSWEP in southern TP.

Dynamic Bayesian Model Averaging method
The Dynamic Bayesian Model Averaging (DBMA) method developed by Ma et al. (2018) was utilized in this work.A flow chart of the merging method is shown in  For statistical evaluation of the merged data against ground gauges, around 85% of the gauges were randomly selected to form a training gauge set for the merging approach in each year during 2014-2019, and the remaining 15% were used for test.
Training method DBMA of 40 days was only conducted in training dataset.

Statistical Evaluation
Performance of the multiple datasets were statistically evaluated by comparing with ground observations on the corresponding statelite grids.Relative bias (RB) and normalized root mean square error (RMSE) were adopted to measure the amount difference between the gridded rainfall and the gauged rainfall.Correlation Coefficient (CC) was used to evaluate the consistency between satellite estimates and gauge observations.The skill of rainfall data on detecting rainfall occurrence (rainfall events higher than zero) was evaluated through a set of metrics (similarly to Wilks, 2006): i.e.
the probability of detection (POD) assessing how good the multiple rainfall datasets are at detecting the occurrence of rainfall, false alarm ratio (FAR) measuring how often the gridded rainfall datasets detect rainfall when there actually is not rainfall, and critical success index (CSI) measuring the ratio of rainfall events that are correctly detected by the gridded datasets to the total number of observed or detected events.Equations for the above metrics are shown in Table 3.
Table 3. Statistical indices that were used to assess the performance of the gridded rainfall datasets.

Equation number
Relative Bias (RB) For the equations listed in Table 3, n is the total number of gridded product data and gauge observation data; i is the  th of satellite product data and gauge observation data; G i means gauge observation and G ̅ is the average of gauge observation.S i and S ̅ are gridded estimates and their average, respectively.a represents hit (i.e., event was detected to occur and observed to occur), b represents false alarm (i.e., event was detected to occur but not observed to occur), and c represents miss (i.e., event was not detected to occur but observed to occur).
Triple Collocation (TC) technique provides a platform for quantifying the root mean square errors of three products that estimate the same geophysical variable (Stoffelen, 1998).Roebeling et al. (2012) successfully applied the TC technique to estimate errors of three rainfall products across Europe.An extended Triple Collocation (ETC) introduced in Kaighin et al. (2014), which is able to estimate errors and correlation coefficients with respect to an unknown target was used in this study to compare the performance of the DBMA-merged data and two previous merged datasets of CHIRPS and MSWEP.

Hydrological Evaluation
In addition to the statistical assessments against rain gauges, hydrological assessment was used as a tool to test the performance of merged rainfall datasets on forcing hydrological modelling in the study area (similarly see Yong et al, 2012;Xue et al, 2013;Yong et al, 2014;Li et al, 2014).In this section, a semi-distributed hydrological model developed by Tian (2006), namely Tsinghua Hydrological Model based on Representative Elementary Watershed (THREW), was adopted for the hydrological assessment of rainfall datasets in the YZRB.YZRB has a drainage area of approximately 240,480 km 2 within China's boarder.The basin elevation ranges from 143 to 7,261 m, with an average of around 4,600 m.YZR is one of the most important transboundary rivers in South Asia and the highest river in the world, which is characterized by a dynamic fluvial regime with exceptional physiographic setting spreading along the eastern Himalayan region (Goswami, 1985).Due to complex terrain and strongly varying elevation, the YZRB is under control of a variety of climate systems, such as the semi-arid plateau climate prevailing in the upper and middle reaches, and the mountainous subtropical and tropical climates prevailing in the lower reaches.In the cold upper reaches, the mean annual rainfall is less than 300 mm.In the warm middle reaches, the mean annual rainfall falls between 300 mm and 600 mm.
The whole basin area above the Nuxia hydrological station was divided into 63 Representative Elementary Watersheds (REWs).Model parameters were calibrated by daily discharges measured at the Nuxia station.The calibration period is scheduled to run in the warm seasons from June 10 th to October 31 st in 2014-2017, encompassing a period length of 576 days.The validation period includes two warm seasons in 2018 and 2019 with a total duration of 288 days.Descriptions of the calibrated model parameters can be found in Table 4.An automatic algorithm pySOT developed by D.
Eriksson et al ( 2019) was used to optimize the parameter values based on an objectivefunction of NSE (Nash and Sutcliffe, 1970) in Eq. 7. To conduct a continuous hydrological simulation in the study period, the datasets of daily grid-based precipitation over China (Zhao et al., 2014) were used as model inputs in the non-warm seasons when merged rainfall is not available.
where,  is the total number of days in the evaluation period，   and    represent the observed and simulated runoff on the  th day, respectively. ̅  represents the average of observed runoff in the evaluation period.

Spatiotemporal Patterns
Based on the merging method, a new daily rainfall dataset with spatial resolution of 0.1°×0.1° in the warm seasons from June 10 th to October 31 st (144 days in each year) in 2014-2019 (864 days in six years) was generated.Figure 3 presents the spatial pattern of the mean rainfall over the six warm seasons of the merged data in southern TP.It is shown that extremely high summer rainfall centres concentrate in the south-eastern and south-western of the study area where is known as a world-famous heavy rainfall centre (see Biskop et al., 2015;Bookhagen & Burbank, 2006;Kumar et al., 2010).IMERG at basin scale when rainfall value is small, especially for light rainfall events smaller than 2 mm, but this difference tends to be small for heavy rainfall events.

Statistical Evaluation
Figure 5 shows the statistical evaluation of the merged and original datasets in the warm seasons.The statistical indices were calculated for three gauge groups including the training gauges, the test gauges and all gauges at different elevation bands.The datasets in general presented comparable performance for the training and test gauge groups, indicating that the sampling procedure of ground gauges is adequately random.
The comparable performance of merged data in training and test gauge groups demonstrated robustness of the merging method on varying gauges.In terms of RSME, CC, and POD, the DBMA-merged data shows much better performance on all gauge groups and elevation bands than the original satellite datasets.The smallest RSME of merged data indicate that the total rainfall amount of the merged data during the evaluation period showed the lowest difference from the total amount of gauged rainfall.
The highest CC and POD highlight the best consistency between merged data and ground gauge data on days when most regions in the basin were rainy.The RB of DBMA-merged data is at an intermediate level among the satellite datasets as it is the weighted average of those three datasets.The higher FAR and lower CSI of DBMAmerged data could be attributed to that the merging method detected rainfall events when rainfall estimate is higher than zero in any one of the three satellite datasets and thus resulted in overestimated rainfall occurrence.The overestimated rainfall occurrence might have small effects on the estimation of rainfall amount, as most of the falsely alarmed events were tiny.It is noteworthy that the performance of the merged data shows smaller variance across elevation bands than that of the original satellite datasets.This is most likely benefiting from the spatially dynamic optimal weights for the original satellite data.However, the merged data presented the largest difference from gauged data at the altitudes of 3000-3500 m, because there are much less gauges on this elevation zone.Figure 6 shows CC of different datasets on specific gauges.The merged data presents higher CC values in regions where are densely gauged, i.e., the middle reaches of YZRB and the east part of the study region, which can be expected as the dense ground gauges provided strongly informative benchmark likelihoods for the estimation of satellite data weights.On most of the gauges (Figure 6a), the merged-data presented higher CC values than the IMERG data, which is consistent with Figure 5c.On contrary, the merged-data showed reduced CC than GSMaP and CMORPH on more gauges (Figures 6b-c), indicating that involving IMERG data in the merging procedure on these gauges lead to deteriorated consistence performance.5).

Comparisons with other datasets
To avoid interference of ground gauge data that merged in the DBMA dataset, the ETC method introduced in Section 3.2 was applied to compare the three merged datasets in Table 6.The RMSE and CC of DBMA calculated by ETC were 1.11 and 0.80, respectively, both of which are obviously superior compared to the corresponding values estimated by CHIRPS and MSWEP, indicating that DBMA data is closer to the true value of rainfall in the study region.Runoff simulations forced by the three merged datasets during June 10th 2014 to October 31st 2019 estimated by the corresponding optimal parameter sets were presented in Figure 8.Note that the daily runoff is normalized as Eq. 8 for data security reasons.Simulation by the CHIRPS data presented the lowest performance with NSE values of 0.75 and 0.78 in the calibration and validation periods, respectively.The DBMA forced simulation showed the highest performance with NSE values of 0.93 and 0.86 in the calibration and validation periods, followed by the MSWEP forced simulation which estimated NSE values of 0.9 in the calibration period and 0.76 in the validation period.The performance of streamflow forced by the merged datasets are consistent with the agreements between the merged rainfall estimates and ground truth shown in Table 6.

Summary
We collated ground-based rainfall observations from a dense gauge network over southern TP.The gauged data provides crucial ground references of measured rainfall.
Based on this rain gauge network and three satellite rainfall datasets of IMERG, GSMaP, and CMORPH, a merged rainfall dataset in six warm seasons from June 10 th to October

Figure 1 .
Figure 1.(a) The location and topography of the TP and the spatial distributions of CMA gauges.

Figure 2 .
Figure 2. In the first step, a training dataset was formed by selecting samples from the ground gauged data and three original satellite datasets.The training period was set as 40 days.Increasing the length of the training period did not lead to obvious improvement of the merging method(Ma et al., 2018).In the second step, the training dataset was transformed by the Box-Cox Gaussian distribution, and the optimal weights for each of the original satellite datasets on a specific grid where a ground gauge is located on each training day were estimated by a logarithmic likelihood equation and the optimal expectation algorithm.In the third step, an ordinary Kriging interpolation method was applied to spatially interpolate the daily weights onto grids with no gauges.Finally, posterior spatiotemporal weights were used to obtain the final merged rainfall dataset.The DBMA-merged data has been proved inMa et al. (2017)  to outperform original satellite data during 2007-2012 over TP.

Figure 3 .
Figure 3. Spatial pattern of mean rainfall over six warm seasons in 2014-2019 of the DBMAmerged data in southern TP.In addition, Figure 4 compares the time series of average daily weight and rainfall over the YZRB basin derived from the DBMA-merged data and the original satellite datasets.As expected, the DBMA-merged daily rainfall in general fall in the envelope ranges of the three satellite datasets.Merged data is closer to CMORPH in June, September and October, while showing equal closeness to all the three source satellite data in July and August.It indicates that CMORPH is closer to the in-situ gauges than

Figure 4 .
Figure 4. Seasonal variations in basin-averaged (a) weights and (b) rainfall estimates of the multiyear daily values of IMERG, GSMaP, CMORPH and DBMA.

Figure 5 .
Figure 5. Comparisons of the statistical indices of (a) RB, (b)RMSE, (c) CC, (d) POD, (e) FAR and (f) CSI for training gauges, test gauges and all gauges at five elevation bands.

Figure 6 .
Figure 6.Spatial distributions of CC difference between (a) DBMA and IMERG, (b) DBMA and GSMaP, (c) DBMA and CMORPH Figure 7 presents the distributions of NSE values estimated by the ensemble parameter sets of the merged and original rainfall forces.It is shown that streamflow simulated by the DBMA data at the Nuxia station presented higher NSEs and smaller uncertainty ranges than that simulated by the original satellite datasets, indicating that streamflow simulations driven by the merged dataset showed stronger robustness and were less affected by uncertainty of parameter calibration.In addition to the Nuxia hydrological station, model performance on simulating streamflow at the interior hydrological stations of Yangcun, Nugesha, Gongbujiangda and Lhasa (Figure1) were evaluated in Figure7.It shows that the IMERG forced simulations presented poor NSE outliers lower than zero at the Lhasa station, in spite of their good performance at the Yangcun and Nugesha stations; the GSMaP forced simulations presented large uncertainty ranges in calibration period at Nugesha and Lhasa, and in validation period at Nuxia and Gongbujiangda; the CMORPH forced simulations showed the worst performance in validation period at the interior hydrological stations, despite their sound good performance in calibration period at Yangcun and Nugesha.In comparison to the satellite datasets, the DBMA forced simulations tend to perform consistently better with smaller uncertainties at all the hydrological stations, which can be attributed to that the merged data incorporated the advantages of different datasets in different regions and temporal periods and thus better captured the spatial variability of rainfall inputs in sub-basins.
31 st during 2014-2019 over the southern TP was established.The DBMA method was used to estimate weights varying in space and time of the three satellite datasets for the merged data.The merged rainfall dataset presented improved performance on representing the total amount of rainfall and detecting the occurrence of gauged rainfall events, and provide a more reliable forcing for hydrological simulations in the YZRB, compared to the original satellite datasets.Comparisons with previous merged rainfall datasets of CHIRPS v2.0 and MSWEP v2 that used relatively sparse rain gauges in the study area demonstrated high values of the newly installed rain gauges for providing robust ground reference for the merging of current satellite datasets.Our results indicated that the merged datasets can meet the critical needs of accurate forcing inputs for the simulations of warm season floods and the robustness calibration of hydrological models.Based on this high-accuracy rainfall data and reliable hydrological modelling, much further research in this region then could be conducted, for example, fluvial sediment transport modelling through coupling sediment and hydrology, validation and correction of precipitation from Global Climate Model, and future runoff projections based on reliable modelling calibration in history.

Table 1 .
Multiple satellite precipitation datasets used in this study.
Table 2 lists the numbers of training and test gauges in each of the warm seasons.The spatial distributions of gauges in each year are presented in Figure S1.Data from all gauges were involved in the training procedure of the final released version of the merged data.

Table 2 .
Number of rain gauges for training and test in 2014-2019.

Table 4 .
Calibrated parameters of the THREW model.

Table 6 .
Statistical RMSE and CC of merged datasets calculated by the ETC method.