A high-accuracy rainfall dataset by merging multiple satellites and dense gauges over the southern Tibetan Plateau for 2014–2019 warm seasons

The Tibetan Plateau (TP) is well known as Asia’s water tower, from which many large rivers originate. However, due to complex spatial variability in climate and topography, there is still a lack of a high-quality rainfall dataset for hydrological modeling and flood prediction. This study therefore aims to establish a high-accuracy daily rainfall product by merging rainfall estimates from three satellite products, i.e., GPM-IMERG, GSMaP and CMORPH, based on a high-density rainfall gauge network. The new merged daily rainfall dataset, with a spatial resolution of 0.1°, focuses on warm seasons (10 June–31 October) from 2014 to 2019. Statistical evaluation indicated that the new dataset outperforms the raw satellite estimates, especially in terms of rainfall accumulation and the detection of ground-based rainfall events. Hydrological evaluation in the Yarlung Zangbo River basin demonstrated high performance of the merged rainfall dataset in providing accurate and robust forcings for streamflow simulations. The new rainfall dataset additionally shows superiority to several other products of similar types, including MSWEP and CHIRPS. This new rainfall dataset is publicly accessible at https://doi.org/10.11888/Hydro.tpdc.271303 (Li and Tian, 2021).


Introduction
Precipitation, linking atmospheric and hydrological processes, serves as a crucial component of the water cycle (Eltahir and Bras, 1996; Trenberth et al., 2003). Gridded precipitation datasets have become increasingly popular with the advent of satellite precipitation measurement. The best-known satellite gridded precipitation datasets include the Tropical Rainfall Measuring Mission (TRMM) (Huffman et al., 2007) and its successor the Integrated Multi-satellite Retrievals for Global Precipitation Measurement mission (GPM-IMERG) (Hou et al., 2014), the Global Satellite Mapping of Precipitation (GSMaP) (Ushio et al., 2009) and the Climate Prediction Center (CPC) MORPHing technique (CMORPH) (Joyce et al., 2004). These products have been successfully applied in various hydrometeorological studies and water resources management practices (Kidd and Levizzani, 2011; Jiang et al., 2012; Tong et al., 2014; Yang et al., 2015; Sun et al., 2016; Wang et al., 2017).
Published by Copernicus Publications.
However, all existing precipitation datasets show insufficient accuracy in high mountainous regions (Derin et al., 2016, 2018, 2019; Zhang and Anagnostou, 2019), which hinders our understanding of climate and hydrological processes over these areas. This can be attributed to the complex physical nature of electromagnetic transmission and precipitation-forming processes (Hong et al., 2007; Bitew and Gebremichael, 2010; Dinku et al., 2010) and to harsh environments in high mountains, which lead to very limited deployment of in situ rain gauges and therefore to insufficient ground observations for training satellite-based precipitation retrieval algorithms. For instance, the Tibetan Plateau (TP), the roof of the world, is surrounded by imposing mountain ranges with an average elevation exceeding 4000 m. It generates several large rivers in Asia and provides invaluable freshwater resources for more than 1.4 billion people living downstream (Immerzeel et al., 2010). However, this vast plateau has a very limited number of precipitation gauges across its 2.5 × 10⁶ km² area. The precipitation gauge network operated by the China Meteorological Agency (CMA) contains only 86 gauges over the entire TP (Fig. 1). These gauges are essential to correct satellite precipitation datasets. For example, the GPM-IMERG "Final" Run dataset uses the Global Precipitation Climatology Centre (GPCC) database, and GSMaP_Gauge and CMORPH use the NOAA Climate Prediction Center (CPC) database. Although both the GPCC and CPC databases received data through the Global Telecommunication System (GTS), only part of the abovementioned gauges in the TP were utilized (Xie et al., 2007; Becker, 2013). Previous evaluations over the TP indicated that most products depend on topography to varying degrees, and products adjusted by gauge observations show better performance than satellite-only products (Gao et al., 2013; Lu and Yong, 2018). Therefore, better spatial coverage of rain gauges is critical to correct satellite products in high mountains.
In 2014, the Ministry of Water Resources of China (MWR) launched the flash flood monitoring and alarming campaign. A large number of rain gauges are now accessible over the TP, especially in the southern TP. In total, 440 new rain gauges, independent of the existing CMA precipitation gauge network, have been involved over the 6 years since 2014 (Fig. 1). These gauges provide measurements of precipitation in the liquid phase (i.e., rainfall) at the event timescale. Several recent studies have demonstrated the utility of this rain gauge network (Xu et al., 2017; He et al., 2017; Tian et al., 2018; Wang et al., 2020). For instance, Xu et al. (2017) used the network to evaluate the performance of TRMM and GPM and their dependence on topography and rainfall intensity. Their results demonstrated that the data quality of this dense gauge network is strictly controlled and that it currently offers the highest gauge density for satellite product evaluation on the TP. Wang et al. (2020) used the gauge data to validate their reproduced precipitation dataset. However, no merged product yet assimilates the observations from this dense rain gauge network. Given its high density and quality, the network offers a unique opportunity to improve the performance of existing satellite-based precipitation datasets.
This study aims to provide a high-accuracy rainfall dataset by merging all available ground gauges and three good-quality satellite precipitation datasets over the southern TP for the warm seasons (10 June–31 October) from 2014 to 2019. The remainder of this paper is organized as follows: Sect. 2 describes the study area and the source data. Section 3 provides details of the data merging method and the methods adopted to evaluate the quality of the dataset. Results are presented in Sect. 4. The data availability and summary are provided in Sects. 5 and 6, respectively.

Southern Tibetan Plateau
The Tibetan Plateau, known as the Asian water tower, mainly covers parts of China, India, Myanmar, Bhutan, Nepal and Pakistan. Various climate systems affect the plateau, including westerly winds in winter and the Indian monsoon in summer (Yao et al., 2012). Many large Asian rivers originate from this vast area, including the Yellow River, Yangtze River, Yarlung Zangbo River (YZR), Jinsha River (JR), Lancang River (LR), Salween River (SR), Irrawaddy River (IR), Ganges River (GR) and Indus River (IDR). This study is focused on the southern part of the TP (Fig. 1), including the upper YZR basin (YZRB) as a major basin.

Ground gauged rainfall
We combined two rain gauge networks managed by MWR and CMA to obtain an up-to-date, high-quality ground reference dataset. The number of rain gauges is presented in Fig. 1b and varies across different years. The spatial distribution of all gauges is presented in Fig. 1c. The gauges are mainly located in the middle reaches of the YZRB and the eastern part of the study area. Despite the high density, these rain gauges are not evenly distributed in space. This makes satellite rainfall products important for covering varying altitudes and aspects. Daily rainfall observations during the warm seasons of 2014–2019 were accumulated from the original event-scale measurements. The total number of CMA and MWR gauges ranges from 53 in 2015 to 377 in 2018, forming the densest rain gauge network to date.
The CMA gauge data have been widely demonstrated as reliable and accurate in previous studies (Zhai et al., 2005; Su et al., 2020; He et al., 2020). Gauge data used in this study have been processed under strict quality control procedures, including (1) an internal consistency check, (2) an extreme values check (0–85 mm h⁻¹) and (3) a spatial consistency check (Ren et al., 2010). Rain gauges with erroneous values (e.g., enormously large values) were discarded entirely. In cold seasons there are many missing values, and only a few gauges meet the requirements of the strict quality control method. The warm seasons from 10 June to 31 October were therefore selected as the study period to maintain the high quality of the resulting rainfall dataset, while gauged rainfall data continue to be collected to update our merged rainfall data.
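The extreme values check can be illustrated with a minimal sketch. Only the 0–85 mm h⁻¹ range comes from the text; the function names and the treatment of missing values (skipped rather than flagged) are assumptions made for illustration, and the internal and spatial consistency checks of Ren et al. (2010) are not reproduced here.

```python
def flag_extreme_values(hourly_mm, lower=0.0, upper=85.0):
    """Return indices of hourly rainfall rates outside [lower, upper] mm/h.

    Missing values (None) are skipped, not flagged (an assumption).
    """
    return [
        i
        for i, value in enumerate(hourly_mm)
        if value is not None and not (lower <= value <= upper)
    ]


def gauge_passes(hourly_mm, lower=0.0, upper=85.0):
    """A gauge record passes only if no value falls outside the range."""
    return not flag_extreme_values(hourly_mm, lower, upper)


record = [0.0, 12.5, 90.0, -1.0, None]
print(flag_extreme_values(record))  # indices of the suspicious values
print(gauge_passes(record))
```

A gauge failing this check would be a candidate for removal from the record, in line with the discarding of gauges with erroneous values described above.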

Methodology
We used the dynamic Bayesian model averaging (DBMA) method (Ma et al., 2018a) to merge the satellite datasets with in situ rain gauges. To evaluate the quality of the new dataset, we carried out statistical and hydrological evaluations and comparisons with CHIRPS and MSWEP in the southern TP.

Dynamic Bayesian model averaging method
The DBMA method developed by Ma et al. (2018a) was utilized in this work. A flow chart of the merging method is shown in Fig. 2. In the first step, a training dataset was formed by selecting samples from the ground gauged data and the three original satellite datasets. The training period was set to 40 d; increasing its length did not lead to obvious improvement of the merging method (Ma et al., 2018a). In the second step, the training dataset was transformed by the Box-Cox Gaussian distribution, and the optimal weights for each of the original satellite datasets were estimated by a logarithmic likelihood equation and the optimal expectation algorithm, for each grid where a ground gauge is located and on each training day. In the third step, an ordinary kriging interpolation method was applied to spatially interpolate the daily weights onto grids with no gauges. Finally, posterior spatiotemporal weights were used to obtain the final merged rainfall dataset. The DBMA-merged data were shown in Ma et al. (2018b) to outperform the original satellite data during 2007–2012 over the TP.
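The weight estimation in the second step can be sketched as follows. This is a generic expectation-maximization estimate of Bayesian model averaging weights with Gaussian kernels and a shared variance, not the implementation of Ma et al. (2018a); the Box-Cox parameters (`lam`, `shift`), the function names and the toy data are all assumptions made for illustration.

```python
import math


def box_cox(x, lam=0.3, shift=0.01):
    """Box-Cox transform of rainfall (mm); lam and shift are illustrative."""
    return ((x + shift) ** lam - 1.0) / lam


def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bma_weights(obs, members, n_iter=200, tol=1e-10):
    """EM estimate of BMA weights and a shared error variance.

    obs:     transformed gauge values over the training window (length T)
    members: K lists of transformed satellite estimates, each of length T
    """
    k, t = len(members), len(obs)
    weights = [1.0 / k] * k
    sq_err = [[(obs[j] - members[i][j]) ** 2 for j in range(t)] for i in range(k)]
    sigma2 = max(sum(map(sum, sq_err)) / (k * t), 1e-6)
    prev_ll = -math.inf
    for _ in range(n_iter):
        sigma = math.sqrt(sigma2)
        resp = [[0.0] * t for _ in range(k)]
        ll = 0.0
        # E-step: responsibility of each member for each observation.
        for j in range(t):
            dens = [weights[i] * normal_pdf(obs[j], members[i][j], sigma) for i in range(k)]
            total = sum(dens)
            if total <= 0.0:  # numerical underflow: fall back to current weights
                dens, total = list(weights), sum(weights)
            ll += math.log(total)
            for i in range(k):
                resp[i][j] = dens[i] / total
        # M-step: update the weights and the shared variance.
        weights = [sum(resp[i]) / t for i in range(k)]
        sigma2 = max(
            sum(resp[i][j] * sq_err[i][j] for i in range(k) for j in range(t)) / t,
            1e-6,
        )
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, sigma2


def merge(weights, estimates):
    """Weighted average of the member estimates for one grid cell and day."""
    return sum(w * x for w, x in zip(weights, estimates))


# Toy training window: member 0 tracks the gauge closely, the others are biased.
obs = [0.0, 1.0, 2.0, 1.5, 0.5, 2.5, 3.0, 1.2]
noise = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05, 0.05, -0.05]
members = [
    [o + e for o, e in zip(obs, noise)],
    [o + 1.0 for o in obs],
    [o - 1.5 for o in obs],
]
weights, sigma2 = bma_weights(obs, members)
print(weights, sigma2)
```

In the method described above, the weights estimated at gauged grids would then be interpolated by ordinary kriging before merging; that step, and whether merging happens in transformed or original units, is not covered by this sketch.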
For statistical evaluation of the merged data against ground gauges, around 85 % of the gauges were randomly selected to form a training gauge set for the merging approach in each year during 2014–2019, and the remaining 15 % were used for testing.

Statistical evaluation
Performance of the multiple datasets was statistically evaluated by comparing with ground observations on the corresponding satellite grids. Relative bias (RB) and normalized root mean square error (RMSE) were adopted to measure the difference in amount between the gridded rainfall and the gauged rainfall. The correlation coefficient (CC) was used to evaluate the consistency between satellite estimates and gauge observations. The skill of rainfall data in detecting rainfall occurrence (rainfall events higher than zero) was evaluated through a set of metrics (similarly to Wilks, 2011), i.e., the probability of detection (POD), assessing how good the multiple rainfall datasets are at detecting the occurrence of rainfall; the false alarm ratio (FAR), measuring how often the gridded rainfall datasets detect rainfall when there is actually no rainfall; and the critical success index (CSI), measuring the ratio of rainfall events that are correctly detected by the gridded datasets to the total number of observed or detected events. Equations for the above metrics are shown in Table 3.
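The six indices can be sketched in a few lines. The forms below are the commonly used ones and are assumptions in two respects: the RMSE is normalized by the mean gauge rainfall, and an event is counted when rainfall exceeds a threshold of zero. The function names are illustrative only.

```python
import math


def relative_bias(sat, gauge):
    """RB: total difference relative to the gauged total (optimal value 0)."""
    return sum(s - g for s, g in zip(sat, gauge)) / sum(gauge)


def normalized_rmse(sat, gauge):
    """RMSE divided by the mean gauge rainfall (assumed normalization)."""
    n = len(gauge)
    rmse = math.sqrt(sum((s - g) ** 2 for s, g in zip(sat, gauge)) / n)
    return rmse / (sum(gauge) / n)


def correlation(sat, gauge):
    """Pearson correlation coefficient (optimal value 1)."""
    n = len(gauge)
    ms, mg = sum(sat) / n, sum(gauge) / n
    cov = sum((s - ms) * (g - mg) for s, g in zip(sat, gauge))
    var_s = sum((s - ms) ** 2 for s in sat)
    var_g = sum((g - mg) ** 2 for g in gauge)
    return cov / math.sqrt(var_s * var_g)


def contingency(sat, gauge, threshold=0.0):
    """Count hits (a), false alarms (b) and misses (c)."""
    a = b = c = 0
    for s, g in zip(sat, gauge):
        if s > threshold and g > threshold:
            a += 1
        elif s > threshold:
            b += 1
        elif g > threshold:
            c += 1
    return a, b, c


def detection_scores(sat, gauge, threshold=0.0):
    a, b, c = contingency(sat, gauge, threshold)
    return {"POD": a / (a + c), "FAR": b / (a + b), "CSI": a / (a + b + c)}


sat = [0.0, 2.0, 3.0, 0.0, 1.0]
gauge = [0.0, 1.0, 3.0, 2.0, 0.0]
print(relative_bias(sat, gauge), normalized_rmse(sat, gauge), correlation(sat, gauge))
print(detection_scores(sat, gauge))
```

The detection scores divide by event counts, so a series with no observed or no detected events would need separate handling.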
For the equations listed in Table 3, $n$ is the total number of gridded product data and gauge observation data, and $i$ indexes the $i$th satellite product datum and gauge observation datum. $G_i$ is the gauge observation, and $\bar{G}$ is the average of the gauge observations. $S_i$ and $\bar{S}$ are the gridded estimates and their average, respectively; $a$ represents a hit (i.e., event was detected to occur and observed to occur), $b$ represents a false alarm (i.e., event was detected to occur but not observed to occur), and $c$ represents a miss (i.e., event was not detected to occur but observed to occur).

Earth Syst. Sci. Data, 13, 5455–5467, 2021, https://doi.org/10.5194/essd-13-5455-2021

Table 3. Statistical indices that were used to assess the performance of the gridded rainfall datasets. Columns: statistical indicator, equation, optimal value and equation number; the first row is relative bias (RB).

The triple collocation (TC) technique provides a platform for quantifying the root mean square errors of three products that estimate the same geophysical variable (Stoffelen, 1998). Roebeling et al. (2012) successfully applied the TC technique to estimate errors in three rainfall products across Europe. An extended triple collocation (ETC) introduced in McColl et al. (2014), which is able to estimate errors and correlation coefficients with respect to an unknown target, was used in this study to compare the performance of the DBMA-merged data and two previous merged datasets, CHIRPS and MSWEP.
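As a rough illustration of ETC, the sketch below estimates the error standard deviation and the correlation with the unknown truth for three collocated series from their covariances, assuming mutually uncorrelated errors. The function names and the synthetic data are assumptions, and the population covariance (division by n) is used for simplicity.

```python
import math


def covariance(x, y):
    """Population covariance (divides by n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def extended_triple_collocation(x1, x2, x3):
    """Return (rmse, cc) lists for three collocated series.

    Assumes the errors of the three series are mutually uncorrelated
    and uncorrelated with the true signal.
    """
    q11, q22, q33 = covariance(x1, x1), covariance(x2, x2), covariance(x3, x3)
    q12, q13, q23 = covariance(x1, x2), covariance(x1, x3), covariance(x2, x3)
    err_var = [
        q11 - q12 * q13 / q23,
        q22 - q12 * q23 / q13,
        q33 - q13 * q23 / q12,
    ]
    rho2 = [
        q12 * q13 / (q11 * q23),
        q12 * q23 / (q22 * q13),
        q13 * q23 / (q33 * q12),
    ]
    rmse = [math.sqrt(max(v, 0.0)) for v in err_var]
    cc = [math.sqrt(max(r, 0.0)) for r in rho2]
    return rmse, cc


# Synthetic check: orthogonal +/-1 patterns give exactly uncorrelated errors.
h1 = [1, -1, 1, -1, 1, -1, 1, -1]
h2 = [1, 1, -1, -1, 1, 1, -1, -1]
h3 = [1, -1, -1, 1, 1, -1, -1, 1]
h4 = [1, 1, 1, 1, -1, -1, -1, -1]
truth = [5 + 2 * v for v in h1]
x1 = [t + 0.5 * v for t, v in zip(truth, h2)]
x2 = [t + 1.0 * v for t, v in zip(truth, h3)]
x3 = [t + 0.25 * v for t, v in zip(truth, h4)]
rmse, cc = extended_triple_collocation(x1, x2, x3)
print(rmse, cc)
```

Because none of the three series is treated as the reference, this kind of estimate does not rely on the gauge data that were already merged into one of the products.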

Hydrological evaluation
In addition to the statistical assessments against rain gauges, hydrological assessment was used to test the performance of merged rainfall datasets in forcing hydrological modeling in the study area (see similarly Yong et al., 2012, 2014; Xue et al., 2013; Li et al., 2015). In this section, a semi-distributed hydrological model developed by Tian (2006), namely the Tsinghua Hydrological Model based on Representative Elementary Watershed (THREW), was adopted for the hydrological assessment of rainfall datasets in the YZRB. The YZRB has a drainage area of approximately 240 480 km² within China's border. The basin elevation ranges from 143 to 7261 m, with an average of around 4600 m. The YZR is one of the most important transboundary rivers in South Asia and the highest river in the world, characterized by a dynamic fluvial regime with an exceptional physiographic setting spreading along the eastern Himalayan region (Goswami, 1985). Due to complex terrain and strongly varying elevation, the YZRB is under the control of a variety of climate systems, such as the semi-arid plateau climate prevailing in the upper and middle reaches and the mountainous subtropical and tropical climates prevailing in the lower reaches. In the cold upper reaches, the mean annual rainfall is less than 300 mm. In the warm middle reaches, the mean annual rainfall falls between 300 and 600 mm.

The whole basin area above the Nuxia hydrological station was divided into 63 representative elementary watersheds (REWs). Model parameters were calibrated by daily discharges measured at the Nuxia station. The calibration period covers the warm seasons from 10 June to 31 October in 2014–2017, a total of 576 d. The validation period includes the two warm seasons in 2018 and 2019, with a total duration of 288 d. Descriptions of the calibrated model parameters can be found in Table 4. An automatic algorithm, pySOT, developed by Eriksson et al. (2019), was used to optimize the parameter values based on an objective function of the Nash-Sutcliffe efficiency coefficient (NSE) (Nash and Sutcliffe, 1970) in Eq. (7). To conduct a continuous hydrological simulation in the study period, the datasets of daily grid-based precipitation over China (Zhao et al., 2014) were used as model inputs in the non-warm seasons when merged rainfall is not available.

$$\mathrm{NSE} = 1 - \frac{\sum_{n=1}^{N} \left( Q_{\mathrm{obs}}^{n} - Q_{\mathrm{sim}}^{n} \right)^{2}}{\sum_{n=1}^{N} \left( Q_{\mathrm{obs}}^{n} - \overline{Q}_{\mathrm{obs}} \right)^{2}} \qquad (7)$$

where $N$ is the total number of days in the evaluation period, and $Q_{\mathrm{obs}}^{n}$ and $Q_{\mathrm{sim}}^{n}$ represent the observed and simulated runoff on the $n$th day, respectively. $\overline{Q}_{\mathrm{obs}}$ represents the average of observed runoff in the evaluation period.
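The Nash-Sutcliffe efficiency used as the objective function can be computed directly from the observed and simulated series. The sketch below follows the standard definition; the function name and the sample discharge values are illustrative assumptions.

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.0, 2.0, 3.0, 5.0]
print(nash_sutcliffe(observed, simulated))
```

A simulation that always returns the observed mean scores 0, and values below 0 indicate a simulation that performs worse than that mean.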

Spatiotemporal patterns
Based on the merging method, a new daily rainfall dataset with a spatial resolution of 0.1° × 0.1° in the warm seasons from 10 June to 31 October (144 d in each year) in 2014–2019 (864 d in 6 years) was generated. Figure 3 presents the spatial pattern of the mean rainfall over the six warm seasons of the merged data in the southern TP. It is shown that extremely high summer rainfall centers concentrate in the southeast and southwest of the study area, which is known as a world-famous heavy rainfall center (see Biskop et al., 2016; Bookhagen and Burbank, 2006; Kumar et al., 2010).
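The day counts quoted for the warm seasons can be verified with simple date arithmetic. The helper name is an assumption for illustration; both end dates are counted as part of the season.

```python
from datetime import date


def warm_season_days(year):
    """Number of days from 10 June to 31 October, both inclusive."""
    return (date(year, 10, 31) - date(year, 6, 10)).days + 1


per_year = warm_season_days(2014)
total_days = sum(warm_season_days(y) for y in range(2014, 2020))
calibration_days = sum(warm_season_days(y) for y in range(2014, 2018))
validation_days = sum(warm_season_days(y) for y in (2018, 2019))
print(per_year, total_days, calibration_days, validation_days)
```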
In addition, Fig. 4 compares the time series of average daily weight and rainfall over the YZRB derived from the DBMA-merged data and the original satellite datasets. As expected, the DBMA-merged daily rainfall generally falls within the envelope of the three satellite datasets. Merged data are closer to CMORPH in June, September and October while showing equal closeness to all three source satellite datasets in July and August. This indicates that CMORPH is closer to the in situ gauges than IMERG at basin scale when the rainfall value is small, especially for light-rainfall events with less than 2 mm of rainfall, but this difference tends to be small for heavy-rainfall events.

Statistical evaluation
Figure 5 shows the statistical evaluation of the merged and original datasets in the warm seasons. The statistical indices were calculated for three gauge groups, including the training gauges, the test gauges and all gauges at different elevation bands. The datasets in general presented comparable performance for the training and test gauge groups, indicating that the sampling procedure of ground gauges is adequately random. The comparable performance of merged data in the training and test gauge groups demonstrated robustness of the merging method for varying gauges. In terms of RMSE, CC and POD, the DBMA-merged data show much better performance in all gauge groups and elevation bands than the original satellite datasets. The smallest RMSE of merged data indicates that the total rainfall amount of the merged data during the evaluation period showed the lowest difference from the total amount of gauged rainfall. The highest CC and POD highlight the best consistency between merged data and ground gauge data on days when most regions in the basin were rainy. The RB of DBMA-merged data is at an intermediate level among the satellite datasets as it is the weighted average of those three datasets. The higher FAR and lower CSI of DBMA-merged data could be attributed to the fact that the merging method detected rainfall events when the rainfall estimate is higher than zero in any one of the three satellite datasets and thus resulted in overestimated rainfall occurrence. The overestimated rainfall occurrence might have small effects on the estimation of rainfall amount as most of the false alarm events were tiny. It is noteworthy that the performance of the merged data shows smaller variance across elevation bands than that of the original satellite datasets. This most likely results from the spatially dynamic optimal weights for the original satellite data. However, the merged data presented the largest difference from gauged data at altitudes of 3000–3500 m because there are far fewer gauges in this elevation zone.
Figure 6 shows the CC of different datasets for specific gauges. The merged data present higher CC values in regions that are densely gauged, i.e., the middle reaches of the YZRB and the eastern part of the study region, which can be expected as the dense ground gauges provided strongly informative benchmark likelihoods for the estimation of satellite data weights. For most of the gauges (Fig. 6a), the merged data presented higher CC values than the IMERG data, which is consistent with Fig. 5c. In contrast, the merged data showed a reduced CC compared to GSMaP and CMORPH for more gauges (Fig. 6b and c), indicating that involving IMERG data in the merging procedure for these gauges leads to deteriorated consistency.

Uncertainty analysis
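The automatic algorithm pySOT was run 200 times to investigate the modeling uncertainty caused by parameter calibration. Figure 7 presents the distributions of NSE values estimated by the ensemble parameter sets of the merged and original rainfall forcings. It is shown that streamflow simulated by the DBMA data at the Nuxia station presented higher NSEs and smaller uncertainty ranges than that simulated by the original satellite datasets, indicating that streamflow simulations driven by the merged dataset showed stronger robustness and were less affected by uncertainty in parameter calibration.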
In addition to the Nuxia hydrological station, model performance when simulating streamflow at the interior hydrological stations of Yangcun, Nugesha, Gongbujiangda and Lhasa (Fig. 1) was evaluated in Fig. 7. It shows that the IMERG-forced simulations presented poor NSE outliers lower than zero at the Lhasa station in spite of their good performance at the Yangcun and Nugesha stations; the GSMaP-forced simulations presented large uncertainty ranges in the calibration period at Nugesha and Lhasa and in the validation period at Nuxia and Gongbujiangda; the CMORPH-forced simulations showed the worst performance in the validation period at the interior hydrological stations, despite their sound performance in the calibration period at Yangcun and Nugesha. In comparison to the satellite datasets, the DBMA-forced simulations tend to perform consistently better, with smaller uncertainties at all the hydrological stations, which can be attributed to the fact that the merged data incorporated the advantages of different datasets in different regions and temporal periods and thus better captured the spatial variability in rainfall inputs in sub-basins.

Comparisons with other datasets
To avoid interference of ground gauge data merged in the DBMA dataset, the ETC method introduced in Sect. 3.2 was applied to compare the three merged datasets in Table 6. The RMSE and CC of DBMA calculated by ETC were 1.11 and 0.80, respectively, both of which are clearly superior to the corresponding values estimated for CHIRPS and MSWEP.

Runoff simulations forced by the three merged datasets during 10 June 2014 to 31 October 2019, estimated by the corresponding optimal parameter sets, are presented in Fig. 8. Note that the daily runoff is normalized as in Eq. (8) for data security reasons. Simulation by the CHIRPS data presented the lowest performance, with NSE values of 0.75 and 0.78 in the calibration and validation periods, respectively. The DBMA-forced simulation showed the highest performance, with NSE values of 0.93 and 0.86 in the calibration and validation periods, followed by the MSWEP-forced simulation, which yielded NSE values of 0.9 in the calibration period and 0.76 in the validation period. The performance of streamflow simulations forced by the merged datasets is consistent with the agreement between the merged rainfall estimates and ground truth shown in Table 6.

Summary
We collated ground-based rainfall observations from a dense gauge network over the southern TP. The gauged data pro-
Supplement. The supplement related to this article is available online at: https://doi.org/10.5194/essd-13-5455-2021-supplement.
Author contributions. FT and KL designed the research. KL, RX and YM developed the approach and datasets. KL downloaded the datasets and performed most of the computation and analysis work. LY, ZH, HL and MYAK contributed to the revising of the paper.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements. Ground gauge data from the hydrological bureau of MWR are acknowledged here.
Financial support. This research has been financially supported by the National Natural Science Foundation of China (grant no. 92047301) and National Key R&D Program of China (grant no. 2018YFC1508102).
Review statement. This paper was edited by Ge Peng and reviewed by two anonymous referees.

Figure 3. Spatial pattern of mean rainfall over six warm seasons in 2014-2019 of the DBMA-merged data in the southern TP.

Figure 4. Seasonal variations in basin-averaged (a) weights and (b) rainfall estimates of the multiyear daily values of IMERG, GSMaP, CMORPH and DBMA.
simulation
Performance of the THREW model forced by different rainfall datasets is compared in Table 5, including NSE in the calibration period (NSEcal), NSE in the validation period (NSEval) and RB. The DBMA-merged dataset achieved the best runoff simulation among all rainfall inputs, with NSE reaching 0.93 and 0.86 in the calibration and validation periods, respectively, indicating an excellent agreement between simulated and observed hydrographs. Both IMERG and GSMaP underestimated the measured daily discharge, but the DBMA-merged dataset improved such underestimations (see RB values in Table 4).

Figure 5. Comparisons of the statistical indices of (a) RB, (b) RMSE, (c) CC, (d) POD, (e) FAR and (f) CSI for training gauges, test gauges and all gauges at five elevation bands.

Table 1. Multiple-satellite precipitation datasets used in this study.

Table 2. Number of rain gauges for training and testing in 2014–2019.
year during 2014–2019, and the remaining 15 % were used for testing. The 40 d DBMA training was conducted only on the training dataset. Table 2 lists the numbers of training and test gauges in each of the warm seasons. The spatial distributions of gauges in each year are presented in Fig. S1 in the Supplement. Data from all gauges were involved in the training procedure of the final version of the merged data.
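The random 85 %/15 % split of gauges can be sketched as below. The function name, the fixed seed and the integer gauge identifiers are assumptions for illustration; the text states only that the selection was random and repeated for each year.

```python
import random


def split_gauges(gauge_ids, train_fraction=0.85, seed=0):
    """Randomly split gauge identifiers into training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(gauge_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


train_ids, test_ids = split_gauges(range(100))
print(len(train_ids), len(test_ids))
```

Running the split once per year with that year's gauge list would mirror the yearly sampling described above; no gauges would be held out for the final dataset, which uses all of them.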

Table 4. Calibrated parameters of the THREW model.
presents the distributions of NSE values estimated by the ensemble parameter sets of the merged and original rainfall forces. It is shown that streamflow simulated by the DBMA data at the Nuxia station presented higher

Table 6. Statistical RMSE and CC of merged datasets calculated by the ETC method.