A 439-year simulated daily discharge dataset (1861–2299) for the upper Yangtze River, China

The outputs of four global climate models (GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR and MIROC5), which were statistically downscaled and bias corrected, were used to drive four hydrological models (Hydrologiska Byråns, HBV; Soil and Water Assessment Tool, SWAT; Soil and Water Integrated Model, SWIM; and Variable Infiltration Capacity, VIC) to simulate the daily discharge at the Cuntan hydrological station in the upper Yangtze River from 1861 to 2299. As the performances of hydrological models in various climate conditions could be different, the models were first calibrated in the period from 1979 to 1990. Then, the models were validated in the comparatively wet period, 1967–1978, and in the comparatively dry period, 1991–2002. A multi-objective automatic calibration programme using a univariate search technique was applied to find the optimal parameter set for each of the four hydrological models. The Nash–Sutcliffe efficiency (NSE) of daily discharge and the weighted least-squares function (WLS) of extreme discharge events, represented by high flow (Q10) and low flow (Q90), were included in the objective functions of the parameterization process. In addition, the simulated evapotranspiration results were compared with the GLEAM evapotranspiration data for the upper Yangtze River basin. For evaluating the performances of the hydrological models, the NSE, modified Kling–Gupta efficiency (KGE), ratio of the root-mean-square error to the standard deviation of the measured data (RSR) and Pearson’s correlation coefficient (r) were used. The four hydrological models reach satisfactory simulation results in both the calibration and validation periods. In this study, the daily discharge is simulated for the upper Yangtze River under the preindustrial control (piControl) scenario without anthropogenic climate change from 1861 to 2299 and for the historical period 1861–2005 and for 2006 to 2299 under the RCP2.6, RCP4.5, RCP6.0 and RCP8.5 scenarios.
The long-term daily discharge dataset can be used in international research and in water management, e.g. in the framework of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP), by indicating to what extent human-induced climate change could affect streamflow and streamflow trends in the future. The datasets are available at: https://doi.org/10.4121/uuid:8658b22a-8f98-4043-9f8f-d77684d58cbc (Gao et al., 2019). Published by Copernicus Publications.


Introduction
Global warming is the long-term rise in the average temperature of the Earth's climate system. Rising temperatures alter global water circulation processes and could significantly influence the sustainability of society and the economy (Jung et al., 2011). The variation in water resource availability in the context of global warming is the focus of many international research projects (Stagl et al., 2016; Raman et al., 2018; Maisa et al., 2019). Long-term daily discharge time series that are as accurate as possible are crucial for an in-depth understanding of changes in streamflow, and they are needed for subsequent climate change impact studies. However, in most river basins discharge has been monitored only for short observational periods.
To generate long-term streamflow series, many data mining techniques, including the sedimentological method, the hydrological field survey method and the documentary analysis method, can be applied (Longfield et al., 2018). Nevertheless, the low temporal resolution and insufficient accuracy of these estimates can hardly meet the demands of practical and research applications. Instead, observed climatic variables and the outputs of climate models have often been used to drive hydrological models to evaluate changes in streamflow in the context of climate change (Braud et al., 2010; Chen et al., 2017; Su et al., 2017; Dahl, 2018; Seneviratne et al., 2018). However, there is a lack of research on the quantitative estimation of long-term streamflow for periods longer than 400 years under different scenarios with and without anthropogenic climate change (Meaurio, 2017).
The Yangtze River is the longest river in China. It originates on the Tibetan Plateau and enters the East China Sea after flowing through 11 provinces. With a large topographic gradient and a substantial water supply of approximately 10,000 m³ s⁻¹ on average, the upper Yangtze River is rich in hydropower resources but subject to destructive flash floods. The Yangtze River has the longest hydrological observation record in China. Data provided by the Cuntan hydrological station, which started operating in 1939, facilitate hydro-meteorological studies in the instrumental period (Su et al., 2008; Wang et al., 2008; Su et al., 2017). As changes in streamflow at the Cuntan station directly influence inflow to the Three Gorges Reservoir, establishing long-term discharge series at the Cuntan station can support effective management of hydraulic projects. In addition, longer discharge series allow the international climate change research community to explore the impacts of anthropogenic climate change on hydrology. Therefore, we simulated daily discharge at the Cuntan hydrological station in the upper Yangtze River for the period 1861–2299 using available climate model outputs.
The outputs of four downscaled GCMs (GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR and MIROC5) are used to drive four hydrological models (HBV, SWAT, SWIM and VIC) to simulate discharge at the Cuntan station. The climate forcing comprises (a) the scenario with anthropogenic climate change for the period 1861–2299, which is subdivided into the historical period (1861–2005) and the future period (2006–2299) under different Representative Concentration Pathways (RCPs), and (b) the preindustrial control scenario (piControl) for the period 1861–2299, which is used as a reference to detect the influence of anthropogenic climate change on streamflow in the upper Yangtze River.
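The experiment design described above can be sketched as a small lookup structure. This is an illustrative sketch only: the scenario keys (`rcp26`, etc.) and the helper names are assumptions made here, not identifiers from the dataset, and the periods are the nominal ones stated in the text (the source notes that not every GCM provides data after 2099).

```python
from itertools import product

# Model names as given in the text.
GCMS = ["GFDL-ESM2M", "HadGEM2-ES", "IPSL-CM5A-LR", "MIROC5"]
HYDRO_MODELS = ["HBV", "SWAT", "SWIM", "VIC"]

# Nominal (first_year, last_year) per scenario. The keys are hypothetical
# labels chosen for this sketch. Coverage after 2099 differs between GCMs,
# and no RCP6.0 runs are available after 2099.
SCENARIO_PERIODS = {
    "piControl": (1861, 2299),
    "historical": (1861, 2005),
    "rcp26": (2006, 2299),
    "rcp45": (2006, 2299),
    "rcp60": (2006, 2099),
    "rcp85": (2006, 2299),
}


def model_chains():
    """Return every (GCM, hydrological model) pairing."""
    return list(product(GCMS, HYDRO_MODELS))


def n_years(scenario):
    """Number of calendar years nominally covered by a scenario."""
    first, last = SCENARIO_PERIODS[scenario]
    return last - first + 1


print(len(model_chains()), "GCM-hydrological model chains")
print(n_years("piControl"), "years under piControl")
```

The historical and future periods together span the same 439 years as the piControl run, which is what allows a year-by-year comparison between the two forcings.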

Study Area
The catchment area of the Cuntan hydrological station (29°37′ N, 106°36′ E) in the upper Yangtze River is approximately 860,000 km², and 352.7 billion m³ of water flows through this point annually, with an average discharge of 10,934 m³ s⁻¹ in the period of instrumental measurements beginning in 1939. The location of the Cuntan hydrological station, the 311 GCM grid cells, the meteorological stations and the spatial distribution of land use and soil types in the upper Yangtze River basin are shown in the study-area map.
The upper Yangtze River basin has complex geomorphology and fragmented topography. Mountains and plateaus account for most of the region, whereas hills and plains are few. Influenced by the East Asian subtropical monsoon and the complex topography, the climate varies across the basin, with annual air temperature and precipitation being high in the southeast but low in the northwestern headstream region. According to observational data, the areal averaged annual mean temperature and precipitation are 12.3 °C and 1018 mm, respectively, during 1961–2017 in the upper Yangtze River basin.

The outputs of the GCMs (GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR and MIROC5) were statistically downscaled and bias corrected on a regular 0.5° × 0.5° resolution grid using a first-order conservative remapping scheme (Frieler et al., 2017; Lange, 2018). The GFDL model was developed by the Geophysical Fluid Dynamics Laboratory, Princeton University, USA, and all its integrations (approximately 100 in total), including GFDL-ESM2M and GFDL-ESM2G, were completed for the Coupled Model Intercomparison Project Phase 5 (CMIP5) protocol (Taylor et al., 2012). HadGEM2-ES is a coupled earth system model that was developed by the Met Office Hadley Centre, UK, for the CMIP5 centennial simulations (Jones et al., 2011). The IPSL-CM5A-LR model was developed by the Institute Pierre Simon Laplace, France, and the model was built around a physical core that includes atmosphere, land surface, ocean and sea ice components (Dufresne et al., 2013). MIROC5 is a new version of the atmosphere-ocean GCM that was developed by the Japanese research community (Watanabe et al., 2010).

Because long-term homogeneous observational data are lacking and socioeconomic drivers exert a confounding influence, GCM simulations rarely cover the preindustrial period. In this study, the climate simulations include a piControl scenario, representing a climate with natural variability under a stable CO2 concentration of 286 ppm; a historical scenario, representing the historical CO2 concentration; and future RCP scenarios, representing various future CO2 concentration pathways. The availability of climate scenarios for the different periods is shown in Table 1 (see also Frieler et al., 2017). Note that not all simulations cover the 22nd and 23rd centuries. Data after 2099 are available from three models under RCP2.6 and only from IPSL-CM5A-LR under RCP4.5 and RCP8.5; no simulations after 2099 are available under RCP6.0.

The observed daily meteorological data for 1951–2017 from 189 ground-based stations in the upper Yangtze River basin were quality controlled at the National Meteorological Information Centre of the China Meteorological Administration by considering changes in instrument type, station relocations and trace biases (Ren et al., 2010). The data were then fed into the hydrological models after spatial interpolation. During 1951–2017, annual precipitation shows a decreasing trend, with a multi-year average of 935 mm, and annual mean temperature shows a positive trend, with a multi-year average of 10.5 °C. (Hu and Luo, 1992; Luo and Le, 1996)

GLEAM evapotranspiration data
Evapotranspiration data from the Global Land Evaporation Amsterdam Model (GLEAM) for 1986–2005, released by the University of Bristol (Miralles et al., 2011), are used in our study to cross-check the performance of the hydrological models by means of geographic information system (GIS) tools. The GLEAM data were generated from a variety of satellite-sensor products at a monthly scale with a spatial resolution of 0.25°. The spatial distribution of simulated evapotranspiration is compared with that from GLEAM using GIS techniques, and the kappa value of a confusion matrix is also applied to evaluate the accuracy of simulated evapotranspiration (taking the VIC output as an example) with reference to GLEAM.

Hydrological models and parameterization
Four hydrological models, HBV (Bergstrom et al., 1973), SWAT (Arnold et al., 1998), SWIM (Krysanova et al., 2005) and VIC (Liang et al., 1994) are used to simulate river discharge at the Cuntan hydrological station, and a flowchart of the hydrological modelling process is shown in Fig. 2. A brief introduction to these four hydrological models is given in Table 2 (see also Hattermann et al., 2017).
The univariate search technique, which evaluates each parameter individually, is used to calibrate the parameters. The objective functions include the Nash-Sutcliffe efficiency (NSE) of daily discharge (Nash and Sutcliffe, 1970) and the weighted least squares function (WLS) of high flow (Q10) and low flow (Q90). To achieve the maximum NSE and the minimum gap between the observed and simulated discharge, the parameterization process is iterated over 2,000 times within the valid parameter ranges given in Table 3 (Lai et al., 2006).
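A univariate (one-parameter-at-a-time) search of this kind can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the one-bucket `toy_model`, its parameters `k` and `c`, the grid resolution and the synthetic forcing are hypothetical and stand in for the far more complex HBV, SWAT, SWIM and VIC set-ups. For brevity the objective uses the NSE only; a high-flow and low-flow term could be folded into the same objective callable.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed values."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def toy_model(precip, params):
    """Hypothetical one-bucket model: c scales inflow, k drains storage."""
    storage, flows = 0.0, []
    for p in precip:
        storage += params["c"] * p
        q = params["k"] * storage
        storage -= q
        flows.append(q)
    return flows


def univariate_search(objective, start, bounds, steps=19, max_sweeps=10):
    """Vary one parameter at a time over a grid, keeping any improvement."""
    params = dict(start)
    best = objective(params)
    for _ in range(max_sweeps):
        improved = False
        for name, (low, high) in bounds.items():
            for i in range(steps):
                trial = dict(params)
                trial[name] = low + (high - low) * i / (steps - 1)
                score = objective(trial)
                if score > best:
                    params, best, improved = trial, score, True
        if not improved:
            break  # a full sweep changed nothing, so stop early
    return params, best


# Synthetic forcing and "observations" generated with known parameters.
PRECIP = [0, 5, 12, 0, 0, 3, 20, 8, 0, 0, 1, 0, 15, 7, 0, 0, 0, 9, 4, 0]
TRUE_PARAMS = {"k": 0.3, "c": 0.6}
OBS = toy_model(PRECIP, TRUE_PARAMS)
BOUNDS = {"k": (0.05, 0.95), "c": (0.1, 1.0)}
START = {"k": 0.5, "c": 0.5}


def objective(params):
    return nse(OBS, toy_model(PRECIP, params))


fitted, score = univariate_search(objective, START, BOUNDS)
print(fitted, round(score, 4))
```

Because each sweep only moves along one parameter axis, such a search can stop at a point that is optimal along each axis but not jointly, which is one reason to inspect parameter combinations separately.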
For evaluating daily hydrograph simulation, the ratio of the root mean square error to the standard deviation of the measured data (RSR) is recommended (Moriasi et al., 2007). In addition, the Kling-Gupta efficiency (KGE) was developed to provide diagnostic insights into model performance by decomposing the NSE into three components: correlation, bias and variability (Gupta et al., 2009). In this study, four criteria, the NSE, RSR, Pearson's correlation coefficient (r) and KGE, are applied to the daily discharge series to evaluate the performance of the hydrological models (Krysanova et al., 2018; Table 4). The acceptance thresholds for the four criteria are derived from the literature (Nash and Sutcliffe, 1970; Moriasi et al., 2007; Huang et al., 2012; King et al., 2012).
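The four criteria can be written compactly as below. This is a minimal sketch using their standard textbook definitions, not the authors' code; the function names and the `modified` switch are assumptions introduced here. With `modified=False` the variability term is the ratio of standard deviations (Gupta et al., 2009); with `modified=True` it is the ratio of coefficients of variation, as in the modified KGE.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(obs, sim):
    """Pearson's correlation coefficient between observed and simulated."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (_std(obs) * _std(sim))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / sum((o - mo) ** 2 for o in obs)


def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
    return rmse / _std(obs)


def kge(obs, sim, modified=False):
    """Kling-Gupta efficiency from correlation, bias and variability terms."""
    r = pearson_r(obs, sim)
    beta = _mean(sim) / _mean(obs)          # bias ratio
    variability = _std(sim) / _std(obs)     # ratio of standard deviations
    if modified:
        variability /= beta                 # ratio of coefficients of variation
    return 1.0 - math.sqrt(
        (r - 1.0) ** 2 + (variability - 1.0) ** 2 + (beta - 1.0) ** 2
    )


observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]   # constant offset: perfect timing, biased volume
print(nse(observed, simulated), rsr(observed, simulated), kge(observed, simulated))
```

In the small example the simulated series is perfectly correlated with the observations but biased, so r equals 1 while the NSE and KGE are both penalised, which illustrates why the criteria are used together.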

Climate change in the upper Yangtze River basin
According to the ensemble mean of the four GCMs, annual mean temperature in the upper Yangtze River basin in the period 1986–2005 is 0.49 °C higher than that in the period 1861–1900; this increase is lower than the global average of 0.61 °C over the same period. Compared to the piControl scenario, annual mean temperature is projected to increase significantly in the 21st century, by 1.85 to 3.31 °C under the RCPs. After 2100, surface air temperature remains stable under RCP2.6 and increases only slightly under RCP4.5, but a significant increase continues under RCP8.5, reaching up to 13.5 °C by 2299 relative to the piControl scenario (Fig. 3a, Table 5). The abrupt shift in temperature in the year 2100 under RCP4.5 and RCP8.5 in Fig. 3a arises because only the IPSL model runs are available after 2100 for these scenarios.
The long-term average monthly temperature follows a single-peak curve, with July being the hottest month. In the period 1861–2005, the intra-annual distribution of temperature is very similar for the piControl and the historical scenarios (Fig. 3b). However, differences in the monthly temperatures between the RCP and piControl scenarios become apparent with time (Fig. 3c-d). Taking the temperature in July as an example, the differences between the RCP and piControl scenarios are approximately 1.9 to 3.2 °C in the 21st century but widen to 1.7 to 12 °C in the period 2100–2299.
Compared with precipitation under the piControl scenario, which has no monotonic trend, annual precipitation is approximately 2 % (16 mm) lower in 1861–2005 under the historical scenario. Relative to the piControl scenario, changes in annual precipitation are -1.2 % to 1.3 % in the 21st century under the RCPs and 0.6 % to 2.2 % in 2100–2299 under RCP2.6 and RCP4.5. Under RCP8.5, the relative change in annual precipitation is -5.7 %, and a wide range of fluctuations is projected, with a variance as high as 94.3 in 2100–2299, which is 63.2 % higher than under the piControl scenario (Fig. 4a, Table 5).
The long-term average monthly precipitation follows a single-peak curve, with precipitation highest in July and lowest in December and January. The differences in the long-term average monthly precipitation between the RCP and piControl scenarios are projected to grow from between -1.9 % and 1.3 % before 2100 to between -5.4 % and 2.2 % in the period 2100–2299 (Fig. 4c).

Calibration and validation of the hydrological models
A previous study found that 1986/1987 was a change point in the observational period for south China, with a more pronounced increase in temperature and decrease in precipitation since then (Thomas et al., 2012). The performance of the models in the calibration and validation periods is summarised in Table 6. The KGE values are above the threshold in the calibration period for all models but slightly lower in the validation period for the SWIM and VIC models. The four hydrological models can also properly simulate high flow and low flow, represented by Q10 and Q90, in the calibration and validation periods. For example, the Q10 results illustrate that the several severe floods mentioned previously are reproduced quite well by the model simulations: the peak flows of simulated discharge were 64,300 m³ s⁻¹, 53,900 m³ s⁻¹ and 60,700 m³ s⁻¹ in the 1930s, 1950s and 1990s, respectively, deviating by less than 10 % from the recorded peaks (Fig. 6).
To further validate the hydrological models, discharge simulated for another 30-year historical period (1939–1968) is compared with the observed data (Fig. 7). Simulated evapotranspiration is also compared with the GLEAM data (Fig. 8). Furthermore, a confusion matrix built from 500 randomly selected pixels of VIC-simulated evapotranspiration and the corresponding GLEAM grid cells is used to obtain the kappa value. The resulting kappa value of 0.62 indicates substantial agreement between the two data sources.
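The kappa value of a confusion matrix can be computed as in the following minimal sketch. It assumes that evapotranspiration at each sampled pixel has already been assigned to a discrete class (the class definitions used in the study are not given here, so the integer class labels and the example counts are purely hypothetical); the formula itself is the standard Cohen's kappa.

```python
def confusion_matrix(reference, simulated, n_classes):
    """Count pixel pairs by (reference class, simulated class)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for ref, sim in zip(reference, simulated):
        matrix[ref][sim] += 1
    return matrix


def cohen_kappa(matrix):
    """Agreement between two classifications, corrected for chance."""
    n = sum(sum(row) for row in matrix)
    size = len(matrix)
    observed = sum(matrix[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class example with 50 pixels.
example = [[20, 5], [10, 15]]
print(round(cohen_kappa(example), 3))
```

On the commonly used Landis and Koch scale, kappa values between 0.61 and 0.80 are labelled substantial agreement, which is consistent with the interpretation of the value 0.62 above.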

Simulation of daily discharge for 1861 -2299
The simulated discharge time series for 1861–2299 under the piControl scenario without anthropogenic climate change and under the scenarios with anthropogenic climate change are shown in Fig. 9a-b. Similar to the precipitation trend, annual mean discharge at the Cuntan station shows no significant trend from 1861 to 2299 under the piControl scenario. In the historical period (1861–2005), annual mean discharge shows a slight decreasing trend. Under the RCPs, annual mean discharge shows a significant upward trend until the end of the 21st century, with increasing variation. Annual mean discharge shows no significant change after 2100 under RCP2.6 and RCP4.5, but a rapid decline is projected under the high-emission RCP8.5 scenario (Fig. 9a, Table 5).
In 2270–2299, a higher Q10 discharge is projected under RCP2.6 and RCP4.5 than under the piControl scenario. Meanwhile, a higher Q90 discharge is projected under RCP2.6 but a lower Q90 discharge under RCP4.5. Under RCP8.5, however, the relative changes in Q10 and Q90 discharge reach -13.2 % and -50.4 %, respectively, in 2270–2299 owing to the rapid decline in discharge. The results indicate that there will be more extreme hydrological events in the long run, especially under RCP8.5.
Similar to precipitation and temperature, average monthly discharge in 2070–2099 and 2270–2299 shows a single peak under both the piControl and RCP scenarios. Under RCP4.5, a higher flood volume in August is projected in both 2070–2099 and 2270–2299 than under the piControl scenario. Under RCP8.5, the August flood volume is higher in 2070–2099 but lower in 2270–2299. Under RCP2.6, the flood volume in August is similar to that under piControl in both periods (Fig. 10a-b). The Generalized Logistic Distribution (GLD), which is the best-fitting distribution according to the Kolmogorov-Smirnov goodness-of-fit test, is applied to describe the statistical distribution of the daily maximum discharge (represented by annual Q10) for 2070–2099 and 2270–2299. The return levels of daily maximum discharge under RCP2.6, RCP4.5, RCP6.0 and RCP8.5 are higher than those under the piControl scenario in 2070–2099 (Fig. 10c). Under RCP4.5, a higher average return level of daily maximum discharge is projected in both 2070–2099 and 2270–2299 than under the piControl scenario. Under RCP8.5, the average return level of daily maximum discharge is higher in 2070–2099 but lower in 2270–2299 than under the piControl scenario. Under RCP2.6, the average return level of daily maximum discharge is similar to that under the piControl scenario in both periods (Fig. 10c-d).
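The Q10 and Q90 comparison can be reproduced from any pair of daily series with a few lines. This is a sketch under stated assumptions: Q10 and Q90 are taken as the flows exceeded 10 % and 90 % of the time, estimated by linear interpolation between ranked values (the exact percentile convention of the study is not specified), and the function names and the synthetic series are hypothetical.

```python
def exceedance_flow(flows, percent):
    """Flow exceeded `percent` of the time (Q10 is high flow, Q90 is low flow)."""
    ordered = sorted(flows, reverse=True)
    position = percent * (len(ordered) - 1) / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def relative_change(scenario_value, reference_value):
    """Change of a scenario value relative to the reference, in percent."""
    return 100.0 * (scenario_value - reference_value) / reference_value


# Synthetic daily flows standing in for a reference and a scenario series.
reference = [float(v) for v in range(1, 102)]   # 1, 2, ..., 101
scenario = [0.5 * v for v in reference]         # uniformly halved flows

q10_ref = exceedance_flow(reference, 10)
q90_ref = exceedance_flow(reference, 90)
print(q10_ref, q90_ref)
print(relative_change(exceedance_flow(scenario, 90), q90_ref))
```

Applied to the piControl series as reference and an RCP series as scenario for the same 30-year window, the same two functions yield relative changes of the kind reported above.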
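How a return level follows from a fitted distribution can be sketched as below. The parameterisation of the GLD used in the study is not stated, so this sketch assumes the three-parameter generalized logistic form common in flood frequency analysis, with location `xi`, scale `alpha` and shape `k`; these symbols and the numerical values are illustrative assumptions, not fitted parameters from the dataset.

```python
import math


def glo_quantile(prob, xi, alpha, k):
    """Quantile of the assumed generalized logistic form at non-exceedance prob."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    odds = (1.0 - prob) / prob
    if k == 0.0:
        # Limiting case: the ordinary logistic distribution.
        return xi - alpha * math.log(odds)
    return xi + alpha / k * (1.0 - odds ** k)


def return_level(period_years, xi, alpha, k):
    """Discharge exceeded on average once every `period_years` years."""
    return glo_quantile(1.0 - 1.0 / period_years, xi, alpha, k)


# Illustrative parameters only (discharge in m3/s).
XI, ALPHA, K = 50000.0, 5000.0, -0.2
for period in (2, 10, 100):
    print(period, round(return_level(period, XI, ALPHA, K)))
```

In this form the 2-year return level equals the location parameter, and a negative shape parameter gives a heavier upper tail, so return levels grow faster with the return period.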

The current study generates daily discharge series for the upper Yangtze River at the Cuntan gauging station in the period 1861–2299. To ensure the reliability of the simulated runoff, a multi-objective automatic calibration programme using a univariate search technique is applied to obtain the optimal parameter set for each hydrological model. For the objective functions, the daily discharge and the indicators of high and low flow are considered. Four criteria, the NSE, KGE, RSR and r, are used to evaluate the parameterization results. To assess the models' ability to satisfactorily simulate discharge under different climate conditions, the hydrological models are validated in both dry and wet periods. In addition, the simulated evapotranspiration is compared with remote-sensing-based evapotranspiration from the GLEAM dataset to further validate the performance of the models.
Previous studies have shown that the HBV, SWAT and VIC hydrological models can be applied to the Cuntan station in the upper Yangtze River after calibration (Huang et al., 2016; Su et al., 2017; Chen et al., 2017). Our study shows that the HBV, SWAT, SWIM and VIC models can satisfactorily simulate the precipitation-runoff relation in a changing climate. Moreover, the simulated extreme peak values in the 1930s, 1950s and 1990s are in good agreement with the historical documented records of the catastrophic floods in the Yangtze River.
Although the simulation results are tested against several criteria, there are still uncertainties that could influence the outputs. These uncertainties are associated with the GIS data (e.g., land use data), the selection of the GCMs, the model calibration procedure and the exclusion of water management practices (Gerhard et al., 2018). First, as no dynamic land use data are available for the historical period before the 1980s or for the future, a static land use map for 1990 is used for simulating river discharge before and after the industrial revolution (historical and RCP scenarios). Second, although the most up-to-date climate scenarios are used in this study, the downscaling of the GCMs and the definition of the climate scenarios still contribute considerably to the uncertainties in the hydrological simulations. Third, the hydrological models are parameterized using the automatic calibration programme, and the parameterization and model applicability are assessed according to the NSE, KGE and RSR criteria. However, owing to equifinality, other parameter sets could result in similarly good performance. It is the combination of parameters, not the choice of an individual parameter, that ultimately influences the result (Cheng et al., 2014). The effects of different parameter combinations are not analysed in this study, and the uncertainty related to specific parameters in the models needs to be analysed further. Fourth, since the 1990s, human interference has escalated in the upper Yangtze River. The construction of dikes and reservoirs alters the timing and volume of peak discharge and base flow. Focusing merely on natural streamflow, without considering the effects of human interference, is one of the limitations of this study.
The datasets generated in our study are the only available long-term and relatively high-precision discharge sequences for the upper Yangtze River, and they include 16 combinations of four hydrological models driven by four GCMs. Simulations by multiple hydrological models and GCMs can provide a range of future streamflow variations, which can inform water resource management strategies. According to our simulation results, the simulated daily discharge will decrease with decreasing precipitation in the future. Comparison of the long-term simulated daily discharge under the RCPs with anthropogenic climate change and under the piControl scenario without human-induced climate change can help to show to what extent human-induced climate change may affect the hydrological regime in the upper Yangtze River basin.
Evaluation criteria (table fragment): ratio between the standard deviations of the simulated and observed data; β: ratio between the mean simulated and mean observed discharge; (Nash and Sutcliffe, 1970); ratio of the root mean square error to the standard deviation of the observations.