The stochastic weather generator CLIGEN can simulate long-term weather
sequences as input to WEPP for erosion predictions. Its use, however, has
been somewhat restricted by limited observations at high spatial–temporal
resolutions. Long-term daily temperature, daily precipitation, and hourly precipitation
data from 2405 stations and daily solar radiation from 130 stations
distributed across mainland China were collected to develop the most
critical set of site-specific parameter values for CLIGEN. Ordinary kriging
(OK) and universal kriging (UK) with auxiliary covariables, i.e., longitude,
latitude, elevation, and the mean annual rainfall, were used to interpolate
parameter values into a
Weather generators (WGs) are stochastic models that can generate arbitrarily long sequences of weather variables with statistical properties that are similar to observations for a specific location or area (Yin and Chen, 2020). WGs were originally developed to provide surrogate climate series for hydrological, soil erosion, and agricultural models when the observed data could not satisfy the application requirements due to missing data, limited record length, or limited spatial coverage (Wilks and Wilby, 1999). Since the 1990s, WGs have received increased attention as a statistical downscaling tool for the assessment of climate change impact (Katz and Parlange, 1996; Maraun et al., 2010). While global climate models (GCMs) and regional climate models (RCMs) have been used for climate projections, outputs from these models are often too coarse to meet the spatial–temporal resolution requirements of earth surface process models and are biased compared with observations. Statistical downscaling methods, mainly including perfect prognosis (PP), model output statistics (MOS), and WGs, can be used to downscale and bias-correct the output from GCMs/RCMs prior to earth surface model applications (Maraun and Widmann, 2018; Yin and Chen, 2020).
CLIGEN is a stochastic WG developed based on the generators used in the EPIC
and SWRRB models (Williams et al., 1985, 1984) and was
released in 1995, initially accompanying the process-based Water Erosion
Prediction Project (WEPP) model from the United States Department of
Agriculture (Nicks et al., 1995). CLIGEN can simulate a series of long-term
climate data at a daily scale, including maximum and minimum temperatures,
precipitation, solar radiation, dew point, and wind velocity and direction. In
addition, CLIGEN can generate three inter-storm variables at a sub-daily
scale, including storm duration, time to peak intensity (
Of the 10 CLIGEN-simulated weather elements, seven, namely daily maximum
and minimum temperature, daily precipitation, duration,
Summary of CLIGEN input parameters and the data used for the calculation of parameters.
Thirteen groups of input parameters related to temperature, solar radiation, and precipitation as listed in Table 1 are all parameters needed by CLIGEN to generate the aforementioned seven climate elements. As a site-specific weather generator, input parameters for CLIGEN can be directly prepared for stations with observed data. CLIGEN was initially released in the United States with a set of 2600 weather station parameter files (Flanagan et al., 2001). Parameters for the daily temperature and daily precipitation were calculated directly based on the observations of temperature and precipitation from each station. Parameters for daily solar radiation and storm pattern were based on 142 weather stations with daily solar radiation and sub-daily rainfall observations first and then extended to 2000 other stations using the triangulation interpolation method (Scheele and Hall, 2000).
Parameter regionalization, which extends model parameter values from stations with observations to areas/regions without observations, is required when the model is to be applied in these areas/regions. Commonly used parameter regionalization methods can be categorized as follows: (1) the parametric transplantation method, where a reference area that is spatially near or has similar climate characteristics to the target area is first selected, and then the parameters of the reference area are extended to the target area (Cheng et al., 2016); (2) spatial interpolation methods such as Thiessen polygon, inverse distance weighted, or ordinary kriging, which interpolate parameter values based on spatial correlations of parameters among multiple stations (Hutchinson, 1995); (3) parameter transfer as a function of regional properties such as multiple regression, based on correlations between parameters and regional characteristics (Cowpertwait et al., 1996); (4) regionalization considering both the spatial correlation of parameters and the correlation between parameters and regional characteristics, including external drift kriging and universal kriging, which can be treated as combination methods that take advantage of methods (2) and (3) (Haberlandt, 1998; Semenov and Brooks, 1999).
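As an illustration of category (2), the following is a minimal sketch of inverse distance weighted interpolation. The function name, the planar coordinates, the power of 2, and the station values are all hypothetical choices made for the example and are not taken from this study.

```python
import math

def idw_interpolate(stations, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    `stations` is a list of (x, y, value) tuples in planar coordinates.
    The power of 2 is a common default, assumed here for illustration.
    """
    num = 0.0
    den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a station
        w = 1.0 / d ** power  # closer stations receive larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical stations: (x_km, y_km, parameter value)
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
estimate = idw_interpolate(stations, (5.0, 0.0))
```

The estimate always lies between the smallest and largest station values, which is one reason methods that also use covariables, such as category (4), can perform better where parameters vary with terrain.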
The accuracy of parameter regionalization is known to be influenced by several factors. Firstly, regionalization of climate variables with lower or regular spatial variability generally performs better than highly heterogeneous and discontinuous variables. Secondly, for the same climate variable, temporal resolution plays an important role. The climate variable at a monthly or annual scale tends to perform better than variables at a daily or hourly scale because data with finer resolutions possess greater spatial variability. Thirdly, adopted approaches affect the efficiency of regionalization. For example, Wilks (2008) compared and evaluated the interpolation accuracy of four spatial interpolation methods for parameters of WGEN (weather generator), a weather generator developed by Richardson and Wright (1984), and results showed that locally weighted regressions outperformed Thiessen polygons and domain-wide (“global”) regressions. The accuracy of interpolation can be improved by adopting auxiliary covariables that are correlated with the regionalized climate variables into the regionalization process (Hengl et al., 2007). For example, elevation is frequently used as an auxiliary covariable and has been found to improve the interpolation of temperature and precipitation (Carrera-Hernández and Gaskin, 2007; Ly et al., 2013; Verworn and Haberlandt, 2011), especially in mountainous regions with complex terrains (Xu et al., 2018).
Several studies have attempted to regionalize CLIGEN input
parameters. Regionalization of CLIGEN input parameters for WEPP has
combined the parametric transplantation and spatial interpolation. When
CLIGEN was developed in the United
States to provide climate input to WEPP, parameter
values for 2600 stations were regionalized based on inverse distance
weighting (IDW). In the WEPP application, users identify the targeted
location, for which daily weather sequences using parameters from the
nearest stations will be automatically generated directly or by
interpolation from surrounding stations (up to 20 stations within a distance
of 1
Chen (2008) explored four spatial interpolation methods, inverse distance
weighting (IDW), ordinary kriging (OK), global polynomial interpolation
(GPI), and local polynomial interpolation (LPI), to regionalize the daily temperature- and precipitation-related input parameters of CLIGEN for 12
stations in the Loess Plateau of China. Paired
The overall aim of this study was to enable widespread use of CLIGEN to generate daily temperature, solar radiation, precipitation, and sub-daily precipitation variables anywhere in mainland China and to gain a better understanding of the performance of various spatial interpolation techniques. Specific objectives of this study were to (1) assemble CLIGEN input parameter values for 2405 stations in mainland China based on meteorological observations; (2) evaluate spatial interpolation techniques for regionalizing CLIGEN parameters; and (3) produce grid-based CLIGEN temperature, solar radiation, and precipitation parameter values at 10 km resolution for mainland China.
Data lengths for daily temperature, daily and hourly precipitation, and daily solar radiation for stations used in this study.
Map of the study area showing the locations of meteorological stations used in this study.
Four datasets consisting of daily temperature, daily rainfall, and hourly rainfall from 2405 meteorological stations, as well as solar radiation data from 130 stations distributed across mainland China, were collected (Fig. 1) from the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA); all data had been quality controlled by NMIC. Data lengths differed among these four datasets (Table 2). Daily temperature and daily rainfall data had longer periods of observation than hourly rainfall data for most stations, especially for stations located in the northwest arid area and the Qinghai–Tibet Plateau, where gauges for observing hourly rainfall were installed very late at some stations (Zhao, 1983; Wang and Zuo, 2009). Based on these four datasets, a total of 156 parameter values were calculated for each station. Because the 12th value of TimePk is equal to 1 by definition, 155 parameters were involved in the calculation and interpolation. The siphon rain gauges used to record hourly rainfall were taken out of service in winter to avoid freezing failures; therefore, hourly rainfall was only available for the warm rainy season for some northern and western stations. Nine stations distributed in North China (Miyun, Zhengzhou, Harbin), Northwest China (Lanzhou, Ürümqi), the Tibetan Plateau (Lhasa), and South China (Fuzhou, Changsha, Haikou) were selected to further display the regional differences and monthly variability of input parameters (Fig. 1).
CLIGEN requires 13 groups of input parameters and 12 values for each group to stochastically simulate temperature, solar radiation, and precipitation (Table 1). Temperature-related input parameters, TMAX AV, SD TMAX, TMIN AV, and SD TMIN, are used to simulate the daily maximum and minimum temperature for each simulated day and to decide whether the simulated precipitation occurred as snowfall or rainfall (Table 1). These four values can be calculated using daily maximum and minimum temperature data for each month directly. Solar-radiation-related inputs SOL.RAD and SD SOL are used to generate daily solar radiation and can be directly obtained from observed daily solar radiation.
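The monthly averages and standard deviations described above can be sketched as follows. This is a minimal illustration only: the function name and the sample values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption rather than a documented CLIGEN convention.

```python
import statistics
from collections import defaultdict

def monthly_mean_sd(records):
    """Group daily values by calendar month and return {month: (mean, sd)}.

    `records` is an iterable of (month, value) pairs pooled over all years.
    The sample standard deviation is used here as an assumption.
    """
    by_month = defaultdict(list)
    for month, value in records:
        by_month[month].append(value)
    return {
        month: (statistics.mean(values), statistics.stdev(values))
        for month, values in sorted(by_month.items())
    }

# Hypothetical daily maximum temperatures tagged by calendar month
tmax = [(1, -2.0), (1, 0.0), (1, 2.0), (8, 28.0), (8, 30.0), (8, 32.0)]
params = monthly_mean_sd(tmax)  # analogous to TMAX AV and SD TMAX per month
```

Applying the same function to daily minimum temperature or daily solar radiation would give the analogues of TMIN AV and SD TMIN or of SOL.RAD and SD SOL, with 12 values per group for a full record.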
The wet-following-wet and wet-following-dry day transition probabilities,
MX.5P and TimePk are used to simulate inter-storm variables, including storm
duration (
In CLIGEN (Nicks et al., 1995), as in Arnold and Williams (1989), it is
assumed that the magnitude of precipitation intensity decreases
exponentially from the maximum rate when time distribution of precipitation
intensities is discarded. Therefore, the precipitation depth
There are 12 discrete values of TimePk for each station, describing an
empirical cumulative probability distribution of time to peak (Nicks et al.,
1995). The observed interval is
Kriging is a spatial interpolation method that gives the best linear
unbiased prediction of intermediate values, assuming a Gaussian
process governed by prior covariance. For a research region with
Both OK and UK were used to interpolate the CLIGEN input parameters in this study. Stepwise regression was conducted to select appropriate covariables for UK. Longitude, latitude, elevation, and annual rainfall amount were found to be correlated with the monthly values of all CLIGEN parameter groups except SKEW P (Table 1); therefore, all four variables were adopted as auxiliary covariables when UK was used to interpolate these 12 groups of parameters. SKEW P had low correlations with all four of these covariables but was well correlated with MEAN P and S DEV P. Therefore, MEAN P and S DEV P were selected as covariables for the interpolation of SKEW P.
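For reference, the difference between OK and UK can be written in generic geostatistical notation. The symbols below follow common textbook usage and are assumptions made for this illustration; they are not taken from the equations of this study.

```latex
% Universal kriging: the mean is a linear function of covariables f_k(s)
Z(s) = \sum_{k=0}^{p} \beta_k f_k(s) + \varepsilon(s), \qquad f_0(s) \equiv 1

% Prediction at an unsampled location s_0 from n stations, with
% unbiasedness constraints on the weights \lambda_i
\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i), \qquad
\sum_{i=1}^{n} \lambda_i f_k(s_i) = f_k(s_0), \quad k = 0, \dots, p
```

Ordinary kriging corresponds to the case p = 0, in which the only constraint is that the weights sum to 1. In the setup described above, p = 4 for the 12 parameter groups that use longitude, latitude, elevation, and annual rainfall, and p = 2 for SKEW P, which uses MEAN P and S DEV P.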
A leave-one-out cross-validation method was used to evaluate the
interpolation accuracy of OK and UK. First, 1 of the 2405 stations was
excluded from data analysis and treated as unknown, and data for the remaining
2404 stations were then used to predict parameter values for the excluded
station using OK or UK. This leave-one-out procedure was repeated for 155
parameters for each of the 2405 stations (13 groups
Input parameters based on observed data and interpolated data using the
better interpolation technique were input into CLIGEN to evaluate the
influence of regionalized parameters on the simulation. For each station,
100 years of continuous climate series were generated using the default
CLIGEN stochastic seed without interpolation between months, and the
simulated data predicted by
Three basic statistics – the average, standard deviation, and skewness coefficient – were calculated for each CLIGEN-generated variable. The
absolute error (AE) and mean absolute error (MAE) were calculated to examine the differences between the two sets of statistics for generated temperatures. Relative error (RE) and mean absolute relative error (MARE) were calculated to examine the differences between the two sets of statistics for
generated daily solar radiation, daily precipitation, and sub-daily storm pattern:
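A minimal sketch of these four indices is given below. It assumes that the series simulated from observation-based parameters is the reference and that RE is expressed as an absolute percentage; both are assumptions, and the function names and sample values are illustrative.

```python
def absolute_error(ref, est):
    """AE between a statistic from observed-input and interpolated-input series."""
    return abs(est - ref)

def relative_error(ref, est):
    """RE in percent, taken as an absolute value (assumed convention)."""
    return abs(est - ref) / abs(ref) * 100.0

def mean_of(index, refs, ests):
    """Average an index over stations: gives MAE for AE and MARE for RE."""
    return sum(index(r, e) for r, e in zip(refs, ests)) / len(refs)

# Hypothetical station means of daily precipitation (mm)
ref_means = [4.0, 5.0, 10.0]
est_means = [4.2, 4.5, 10.0]
mae = mean_of(absolute_error, ref_means, est_means)
mare = mean_of(relative_error, ref_means, est_means)

# A mean temperature near zero inflates RE even when AE is small
re_near_zero = relative_error(0.5, 1.0)
ae_near_zero = absolute_error(0.5, 1.0)
```

The last two lines show why an absolute index is the safer choice for temperature: a difference of 0.5 in a mean of 0.5 gives an RE of 100 %.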
Boxplot of CLIGEN temperature, solar radiation, and precipitation parameters obtained from observations in mainland China.
Map of the study area showing the spatial distribution of CLIGEN temperature-related parameters of mainland China in August. All parameters were regionalized using universal kriging.
Thirteen groups of CLIGEN temperature and precipitation parameters from 2405
stations and solar radiation parameters from 130 stations were plotted to
examine the inter-annual variation and the differences among parameters
(Fig. 2). The average maximum temperature and minimum temperature, TMAX AV and TMIN
AV (in unit of
Map of the study area showing the spatial distribution of CLIGEN precipitation-related parameters of mainland China in August. All parameters were regionalized using universal kriging.
The average and standard deviation of daily precipitation, MEAN P and S DEV P
(in inches, 1 in.
The wet-following-dry transition probability
MX.5P of nine example stations showed the regional differences more clearly in that the parameters of southern stations were relatively higher (Fig. 5c). Differences among southern and northern stations became gradually smaller in the warm season. It should be noted that the narrower range of MX.5P in winter was partially related to the limited availability of hourly data. Due to the restriction of low temperatures on siphon rain gauge observations, MX.5P in cold seasons was available for fewer stations than in warm seasons.
TimePk consists of 12 discrete values representing the cumulative
distribution of time to peak intensity ranging from 0 to 1 for a specific
location. The sixth value for TimePk represents the cumulative ratio of
storms with peak intensity occurring before
Comparison of the accuracy of OK and UK using the leave-one-out cross-validation.
The leave-one-out cross-validation showed that four groups of temperature
parameters (TMAX AV, SD TMAX, TMIN AV, SD TMIN), two groups of solar
radiation (SOL.RAD, SD SOL), and four groups of precipitation parameters at
a daily scale (MEAN P, S DEV P,
Comparison of the interpolation quality in terms of the root mean square error (RMSE) using ordinary kriging (OK) and universal kriging (UK) for temperature, solar radiation, and precipitation parameters.
In comparison with OK, UK with auxiliary covariables clearly improved the
overall and monthly prediction accuracy for TMAX AV, TMIN AV, SOL.RAD,
MEAN P, SKEW P,
Comparison of the interpolation quality using ordinary kriging (OK) and universal kriging (UK) for CLIGEN temperature, solar radiation, and precipitation parameters in August, and the eighth parameter of TimePk.
Cross-validation results showed that the interpolation of the two parameters related to storm patterns, i.e., MX.5P and TimePk, performed well. Three cross-validation statistics for these two parameters using two methods were numerically similar (Table 3). NSE over 12 months for MX.5P interpolated with OK and UK were both equal to 0.95. The seasonal variation in RMSE based on OK and UK follows a similar pattern (Fig. 6l–m). For TimePk, the RMSE values using OK were slightly lower than those using UK for the third, fourth, and fifth parameters but slightly higher for the others.
Interpolation accuracy has been adequately estimated through cross-validation, and these results indicated that the accuracy of interpolation results based on UK was generally higher than those based on OK. Therefore, two sets of CLIGEN-simulated climate series using observed inputs and UK-interpolated inputs were generated and compared to further evaluate the regionalized parameters using UK for the simulation of CLIGEN.
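The leave-one-out procedure and two of the accuracy statistics, RMSE and the Nash–Sutcliffe efficiency (NSE), can be sketched as follows. A nearest-neighbour predictor stands in for OK or UK purely to keep the example short; it is not the method used in this study, and the station coordinates and values are hypothetical.

```python
import math

def nearest_neighbour(known, target):
    """Placeholder interpolator (not kriging): value of the closest station."""
    closest = min(known, key=lambda s: math.hypot(s[0] - target[0], s[1] - target[1]))
    return closest[2]

def leave_one_out(stations, predict):
    """Predict each station's value from all other stations."""
    predictions = []
    for i, (x, y, _) in enumerate(stations):
        remaining = stations[:i] + stations[i + 1:]  # exclude station i
        predictions.append(predict(remaining, (x, y)))
    return predictions

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect match."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

# Hypothetical stations: (x, y, parameter value)
stations = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0),
            (1.0, 1.0, 3.0), (0.5, 0.5, 2.0)]
observed = [s[2] for s in stations]
predicted = leave_one_out(stations, nearest_neighbour)
scores = (rmse(observed, predicted), nse(observed, predicted))
```

Because `leave_one_out` accepts any function with the same signature, an OK or UK predictor could be substituted for the placeholder without changing the scoring code.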
Comparison of CLIGEN generated daily temperature and solar radiation based on observed input parameters and UK-interpolated ones.
Comparison of CLIGEN-generated daily rainfall and annual rainy days based on observed input parameters and UK-interpolated ones.
CLIGEN-simulated daily temperature and solar radiation based on
UK-interpolated input parameters agreed well with those simulated based on
observed parameters. The average, standard deviation, and skewness
coefficient of generated daily maximum temperature, minimum temperature,
solar radiation, and daily precipitation generated using observed and
interpolated input parameters were calculated for each station, and the
simulated accuracies of the average and standard deviation were found to be
better than that of the skewness coefficient. The RMSE of the mean and
standard deviation were all less than 0.79
The absolute error (AE) of the average, standard deviation, and skewness coefficient
between the simulated daily temperature of
Frequency distribution of daily precipitation, duration, and
maximum 30 min intensity (
For generated daily precipitation, 94.1 % and 91.4 % of stations yielded
RE of the average and standard deviation below 10 %, and the MARE values for 2405
stations were 3.72 % and 4.56 %, respectively. The bias between annual rainy
days of
The average and standard deviation of storm duration and the maximum 30 min
intensity (
Station density and simulation quality of CLIGEN for three Chinese physical-geographical regions.
Both AE and RE indexes were adopted to evaluate the simulated results in
this study. The RE index was applied to solar-radiation- and precipitation-related outputs, while the AE index was applied to
temperature-related outputs, because RE is not an appropriate indicator for
temperature, which is measured on an interval scale. For stations located in high-latitude or high-altitude areas, the mean annual temperature may be close to zero, resulting in an extremely high RE. For example, the
mean maximum temperature of Qian'an station (Fig. 1) using observed inputs
was
The frequency distributions of CLIGEN-simulated daily precipitation,
duration, and peak intensity at Tuokexun station using observed inputs were
not well preserved by those simulated using UK-interpolated inputs (Fig. 8). The simulation quality for Tuokexun was among the worst of the 2405
stations, as its RE values for all three precipitation-related variables were
greater than those of 99 % of stations. This may be explained partially because
Tuokexun is located in the northwest arid area of China (Fig. 1), with a
station density of 0.97/10
Map of the study area showing the spatial distribution of the standard error for interpolation
results of TMAX AV
The number and density of weather stations for solar radiation were considerably lower than those for temperature and precipitation (Table 6). However, the mean and standard deviation of daily solar radiation simulated using the UK-interpolated parameters were in good agreement with those simulated using observation-based parameter values (Table 4), and the MARE of solar radiation was similar to that of daily precipitation. Solar radiation is characterized by much lower spatial variability than temperature and precipitation. As a result, solar-radiation-related parameters were easier to regionalize, and parameter values could readily be interpolated for regions with limited observations.
Comparison of interpolation quality using universal kriging (UK) and the inverse distance weighted method (IDW) for CLIGEN temperature- and precipitation-related parameters for 2405 stations in August.
In the initial attempt to regionalize CLIGEN input parameters, values from
2600 stations in the United States were interpolated using the inverse
distance weighted method (IDW). In this study, UK was adopted to interpolate CLIGEN
parameters for mainland China. Interpolated parameter values using IDW and
UK were compared for four selected parameters in August as shown in Fig. 10.
It can be seen that UK performed better than IDW for all four parameters
selected. UK-interpolated parameter values were concentrated mostly along
the
Source code for data extraction, processing, and analysis is available from the authors upon reasonable request.
The gridded CLIGEN input parameter dataset of China at 10 km resolution is
available at the home page of the CLImate Change Impact Assessment (CLICIA) group
at
The widely used stochastic weather generator CLIGEN can simulate long-term
climate data to drive hydrological, soil erosion, and crop-yield models.
Limitations in high spatial–temporal observations, especially at the
sub-daily scale, have partially restricted its application. Daily
temperature, daily precipitation, and hourly precipitation data for 2405
stations and daily solar radiation for 130 stations distributed across
mainland China were collected to establish the CLIGEN input parameter files
and to explore an appropriate method for regionalizing these parameters from
stations to the entire region. The prediction quality of two interpolation
techniques, OK and UK, was compared and assessed, yielding the
following results.
UK generally performed better than OK when interpolating CLIGEN parameters.
Compared with OK, the interpolation accuracy was markedly improved for
parameters TMAX AV, TMIN AV, SOL.RAD, MEAN P, SKEW P,
UK can accurately predict temperature, solar radiation, and precipitation
input parameters for CLIGEN. RMSE values in UK-interpolated parameter values for
temperature were less than
Basic statistics and frequency distributions for CLIGEN-simulated climate
elements using UK-interpolated parameters agreed well with those simulated
using observations. The mean absolute error (MAE) values for the average, standard deviation, and
skewness coefficient for the two simulated series of temperature across 2405
stations were all less than 0.5
The developed gridded input parameter database can be used with CLIGEN,
with established simulation quality, for the stochastic
simulation of temperature, solar radiation, and precipitation at a daily
scale and of precipitation at a sub-daily scale for any single point in
China. CLIGEN can also simulate dew point and wind, but the corresponding parameters were not regionalized
in this study. Because CLIGEN is a site-based weather generator, climate series
simulated with it are independent of each other and lack spatial
correlations among stations. Further research might focus on the rebuilding
of correlations among climate elements and between nearby stations.
WW calculated the input parameters, developed the programming code, and wrote the original draft; SY provided the main conceptualization, supervised the project, and reviewed the draft; BY provided advice about the methodology and reviewed the draft; SW reviewed the draft.
The authors declare that they have no conflict of interest.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We would like to thank the high-performance computing support from the
Center for Geodata and Analysis, Faculty of Geographical Science, Beijing
Normal University (
This research has been supported by the National Natural Science Foundation of China (grant no. 41877068) and the China Postdoctoral Science Foundation (grant no. 2020M680433).
This paper was edited by David Carlson and reviewed by two anonymous referees.