New gridded dataset of rainfall erosivity (1950–2020) on the Tibetan Plateau

. The risk of water erosion on the Tibetan Plateau (TP), a typical fragile ecological area, is increasing with climate change. A rainfall erosivity map is useful for understanding the spatiotemporal pattern of rainfall erosivity and identifying hot spots of soil erosion. This study generates an annual gridded rainfall erosivity dataset on a 0.25 ◦ grid for the TP in 1950–2020. The 1 min precipitation observations at 1787 weather stations for 7 years and 0.25 ◦ hourly European Center for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) precipitation data for 71 years are employed in this study. Our results indicate that the ERA5-based estimates have a marked tendency to underestimate annual rainfall erosivity when compared to the station-based estimates, because of the systematic biases of ERA5 precipitation data including the large underestimation of the maximum contiguous 30 min peak intensity and relatively slight overestimation of event erosive precipitation amounts. The multiplier factor map over the TP, which was generated by the inverse distance-weighted method based on the relative changes between the available station-based annual rainfall erosivity grid values and the corresponding ERA5-based values, was employed to correct the ERA5-based annual rainfall erosivity and then reconstruct the annual rainfall erosivity dataset. The multiyear average correction coefﬁcient over the TP between the station-based annual rainfall erosivity values and the newly released data is 0.67. In addition, the probability density and various quantile values of the new data are generally consistent with the station-based values. The data offer a view of large-scale spatiotemporal variability in the rainfall erosivity and address the growing need for information to predict rainfall-induced hazards over the TP. The dataset is available from the National Tibetan


Introduction
Precipitation is the main driver of water erosion because it directly affects the detachment of soil particles, breakdown of aggregates, and transport of eroded particles via runoff (Wischmeier andSmith, 1965, 1978).The R factor, i.e., the multiyear average rainfall erosivity, which is described by the universal soil loss equation (USLE; Wischmeier andSmith, 1965, 1978) and revised USLE (RUSLE; Renard et al., 1997), is an indicator of the potential ability of rain-fall and runoff to affect soil erosion.Generally, the R factor is calculated by using the classical (Wischmeier and Smith, 1965) or statistical algorithms (e.g., Liu et al., 2002) according to the temporal resolution of the available precipitation data.
The classical algorithm for rainfall erosivity requires a continuous precipitation data series with < 15 min temporal resolution (Angulo-Martínez and Beguería, 2009).As networks of weather stations and observation platforms have Published by Copernicus Publications.matured considerably in the past two decades, rainfall erosivity has been calculated using the classical algorithm at the local level (Agnese et al., 2006;Ma et al., 2014;Wang et al., 2017), and the application of the algorithm has been gradually extended to the national (Panagos et al., 2015;Kim et al., 2020;Yue et al., 2022) and global levels (Panagos et al., 2017;Liu et al., 2020).Despite substantial progress, it is still notable that the relative error of the estimated rainfall erosivity increases rapidly with increasing time interval of the precipitation data.For example, the relative error based on hourly data was more than 80 %, compared with the results based on 1 min data (Lobo and Bonilla, 2015;Yin et al., 2015;Shin et al., 2019).In addition, the accuracy of the rainfall erosivity map is greatly reduced by inadequate weather station coverage, especially in areas with complex climates and terrains (Yue et al., 2021).Therefore, the accuracy of rainfall erosivity estimation strongly depends on both the temporal and spatial resolution of the precipitation data (Panagos et al., 2017;Kim et al., 2020).
Compared with in situ observations, gridded precipitation data (e.g., satellite-based, reanalysis and fused datasets) are not subjected to topographical limitations and could supply continuous precipitation data (Beck et al., 2017).These data have been widely used to estimate the rainfall erosivity in China, especially in the regions with scarce in situ observations (Teng et al., 2018), Germany (Risal et al., 2018), Africa (Vrieling et al., 2010), the USA (Kim et al., 2020), and other regions.They have greatly contributed to our knowledge of the spatiotemporal patterns of rainfall erosivity; however, the uncertainties in rainfall erosivity directly calculated by using gridded precipitation data have not been quantified, although obvious biases between gridded and observed precipitation values have been demonstrated (Freitas et al., 2020).
The Tibetan Plateau (TP) referred to as the Third Pole is one of the highest plateaus worldwide and has an average altitude of more than 4000 m (Yao et al., 2012).Since the mid-1950s, the TP has experienced significant warming exceeding that of other regions in the same latitude zone (Liu and Chen, 2000).Owing to increasing snowmelt and more frequent heavy precipitation events, which may cause more severe soil erosion, knowledge of the rainfall erosivity on the TP is highly important for soil sustainability and thus water and food security.The accuracy of the rainfall erosivity estimation depends mainly on the spatiotemporal accuracy of the precipitation data, especially on the TP, where the seasonal and regional precipitation patterns exhibit significant variability owing to westerly winds, the Indian monsoon, and land-atmosphere interaction.
Many efforts have been made to study the rainfall erosivity on the TP (Table 1).Most studies employed empirical methods; however, our study has demonstrated that these empirical methods always resulted to obvious biases over the TP, when compared with the values based on the 1 min precipitation data by using the standard method (paper submitted).In terms of the type of precipitation data, dozens of station-based precipitation data were commonly used to calculate the rainfall erosivity (e.g., Qin et al., 2016;Gu et al., 2020).Yue et al. (2022) reported that the scarce weather stations can significantly reduce the estimation accuracy of the rainfall erosivity in the regions with complex terrains and climates, especially on the TP.Therefore, the accuracy of the estimated rainfall erosivity on the TP is largely reduced by the current empirical estimation models and the scarcity of historical weather stations.In other words, the precipitation data with high spatiotemporal resolution are essential to calculate the rainfall erosivity on the TP.
To expand the spatial coverage and extend the time series of rainfall erosivity over the TP, the various gridded precipitation datasets, for example, satellite-based Tropical Rainfall Measuring Mission (TRMM) and station-based Climatic Research Unit Time Series 4 gridded precipitation datasets (CRU_TS4), have also been introduced into the soil erosion study of the TP in recent decades (Yan et al., 2010;Teng et al., 2017;Cao et al., 2018).The performances of these gridded precipitation datasets mainly depend on the spatiotemporal accuracy of the gauge observations, and thus these datasets always present obvious biases due to insufficient density of the weather station network over the TP (Yuan et al., 2021).It is notable that the significant biases of various gridded precipitation datasets have also been widely identified (Sun et al., 2018), and their impacts on the rainfall erosivity estimation have not been assessed.
In recent hydrometeorology studies, the model-based gridded precipitation datasets are taken into consideration (Li et al., 2020;Zhou et al., 2021), because they could resolve the complex topography and climate effects over the TP and provide long-term data by setting simulation periods.The European Center for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) as the new generation is one of the most widely used precipitation datasets in the world (Hersbach et al., 2019).Compared with other gridded precipitation datasets, ERA5 succeeded in reproducing the interannual and decadal variabilities of precipitation reflecting the spatiotemporal patterns (Yuan et al., 2021), and performed marginally better in detecting daily precipitation over the whole TP for long-term periods (Jiang et al., 2021), despite the fact that a bias in precipitation amounts was also reported (Jiang et al., 2021;Jiao et al., 2021).Therefore, this study aims to reconstruct the historical annual rainfall erosivity with 0.25 • spatial resolution in 1950-2020 over the TP, by employing the 0.25 • hourly ERA5 precipitation data for 71 years to generate long-term background values and utilizing the 1 min precipitation observations at 1787 weather stations for 7 years to identify and correct the biases of the estimates.In detail, this paper describes (1) the performance of ERA5 precipitation data at the weather stations, (2) the performance of the ERA5-based annual rainfall erosivity calculated by using the standard method recommended by the USLE model and (3) the correction of the ERA5-based annual rainfall erosivity and the validation of the newly generated dataset. 2 Study area and data source

Tibetan Plateau
The study area is the TP (26-40 • N, 73-105 • E), which is located in southwestern China and covers an area of approximately 2.5 million km 2 .The elevation of the TP ranges from 84 to 8246 m, with an average value of 4379 m.Precipitation in the southeastern TP is influenced by warm, humid Indian monsoons, whereas in the western TP, it is influenced more strongly by the mid-latitude westerlies (Yao et al., 2012).The annual precipitation is concentrated in the period from May to October (Gu et al., 2020), and shows a spatial pattern of a wet east and west with a dry middle (Li et al., 2020).Along with the significant climate change and a very fragile ecological environment, the TP has high potential for soil loss, especially in the eastern TP and Hengduan Mountains, which are among the most severely eroded areas in China (Teng et al., 2019).

Precipitation data
Previous studies of the TP have used in situ precipitation observations with < 50 stations and coarse temporal resolution, e.g., hourly (Yue et al., 2021), daily (Wang et al., 2017), or half-monthly (Teng et al., 2018;Gu et al., 2020;Liu et al., 2020).By contrast, this study estimated the rainfall erosivity on the TP using precipitation observations at 1 min intervals in 2013-2020 at 1787 weather stations obtained from the National Meteorology Information Center of the China Meteorological Administration (Fig. 1a).
To ensure the accuracy of the in situ precipitation data, we evaluated their quality.The data integrity of each station was first checked using quality control codes at 1 min intervals by month.Because precipitation on the TP occurs mainly from May to September, observed data with an integrity of > 90 % from May to September in a year can be used to calculate the annual rainfall at the station.The number of stations with data suitable for calculating the annual rainfall erosivity for each year is shown in the lower left corner of Fig. 1a; it ranges from 628 to 1472, with an average of 1114 stations for 2013-2020 (excluding 2017, because a disruption in data reception caused the loss of precipitation observations in August 2017).Moreover, we examined the station density in each 0.25 • grid, which is consistent with the spatial resolution of the ERA5 data (Fig. 1b).The number of stations in each grid varies from 1 to 29, and the mean value is 2.1.A total of 836 grids (20 % of the grids covering the TP) have observed precipitation values.Because the data quality varies, the available grids with observations change annually; on average, there are 589 available grids with observation records for 2013-2020, excluding 2017.
The hourly 0.25 • ERA5 data represent the most recent generation of ECMWF global atmospheric reanalysis and provide higher spatial resolution than ERA-Interim and other improvements since 1979 (Hersbach et al., 2019).The precipitation data are the sum of large-scale precipitation and convective precipitation consisting of rain and snow, as determined by the ECMWF Integrated Forecasting System.

Methodology
Figure 2 shows the overall algorithm for generating the annual rainfall erosivity dataset with the 0.25 • spatial resolution over the TP in 1950-2020.We firstly calculated the annual rainfall erosivity by using the standard method of rainfall erosivity based on the 1 min in situ precipitation observations and 0.25 • hourly ERA5 precipitation data.Secondly, the performances of the ERA5 were systematically assessed in terms of the detection accuracy of the precipitation for erosive events and the estimation accuracy of ERA5-based annual rainfall erosivity.Finally, the historical annual rainfall erosivity data for the TP were produced after correcting the ERA5-based annual rainfall erosivity.

Algorithm of annual rainfall erosivity
A rainfall event is defined following Wischmeier and Smith (1978) as having measurable rainfall with no interruption or at most a 6 h interruption.If a rainfall event is interrupted for more than 6 h, subsequent rainfall is considered to belong to a new rainfall event.Rainfall events of more than 12 mm are selected as erosive events following Xie et al. (2000), and the EI 30 index of the erosive event is calculated.Specifically, the rainfall erosivity of an erosive rainfall event is calculated as follows (Brown and Foster, 1987): (1) where E (MJ ha −1 ) is the total energy of the erosive event, and r event (MJ mm ha −1 h −1 ) is the event rainfall erosivity of the event.For the 1 min precipitation data (ERA5 data), i r (mm h −1 ) is the rainfall intensity for the rth minute (hour), e r (MJ ha −1 mm −1 ) is the unit energy for the rth minute (hour), P r (mm) is the rainfall amount for the rth minute (hour), n is the rainfall duration, and I 30 (mm h −1 ) is the maximum contiguous 30 min (1 h) peak intensity.After the event rainfall erosivity at all stations was calculated, we identified and removed extreme outliers of the event rainfall erosivity at each site, which resulted from temporary abnormalities in the automatic observation equipment and were not identified dur-  ing quality control of the precipitation data.We used boxplots to detect extreme outliers.The lower and upper quartiles were defined as the 25th percentile of event rainfall erosivity (Q1) and the 75th percentile (Q2); the difference (Q2 − Q1) is called the interquartile range (IQR).Event rainfall erosivity data at a station outside the lower and upper bounds (Q1 − 3IQR, Q2 + 3IQR) are considered extreme outliers.
The observed annual rainfall erosivity values (r station_year ) were obtained by summing the rainfall erosivity for all erosive events per year by station.Next, the ERA5-based annual rainfall erosivity (r ERA5_year ) for all the grids in the TP was calculated.Notably, for easy comparison of r station_year and r ERA5_year , the r station_year values were upscaled to the grid values (r obs_year ) with 0.25 • spatial resolution by averaging the station-based values in the same grid.Figure 1b shows the spatial distribution of the available grids with r obs_year .
Steps 2 and 3 in Fig. 2 are based on r obs_year and r ERA5_year data at grid scale.

Assessment of the performance of the ERA5 precipitation data
The performance of the ERA5 precipitation data was assessed at 280 grid cells, which corresponded to 7 % of all the grids over the TP.Given the importance of erosive rainfall events to soil erosion, we focused on the performance of the ERA5 precipitation data in detecting characteristics of erosive rainfall event, including multiyear averaged annual erosive precipitation amount and frequency, and mean erosive event precipitation amount and I 30 .
The mean values of r ERA5_year for 2013-2020 were compared with those of r obs_year by station.The absolute bias (AB) and correction coefficient (r) were used to evaluate the accuracy of annual rainfall erosivity estimation using ERA5 data.The AB is calculated as shown in Eq. ( 4).
https://doi.org/10.5194/essd-14-2681-2022Earth Syst.Sci.Data, 14, 2681-2695, 2022 where i is the ith annual rainfall erosivity value, r ERA5_year i is the ERA5-based annual rainfall erosivity in the ith year, r obs_year i is the observed annual rainfall erosivity in the ith year, and n is the number of years of data.Moreover, the empirical orthogonal function (EOF) was employed to assess the spatiotemporal pattern of annual rainfall erosivity revealed by the ERA5 reanalysis precipitation data by comparing it with the pattern revealed by the observed values.

Reconstruction and validation of annual rainfall erosivity
For the soil erosion process it is known that not all the precipitation events but the erosive events have a close relationship with the water erosion process.Our study indicated the precipitation characteristics derived from ERA5 data for erosive events showed high correlation with those from in situ precipitation observations over the TP (Fig. 4).In addition, there was a high correlation between the station-based and ERA5-based annual rainfall erosivity (Fig. 5), and their spatiotemporal distribution patterns also showed good agreement (Fig. 7).These findings demonstrate that it is reasonable to generate the rainfall erosivity dataset for the TP by us-ing the ERA5 precipitation data, and the correction is also essential because of the obvious biases identified in the ERA5based rainfall erosivity values.
Relative changes between the in situ and modeled precipitation are always used to correct the modeled precipitation for accuracy improvement, such as the global precipitation data from WorldClim (Fick et al., 2017), the gridded precipitation data of the China Meteorological Forcing Dataset (He et al., 2020) and the bias adjusted ERA5 precipitation data (Cucchi et al., 2020).Given the close correlation between the precipitation and rainfall erosivity, the relative changes were also employed to correct the ERA5-based annual rainfall erosivity in this study.Here, we have used a hypothesis that the bias of the ERA5-based annual rainfall erosivity resulting from ERA5 precipitation data at each grid remains steady by year.In detail, the correction process can be divided into three steps.Firstly, the r obs_year values were divided by r ERA5_year for each year, and then the calculated results, i.e., the multiplier factor values, were averaged for years.Secondly, inverse distance weighted (IDW) interpolation was used to generate a multiplier factor map of the TP with 0.25 • spatial resolution.Thirdly, the corrected annual rainfall erosivity dataset (r cor_year ) was obtained as the product of r ERA5_year and the multiplier factor for each grid.
Specifically, there are 373 grids with observed annual rainfall erosivity values from 2014 to 2020.The r obs_year and r ERA5_year values in these grids were used to generate the multiplier factor map.The r obs_year and r ERA5_year values in other grids for 2014-2020, which were not used before, are available for assessing the accuracy.Moreover, the year 2013 was regarded as a complete verification year, in which the assessment of the r cor_year was conducted in all the TP grids with r obs_year values.Table 2 lists the number of validation grids in each year, and Fig. 3 shows the spatial distribution of the validation grids for 2013-2020 (excluding 2017).

Detecting accuracy of ERA5 for erosive rainfall events
Figure 4 compares the multiyear average annual erosive precipitation amount and frequency, and mean erosive event precipitation amount and I 30 derived from ERA5 precipi-tation data with those from in situ observations.In detail, the EA5-based multiyear average annual erosive precipitation amount is three times more than the station-based value across the TP.The overestimation of the multiyear average annual precipitation amount was also reported by Jiao et al. (2021).The ERA5 overestimated the annual erosive precipitation frequency by 1.6 times.For the erosive event rainfall amount, the ERA5 value was almost twice as much as the station-based value, which differed from the findings of Jiao et al. ( 2021) that the daily precipitation amounts with more than 10 mm are underestimated by ERA5.This result demonstrates that the erosive rainfall events in the TP cannot be simply equivalent to the daily precipitation events (Chen et al., 2022).In addition, the mean I 30 of ERA5 for erosive events are only one ninth of the station-based value.Because the relatively slight overestimation of ERA5 precipitation data in the erosive event precipitation amount could not offset the substantial underestimation in I 30 , the ERA5-based estimates showed a marked tendency to underestimate the rainfall erosivity when compared to the station-based estimates.Overall, the comparison between the two data sources indicates that there were significant biases of ERA5 data in detecting prehttps://doi.org/10.5194/essd-14-2681-2022Earth Syst.Sci.Data, 14, 2681-2695, 2022 cipitation characteristics for erosive events in the TP, but also presented high corrections with correlation coefficients being 0.33-0.84.

Evaluation of rainfall erosivity estimations using ERA5 data
The accuracy of annual rainfall erosivity estimations using the ERA5 precipitation data for 2013-2020 was assessed and compared with the r obs_year values in 280 grids covering the TP.The correlation coefficient of the mean annual rainfall erosivity based on the observed and ERA5 precipitation data is 0.71.For most stations, the ERA5-based values were significantly underestimated (Fig. 5).
To further evaluate the quality of the mean annual rainfall erosivity estimation using ERA5 data, the performance of the ERA5 data in each grid was evaluated, as shown in Fig. 6.The spatial pattern of the ERA5-based mean annual rainfall erosivity is consistent with that of the observed values.Specifically, areas with large annual rainfall erosivity are located mainly in the southeastern part of the plateau, especially at the southeast edge, whereas the mean annual values in the northwestern part of the plateau are relatively small; however, the observed mean annual rainfall erosivity on the TP is 344 MJ mm ha −1 h −1 yr −1 , and the ER5-based results underestimate this value by 47 %.Moreover, except for most of the grids in the northwest corner and individual grids in the southeastern part of the plateau, the mean annual rainfall erosivity values in most grids in the TP are lower than the observed values.The accuracy of the spatiotemporal variability of the mean annual rainfall erosivity on the TP obtained using the ERA5 dataset is also crucial for determining whether ERA5 is suitable for rainfall erosivity calculations.We used the first three EOF modes, which are considered to provide the most valuable information, for evaluation.The spatial pattern of the first three EOFs of the observed values accounts for 77 % of the total variance, and that of the first three EOFs of the ERA5-based values accounts for 84 % of the total variance (Fig. 7).Clearly, ERA5 successfully captured the spatial pattern of the EOF modes, especially the first two EOF modes, revealed by the observed values.In addition, the corresponding principal components of the EOF modes of the ERA5based values are also consistent with the temporal variation trend of the observed values.Therefore, it can be concluded that the ERA5-based mean annual rainfall erosivity generally  reproduces the spatiotemporal patterns of the rainfall erosivity on the TP.

Reconstruction and validation of corrected annual rainfall erosivity
Using the observed and ERA5-based annual rainfall erosivity, we calculated the multiplier factors for 373 grids (Fig. 8a).The multiplier factors for the TP range from 0 to 23, with a mean value of 2.4.Multiplier factors of < 1 indicate that the ERA5-based annual rainfall erosivity is overestimated, and conversely, the annual rainfall erosivity in the grid is underestimated.Most of the areas with overestimated ERA5-based mean annual rainfall erosivity are located in the Tarim, Qaidam, and Yarlung Zangpo basins.In other areas, the annual rainfall erosivity is typically underestimated, and areas with greater underestimation appear east of the Qaidam basin and in the source area of the Yellow River.We also produced a multiplier factor map of the TP by IDW interpolation based on the multiplier factors of 373 grids (Fig. 8b).
The corrected annual rainfall erosivity in 2013-2020 (excluding 2017) was then calculated in the validation grids as the product of the ERA5-based annual values and multiplier factors from the map. Figure 9 compares the observed and ERA5-based annual rainfall erosivity in the validation grids by year.In 2014-2020 (excluding 2017), the multiyear averaged correction coefficient between r obs_year and r cor_year is https://doi.org/10.5194/essd-14-2681-2022 Earth Syst.Sci.Data, 14, 2681-2695, 2022 0.67, which is 0.13 larger than the value between r obs_year and r ERA5_year .Moreover, all of the data for 2013, which were not used to produce the multiplier factor map, were used to conduct an independent assessment.The results show that the correction coefficient also increases from 0.53 to 0.67 after the ERA5-based annual rainfall erosivity is corrected, indicating a significant improvement.
Violin plots are an alternative method of synthetically evaluating the accuracy of the corrected annual rainfall erosivity.Figure 10 compares the observed, ERA5-based, and corrected annual rainfall erosivity in the validation grids for 2013-2020 (excluding 2017).The corrected annual rainfall erosivity values for 2014-2020 are better than the ERA5based values in terms of both the probability density and the values corresponding to different quantiles.Even in 2013, a completely independent verification year, the accuracy of the corrected annual rainfall erosivity is greatly improved.Specifically, the observed grid-averaged multiyear mean annual rainfall erosivity is 329 MJ mm ha −1 h −1 yr −1 in 2013-2020 (excluding 2017), where the ERA5-based value is 190 MJ mm ha −1 h −1 yr −1 , and the corrected value is 374 MJ mm ha −1 h −1 yr −1 .The relative error is significantly reduced, from −42 % to 14 %, by multiplier factor correction.

Rainfall erosivity in the TP and related uncertainties
Because of the large variability of the spatiotemporal patterns of precipitation, the R factor, an essential input for soil loss estimation, must be calculated using a minimum of 20 years of precipitation data (Renard et al., 1997).In this study, the annual rainfall erosivity values of the TP for 71 years based on the 0.25 • hourly ERA5 precipitation data were calculated by the algorithm shown in Sect.3.1.Next, after correction by the multiplier factor map, the new annual rainfall erosivity dataset for 1950-2020 and R factor map were produced.
The annual rainfall erosivity fluctuates considerably within a range of 239-408 MJ mm ha −1 h −1 yr −1 (Fig. 11); however, no obvious increasing or decreasing trend appears in the past 71 years across the TP.Regarding the spatial distribution, the R factor generally shows a decreasing trend https://doi.org/10.5194/essd-14-2681-2022 Earth Syst.Sci.Data, 14, 2681-2695, 2022 Compared with previous studies, there are two essential improvements by using the data-driven approach in this study.One the one hand, the 1 min precipitation observations from 1787 weather stations are firstly used to calculate the accuracy of rainfall erosivity values by employing the standard algorithm.With the densely spaced rainfall erosivity values, it is able to yield a realistic spatial distribution and identify the high spatial heterogeneity of the rainfall erosivity over the TP.On the other hand, apart from an R factor map, we also produced a high-precision time series of annual rainfall erosivity for 71 years after correcting the ERA5-based estimations, which may offer great help to reveal the spatiotemporal evolution over the TP under the climate changes.
It is also notable that some uncertainties are also unavoidably involved in the newly reconstructed dataset.As biases of the ERA5 precipitation data in detecting the characteristics of the erosive rainfall events have been revealed, we used multiple factors to correct the ERA5-based rainfall erosivity values by grid, to reduce the biases resulting from the ERA5 data.Limited by the scarcity of the in situ precipitation observations from weather stations before 2013 (less than 100 weather stations), it is hardly possible to yield a realistic spatial distribution of the multiple factor map by year.Here, we made a hypothesis that the biases of the ERA5-based annual rainfall erosivity always remained steady in various years, and thus the multiyear average annual multiple factor map from 2014-2020 is used in the correction process.With the improvement of the weather/climate forecast models in the future, the biases of the estimated rainfall erosivity by using gridded precipitation data will continue to be reduced.

Conclusions
This study presents a new gridded dataset of annual rainfall erosivity over the TP based on the 1 min in situ precipitation data from 1787 weather stations and the long-term ERA5 precipitation data.The annual rainfall erosivity data are available over 71 years (from 1950 to 2020) on a 0.25 • grid.The TP-averaged correction coefficient between the station-based annual rainfall erosivity and the newly released data is 0.67.In addition, the probability density and various quantile values of the new data are generally consistent with the stationbased values across the TP.
This dataset provides a unique view of large-scale to localscale features in rainfall erosivity variability over the TP, where it is hardly possible to obtain the long-term in situ precipitation data with sufficient spatiotemporal resolution.This availability of new data opens up many interesting applications in soil erosion studies and disaster research, including 1. providing input data of the R factor for the TP, which is needed for soil erosion modeling,

Figure 1 .
Figure 1.(a) Spatial distribution of weather stations on TP; the inset shows the number of available weather stations by year.(b) Number of available weather stations in each grid with 0.25 • spatial resolution; the inset shows the number of available weather stations by year.

Figure 2 .
Figure 2. Schematic representation of algorithm for generating annual rainfall erosivity dataset for 1950-2020.r obs_year and r ERA5_year represent the station-based and ERA5-based annual rainfall erosivity values, respectively.r cor_year means the corrected values of the r ERA5_year by using the multiplier factor map.

Figure 5 .
Figure 5.Comparison of mean annual rainfall erosivity based on observed and ERA5-based results for 7 years (2013-2020, excluding 2017).The dashed line is the result of an optimal model (with an intercept of 0 and regression coefficient of 1).The red line is the regression result.Colors of dots represent the grid density.Unit: MJ mm ha −1 h −1 yr −1 .

Figure 7 .
Figure 7. First three empirical orthogonal function (EOF) modes of observed and ERA5-based mean annual rainfall erosivity on the TP in 2013-2020 (excluding 2017).

Figure 8 .
Figure 8.(a) Spatial distribution of multiplier factors of 373 grids, (b) multiplier factor map of TP generated by IDW interpolation.

Figure 9 .
Figure 9.Comparison of ERA5-based annual rainfall erosivity (MJ mm ha −1 h −1 yr −1 ) (left-hand graphs) and the corrected annual rainfall erosivity (right-hand graphs) with observed values in validation grids for 2013-2020 (excluding 2017).The dashed line is the result of an optimal model (with an intercept of 0 and a regression coefficient of 1).The solid black lines are the regression result.Colors of dots represent the grid density.

Figure 10 .
Figure 10.Violin plots of observed, ERA5-based, and corrected annual rainfall erosivity in validation grids for 2013-2020 (excluding 2017).The Y axis shows the annual rainfall erosivity in MJ mm ha −1 h −1 .The boxplot diagram of the median of the violin plots shows the maximum value, 75 % quantile value, 50 % quantile value, 25 % quantile value, and minimum value.The horizontal lines represent average values.

Figure 11 .
Figure 11.R factor map of TP with the 0.25 • spatial resolution for 1950-2020.The inset represents the yearly change in annual rainfall erosivity.

Table 1 .
R factor of TP in previous studies.
CRU_TS4: Climatic Research Unit Time Series 4 gridded precipitation product.TRMM: Tropical Rainfall Measuring Mission gridded precipitation product.HASM: High accuracy surface modeling.Empirical method means the rainfall erosivity values are calculated by using the empirical equations based on daily or monthly precipitation data.The standard method is proposed by USLE or RUSLE.The boundaries of the TP used in these studies have some slight differences.

Table 2 .
Number of grids used in this study.