A global long-term, high-resolution satellite radar backscatter data record (1992–2022+): merging C-band ERS/ASCAT and Ku-band QSCAT

Satellite radar backscatter contains unique information on land surface moisture, vegetation features, and surface roughness and has thus been used in a range of Earth science disciplines. However, there is no single global radar data set that has a relatively long wavelength and a decades-long time span. We here provide the first long-term (since 1992), high-resolution (∼ 8.9 km instead of the commonly used ∼ 25 km resolution) monthly satellite radar backscatter data set over global land areas, called the long-term, high-resolution scatterometer (LHScat) data set, by fusing signals from the European Remote Sensing satellite (ERS; 1992–2001; C-band; 5.3 GHz), Quick Scatterometer (QSCAT; 1999–2009; Ku-band; 13.4 GHz), and the Advanced SCATterometer (ASCAT; since 2007; C-band; 5.255 GHz). The 6-year data gap between C-band ERS and ASCAT


Introduction
Microwave remote sensing uses electromagnetic radiation with a wavelength (λ) between 1 cm and 1 m as a measurement tool (Ulaby et al., 1982). Depending on the source of the energy from which the information is gathered, microwave remote sensing systems can be categorized into two groups, namely passive (radiometer) and active (radar). Passive systems collect the radiation naturally emitted by the observed surface, whereas active systems transmit a (radio) signal in the microwave bandwidth and record the signal backscattered by the target (Ulaby et al., 2014).
Due to the longer wavelength compared to visible and infrared radiation, microwaves exhibit the important property of penetrating objects, with the penetrating ability increasing with increasing wavelength. Microwaves at high frequencies (such as Ku-band; ∼ 13 GHz; λ ≈ 2 cm) are sensitive to atmospheric conditions, but those at lower frequencies, such as C-band radio frequency (∼ 5 GHz; λ ≈ 6 cm), depend less on cloud cover and heavy rain events, making this technique suitable for all weather conditions (Ulaby et al., 2014; Carabajal and Harding, 2006; Le Toan et al., 2011). As a result, long-wavelength microwave remote sensing has been widely used in Earth science studies for atmosphere, land, and ocean monitoring (Wentz, 1992; Wagner et al., 1999, 2007; Spreen et al., 2008; Shi et al., 2016; Steele-Dunne et al., 2017; Murfitt and Duguay, 2021).
However, there is no single multi-decadal microwave data set acquired at the C-band or longer wavelength that spans more than 2 decades (Table 1). This has limited the use of microwave data for trend analysis over extended time intervals. Several passive microwave systems are available, such as the Advanced Microwave Scanning Radiometer for EOS (Earth Observation Satellite; AMSR-E; 2002–2011), the Advanced Microwave Scanning Radiometer 2 (AMSR2; 2012–present), WindSat (2003–2012), Soil Moisture and Ocean Salinity (SMOS; 2010–now), and Soil Moisture Active Passive (SMAP; 2015–now), all of which provide data with a wavelength of ∼ 6 cm or longer (Spreen et al., 2008; Yao et al., 2021; Wigneron et al., 2017, 2021). However, merging them into a harmonized data set with a time span longer than 2 decades has been shown to be challenging, mainly because AMSR-E has no overlapping observations with AMSR2 (Du et al., 2017; Moesinger et al., 2020). Active microwave remote sensing, or radar, has the potential to overcome this limitation. A scatterometer is one type of radar known for its large footprint, global coverage, and high revisit rate. These properties make scatterometers interesting for the study of large-scale land surface dynamics (Ulaby et al., 2014). Spaceborne scatterometer sensors have been deployed since 1978 (NASA's Seasat-A; Ku-band; Table 1), but global coverage of scatterometer observation dates back to the European Remote Sensing satellite (ERS-1/-2) in the 1990s (C-band; from 1992; Frison and Mougin, 1996; Prigent et al., 2001). Over the past 3 decades, multiple scatterometer missions have been launched with the aim of obtaining full and repeated global coverage (Ulaby et al., 2014), such as the Quick Scatterometer (QSCAT; Ku-band; from 1999), the Oceansat-2 Scatterometer (OSCAT; Ku-band; since 2009), and the Advanced SCATterometer (ASCAT; C-band; since 2007).
Among these sensors, both ERS-1/-2 and ASCAT operate at the C-band frequency but have a temporal gap of about 6 years (i.e. between 2001 and 2007). Filling this time gap would lead to the first global C-band scatterometer data set with continuous observations for the past 3 decades (since 1992). Moreover, this data set could, in principle, be further extended because ASCAT is still operational, and similar C-band radar missions have been secured for the future (such as the Sentinel radar series and the MetOp Second Generation satellite mission; Malenovský et al., 2012; Lin et al., 2016).

Figure 1. Temporal coverages and radio frequencies of ERS-1/-2, QSCAT, and ASCAT. ERS-1/-2 and ASCAT have a C-band radio frequency (5.3 GHz), and QSCAT has a Ku-band frequency (13.4 GHz). QSCAT operated between 1999 and 2009 in full mode, overlapping with both ERS and ASCAT. Image courtesy of NASA and the European Space Agency (ESA).
The present study aims at filling the 6-year gap of the C-band scatterometer data at the global scale (Fig. 1). As seen in Table 1, QSCAT is a good candidate for fulfilling this task because it operated between 1999 and 2009, thus overlapping with both ERS-2 (between 1999 and 2001) and ASCAT (between 2007 and 2009). Recent studies also demonstrated the feasibility of merging ERS-1/-2, QSCAT, and ASCAT (Bentamy et al., 2012; Tao et al., 2022; Frolking et al., 2022a, b). In theory, the Ku-band signal interacts more with smaller elements (such as raindrops, snow, and canopy leaves) than the C-band signal, due to the difference in wavelength (Saatchi et al., 2013). However, our previous work (Tao et al., 2022) has shown that the Ku-band QSCAT signal in tropical regions can be adjusted to the ERS-2 observations during 1999–2001 and to the ASCAT observations during 2007–2009 to obtain a simulated C-band signal. Here, we further extend our previous approach to the global scale through a better understanding of the signal mechanism and an improved technique for modelling the signal differences (i.e. decision tree regression). Image resolution has also been enhanced; while the native resolution of scatterometer images is often coarse (25 km or larger), the National Aeronautics and Space Administration (NASA) Scatterometer Climate Record Pathfinder (SCP; https://www.scp.byu.edu/, last access: 20 January 2023) project has improved the resolutions of ERS-1/-2, QSCAT, and ASCAT images using the scatterometer image reconstruction (SIR) with filtering (SIRF) algorithm, which combines multiple-orbit satellite passes (Long et al., 1993; Early and Long, 2001). Specifically, ERS-1/-2 images of 8.9 km resolution for the period 1992–2001 and QSCAT (1999–2009) and ASCAT (2007–now) images of 4.45 km resolution have been made publicly available.
To guarantee a long time span from 1992 onwards, we aggregated QSCAT and ASCAT images to 8.9 km to be consistent with the resolution of ERS-1/-2 images. We chose to produce a monthly radar data set in this study because daily scatterometer images do not provide full global coverage and also because the higher spatial resolution was achieved at the cost of reduced temporal resolution, with daily images only being available for polar regions (https://www.scp.byu.edu/docs/EnhancedFAQ.html, last access: 20 January 2023). Besides, the monthly time resolution has been frequently adopted by previous global-scale studies (Sun et al., 2018). The resulting merged radar data set, named long-term, high-resolution scatterometer (LHScat), is publicly available in netCDF format at https://doi.org/10.6084/m9.figshare.20407857 (Tao et al., 2023). LHScat will be constantly updated to include the latest images acquired by ASCAT and to include even higher spatial and temporal resolutions. Below, we provide a detailed description of the source data, methods, quality, and validation of the LHScat data set.

Data and methods
2.1 ERS-1/-2, QSCAT, and ASCAT data

Scatterometers were originally designed to measure wind speed and direction, particularly over oceans. However, their data have also been found to be useful for land applications such as soil moisture estimation, rainfall estimation, and forest monitoring. Here we analysed spaceborne scatterometer data from the ERS-1/-2, QSCAT, and ASCAT sensors (Fig. 1; Table 1). The backscatter of the radar signal, usually expressed in decibels (dB), is a function of the sensor parameters (frequency, polarization, look angle, and spatial resolution) and the dielectric and geometric properties of the scattering objects.

S. Tao et al.: A global long-term, high-resolution satellite radar backscatter data record (1992–2022+)

ERS-1, launched in 1991 by the European Space Agency (ESA), carries the first spaceborne C-band scatterometer with repeated and global geographical coverage (Carabajal and Harding, 2006; Table 1). The ERS-1 scatterometer data were available globally between 1992 and 1996, and the mission finally ended in March 2000 because of the failure of the attitude control system (Crapolicchio and Lecomte, 2003). ERS-2 was launched by ESA in April 1995 as a follow-up to ERS-1. However, from early 2001 until the end of the mission in 2011, ERS-2 operated without gyroscopes, which largely reduced its spatial coverage (Carabajal and Harding, 2006). Consequently, the distribution of global-coverage ERS-2 images to the user community was discontinued for the period 2001–2011. Both sensors operate on a sun-synchronous, near-circular polar orbit, passing the Equator at 10:30 LT in descending mode. The incidence angle of ERS-1 and ERS-2 ranges from 16 to 50°. ERS-1/-2 images were acquired in vertical (V)-polarization mode and were usually gridded at 25 or 12.5 km resolutions (Frison and Mougin, 1996).
The SeaWinds scatterometer (13.4 GHz; Ku-band) on board QSCAT was launched by NASA in 1999 and collected data in full mode until November 2009. It provides normalized cross-sectional backscatter values at fixed incidence angles of 46° in the horizontal (H)-polarization mode and 54.1° in V-polarization mode. Its ascending and descending orbits cross the Equator at 06:00 and 18:00 LST (local standard time), respectively. The QSCAT images are normally delivered at a resolution of 22 km × 22 km (Tsai et al., 2000).
ASCAT, on board the Meteorological Operational (MetOp) series of satellites, was launched in October 2006 as a successor of the ERS-1/-2 scatterometers. The frequency of ASCAT (5.255 GHz; C-band) was designed to be consistent with ERS-1/-2, although the range of its incidence angles was extended to cover 25–65°. ASCAT passes the Equator at 09:30 LT in descending mode and 21:30 LT in ascending mode. ASCAT backscatter is often gridded at a spatial resolution of 25 or 50 km. ASCAT images are available in V-polarization mode, the same as for ERS-1 and ERS-2 (Figa-Saldaña et al., 2002).
The NASA SCP project has enhanced the resolutions of ERS-1/-2, QSCAT, and ASCAT images to a nominal image pixel resolution of 8.9, 4.45, and 4.45 km per pixel, respectively. We downloaded the enhanced-resolution images from the Brigham Young University (BYU) Centre for Remote Sensing (https://www.scp.byu.edu/, last access: 15 March 2023). The images are available for typical global regions under the Lambert equal-area projection, including Europe, the Bering Sea, Siberia, North America, East Asia, Central America, Australia, Alaska, Oceania, North Africa, Southern Africa, South America, and South Asia. Three regions, namely Antarctica, Greenland, and the Arctic region, were not considered in this research because of the lack of QSCAT and ASCAT images in the BYU version. Images were provided in the SIR format and were read and displayed using the functions provided at https://www.scp.byu.edu/downloads.html (last access: 20 January 2023).

Data preprocessing
We first aggregated QSCAT and ASCAT images of the BYU version at the resolution of ERS-1/-2 images, namely 8.9 km per pixel. Ascending path QSCAT and ASCAT images were used. The ascending path time of QSCAT acquisition (06:00 LT) is before sunrise, and the ascending path time of ASCAT (21:30 LT) is well after sunset, so both reflect nighttime land surface conditions. The ERS-1/-2 images of the BYU version are generated by combining the images of all paths to ensure the highest possible spatial and temporal coverages; we therefore used the all-path ERS-1/-2 images. ERS-2 signals during August 1996-June 1997 were increased by 0.2 dB to account for the sensor calibration bias (Crapolicchio and Lecomte, 2003).
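Because 8.9 km is exactly twice 4.45 km, the aggregation step can be pictured as averaging non-overlapping 2 × 2 pixel blocks. The following is a minimal sketch under stated assumptions, not the processing code used for LHScat: the function name `aggregate_2x2` is hypothetical, missing pixels are represented as `None`, and values are averaged directly in dB (the text does not state whether averaging was done in dB or in linear power units).

```python
def aggregate_2x2(grid):
    """Average non-overlapping 2x2 blocks of a 2-D grid (a list of rows).

    Missing pixels are None and are ignored; a block with no valid pixel
    stays None. Assumes the grid has an even number of rows and columns.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(0, n_rows, 2):
        row = []
        for c in range(0, n_cols, 2):
            block = [grid[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)]
            valid = [v for v in block if v is not None]
            row.append(sum(valid) / len(valid) if valid else None)
        out.append(row)
    return out


# Toy 2 x 4 tile of 4.45 km backscatter values (dB), assumed for illustration.
fine = [
    [-10.0, -12.0, -8.0, None],
    [-11.0, -13.0, None, None],
]
coarse = aggregate_2x2(fine)  # one row of two 8.9 km pixels
```

Ignoring missing pixels keeps a coarse pixel whenever at least one fine pixel is valid; a stricter rule (for example, requiring all four) would be an equally plausible choice.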
V-polarization QSCAT images were merged with V-polarization ERS-1/-2 and ASCAT images. H-polarization QSCAT images were also tried, but very similar merged signals were obtained. The BYU data centre provides images synthesized from acquisitions made over periods of 17, 3, and 4 consecutive days for ERS-1/-2, QSCAT, and ASCAT, respectively. For all three sensors, images acquired within a month were averaged. ERS-1/-2 and ASCAT observations of the BYU version were normalized to a common 40° incidence angle to be free of angle influence on the observations. Monthly signals exceeding 3 standard deviations from the long-term mean were considered to be outliers. Some ASCAT images were found to have strip patterns. Fortunately, all the strips were characterized by regions with a low number of radar observations and thus can be masked by thresholding for a minimum number of observations, which was set to 20 (Tao et al., 2022). To avoid water contamination, we excluded pixels within which more than 2 % of the pixel area is "water", using the 300 m resolution ESA Climate Change Initiative (CCI) land cover map for the year of 2015 (http://maps.elie.ucl.ac.be/CCI/viewer/, last access: 20 January 2023).
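The two per-pixel screening rules described above (the 3-standard-deviation outlier filter and the minimum number of observations) could be sketched as follows. This is an illustration only: the function name and the way the two rules are combined in a single pass are assumptions, as is the use of the population standard deviation.

```python
from statistics import mean, pstdev


def mask_monthly_series(values, n_obs, k=3.0, min_obs=20):
    """Return a copy of a monthly series with screened months set to None.

    values  : monthly backscatter (dB) for one pixel, None where missing
    n_obs   : number of radar observations behind each monthly value
    k       : outlier threshold in standard deviations (3 in the text)
    min_obs : minimum number of observations (20 in the text)
    """
    valid = [v for v in values if v is not None]
    mu, sd = mean(valid), pstdev(valid)  # long-term mean and spread
    cleaned = []
    for v, n in zip(values, n_obs):
        too_few = n < min_obs
        outlier = v is not None and sd > 0 and abs(v - mu) > k * sd
        cleaned.append(None if (v is None or too_few or outlier) else v)
    return cleaned


# Toy series: 20 ordinary months, one spike, and one poorly observed month.
values = [-10.0] * 20 + [-2.0]
n_obs = [5] + [30] * 20
cleaned = mask_monthly_series(values, n_obs)
```

In this toy case the first month is dropped because it has only 5 observations, and the last month is dropped because it lies more than 3 standard deviations from the long-term mean.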

Scaling radar time series
Similar to Tao et al. (2022), a two-step approach was used to merge the C-band (ERS-1/-2 and ASCAT) and Ku-band (QSCAT) signals into a continuous long-term radar data set. The first step of the method was to unify the backscatter values from different sensors (i.e. data rescaling). The second step was to harmonize the scaled data into a smooth time series by addressing their monthly differences (Fig. 2).
Regarding data rescaling, previous studies have proposed two methods for rescaling time series, namely a linear regression correction (Brocca et al., 2011) and a cumulative distribution function (CDF) matching technique (Liu et al., 2009). The linear regression correction involves first scaling a time series within the range of the reference time series and then applying a linear regression equation between the two to minimize error. The CDF method further divides two time series into their quantile segments and then constructs a regression for each segment so that the CDF of a time series matches the CDF of the reference time series (Liu et al., 2009).
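To make the CDF matching idea concrete, the sketch below fits a piecewise-linear mapping between matching percentiles of a source and a reference series over their overlapping period and then applies it to a longer series. It is a simplified variant for illustration (linear interpolation between percentile nodes rather than a separate regression per segment); the function names and percentile levels are assumptions, not those of Liu et al. (2009).

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def cdf_match(series, src_overlap, ref_overlap,
              levels=(0, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100)):
    """Map `series` onto the reference scale using overlap-period percentiles."""
    src_sorted, ref_sorted = sorted(src_overlap), sorted(ref_overlap)
    src_q = [percentile(src_sorted, p) for p in levels]
    ref_q = [percentile(ref_sorted, p) for p in levels]
    matched = []
    for x in series:
        # Locate the percentile segment containing x; values outside the
        # overlap range are extrapolated from the first or last segment.
        i = 0
        while i < len(src_q) - 2 and x > src_q[i + 1]:
            i += 1
        span = src_q[i + 1] - src_q[i]
        frac = (x - src_q[i]) / span if span else 0.0
        matched.append(ref_q[i] + frac * (ref_q[i + 1] - ref_q[i]))
    return matched


# Toy check: if the reference is an exact linear function of the source,
# the matched series should reproduce that function.
src = [float(v) for v in range(11)]
ref = [2.0 * v + 1.0 for v in src]
matched = cdf_match(src, src, ref)
```

The extrapolation branch is where such a mapping becomes fragile: values far outside the range seen in the overlap period are governed entirely by the outermost segment.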
We found that the CDF method and the linear regression correction performed well in most regions (Fig. S1 in the Supplement). However, the CDF method failed in regions with a strong QSCAT signal trend, such as the deforested areas in southern Amazonia (Fig. 3a). This is mainly because QSCAT and ASCAT overlapped for only 3 years, and the QSCAT signals in these 3 years do not cover the full signal range during 1999–2009. Linear regression correction, as used in Tao et al. (2022), is a preferable option to cope with this issue, but it is sensitive to sudden changes in radar signal (Fig. 3b). To overcome these limitations, we used the rescaling method illustrated in the following equation (Brocca et al., 2010, 2013; Draper et al., 2009):

$$Q_{\mathrm{scaled}} = \frac{Q_{\mathrm{original}} - Q_{\mathrm{mean\_overlap}}}{Q_{\mathrm{SD\_overlap}}} \times A_{\mathrm{SD\_overlap}} + A_{\mathrm{mean\_overlap}}, \tag{1}$$

where $Q_{\mathrm{scaled}}$ indicates the scaled QSCAT signals, and $Q_{\mathrm{original}}$ means the original QSCAT signals prior to signal rescaling. $Q_{\mathrm{mean\_overlap}}$ and $Q_{\mathrm{SD\_overlap}}$ indicate the mean and standard deviation of the QSCAT signals in the period overlapping with ASCAT (i.e. 2007–2009). Likewise, $A_{\mathrm{mean\_overlap}}$ and $A_{\mathrm{SD\_overlap}}$ indicate the mean and standard deviation of the ASCAT signals in the overlapping period. This method has been used by previous research for rescaling soil moisture observations (Brocca et al., 2010, 2013; Draper et al., 2009). Here we found it to be robust to both the trends and sudden changes in radar signal (Fig. 3). We therefore used it to unify the scales of ERS-1/-2, QSCAT, and ASCAT signals. Specifically, monthly QSCAT signals were first scaled against monthly ASCAT signals, pixel by pixel. We chose ASCAT as the baseline for the rescaling because it has the best radiometric quality (lower sensitivity; higher radiometric resolution) and because it is still operational. Thereafter, ERS-2 signals were scaled against QSCAT signals (already scaled against ASCAT) using the same method. The ERS-1 and ERS-2 data sets were already calibrated, so there was no need to rescale them separately.
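The mean and standard deviation matching of Eq. (1), in which the QSCAT series is standardized with its own overlap-period statistics and then re-expressed with the ASCAT overlap-period statistics, can be written in a few lines. The sketch below assumes one pixel at a time and uses the population standard deviation; the function name and the toy numbers are illustrative only.

```python
from statistics import mean, pstdev


def rescale_to_reference(q_series, q_overlap, a_overlap):
    """Rescale a full QSCAT series to the ASCAT scale (mean/SD matching).

    q_series  : full monthly QSCAT series for one pixel (dB)
    q_overlap : QSCAT values in the overlapping period
    a_overlap : ASCAT values in the same overlapping period
    """
    q_mean, q_sd = mean(q_overlap), pstdev(q_overlap)
    a_mean, a_sd = mean(a_overlap), pstdev(a_overlap)
    return [(q - q_mean) / q_sd * a_sd + a_mean for q in q_series]


# Toy example: the last three QSCAT months overlap with ASCAT.
q_series = [-10.0, -9.0, -8.0, -7.0]
q_overlap = q_series[1:]
a_overlap = [-12.0, -11.5, -11.0]
q_scaled = rescale_to_reference(q_series, q_overlap, a_overlap)
```

By construction, the scaled QSCAT values inside the overlap period have the same mean and standard deviation as the ASCAT values. Because only two moments are matched, months outside the overlap period are mapped by the same linear transformation rather than by a quantile-dependent one.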

Addressing the monthly signal differences
Most existing research averaged the scaled signals directly to obtain a long-term merged time series (Du et al., 2017;Moesinger et al., 2020), but we here seek to correct the monthly signal differences before averaging the scaled signals. We previously found that the C-band and the scaled Ku-band signals exhibited large monthly differences in tropical regions (Tao et al., 2022). Importantly, the differences showed a seasonal pattern, with the Ku-band radar signal higher than C-band signal during the dry season and lower in the wet season. This phenomenon could be explained by the fact that the Ku-band signal has a shorter wavelength and lower penetrating ability relative to C-band and is thus more affected by tropical rainfall or intercepted water on leaf surfaces (Weissman et al., 2012;Prigent et al., 2022). To eliminate the signal differences, we first modelled the signal differences using rainfall as a predictor and then added the modelled signal differences to the Ku-band signal (Tao et al., 2022).
To extend our previous approach to the global scale, we explored the monthly signal difference against not only rainfall but also snow depth and skin temperature. Analogous to the effect of rainfall on Ku-band signals in tropical regions, we expect that snowpack prevents the Ku-band signal from reaching the land surface in regions covered by snow (Kelly et al., 2003; Naeimi et al., 2012). Skin temperature is related to a range of hydrological processes such as surface freeze/thaw, ice melting, and forest canopy evaporation (Konings et al., 2017). All of these could impact the radar signals by altering the water content of the measured objects. We therefore also expect skin temperature to be an effective predictor of the signal differences. Importantly, signal differences in many regions are caused by more than one climatic phenomenon. For instance, both precipitation and skin temperature could impact the Ku-band signals in forested regions through, respectively, rainfall contamination and canopy evaporation. In cold regions such as the Tibetan Plateau, precipitation, snow depth, and skin temperature could be jointly responsible for the signal differences, especially considering the hydrological process of rainfall–snow/ice formation–snow/ice melting (Fig. 4a).

Figure 3. Comparison of the CDF method, the linear regression correction, and the method shown in Eq. (1) for rescaling radar signals. In most regions, the three methods performed almost equally well (Fig. S1), but the CDF method and the linear regression correction failed for signals with a strong trend or sudden changes. (a) Comparison between the CDF method and the method shown in Eq. (1) for rescaling a QSCAT signal time series with a strong decreasing trend. (b) Comparison between the linear regression correction and the method shown in Eq. (1) for rescaling a QSCAT signal time series with sudden increases in signal during the overlapping period.
Thus, in order to model the signal differences from climatic variables accurately, we used decision tree regression. This technique recursively partitions observations into two sets based on a predictor that minimizes the predictive errors (Sankaran et al., 2005; Pekel, 2020). Compared with other modelling techniques, decision tree regression can be efficiently performed without a heavy computation burden. Besides, a major advantage of the decision tree regression is that it produces a model with easily interpretable rules (Sankaran et al., 2005; Loh, 2011). One example is shown in Fig. 4; while precipitation, skin temperature, and snow depth all contribute to the signal differences in one pixel of the Tibetan Plateau (Fig. 4a), the decision tree model clearly dissects the causes of the signal differences by creating binary trees first based on snow depth, then on precipitation, and finally on skin temperature (Fig. 4b). After the decision tree modelling, the Pearson r value between the C-band and Ku-band signals increases markedly from 0.55 to 0.91 (Fig. 4c).
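The recursive partitioning idea can be illustrated with a very small regression tree written from scratch. This is a generic sketch, not the implementation used in the study: the function names, the dictionary-based tree, the midpoint thresholds, and the toy climate values are all assumptions made for illustration. Each split is chosen to minimize the summed squared error of the two resulting groups, subject to a minimum leaf size.

```python
def sse(ys):
    """Sum of squared deviations from the mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows, ys, min_leaf=1):
    """Fit a regression tree; rows are predictor tuples, ys the targets."""
    best = None  # (error, feature index, threshold)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[j] <= thr]
            right = [y for r, y in zip(rows, ys) if r[j] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, thr)
    if best is None or best[0] >= sse(ys):
        return {"value": sum(ys) / len(ys)}  # leaf: predict the mean
    _, j, thr = best
    li = [i for i, r in enumerate(rows) if r[j] <= thr]
    ri = [i for i, r in enumerate(rows) if r[j] > thr]
    return {
        "feature": j,
        "threshold": thr,
        "left": build_tree([rows[i] for i in li], [ys[i] for i in li], min_leaf),
        "right": build_tree([rows[i] for i in ri], [ys[i] for i in ri], min_leaf),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its mean value."""
    while "value" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]


# Toy months: (rainfall in mm, snow depth in m, skin temperature in K)
# and the C-band minus Ku-band signal difference (dB). Values are invented.
rows = [(20, 0.0, 290), (150, 0.0, 295), (30, 0.3, 265),
        (10, 0.5, 260), (200, 0.0, 298), (40, 0.0, 262)]
diffs = [0.0, 0.5, -1.0, -1.0, 0.5, 0.0]
tree = build_tree(rows, diffs, min_leaf=1)
```

In this toy data set the first split falls on snow depth (feature index 1), and the snow-free months are then separated by rainfall, which mirrors the kind of readable rule hierarchy described for Fig. 4b. A larger `min_leaf` yields a shallower tree, which is the property that a minimum-leaf-size parameter controls.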
To summarize, combining monthly climatic variables and decision tree regression modelling, we corrected the monthly signal differences, pixel by pixel, using the following steps:
1. For each pixel, a decision tree regression model was built taking the monthly signal differences during the overlapping periods (i.e. 1999–2001 and 2007–2009) as a dependent variable and monthly ERA5-Land rainfall, snow depth, and skin temperature (0.1° × 0.1° resolution) as predictors.
2. After tree construction, cross-validation procedures were used to avoid overfitting. We increased the value for the MinLeafSize parameter from 1 to 30, with a step size of 1, and calculated the cross-validated errors. The MinLeafSize corresponding to the minimum cross-validated error was used, which ensures an optimal tree depth and a high predictive accuracy. Here, 5-fold cross-validation was used because only ∼ 60 overlapping observations (or ∼ 60 months) were available during 1999–2001 and 2007–2009, but we verified that the results were not altered with 10-fold cross-validation. The variable importance of the decision tree regression was quantified using the MATLAB function predictorImportance and the structure of the decision tree. While the former strictly computes the changes in predictive error due to splits for every predictor, the latter simply relies on the sequence of the predictors used to split the decision tree (i.e. the predictor used in the first split is the most important).
3. The decision tree regression model was then applied on climatic data from 1999 to 2009, and the predicted signal differences were added to the full QSCAT time series. In this way, the QSCAT signal was transformed into a substitute C-band signal.
4. After transforming the QSCAT data, we built a time series for each pixel for the 1992-2022 period, combining ERS-1/-2, QSCAT, and ASCAT time series. Radar observations from the overlapping periods (1999-2001 and 2007-2009) were averaged across sensors.

5. To assess the effectiveness of the data-merging approach, Pearson r (unitless), RMSE (dB), and relative RMSE (rRMSE; unitless) between the C-band and the corrected Ku-band signals in the overlapping periods (1999–2001 and 2007–2009) were finally calculated:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}},$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2},$$

$$\mathrm{rRMSE} = \frac{\mathrm{RMSE}}{\sigma_y},$$

where $\bar{x}$ denotes the mean of the monthly Ku-band signals $x$ in the overlapping years, $\bar{y}$ denotes the mean of the monthly C-band signals $y$ in the overlapping years, $x_i$ and $y_i$ denote the values of $x$ and $y$ at the $i$th month, respectively, $\sigma_y$ denotes the standard deviation of $y$, and $n$ denotes the number of months in the overlapping years. rRMSE was used because it is normalized against the standard deviation of the signal and can therefore be compared across regions.
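The three quality metrics named in step 5 (Pearson r, RMSE, and RMSE normalized by the standard deviation of the C-band signal) can be computed as in the following sketch. The function name is hypothetical, and the population standard deviation is assumed for the normalization because the text does not specify which estimator was used.

```python
from math import sqrt
from statistics import mean, pstdev


def merge_quality(x, y):
    """Return (Pearson r, RMSE, rRMSE) for paired monthly signals.

    x : corrected Ku-band signals in the overlapping months (dB)
    y : C-band signals in the same months (dB)
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    r = cov / sqrt(var_x * var_y)
    rmse = sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n)
    rrmse = rmse / pstdev(y)  # normalized by the spread of the C-band signal
    return r, rmse, rrmse


# Toy example: the Ku-band series tracks the C-band series with a 0.1 dB offset.
ku = [-10.0, -9.0, -8.0, -9.0]
c = [-10.1, -9.1, -8.1, -9.1]
r, rmse, rrmse = merge_quality(ku, c)
```

A constant offset leaves r at 1 while producing a non-zero RMSE, which is why the correlation and error metrics are reported together.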

Validation of the data-merging approach
We also conducted a stricter evaluation of the data-merging approach. From January 2001 to 2011, the ERS-2 satellite experienced a series of failures that affected its data continuity and spatial coverage. For each of these 10 regions, we calculated monthly radar backscatter coefficients at 40° incidence angle from the ESA ERS-2 data set for comparison with our merged radar data set. To normalize the incidence angle, a linear regression was fitted between all incidence angles and the radar backscatter coefficients, and the R² value and RMSE value of the regression were recorded. The backscatter coefficient at 40° incidence angle was then predicted by the regression.
To ensure data quality, the predicted backscatter coefficient was not used if the RMSE was higher than 0.5 dB. Since the ESA ERS-2 data have a resolution of 25 km, we aggregated our merged radar signals to that resolution. For each month between 2001 and 2011, pixels with available ESA ERS-2 observations were located, and their ERS-2 signals were averaged across pixels. Because the footprints of the ESA ERS-2 observations are not fixed temporally, different months have a different subset of pixels. Our merged radar time series from the same pixels were then averaged and compared with the ESA ERS-2 signal mean. Months with too few pixels (< 100) having ESA ERS-2 observations were not considered. This increases the strictness of the comparison in the sense that there is an additional spatial variation in pixels embedded within the radar time series.
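The incidence-angle normalization used for the validation data (a linear fit of backscatter against incidence angle, a prediction at 40°, and rejection when the fit RMSE exceeds 0.5 dB) might look like the sketch below. The function name and the toy observations are assumptions; the R² value mentioned in the text is omitted for brevity.

```python
from math import sqrt


def backscatter_at_ref_angle(angles, sigma0, ref_angle=40.0, max_rmse=0.5):
    """Fit sigma0 = intercept + slope * angle and predict at ref_angle.

    angles : incidence angles of the observations (degrees)
    sigma0 : backscatter coefficients (dB)
    Returns (prediction or None, fit RMSE); None if RMSE > max_rmse.
    """
    n = len(angles)
    a_bar = sum(angles) / n
    s_bar = sum(sigma0) / n
    sxx = sum((a - a_bar) ** 2 for a in angles)
    sxy = sum((a - a_bar) * (s - s_bar) for a, s in zip(angles, sigma0))
    slope = sxy / sxx
    intercept = s_bar - slope * a_bar
    residuals = [s - (intercept + slope * a) for a, s in zip(angles, sigma0)]
    rmse = sqrt(sum(e * e for e in residuals) / n)
    if rmse > max_rmse:
        return None, rmse  # fit too poor: discard this month
    return intercept + slope * ref_angle, rmse


angles = [20.0, 30.0, 40.0, 50.0]
good, good_rmse = backscatter_at_ref_angle(angles, [-7.0, -8.0, -9.0, -10.0])
bad, bad_rmse = backscatter_at_ref_angle(angles, [-7.0, -12.0, -6.0, -13.0])
```

The first toy series lies on a straight line and yields a usable 40° value, whereas the second scatters widely around the fit and is rejected by the RMSE threshold.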

Merged radar signals and quality assessments
The merged radar signal, averaged across pixels within a region, is presented in Fig. 5. Pearson r, RMSE, and rRMSE between the C-band and the corrected Ku-band signals in the overlapping years (1999–2001 and 2007–2009) were used to assess the quality of the merged radar signal. All 13 regions had an r value larger than 0.92, with a maximum of 0.99. We also obtained low RMSE values (from 0.05 to 0.11 dB), even in regions with a large seasonal amplitude in radar signal, such as Siberia, where the seasonal amplitude is around 3 dB but the RMSE is only 0.11 dB. This result was further confirmed by the low rRMSE values obtained in all regions, which ranged from 0.14 to 0.38. We further assessed the data-merging quality at the pixel level. Before correcting the monthly signal differences, the Pearson r values between the C-band and the scaled Ku-band signals showed a long-tailed distribution in all regions (Fig. 6). Regional median r values were relatively low, ranging from −0.22 to 0.91, and negative r values were found in almost all regions. After correcting the monthly signal differences, the regional median r values ranged from 0.64 to 0.94, with no negative r values observed (Fig. 6). The improvement was the most obvious in the northern high latitudes, such as Europe (r improved from 0.54 to 0.87), the Bering Sea (r from −0.13 to 0.94), Alaska (r from −0.16 to 0.94), and Siberia (r from −0.22 to 0.94). In contrast, the improvements for five regions, namely Central America, Australia, North Africa, South America, and South Asia, were relatively limited because their median r values prior to signal correction were already high. All of these five regions contain large portions of barren lands, deserts, shrublands, or grasslands, where the Ku-band signal is not as impacted as in forested and snow-covered regions.
Regarding RMSE (Fig. 7), regional median RMSE values varied between 0.15 and 1.52 dB before the correction for signal differences but decreased sharply after the correction for signal differences (between 0.13 and 0.34 dB; Fig. 7). The most obvious improvement was still observed in the northern high latitudes such as Europe (RMSE decreased from 0.66 to 0.33 dB), the Bering Sea (RMSE from 1.32 to 0.31 dB), Alaska (RMSE from 1.28 to 0.31 dB), and Siberia (RMSE from 1.52 to 0.34 dB). Regional median rRMSE values were lower than 0.88, and in most regions lower than 0.5 (Fig. S2), consistent with the RMSE-based assessments. Besides, although rRMSE values were generally low in the final LHScat data set, tropical regions, mountainous regions, and arid regions had relatively higher rRMSE values than other regions (Fig. S3).

Importance of the predictor variables
We found that the most important predictors calculated by the MATLAB function predictorImportance (Fig. 8) were almost identical to the predictors used in the first splits of the decision trees (Fig. S4). We therefore illustrated the variable importance below, using the results presented in Fig. 8 (i.e. those calculated with predictorImportance). For 33.3 % of all the pixels, signal differences were most accurately predicted by rainfall (hereafter referred to as Type 1 pixels; Fig. 8). This type of pixel was mainly found in the Southern Hemisphere, particularly in tropical regions. In the Northern Hemisphere, such pixels were primarily located in the low and middle latitudes (Fig. 8).
For 57.8 % of all the pixels, the signal differences were most accurately predicted by skin temperature (hereafter referred to as Type 2 pixels; Fig. 8). This type of pixel was widely distributed across the globe. In tropical regions, the spatial pattern of Type 2 pixels is similar to the pattern of Type 1 pixels (Fig. 8), which is expected because skin temperature and rainfall are correlated. The main differences between the distributions of Type 1 and Type 2 pixels were found in the northern high latitudes and dry regions, such as the hyper-arid Saharan and Arabian deserts.
Signal differences in the remaining 8.9 % of pixels were most accurately predicted by snow depth (Fig. 8; hereafter referred to as Type 3 pixels). As expected, this type of pixel was primarily found in mountainous regions such as the Himalayas and the southern part of the Andes, as well as in the very high-latitude regions in the Northern Hemisphere.

Independent validation of the merged radar signal
The quality of the merged radar signals was also validated directly against the ESA ERS-2 data (see Sect. 2.5). The number of ESA ERS-2 pixels available for a comparison differed across regions. Furthermore, the pixel number decreased greatly around 2003 in many regions (Fig. S5). Despite the variations in pixel number, we found highly similar monthly dynamics between the merged radar signals and the ESA ERS-2 signals in all regions. Using the Pearson r value as an index of similarity, all regions had a Pearson r value higher than 0.79, with a maximum of 0.98. Six regions had an r value higher than 0.90 (ranging from 0.90 to 0.98) (Fig. 9). This validation shows that the LHScat data are unlikely to be biased due to the cross-period merging method.

Rescaling the radar time series
The purpose of this project was to create the first global long-term radar backscatter data set with a consistent C-band signal dynamic. C-band ERS-1/-2 (1992–2001) and ASCAT (2007 onwards) signals were bridged by Ku-band QSCAT (1999–2009) signals. Observations overlapped between the three sensors, which allowed us to rescale the signal time series. The CDF matching technique has been a classical signal rescaling method (Liu et al., 2009, 2011). For instance, Liu et al. (2011) [...] 2012). In these previous studies, the overlapping periods among sensors are relatively long, with some even exceeding 10 years. In contrast, neither the ERS-2–QSCAT nor the QSCAT–ASCAT overlapping periods span more than 3 years. The rescaled QSCAT signals by CDF could therefore be biased, due, for instance, to deforestation in southern Amazonia (Fig. 3a). The linear regression correction can tackle this issue (Tao et al., 2022) but is sensitive to sudden changes in radar signal. As shown in Fig. 3b, the QSCAT signal surged in 2009 in one location of Alaska, and the linear regression correction created an obvious bias in the rescaled QSCAT signals. This situation is rare in tropical regions but appears more frequent in northern high latitudes, possibly due to the surface freeze-thaw process.

Figure 5. Time series and quality assessment of the merged LHScat radar time series at the regional level. Each row shows one region. Inside each row, the map in the left panel shows the location of the region. The Lambert equal-area projection is used in the map. The line plot in the right panel shows the merged radar time series, averaged across pixels and coloured according to sensors. The Pearson r (unitless), RMSE (dB), and rRMSE (unitless) labelled in the panel were calculated using the C-band and the corrected Ku-band signals in the overlapping years (1999–2001 and 2007–2009) as indicators of the data-merging quality.
Although we have excluded potential outliers from the radar signals by implementing a stan-  (1999-2001 and 2007-2009) were calculated for all pixels and coloured in orange. As a comparison, the Pearson r values between the C-band and the corrected Ku-band signals in the overlapping years were also calculated and coloured in green. The medians of the Pearson r values are labelled in each panel.  (1999-2001 and 2007-2009) were calculated for all pixels and coloured in orange. As a comparison, the RMSE values between the C-band and the corrected Ku-band signals in the overlapping years were also calculated and coloured in green. The medians of the RMSE values are labelled inside each panel. The rRMSE-based quality assessment is available in the Supplement (Figs. S2 and S3). dard deviation filter (see Sect. 2.2), such sudden changes were not identified as outliers.
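To make the CDF matching discussed above concrete, the following sketch maps each source value to the reference value at the same empirical quantile. It is a simplified nearest-rank variant written for illustration, not the implementation of Liu et al. (2009, 2011); the function name and sample values are hypothetical.

```python
from bisect import bisect_right

def cdf_match(source, src_overlap, ref_overlap):
    """Map source values onto the reference distribution by empirical quantile."""
    src_sorted = sorted(src_overlap)
    ref_sorted = sorted(ref_overlap)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for value in source:
        # Empirical non-exceedance probability within the source overlap sample.
        q = bisect_right(src_sorted, value) / n_src
        # Reference value at the same quantile (nearest rank, clamped to valid indices).
        idx = min(max(int(round(q * n_ref)) - 1, 0), n_ref - 1)
        matched.append(ref_sorted[idx])
    return matched

# Hypothetical overlap samples in dB (illustrative values only).
ku_overlap = [-10.0, -9.0, -8.0, -7.0]
c_overlap = [-12.0, -11.5, -11.0, -10.5]
matched = cdf_match([-10.0, -8.0, -6.0, -11.0], ku_overlap, c_overlap)
```

Because the mapping is built only from the overlap samples, values outside their range are clamped to the extremes of the reference sample. With roughly 36 monthly values per overlap, a trend or step change inside that window directly distorts the mapping, which is the weakness described above.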
We used a simple yet effective method for rescaling the signal time series. This method is rooted in the discipline of statistics and has been used successfully by previous research for rescaling soil moisture data (Brocca et al., 2010, 2013; Draper et al., 2009). We here further demonstrated its capability of rescaling microwave signals with a short overlapping period (∼ 3 years). Additionally, the results shown in Fig. 3 suggest that this method is robust in response to both the trends and sudden changes in the radar signal. Merging the time series of satellite observations has been an important yet challenging task in Earth science studies. Many sensors have temporal overlaps, such as among AMSR-E, ASCAT, Sentinel-1, and SMOS, with the lengths of overlapping period ranging from several months to a couple of years (Table 1). Rescaling these data using Eq. (1) could uncover interesting mechanisms underlying the signal differences, which is an important prerequisite for creating data sets with an even longer time span.
Figure 8. Spatial distribution of variable importance for predicting the signal differences between the C-band and the scaled Ku-band signals in the overlapping years (1999-2001 and 2007-2009). The variable importance was calculated from the decision tree regression model using the MATLAB function predictorImportance. For pixel Types 1, 2, and 3, the most important variables are monthly precipitation, skin temperature, and snow depth, respectively.
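As an illustration of this kind of rescaling, the sketch below matches the mean and standard deviation of one series to those of a reference over the overlapping months. Eq. (1) is not reproduced in this section, so the assumption that it takes this mean and standard deviation form, as commonly used in the cited soil moisture studies, is ours; the function name and values are hypothetical.

```python
import statistics

def rescale_mean_std(ku_full, ku_overlap, c_overlap):
    """Rescale a Ku-band series so its overlap mean and spread match the C-band ones."""
    mu_ku = statistics.fmean(ku_overlap)
    sd_ku = statistics.pstdev(ku_overlap)
    mu_c = statistics.fmean(c_overlap)
    sd_c = statistics.pstdev(c_overlap)
    # Standardize with the Ku-band overlap statistics, then apply the C-band ones.
    return [(v - mu_ku) / sd_ku * sd_c + mu_c for v in ku_full]

# Hypothetical overlap samples in dB (illustrative values only).
ku_overlap = [-10.0, -9.0, -8.0]
c_overlap = [-12.0, -11.5, -11.0]
rescaled = rescale_mean_std(ku_overlap, ku_overlap, c_overlap)
```

Only two statistics per sensor are estimated from the overlap, so the transform stays linear and preserves the shape of the Ku-band series, including any trend or step change it contains.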

Signal quality and merging mechanism
After rescaling the radar time series from different sensors, monthly signal differences were corrected by modelling them from climatic variables (namely precipitation, skin temperature, and snow depth). The quality of the merged radar signals was assessed against the ESA ERS-2 data set. Highly similar monthly time series were obtained, suggesting high accuracy for the merging procedure.
Why did rainfall, skin temperature, and snow depth successfully predict the signal differences? The main reason is that the Ku-band signal has a lower penetrating ability than the C-band signal because of its shorter wavelength. In regions with heavy rainfall, such as the tropics, Ku-band signals are more strongly affected by raindrops and by water intercepted on leaf surfaces, and thus show seasonal patterns that differ from those of C-band signals (Fig. 4a). The rainfall attenuation of high-frequency microwave signals (Ku/Ka band or 13/35 GHz) is used for microwave-derived rain retrieval, as in the case of the precipitation radar operating at 13.8 GHz on board TRMM.
Skin temperature is found to be an effective predictor of the signal differences for 57.8 % of all pixels (Fig. 8). This is expected because skin temperature not only correlates with rainfall but also reflects several land surface processes. In tropical regions, skin temperature was found to be almost as important a predictor of the signal difference as rainfall. The first explanation for this result is that there is a negative correlation between skin temperature and rainfall in tropical regions. A second explanation could be the increased evapotranspiration of the rainforest canopy in dry periods due to high vapour pressure deficit. Increased evapotranspiration is correlated with skin temperature and could affect the top canopy moisture, thus influencing the Ku-band signals. This phenomenon therefore helps to explain why Ku-band signals are higher than C-band signals in dry periods (Guan et al., 2015; Konings et al., 2017).
In boreal regions (Fig. 8), skin temperature is also an effective predictor of the signal differences. This could be related to the fact that the land surfaces in these regions are seasonally frozen or covered by ice, which causes the Ku-band and C-band signals to behave differently. The surface freeze-thaw cycle is captured by skin temperature changes, explaining why the signal difference in these regions is predicted by skin temperature.
In arid regions such as the Saharan and Arabian deserts (Fig. 8), skin temperature also explained the signal differences in most pixels. These regions receive a limited amount of rainfall annually. Soil moisture is therefore mainly controlled by land surface processes, such as the seasonal changes in wind intensity/direction in deserts, which modify the roughness of the sand dunes and finally lead to a temporal variation in soil moisture (Frappart et al., 2015). Soil moisture changes with skin temperature, leading to changes in the penetration depths of C-band and Ku-band signals, due to the attenuation of the microwave signal as a function of moisture. This hypothesis could explain why skin temperature is closely related to the signal differences in some arid regions.
Snow depth was found to be the most effective predictor of the signal differences in mountainous and high-latitude regions seasonally covered by snow. The Ku-band signal interacts with snow because of its short wavelength; thus its dynamics follow the seasonal changes in snow depth. The Ku-band signal is higher when the snow is deeper, and vice versa (Fig. 4a), but the C-band signal shows the opposite dynamic, possibly because of a deeper penetration. In fact, this phenomenon has long been recognized by classical research which models snow depth or snow water equivalent from microwave signal differences (Kelly et al., 2003). Since the launch of the Scanning Multichannel Microwave Radiometer (SMMR) in 1978, microwave data have been used to estimate snow depth and snow water equivalent. One of the classical methods is based on the fact that microwaves at different frequencies respond differently to snow cover. For instance, the Chang et al. (1982) equation utilizes the channel differences between low- (such as 19 GHz) and high- (such as 37 GHz) frequency brightness temperatures observed by passive microwave sensors.
Here, we found similar signal differences between low- (C-band) and high- (Ku-band) frequency radar signals. Since several radar sensors at different frequencies are operating, efforts could be made to create products of snow depth or snow water equivalent by combining radar signals of different frequencies such as QSCAT and ASCAT.
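The channel-difference idea behind the passive microwave approach mentioned above can be sketched as a snow depth that scales with the difference between the low- and high-frequency brightness temperatures. The function name, the clipping at zero, and the coefficient value are illustrative assumptions; the coefficient would need to be calibrated and is not taken from Chang et al. (1982).

```python
def snow_depth_from_channel_difference(tb_low_k, tb_high_k, coeff_cm_per_k):
    """Snow depth (cm) proportional to the low-minus-high frequency brightness temperature."""
    diff = tb_low_k - tb_high_k
    # A non-positive difference is treated as snow-free in this sketch.
    return max(0.0, coeff_cm_per_k * diff)

# Hypothetical brightness temperatures (K) and a placeholder coefficient (cm per K).
deep_snow = snow_depth_from_channel_difference(250.0, 230.0, 1.5)
no_snow = snow_depth_from_channel_difference(230.0, 250.0, 1.5)
```

An analogous radar product would replace the brightness temperatures with C-band and Ku-band backscatter, but the functional form for that case remains to be established.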
It is also worth noting that, although climatic data were used to merge radar signals into a single time series, this does not mean that our final radar signals contain mainly climate information. The three climatic variables were merely used to model signal differences, which were then added to the Ku-band signals. Besides, the 1999-2009 period accounts for only a third of the entire time span. Thus, the main information contained in the merged signals is related to features of the land surface rather than to climate.
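The correction step described above can be sketched with a small regression tree that predicts the monthly signal difference from the three climatic variables and then adds the prediction to the scaled Ku-band signal. This is a toy pure-Python tree for illustration, not the MATLAB model used in the study; the sign convention (difference = C-band minus scaled Ku-band), the tree settings, and all values are assumptions.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(rows, targets, depth=0, max_depth=3, min_leaf=2):
    """Fit a small regression tree; returns a leaf value or (feature, threshold, left, right)."""
    best = None
    if depth < max_depth:
        for j in range(len(rows[0])):
            for threshold in sorted(set(r[j] for r in rows))[:-1]:
                left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
                right = [t for r, t in zip(rows, targets) if r[j] > threshold]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse(left) + sse(right)
                if best is None or cost < best[0]:
                    best = (cost, j, threshold)
    if best is None:
        return sum(targets) / len(targets)  # leaf: mean signal difference
    _, j, threshold = best
    left_idx = [i for i, r in enumerate(rows) if r[j] <= threshold]
    right_idx = [i for i, r in enumerate(rows) if r[j] > threshold]
    return (
        j,
        threshold,
        fit_tree([rows[i] for i in left_idx], [targets[i] for i in left_idx],
                 depth + 1, max_depth, min_leaf),
        fit_tree([rows[i] for i in right_idx], [targets[i] for i in right_idx],
                 depth + 1, max_depth, min_leaf),
    )

def predict(tree, row):
    """Walk the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if row[j] <= threshold else right
    return tree

# Hypothetical overlap months: [precipitation (mm), skin temperature (K), snow depth (m)].
features = [
    [20.0, 300.0, 0.0], [30.0, 301.0, 0.0], [40.0, 302.0, 0.0], [50.0, 301.5, 0.0],
    [210.0, 297.0, 0.0], [230.0, 296.5, 0.0], [250.0, 296.0, 0.0], [270.0, 297.5, 0.0],
]
# Assumed convention: difference = C-band minus scaled Ku-band (dB).
differences = [0.2, 0.2, 0.2, 0.2, -0.5, -0.5, -0.5, -0.5]

tree = fit_tree(features, differences)
predicted_diff = predict(tree, [240.0, 296.2, 0.0])  # a wet month outside the overlap
scaled_ku = -9.0                                     # hypothetical scaled Ku-band value (dB)
corrected_ku = scaled_ku + predicted_diff
```

The climate variables enter only through the predicted difference, so the land surface information carried by the Ku-band signal itself is retained in the corrected value.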

Limitations and future works
We used the ERA5-Land monthly reanalysis climatic data to model the signal differences. As a result, whether signal differences can be accurately modelled partly depends on the accuracy of the ERA5-Land climatic data. Future work will test the effectiveness of other climatic data sets for modelling the signal differences. The accurate mapping of some climatic variables, such as snow depth, is challenging (Orsolini et al., 2019; Clifford, 2010; Pulliainen et al., 2020). This is critical in high-latitude regions such as northern Alaska, where snow depth is the most important variable predicting the signal differences. The estimation of rainfall is also challenging in regions with sparse climate stations such as the tropics. An increasing number of climate data sets have been made publicly available, including the Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2; Gelaro et al., 2017), and the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS). The snow depth product of MERRA-2 has been demonstrated to be superior to that of ERA5 in mainland China (Zhang et al., 2021). The CHIRPS precipitation has also been shown to perform very well in tropical Africa (Camberlin et al., 2019). Thus, it is possible that these climate products may produce a better merging quality for tropical and mountainous regions where the rRMSE values remained relatively high (Fig. S3).
In addition to climatic layers, remote-sensing-based layers such as the normalized difference vegetation index (NDVI) could be useful for modelling the signal differences in vegetated areas. NDVI reflects the vegetation growth condition, which is the result of several interacting environmental factors. NDVI therefore integrates information on multiple environmental factors. In addition, aerosol could be a contributing factor to the signal differences, especially in deserts such as the Sahara, where the rRMSE values in the final LHScat data set remained relatively high (Fig. S3). The Sentinel-5P mission provides near-real-time, high-resolution aerosol products starting from the year 2018 (Ingmann et al., 2012). Analysis will soon be conducted to assess whether NDVI and aerosol layers can further improve the data-merging quality.
Another potentially useful data set to be included in our data-merging framework is the Oceansat-2 scatterometer (OSCAT). OSCAT also provides Ku-band backscatter akin to QSCAT but operated in a different period (between 2009 and 2014; Bhowmick et al., 2013). QSCAT operated in full mode between 1999 and 2009 and overlapped with ASCAT for 3 years (2007-2009). Adding OSCAT will expand the overlapping period by 5 years (up to 2014), which could help further improve the data-merging method.
In Tao et al. (2022), linear regression was established to predict the signal differences from monthly rainfall amounts because the signal differences exhibited a good linear relationship with rainfall in tropical rainforest regions. Decision tree regression was also adopted in Tao et al. (2022) for a limited number of pixels mainly located in the ever-wet northwestern Amazonian and Asian tropical rainforests. This is because the relationship between signal differences and rainfall in these ever-wet regions is nonlinear. The present study used only the decision tree regression (Fig. 4) and used three climatic variables to increase the modelling accuracy. More advanced machine learning techniques are an option in the future.
The LHScat data set currently has a spatial resolution of 8.9 km, which is much higher than the ∼ 25 km resolution of previous microwave data sets (Liu et al., 2011; Moesinger et al., 2020). This was achieved partly by reducing the temporal resolution, since the SIRF algorithm requires multiple-orbit satellite passes to obtain a fine spatial resolution (Long et al., 1993; Early and Long, 2001). We further composited the images into a monthly temporal resolution to facilitate global-scale studies, such as global vegetation biomass and soil moisture estimations. However, we acknowledge that the monthly temporal resolution might be less useful for local-scale studies requiring frequent observations such as phenological monitoring (Pfeil et al., 2020). New versions of LHScat with even higher spatial and temporal resolutions are being created using the methodology developed in this study. Higher spatial resolution can be achieved by merging only QSCAT and ASCAT images. As stated in the introduction to this paper, the BYU data centre provides QSCAT and ASCAT images at the resolution of 4.45 km. It is therefore possible to generate a global C-band radar data set at 4.45 km resolution but with a shorter time span (since the QSCAT mission started in 1999). It is also possible to have higher temporal resolutions, such as weekly averages (Lin et al., 2016). New versions of LHScat will be made publicly available at https://doi.org/10.6084/m9.figshare.20407857.
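The temporal compositing mentioned above can be illustrated for a single pixel as follows. The compositing rule is not specified in this section, so the simple arithmetic mean of the available values in dB is an assumption, and the function name and values are hypothetical.

```python
from collections import defaultdict
from datetime import date

def monthly_composite(observations):
    """Average (date, value) pairs by calendar month, skipping missing values (None)."""
    sums = defaultdict(lambda: [0.0, 0])
    for day, value in observations:
        if value is None:
            continue
        key = (day.year, day.month)
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}

# Hypothetical backscatter values (dB) for one pixel.
observations = [
    (date(2008, 1, 3), -9.0),
    (date(2008, 1, 17), -8.0),
    (date(2008, 1, 20), None),   # missing acquisition
    (date(2008, 2, 5), -10.0),
]
composite = monthly_composite(observations)
```

Replacing the month key with an ISO week key would give the weekly variant discussed above, at the cost of fewer observations per composite.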
C-band radar data have been widely used in Earth science studies for monitoring vegetation dynamics, mapping deforestation and soil moisture, and estimating snow water equivalent (Chang et al., 1982; Clifford, 2010; Kelly et al., 2003; Liu et al., 2009; Saatchi et al., 2013; Steele-Dunne et al., 2017; Smith and Bookhagen, 2018). Thus, the merged radar signals are expected to be useful in a range of research disciplines. A possible next step is to separate the signal into soil moisture and vegetation optical depth (VOD). In this way, the signals can be more directly related to the soil or vegetation dynamics. Technically, extracting VOD and soil moisture from the LHScat signal is feasible with the help of the Water Cloud Model, and efforts are being devoted to developing an LHScat VOD data set at the global scale. Considering its long time span (since 1992) and high resolution, LHScat VOD would be suitable for the assessment of long-term global vegetation changes. Using the optical Moderate Resolution Imaging Spectroradiometer (MODIS) leaf area index data, a recent study found that most of the world's vegetated areas are becoming greener, particularly in China and India (Chen et al., 2019). Using the optical vegetation index NDVI, another recent study explored the long-term (2000-2020) resilience change in global forests (Forzieri et al., 2022). It would be interesting to re-evaluate the vegetation trends using LHScat VOD data. While the radar signal penetrates the upper forest canopy and interacts directly with the water molecules contained in forest biomass, optical greenness data reflect the canopy features of the topmost leaf layer, which could be maintained due to leaf demography or light availability (Guan et al., 2015; Wu et al., 2016). We therefore expect the LHScat VOD to provide new insights into the long-term changes in global forests.
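For orientation, a commonly used form of the Water Cloud Model writes the total backscatter as a canopy term plus a soil term attenuated by the two-way canopy transmissivity. The sketch below is a forward model under that assumption, with the canopy attenuation expressed directly as VOD; the function name, parameter names, and all parameter values are hypothetical and are not the settings planned for LHScat VOD.

```python
import math

def water_cloud_sigma0_db(soil_moisture, vod, theta_deg, a_veg, v1, c_db, d_db):
    """Total backscatter (dB) from a canopy term plus an attenuated soil term."""
    cos_t = math.cos(math.radians(theta_deg))
    gamma2 = math.exp(-2.0 * vod / cos_t)             # two-way canopy transmissivity
    sigma_veg = a_veg * v1 * cos_t * (1.0 - gamma2)   # canopy scattering, linear units
    # Soil backscatter assumed linear in soil moisture in dB, converted to linear units.
    sigma_soil = 10.0 ** ((c_db + d_db * soil_moisture) / 10.0)
    return 10.0 * math.log10(sigma_veg + gamma2 * sigma_soil)

# Hypothetical parameters (illustrative values only).
params = dict(theta_deg=40.0, a_veg=0.1, v1=1.0, c_db=-20.0, d_db=30.0)
bare_soil = water_cloud_sigma0_db(soil_moisture=0.2, vod=0.0, **params)
dense_canopy = water_cloud_sigma0_db(soil_moisture=0.2, vod=50.0, **params)
```

With no canopy (VOD = 0) the model reduces to the soil term, and with a very dense canopy the soil contribution vanishes. Retrieving VOD and soil moisture amounts to inverting this relation, which requires additional constraints because a single backscatter value cannot fix both unknowns.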
Author contributions. ST, AZ, and YL designed the research. ST and AZ analysed the data, with input from other authors. ST, JPW, SSa, PC, JC, TL, and PLF wrote the paper. All authors interpreted the results and edited the text.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.