A global long-term (1981–2019) daily land surface radiation budget product from AVHRR satellite data using a residual convolutional neural network

. The surface radiation budget, also known as all-wave net radiation (R n ), is a key parameter for various land surface processes including hydrological, ecological, agricultural and biogeochemical processes. Satellite data can be effectively used 10 to estimate R n , but existing satellite products have coarse spatial resolutions and limited temporal coverage. In this study, a point-surface matching estimation (PSME) method is proposed to estimate surface R n using a residual convolutional neural network (RCNN) integrating spatially adjacent information to improve the accuracy of retrievals. A global high-resolution (0.05°), long-term (1981–2019), and daily mean R n product was subsequently generated from Advanced Very High-Resolution Radiometer (AVHRR) data. Specifically, the RCNN was employed to establish a nonlinear relationship between globally distributed ground measurements from 523 sites and AVHRR top of atmosphere (TOA) observations. Extended triplet collocation (ETC) technology was applied to address the spatial scale mismatch issue resulting from the low spatial support of ground measurements within the AVHRR footprint by selecting reliable sites for model training. The overall independent validation results show that the generated AVHRR R n product is highly accurate, with R 2 , root-mean-square error (RMSE), and bias of 0.84, 26.77 Wm -2 (31.54%), and 1.16 Wm -2 (1.37%), respectively. Inter-comparisons with three other R n products, i.e., the 5 km Global Land Surface Satellite (GLASS), the 1° Clouds and the Earth’s Radiant Energy System (CERES), and the 0.5°


Introduction
Net radiation (Rn), which characterizes the surface radiation budget, is the difference between downward and upward radiation 30 across the shortwave (0.3-3.0 μm) and longwave (3.0-100 μm) spectra. The surface radiation budget links the atmospheric climate system to the land surface. Rn is thus a critical variable for studying Earth-atmosphere interactions, including meteorological, hydrological, and biological processes, and is also responsible for the redistribution of surface available energy (Sellers et al., 1997). Accurate characterization and quantification of spatial-temporal variations in surface Rn are essential for both scientific and industrial applications, such as hydrological modeling and water resource management (Hao et al., 2019).

35
However, because the spatial and temporal dynamics of surface Rn are affected by multiple surface features (e.g., albedo, emissivity, and land surface temperature) and atmospheric parameters (e.g., clouds, aerosols, ozone, and water vapor) (Wang et al., 2015b), existing surface Rn data suffer from large uncertainties Jiang et al., 2018;Yang and Cheng, 2020). Therefore, there is an urgent need for long-term, high-resolution surface Rn dataset to more properly understand its spatial pattern and temporal dynamics (i.e., seasonal and inter-annual variability).

40
Traditionally, historical Rn and surface radiative components have been measured at ground meteorological stations. These ground-based measurements are widely used to study spatiotemporal variations in regional surface radiation and to evaluate gridded products Zhang et al., 2020;Zhang et al., 2015). Nevertheless, the high cost of maintaining radiometers means that stations are sparely distributed, severely hindering our ability to study and understand the spatiotemporal variability of surface Rn at global scale. 4 many of these discrepancies are attributed to two aspects: first, the spatial representation mismatch between satellite data and ground-based measurements and, second, the neglect of spatial adjacent effects on surface radiation estimation.

90
The spatial scale mismatch between surface radiation for domain averages and ground point measurements with insufficient spatial representativeness (Jiang et al., 2019b;Barker and Li, 1997) has attracted attention for a long time in the development of ML  and in the evaluation of gridded products (Huang et al., 2016;Yang, 2020;Romá n et al., 2009).
However, many current studies still use matched point-surface sample datasets to train ML model regardless of the difference in spatial representativeness of matched point-surface data. The triple collocation (TC) technique (Stoffelen, 1998) was 95 considered as an appropriate upscaling approach for the impact of random measurement error on ground-based measurements in comparison to other complicated upscaling methods (e.g., the time stability approach and the block kriging algorithm (Crow et al., 2012)) . Furthermore, an extended triple collocation (ETC) method was proposed by Mccoll et al. (2014) and then applied to the validation activities of the Soil Moisture Active Passive (SMAP) level-2 surface soil moisture (SSM) product (Chen et al., 2017) and satellite surface albedo products (Wu et al., 2019). Therefore, the ETC technology is 100 employed to limit the effect of upscaling errors of ground measurements on the final surface Rn estimates at the satellite footprint scale.
Spatial adjacent effects should also be considered in the development of ML methods. With an increase in spatial resolution, horizontal inhomogeneities in the atmosphere have become increasingly important and reduce accuracy of surface radiation retrievals at higher spatial resolutions, especially in conjunction with high solar and viewing angles (Wyser et al., 2002); the 105 correlation between satellite TOA observations and surface radiation measurements weakens, and surface radiation cannot be accessed directly from satellite TOA data for individual pixels. Convolutional neural networks (CNNs) were initially designed to perform image recognition tasks; they can be readily used to extract various high-level, hierarchical, and abstract spatial pattern features from original multispectral or hyperspectral satellite images Ball et al., 2017). Using this approach, multiple environmental parameters and their spatially adjacent effects can be accounted for in the estimation of 110 surface Rn (Jiang et al., 2019b). Therefore, CNN represents a promising method for integrating potential spatially adjacent effects in surface radiation estimation Jiang et al., 2019b).
Several studies have successfully employed CNNs and other deep neural networks to retrieve surface parameters, such as global solar radiation (Jiang et al., 2019a), precipitation , and land surface temperature (Yin et al., 2020), with varying success rates. However, no study has yet attempted to retrieve global surface Rn using a CNN model. In this study, a 115 residual CNN (RCNN) based point-surface matching estimation method (PSME) is proposed for estimating global land surface Rn. Specifically, the RCNN model links ground-based Rn measurements with multiple image blocks of AVHRR TOA observation data, including reflectance in visible channels and brightness temperature in thermal infrared channels, along with other additional auxiliary variables. These auxiliary variables include angular information, i.e., solar zenith angle (SZA), viewing zenith angle (VZA), and relative azimuth angle (RAA), and daily Modern-Era Retrospective analysis for Research 120 and Applications, Version 2 (MERRA2) Rn data Zhang et al., 2016). Before training the RCNN, ETC technology is applied to select reliable sites at a global level to generate representative training samples, making the established statistical relationship more representative of surface Rn variation at the AVHRR footprint scale. After validation and comparison, the best-trained model was subsequently implemented through the proposed PSME scheme to generate a globalscale 39-year daily surface Rn dataset (1981-2019) with 0.05° resolution from the AVHRR data and MERRA2 reanalysis.

125
The remainder of this paper is organized as follows. Section 2 summarizes the characteristics of the data used for the reconstruction of the ETC triplet and PSME method development. Section 3 describes the ETC method for the selection of reliable sites and the PSME process for surface Rn estimation using the RCNN model. The results for selected sites based on the ETC, the performance of the RCNN model and the long-term spatiotemporal variation of the Rn dataset are presented in Sect. 4. The discussion is presented in Sect. 5. The data availability is described in Sect. 6. Finally, the conclusions are 130 presented in Sect. 7. at 523 globally distributed stations from 15 observation networks/programs, as shown in Fig. 1. Detailed information about 6 these observation networks/programs is listed in Table 2, and specific information about these sites is shown in Table S1.

140
These stations are maintained by multiple global and regional observation network organizations, such as Global Fluxnet, the Greenland Climate Network (GCNET) and the Programme for Monitoring of the Greenland Ice Sheet (PROMICE) . These stations vary in elevation from 4 to 5,063 m above sea level and are located in areas with different land covers, including forest, grassland, shrubland, wetland, cropland, ice/snow, and urban areas. The collective in situ measurements, therefore, represent an accurate and comprehensive dataset that is capable of accounting for surface Rn spatial-temporal 145 variation on a global scale.

AVHRR data
AVHRR TOA observations at five spectral channels (a visible band (0.55-0.68 μm), a near-infrared band (0.75-1.1 μm), a 180 middle-infrared band (3.55-3.93 μm), and two thermal bands (10.5-11.3 and 11.5-12.5 μm) were utilized for their comprehensive surface and atmospheric electromagnetic information. The National Aeronautics and Space Administration's (NASA) Land Long-term Data Record (LTDR) project produced a consistent long-term dataset by reprocessing Global Area Coverage (GAC) data, which were obtained from AVHRR sensors onboard the National Oceanic and Atmospheric Administration (NOAA) satellites (Pedelty et al., 2007). The primary reprocessing improvements included radiometric in-185 flight vicarious calibrations for the visible and near infrared channels, along with inverse navigation to relate a specific Earth location to each sensor's instantaneous field of view (Vermote and Kaufman, 1995;Pedelty et al., 2007). Multiple Climate Modeling Grid (CMG) data from AVHRR and MODIS instruments have been created for land climate studies (Xiao et al., 2017;Pedelty et al., 2007). In this study, we utilized a daily AVHRR TOA data product (AVH02C1) with a resolution of 0.05° from 1981 to 2019 to retrieve surface Rn estimates. Additionally, solar/viewing geometry data (i.e., SZA, VZA, and RAA)

190
were also incorporated into the model as the amount of solar radiation incident on the Earth's surface varies greatly under different geometric observation conditions. A summary of these gridded products and their attributes is presented in Table 3.

GLASS product
The GLASS daily surface Rn product from MODIS data, one part of the GLASS product suite , was produced using two sets of algorithms. The main algorithm primarily uses the well-documented conversion relationships between downward shortwave radiation and all-wave Rn combinations (Wang and Liang, 2009;Jiang et al., 2015). It also incorporates a combination of other meteorological variables under various environmental conditions, such as different 200 daytime lengths and land cover characteristics, which are designated based on the albedo and normalized difference vegetation index (NDVI). Multiple MARS learners were employed to establish efficient statistical relationships using GLASS downward shortwave radiation and MERRA2 meteorological variables, allowing land surface Rn to be estimated from these inputs across most spatial domains Jiang et al., 2015). Conversely, when surface solar radiation data were not available, the backup algorithm created a function that separately employed MODIS TOA observations to retrieve surface all-wave Rn 205 using the length ratio of daytime (LRD) classification method, which was accomplished by the genetic algorithm-artificial neural network (GA-ANN) . By using these two algorithms, the GLASS Rn product can provide seamless global land surface Rn estimates with a 0.05° resolution. Several studies have used in situ measurements to conduct evaluation studies, illustrating high accuracy performance as well as good application potential Guo et al., 2020). Thus, we used the GLASS daily Rn product covering 2000 to 2018 as a reference to help select reliable sites and validate the results 210 from this study.

CERES-SYN product
The CERES instruments onboard the Terra, Aqua, and Suomi National Polar-Orbiting Partnership (Suomi NPP) satellites observe the TOA global energy budget by measuring shortwave reflected radiation, longwave Earth-emitted radiation and all wavelengths of radiation at a spatial resolution of 20 km at nadir (Wielicki et al., 1996). The CERES Synoptic (CERES-SYN)

215
product combines CERES and MODIS observations with geostationary (GEO) data to provide hourly broadband TOA radiant flux and cloud properties (Doelling et al., 2013). The CERES-SYN product also contains computed TOA and in-atmosphere and surface fluxes based on the Fu-Liou radiation transfer model. Aerosol and atmospheric data were included as inputs to calculate the radiation flux. CERES-SYN fluxes were provided as a 1° gridded product with an hourly temporal resolution.
CERES-SYN surface Rn data have been evaluated in many studies, which indicate that the product has high accuracy, although 220 systematic overestimation exists in the surface net radiation flux data Jiang et al., 2016;. Thus, the CERES-SYN surface Rn obtained from four surface radiative components was used as a reference for comparison.

MERRA2 reanalysis
MERRA2, produced by the NASA Global Modeling and Assimilation Office (GMAO), is the latest global atmospheric product and employs satellite observation data from 1980 to the present. The MERRA2 reanalysis assimilates space-based observations 225 of aerosols and represents their interactions with other physical processes in the climate system. The goals of MERRA2 are to provide a regularly gridded, homogeneous record of the global atmosphere, and to incorporate additional climatic variables and conditions, including trace gas constituents (stratospheric ozone), improved land surface representation, and cryospheric processes (Gelaro et al., 2017b). The MERRA2 products have a 0.5° × 0.625° spatial resolution and hourly temporal resolution.
Previous studies Delgado-Bonal et al., 2020) have confirmed that the MERRA2-calculated 230 surface Rn and its radiative component provide outstanding accuracy and a reasonable spatial-temporal distribution compared to other reanalysis data. Therefore, MERRA2 Rn data calculated from four surface radiative components were also used in this study to help retrieve accurate high-resolution surface Rn estimates by providing average atmospheric information.

Methods
The entire workflow of the RCNN-based PSME method is shown in Fig. 2. First, the ETC technology was applied to the triplet 235 system constructed from ground-, satellite-, and model-based Rn data to identify reliable sites at which measurements can well represent the dynamic variation of surface Rn at a 5 km scale. Then, AVHRR TOA reflectance, brightness temperature, angular information and MERRA2 Rn were matched with the ground-based Rn measurements, both spatially and temporally.
Specifically, the site-measured Rn data were collocated with the 5 km AVHRR grid product covering the site. If one grid in the AVH02C1 product covered multiple sites, the mean values from these sites' measurements were used to match the grid 240 data. Subsequently, the matched input-output training samples were fed into the RCNN to train the model. Reference Rn measurements taken from reliable sites were used to evaluate the model's performance and, subsequently, identify the best option to produce surface Rn by tuning the hyper-parameters of the RCNN. Finally, surface Rn retrievals were generated using the best-trained model for the global scale, and CERES-SYN and GLASS Rn products were applied to perform intercomparisons to illustrate the accuracy and spatiotemporal variation of the surface Rn retrievals.

Extended triple collocation (ETC)
To address the spatial scale mismatch issue owing to the small spatial support of sparse ground measurements in comparison to gridded satellite data, which introduces large uncertainty into the collaborative inversion process, triplet collocation (TC) technology was employed (Stoffelen, 1998;Yuan et al., 2020a). Specifically, based on the availability of three collocated, 255 independent measurement systems describing the same geophysical variable, TC was designed to estimate the unknown error standard deviations (or RMSEs) of three mutually independent measurement systems, without treating any one system as perfectly observed "truth" (Stoffelen, 1998;Gruber et al., 2016). To perform the TC, the following assumptions were made: 1) each of the triplet is related with the unknown truth of the geophysical variable in the linear form; 2) zero cross-correlation across each of the triplet; 3) zero error cross-correlation between the triplet and the true signal state (T); and 4) the signal and 260 error statistics are stationary (Chen et al., 2017). Following the first assumption, the independent triplet systems (Xi, Xj, and Xk) are related to the unknown true quantity in a linear error model: where and are the additive and multiplicative bias terms, respectively; and is the mean-zero random error. Similar calibration constants ( , , , and ) and random error terms ( and ) are also defined for Xj and Xk.

265
The objective of TC is to find a solution that individually estimates the variance of the random error term ( ) for each of the triplet based on the listed assumptions (Stoffelen, 1998;Yuan et al., 2020a). However, to obtain the calibration constants, one dataset is chosen from the three collocated measurement systems as the reference dataset, and the other two are rescaled into the same reference data space. This results in a dependency of the error variance of the other two datasets on the climatology of the scaling reference (Draper et al., 2013;Yuan et al., 2020a). To deal with this issue, ETC technology was 270 proposed by Mccoll et al. (2014), based on the same assumptions as TC, to estimate an additional evaluation metric independent of the reference dataset, i.e., a correlation coefficient ( ( , ) ) of each measurement system with respect to the unknown target variable as formulated below: where ( , ) is correct up to the sign ambiguity, as the measurement systems will almost always be positively correlated to 275 the unobserved truth.
Following the ideals of Chen et al. (2017) and Yuan et al. (2020a), ETC was applied to determine the reliable site measurements over the AVHRR data footprint scale (i.e., 5 km grid). Specifically, the triplet dataset was first constructed using ground-based measurements (i.e., site measurements), satellite-derived retrievals (GLASS Rn) and downscaled model-based simulations (MERRA2 Rn) depending on the conversion ratio of GLASS Rn between original (0.05°) and aggregated (0.5°) spatial 280 resolutions, as they belong to different measurement systems and are not dependent on each other. Then, ( , ) was calculated using Eq. (2) for individual site, illustrating the fraction of 5 km satellite footprint-scale Rn dynamics captured by point-scale ground-based measurements. An appropriate threshold of ( , ) was determined to select reliable sites with the greatest representativeness within a 5 km footprint to obtain sample datasets for the model training. After testing a series of thresholds between 0.2 and 0.9 at intervals of 0.1, a threshold of 0.9 was selected, above which the sites were assigned as 285 'reliable' (also see Sect. 5). Based on these reliable sites, errors of upscaling reliable site-based measurement to a 5-km scale can be weakened to a certain degree due to its better representativeness within the AVHRR footprint.

RCNN
As the spatial resolution of satellite sensors increases, the spatial adjacent effects induced by spatially inhomogeneous 290 atmospheric constitutes (or clouds) fields become more significant, for example, clouds affect the distribution of surface radiation in a region larger than the resolution of an individual pixel. One spatial adjacent effect is the diffusion of radiation that removes part of radiation from an atmospheric column and transfer it to neighboring columns. Two other effects are related to the solar and viewing geometry, such as a shift of the apparent position of clouds and their shadows. Surface Rn is no longer accurately estimated with retrieval algorithms based on the individual pixel approximation (IPA). Comprehensive information 295 within a certain spatial extent centered at reliable sites needs to be considered to help retrieve surface Rn.
Loosely inspired by the human visual cortex, CNNs were originally applied to analyze common visual imagery using convolution instead of general matrix multiplication (Ball et al., 2017). CNN model can extract features hierarchically from input multi-channel images using multiple filters. Therefore, the most important feature information regarding reliable sitebased Rn measurements can be effectively extracted by CNN within a certain spatial extent rather than on IPA, to help retrieve 300 Rn, which weakens the spatial adjacent effects to a certain extent. A general CNN consists of multiple layers of operations, such as convolution, pooling, normalization, and nonlinear activation functions. In the convolutional layers, a series of convolution (Conv) operations are performed using convolutional kernel weights and biases on the input images within the receptive field. The result of the locally weighted sum (feature map) is then passed through a nonlinear layer, such as a rectified linear unit (ReLU), which increases the nonlinear properties of the decision function and the overall network (Romanuke, 305 2017). The pooling layer, a form of non-linear down-sampling, merges semantically similar features into one, thereby reducing the amount of computation in the network (Gé ron, 2019). Additionally, a batch normalization layer is placed between the convolutional layers and nonlinearities to speed up the training of the CNN and reduce the sensitivity to network initialization.
By stacking two or three stages of convolution, nonlinearity, and pooling, followed by more fully connected layers, a typical CNN architecture is built.

13
To improve network performance in complicated tasks, a deeper CNN architecture is needed; however, a deeper neural network is difficult to train well because a degradation problem occurs with deeper networks converge (He et al., 2016). Specifically, as the network depth increase, the training accuracy becomes saturated and then degrades rapidly. To address the degradation problem, He et al. (2016) proposed a residual block to improve the gradient flow through the network , which enables the training of deeper networks. Residual blocks were employed in our CNN architecture to construct the RCNN. The structure of 315 the RCNN proposed in this study is shown schematically in Fig. 3. Table 4 lists the detailed configurations of the proposed RCNN.
The input signals of the RCNN included AVHRR TOA reflectance and brightness temperature, angular information (SZA, VZA, and RAA), and daily MERRA2 Rn with a spatial size of 15×15 pixels (these specifications are further discussed in Sect.

5).
To avoid introducing new errors, nearest-neighbor interpolation was used to resample MERRA2 Rn to 0.05° to match the 320 spatial resolution of the other predictors. The output signal was ground-based Rn from the reliable sites. Essentially, the RCNN model uses a convolution operation taken at stages of the feature map and residual learning block to recognize spatial patterns centered on a reliable site. Then, the multiple layer perceptron links abstract spatial patterns with ground-based measurements to construct a strong non-linear relationship to reproduce the spatial and temporal variation of surface Rn. This approach had been carried out in previous studies (Jiang et al., 2019b;Jiang et al., 2020a;Jiang et al., 2019a).

RCNN model training and evaluation
Sample data from the reliable sites in the training site group (red circles as shown in Fig. 1) were used to train the RCNN model; datasets from reliable sites in the independent validation site group (blue circles shown in Fig. 1) served as test datasets

335
to independently evaluate the model's performance. Specifically, in the training process, 10-fold cross-validation (CV) was used to test the model's predictive power. All of the sample datasets from the reliable training sites were randomly shuffled and divided into ten groups. One group of these data was then removed as a hold-out or validation dataset and the remaining nine groups of data were treated as the training datasets. The training datasets were used to fit the RCNN model, and the validation datasets were applied to evaluate the trained model's performance to fine-tune the model's parameters. The process 340 was repeated ten times to ensure that each group of data validated the model, and the remaining nine groups of data were trained. Finally, the evaluation results were presented by summarizing and averaging the ten evaluation scores. After determining the hyper-parameter settings using the CV, the model was trained again using datasets from all the reliable training sites, which was then independently evaluated using the test datasets from the reliable validation sites.
The following five evaluation metrics were used to evaluate the performance of the RCNN model and the Rn retrievals: bias, 345 relative bias (rbias), RMSE, relative RMSE (rRMSE) and the coefficient of determination (R 2 ). Detailed information regarding the application of these metrics can be found in Yang et al. (2018).

Identification of reliable sites
The number of reliable and unreliable sites for each observation network, identified by a threshold of 0.9 for the ETC-derived 350 correlation coefficient, is listed in Table 5. A total of 262 sites could be considered reliable, accounting for ~50% of the sites.
Furthermore, no site was considered reliable for some observation networks/programs, namely ChinaFlux, GCNET, GAME.ANN, HiWATER, IMAU-Ktransect, and LBA-ECO. It is not surprising that sites from the ChinaFlux network were assigned as unreliable; several studies have revealed that the reliability of site measurements from China is questionable and should be examined carefully before use Tang et al., 2013;Tang et al., 2011). The GCNET network is 355 located in inner Greenland, where systematic measurements errors are common due to difficulties in instrument maintenance and operation-related failures. Sites from the GAME.ANN network are located in the Tibetan Plateau (TP) region, where abnormal climate and complex terrain make it difficult to accurately measure variations in Rn. Similar issues also exist in in situ measurements from the IMAU-Ktransect, HiWATER and LBA-ECO networks. In contrast, some of the international observational networks, such as BSRN (Ohmura et al., 1998) and FluxNet (Wilson et al., 2002), provide many ground-based 360 measurements with sufficient spatial representativeness for Rn at 5 km resolution. In addition, the ARM (Stokes and Schwartz, 1994) and SURFRAD (Augustine et al., 2000) networks were classified as containing reliable sites. In situ measurements from the SURFRAD (Augustine et al., 2000) network were well known in surface radiation budget studies because of their high data quality, and have been widely utilized as a result (Wang et al., 2015b;Hao et al., 2019;Wang et al., 2015a;Qin et al., 2012). Overall, compared to other networks, the sites from ARM, BSRN, SURFRAD, and FluxNet networks were mostly 365 identified as reliable sites, illustrating the superiority of these observation networks.
The spatial and proportion distributions of the reliable training and validation sites for different surface types are presented in Fig. 4. The most reliable sites are distributed in the United States, Europe, and East Asia. In turn, many sites located in South America, Africa, Eurasia, and the Polar Regions were identified as unreliable. The reasons that these sites were classified as unreliable are closely related to the complex surface and atmospheric environment and poor instrument maintenance in their 370 corresponding regions. Most grassland and cropland sites were classified as reliable (~66% and ~62%, respectively), whereas the fewest reliable sites were classified in ice/snow-covered areas (~14%). In addition, sites neighboring the sea were mostly identified as unreliability due to the presence of large water bodies within the satellite footprint. Thus, the processing of identifying reliable sites highlights the need to pay more attention to such areas for surface radiation estimations.

Assessment of the RCNN model
Ten-CV was used to evaluate the performance of the RCNN model at reliable and all training sites, respectively, and the evaluated performances are summarized in Table 6. Note that the model fitting result represents the model with the best fitting accuracy over the 10 CV rounds, while the cross-validation results are the averages of the 10-round combination. 2 , the RCNN model trained using ground-based Rn measurements obtained at the reliable sites is considered as the best-trained model, which was selected for the subsequent analysis.

Figure 5: Scatterplots of (a) mode training (fitting) accuracy and (b) model test accuracy for the reliable training and independent
validation sites. The color bar illustrates the normalized density of samples. and 20.9% for day and night, respectively) over the sun-glint ocean and polar regions both during daytime and nighttime (Xi et al., 2018;Xu et al., 2020). Additionally, when the aerosol optical depths (AODs) used to calculate CERES-SYN surface 415 solar radiation are compared with the ground-based observations, the calculated shortwave radiation is 1-2% higher (Fillmore et al., 2021). Therefore, large uncertainties in these atmospheric input parameters may lead to serious overestimations of the CERES-SYN Rn. In addition, the MERRA2 Rn shows the lowest accuracy, with an RMSE of 20.87 Wm -2 , reflecting the reanalysis model's inability to accurately describe the evolution of cloud properties (Betts et al., 2006). Comparatively, the AVHRR retrievals show a better accuracy than the other three products. In addition, our estimates have a higher spatial 420 resolution compared to the CERES-SYN and MERRA2 data.

Evaluation of the RCNN-based AVHRR Rn retrievals
Compared to the validation results at the reliable sites, the accuracy evaluation at all sites shows the ability of the RCNN to accurately capture Rn variation at a global scale, even though some measurements from unreliable sites added large uncertainties to the final evaluation. Figure S2 shows a comparison of results for the four datasets at all of the sites. Overall, the AVHRR and GLASS Rn retrievals were still better than those of CERES-SYN and MERRA2; however, the accuracy of 425 AVHRR Rn decreases slightly, with R 2 , RMSE, and bias values of 0.85, 26.51 Wm -2 (35.50%) and 1.32 Wm -2 (1.76%), which is comparable to the GLASS Rn retrievals, with values of 0.85, 26.56 Wm -2 (35.38%) and -0.83 Wm -2 (-1.10%), respectively.
Therefore, even if the RCNN model is trained using measurements from less reliable sites, it still accurately reproduces surface Rn distributions at global scale. In the following analysis, the GLASS Rn retrievals were used as the main comparison because of their high accuracy and reasonable spatiotemporal variation Jiang et al., 2018).

Figure 6: Scatterplots of product validation for (a) AVHRR, (b) CERES-SYN, (c) MERRA2, and (d) GLASS at the reliable sites.
Inter-comparison results for the AVHRR and GLASS Rn retrievals against the ground-based measurements over each network are displayed in Fig. 7  To further improve understanding of the temporal variations of the AVHRR Rn retrievals, coincident time series from all the Rn datasets were inter-compared over seven sites representing different surface types, as shown in Fig. 8

460
Rn values only capture the general Rn trend in the snow/ice areas. The GLASS Rn retrievals are also most consistent with the in situ measurements among the four datasets, although small biases still exist.
The overall evaluation results of the AVHRR and GLASS Rn retrievals for different surface types are displayed in Table 7.
Generally, both datasets achieved high accuracy, with RMSEs ranging from 20 to 25 Wm -2 . The AVHRR Rn retrievals show the better performance for most of the surface types, except for snow/ice, as previously discussed; however, the difference in 465 the RMSEs between the AVHRR and GLASS Rn retrievals for the snow/ice cover type is small (0.5 W/m -2 ). Together, these results further indicate that the RCNN model can generate accurate Rn estimates for different land cover types.

Analysis of influencing factors
Variation in surface Rn is mainly affected by atmospheric conditions, but also, to a lesser degree, by surface characteristics (He et al., 2015). Under clear-sky conditions, AOD and column water vapor (CWV) are the main atmospheric constituents that modulate surface shortwave and longwave radiation, and further affect spatiotemporal variations of surface Rn. In contrast, 480 clouds and CWV control surface Rn dynamics under cloud-sky conditions, especially clouds that have significant impacts on shortwave and longwave radiation. Therefore, AOD, CWV, and cloud optical thickness (COT, as a surrogate for cloud optical properties) derived from MERRA2 were employed to analyze the sensitivity of the accuracy of the AVHRR and GLASS Rn retrievals to variations in these influencing factors. In addition, Rn retrieval performance at different elevations was also evaluated.

485
All the evaluation results are displayed in Figure 9. The AVHRR Rn retrievals were always better than the GLASS Rn retrievals under all conditions of the four influencing factors, except for elevations ranging 800-1000 and 1200-1500 m, which demonstrates the superiority of our algorithm. Specifically, as the COT increases (i.e., increasing cloud thickness), the AVHRR and GLASS Rn RMSE values increase accordingly but still remain relatively low for both datasets (< 27 Wm -2 ). Note that the differences in RMSE between the two datasets also increased with increasing COT (Fig, 9(a)). A small COT indicates relative 490 clear-sky conditions, which results in surface total solar radiation dominated by direct solar radiation. Therefore, the performance of the RCNN model and the MARS models used for the GLASS Rn product  is comparable with regard to the accuracy of their Rn retrievals. However, when the absorption and scattering effects are enhanced for direct solar radiation from TOA, depending on the IPA, it is difficult to retrieve the total surface Rn accurately using the MARS model because the spatially adjacent effects (i.e., 3-D effects from clouds) are not considered. Although the RMSEs of the 495 AVHRR retrievals also increase, the rate of increase is lower than that of the GLASS Rn retrievals. This is because the RCNN model recognizes spatial textural and contextual information and comprehensively considers atmospheric conditions within a certain area rather than on IPA, which to some extent addresses the spatially adjacent effect on the accuracy of AVHRR Rn retrievals.
Aerosols also have absorption and scattering influences on solar radiation, and therefore, a similar conclusion can be drawn.

500
For example, when AOD increases from 0.3 (a non-clean atmosphere), the difference in the performance of the two retrieved model becomes more pronounced. The AVHRR Rn retrievals maintain a stable level of uncertainty (RMSEs = 22-24 Wm -2 ), while the errors GLASS Rn retrieval errors increase dramatically (up to 29 Wm -2 ). This illustrates the importance of integrating the spatial adjacent effect into the inversion model under hazy atmospheric conditions.
In the case of CWV, which has a strong influence on longwave radiation, when the condition is < 50 kgm -2 , the accuracy 505 differences between the AVHRR and GLASS Rn values are small (< 1 Wm -2 ). However, as CWV increases, the RMSEs of the GLASS Rn retrievals increase dramatically, while the AVHRR Rn estimates maintain a high level of accuracy (RMSEs = ~23 Wm -2 ).
With respect to elevation, there was no notable difference between the two datasets; our estimates were better than GLASS Rn under different elevation ranges, except for the 800-1,000 and 1,200-1,500 m bins, although these differences were less than 510 1 Wm -2 . The lower accuracy of AVHRR Rn values for these two elevation ranges is attributable to the less reliable sites used for the RCNN training. In addition, the AVHRR Rn retrievals show steady and very low (close to zero) biases under different conditions, while the biases of the GLASS Rn retrievals show a high degree of variation. This illustrates that the RCNN model has a greater capability for unbiased surface Rn estimation.
Overall, the RCNN-derived Rn retrievals show a high accuracy under different atmospheric and surface conditions relative to 515 the GLASS Rn retrievals and especially for thick-cloud and hazy atmospheric conditions. In such cases, spatially adjacent information is important for accurately estimating the surface Rn retrievals. Although previous studies have proposed several methods for integrating spatial information to retrieve surface and atmospheric variables, such as PM2.5 Wang et al., 2020a), ozone (Li and Cheng, 2021), and nitrogen dioxide , these methods only considered discrete surrounding points within a certain area to train the model using IPA. This artificially destroys the natural correlation between 520 the target and the surroundings. Our RCNN model can automatically recognize complete spatial information centered at an interesting location and, thus, is a more reasonable and effective method.

Figure 9: Accuracy changes in AVHRR and GLASS Rn retrievals under different conditions for (a) cloud optical thickness (COT), (b) aerosol optical depth (AOD), (c) column water vapor (CWV), and (d) elevation. The values in parentheses on the left-axis
525 correspond to the RMSE differences denoted by bar charts. The shaded area show the variance ranges of the biases.

Spatiotemporal analysis
The global-scale spatial distributions of mean AVHRR and GLASS surface Rn for January and July 2008 are displayed in Fig.   10. The missing values in the Polar Regions reflects the unavailability of valid data at the five bands in the case of the AVH02C1 product. The overall distribution of surface Rn for the two datasets is very similar, although slight differences exist 530 in some regions, such as the TP region, the Sahara Desert, and Greenland. AVHRR Rn retrievals are notably larger than the GLASS Rn retrievals in the TP region. Based on the results shown in Fig. 9(d), greater confidence can be placed in the AVHRR Rn retrievals for high-elevation regions relative to GLASS Rn retrievals; however, the AVHRR Rn retrievals are relatively lower in Greenland. Because few sites from the GCNET and PROMICES networks were classified as reliable for the model training, the RCNN model has less knowledge about the spatiotemporal variations of Rn in Greenland compared to other 535 regions. The validation results in Fig. 8 and Table 7 for the ice/snow surface cover type further confirm that GLASS Rn product may offer a better performance in Greenland region. Therefore, new algorithms and data are required for the Polar Regions to address this problem.

540
The spatiotemporal consistency of the AVHRR and GLASS daily Rn retrievals against COT was examined at a global scale in January and July 2008, respectively, as shown in Fig. 11. Overall, the spatial consistency is high for the two datasets.
Specifically, in January, as the COT increases, the daily mean Rn values and the absolute differences between the two datasets also increase. As shown in Fig. 9(a), when COT increases, the AVHRR Rn retrievals are more accurate. Thus we believe that 545 the large discrepancies under high COT conditions are mainly attributed to the uncertainty of GLASS Rn retrievals. In July, surface daily mean Rn remained relatively stable under different COT conditions, and the absolute differences between the two datasets also remain steady, with a mean absolute difference of about 20 Wm -2 .
Based on the previous analysis, spatially adjacent information is important for surface Rn estimation when COTs values are large; however, if the cover of the entire cloud layer is small compared to the scale of the AVHRR footprint, the spatial adjacent 550 effects will be significantly weakened in the inversion process, even if the corresponding COT is large. IPA-provided information includes the properties of the entire cloud layer. Figure S3 shows the spatial distribution of the monthly mean cloud cover fraction (CF) at the global scale in January and July, and the corresponding differences in CFs (Jan.-July). In January, the CFs are higher than in July over most land regions except in Central Africa, Southern Asia, Southern Australia and Antarctica. However, most regions had smaller CFs in July. The differences in CFs for the two months are also marked; 555 the positive differences demonstrate that more than 72% of the land pixels had a higher CFs in January than in July. The spatial adjacent effects induced by clouds are more significant on surface Rn in January than in July. Therefore, when large and thick cloud layers exist, such as in the Polar Regions, CNN is a better choice for surface Rn estimation, especially for downward longwave radiation (DLR) because the temperature of cloud-base, which is an essential variable in the parameterized calculation of DLR, is difficult to retrieve from multispectral remote sensing (Yang and Cheng, 2020).

565
To examine the temporal reliability of the generated AVHRR Rn dataset, a long-term analysis of surface Rn for the four datasets was carried out, the results of which are shown in Fig. 12

570
Rn values are overestimated against ground-based measurements. GLASS Rn temporal profile is more consistently correlated with the AVHRR Rn retrievals.
Note that the LTDR project only uses afternoon satellite to generate the AVHRR product to do with the high uncertainty of the atmospheric correction algorithm when applied to low sun elevation pixels present in morning (am) satellites. Afternoon satellites include NOAA-7, NOAA-9, . The use of 575 these satellites alone inevitably leads to small gaps in the data in exchange for a higher accuracy in the atmospheric correction.
The time series is not fully complete and presents some observational gaps. Specifically, some large discrepancies occur, during some periods including 1994-1995, 1999-2000, 2007-2008, and 2018-2019. These periods correspond to the alternative update times of the NOAA-series satellites. For example, NOAA-11 was successfully succeeded by NOAA-14 from 1994 to 1995. Important gaps and noise were found in the images from March to September and empty data from 580 September to December, due to NOAA-11 orbital degradation. NOAA-16 replaced NOAA-14 in 2000 for monitoring of the Earth's surface and atmosphere. During these periods of satellite replacement, the corresponding AVHRR data also contain large gaps. Similarly, NOAA-20 was launched on November 18, 2017, yet the quality of the AVHRR TOA observations from this platform was poor due to important gaps in the images and the presence of artefacts. This explains the abnormal temporal variations in the AVHRR Rn profile in these years shown in Fig. 12(a). Therefore, effective multi-source data fusion algorithms 585 and spatial gap-filling technology are urgently needed to improve the quality and coverage of the AVHRR Rn dataset.
The temporal variations in monthly Rn anomalies for the four datasets are shown in Fig. 12 (Vermote and Kaufman, 1995), which enables a temporally reliable Rn dataset to be produced. Following the approach, overall, our AVHRR Rn dataset is more accurate and shows reasonable spatiotemporal variations compared to the other three datasets. This dataset will play an important role in climate change study.   Fig. 13. Overall, the average R increases from 0.61 to 0.708, and RMSE decreases from 50.12 to 46.17, respectively, for the MLR model. As the valid spatial extent increases, essential and complete spatial features are exposed and incorporated into the MLR model, which helps to continuously improve the model's retrieval accuracy. The spatial extent of B13 (approximately 65 × 65 km 2 ) is the smallest size that exhibits convergent R and RMSE values, and the spatial extent at the B15 reaches a more stable state 610 for surface Rn estimations. This finding is in line with the results of Jiang et al. (2020b), showing that scale effects have a considerable impact on solar radiation retrieval accuracy, and distances of approximately 20 to 40 km from the central point (corresponding to areas of 40 × 40 km 2 to 80 × 80 km 2 ), are optimal spatial scale. In addition, previous studies (Hakuba et al., 2013;Huang et al., 2016) recommended a threshold distance of approximately 30 km, equal to a 13 × 13 grid region with a spatial resolution of 0.05°, for shortwave radiation estimation. Therefore, a 15 × 15 grid area was selected for the input sub-615 images to generate AVHRR Rn retrievals.

620
The RCNN model uses instantaneous satellite-sensed signals to directly estimate daily mean Rn retrievals. Though some previous studies Wang et al., 2015a;Wang and Liang, 2017;Xu et al., 2020) directly estimated dailyaveraged surface radiation from instantaneous satellite observations, like MODIS, the idea is theoretically flawed because the AVHRR sensor only offers instantaneous "snapshots", which cannot capture daily mean information about the diurnal cycles of the atmosphere and clouds. King et al. (2013) acknowledged that the frequency of cloud variations is high at different times 625 and locations based on twin MODIS cloud products. In view of the wide satellite overpass times over a particular location, e.g., equatorial crossing time generally ranges from 1300 to 1730 in local time, representing different instantaneous atmospheric conditions for different AVHRR sensors, daily mean MERRA2 Rn is incorporated into the input collection to provide daily mean information about the surface, atmosphere and clouds for the RCNN model. Figure 14 shows the effect of the daily mean MERRA2 Rn on the final AVHRR Rn retrievals at different AVHRR overpass 630 times in local time. The improved effect is slightly more significant during the afternoon than in the morning when more overland clouds are present (King et al., 2013). This improvement is also more pronounced during the night. The AVHRR Rn retrievals can only be obtained when solar radiation is available (Fig. 10) because of the missing values in the AVH01C1 product; therefore, the results during the night are based on the validation results for high latitudes, which demonstrate that daily mean information about the diurnal cycles of the atmosphere and clouds is more important for daily surface radiation 635 estimation at high latitudes than that at middle and low latitudes. Shupe et al. (2011) found annual cloud occurrence fractions are 58%-83% at the Arctic observatories, with a clear annual cycle wherein clouds are least frequent in the winter and most frequent in the late summer and autumn.
Additionally, MERRA2 downward shortwave radiation (DSR) was used as a replacement for MERRA2 Rn to test its contribution to daily mean surface Rn estimations when using instantaneous satellite data. The results presented in Fig. S4

640
show that the improved effect of daily MERRA2 DSR is not comparable with that achieved using daily MERRA2 Rn, and the former is closer to the results obtained when only instantaneous AVHRR observations are used. Therefore, MERRA2 Rn is a meaningful input for the RCNN model. Moreover, the AVHRR Rn retrievals could also be further improved by using more accurate daily mean Rn data, such as GLASS Rn, or other parameters that accurately represent daily mean atmospheric and cloud variations. The bars indicate RMSE and lines indicate absolute biases. The shading shows the variation range of absolute bias.

Determination of a threshold for the ETC-derived correlation coefficient
The threshold for the ETC-derived correlation coefficient between in situ measurements and the unknown truth within the 5 650 km AVHRR grid in Eq.
(2) affects the selection of reliable sites and the subsequent PSME process. A series of thresholds for the ETC-derived coefficients were considered, ranging from 0.2 to 0.9 with an interval of 0.1. In each case, the corresponding measurements from the selected reliable sites were fed into the RCNN model to train and subsequently generate AVHRR Rn retrievals. Then, the training and test accuracies of the RCNN were calculated over all of the reliable sites for comparison.
Another important consideration is the representativeness of the RCNN for global Rn estimation, given that the number of 655 reliable training sites decreases with higher thresholds. Thus, the trained RCNN model was again evaluated at all sites, including reliable and unreliable sites, to examine the global representativeness. The number of reliable sites (training and test accuracies) and the associated global accuracy are presented in Fig. 15. As the threshold increases, the number of reliable sites decreased. The training and test relative RMSEs of the RCNN model showed a general decreasing trend, especially above a threshold of 0.5, which illustrates that the selection of reliable sites and the measurements from these sites have better 660 representativeness for the AVHRR footprint scale using ETC. This helps address the spatial scale mismatch issue and improve the accuracy of AVHRR retrievals at a 5 km resolution. In addition, a trade-off between the RCNN's fitting accuracy at the reliable sites and the global accuracy at all sites needs to be considered. Even when a threshold of 0.9 was used, the global accuracy of the RCNN was only slightly lower, which explains why this threshold was applied in Sect. 3.1.

Orbital drift of the NOAA-series satellites
The orbital drift problem of the NOAA-series satellites has attracted the attention of users applying AVHRR-derived highlevel remote sensing products, such as land surface temperature (LST) (Ma et al., 2020;Liu et al., 2019) and TOA albedo 670 (Song et al., 2018). As shown in Fig. 16

Conclusions and outlook
A long-term (1981-2019) global daily surface Rn product with spatial resolution of 0.05° was generated from historical NOAA-695 series AVHRR data using a RCNN-based PSME method. The specific steps employed were as follows: (1) selecting reliable sites from all sites based on ETC to generate the sample dataset; (2) training and independent testing of the proposed RCNN model; (3) evaluating the AVHRR Rn retrievals against in situ measurements and performing inter-comparisons with three other Rn products (GLASS, CERES-SYN, and MERRA2); and (4) generating and evaluating the long-term AVHRR Rn product.

700
ETC was first applied to select reliable sites to prepare a sample dataset with better spatial representativeness at the AVHRR footprint scale (i.e., 5 km). In total, 262 sites were classified as reliable sites from a total of 523 sites and used as a sample dataset for the RCNN model. The proportions of the selected reliable sites representing cropland and grassland surfaces were highest (~66% and ~62%, respectively), while those representing ice/snow surface were lowest (~14%). The sample dataset from the 262 sites ensured that the trained RCNN model had both a good fitting accuracy for the reliable sites and global 705 accuracy across all sites.
A simple MLR model was used to examine the spatial adjacent effects on surface Rn estimation, and a spatial extent of 15 × 15 pixels (75 × 75 km 2 ) was then determined as the input size of the RCNN to provide sufficient spatial information. Based on 10-fold CV, the trained RCNN model achieved an R 2 of 0.90, with an RMSE of 20.76 Wm -2 (25.93%), and a bias of -0.11 Wm -2 (-0.14%); the corresponding independent validation values were 0.84, 26.77 Wm -2 (31.54%), and 1.16 Wm -2 (1.37%) at 710 the reliable sites, respectively. These results demonstrate the overall ability of the RCNN model to accurately predict surface Rn.
The results of an inter-comparison between the AVHRR Rn retrievals and three other products illustrated that our retrievals show a better accuracy against in situ measurements, with an R 2 of 0.90, RMSE of 20.99 Wm -2 (26.18%), and bias of -0.04 W/m 2 (-0.05%) at the reliable sites, and an R 2 of 0.85, RMSE of 26.51 Wm -2 (35.30%), and bias of 1.32 Wm -2 (1.76%) across 715 all sites. At the same time, the AVHRR Rn retrievals show better performance for different observational networks and surface cover types, except for the snow/ice surface cover. Under different elevations and atmospheric conditions, the AVHRR Rn retrievals performed better than the GLASS Rn equivalent, especially in the presence of thick clouds and hazy atmospheric conditions because of the integration of spatially adjacent information into the inversion process in the RCNN model. In addition, the spatiotemporal variation of the AVHRR Rn retrievals is similar to that of the GLASS Rn values, demonstrating 720 the ability of the RCNN model to generate a long-term global Rn product.
The long-term global Rn dataset generated by the RCNN model displays high accuracy and reasonable spatiotemporal variation at the global scale, which is suited to many applications including, for example, studies to understand the radiation budget and global climate change. Besides, compared to current satellite-derived Rn products, e.g., CERES-SYN and GLASS (2000present), a more long record  of the AVHRR Rn dataset shows its value in climate change studies. However, 725 further research is needed to solve some problems to further improve the data quality of the AVHRR Rn dataset. First, new algorithms and satellite data are needed to estimate surface Rn in the Polar Regions, such as MODIS data .
Second, an effective data gap-filling method or multi-source data-fusion algorithm is required to fill the data gaps over land, especially during periods of satellite replacement work. Third, coupled with spatially adjacent information, real-time temporal information, or historical information should be incorporated to further improve the accuracy of the Rn retrievals.

730
As a type of machine learning, deep learning involves using data-driven models to find potential relationships and patterns, and offers high adaptability to training data sample inputs. The predictive ability of a data-driven model completely depends on the limitations of the training dataset and in the case of Rn, the ability of the model to accurately portray spatiotemporal dynamics in areas where the availability of training data is relatively poor, such as for AVHRR Rn retrievals for ice/snowcovered surfaces. To address this problem, more physical knowledge is needed to fully utilize data-driven modeling to estimate 735 surface Rn under different atmospheric and surface conditions. In particular, more attention should be paid to understanding inherent physical processes in addition to obtaining optimal estimation by coupling physical process models with the versatility of data-driven machine learning (Reichstein et al., 2019).