A first global height-resolved cloud condensation nuclei data set derived from spaceborne lidar measurements

We present a global multiyear height-resolved data set of aerosol-type-specific cloud condensation nuclei concentrations (n_CCN) estimated from the spaceborne lidar aboard the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite. For estimating n_CCN, we apply the recently introduced Optical Modelling of the CALIPSO Aerosol Microphysics (OMCAM) algorithm to the CALIPSO level 2 aerosol profile product. The estimated n_CCN are then gridded onto a uniform latitude-longitude grid of 2° × 5°, a vertical grid of resolution 60 m from the surface to an altitude of 8 km, and a temporal resolution of one month. The data span a total of 186 months, from June 2006 to December 2021. In addition, we provide a 3D aerosol-type-specific climatology of n_CCN produced using the complete time series. We further highlight some potential applications of the data set in the context of aerosol-cloud interactions. The complete data set can be accessed at https://doi.pangaea.de/10.1594/PANGAEA.956215 (Choudhury and Tesche, 2023).

aerosols are taken from CALIPSO's aerosol model. For marine aerosols, the microphysical properties are derived from Sayer et al. (2012) because they yield aerosol number concentrations that agree better with airborne in-situ measurements (Choudhury et al., 2022). OMCAM first selects a normalized volume size distribution and refractive index based on the aerosol type identified by CALIPSO. The algorithm then scales the normalized size distribution linearly by the volume scaling factor S_V so that it reproduces the CALIPSO-estimated extinction coefficient. S_V is estimated from the ratio of α, the CALIOP-derived extinction coefficient, and α_n, the extinction coefficient calculated from the normalized size distribution and refractive index using MOPSMAP.
For modelling α_n, we treat continental (clean continental, polluted continental, and smoke) and marine aerosols as spheres and use Mie scattering theory. We consider desert dust aerosols as spheroids and use a combination of the T-matrix and the improved geometric optics method, depending on the value of the aerosol size parameter, to model α_n. The values of the normalized size distribution parameters, namely the standard deviation (σ_i), volume fraction (ν_i), and mean radius (µ_i) of the i-th (fine and coarse) mode, are listed in Table A1. After the scaling step, the volume size distribution is converted to a number size distribution, which is then integrated starting at a radius of 50 nm to compute n_50,dry for continental and marine aerosols and at 100 nm to compute n_100,dry for desert dust aerosols. This is because aerosols within these size ranges have been identified to act as CCN at a supersaturation of 0.15-0.20 % in several studies through in-situ measurements (Koehler et al., 2009; Rose et al., 2010; Deng et al., 2011; Kumar et al., 2011; Mamouri and Ansmann, 2016). In accordance with the AERONET size distributions, the upper radius limit for integrating the size distributions is set at 15 µm. Further, the CCN concentrations at higher supersaturations can be estimated following the simple CCN parameterization given by Mamouri and Ansmann (2016):

$$ n_{\mathrm{CCN}} = f_{\mathrm{ss}} \cdot n_{j,\mathrm{dry}} \qquad (2) $$

where j represents the lower radius limit of the size-distribution integration and f_ss is the enhancement factor, with values equal to 1.0, 1.35, and 1.7 for supersaturations of 0.15-0.20 %, 0.25 %, and 0.40 %, respectively.
While the simple parameterizations based on aerosol size and type do not consider the aerosol chemistry rigorously, they have been shown to yield reasonable n_CCN estimates, particularly when applied to spaceborne lidar measurements (Choudhury and Tesche, 2022a, b). Further information on the limitations and known issues of the algorithm is given in Section 3.3.
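The scaling, integration, and enhancement-factor steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the OMCAM implementation: the function names are hypothetical, α_n is taken as a precomputed input (the text obtains it from MOPSMAP), σ is assumed to be the standard deviation of ln(r), and the normalized distribution is assumed to carry unit total volume in µm³ cm⁻³ so that the result is in cm⁻³.

```python
import numpy as np

# Enhancement factors f_ss quoted in the text (Mamouri and Ansmann, 2016),
# keyed by supersaturation in percent; 0.20 stands for the 0.15-0.20 % range.
F_SS = {0.20: 1.0, 0.25: 1.35, 0.40: 1.7}


def number_above_radius(alpha, alpha_n, modes, r_min_um, r_max_um=15.0, n_pts=4000):
    """Dry particle number concentration with radius between r_min_um and r_max_um.

    alpha   : CALIOP-derived extinction coefficient
    alpha_n : extinction of the normalized size distribution (same unit as alpha)
    modes   : list of (volume_fraction, radius_um, sigma), one tuple per mode;
              sigma is ASSUMED to be the standard deviation of ln(r)
    The normalized distribution is ASSUMED to have unit volume in um^3 cm^-3.
    """
    s_v = alpha / alpha_n  # volume scaling factor S_V
    ln_r = np.linspace(np.log(r_min_um), np.log(r_max_um), n_pts)
    r = np.exp(ln_r)
    dv_dlnr = np.zeros_like(r)
    for nu, mu, sigma in modes:  # sum of log-normal modes
        dv_dlnr += (nu / (np.sqrt(2.0 * np.pi) * sigma)
                    * np.exp(-0.5 * ((ln_r - np.log(mu)) / sigma) ** 2))
    dv_dlnr *= s_v
    # Volume to number: divide by the volume of a sphere of radius r.
    dn_dlnr = dv_dlnr / (4.0 / 3.0 * np.pi * r ** 3)
    # Trapezoidal integration over ln(r).
    return float(np.sum(0.5 * (dn_dlnr[1:] + dn_dlnr[:-1]) * np.diff(ln_r)))


def ccn_at_supersaturation(n_dry, supersaturation_pct=0.20):
    """Apply Eq. (2): n_CCN = f_ss * n_j,dry."""
    return F_SS[supersaturation_pct] * n_dry
```

For continental and marine aerosols, `r_min_um` would be 0.05, and for desert dust 0.1, following the integration limits given in the text.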

Methodology
In this section, we discuss the complete methodology used to apply the CCN retrieval algorithm to the CALIPSO profile data to generate a global gridded monthly CCN data set. It includes a number of pre-processing stages that are implemented prior to applying the CCN-retrieval algorithm, followed by a set of post-processing steps to convert the n_CCN profiles to global gridded data. The CALIPSO level 2 aerosol profile product includes a number of quality control flags that are intended to filter out unreliable retrievals that may result from clouds misclassified as aerosols and from errors pertaining to the extinction coefficient retrieval (Tackett et al., 2018). The Cloud Aerosol Discrimination (CAD) score specifies the confidence in the classification of aerosol and cloud for each data bin. We only select data with a CAD score in the range [−100, −20], as they correspond to a high confidence in the aerosol classification. To screen retrievals with extinction-coefficient-related retrieval issues, we use the "extinction uncertainty" and the "extinction QC" (extQC) metrics. Data bins with an extinction uncertainty of −99.99 km are discarded, together with all bins below them, because the extinction-retrieval uncertainty may propagate to solutions at lower altitudes. The extQC flag describes how the extinction coefficient is generated for each level 2 sample. Following Tackett et al. (2018), we only consider bins with extQC values of 0, 1, 16, and 18, because they represent high confidence in the retrieved extinction coefficient. We also use the minimum laser energy (MLE) parameter at 532 nm, which was recently introduced in the version 4.20 update, to identify and screen columns affected by low-laser-energy shots (MLE < 0.08) that result in higher noise and degraded retrieval quality. Additionally, we filter out the profiles or columns with cloudy pixels because clouds can impede aerosol retrievals underneath them due to signal attenuation.

All the quality screening criteria used in this work are listed in Table 1. Readers are advised to refer to Tackett et al. (2018) for further information on the production and functions of the quality screening flags.
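As an illustration, the screening criteria described above can be combined into a single boolean mask. The sketch below uses NumPy; the array names are hypothetical and are not the CALIPSO product field names, and the bins of each profile are assumed to be ordered from the top of the profile downwards.

```python
import numpy as np

VALID_EXTQC = (0, 1, 16, 18)  # extQC values retained in the text


def quality_mask(cad, ext_qc, ext_unc, min_laser_energy, cloudy):
    """Return a boolean array (profiles x bins), True where a bin passes screening.

    cad, ext_qc, ext_unc, cloudy : arrays shaped (n_profiles, n_bins), with bins
        ASSUMED to be ordered from the top of the profile downwards
    min_laser_energy : one value per profile
    All argument names are illustrative, not CALIPSO product field names.
    """
    cad_ok = (cad >= -100) & (cad <= -20)
    qc_ok = np.isin(ext_qc, VALID_EXTQC)
    # An extinction uncertainty of -99.99 rejects the bin and every bin below it.
    unc_bad = np.isclose(ext_unc, -99.99)
    unc_ok = ~np.logical_or.accumulate(unc_bad, axis=1)
    # Column-level screens: low laser energy or any cloudy bin rejects the column.
    column_ok = (min_laser_energy >= 0.08) & ~cloudy.any(axis=1)
    return cad_ok & qc_ok & unc_ok & column_ok[:, None]
```

The sketch only covers bins classified as features; how clear-air bins are carried through to the averaging step is not addressed here.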

Post-processing
We use the quality-screened CALIPSO level 2 data obtained by following the steps outlined in Section 3.1.1 in the OMCAM algorithm to estimate n_CCN profiles at a supersaturation of 0.20 %. We further limit the n_CCN in the output data to a supersaturation of 0.20 % as it can be easily converted to higher supersaturations by using the enhancement factors as shown in Eq. (2). In this section, we discuss the spatial and temporal grid configuration used to convert the n_CCN profiles to 3D gridded data. We further describe all the parameters stored in the output data files.

Gridding, sampling, and averaging
The CALIPSO level 2 aerosol profile data have a very high vertical resolution of 60 m and an along-swath resolution of 5 km. However, because of the negligible divergence of the laser beam, CALIPSO has a narrow cross-track coverage. Further, the distance between two consecutive overpasses decreases with increasing latitude because of CALIPSO's polar orbit with an inclination of 98.36°. Therefore, a grid box of a certain dimension close to the equator will include fewer overpasses compared to one close to the poles. For this reason, and to produce a regionally representative global gridded data set with sufficient aerosol sampling frequency in each grid cell, we choose a monthly temporal resolution and a horizontal grid resolution of 2° × 5° to process the n_CCN profiles. The vertical resolution is, however, kept unchanged at 60 m. We further consider both the daytime and nighttime CALIPSO overpasses to compute the average n_CCN to increase the sampling frequency. This grid configuration is also adopted in the monthly CALIPSO level 3 products (Tackett et al., 2018) and is suggested by Choudhury and Tesche (2022b) to compile a global n_CCN data set. Using this grid, we estimated the number of days observed with valid n_CCN (value ≥ 0; hereafter abbreviated as NDO) samples in each grid box for each month and found them to be as high as 17 days in the tropics and 31 days at the poles. Note that the NDO estimated here is different from the "Days_Of_Month_Observed" parameter in the CALIPSO level 3 product, as the latter also considers valid cloud retrievals.
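The NDO definition above (days with at least one valid n_CCN ≥ 0 sample in a grid box) can be illustrated with a short Python sketch. The function name and the grid-cell identifiers are hypothetical; the snippet only shows the counting logic for the samples of one month.

```python
def days_observed(days, cells, n_ccn):
    """Count distinct days with at least one valid sample (n_ccn >= 0) per grid cell.

    days  : day of month of each sample
    cells : hashable grid-cell identifier of each sample (hypothetical,
            for example an (ilat, ilon) tuple)
    n_ccn : n_CCN value of each sample
    """
    seen = set()
    for day, cell, value in zip(days, cells, n_ccn):
        if value >= 0:  # NaN compares False and is therefore skipped
            seen.add((cell, day))
    ndo = {}
    for cell, _ in seen:
        ndo[cell] = ndo.get(cell, 0) + 1
    return ndo
```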
We further produced a climatological n_CCN data set using the same horizontal and vertical grid configuration. An alternative approach would be to consider a higher horizontal resolution of 1° × 1°, as used in Amiridis et al. (2015) to construct a climatology of aerosol and cloud properties using the CALIPSO level 2 data. However, using such a grid results in low aerosol sampling, especially in the tropics. This is shown in Fig. 1 based on more than 15 years of CALIPSO measurements.
While the 2° × 5° grid gives a maximum of 2340 days observed in the tropics, the 1° × 1° grid results in a significantly lower maximum of 591 days. Therefore, we use the coarser grid to produce the n_CCN climatology, which ensures the ample aerosol sampling required for a realistic and regionally representative data set.
For averaging the n_CCN for each latitude, longitude, and altitude grid cell, we follow the methodology used in Tackett et al. (2018) for producing the CALIPSO level 3 aerosol products. We assign the clear-air level 2 bins an extinction coefficient of 0 km⁻¹, which leads to an n_CCN of 0 cm⁻³, and compute the average n_CCN for each grid cell as:

$$ \overline{n}_{\mathrm{CCN}} = \frac{\sum_{i=1}^{N_a} n_{\mathrm{CCN},i}}{N_a + N_{ca}} \qquad (4) $$

Here, the sum runs over the level 2 aerosol samples in the grid cell, N_a is the number of level 2 samples with aerosol extinction coefficient > 0 km⁻¹, and N_ca is the number of clear-air samples. This averaging scheme is used in generating the global monthly gridded data set as well as the n_CCN climatology.
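A minimal sketch of this gridding and averaging step for a single altitude level is given below, using NumPy. It is an illustration rather than the processing code behind the data set: the function name is hypothetical, clear-air samples are assumed to be already set to 0, and n_CCN > 0 is used as a stand-in for "extinction coefficient > 0" when counting N_a.

```python
import numpy as np

LAT_EDGES = np.arange(-90.0, 90.0 + 2.0, 2.0)     # 2 degree latitude bins
LON_EDGES = np.arange(-180.0, 180.0 + 5.0, 5.0)   # 5 degree longitude bins


def grid_average(lat, lon, n_ccn):
    """Average the n_ccn samples of one altitude level on the 2 x 5 degree grid.

    n_ccn holds quality-screened samples with clear-air samples set to 0.
    Returns (mean, N, N_a), each shaped (90, 72); mean is NaN in empty cells.
    """
    lat, lon, n_ccn = (np.asarray(a, dtype=float) for a in (lat, lon, n_ccn))
    bins = (LAT_EDGES, LON_EDGES)
    # Sum of n_ccn per cell (clear-air samples add 0 to the sum).
    total, _, _ = np.histogram2d(lat, lon, bins=bins, weights=n_ccn)
    # N = N_a + N_ca: all samples per cell.
    n_all, _, _ = np.histogram2d(lat, lon, bins=bins)
    # N_a: samples with n_ccn > 0 (ASSUMED proxy for extinction > 0).
    aerosol = n_ccn > 0
    n_aer, _, _ = np.histogram2d(lat[aerosol], lon[aerosol], bins=bins)
    mean = np.full(total.shape, np.nan)
    np.divide(total, n_all, out=mean, where=n_all > 0)
    return mean, n_all.astype(int), n_aer.astype(int)
```

Dividing by N rather than N_a means that clear-air samples pull the grid-cell mean towards zero, which is the behaviour described in the text.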

Output data records
Following the gridding and averaging scheme discussed in the previous section, we produce the global n_CCN data sets at a monthly temporal resolution. Table 2 provides a list of all the parameters included in the output data and their description.
Along with the averaged n_CCN, we further provide the total number of level 2 samples with aerosol extinction coefficient ≥ 0 km⁻¹ (N = N_a + N_ca) and with extinction coefficient > 0 (N_a). The n_CCN, N, and N_a are also provided separately for each aerosol type. Note that both N and N_a may not be equal to the sum of the contributions from all the aerosol types, specifically when aerosol mixtures are present. N and N_a are useful in computing annual and seasonal averages of n_CCN using Eq. (4). We further provide the NDO for each month and suggest using the data only when NDO > 10 (coverage of at least 30 % of the days of a month). Average pressure and temperature are also provided for each latitude, longitude, and altitude grid cell. The pressure values can be used to convert the data from height coordinates to pressure coordinates. Because the retrievals above an altitude of 8 km constitute only about 0.7 % of the total tropospheric CCN, all parameters are limited to a maximum altitude of 8 km to reduce the overall data size. For ease of accessibility across different platforms, we provide the data in Network Common Data Form (NetCDF) format with a medium level of data compression (deflate level of 5). These NetCDF files are accessible at https://doi.pangaea.de/10.1594/PANGAEA.956215 and are tested to work with tools and software like Climate Data Operators (CDO), netCDF Operators (NCO), and ncdump. The NetCDF files with monthly gridded data are provided separately for each year from 2006 to 2021. The n_CCN climatology is provided in a separate NetCDF file, with data structure and nomenclature similar to the monthly data. The file also includes n_CCN climatologies for the boreal winter (December, January, and February), spring (March, April, and May), summer (June, July, and August), and autumn (September, October, and November) seasons.
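One way to read the note on N and Eq. (4) is that monthly means can be pooled into seasonal or annual means by weighting each month with its sample count N, since the monthly mean times N recovers the monthly sum of n_CCN. The Python sketch below illustrates this under that assumption, together with the suggested NDO > 10 filter. The function and argument names are hypothetical and do not correspond to variable names in the NetCDF files.

```python
import numpy as np


def combine_months(monthly_mean, monthly_n, monthly_ndo, min_ndo=10):
    """Sample-weighted average over months for each grid cell.

    monthly_mean, monthly_n, monthly_ndo : arrays shaped (n_months, ...) holding
        the monthly mean n_CCN, the sample count N, and the days observed (NDO)
    Months with NDO <= min_ndo, no samples, or a missing mean are left out.
    Returns NaN for cells where no month qualifies.
    """
    mean = np.asarray(monthly_mean, dtype=float)
    n = np.asarray(monthly_n, dtype=float)
    ndo = np.asarray(monthly_ndo)
    use = (ndo > min_ndo) & np.isfinite(mean) & (n > 0)
    # mean * N recovers the monthly sum of n_CCN in each cell.
    weighted_sum = np.where(use, mean * n, 0.0).sum(axis=0)
    n_total = np.where(use, n, 0.0).sum(axis=0)
    out = np.full(np.shape(n_total), np.nan)
    np.divide(weighted_sum, n_total, out=out, where=n_total > 0)
    return out
```

For a boreal winter mean, for example, the three arrays would hold the December, January, and February fields stacked along the first axis.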

Uncertainty, known issues, and validation
The uncertainty in the estimated n_CCN can arise from the uncertainties in the input parameters such as the aerosol extinction coefficient, type-specific normalized size distribution, and the ambient RH, as well as from the dust-separation technique and the CCN parameterizations used in the algorithm. Choudhury and Tesche (2022a) studied the sensitivity of the cloud-relevant aerosol number concentrations (n_j,dry) to the variations in the size distributions for each aerosol type at different RH. They reported a variation of a factor of 1-1.5 depending on the aerosol type. Combining this with the uncertainties associated with the other previously mentioned sources, the overall uncertainty associated with the output n_CCN is found to be between a factor of 2 and 3. Such a range is reasonable for spaceborne retrieval of n_CCN (Shinozuka et al., 2015; Mamouri and Ansmann, 2016), as atmospheric CCN concentrations can potentially vary by orders of magnitude in space and time (Schmale et al., 2018).
There are some known issues associated with CALIOP's retrieval algorithm and the CCN parameterizations used to produce the global data set. First, faint aerosol layers with extinction coefficient < 0.001 km⁻¹ (optical depth < 0.01) may not exceed the signal-to-noise ratio required to be detected by CALIOP (Tackett et al., 2018; Mao et al., 2022). The background noise due to solar radiation further impacts the feature detection, especially for the daytime retrievals (Winker et al., 2013). Such layers may therefore be classified as clear air by CALIOP's feature classification algorithm and assigned a zero extinction coefficient. This may result in an underestimation of the average extinction and thus the n_CCN, particularly in grid cells comprising clean environments (rural continental sites and higher altitudes). Second, the simple aerosol-type-specific parameterizations implemented in this study assume all particles over a certain minimum radius to be CCN active. This may lead to an overestimation of n_CCN that may also compensate for the underestimation due to the undetected aerosol layers. Third, CALIOP cannot distinguish between polluted continental and smoke aerosol layers that occur below a layer top height of 2.5 km (Kim et al., 2018). Therefore, isolated smoke layers that do not extend above a height of 2.5 km may get classified as polluted continental aerosols, leading to an overestimation of n_CCN by about 13.6 %. As a result, we anticipate that the CCN-retrieval algorithm will overestimate the CCN in areas often influenced by smoke aerosols below

Comparison with global model outputs
A comparison of the CALIOP-estimated average global n_CCN with the model outputs for the year 2011 is shown in Figure [...]

[...] have the maximum contribution to the total CCN below 2 km, followed by marine (110.8 ± 175.12 cm⁻³), dust (108.9 ± 256.39 cm⁻³), and elevated smoke (21 ± 74.61 cm⁻³) aerosols. Clean continental aerosols with a global average [...] Though pollution aerosols contribute the most to the total CCN, it is evident that dust has the most widespread coverage, encompassing nearly the entire globe, indicating its significance in ACI even in pristine aerosol environments far away from the continents.
The seasonal climatologies of the total n_CCN for altitudes below 2 km are shown in Fig. 4. The global average n_CCN is found to be maximum during winter, with a value of 376 ± 800.91 cm⁻³. Similar to the global climatology in Fig. 4, the average winter n_CCN is significantly higher over land (725.53 ± 1499.36 cm⁻³) compared to ocean (321.61 ± 427.20 cm⁻³). Strong seasonality in n_CCN is seen over the regions influenced by the Northern Hemisphere summer monsoon, such as Asia and West Africa, with minimum values during summer. This is perhaps due to the wet scavenging of aerosol particles related to the formation of cloud droplets and precipitation during the monsoon. The n_CCN seasonality in regions like Eastern and Central Africa is also evident and is perhaps a result of the changes in the local biomass burning patterns (Myhre et al., 2003; van der Werf et al., 2017), which peak during the dry summer (Southern Hemisphere winter) season.
It is worthwhile to note that in Fig. 3 and Fig. 4, we have limited our calculations to low-level aerosols (altitude < 2 km). The climatology data are provided up to an altitude of 8 km, also for the different aerosol types, which provides a unique opportunity to investigate the altitudinal variations of type-specific n_CCN over different seasons and regions across the globe.
Such studies, however, are outside the scope of this paper, which is aimed at introducing a new global CCN data set, and will be the focus of a future publication.

Conclusion and usage notes
We present a first aerosol-type-specific global CCN data set derived from spaceborne lidar measurements at a horizontal latitude-longitude resolution of 2° × 5°, a vertical resolution of 60 m, and a temporal resolution of one month. The data set spans more than 15 years, from June 2006 to December 2021, or a total of 186 months. We further use the complete time series to construct a global n_CCN climatology for different aerosol types. The climatologies are also reported at a seasonal time scale.
These data sets are aimed at replacing the currently used satellite-derived optical proxies for CCN. Readers are encouraged to utilize the full potential of the global data set for the following purposes:
- The data can be used to investigate the horizontal and vertical distributions of CCN for various aerosol types, as well as to study their trends and variations across monthly, seasonal, and annual timescales. In Section 4.2 of this paper, we briefly discuss the spatial distribution of n_CCN for different aerosol types for altitudes below 2 km, where we find the global average n_CCN to be 365.12 cm⁻³ and identify various CCN hotspots. This approach can be expanded to include the variations along the height and time dimensions. In addition, the type-specific CCN concentrations can be used to identify the sources and sinks of CCN as well as to investigate the contributions of long-range transport of aerosols with high residence time (for example dust and smoke) to CCN in the atmosphere.

- Recent studies have demonstrated that spaceborne aerosol, cloud, and radiation measurements at monthly temporal resolution can be used to estimate the radiative forcing associated with ACIs (Wall et al., 2022; Chen et al., 2022).
6 Code and data availability

Table A1. Log-normal bimodal volume size distribution parameters of different aerosol types. ν, µ, and σ represent the volume fraction, mode radius, and standard deviation, respectively. Subscripts "f" and "c" represent the fine and coarse modes of the size distribution, respectively.