Radiosounding HARMonization (RHARM): a new homogenized dataset of radiosounding temperature, humidity and wind profiles with uncertainty

Fabio Madonna, Emanuele Tramutola, Souleymane SY, Federico Serva, Monica Proto, Marco Rosoldi, Simone Gagliardi, Francesco Amato, Fabrizio Marra, Alessandro Fassò, Tom Gardiner, and Peter William Thorne

Consiglio Nazionale delle Ricerche – Istituto di Metodologie per l'Analisi Ambientale (CNR-IMAA), Tito Scalo (Potenza), Italy
Consiglio Nazionale delle Ricerche – Istituto di Scienze Marine (CNR-ISMAR), Rome, Italy
University of Bergamo, Bergamo, Italy
National Physical Laboratory, Teddington, UK
Irish Climate Analysis and Research Units, Department of Geography, Maynooth University, Maynooth, Ireland

Abstract

Observational records are essential for assessing long-term changes in our climate. However, these records are more often than not influenced by residual non-climatic factors which must be detected and adjusted prior to their usage. Ideally, measurement uncertainties should be properly quantified and validated. In the context of the Copernicus Climate Change Service (C3S), a novel approach, named RHARM (Radiosounding HARMonization), has been developed to provide a harmonized dataset of temperature, humidity and wind profiles along with an estimation of the measurement uncertainties for about 650 radiosounding stations globally. The RHARM method has been applied to IGRA daily (0000 and 1200 UTC) radiosonde data holdings on 16 standard pressure levels (from 1000 to 10 hPa) from 1978 to present. Relative humidity adjustment and data provision have been limited to 250 hPa owing to pervasive issues with sensor performance in the upper troposphere and lower stratosphere. The applied adjustments are interpolated to all reported significant levels to retain the information content of each individual ascent profile.
Each historical station time series is harmonized using two distinct methods. Firstly, the most recent period of the records, when modern radiosonde models have been in operation at each station (typically starting between 2004 and 2010 but varying on a station-by-station basis), is post-processed and adjusted using reference datasets from the GCOS Reference Upper Air Network (GRUAN) and from the 2010 WMO/CIMO (World Meteorological Organization/Commission for Instruments and Methods of Observation) radiosonde intercomparison. Subsequently, at each mandatory pressure level, the remaining historical data are scanned backward in time to detect structural breaks due to prolonged systematic effects in the measurements and then adjusted to homogenize the time series.

This paper describes the dataset portion related to the adjustment of post-2004 measurements only. A step-by-step description of the algorithm is reported and comparisons with GRUAN and atmospheric reanalysis data for temperature and relative humidity data are discussed. The evaluation shows that the strongest benefit of RHARM compared to existing products is related to the substantive adjustments applied to relative humidity time series for values below 15% and above 55% as well as to the provision of the uncertainties for all variables. Uncertainties have been validated using the ECMWF reanalysis short-range forecast outputs.

The RHARM algorithm is the first to provide homogenized time series of temperature, relative humidity and wind profiles alongside an estimation of the observational uncertainty for each single observation at each pressure level. A subset of RHARM data is available at http://doi.org/10.5281/zenodo.3973353 (Madonna et al., 2020a).

https://doi.org/10.5194/essd-2020-183
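The interpolation of level-wise adjustments to the reported significant levels, mentioned in the abstract, can be illustrated with a minimal sketch. It assumes linear interpolation in log-pressure and constant values outside the range of the mandatory levels; neither choice is confirmed by the text, and the function name and the sample values are hypothetical.

```python
import math
from bisect import bisect_left


def interpolate_adjustment(mandatory_hpa, adjustments, target_hpa):
    """Interpolate adjustments given on mandatory levels to other pressure levels.

    ASSUMPTION: linear interpolation in log(pressure), with the nearest
    mandatory-level value used outside the covered pressure range.
    """
    # Sort by log-pressure so the search below works on increasing values.
    pairs = sorted(zip((math.log(p) for p in mandatory_hpa), adjustments))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    result = []
    for p in target_hpa:
        x = math.log(p)
        if x <= xs[0]:
            result.append(ys[0])
        elif x >= xs[-1]:
            result.append(ys[-1])
        else:
            i = bisect_left(xs, x)  # xs[i - 1] < x <= xs[i]
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            result.append(ys[i - 1] + w * (ys[i] - ys[i - 1]))
    return result


# Hypothetical temperature adjustments (K) on three mandatory levels (hPa).
mandatory = [1000.0, 500.0, 100.0]
delta_t = [0.2, 0.1, -0.3]
significant = [925.0, 700.0, 300.0, 150.0]
adjusted = interpolate_adjustment(mandatory, delta_t, significant)
```

Working in log-pressure keeps the weights roughly proportional to geometric height, which is a common choice for vertical interpolation of sounding data.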

2. The quantization of the variance of each term in eq. 5 as well as other equations is not clearly stated. How to estimate the final, total values of the uncertainties of temperature, relative humidity and wind? It is not clearly explained.
We have provided the details of each uncertainty source in the equations describing the calculation of the combined standard uncertainty for temperature, RH and wind. For temperature, please see the reply to the previous comment.
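As an illustration of how individual uncertainty sources are usually merged into a combined standard uncertainty, the sketch below sums the terms in quadrature. This assumes uncorrelated contributions, which is the standard rule for independent terms but is not confirmed here as the exact form of the manuscript's equations. The function name and all numerical values are hypothetical, except the additional random uncertainty of 2% RH, which is quoted in this reply.

```python
import math


def combined_standard_uncertainty(components):
    """Root-sum-of-squares of standard uncertainties.

    ASSUMPTION: the contributions are uncorrelated, so their variances add.
    """
    return math.sqrt(sum(u ** 2 for u in components.values()))


# Standard uncertainties in % RH. Only the 2 % RH random term comes from the
# text; the other three values are hypothetical placeholders.
rh_terms = {
    "dry_bias_correction": 1.5,      # hypothetical
    "radiation_sensitivity_f": 1.0,  # hypothetical
    "calibration_factor_cf": 0.5,    # hypothetical
    "random": 2.0,                   # 2 % RH, as stated in the reply
}
u_rh = combined_standard_uncertainty(rh_terms)
```

With this rule the largest term dominates the total, so the combined value can never fall below the 2% RH random contribution.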
For RH, lines 430-437 of the text describe the calculation of the combined standard uncertainty. As for Eq. 5, the combined standard uncertainty on the adjusted RH is calculated from the following terms: the uncertainty of the dry bias correction; the uncertainty of the radiation sensitivity factor f in Eq. 5; the uncertainty due to the calibration factor cf; and an additional random uncertainty of 2% RH. In analogy with temperature, when the radiation correction of the manufacturer is left unchanged, the corresponding uncertainty is assumed to be the same as that of the closest RH profile in time measured under the same meteorological conditions.

Lines 490-497 report the formula for the estimation of uncertainties for the wind-related quantities: "To adjust the IGRA wind profiles, the day and night time differences for u and v between the GRUAN processed and the IGRA radiosounding wind profiles have been calculated using the stations in Table 1

3. In line 175, it is stated that "IGRA contains observations from several networks and initiatives, including the GCOS Upper-air Network (GUAN), and the universal RAwinsonde OBservation program (RAOB)." So, GUAN data (actually the correct name should be GRUAN data) is a part of the IGRA data. You processed data from 650 stations and all the data were also from the IGRA. In Section 4.1, "RHARM consistency with GRUAN", the RHARM results are compared with GRUAN results, so the authors are regarding the GRUAN data as reference data. This raises a new question. Your RHARM results are derived from GRUAN data and now your results are compared with the GRUAN data. It is a repeated use of the same dataset. I am not sure how you can get an objective assessment of your RHARM results. Although the title of Section 4.1 is "RHARM consistency with GRUAN", in your Figures 9-13 you also compare your results with the IGRA data. As said earlier, the IGRA dataset includes the GRUAN dataset and the RAOB dataset. So, it is confusing to readers: your results are compared to GRUAN data, and are also compared to IGRA data.
The difference between GRUAN and GUAN has been clarified above, and it is fully documented in the manuscript. GUAN data are part of IGRA, but GRUAN data are not: GRUAN data are fully traceable radiosounding profiles with quantified uncertainties, which makes them completely different from any other existing radiosounding dataset. Other existing radiosonde data archives are essentially based on the manufacturer data processing and do not provide any estimation of the uncertainty. Therefore, GRUAN data are not contained in the 691 RHARM processed stations, although data from the sondes processed using the manufacturer's processing software do indeed form part of the IGRA database.
The RHARM approach is inspired by the GRUAN data processing, although uncertainties in RHARM cannot be quantified exactly as done by GRUAN which requires the raw digital counts as the basis for the processing. The results reported in section 4.1 ("RHARM consistency with GRUAN") are not meant to be an objective assessment of the RHARM dataset, but they quantify the residual gaps between RHARM and GRUAN resulting from the lack of raw data availability and the necessary compromises this entails at the vast majority of global radiosonde sites.
Regarding Figures 9-13, the reviewer is right, a comparison with IGRA data is included in the section entitled "RHARM consistency with GRUAN". In the new version of the manuscript, a new subsection has been created to separate the comparison between IGRA and RHARM data.
4. In Section 4.2, "Comparisons with ERA5", you compared your RHARM results with ECMWF ERA5 results. The derivation of the ECMWF ERA5 results has already used the global radiosonde datasets. Therefore, your comparison with ERA5 is not an independent assessment either. You repeatedly use the same set of radiosonde data.
Theoretically, we agree with the referee. This is the reason why the authors introduced the comparison discussed in section 4.2 as "An important step in the performance assessment of the RHARM data" and not as an "independent assessment". Despite a degree of non-independence due to the use of global radiosounding data for anchoring its bias correction, it is also true that the ERA5 reanalysis assimilation scheme selects the radiosonde stations using quality criteria which are completely different from those of RHARM and permits the ingestion of a different, typically smaller, set of radiosounding observations.  2020, 11, 401). This is due to the nature of the reanalysis data assimilation system and to the huge amount of ingested data sources.

It is also important to point out that the comparison of homogenized datasets with reanalysis is quite often adopted in the literature, because reanalysis is heavily used in climate studies.
The optimal basis for a comparison would be ERA5 reanalysis data, or some other data source, obtained without the ingestion of radiosounding measurements and thus completely independent. Given the need for a vertically resolved profile, only GNSS-RO may come close. A perfectly independent assessment of the RHARM data can be carried out using Cryogenic Frostpoint Hygrometer (CFH) data, but these are available with long records at only a very few sites.
However, in order to more clearly show the improvements of the RHARM dataset, also following on from comments by referee #1, two further comparisons have been added to section 5. Firstly, to show the improvement brought by the RHARM dataset to radiosounding temperature data, the episodes of 2017 and 2018 Sudden Stratospheric Warming (SSW) events are shown using IGRA, RHARM and ERA5 data, while for relative humidity a comparison with MLS-AURA data at 300 hPa is discussed. These additional comparisons are reported in an appendix of the new version of the manuscript and summarized in the reply to the referee #1 (to avoid repetition the same content is not reported here as well).
5. In the section 5 "Uncertainties: consistency with GRUAN and independent validation", the GRUAN datasets are compared again here. It seems to be a repetition of the work in section 4.1. In Fig. 16, only 6 stations are used in the comparison. I am not sure how useful/meaningful it is to compare with only 6 stations' result, considering you are processing global 650 stations.
Please see the reply to point 3, also considering that an independent validation of the uncertainties is provided in section 6 at a much broader scale.
6. The conclusion is too long and you should summarize the main findings of the paper. In addition, there is no single numerical value to show your findings, which is surprising to me.

Conclusions have been modified using a more schematic style with a clearer description of the main quantitative results discussed in the manuscript.
Other minor comments are: In the paper, in some places it claims it has processed the historical data since 1978. However, in some places it states it processed the data since 2004 to present and the 2010 WMO/CIMO radiosonde data. It is confusing to readers which dataset you have processed exactly.
In the new version of the manuscript, a clear distinction is made between what is discussed in this manuscript (adjustment of radiosondes from 2004) and what will be available to the users in the final RHARM dataset (adjustment of radiosondes from 2004 plus statistical homogenization of radiosondes from 1978 to 2004). A second manuscript, currently in preparation, will complete the description of the RHARM approach.

The present paper provides an analytic description of the section of the RHARM approach described in point a above, which adjusts a subset of 691 radiosounding stations available from the Integrated Global Radiosonde Archive (IGRA; Durré et al., 2006; Durré et al., 2018) from 2004. "
Therefore, we are discussing the section of the RHARM algorithm obtained by adjusting the Vaisala RS92 radiosondes and all the other radiosonde types involved in the WMO-CIMO 2010 radiosonde intercomparison. The identification of these sonde types within IGRA was based on the metadata collected by IGRA itself (i.e. the radiosonde code related to the WMO 3685 table) and on the use of the TAC code made available in the more recent high-resolution BUFR files provided by an increasing number of stations (about 100-200 at present) and supplied for RHARM by ECMWF; these files include extensive metadata and a larger number of pressure levels for each reported radiosounding launch. The selected radiosonde types are generally available from 2004, with differences in the covered period depending on each single station. The homogenization of the historical data before 2004 is alluded to in this paper to create the link to the second paper describing the RHARM algorithm, which is in preparation.
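The metadata-based selection described above can be sketched as a simple filter on the radiosonde code attached to each launch, followed by a per-station lookup of the first launch with a selected sonde type. This is only an illustration: the `Launch` record, the function names and the station identifiers are hypothetical, and the set of codes shown for the RS92 is an assumption that should be checked against the official WMO table 3685.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional

# ASSUMPTION: illustrative codes taken to denote Vaisala RS92 variants in
# WMO table 3685; verify against the official table before any real use.
RS92_CODES = frozenset({79, 80, 81})


@dataclass(frozen=True)
class Launch:
    """Hypothetical minimal record for one radiosounding launch."""
    station_id: str
    launch_date: date
    sonde_code: Optional[int]  # None when the metadata is missing


def select_by_sonde_code(launches: Iterable[Launch], codes=RS92_CODES) -> List[Launch]:
    """Keep only launches whose reported sonde code is in the selected set."""
    return [launch for launch in launches if launch.sonde_code in codes]


def first_covered_date(launches: Iterable[Launch], codes=RS92_CODES) -> Dict[str, date]:
    """Earliest launch date with a selected sonde type, for each station."""
    first: Dict[str, date] = {}
    for launch in select_by_sonde_code(launches, codes):
        current = first.get(launch.station_id)
        if current is None or launch.launch_date < current:
            first[launch.station_id] = launch.launch_date
    return first


# Hypothetical launches; code 61 stands in for an older, unselected sonde type.
launches = [
    Launch("STN_A", date(2003, 6, 1), 61),
    Launch("STN_A", date(2005, 3, 1), 80),
    Launch("STN_A", date(2004, 11, 15), 79),
    Launch("STN_B", date(2008, 1, 10), 81),
    Launch("STN_B", date(2007, 5, 2), None),
]
coverage = first_covered_date(launches)
```

Launches with missing metadata are simply excluded in this sketch, which mirrors the station-dependent start of the adjusted period noted in the reply.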