A global radiosonde and tracked balloon archive on 16 pressure levels (GRASP) back to 1905 – Part 2: Homogeneity adjustments for PILOT and radiosonde wind data

General remarks
‐ PILOT: The format for messages from fixed land stations that contain only wind data. These are called PILOT messages (see http://www.ofcm.gov/fmh3/pdf/13‐app‐e.pdf); the terms pilot balloon or pibal are used more often.
‐ Names and symbols should be unified throughout the whole paper (in the text, formulas and figures), e.g. NOAA‐20CR, NOAA 20CR, NOAA20CR, 20CR, bg or ff, WS or Φ, WD or Δ, τ or ‘an’, ‘20CR’.
‐ Meteorological conventions should be considered, e.g. that Φ is used as the symbol for geopotential (as in Part 1).
‐ Formulas should be written as in Part 1: the dependence of a variable on time is easier to read in Eq. 2 of Part 1 than in Eq. 1 of Part 2.
‐ The paper does not explain the content of the data files.


Introduction
Since the 1900s tracked balloons, and since the 1940s also radiosondes, were practically the only upper air wind observing system with global or regional coverage up to the beginning of the satellite era in the late 1970s, and they remain an essential component of the observing network (Dee et al., 2011). While the vertical extent of the records was mostly limited to below 400 hPa before 1940, it reached the stratosphere from the 1950s onward (Scherhag, 1962). Since the 1970s, 10 hPa has been regularly reached by most balloons.
Early upper air wind data have been used to reconstruct climate anomalies in the early 20th century, such as the Dust Bowl drought in the 1930s (Ewen et al., 2008b), and were instrumental in discovering the Quasi-Biennial Oscillation (Graystone, 1959). More modern in situ upper air wind data have helped to attribute regional wind stilling in the Northern Hemisphere to increased surface roughness (Vautard et al., 2010). There is also increased interest in wind speed trends due to the growing number of wind turbine installations. Allen and Sherwood (2008) used wind data as a temperature proxy, applying the thermal wind equation to calculate temperature (gradient) trends. Interestingly, they found much larger warming trends than those estimated from radiosonde temperature measurements, which have long been known to be biased (Santer et al., 1999, 2005; Thorne et al., 2011).
Long and homogeneous observed time series are an essential source for diagnosing the three-dimensional pattern of climate change. They are also precious input data for reanalysis efforts, particularly if they go back beyond the satellite era (Kistler et al., 2001; Uppala et al., 2005; Ebita et al., 2011). Reanalyses have proved extremely fruitful for climate research and are essential input for studies in many disciplines (Hartmann et al., 2013). Reanalysis performance and homogeneity are highly dependent on the available input data and their quality. Any improvement there will increase the accuracy of the estimated climate state and will be directly beneficial for future studies related to it. This fact triggered several efforts to digitize surface data and early upper air data in many countries (Allan and Ansell, 2006; Allan et al., 2011; Brönnimann, 2003; Ewen et al., 2008a; Stickler et al., 2014). The present study builds upon these efforts. In Part 1 of this study (Ramella-Pralungo et al., 2013) a new radiosonde and PILOT wind data set on 16 standard pressure levels, called GRASP, has been developed.
Figure 1 shows maps of global coverage with wind data for different decades in this data set.
Upper air wind time series from balloons have traditionally been assumed to be temporally relatively homogeneous compared to temperature or humidity. While biases are not as pervasive as for temperature, wind biases have occasionally been detected by monitoring the output of data assimilation systems (Hollingsworth et al., 1986). As an example, Fig. 2 shows the observed wind direction time series for the station Bismarck (North Dakota, USA, WMO ID 072764) at 00:00 UTC at different pressure levels. Already in these time series there are indications that around 1948 they are affected by artificial shifts in wind direction, caused by wrong north alignment of the station. Imperfect tracking of the horizontal motion, wrong height assignment or the inertia of the ascending balloon can also cause biases. These biases are hard to detect without a reliable reference if they are smaller than in this example. That reference may be neighboring stations or, more recently, reanalysis data. Haimberger (2007) showed that background departure time series from reanalyses can be used effectively to detect and adjust breaks in radiosonde temperature time series. An automatic homogenization method (RAdiosonde OBservation COrrection using REanalyses, RAOBCORE) showed some skill in adjusting the global radiosonde temperature data set. Subsequent refinements of the method also employed composites of neighboring radiosondes that allowed more accurate adjustment of breaks without sacrificing the independence of the adjusted radiosonde data from satellite data (Haimberger et al., 2008, 2012). This method has been successfully applied to the global radiosonde wind data set back to 1958 as well (Gruber and Haimberger, 2008) [...] be safely identified in their paper, there were concerns that wind speed background departures may not be independent enough from the assimilating model. This study also did not consider PILOT winds and was restricted to the period 1958-2002.
Since then the availability of newly digitized data as well as the advent of the NOAA Twentieth Century Reanalysis (NOAA-20CR; Compo et al., 2011) have improved the prospects for a comprehensive homogenization of upper air wind data back to the beginning of balloon observations. The NOAA-20CR is well suited as a reference since it is independent of upper air observations and since it has reasonably realistic temperature fields up to stratospheric levels. While some wind biases are evident in this reanalysis (see Fig. 3 of Part 1), it is temporally quite stable, at least over the mid-latitudes from 1950 onward (Brönnimann et al., 2012). It has been used for interpolating the global PILOT wind data set from geometric height to standard pressure levels in Part 1. The present paper describes how this combined radiosonde plus PILOT data set can be homogenized using analysis departure information from the NOAA-20CR.
The next section describes the input data and Sect. 3 outlines the homogenization method. Section 4 explains how sampling biases can translate into monthly mean biases and what is done to avoid that. Section 5 presents results and conclusions are drawn in Sect. 6.

Input data
The GRASP dataset as described in Part 1 is the main input to be homogenized. Its main sources are the ERA-Interim observation input and observation feedback data set (Dee et al., 2011), the ERA-40 observation input and feedback data set (Uppala et al., 2005), the Integrated Global Radiosonde Archive (IGRA; Durre et al., 2006), updated until 2012 and supplemented with analysis departures from the NOAA-20CR, and the Comprehensive Historical Upper-Air Network (CHUAN; Stickler et al., 2010) including the ERA-CLIM Historical Upper-Air Data (Stickler et al., 2014). The archive contains daily records at 00:00 and 12:00 GMT from 2924 wind stations with WMO identifier, distributed all over the world, with data starting in 1905 at Lindenberg. Figure 1 shows the development and distribution of the wind network, composed of those stations that have at least 5 years of observations. Already in the 1930s a network of stations was operating, mainly in the US and in India. In the 1940s the distribution was already global, albeit sparse. Truly global coverage over land masses is given already in the 1950s, although South America and Southern Africa are not well covered. Since then the network has not changed much. The network of radiosonde and PILOT observations in the 1950s is much denser than the radiosonde-only network (compare with Fig. 10 in Part 1).
In addition to the observations on standard pressure levels, the archive contains the so-called innovations or analysis departures, i.e. observation minus analysis from the NOAA-20CR, collocated at each station location, each available observation time (00:00 and 12:00 UTC) and each standard pressure level. For homogenization the analysed fields from the NOAA 20th Century Reanalysis project (20CR, Compo et al., 2011) have been chosen since they are independent of upper air data and since they go back to 1872, so that analysis departures exist even for the earliest upper air data. While background departures from ERA-40 and ERA-Interim have smaller variance, they are not completely independent of upper air data and they also do not reach back far enough. Evaluations of the NOAA-20CR have found some biases in its wind field (Stickler et al., 2010) and several inhomogeneities have been detected by Ferguson and Villarini (2013), particularly in ensemble spread. Despite these caveats one has to acknowledge that the NOAA-20CR reproduces the comprehensive global atmospheric circulation and also the surface temperatures in the twentieth century quite well given the reduced set of input data it assimilates. In many ways it is unique and similar upcoming products such as ERA-20C (Poli et al., 2013) [...]

Methods
The radiosonde wind homogenization system presented here is based on the methodology referred to as Radiosonde Observation Correction using Reanalysis (RAOBCORE). It was introduced by Haimberger (2007) and was originally applied only to temperature time series. Gruber and Haimberger (2008) already demonstrated that the RAOBCORE technique can, when properly modified, also be applied to wind data with promising results. In that pioneering study only radiosonde data collected during ERA-40 (Uppala et al., 2005), complemented with radiosonde data from the IGRA (Durre et al., 2006), were used, spanning the time interval 1958-2002.
The system used here, referred to as RAOBCORE 2.0, analyzes temperature, wind speed and direction simultaneously. In contrast to the original RAOBCORE it uses NOAA 20th Century Reanalysis data as reference. While the simultaneous treatment of temperature and wind would allow for combined break detection, we found that temperature and wind breaks rarely occur at the same time, the exception being perhaps station relocations. Therefore each variable is treated separately for break detection and adjustment. Metadata are an important source of information for homogenization purposes and they add information to the statistical tests applied in the homogenization process. They have been extensively used for temperature adjustments (Luers and Eskridge, 1995; Haimberger, 2007) and a new version of upper air metadata recently became available (Tschudin and Schroeder, 2013). The metadata have been collected mainly for temperature and humidity adjustment, however. For wind the metadata information is rather sparse and most likely incomplete. At this stage metadata have therefore not been used for wind homogenization.

Construction of reference time series
The 20CR U and V wind values used as reference are already stored in GRASP as departure time series (observation minus analysis), i.e. the 20CR values can be calculated simply by subtracting the departures from the observations. For homogenization purposes it is often advantageous to represent wind information in polar coordinates (speed ff and direction Φ). North alignment shifts in wind direction, which are a major source of biases affecting the whole vertical profile, are much more clearly visible in wind direction time series than in U or V time series. Following the notation of Haimberger et al. (2012), we define the wind direction difference (τ) between an observation (obs) at time i at a given station and the NOAA-20CR analysis (an) interpolated to this station as:

$$\tau^{i}_{\Phi} = \Phi^{i}_{obs} - \Phi^{i}_{an}$$

In cases where this difference is not in the range [−180°, 180°], we set

$$\tau^{i}_{\Phi} = \tau^{i}_{\Phi} - \mathrm{sign}(\tau^{i}_{\Phi}) \cdot 360^{\circ}$$

Wind direction departures are not necessarily Gaussian and care must be taken particularly at low wind speeds. Therefore they are considered valid only if the wind speeds in both observations and reanalysis are larger than 1 m s$^{-1}$ and if the departures are smaller than 90° in absolute value. Now one can take the average over some time interval a (denoted by an overbar, typically between 1 and 8 years), which yields

$$\overline{\tau_{\Phi}}(a) = \frac{1}{N_a} \sum_{i \in a} \tau^{i}_{\Phi}$$

where $N_a$ is the number of valid departures in a. The wind speed departures ∆ff are defined as:

$$\tau^{i}_{ff} = ff^{i}_{obs} - ff^{i}_{an}$$

Similarly, $\overline{\tau_{u}}(a)$ and $\overline{\tau_{v}}(a)$ can be defined. All these averages exist at 00:00 and 12:00 GMT on the 16 standard pressure levels.
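The wrapping and validity rules for wind direction departures can be illustrated with a minimal Python sketch. The function names and the record layout are hypothetical and are not taken from the GRASP or RAOBCORE code; directions are assumed to be given in degrees in [0, 360), and the 90° limit is assumed to apply to the absolute value of the departure.

```python
def direction_departure(obs_dir, an_dir):
    """Observation-minus-analysis wind direction, wrapped to [-180, 180] degrees."""
    tau = obs_dir - an_dir
    if tau > 180.0:
        tau -= 360.0
    elif tau < -180.0:
        tau += 360.0
    return tau


def valid_departure(tau, obs_speed, an_speed, min_speed=1.0, max_abs_tau=90.0):
    """Keep a departure only if both wind speeds exceed 1 m/s and |tau| < 90 degrees."""
    return obs_speed > min_speed and an_speed > min_speed and abs(tau) < max_abs_tau


def mean_departure(records):
    """Average the valid departures of one interval.

    records: iterable of (obs_dir, an_dir, obs_speed, an_speed) tuples.
    Returns None if no valid departure is left.
    """
    taus = []
    for obs_dir, an_dir, obs_speed, an_speed in records:
        tau = direction_departure(obs_dir, an_dir)
        if valid_departure(tau, obs_speed, an_speed):
            taus.append(tau)
    return sum(taus) / len(taus) if taus else None
```

For example, an observed direction of 350° against an analysed direction of 10° gives a departure of −20° rather than 340°.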

Break detection
As discussed by Haimberger (2007), a variant of the Standard Normal Homogeneity Test (SNHT, Alexandersson, 1986) can be applied to the innovation time series $\tau^{i}_{x}$ (where x can be u, v, Φ, ff) in order to find possible break candidates.
The SNHT variant considered here calculates a test statistic $Q_k$ for each potential breakpoint k. The intervals are chosen as a = [k − N/2, k] and b = [k, k + N/2], with N = 1460 as default choice. We now define $Q_k$ as:

$$Q_k = \frac{N}{2} \, \frac{\left[\overline{\tau_x}(a) - \overline{\tau_x}(a,b)\right]^2 + \left[\overline{\tau_x}(b) - \overline{\tau_x}(a,b)\right]^2}{\sigma^2(a,b)}$$

where σ(a, b) is the standard deviation of the $\tau^{i}_{x}$ over the interval [a, b] and $\overline{\tau_x}(a,b)$ is the mean $\tau^{i}_{x}$ over the whole interval [a, b]. The maximum number of missing data admitted in one subinterval of length N/2 is an important parameter. It is set to 650.
With so many data allowed to be missing, special care is needed to ensure equal sampling of the annual cycle in the intervals before and after a potential breakpoint k, particularly in case of data gaps. This is done by deleting data for a month that is missing in one interval also in the other interval. This simple measure minimizes false break detections at stations with strong annual cycles (Haimberger, 2007). This SNHT variant yields time series of $Q_k$ for all k where there are enough data. The $Q_k$ time series exist on each pressure level, but significant maxima are not reached on every level and not always at the same k values. Thus one has to combine the breakpoint probability information from all levels to get unique breakpoints. The composite series of all the $Q_k$ is obtained as the mean over all pressure levels and times:

$$\overline{Q_k} = \frac{1}{N_p N_t} \sum_{p} \sum_{t} Q_k(p, t)$$

where p runs over the $N_p$ pressure levels and t over the $N_t$ observation times (00:00 and 12:00 GMT) for which $Q_k$ is available. From these a critical value $Q_{crit}$ can be derived. It is a reasonable idea to set $Q_{crit}$ not too high, since weaker breaks can be eliminated later by applying some robustness criteria (explained in the next section). Depending on the value set for $Q_{crit}$, the time series $Q_k$ is converted into break probabilities, with range [0, 1]. If on a given day the maximum break probability exceeds 0.5, the date is recorded as a possible break. If it passes the robustness tests, it is adjusted as explained below. If there is a break, the $Q_k$ are typically larger than $Q_{crit}$ over some time interval (see e.g. Fig. 6). The break location is set at the maximum $Q_k$ value. Local $Q_k$ maxima have to be separated by at least a year to be recognized as separate breakpoints.
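The test statistic for a single series can be sketched as follows. This is an illustrative implementation under stated assumptions, not the RAOBCORE code: the squared mean differences are normalised by the variance over the whole window, as in the standard SNHT; missing values are represented by `None`; and the matching of missing months between the two subintervals is omitted for brevity. The function name is hypothetical.

```python
from statistics import mean, pstdev


def snht_statistic(series, k, n=1460, max_missing=650):
    """SNHT-type statistic Q_k at candidate breakpoint index k.

    series: list of daily departures, with None for missing values.
    Returns None if the window does not fit or too many data are missing.
    """
    half = n // 2
    if k - half < 0 or k + half > len(series):
        return None
    a = [x for x in series[k - half:k] if x is not None]  # interval before k
    b = [x for x in series[k:k + half] if x is not None]  # interval after k
    if half - len(a) > max_missing or half - len(b) > max_missing:
        return None
    both = a + b
    sigma = pstdev(both)  # standard deviation over the whole window
    if sigma == 0.0:
        return None
    mu = mean(both)
    return (n / 2) * ((mean(a) - mu) ** 2 + (mean(b) - mu) ** 2) / sigma ** 2
```

Evaluated for every admissible k, this gives one $Q_k$ series per pressure level and observation time, which can then be averaged into the composite series.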

Break adjustment
As mentioned, a common reason for breaks in wind direction is wrong north alignment: in this case the wind direction error is expected to be constant in time and also in the vertical. The break size estimate at a given pressure level at 00:00 or 12:00 GMT at date k, denoted $\Delta\tau^{k}_{\Phi}(a,b)$, is defined as the difference between the mean departures $\overline{\tau_{\Phi}}$ over the intervals a and b. The intervals a, b are generally chosen longer (1-8 years) than the intervals used for break detection above. We consider a break as significant if its magnitude is larger than 1.96 times the standard deviation of the $\tau_{\Phi}$. We require that the mean of the vertical break profile $\Delta\tau^{k}_{\Phi}(a,b)$ be greater than 3° and that the break size estimates at individual pressure levels do not change sign in the vertical. At least 4 time series (2 pressure levels at both 00:00 and 12:00 GMT or 4 pressure levels at the same time) must be available for a break to be accepted. In this case the break will be adjusted at both 00:00 and 12:00 GMT. Since there are cases in which the device changes between the observations at 00:00 and 12:00 GMT (for example, PILOT at 00:00 GMT and radiosonde at 12:00 GMT), one may get different and independent break profiles at different times. For such cases, the significance test has been performed at both [...] Gruber and Haimberger (2008) have been revisited for comparison purposes. The obtained mean break profiles agree well with those presented in Gruber and Haimberger (2008), although the variance of the estimates is larger because of the larger scatter of the departures from the surface-data-only reanalysis used as reference compared to background departures from full reanalyses such as ERA-Interim, particularly in remote regions.
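The acceptance criteria for a wind direction break can be written down as a short check. The sketch below is only one possible reading of the criteria: it assumes that the 1.96 threshold is applied to each available series separately, with the relevant standard deviation supplied by the caller, and the function name is hypothetical.

```python
def accept_direction_break(break_sizes, sigmas, min_mean=3.0, min_series=4, z=1.96):
    """Decide whether a candidate wind direction break is adjusted.

    break_sizes: break size estimates in degrees, one per available series
                 (pressure level and observation time).
    sigmas:      standard deviations used for the significance test,
                 in the same order (assumed to be supplied by the caller).
    """
    if len(break_sizes) < min_series:
        return False  # fewer than 4 time series available
    same_sign = all(b > 0 for b in break_sizes) or all(b < 0 for b in break_sizes)
    if not same_sign:
        return False  # profile changes sign in the vertical
    mean_break = sum(break_sizes) / len(break_sizes)
    if abs(mean_break) <= min_mean:
        return False  # mean profile not larger than 3 degrees
    return all(abs(b) > z * s for b, s in zip(break_sizes, sigmas))
```

A vertically constant profile of about 15°, as expected for a north alignment error, passes all three checks, whereas a profile that changes sign with height is rejected regardless of its size.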
The approach for wind speed is slightly different, since there are no such strong physical constraints available as for wind direction. Larger biases may occur only at some pressure levels, thus each single level is analyzed independently. In particular, it is well known that the largest inhomogeneities occur at high wind speeds, i.e. where the jet streams are located (7-12 km above sea level for the polar jet and 10-16 km for the subtropical jet). A break at a given level is considered significant if its size is larger than 1.96 times the standard deviation of the $\tau_{ff}$. If this criterion is fulfilled at two or more levels at a potential break point, the wind speed profile is adjusted.
While the individual wind direction values are adjusted by a vertically constant value, the wind speeds at a given pressure level are adjusted by a factor λ such that high wind speed values are adjusted more than low wind speeds. Otherwise low wind speeds might become negative.
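The difference between the two kinds of adjustment can be made concrete with a small sketch. The function names are hypothetical, and the way λ is estimated is not shown here; the sketch only illustrates why a multiplicative factor keeps low wind speeds non-negative while an additive offset is natural for direction.

```python
def adjust_direction(direction_deg, offset_deg):
    """Add a vertically constant offset and wrap the result to [0, 360)."""
    return (direction_deg + offset_deg) % 360.0


def adjust_speed(speed, lam):
    """Scale the wind speed by the factor lambda; the result cannot become negative."""
    return speed * lam
```

With λ = 0.9, a speed of 30 m s$^{-1}$ is reduced by 3 m s$^{-1}$ and a speed of 2 m s$^{-1}$ by only 0.2 m s$^{-1}$, whereas subtracting a constant 3 m s$^{-1}$ would turn the latter negative.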
Before adjusting the observations, the NOAA-20CR wind speeds, which are known to be biased low, have been adjusted by another constant factor $\lambda_{20CR}$, which has been derived from the mean wind speed difference between the most recent part of the observation time series and the corresponding NOAA-20CR analysis time series. The $\lambda_{20CR}$ factor is calculated for each pressure level using linear regression to reflect different biases in different climate zones. Only wind speeds larger than 1 m s$^{-1}$ in the 20CR and after 1960 have been taken into account for the scatter plot. In general the method delivers reasonable factors in the range 1.0-1.2 provided there are sufficient data available (at least 1000 values). The mean values of the observations in Fig. 3a are higher by this factor than the NOAA-20CR winds in Fig. 3b, at least after 1960.
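One simple way to obtain such a scaling factor is a least-squares regression through the origin of observed on analysed wind speeds. The sketch below assumes this particular form of regression, which the text does not specify, and uses a hypothetical function name; the data selection follows the thresholds given above.

```python
def lambda_20cr(an_speeds, obs_speeds, years, min_speed=1.0, after_year=1960, min_count=1000):
    """Scaling factor for reanalysis wind speeds at one station and pressure level.

    Fits obs = lambda * an by least squares through the origin (an assumption),
    using only pairs with reanalysis speed > 1 m/s and dates after 1960.
    Returns None if fewer than min_count pairs remain.
    """
    pairs = [
        (x, y)
        for x, y, year in zip(an_speeds, obs_speeds, years)
        if x > min_speed and year > after_year
    ]
    if len(pairs) < min_count:
        return None
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    return sxy / sxx
```

A value of 1.1, for instance, would mean that the reanalysis wind speeds are multiplied by 1.1 before they are compared with the observations.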
As described in Haimberger (2007) and Haimberger et al. (2012), the breakpoint adjustment procedure works backward in time, from the most recent break to the earliest one, so that a progressively shorter section of the time series is adjusted. The default averaging interval for the break size estimation has been set to 8 years. If only a shorter time series is available (because of the next earlier break or because of a data gap), this parameter is reduced accordingly. The adjustment factors remain constant in amplitude between the current break and the beginning of the time series. After these procedures, the breaks due to changes in the measurement biases of the observed wind time series should be largely removed.
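The backward-in-time logic can be sketched for the additive case (e.g. wind direction departures). This is a simplified illustration with hypothetical names: break positions are assumed to be known already, the averaging window is given in number of samples, data gaps are ignored, and the most recent segment is taken as the reference.

```python
from statistics import mean


def adjust_backward(departures, break_indices, window):
    """Remove step changes from a departure series, latest break first.

    departures:    list of departures (observation minus reference).
    break_indices: indices k at which a new segment starts.
    window:        maximum number of samples averaged on each side of a break.
    """
    adjusted = list(departures)
    breaks = sorted(break_indices)
    for pos in range(len(breaks) - 1, -1, -1):
        k = breaks[pos]
        # Shorten the averaging intervals at neighbouring breaks or series ends.
        start = max(k - window, breaks[pos - 1] if pos > 0 else 0)
        stop = min(k + window, breaks[pos + 1] if pos + 1 < len(breaks) else len(adjusted))
        before, after = adjusted[start:k], adjusted[k:stop]
        if not before or not after:
            continue
        shift = mean(after) - mean(before)
        # The shift is applied to everything earlier than the break.
        for i in range(k):
            adjusted[i] += shift
    return adjusted
```

Because each shift is applied to the whole earlier part of the series, the estimate for an earlier break is computed from data that have already been adjusted for all later breaks.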

Calculation of unbiased monthly means
The early wind measurement systems used theodolites to track the ascending balloons, which worked only if there was good visibility and upper level winds were not too strong. While under fair weather conditions the balloons could be tracked up to 200 hPa or higher, in stormy conditions the balloons were sometimes lost below 2 km. As a result, winds during disturbed weather conditions are underrepresented. While the single measurements are unbiased, their monthly mean is biased since only the low wind part of the distribution is well sampled. Figure 3a [...] since the NOAA-20CR has a well known low bias in wind speed (Compo et al., 2011; Stickler et al., 2010), see also Fig. 3 of Part 1. It is slightly increasing over time since the departures tend to be larger during high wind conditions, which are undersampled in the early years. The increase is therefore not necessarily a sign of inhomogeneities in the measurements. The constant mean departures from the 1960s onward are a sign that the low bias of 20CR is practically constant throughout these decades. Thus it can be expected to be constant also earlier on, the exception being perhaps the period 1900-1940, when different background error variances have been used in the NOAA-20CR assimilation system.
There are several options to avoid a sampling bias in e.g. monthly means. The simplest one is to calculate the means only if a very large fraction of observations is available (e.g. fewer than 2 observations missing per month). Such a strict requirement leads to very few monthly means being calculated in the early years. As a second option, one can substitute missing observations with analysed winds from the NOAA-20CR. We did this for months where at least 15 observations were available at a given observation time (00:00 or 12:00 GMT). Of course the scaled NOAA-20CR winds (as explained before) are used for the substitution. Figures 4a and b show the effect of the filling at a station where only few data were missing. The mean is slightly increased through the filling. To put more stress on the filling algorithm we performed an experiment where all high wind speeds were withheld and then reconstructed using scaled NOAA-20CR wind speeds. The reduction of the mean in Fig. 4c by ca. 1.4 m s$^{-1}$ is at least partially compensated by the filling.
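The filling option can be sketched in a few lines. The function name and the data layout are hypothetical: one list of daily observed wind speeds for a given month and observation time, with `None` for missing days, and one list of already scaled reanalysis wind speeds for the same days.

```python
def monthly_mean_filled(obs, scaled_an, min_obs=15):
    """Monthly mean wind speed with gaps filled from the scaled reanalysis.

    obs:       daily observed wind speeds, None where the observation is missing.
    scaled_an: scaled reanalysis wind speeds for the same days.
    Returns None if fewer than min_obs observations are available.
    """
    n_obs = sum(1 for value in obs if value is not None)
    if n_obs < min_obs:
        return None
    filled = [o if o is not None else a for o, a in zip(obs, scaled_an)]
    return sum(filled) / len(filled)
```

In a month with 20 calm days observed at 5 m s$^{-1}$ and 10 windy days lost, for which the scaled reanalysis gives 15 m s$^{-1}$, the observed-only mean is 5 m s$^{-1}$ while the filled mean is about 8.3 m s$^{-1}$.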
Figure 5 shows how the monthly time series of stations with WMO numbers between 70000 and 75000 can be affected by the sampling bias and that the filling with scaled 20CR values is capable of removing or at least reducing the spurious trends due to undersampling. There may be room for further improvement in the treatment of the sampling bias, but the example demonstrates that the sampling bias cannot be ignored if one wants to consider time series going back further than the 1960s. Despite this encouraging performance of the filling method, we decided not to use it for homogenization purposes. The most important reason is that the filling compromises the independence between the observations and the reference. Thus it is harder to detect breaks when analysing time series containing an appreciable amount of values from the reference series. Even if we did an adjustment, it would be underestimated for the observed values but overestimated for the filled values. This would remain a problem even if we had a more accurate reanalysis than the 20CR at our disposal. As such one should use the filling approach only after homogenization.
In the following we display only results with data where the sampling bias is low so that the filling is essentially unnecessary. This is the case for the early period at lower pressure levels (< 500 hPa) and for the period after 1960.
On PANGAEA, too, we provide only homogeneity adjustments, but no filled values. It is important for the reader to understand that we removed the nonclimatic shifts but that we only highlighted the presence of the sampling bias without actually removing it.

Single series analysis
As mentioned above, the main purpose of this paper is to find and possibly fix breaks, especially in the period before 1958, where other more sophisticated techniques (for example RICH and its versions; Haimberger et al., 2008) are not applicable, mainly due to the low station density, which makes the construction of reliable reference time series from neighbouring stations rather difficult.
While the US wind network was established already very early with up to 60 stations in the 1930s, it had some problems with measuring the wind direction. [...] winds as reference at 850, 700 and 500 hPa (the only three pressure levels that have an almost complete time series back to 1935) for wind direction (left panel) and wind speed (right panel). In these early years the ascents did not reach very high levels and already at 500 hPa the SNHT (blue line, right axis) can be calculated only after 1945. At all analyzed pressure levels breaks are indicated on the same day with almost identical SNHT maximum values. After those breaks in the early years, the wind direction departures seem quite homogeneous with no obvious shift. The right panel of Fig. 6 shows the wind speed innovations $\tau^{i}_{ff}$ and the SNHT time series. In the years 1930-1960, the time series are homogeneous. Weak shifts are visible in 1960, 1963 (for unknown reasons) and in 1973, which can be attributed to a station relocation according to S. Schroeder's metadata database. Figure 7 shows the different adjustments calculated with RAOBCORE 2.0, which should reduce the detected jumps. As is typical for north alignment errors, the break profiles calculated from the background departures are vertically nearly constant. Only few pressure levels are available for the earliest break (1938) because higher levels were not reachable at that time, as shown in Fig. 9. The adjusted innovation time series in Fig. 8 is definitely more homogeneous than the raw series presented in Fig. 6. No visible inhomogeneities remain. The inhomogeneities found may have contributed to the apparent strong wind anomalies found by Brönnimann et al. (2009) during the Dust Bowl drought. They would likely be weaker if homogenized data had been used. This shows how important it is to eliminate inhomogeneities before doing climate studies.
Wind speed adjustments are not constant in time, since they depend on wind speed itself. The breaks in the 1970s and 1980s are no longer visible in the adjusted innovation time series. The adjustments for wind speed and direction can be split into U and V wind component adjustments with a simple vectorial decomposition, and these can be applied in order to adjust the U and V wind components. Due to the predominantly zonal wind direction, the wind direction adjustments project more onto the V wind component, whereas wind speed adjustments affect mainly the U component. Figure 6 shows U and V unadjusted innovations, adjustments and adjusted innovations for the 700 hPa level. Note that the V adjustments (right panel) have a negative mean because of the negative wind direction adjustment and the mostly westerly winds. The adjusted time series have no more shifts but also have reduced variance. Generally it has been found that wind speed breaks over the US are relatively rare and relatively weak. Already the unadjusted innovations often look homogeneous, which indicates good quality of both observations and NOAA-20CR.
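The vectorial decomposition can be sketched as follows, using the usual meteorological convention that the wind direction is the direction the wind blows from, measured clockwise from north, so that u = −ff sin Φ and v = −ff cos Φ. The function names are hypothetical, and the sign convention of the adjustments (added to the observation) is an assumption of this sketch.

```python
import math


def to_uv(speed, direction_deg):
    """Convert speed and meteorological direction (degrees) to (u, v)."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)


def uv_adjustment(speed, direction_deg, lam, offset_deg):
    """U and V adjustments implied by a speed factor and a direction offset."""
    u0, v0 = to_uv(speed, direction_deg)
    u1, v1 = to_uv(speed * lam, direction_deg + offset_deg)
    return u1 - u0, v1 - v0
```

For a westerly wind of 10 m s$^{-1}$ (direction 270°), a direction offset of 10° changes V by about 1.7 m s$^{-1}$ in magnitude but U by less than 0.2 m s$^{-1}$, while a speed factor of 1.1 changes U by 1 m s$^{-1}$ and leaves V essentially unchanged.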
Wind speed breaks are weak at Bismarck but at other places they can be quite strong, e.g. in Athens (016716, Greece). This station has already been studied by Gruber and Haimberger (2008). Figure 11 shows the innovation time series at 700, 500 and 300 hPa at 00:00 and 12:00 GMT. In 1997, 1993, 1987, 1983 and 1979, five massive breaks are detected by the SNHT. The resulting adjustments calculated by RAOBCORE 2.0 and the adjusted innovation time series are shown in the middle and lower panels of Fig. 11. All breaks already detected by Gruber and Haimberger (2008) could be detected again although a less accurate reference (NOAA-20CR instead of ERA-40) was used. The wind direction at the Athens station also seems to suffer from two inhomogeneities, but they are not removed since they are not vertically constant (close to linear growth) and they change sign. As such they are not attributable to wrong north alignment.

Regional and global trends
The overall beneficial impact of the applied homogeneity adjustments is best visible in maps of trends of both wind direction and speed before and after the data have been treated with RAOBCORE 2.0. The raw, the adjusted and the adjustment time series are available at daily resolution, but for trend calculations monthly averages have been used, taking particular care of the sampling problem as noted above. The trend calculation is easy for wind speed (a scalar variable), but more challenging for wind direction, which is derived from U and V trends, as explained in Appendix A. [...] metadata database indicates a change from theodolite PILOTs to first generation radiosondes in 1939, as was the case at Bismarck. In many cases there are metadata events indicating measurement system changes in this database in the above cited years. However, the information does not tell what exactly caused these changes in the north alignment. According to our estimates, this problem was corrected more than a decade later. The break distributions shown in Fig. 15 can be split when only the years 1938-1941 and 1947-1951 are investigated. One obtains two tight and approximately Gaussian distributions, with means of −17° and +15°.
In order to get an idea of what the observed mean wind speed and direction looked like before and after the break analysis, a subset of central US stations was averaged year by year for both wind speed and direction. As is clearly visible in Fig. 16, wind directions have been strongly adjusted in the period 1935-1948, whereas wind speed shows only slight changes due to the homogenization. The sampling problem does not play a major role for this figure, since at 700 hPa most series involved were almost complete. For this reason, and because it is also visible in the NOAA-20CR, the increase in wind speed during this period is trusted. Some nonstationarity in wind direction is visible as well. However, this is also found in the NOAA-20CR. The offset by several degrees is less than the measurement increment of wind direction (5-10°).
Figure 17 shows that 65 out of 110 US stations reporting wind observations in the period 1940-1960 with at least 10 years of observations had breaks in wind direction (107 breaks in total). Only 18 stations were affected by 20 wind speed breaks.

Global trends in the more recent times
The upper air wind observation coverage is fairly global from 1950 onward, with a significant improvement in 1958, the International Geophysical Year. The inclusion of PILOT data is essential to get acceptable coverage in the Tropics and Southern Hemisphere in the early years.
Figure 18 shows global wind direction trend maps at 200 hPa for the period 1970-1990. Only few stations show suspiciously strong trends compared to their neighbours. [...] 21). One could have achieved a lower cost function value by setting wind speed break thresholds lower but we decided to be more conservative. Also, some strong trend features are present in the NOAA-20CR (which has a cost value of 3900) as well, such as the strong deceleration of winds over the US. A total of 1102 wind speed breaks at 566 stations (out of 2924 stations with more than 1 year of data) were detected. In addition 1330 wind direction breaks at 604 stations have been found.
Figure 22a shows the temporal distribution of wind direction breaks, whereas Fig. 22b shows the size of the breaks. The two peaks around 1937/38 and 1947/48 stem from the problems of the early US stations. Afterwards the temporal distribution is smooth. The size distribution shows that only larger wind direction breaks have been adjusted. Wind speed break sizes in Fig. 23 again show a bimodal break size distribution. Regarding the time distribution, we observe breaks only after 1945, most probably because before then the sparse soundings (only in the USA) were not able to reach the high levels where the wind speed is higher and problems in speed measurements are more likely. The prominent peak in 1998 can be attributed mainly to stations with breaks in the Middle East, North Africa and South Asia (together ≈ 25 stations). We found no indication for a break in the 20CR when we compared difference series of ERA-Interim and NOAA-20CR wind speeds (not shown).

Conclusions
This paper documents our efforts to homogenize the global wind radiosondes and PI-LOT balloons archive described in Part 1 (Ramella-Pralungo et al., 2013) using the NOAA-20CR (Compo et al., 2011) as background reference.Since this reanalysis was produced using only surface data it is independent of upper air data.The already well known RAOBCORE method, that was successfully applied to the radiosonde temperatures and in a preliminary fashion also to wind (Haimberger, 2007;Gruber and Haimberger, 2008), has been extended and reinforced such that it is able to treat temperature (results are not shown in this work) and wind together.
In contrast to Gruber and Haimberger (2008) we analyzed innovation time series of wind speed and direction only (not U and V) for break detection, since the measurement instrument reports speed and direction and the breaks are related to biases linked to the instrument itself. For wind direction we looked specifically for north alignment errors. Regarding wind speed, one expects breaks at those levels where the mean wind speed is particularly large, i.e. where the jet streams are located. When a break was detected, the adjustments were made by applying a constant factor estimated from comparing the wind speeds and wind speed innovations before and after the break.
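The two kinds of adjustment described above can be illustrated with a minimal Python sketch. It is not the RAOBCORE implementation: the function names, the choice to adjust the segment before the break towards the segment after it, and the assumption that the true mean wind speed and the mean background speed are the same on both sides of the break are all illustrative assumptions.

```python
import numpy as np

def speed_adjustment_factor(obs, bg, break_idx):
    """Hypothetical estimator of a constant multiplicative factor for the
    wind speeds observed before a break at index break_idx.

    Assumes the true mean speed and the mean background speed are the same
    on both sides of the break, so that the shift in the mean innovation
    (obs - bg) is caused by the instrument alone.
    """
    obs = np.asarray(obs, dtype=float)
    bg = np.asarray(bg, dtype=float)
    innov = obs - bg
    # Shift of the mean innovation across the break (after minus before)
    shift = np.nanmean(innov[break_idx:]) - np.nanmean(innov[:break_idx])
    # Express the shift relative to the mean observed speed before the break
    return 1.0 + shift / np.nanmean(obs[:break_idx])

def adjust_direction(direction_deg, offset_deg):
    """Apply a constant north-alignment offset (degrees) and wrap to [0, 360)."""
    return (np.asarray(direction_deg, dtype=float) + offset_deg) % 360.0
```

Multiplying the pre-break speeds by the returned factor removes a purely multiplicative bias under the stated assumptions. A factor, rather than an additive constant, makes the adjustment scale with the observed speed, which matches the statement that breaks are expected where the mean wind speed is large.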
By combining the information from wind speed and direction, the U and V wind components could also be adjusted. Since the NOAA-20CR was used as reference and since a much more comprehensive input data set was used, several inhomogeneities in the period before 1958 could be detected and adjusted, the most prominent being a pervasive wind direction bias of up to 15° over the central US between 1938 and 1948. After 1960 inhomogeneities are relatively rare compared to temperature inhomogeneities. Fewer breaks have been detected and adjusted compared to Gruber and Haimberger (2008), which may be related to inhomogeneities in the reference used then (ERA-40), but also to the fact that the variance of the ERA-40 innovations was much smaller than the variance of the NOAA-20CR innovations. The adjustments could be used as input to a reanalysis bias correction scheme, or they could be used as reference to test a variational wind bias adjustment scheme (Dee and Uppala, 2009). Wind direction biases in particular seem good candidates for variational adjustment, since wind direction is well constrained in a state-of-the-art multivariate data assimilation system and the expected model biases in wind direction are much smaller than the biases found in some of the observation records.
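How speed and direction adjustments translate into U and V adjustments can be sketched as follows. The conversion uses the standard meteorological convention (direction is where the wind blows from, measured clockwise from north). The function names and the convention "adjusted = raw + adjustment" are assumptions made for illustration only.

```python
import numpy as np

def to_uv(speed, direction_deg):
    """Convert wind speed and meteorological direction (degrees, blowing
    from, clockwise from north) to eastward (U) and northward (V) components."""
    rad = np.deg2rad(np.asarray(direction_deg, dtype=float))
    speed = np.asarray(speed, dtype=float)
    return -speed * np.sin(rad), -speed * np.cos(rad)

def uv_adjustment(speed, direction_deg, speed_factor=1.0, direction_offset_deg=0.0):
    """Hypothetical helper: U and V adjustments implied by a multiplicative
    speed factor and an additive direction offset (adjusted minus raw)."""
    u_raw, v_raw = to_uv(speed, direction_deg)
    u_adj, v_adj = to_uv(np.asarray(speed, dtype=float) * speed_factor,
                         np.asarray(direction_deg, dtype=float) + direction_offset_deg)
    return u_adj - u_raw, v_adj - v_raw

# Example: a 20 m/s westerly wind (270 degrees) with a +10 degree direction adjustment
du, dv = uv_adjustment(20.0, 270.0, direction_offset_deg=10.0)
```

In this example `dv` is negative and its magnitude grows with the wind speed, so for predominantly westerly winds a positive direction adjustment appears as a negative V adjustment.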
The wind data set developed is the most comprehensive homogenized data set of its kind. It is an ideal basis for estimating the tropical zonal mean warming maximum by exploiting the thermal wind relationship, as pioneered by Allen and Sherwood (2008). The large number of PILOT records should allow these estimates to be extended back to 1950, and because of the more complete data a higher accuracy can be expected compared to their study. It may also be very interesting to check whether the global mean kinetic energy has increased in the past decades (Bengtsson et al., 2004; Kung, 1966).
Although the data are now homogenized, one should be aware that observed wind speeds have a sampling bias. In the early period high wind speeds are underrepresented because of measurement limitations. This effect still has to be taken into account when calculating monthly means and trends.

[...] be expressed as $ff\,e^{i\phi}$, which is a nonlinear product. Calculating wind direction trends simply by regressing $\phi$ values over time will not always work, partly because of the periodicity of $\phi$. To avoid such problems we split the wind vector into its orthogonal U and V components and regress those independently over time.
1. For the selected time window (normally 20 years), $U_\mathrm{raw}$, $V_\mathrm{raw}$, $U_\mathrm{adj}$ and $V_\mathrm{adj}$ are calculated using unadjusted (raw) and adjusted (adj) data (where the mean is taken over the selected time window);
2. From the $U_\mathrm{raw}$, $V_\mathrm{raw}$ and $U_\mathrm{adj}$, $V_\mathrm{adj}$ time series, the linear trends are calculated. These are multiplied by the length of the time interval to yield the differences $\Delta U_\mathrm{raw}$, $\Delta V_\mathrm{raw}$, $\Delta U_\mathrm{adj}$ and $\Delta V_\mathrm{adj}$;
3. Using the standard inner product definition, the wind direction difference $\Delta\phi$ can be calculated as [...] for raw and adj observations.

Dividing $\Delta\phi$ by the length of the time interval used for calculating $\Delta U$, $\Delta V$ yields the wind direction trend as it is shown color-coded e.g. in Fig. 18.
Wind speed trends are calculated similarly, by calculating first the wind speed difference and then dividing by the time interval.
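The procedure can be sketched in Python as follows. This is a minimal illustration for a gap-free series, not the code used for the figures. The exact inner-product expression is not reproduced in the text, so the sketch assumes that the direction difference is the angle between the fitted wind vector at the start of the window and that vector plus $(\Delta U, \Delta V)$. The sign convention (positive for clockwise turning, i.e. increasing meteorological direction) and the function name are also assumptions.

```python
import numpy as np

def direction_and_speed_trend(years, u, v):
    """Return (direction trend in degrees per year, speed trend per year)
    from independent linear regressions of U and V over time."""
    years = np.asarray(years, dtype=float)
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    span = years[-1] - years[0]

    # Regress U and V independently over time (np.polyfit returns slope first)
    slope_u, icpt_u = np.polyfit(years, u, 1)
    slope_v, icpt_v = np.polyfit(years, v, 1)

    # Fitted wind vector at the start of the window, and at its end,
    # i.e. start + (Delta U, Delta V) with Delta = slope * window length
    start = np.array([icpt_u + slope_u * years[0], icpt_v + slope_v * years[0]])
    end = start + span * np.array([slope_u, slope_v])

    # Angle between the two vectors from the inner product definition
    cos_angle = np.dot(start, end) / (np.linalg.norm(start) * np.linalg.norm(end))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

    # The inner product gives only the magnitude; the sign of the cross
    # product gives the sense of rotation (negative means clockwise turning)
    cross = start[0] * end[1] - start[1] * end[0]
    d_direction = -np.sign(cross) * angle

    d_speed = np.linalg.norm(end) - np.linalg.norm(start)
    return d_direction / span, d_speed / span

# Example: a westerly wind turning to northwesterly over 10 years
t = np.arange(0.0, 11.0)
dir_trend, speed_trend = direction_and_speed_trend(t, np.full(11, 10.0), -t)
```

Working with the fitted U and V lines avoids the wrap-around problem at 0°/360° that a direct regression of the direction values would have. The same function would be called once with raw and once with adjusted data.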
[...] While wind direction errors could [...]

[...] have yet to demonstrate significantly better temporal homogeneity than the NOAA-20CR.

[...] 100, 150, ..., 850 hPa; $t_m$ = 00:00, 12:00 GMT. Under the null hypothesis (homogeneous time series) the distribution of $Q_k$ can be obtained from Monte Carlo simulations, taking possible autocorrelation into account.

[...] times separately. The breaks at station Bismarck shown in Figs. 6-8 are typical examples. Others like the Russian station Aktyubinsk (WMO ID 35229) or Marion Island (WMO ID 68994) highlighted by [...]

[...] shows histograms of observed wind speed for long records over the US at the 300 hPa level. Whereas there are very few measured wind speeds over 40 m s⁻¹ in the 1940s, they are quite common from 1960 onward. When taking the mean, the 5-year averaged wind speeds increase from 14.6 m s⁻¹ to more than 20 m s⁻¹ in the above time frame. This increase is almost solely due to decreasing sampling bias. After the introduction of RADAR tracking in the 1960s only very few observations were missing and the sampling bias vanished. In contrast to the observations, the mean wind speed departures from the NOAA-20CR changed only slightly. Observed winds tend to be higher than NOAA-20CR winds by 0.5 m s⁻¹ in the 1940s and 1.8 m s⁻¹ from the 1960s onwards. The mean is positive [...]

Already the raw observations show visible shifts, e.g. in the wind direction time series of station 072764 (Bismarck, North Dakota, USA) in Fig. 2. Shifts are visible at all the available pressure levels in the years 1938 and 1948. The shifts are much more clearly visible in Fig. 6. It shows the wind direction innovations i τ Φ time series with the NOAA-20CR [...]

The trends are rather sensitive to inhomogeneities; comparison with the trends of neighbouring stations helps to identify where the raw data are affected by them. To estimate the spatial consistency of the trends, the cost function introduced by Haimberger (2007) has been used. In this way, smaller improvements can also be objectively measured.

4.2.1 Wind trends over the US in the early period 1940-1960

In 1940 the US already had a well established upper air network, whereas observations were very rare in other regions. PILOT balloons and radiosondes were launched daily, typically reaching the 500 hPa level. As a first example, we show wind direction trends at 700 hPa in the period 1940-1960 over the US in Fig. 12. The arrows indicate the mean wind speed over the considered period and point in the mean wind direction. The color scale indicates the wind direction trend for the investigated period. The figure shows suspiciously strong trends (more than 10° over 10 years), mainly located over the central US. Station Bismarck (North Dakota, WMO ID 072764) belongs to this group (see Fig. 6). Applying RAOBCORE 2.0 with the NOAA-20CR as reference, around 150 wind direction breaks were detected in over 45 station records in the period 1935-1958. The applied adjustments are visible in Fig. 13, where the arrows represent the adjustment vectors. For the period 1940-1960, the adjustments are generally directed southward, indicating that all the stations are affected by a similar bias related to north alignment. The colors show how the adjustments affect the wind direction trends. The adjusted trends are visible in Fig. 14. They look much more homogeneous, and the cost function has been reduced by a factor of 3 due to the improved spatial and temporal homogeneity. The nature of the breaks affecting the USA in the period 1935-1960 is highlighted in Fig. 15, which shows that the break distribution is clearly bimodal, with absolute values of break sizes definitely greater than 0. This indicates that the adjustments are applied only where the break significance has been carefully verified. The break time distribution also shows that most breaks occurred in 1939 and 1948. S. Schroeder's [...]

[...] 477 breaks have been detected and adjusted by RAOBCORE and the cost function is reduced to about 40 %, which is substantial. The adjusted trends in Fig. 19 show only very few remaining outliers. Wind speed trend patterns are already relatively smooth without homogenization, as shown in Fig. 20 at 200 hPa. Isolated outliers such as Athens (see Fig. 11) or stations in Turkey, Algeria, Republic of South Africa, California, Mexico and South Pole are detected and in most cases adjusted. The cost function is reduced only slightly, by 15 % (Fig. 21).

Fig. 1. Maps of upper air stations of the GRASP data set with at least 5 years of data per decade in the range 850-500 hPa, for different decades. Between the 1960s and the 2000s the difference is relatively small.

Fig. 5. Monthly mean wind speed at 200 hPa, averaged over WMO stations with IDs between 70000 and 75000 (mostly USA and Canada), (a) without filling, (b) with missing data replaced by scaled NOAA-20CR wind values.

Fig. 6. Wind direction (left panel) and wind speed (right panel) analysis departure (Obs − NOAA-20CR) time series at 00:00 GMT at the 850, 700 and 500 hPa levels at station Bismarck (red curves). The blue curves (right axes) show the $Q_k$ time series as defined in Eq. (4). $Q_k$ values above $Q_\mathrm{crit}^\mathrm{WD} = 10$ are statistically significant and are shaded. The small colored triangles on the x axes indicate changes of the radiosonde type. For wind speed, $Q_\mathrm{crit}^\mathrm{WS}$ has been set to 20. The mean of the wind speed differences is positive since NOAA-20CR wind speeds over the US are generally biased low.

Fig. 7. Wind direction (left panel) and wind speed (right panel) adjustments calculated by RAOBCORE from analysis of the departure time series shown in Fig. 6. Wind direction biases are constant in pressure and time between breakpoints. Wind speed adjustments occur only after 1960 at this station. They depend on observed wind speeds and are therefore not constant.

Fig. 9. Vertical profiles of wind direction break size estimates at Bismarck for breaks in 1939 and 1948. The bullets are the estimated differences detected at each pressure level (black and blue at 00:00 GMT, yellow and red at 12:00 GMT). The solid and dotted lines are the respective vertically constant adjustment values (mean values over the whole vertical profiles).

Fig. 10. U (left panel) and V (right panel) wind innovations (upper row), adjustments (middle row) and adjusted innovations (lower row) at the 700 hPa level. Note that a positive adjustment of wind direction as shown in Fig. 7 translates to negative V adjustments depending on the strength of the predominantly westerly winds.

Fig. 11. Wind speed innovations (upper row, red curves) at Athens, Greece, WMO ID 016716, at 300 hPa, at 00:00 GMT (left panel) and 12:00 GMT (right panel). Blue curves are SNHT time series. The SNHT points out relevant inhomogeneities at the same positions at neighboring pressure levels as well (not shown). Middle and lower rows show adjustments and adjusted innovations, respectively.

Fig. 15. Size (left panel) and time (right panel) of wind direction breaks at 99 mainland US stations (WMO numbers between 72000 and 73000) in the period 1935-1960; all available stations are analyzed. The means of positive and negative breaks are indicated.