Zonal-mean data set of global atmospheric reanalyses on pressure levels

This data set, which is prepared for the SPARC-Reanalysis Intercomparison Project (S-RIP), provides several zonal-mean diagnostics computed from reanalysis data on pressure levels. Diagnostics are currently provided for a variety of reanalyses, including ERA-40, ERA-Interim, ERA-20C, NCEP-NCAR, NCEP-DOE, CFSR, 20CR v2 and v2c, JRA-25, JRA-55, JRA-55C, JRA-55AMIP, MERRA, and MERRA-2. The data set will be expanded to include additional reanalyses as they become available. Basic dynamical variables (such as temperature, geopotential height and three-dimensional 5 winds) are provided in addition to a complete set of terms from the Eulerian-mean and transformed Eulerian-mean momentum equations. Total diabatic heating and its long-wave and short-wave components are included as availability permits, along with heating rates diagnosed from the basic dynamical variables using the zonal-mean thermodynamic equation. Two versions of the data set are provided, one that uses horizontal and vertical grids provided by the various reanalysis centers, and another that uses a common grid to facilitate comparison among data sets. For the common grid, all diagnostics are inter10 polated horizontally onto a regular 2.5◦×2.5◦ grid for a subset of pressure levels that are common amongst all included reanalyses. The dynamical (Martineau, 2017, http://dx.doi.org/10.5285/b241a7f536a244749662360bd7839312) and diabatic (Wright, 2017, http://dx.doi.org/10.5285/70146c789eda4296a3c3ab6706931d56) variables are archived and maintained by the Centre for Environmental Data Analysis (CEDA).


Introduction
Reanalysis products are commonly used to study weather and climate variability and to validate climate models.By combining numerical forecast models and various observations through data assimilation procedures, reanalyses aim to produce a best estimate of the state of the atmosphere.However, differences among the model and assimilation components of reanalysis systems, as well as differences in the assimilated observations, result in different representations of the historical state and behavior of the atmosphere.These dis-crepancies contribute to uncertainties in our understanding of the atmosphere and its variability.
The Stratosphere-troposphere Processes And their Role in Climate (SPARC) Reanalysis Intercomparison Project (Fujiwara et al., 2017, S-RIP;) undertakes to compare reanalysis data sets, understand the causes of the differences, and provide guidance on the appropriate usage of reanalyses (all abbreviations are collected in Appendix A).To facilitate this comparison, we have prepared a data set containing zonalmean variables on pressure levels using a consistent set of numerical methods and a unified file format.a Original grid resolution when downloaded from the source; some reanalyses provide data on multiple grids.b Transition from version 1 (CFSR) to version 2 (CFSv2) on 1 January 2011.c For MERRA and MERRA-2, only assimilated (ASM) state products are used (see also discussion by Fujiwara et al., 2017).
The data set comprises two major components.The first component provides variables on an original latitudepressure grid defined by the corresponding reanalysis center (original grid, OG).Note that this grid is typically not defined by the model resolution, nor is it necessarily unique, as some reanalysis products are distributed on a range of grids (Table 1).The second component is a data set for which basic variables have been interpolated onto a common 2.5 • × 2.5 • latitude-longitude grid (common grid, CG).The pressure coordinate for the CG data files is reduced to contain only those levels common to all of the reanalysis data sets, with extension up to 1 hPa when possible.Both data sets are provided on 6 h time intervals.While the OG zonal-mean diagnostics are affected by the horizontal grid on which variables are provided, the CG diagnostics have no such dependence and can be directly compared without further interpolation.
The characteristics of this zonal-mean data set on pressure levels are described in this paper.The reanalysis data sets included in the comparison are listed and briefly described in Sect. 2. The diagnostics provided in the data set are introduced in Section 3, with grid dependence discussed in Sect. 4. The availability of the S-RIP zonal-mean data set and its appropriate usage are outlined in Sect. 5.

Data
The zonal-mean data set on pressure levels includes most major reanalysis products (Table 1), with a total of 14 reanalyses represented.Three of these reanalyses have been produced by the European Centre for Medium-Range Weather Forecasts (ECMWF): the ECMWF 40-year Reanalysis (ERA-40), the ECMWF Interim Reanalysis (ERA-Interim), and the ECMWF 20th Century Reanalysis (ERA-20C).Five of the reanalyses have been produced by the National Centers for Environmental Prediction (NCEP) and cooperating agencies: the NCEP-NCAR (National Center for Atmospheric Research) Reanalysis 1, the NCEP-DOE (Department of Energy) Reanalysis 2, the Climate Forecast System Reanalysis (CFSR), and the National Oceanic and Atmospheric Administration (NOAA) and Cooperative Institute for Research in Environmental Sciences (CIRES) 20th Century Reanalysis (20CR) version 2 (v2) and version 2c (v2c).Note that CFSR products after January 2011 have been produced with the slightly different data assimilation system CFSv2 (Fujiwara et al., 2017, their Sect. 2.4).The 20CR v2c reanalysis uses the same model as 20CR v2, but with new sea ice boundary conditions, among other changes (Gil Compo, personal communication, 2017).Four of the reanalyses have been produced by the Japan Meteorological Agency (JMA): the Japanese 25-year Reanalysis (JRA-25), the Japanese 55-year Reanalysis (JRA-55), and two variants of JRA-55 (JRA-55C and JRA-55AMIP).JRA-55 and its variants all use the same model and boundary conditions; however, while JRA-55 assimilates both conventional and satellite observations, JRA-55C assimilates only conventional observations and JRA-55AMIP does not assimilate any observations.The final two reanalyses included in the data set have been produced by the National Aeronautics and Space Administration (NASA): the Modern-Era Retrospective Analysis for Research and Applications (MERRA) and its successor MERRA-2.Diagnostics for MERRA and MERRA-2 are based on the assimilated (ASM) state, rather than the analyzed (ANA) state (see e.g., Rienecker et al., 2011;Fujiwara et al., 2017).
Horizontal grid sizes in degrees for the reanalysis products used to produce the OG zonal-mean data set are listed in Table 1.All data are on regular latitude-longitude grids.Model-generated diabatic heating products from CFSR and MERRA-2 are remapped directly from the default grids (1 and 0.5 • × 0.625 • , respectively) onto the OG and CG grids listed for these reanalyses in Table 1 using bilinear interpolation.
The highest pressure level provided is also listed in Table 1.The pressure levels included in each data set are shown in Table 2.As discussed below, the grid spacing does not have a large impact on the diagnostics provided in this data set.Inter-reanalysis differences in zonal-mean diagnostics are dominated by reanalysis-specific factors rather than numerical resolution.Differences among reanalysis products emerge from differences in the underlying models, data assimilation techniques, and assimilated observations.A detailed accounting of these differences is beyond the scope of this article.Fujiwara et al. (2017) have recently reviewed many of the technical differences among these reanalyses.A focused intercomparison of monthly mean temperatures and winds conducted by Long et al. (2017) further revealed a good level of agreement among reanalyses, particularly the most recent products, but with a time dependence that emerges from changes in the assimilated observational data.

Diagnostics
With the exception of the model-generated diabatic heating rates (see Sect. 3.6.1),all diagnostics provided in this data set are based on three-dimensional wind (u, v, ω), temperature (T ), and geopotential height (Z) fields provided on levels of constant pressure (p).All diagnostics are evaluated as a function of time t at 6 h intervals and provided along a regularly spaced latitude coordinate (φ).Data access information for core variables is provided in Appendix A, Table A1.
Potential temperature is calculated on pressure levels as follows: where p 0 is a reference pressure (1000 hPa), R d is the gas constant for dry air (287 J K −1 kg −1 ), and c p is the specific heat at constant pressure (1004 J K −1 kg −1 ).The ratio R d /c p is rounded to 0.286.Throughout this paper, the zonal mean of a quantity x is denoted as x, with anomalies from the zonal mean defined as x = x − x.Differences in the preparation of the OG and CG data sets are illustrated in Fig. 1.All calculations for the OG data set are performed on the original grid associated with that reanalysis (Table 1).Note that the OG grids differ from the grids on which the model components of the reanalyses were run and therefore already include errors from interpolation onto coarser grids for data dissemination.For the CG data set, all variables are first interpolated to the common 2.5 • × 2.5 • grid using bilinear interpolation in latitude and longitude.Only common pressure levels listed in Table 2 are kept.Diagnostics are then computed on the common grid before the zonal mean is taken.

Numerical methods
A three-point stencil is used to evaluate all derivatives.In the case of meridional derivatives the three-point stencil is expressed as where φ is latitude in radians.This scheme, which has an accuracy of the order of ( φ) 2 , is chosen for its ability to evaluate derivatives close to the boundaries.In the case of vertical derivatives the three-point stencil is expressed as Since pressure levels are not evenly spaced, the centered difference in the vertical is first computed for half-levels (in the pressure domain) and then linearly interpolated (still in the pressure domain) back to the original pressure levels.No extrapolation is performed; vertical derivatives at the lowermost and uppermost pressure levels are not provided.

Core zonal-mean variables
The core of the data set consists of simple zonal-mean diagnostics.Zonal-mean variables such as zonal wind, meridional wind, temperature, and geopotential height are provided (Table 3).These basic quantities are then used to produce the advanced diagnostics.

Covariance terms
Several covariance terms are provided (Table 4).The covariance between zonal wind and meridional wind (momentum flux) and the covariance of meridional wind and temperature (heat flux) are often used to assess the propagation of eddies in the zonal-mean framework.These covariance terms also enter the computation of the transformed-Eulerian-mean (TEM) and Eliassen-Palm (EP) flux diagnostics (described For the covariance terms of individual zonal wavenumbers, _k# is appended to the variable name where # is the wavenumber. below).The variances of zonal wind (u 2 ) and meridional wind (v 2 ) can be used to compute eddy kinetic energy as EKE = 1/2 u 2 + v 2 .Covariance terms and the advanced diagnostics presented in the following sections are provided for the sum of all eddies in addition to zonal wavenumbers 1, 2, and 3.

Eulerian-mean momentum diagnostics
The zonal-mean tendency of zonal wind ∂u ∂t is expressed using the primitive momentum equation on pressure levels: where f is the Coriolis frequency (f = 2 sin φ), is the rotation rate of the Earth (7.2921 × 10 −5 rad s −1 ), and a is the mean radius of the Earth (6 371 000 m) (Andrews et al., 1987;see Salby, 1996, for the transformation from log-pressure coordinates to isobaric coordinates).The first five terms on the right-hand side of Eq. ( 4) are provided as Eulerian-mean momentum diagnostics (Table 5).The residual term includes the effects of parameterized processes, diffusion, and errors in the numerical methods.This term may be evaluated by subtracting the sum of the five terms listed in Table 5 from the zonal-mean tendency of zonal wind.

Primitive-equation version
Transformed-Eulerian-mean (TEM) momentum diagnostics (Table 6) based on the primitive momentum equation are provided on pressure levels (e.g., Andrews et al., 1987;Salby, 1996).The residual circulation is first defined as follows: Substituting Eq. ( 5) into Eq.( 4), we obtain the TEM equation: Here, the Eliassen-Palm (EP) flux is a two-dimensional vector defined as with the divergence operator in spherical coordinates defined as The residual ( ) is mathematically equivalent to as defined in the Eulerian-mean framework (Eq.4).

Quasi-geostrophic (QG) approximation
The quasi-geostrophic (QG) version of the TEM equation (Edmon et al., 1980) is expressed as where the QG EP flux takes the following form: For the contribution of individual zonal wavenumbers, _k# is appended to the variable name where # is the wavenumber.
For the contribution of individual zonal wavenumbers, _k# is appended to the variable name where # is the wavenumber.
Although we use the QG form of the momentum equation, the resulting diagnostics are not strictly QG since the total wind is used rather than the geostrophic wind.The vertical and meridional components of the QG EP flux and EP flux divergence (EPFD) are provided in addition to the primitiveequation terms (Table 6).
3.6 Diabatic heating rates 3.6.1 Model-generated heating rates Zonal-mean diabatic heating rates generated by a subset of the reanalysis forecast models are provided at 6 h resolution in units of K day −1 .Unlike the other diagnostics, these terms are not derived using the basic variables listed in Table 3.Instead, these zonal-mean diabatic heating rates are computed from the physical temperature tendency diagnostics produced during the reanalysis model forecast step.This approach allows for separate analyses of total, radiative, and non-radiative heating.Not all reanalyses store these forecast products or make them publicly available.Heating rates in the S-RIP zonal-mean data set are provided for only eight of the 14 reanalyses: ERA-40, ERA-Interim, NCEP-NCAR, CFSR, JRA-25, JRA-55, MERRA, and MERRA-2.Heating rate forecasts were not archived for CFSv2 and are therefore only available for CFSR through December 2010.Data access information for all heating rate products is provided in Appendix A, Table A2.Diabatic heating is a fundamental component of the temperature budget, as expressed by the thermodynamic equation in pressure coordinates: The terms on the left-hand side of Eq. ( 11) constitute the material derivative of potential temperature Dθ Dt .The term on the right-hand side represents diabatic heating due to physical processes, such as latent heating, radiative transfer, and vertical diffusion.Three diabatic terms are included with the OG and CG zonal-mean data sets (Table 7): total diabatic heating due to all parameterized physics, diabatic heating due to long-wave radiative transfer, and diabatic heating due to shortwave radiative transfer.These terms, which are pro-Earth Syst.Sci.Data, 10, 1925Data, 10, -1941Data, 10, , 2018 www.earth-syst-sci-data.net/10/1925/2018/  vided by the reanalyses as temperature tendencies, are converted here to potential temperature tendencies.As the variables are provided on pressure levels, they can be easily converted back to zonal-mean temperature tendencies if desired.Note that the diabatic heating terms are based on average temperature tendencies over each 6 h window.Accordingly, these data are centered at 03:00, 09:00, 15:00, and 21:00 Z rather than the standard synoptic times 00:00, 06:00, 12:00, and 18:00 Z.Other diagnostics provided in the OG and CG zonal-mean data sets are based on instantaneous fields at the standard synoptic times.The diabatic heating rates thus lag these other diagnostics by 3 h.

Diagnosed heating rates
A complementary set of heating rates is diagnosed using a subset of the zonal-mean dynamical quantities introduced above.These heating rates have been calculated based on analysis of the zonal-mean thermodynamic equation: which are provided together with dynamical heat transport terms.Terms on the left-hand side that are functions of potential temperature are obtained from the corresponding terms expressed as functions of temperature (see Tables 3 and 4).For example, vertical fluxes of potential temperatures are obtained from ω T using the identity ω θ = ω T (p/p 0 ) −κ , with κ approximated as 0.286 as in Eq. ( 1).The term X , which can be computed as a residual by substituting the model-generated diabatic heating into Eq.( 12), represents the effects of assimilation increments, but also includes numerical errors and methodological differences.For example, whereas model-generated heating rates are computed as averages over the forecast step, diagnosed heating rates are computed using instantaneous analysis winds and temperatures at the beginning and end of each forecast step.The extent to which the residual term may be considered to approximate the assimilation increment is discussed further below.
Table 8 lists the diabatic and dynamical heating terms calculated based on Eq. ( 12).As with the model-generated di-abatic heating rates, all terms are provided as functions of potential temperature in units of K day −1 .Although these diagnostics are based on the core variables and covariance terms, they are constructed to be valid at 03:00, 09:00, 15:00, and 21:00 Z to match the model-generated diabatic heating rates.The time derivative ∂θ /∂t is calculated as a central difference, while the meridional and pressure derivatives are calculated by applying the numerical methods described in Sect.3.1 to quantities averaged over the two time steps bracketing each window.

Comparison of the OG and CG data sets
All diagnostics are provided on two distinct grids (see Fig. 1 for a flowchart of the processes involved).For the original grid data set, the diagnostics described in Sect. 3 are calculated on the grid specific to the corresponding reanalysis (see Table 1).We emphasize again that these are not the model grids used to produce the reanalysis forecast background states but coarser grids onto which data have been interpolated for public distribution.Differences in the diagnostics among reanalyses are therefore influenced by numerical resolution.Small-scale processes could affect the computation of co-variance terms, with reanalysis products on a finer grid potentially providing more accurate representations.These data, in addition, cannot be compared directly across reanalyses due to the variety of grids unless the user performs area integrals or interpolations.By contrast, the common grid data set provides all diagnostics on a common 2.5 • ×2.5 • grid and uses only those pressure levels common to all 14 reanalyses (Table 2).Reanalyses that provide data at finer resolutions are regridded in the horizontal dimension using bilinear interpolation.All diagnostics are calculated after this interpolation is performed.This approach enables direct comparison of all reanalyses, but the interpolation may introduce additional numerical errors.Users who wish to compare reanalyses while still taking advantage of the finer resolution in the OG data set may take the computationally inexpensive approach of interpolating OG zonal-mean diagnostics onto a common grid.
The impact of the grid transformation on variables provided in these data sets is tested for some selected diagnostics.Figure 2 shows geopotential height contours for all reanalyses using data from the OG and CG data sets at www.earth-syst-sci-data.net/10/1925/2018/ Earth Syst.Sci.Data, 10, 1925-1941, 2018  00:00 UTC, 1 January 1980.Contours based on OG and CG data for each reanalysis are virtually indistinguishable from each other.Inter-reanalysis differences are far larger than discrepancies between grid types.Data sets that assimilate surface observations only (E20, 20CR2, 20CR2c) or conventional observations only (JRA55C) tend to differ more from the main group of reanalyses that assimilate upper atmosphere and satellite data, especially at high altitude.Figure 3 shows the vertical profile of zonal wind averaged over the Northern Hemisphere high latitudes at 00:00 UTC, 1 January 1980.Despite some small differences, results based on the OG and CG data sets are generally similar.Discrepan-cies between the two grid types can be partly attributed to uncertainties in the interpolation procedure; however, these discrepancies are again smaller than differences across reanalyses.
The differences between the CG and OG profiles shown in Fig. 3 can be attributed to latitudinal resolution.The number of grid points included in the latitude band used to compute the average differs between CG and OG data based on the same reanalysis data set.These differences can influence budget averages.The OG and CG data sets are virtually indistinguishable when comparing latitudinal profiles Earth Syst.Sci.Data, 10, 1925Data, 10, -1941Data, 10, , 2018 www (not shown), indicating that the interpolation used to create the CG data set has little influence on zonal-mean quantities.
Although zonal-mean quantities are largely insensitive to grid spacing and interpolation, flux terms may be more sensitive.Figure 4 shows the vertical profile of EP flux divergence, a quantity that depends on both the horizontal resolution (for computing heat and momentum fluxes) and the vertical resolution (for computing vertical derivatives).Again, differences between EP flux divergence computed using CG data and EP flux divergence computed using OG data are typically small relative to differences across reanalyses.As above, some of the differences between the CG and OG diagnostics are due to the different numbers of points that go into the meridional average, but differences in vertical resolution also play an important role.The latter is particularly apparent in the differences between the CG and OG pro- files for MERRA and MERRA-2 between 30 and 100 hPa.The OG and CG profiles for NCEP-NCAR and NCEP-DOE, for which the OG and CG grids are identical in this latitude range, are virtually the same at all levels.
Figure 5 shows EP flux divergence as a function of latitude.Although the values of this diagnostic are typically similar between the CG and OG data sets, they differ substantially in some locations.This is especially evident for ERA-20C at 300 hPa around 37 • N.Such differences likely result from differences in the relative contributions of smallscale eddies, which are enhanced when computations are performed using OG data and reduced when computations are performed using CG data (note also that ERA-20C has the finest OG grid spacing among the reanalyses; Table 1).EP flux divergence also varies substantially among reanalyses near the pole.These inter-reanalysis differences are likely related to differences in representations of eddy fluxes at the poles among reanalyses.We therefore recommend that users of this data set be cautious in interpreting behavior near the boundaries and avoid using certain diagnostics in these regions.
The sensitivity of momentum diagnostics to numerical resolution has been evaluated separately in both horizontal and vertical dimensions by Martineau et al. (2016).Although enhanced vertical resolution improves dynamical consistency (i.e., the ability to explain the wind tendency as a function of the forcing terms) in the upper troposphere and lower stratosphere, gains in dynamical consistency in the middle stratosphere are mainly achieved by reducing the horizontal grid spacing.However, these improvements were merely incremental in both cases.The small differences in grid spacing between the OG and CG data sets are thus not expected to substantially affect the conclusions of studies that use these data sets.
Overall, differences between the OG and CG diabatic heating diagnostics are similar to those for other variables: very small in zonal-mean fields, slightly larger for area averages, and typically much smaller than inter-reanalysis differences.The latter two features are illustrated in Fig. 6a, which shows time series of area-mean model-generated diabatic heating at 50 hPa averaged over 60 to 90 • N from January through March 2009 (Manney et al., 2009;Harada et al., 2010).These time series include the sharp intensification and subsequent decay in diabatic cooling associated with a stratospheric sudden warming that occurred around 24 January 2009.Differences between the OG and CG data sets are negligible Earth Syst.Sci.Data, 10, 1925Data, 10, -1941Data, 10, , 2018 www • N, along with (c) residual terms reflecting the differences between the diagnosed and model-generated heating rates.Time series are shown for the period between 00:00 UTC, 1 January, and 00:00 UTC, 1 April 2009.Note that a 2-day equally weighted rolling mean is applied to the diagnosed heating rates and residual terms.
Values are shown at 6 h intervals for the original grid (lines) and 24 h intervals for the common grid (x; 03:00 UTC every day) based on reanalyses with available data for the selected time period and isobaric level (colors).The vertical shaded region indicates 24 January.
through most of the period shown, except for the weeks immediately following the sudden warming.Although qualitative variations are consistent between the two data sets, CG cooling rates are enhanced relative to OG cooling rates during portions of this period by magnitudes approaching 0.2 K day −1 .Users should be aware of the potential for these types of quantitative biases when using the OG and/or CG data sets to study temporal variations in area-mean quantities.Figure 6b shows a similar time series of heating rates diagnosed from the core dynamical variables using the zonalmean thermodynamic equation.Although the qualitative behavior of the diagnosed heating rates is similar to that of the model-generated forecasts through most of the comparison period, the two estimates differ substantially in the lead-up to the sudden warming and the days immediately afterward.Three aspects of these diagnosed heating rates are worth noting here.First, the diagnosed heating rates are considerably noisier than the model-generated heating rates, particularly at 6-hourly time resolution.This noise arises largely from variance in the vertical velocity ω (accruing at least in part from using the average of two instantaneous values bracketing the 6 h time step rather than average values across the time step), as well as numerical errors during the diagnostic step.For practical applications, the noise can be reduced by applying a rolling mean.The rolling mean is applied using a Hamming window spanning 2 days (nine time steps) for the time series shown in Fig. 6b.Second, differences between the OG and CG data sets are much larger for the diagnosed heating rates than for the model-generated heating rates.For the time series shown in Fig. 6b, these differences are especially pronounced during the period leading up to the sudden warming (as large as 0.5-0.7 K day −1 in the diagnosed total heating rate).Several of the terms on the left-hand side of Eq. ( 12) increase sharply in absolute magnitude during this period, with the differences between the OG and CG representations of these terms increasing at the same time.Third, the diagnosed heating rates do not extend over the entire polar cap.(a-d) and the OG diabatic data set described in this paper (e-h).Terms include the time rate of change (a, e), the effects of dynamics (b, f), the analysis increment and residual term (c, g), and the model-generated diabatic heating due to parameterized physics (d, h).Here, the two left panels in each row should be considered as one side of the heat budget equation, with the two right panels on the opposite side.
Edge effects eliminate the data at 90 • and adversely impact the quality of the data at 87.5 • ; note, however, that calculating average model-generated heating rates over the 60-85 • N domain has little influence on the time series shown in Fig. 6a.
Figure 6c shows the time evolution of the residual term, calculated as the difference between the diagnosed and model-generated heating rates at every time step.As with the diagnosed heating rates, a 2-day rolling mean has been applied to reduce noise.The residual term in this part of the atmosphere is generally close to zero, with the exception of large positive values (in some cases larger than 3 K day −1 ) around the time of the sudden warming event.As mentioned above, the residual term in this data set may be taken as one way of approximating the zonal-mean assimilation increment or "analysis tendency" in potential temperature (e.g., Wright and Fueglistaler, 2013), with the large magnitudes of the residual terms thus reflecting the importance of observational data assimilation in keeping the models on track as the sudden warming develops.However, there are several sources of uncertainty that should be taken into account when treating the residual term in this way.First, modelgenerated heating rates are based on accumulations over each forecast step, whereas the diagnosed heating rates are calculated using instantaneous analysis temperatures and winds.These discrepancies can be mitigated to some degree by averaging over longer intervals but should nonetheless be taken into account.Second, the diagnosed heating rates are based on finite differences applied on a reduced set of lat-itudes and pressure levels relative to the native model grid.Our diagnosed estimates compare well with direct estimates of potential temperature tendencies due to dynamics from MERRA-2 at monthly timescales (Fig. 7); however, some features are displaced slightly in pressure or latitude.These displacements then display as differences between the residual term and the analysis increment.Third, the diagnosed heating rates account for assimilation increments in winds in addition to those in temperatures.The logical relationship between the residual term and the assimilation increment as traditionally defined (analysis minus forecast temperatures) thus depends on the type of data assimilation used by the reanalysis system (see also Fujiwara et al., 2017).For systems based on three-dimensional variational (3D-Var) assimilation techniques (such as ERA-40, JRA-25, NCEP-NCAR R1, or CFSR), the assimilation increment effectively reflects only adjustments to temperatures within the assimilation window.By contrast, temperatures including in the zonal-mean dynamical data set for systems using incremental analysis update (IAU; such as MERRA and MERRA-2) or incremental 4D-Var techniques (such as ERA-Interim or JRA-55) ultimately reflect the assimilation of multiple types of observations, including winds.The residual term therefore has a closer conceptual relationship with temperature increments under IAU or 4D-Var (for which analyzed winds are influential) than those under traditional 3D-Var (for which they are not).Finally, it is well known that gravity waves redistribute heat vertically (Medvedev and Klaassen, 2003).As gravity waves are parameterized in reanalysis models, their effects Earth Syst.Sci.Data, 10, 1925Data, 10, -1941Data, 10, , 2018 www.earth-syst-sci-data.net/10/1925/2018/ on the heat budget appear in the residual term when the thermodynamic equation is evaluated based only on resolved variables.Enhanced gravity wave activity observed around the onset of stratospheric sudden warming events (e.g., Albers and Birner, 2014) may therefore play some part in the large residual terms seen in Fig. 6c, although it is necessary to note that some systems (e.g., MERRA and MERRA-2) explicitly include the effects of gravity waves in their estimates of heating due to parameterized physics.

Data usage and availability
The S-RIP zonal-mean data set of reanalyses on pressure levels provides preprocessed zonal-mean diagnostics using unified NetCDF-4 classic file format.The main purpose of making this data set publicly available is to reduce the workload of researchers contributing to the S-RIP project by providing diagnostics that are commonly needed for reanalysis intercomparison, particularly in the middle atmosphere.The provision of preprocessed data will also save users the need to download and store dozens of terabytes of data.Producing the data set locally using a standardized set of computer codes ensures that the diagnostics are consistent among the reanalyses.
The dynamical (Martineau, 2017) and diabatic (Wright, 2017) components of the data set are archived and maintained by the Centre for Environmental Data Analysis (CEDA) and have been made CF (Climate and Forecast) compliant when possible.All NetCDF files are fully annotated with descriptions of variables and units.A user manual describing the files in detail is provided.The data set is bound to evolve in the future as new reanalysis products are introduced and is also being updated to include additional data as reanalyses are extended in time.For instance, ERA5, which is only partially available at the time of this publication, will be added in the future.Since this data set is intended to serve the needs of the S-RIP community, it may be extended to include additional diagnostics as dictated by user requirements.

Summary
The S-RIP zonal-mean data set of reanalyses on pressure levels aims to facilitate the comparison of reanalysis data sets for the S-RIP community and the general atmospheric science community at large.In its current iteration, the data set includes 14 reanalyses and ancillary products from multiple research institutes.It covers the satellite era (1979-present) and extends backward in time to 1958 when data are available.Diagnostics provided include zonal-mean variables, diabatic heating, covariance and variance terms, and complete diagnostics from the Eulerian-mean and transformed-Eulerian-mean momentum equations.The diagnostics are provided on two grids, the original grid (OG) where diagnostics are performed on the original files acquired from each reanalysis center and the common grid (CG) where data are interpolated to a unified grid before advanced diagnostics are performed.The data set will grow in time to include more reanalyses and variables, as dictated by the evolving needs of the S-RIP community.

Figure 2 .
Figure2.Selected geopotential height contours on 1 January 1980 at 00:00 UTC for three isobaric surfaces: 1 hPa (top; Z = 46 km), 10 hPa (center; Z = 30.5 km), and 500 hPa (bottom; Z = 5.5 km).Contours are displayed for each reanalysis according to color legend.Contours based on the OG data set are shown using solid lines.Black dots are added to contours based on the CG data set.The two sets of contours are indistinguishable for this case.Due to space constraints, these longitude-dependent fields are not included in the core data set.

Figure 3 .Figure 4 .
Figure 3. Vertical profiles of zonal wind averaged from 40 to 90 • N at 00:00 UTC, 1 January 1980.Profiles are shown for both the common grid (dashed, x) and original grid (solid, o) data based on different reanalyses (colors).Tick marks (x, o) indicate pressure levels included in the corresponding grid.

Figure 5 .
Figure 5. EP flux divergence as a function of latitude for the 1 hPa (a), 10 hPa (b), 100 hPa (c), and 300 hPa (d) pressure levels at 00:00 UTC, 1 January 1980.Values are shown for both the common grid (dashed, x) and original grid (solid, o) data based on different reanalyses (colors).Tick marks (x, o) indicate latitude points included in the corresponding grid.

Figure 6 .
Figure 6.Time series of total diabatic heating on the 50 hPa isobaric surface based on (a) model-generated forecasts averaged over 60 to 90 • N and (b) analysis of the zonal-mean thermodynamic equation averaged over 60 to 85 • N, along with (c) residual terms reflecting the differences between the diagnosed and model-generated heating rates.Time series are shown for the period between 00:00 UTC, 1 January, and 00:00 UTC, 1 April 2009.Note that a 2-day equally weighted rolling mean is applied to the diagnosed heating rates and residual terms.Values are shown at 6 h intervals for the original grid (lines) and 24 h intervals for the common grid (x; 03:00 UTC every day) based on reanalyses with available data for the selected time period and isobaric level (colors).The vertical shaded region indicates 24 January.

PFigure 7 .
Figure 7.Zonal-mean distributions of potential temperature tendencies (K day −1 ) within 60 • S and 60 • N and between 300 and 10 hPa.Distributions are averaged over January 2009 and are based on direct MERRA-2 outputs (a-d) and the OG diabatic data set described in this paper (e-h).Terms include the time rate of change (a, e), the effects of dynamics (b, f), the analysis increment and residual term (c, g), and the model-generated diabatic heating due to parameterized physics (d, h).Here, the two left panels in each row should be considered as one side of the heat budget equation, with the two right panels on the opposite side.

Table 1 .
List of reanalyses represented in the S-RIP zonal-mean data set.

Table 2 .
Vertical levels of the CG and OG data sets.Pressure levels provided in the OG data set are indicated with x and pressure levels provided in the CG data set are highlighted in bold-italic type.
Flowchart illustrating differences in the calculation of diagnostics in the original grid (OG) and common grid (CG) data sets.