the Creative Commons Attribution 4.0 License.
LARA: a Lagrangian Reanalysis based on ERA5 spanning from 1940 to 2023
Abstract. Meteorological reanalyses are crucial datasets in atmospheric research, providing the foundation for many scientific applications. However, most reanalyses follow an Eulerian framework, providing data at fixed points in space and time. This fixed-location approach is suitable for many scientific analyses, but studies of transport in the atmosphere would benefit from a Lagrangian framework, which provides data along continuous trajectories that follow the movement of air.
To achieve this, the Lagrangian particle dispersion model FLEXPART was driven offline with data from ERA5, the latest reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF), to convert the Eulerian ERA5 data into a Lagrangian format. FLEXPART uses the grid-scale winds from ERA5 and stochastic parameterisations of turbulence and convection to advect particles in a domain-filling mode, in which the global atmosphere is represented by 6 million particles that move freely, with their number density closely following the density of air. The resulting Lagrangian Reanalysis (LARA) dataset has been stored in an easily searchable database and made accessible to researchers worldwide. It will enable a wide range of studies, including global and regional analyses of extreme events, water and energy transport in the atmosphere, and atmospheric energy budgets.
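In domain-filling mode every particle carries the same share of the atmospheric mass, which is why particle number density tracks air density. The back-of-envelope sketch below makes this concrete. The total atmospheric mass (about 5.15 × 10^18 kg), the 1000 hPa surface pressure and the 1° × 1° equatorial column are assumed illustrative values, not numbers taken from the LARA paper.

```python
import math

TOTAL_AIR_MASS_KG = 5.148e18  # approximate mass of the global atmosphere (assumed)
N_PARTICLES = 6_000_000       # particle count used for LARA
EARTH_RADIUS_M = 6.371e6      # mean Earth radius
G = 9.80665                   # standard gravity (m s^-2)


def particle_mass(total_mass_kg: float, n_particles: int) -> float:
    """Equal mass carried by every particle in a domain-filling run."""
    return total_mass_kg / n_particles


def column_air_mass(lat_s_deg: float, lat_n_deg: float, dlon_deg: float,
                    surface_pressure_pa: float) -> float:
    """Hydrostatic air mass of a lat-lon column: m = p_s / g * area."""
    area = (EARTH_RADIUS_M ** 2 * math.radians(dlon_deg)
            * (math.sin(math.radians(lat_n_deg)) - math.sin(math.radians(lat_s_deg))))
    return surface_pressure_pa / G * area


m_p = particle_mass(TOTAL_AIR_MASS_KG, N_PARTICLES)
# Expected number of particles in a 1 x 1 degree equatorial column at 1000 hPa
n_column = column_air_mass(0.0, 1.0, 1.0, 1.0e5) / m_p
print(f"mass per particle ~ {m_p:.2e} kg; particles per column ~ {n_column:.0f}")
```

With these assumed numbers each particle represents roughly 8.6 × 10^11 kg of air, and an equatorial 1° × 1° column holds on the order of 150 particles. Columns at high latitudes or over high terrain hold correspondingly fewer.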
Here, we describe the data format and how the data can be accessed and analysed. Using four examples, we give a non-exhaustive list of possible applications of LARA. We show how the evolution of air masses and their properties can be studied and how climatologies can be established. Our examples include a study of the evolution of the Hadley cell circulation, a climatology of warm conveyor belt events, a measure of continentality based on the time it takes air to reach land from the ocean, and an evaluation of the dynamical consistency between subsequent ERA5 meteorological fields.
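A warm conveyor belt climatology starts by selecting strongly ascending trajectories. The sketch below assumes hourly particle pressures and the "at least 600 hPa of ascent within 48 h" threshold that is commonly used in warm conveyor belt studies. The function name `is_wcb`, its thresholds and the example trajectories are illustrative and not necessarily what the LARA authors used.

```python
def is_wcb(pressure_hpa, dt_hours=1.0, ascent_hpa=600.0, window_hours=48.0):
    """Return True if the trajectory ascends by at least `ascent_hpa`
    within any window of `window_hours` (pressure decreases during ascent)."""
    steps = int(round(window_hours / dt_hours))
    for i, p_start in enumerate(pressure_hpa):
        # Lowest pressure (highest point) reached within the window starting at i
        p_min = min(pressure_hpa[i:i + steps + 1])
        if p_start - p_min >= ascent_hpa:
            return True
    return False


# Hypothetical hourly trajectories (hPa)
fast = [950.0 - 650.0 * t / 40 for t in range(41)]  # 650 hPa ascent in 40 h
slow = [950.0 - 6.5 * t for t in range(101)]       # 650 hPa ascent in 100 h
print(is_wcb(fast), is_wcb(slow))  # True False
```

The brute-force window search is O(n × window) per trajectory, which is adequate for a sketch; a sliding-window minimum would scale better for millions of particles.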
Status: open (until 14 May 2025)
RC1: 'Comment on essd-2025-26', Anonymous Referee #1, 22 Apr 2025
Review of ‘LARA: a Lagrangian Reanalysis based on ERA5 spanning from 1940 to 2023’ by L. Bakels et al.
General comments
This paper introduces LARA (Lagrangian Reanalysis), a novel global dataset created by converting ERA5 reanalysis data into a Lagrangian format using the FLEXPART model. Unlike traditional Eulerian reanalyses, LARA tracks six million air particles from 1940 to 2023, providing high-resolution, hourly data on their positions and atmospheric properties. This dataset supports studies on atmospheric transport, energy and moisture fluxes, and extreme events. It is validated for internal consistency. LARA is available in Zarr data format, accompanied by analysis scripts and four illustrative applications, such as Hadley cell evolution and diagnostics of data assimilation discontinuities in ERA5. Overall, LARA appears to be a powerful tool for Lagrangian-based atmospheric research.
The methodology of the study is sound. FLEXPART is a reliable model, and the transition to a Lagrangian perspective is well-justified. The validation is appropriate, and the analysis is carefully executed, offering a valuable dataset with wide applications.
One general comment that remains open is how the number of six million air parcels used in LARA was determined. This number is briefly discussed in the conclusion but seems to be based on computational and storage constraints rather than on a sensitivity analysis. Clarifying whether tests were conducted to assess the adequacy of this particle count would strengthen the manuscript. It would also be useful to discuss whether the scientific results in Section 4 would remain robust with different particle counts.
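One simple way to frame the particle-count question is through sampling noise: any diagnostic that counts particles in a grid cell inherits a statistical uncertainty that shrinks only as 1/√N. The sketch below treats particles in a cell as independent samples, which is only approximately true for correlated trajectories, and uses a hypothetical cell of about 150 particles with an illustrative fraction p = 0.2. A complementary empirical test would recompute a Section 4 diagnostic from random subsets of the particles.

```python
import math


def fraction_standard_error(p: float, n_particles: int) -> float:
    """Binomial standard error of a fraction p estimated from n particles."""
    return math.sqrt(p * (1.0 - p) / n_particles)


# Hypothetical cell holding ~150 particles at the nominal global count;
# halving or doubling the global count scales the cell count proportionally.
for n in (75, 150, 300):
    print(n, round(fraction_standard_error(0.2, n), 4))
```

Halving the particle count inflates the noise by a factor of √2, so a sensitivity test would mainly matter for small regions, short averaging periods, or rarely populated layers such as the upper stratosphere.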
Overall, the manuscript is well written, scientifically rigorous, and uses proper terminology and referencing. It follows a clear structure and presents concepts and results in an accessible and precise manner. While some sections are detailed, this aligns with the paper’s dual role as a dataset description and usage guide. In short, the manuscript meets high standards of clarity, conciseness, and scientific communication. I recommend publication subject to addressing the specific comments and technical corrections listed below.
Specific comments
Lines 34-39: Several major reanalyses, including MERRA and JRA-55, are mentioned, but it would be useful to note that newer versions are available, MERRA-2 and JRA-3Q. Citing a recent reanalysis intercomparison, such as from the S-RIP/A-RIP initiatives, would provide broader context on how these datasets compare in performance and scope.
Lines 70-73: The Lagrangian datasets by Sodemann et al. and Vázquez et al. are described as ‘available on request’, but it’s unclear how they can be accessed. If they are not openly available in a FAIR-compliant manner, it may be better to omit or clarify these statements to avoid misleading readers, especially compared to the open-access nature of LARA.
Line 74: The phrase ‘showing large decrease in particle distribution degradation in the troposphere over time’ is somewhat unclear. It would help to briefly explain what is meant by ‘particle distribution degradation’, such as whether it refers to deviations in air density, loss of particle ensemble representativeness, or numerical diffusion effects.
Line 78: The manuscript refers to the ‘full period of 1940–2024 ERA5 reanalysis’, but the title states 1940–2023. This discrepancy should be addressed for consistency, clarifying if the dataset includes data through March 2024, as mentioned later in the paper.
Line 82: Instead of providing just a web link, include a proper citation with a DOI for the LARA dataset to ensure long-term accessibility and facilitate referencing, in line with best practices for data publication. Unfortunately, at the time of this review the web link was not accessible; the website showed an 'Internal Server Error'.
Table 1: The additional notes in the ‘unit’ column are unclear. For example, it's not obvious what is meant by etadot having units of s⁻¹ but ‘internally: Pa s⁻¹’. If these notes are crucial, move them to footnotes beneath the table for better clarity without cluttering the unit column.
Line 93: The manuscript mentions using ERA5 data at 0.5° × 0.5° resolution, although ERA5 is (natively) available at 0.25° × 0.25°. Clarifying why the reduced resolution was chosen (e.g., due to computational or storage constraints) would help improve transparency regarding the dataset's accuracy.
Line 121: A brief explanation of the ‘domain-filling option’ would help readers unfamiliar with the concept. Clarifying that it initializes particles to uniformly represent the entire atmospheric mass and discussing its relevance to mixing and atmospheric transport would enhance clarity.
Lines 126–128: Since the data processing was split into overlapping streams, clarify if consistency checks were done across stream boundaries. It would be useful to mention whether users need to account for potential artifacts or discontinuities when working across different stream periods.
Lines 126–130: The explanation of overlapping periods between computational streams is unclear. The text first states ‘one full year for each period’, then mentions a ‘three-month overlap’, which is confusing. Please clarify whether the overlap is one year or three months, and explain how it ensures continuity in trajectory tracking.
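To make the overlap question concrete, the sketch below tiles a period with contiguous "core" periods, each preceded by a spin-up overlap that is discarded. The 7-year core and 1-year spin-up are assumptions chosen for illustration (they reproduce the "12 eight-year periods" reading raised under line 141), not LARA's documented configuration, and the function name `split_streams` is hypothetical.

```python
def split_streams(first_year, last_year, core_years, spinup_years):
    """Tile [first_year, last_year] with core periods of `core_years`; each
    stream starts `spinup_years` earlier so particles can adjust before its
    core period (no spin-up is possible before the first year)."""
    streams = []
    start = first_year
    while start <= last_year:
        end = min(start + core_years - 1, last_year)
        spinup_start = max(first_year, start - spinup_years)
        streams.append((spinup_start, start, end))
        start = end + 1
    return streams


for stream in split_streams(1940, 2023, core_years=7, spinup_years=1):
    print(stream)  # (spin-up start, core start, core end)
```

Under this layout, a continuity check could compare statistics such as particle density deviations from ERA5 in the last overlap year of one stream with the same year computed by the next stream.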
Lines 132–136: While the impact of time step length on trajectory accuracy is discussed, the numerical integration scheme used in FLEXPART is not mentioned. Specifying the integration method (e.g., Runge-Kutta, Euler) would provide a more complete understanding of the model setup and accuracy.
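Why the integration scheme matters can be illustrated with a generic test case. The sketch below advects a point in solid-body rotation, where the exact trajectory is a circle of radius 1, with a forward Euler step and a second-order midpoint (RK2) step. These are textbook schemes chosen for illustration; FLEXPART's own scheme should be taken from its documentation, not from this example.

```python
import math


def velocity(x, y, omega=1.0):
    """Solid-body rotation: u = -omega*y, v = omega*x (circular streamlines)."""
    return -omega * y, omega * x


def euler_step(x, y, dt):
    u, v = velocity(x, y)
    return x + dt * u, y + dt * v


def midpoint_step(x, y, dt):
    u1, v1 = velocity(x, y)
    xm, ym = x + 0.5 * dt * u1, y + 0.5 * dt * v1  # half step to the midpoint
    u2, v2 = velocity(xm, ym)                      # wind sampled at the midpoint
    return x + dt * u2, y + dt * v2


def radius_after(step, n_steps, dt):
    x, y = 1.0, 0.0
    for _ in range(n_steps):
        x, y = step(x, y, dt)
    return math.hypot(x, y)


dt = 2 * math.pi / 100  # 100 steps per revolution
print("Euler radius:   ", radius_after(euler_step, 100, dt))
print("Midpoint radius:", radius_after(midpoint_step, 100, dt))
```

After one revolution the Euler trajectory has spiralled outward by about 20 %, while the midpoint trajectory stays within about 0.02 % of the true circle, which is why the scheme and time step are worth stating alongside each other.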
Lines 139–141: The inclusion of compiler flags and OpenMP settings may be overly detailed for an ESSD paper. Unless these settings affect the results or reproducibility, consider omitting them for clarity and focus.
Lines 144–145: The dataset provides hourly means for several variables, but adding additional statistics like minimum, maximum, or standard deviation could be valuable, especially for applications involving extremes or uncertainty. If not included, note whether these could be added in future versions.
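Minimum, maximum and standard deviation can be accumulated in a single pass alongside the mean, so adding them need not require storing sub-hourly values. The sketch below uses Welford's algorithm and assumes values arrive once per model time step within an output hour; the class name `RunningStats` is hypothetical.

```python
import math


class RunningStats:
    """Single-pass mean, min, max and standard deviation (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def std(self):
        """Population standard deviation of the values seen so far."""
        return math.sqrt(self._m2 / self.n) if self.n else float("nan")


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # e.g. sub-hourly samples
    stats.update(value)
print(stats.mean, stats.std, stats.min, stats.max)
```

Welford's update avoids the cancellation error of the naive "sum of squares minus square of sum" formula, which matters for variables such as pressure with a large mean and small variance.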
Lines 157–161: Include an estimate of the total size (e.g., in terabytes) of the full LARA dataset, especially since storage format and compression are discussed. This would help users understand data management requirements for downloading or processing large subsets.
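A rough upper bound on the raw volume follows from particles × output times × variables × bytes per value. In the sketch below, the 10 variables, 4-byte floats, hourly output and the 1940-01-01 to 2024-01-01 span are assumptions for illustration; compression in the stored format will reduce the real figure, which has to come from the authors.

```python
from datetime import datetime


def uncompressed_size_tb(n_particles, start, end, n_variables,
                         bytes_per_value=4, output_interval_hours=1):
    """Raw size in terabytes (1e12 bytes) of per-particle output,
    before any compression."""
    n_hours = int((end - start).total_seconds() // 3600)
    n_outputs = n_hours // output_interval_hours
    return n_particles * n_outputs * n_variables * bytes_per_value / 1e12


# Hypothetical: 10 float32 variables, hourly output, 1940-01-01 to 2024-01-01
size_tb = uncompressed_size_tb(6_000_000, datetime(1940, 1, 1), datetime(2024, 1, 1),
                               n_variables=10)
print(f"~{size_tb:.0f} TB before compression")
```

Under these assumptions each float32 variable alone amounts to roughly 18 TB uncompressed, which illustrates why users would benefit from a stated total size.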
Line 170: Perhaps briefly elaborate on improvements to the FLEXPART interpolation scheme in the latest version. Summarize enhancements to help readers understand their role in reducing potential errors.
Figure 1: The x-axis label refers to absolute differences, |ρ_part − ρ_ERA5|, but the negative values suggest that signed differences (ρ_part − ρ_ERA5) are shown. Please update the label accordingly. I also suggest changing the y-axis label to ‘Geopotential height (m)’ for clarity.
Lines 193-194: The sentence about maximum deviation of 0.6% from latitudinal-averaged volume is unclear. Clarify what is meant by ‘latitudinal-averaged volume’ and how the 0.6% deviation is calculated.
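The distinction between signed and absolute deviations (also raised for Figure 1) changes what a band-averaged percentage means. The sketch below is one plausible definition, assuming per-cell particle-derived and ERA5 densities and volume weights within a latitude band; it is not necessarily how the 0.6 % was computed, and the function names are hypothetical.

```python
def relative_deviation_percent(rho_particles, rho_era5):
    """Signed deviation 100 * (rho_part - rho_ERA5) / rho_ERA5 for each cell."""
    return [100.0 * (rp - re) / re for rp, re in zip(rho_particles, rho_era5)]


def weighted_mean(values, weights):
    """Weighted mean over the cells of one latitude band (e.g. volume weights)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


dev = relative_deviation_percent([1.01, 0.99], [1.0, 1.0])
signed = weighted_mean(dev, [1.0, 1.0])                      # deviations cancel
absolute = weighted_mean([abs(d) for d in dev], [1.0, 1.0])  # they do not
print(round(signed, 6), round(absolute, 6))
```

Two cells that deviate by +1 % and −1 % average to 0 % signed but 1 % absolute, so the text should state which of the two the 0.6 % refers to.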
Line 199: I suggest rephrasing ‘it is always better to take the air density from the _ERA5_ values interpolated to the particle positions’ for clarity.
Lines 201-202: The sentence about calculating mass using average air density within the source or receptor volume is unclear. Does this affect mass conservation? Should users always rely on air density or mass derived from ERA5 data at the particle locations, rather than using the mass of Lagrangian particles?
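The two ways of assigning mass to a source or receptor volume can be compared directly. In the sketch below, the box volume (about 1.24 × 10^13 m³), the density of 1.1 kg m⁻³, the 16 particles and the particle mass of 8.58 × 10^11 kg (total atmospheric mass divided by six million) are all assumed values for illustration.

```python
def mass_from_particles(n_in_volume, m_particle_kg):
    """Mass as number of particles times the (equal) particle mass."""
    return n_in_volume * m_particle_kg


def mass_from_density(densities_at_particles, volume_m3):
    """Mass as volume times the mean ERA5 density sampled at the particles."""
    rho_mean = sum(densities_at_particles) / len(densities_at_particles)
    return rho_mean * volume_m3


# Hypothetical receptor box: ~1.24e13 m^3 of air at ~1.1 kg m^-3 holding 16 particles
m_part = mass_from_particles(16, 8.58e11)
m_dens = mass_from_density([1.1] * 16, 1.236e13)
print(f"{m_part:.3e} kg vs {m_dens:.3e} kg; "
      f"one particle = {8.58e11 / m_dens:.1%} of the box")
```

The particle-count estimate can only change in steps of one particle mass (about 6 % of this box), whereas the density-based estimate varies smoothly; this may be why the density route is preferred for small volumes, which the authors could confirm.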
Figure 2: The figure seems somewhat overloaded with information. Consider splitting it into two figures for panels a and b or removing unnecessary curves/data to improve clarity.
Lines 263-266 and 273-276: The phrasing could be adjusted to avoid casting doubt on previous results. If only local/regional analysis is meaningful, why focus on global results in Fig. 2? A more confident presentation of the findings could strengthen this section.
Lines 299-300: Could convection parametrizations influence the results?
Lines 318-320: Why use a 1000 m proxy for boundary layer height, given that the actual boundary layer height is available in the LARA dataset?
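The difference between a fixed 1000 m proxy and a per-particle boundary layer height can be seen with a handful of particles. The heights and boundary layer depths below, and the function name `bl_fraction`, are hypothetical and do not correspond to LARA variable names.

```python
def bl_fraction(z_agl_m, limits_m):
    """Fraction of particles whose height above ground is at or below their limit."""
    inside = sum(1 for z, h in zip(z_agl_m, limits_m) if z <= h)
    return inside / len(z_agl_m)


z = [200.0, 800.0, 1200.0, 1800.0]      # hypothetical particle heights (m above ground)
hmix = [500.0, 1500.0, 1500.0, 1500.0]  # hypothetical per-particle BL heights (m)
print(bl_fraction(z, hmix), bl_fraction(z, [1000.0] * len(z)))  # 0.75 0.5
```

Where the actual boundary layer is deeper than 1000 m the proxy undercounts boundary layer air, and where it is shallower the proxy overcounts it, so the choice may matter regionally.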
Figure 5: The tracer conservation errors are averaged up to 20 km altitude, but tracers considered here vary exponentially with height, whereas particle numbers decrease exponentially with height. Is there any significant altitude dependence in tracer conservation errors? It could be insightful to compare conservation errors across different layers (e.g., lower, middle, upper troposphere, and lower stratosphere).
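A layer-resolved version of the conservation diagnostic only needs the errors binned by altitude. The sketch below assumes per-particle altitudes and errors are available; the layer edges are hypothetical fixed altitudes, and because the tropopause height varies with latitude, a tropopause-relative coordinate may be a better choice in practice.

```python
import bisect
import math


def layer_means(altitudes_m, errors, edges_m):
    """Mean error per layer [edges_m[i], edges_m[i+1]); values outside are ignored."""
    n_layers = len(edges_m) - 1
    sums = [0.0] * n_layers
    counts = [0] * n_layers
    for z, err in zip(altitudes_m, errors):
        i = bisect.bisect_right(edges_m, z) - 1  # index of the layer containing z
        if 0 <= i < n_layers:
            sums[i] += err
            counts[i] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


# Hypothetical layer edges (m): lower, middle, upper troposphere, lower stratosphere
edges = [0.0, 3000.0, 7000.0, 12000.0, 20000.0]
result = layer_means([1000.0, 2000.0, 5000.0, 15000.0, 25000.0],
                     [1.0, 3.0, 2.0, 4.0, 100.0], edges)
print(result)  # empty layers are reported as nan
```

Reporting the particle count per layer next to each mean would also show where the exponential decrease of particles with height makes the error estimate itself noisy.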
Line 367: Add '(not shown)' since this is not demonstrated in Fig. 5.
Lines 377-378: The improvement in consistency may be due to the broader onset of the modern satellite era, not just the TOVS instrument.
Technical corrections
Please ensure consistent capitalization and abbreviations for terms like 'Section' and 'Figure' throughout the text, following Copernicus manuscript guidelines.
Line 69: rephrase ‘ERAInterim’ as ‘ERA-Interim’
Line 83: please check and standardize the spelling of 'Zarr' data format throughout the manuscript
Line 133: 'deviate by approximately 20% less from ERA5 air densities' – Do you mean '20% or less'?
Line 141: Do you mean '12 eight-year periods' (considering the overlap)?
Table 2: Column header 'unit' -> 'Unit'. Fix unit 'K' for tropopause height.
Line 254: The acronym 'ERSSTV5' is not introduced.
Line 285: model used in _that_ study
Figure 5: I suggest replacing '-23°N' with '23°S' in the caption.
Lines 382-383: Please check the capitalization of 'Northern Hemisphere' and 'Southern Hemisphere' throughout the manuscript for consistency.
Citation: https://doi.org/10.5194/essd-2025-26-RC1
Model code and software
FLEXPART 11 L. Bakels et al. https://doi.org/10.5281/zenodo.12706632