This work is distributed under the Creative Commons Attribution 4.0 License.
Link-based European road transport emissions for CAMS-REG v8.1 and a comparison to city inventories
Abstract. Spatially resolved estimates of road transport emissions are fundamental for addressing the challenges of air pollution and greenhouse gas emissions. Emission estimates at 0.05° × 0.1° resolution are provided in the widely used CAMS-REG regional European emission inventory. For the road transport sector, several opportunities for improvement were identified: (1) an underestimation of ca. 35 % of NOx emissions in comparison to 8 independent urban inventories; (2) artefacts in the spatial distribution in Eastern European non-EU countries; and (3) the need for labour-intensive downscaling methodologies to create high-resolution urban inventories from the fixed-resolution dataset. To overcome these, emissions for all road links in the domain (n = 59,710,490) were estimated using gap-filled activity data (AADT) from OpenStreetMap and OpenTransportMap. Gap filling was performed with random forest models trained on land-use and road information data. Model performance was R² = 0.63–0.74 and MAE(AADT) = 1570–2028, with better performance for larger roads. Up-to-date emission factors were applied to road links using a novel maximum-speed-based classification. To generate the CAMS-REG v8.1 inventory, the resulting spatial distribution was used as a proxy map, together with national totals. The new dataset reduced the difference to city inventories to 19 % for absolute NOx emissions, and it can be flexibly gridded to high resolutions. The median increase in urban emission share is 24 % compared to national totals, and non-EU cities see large increases in attributed emissions due to the updated spatial distribution (e.g. Istanbul, +197 %; St. Petersburg, +288 %). Two case studies (London and Milan) show an increase in spatial correlation against the local inventory, from R² ≈ 0.3 with CAMS-REG v4.2 to R² ≈ 0.6 with CAMS-REG v8.1.
Vector and gridded versions of the emission dataset and spatial distribution are available at https://doi.org/10.5281/zenodo.15688723 (Hohenberger et al., 2025).
Status: open (until 14 Jan 2026)
- RC1: 'Comment on essd-2025-428', Anonymous Referee #1, 15 Dec 2025
- RC2: 'Comment on essd-2025-428', Anonymous Referee #2, 18 Dec 2025
General Feedback
This manuscript presents an original and valuable contribution by improving the spatialisation of NOx emissions from road transport for the CAMS-REG dataset. The article is well written, logically structured, and the newly generated dataset is both innovative and of high relevance. I would like to congratulate the authors on this achievement.
The authors clearly demonstrated the superiority of the v8.1 dataset relative to v4.2 in several respects. These include the higher-resolution spatialisation now enabled by the vector dataset, the notable improvements for non-EU countries, the reduced differences with local city inventories, and the better representation of urban versus rural traffic patterns resulting from speed-based emission factors.
My main concerns relate to the reproducibility and long-term evolution of the methodology. The key input dataset for traffic volumes (OpenTransportMap, OTM) was released a decade ago and appears to be no longer maintained. The manuscript does not sufficiently address the temporal representativeness of this dataset, nor does it specify the reference year(s) of OTM. Given that traffic volumes can change significantly over time (particularly in urban areas), and that emission factors evolve rapidly with the implementation of Euro standards and fleet renewal, the lack of discussion of temporal aspects is a notable gap. I believe that validation against recent traffic measurements (for instance, in the 8 case-study cities) could strongly improve confidence in the resulting dataset.
I have also raised questions regarding the method used to estimate and apply emission factors. The manuscript does not provide sufficient information on the typical distribution of vehicle types across countries or road segments. As the data are presented per vehicle category, this omission is important and warrants deeper discussion.
Additionally, I suggested improvements to the readability of several figures, particularly regarding the harmonisation of colour scales and axis ranges. I also recommended some changes and pointed out inconsistencies to improve the readability of the manuscript and the dataset.
Overall, I believe the paper addresses an important topic and has the potential to make a valuable contribution. However, the manuscript currently contains potential errors and unclear explanations that substantially affect readability and interpretation. I therefore recommend major revision. Please accept my apologies for any potential misunderstandings or missed points on my part.
Specific comments
Is this methodology applied to other pollutants in the CAMS-REG-ANT datasets? Why was this study focused on NOx emissions?
The manuscript needs clarification regarding the year to which the presented results refer. Specifically, this information is missing or unclear in:
- The captions for Figures 4, 7, 8, and 9.
- Lines 210-211
- Multiple mentions of the AADT data sourced from OTM.
- The dataset
Line 7: The "gap-filled activity data" originates from OpenTransportMap, not OpenStreetMap. Consider adding OpenStreetMap on Line 8, following the reference to "... road information data."
Line 7: “n=59,710,490” does not match the information in Table 1 on road links from OTM data (sum of the first column). Could you explain the difference? I would expect the final dataset to have more elements than the OTM data, since OTM was not available everywhere.
Lines 23-24: I usually rely on EEA figures (https://www.eea.europa.eu/en/topics/in-depth/air-pollution/air-pollutant-emissions-data-viewer-1990-2023), which seem quite different for 2022 (EEA-32 scope: 20.6 % for CO, 35.1 % for NOx, 15.3 % for PM10 and 13.7 % for PM2.5). Do you have an explanation for these differences?
Line 96: How is the information on sidewalks used in the manuscript?
Line 189: Please elaborate on the differences, also referenced in Figure 1, between the pure bottom-up approach and the national inventories. In theory, the two should align closely, and clarifying the discrepancies would be highly beneficial for bottom-up emission modelers. Specifically, could you quantify the differences (e.g. as percentages) and indicate whether the bottom-up results are an underestimation or an overestimation?
Lines 189-190: What is the source of these national totals?
Line 244: To which year do these national total emissions refer? 2018 (from Figure 3)?
Line 333: “improved NOx emission allocation in 6 out of 7 cities”: I believe you used 8 different city inventories (Figure 5, line 213)?
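To make the requested quantification concrete, a minimal sketch of how the per-country discrepancy could be reported as a signed percentage. The function names and the example totals are hypothetical, not values from the manuscript:

```python
def relative_difference(bottom_up: float, national: float) -> float:
    """Signed difference of the bottom-up total relative to the national total, in %.

    Negative values mean the bottom-up approach underestimates the national total,
    positive values mean it overestimates it.
    """
    if national == 0:
        raise ValueError("national total must be non-zero")
    return 100.0 * (bottom_up - national) / national


def classify(bottom_up: float, national: float) -> str:
    """Label the discrepancy as an under- or overestimation."""
    diff = relative_difference(bottom_up, national)
    if diff < 0:
        return f"underestimation ({diff:+.1f} %)"
    if diff > 0:
        return f"overestimation ({diff:+.1f} %)"
    return "exact match"


# Hypothetical NOx totals in kt/year: (bottom-up, national), illustrative only.
totals = {"Country A": (80.0, 100.0), "Country B": (130.0, 100.0)}
for country, (bu, nat) in totals.items():
    print(country, classify(bu, nat))
```

A per-country table of these signed values would directly answer the under/overestimation question.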
Line 335: typo: “an approximately”
Lines 339-340: I would appreciate further clarification regarding the reported unavailability of OSM data. In my experience, the 'highway' and 'maxspeed' fields are typically well populated. If data were indeed missing for certain road segments, please describe the methodology used to handle these gaps (it is only described for missing speed information, lines 163-164). Finally, I do not understand how the conversion from miles per hour to kilometers per hour could introduce significant errors.
Line 349: For future perspectives on integrating congestion and slower-speed effects, you could mention projects such as https://github.com/BlaiseKelly/google_speeds
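The unit conversion itself is a fixed factor, so in my reading the more plausible error source is the handling of non-numeric tag values. A minimal sketch, assuming speeds are read as raw OSM `maxspeed` strings (the function name is hypothetical, and how the manuscript treats implicit values such as zone codes is not known):

```python
import re
from typing import Optional

MPH_TO_KMH = 1.609344  # exact definition of the international mile in km

# A number, optionally followed by an explicit unit.
_NUMERIC = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(mph|km/h|kmh|kph)?\s*$", re.IGNORECASE)


def parse_maxspeed(tag: Optional[str]) -> Optional[float]:
    """Convert an OSM maxspeed tag value to km/h.

    Plain numbers are km/h by OSM convention; values suffixed with "mph" are
    converted. Anything else (missing tags, implicit zone codes such as
    "RU:urban", "none", "signals", or multi-valued "50;30") returns None and
    must be handled by a separate fallback rule.
    """
    if tag is None:
        return None
    match = _NUMERIC.match(tag)
    if match is None:
        return None
    value = float(match.group(1))
    unit = (match.group(2) or "km/h").lower()
    return value * MPH_TO_KMH if unit == "mph" else value


for raw in ("50", "30 mph", "RU:urban", "50;30"):
    print(raw, "->", parse_maxspeed(raw))
```

Reporting how many links fall into the `None` branch per country would show whether missing speeds, rather than the conversion, drive the reported errors.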
Lines 358-359: Maybe you could mention this research article, which I believe is closely related to yours: https://doi.org/10.1016/j.atmosenv.2024.120719
Comments regarding the Activity Data and modeling
Line 9: The two scores are difficult to interpret without reading the methodology, as they appear to refer to a single “Model” and could be misread as a range. You could either pool the data from Models 1 and 2 to report a single score, or explain that one score (Model 1) applies to EU countries and the other (Model 2) to non-EU countries, where fewer predictors are available.
Line 143: Consider clarifying the term "spatial gap filling" early on and specifying that it is used only for small roads, as the term is imprecise on first appearance and only explained later. Additionally, to avoid confusion, I recommend a more specific name for this approach, since your machine learning approach is also sometimes referred to as "gap-filling" (e.g. lines 110, 124 and 207).
Lines 154-156: The statement that "information on road class and speed limit was not available from OTM for non-EU Eastern European Countries" appears contradictory, given that you state that this information (in more detail) is available from the OSM dataset (line 161 and Table 3). Furthermore, based on Table 3, OTM road classes seem not to have been used (or not to have been useful) for the highway models, as all are categorized as "mainRoad". More generally, the features used in the Random Forest approach (including sidewalks, road surface, two different sources of road classes, land use, and density) are difficult to fully understand. I suggest including a table that summarizes the features employed, their source and definition (specifying which were excluded for Model 2) to improve clarity.
Lines 173-177: The methodology for estimating traffic volumes on low-class road segments, although challenging, is acceptable as presented and validated against existing literature. However, the numerous parameters and associated hypotheses lack justification. For example, removing 25% of the data to treat outliers may be excessive. Could you explain how these specific parameters were selected? Was the goal, perhaps, to tune them to obtain an approximate AADT of 50 for these road types?
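For the pooling option, note that a pooled R² is not the average of the two per-model values. A minimal sketch, assuming held-out observed and predicted AADT are available per model (all names and numbers below are illustrative):

```python
from typing import List, Sequence, Tuple


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pooled_scores(groups: List[Tuple[List[float], List[float]]]) -> Tuple[float, float]:
    """Concatenate (observed, predicted) pairs from several models and score them once."""
    obs = [o for g_obs, _ in groups for o in g_obs]
    pred = [p for _, g_pred in groups for p in g_pred]
    return r2(obs, pred), mae(obs, pred)


# Hypothetical held-out AADT (observed, predicted) for each model.
model_1 = ([1000.0, 2000.0, 3000.0], [1100.0, 1900.0, 3200.0])
model_2 = ([500.0, 4000.0], [700.0, 3500.0])
pooled_r2, pooled_mae = pooled_scores([model_1, model_2])
print(f"pooled R2 = {pooled_r2:.3f}, pooled MAE = {pooled_mae:.0f}")
```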
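Because AADT on small roads is typically right-skewed, the trimming fraction alone can move the resulting mean substantially, which is why I think the choice deserves justification. A minimal sketch, assuming trimming removes the largest values (the exact outlier rule in the manuscript may differ, and the sample below is hypothetical):

```python
from typing import Sequence


def trimmed_mean(values: Sequence[float], fraction: float) -> float:
    """Mean after discarding the largest `fraction` of values (upper-tail trimming).

    The number of discarded values is floor(fraction * n). Upper-tail trimming
    is an assumption for illustration only.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    ordered = sorted(values)
    keep = len(ordered) - int(fraction * len(ordered))
    kept = ordered[:keep]
    return sum(kept) / len(kept)


# Hypothetical AADT samples for low-class roads (vehicles/day).
sample = [10, 20, 30, 40, 50, 60, 80, 120, 300, 900]
for f in (0.0, 0.1, 0.25):
    print(f"trim {f:.0%}: mean AADT = {trimmed_mean(sample, f):.1f}")
```

A short sensitivity table of this kind in the manuscript would show how strongly the final low-class AADT depends on the 25% choice.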
Lines 229-230: (related to lines 154-156) The lower performance of Model 2 compared to Model 1 may require further discussion: is the absence of certain features the root cause, or are there other factors? This seems counter-intuitive, given the reliance on OSM speed limits and the fact that OTM road class information is captured by the highly influential OSM 'highway' type (line 238).
Line 238: As I understand it, Pulugurtha et al. (2021) derive feature importance from the coefficients of ordinary least squares (OLS) regression and geographically weighted regression (GWR) models. I am not sure this applies directly to a Random Forest regression; further explanation of how the feature importances are computed in this case may be needed.
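For tree ensembles, importances are usually either impurity-based (as exposed by scikit-learn's `feature_importances_` on `RandomForestRegressor`) or permutation-based (e.g. `sklearn.inspection.permutation_importance`); neither corresponds to a coefficient-based OLS/GWR approach. A minimal, model-agnostic sketch of permutation importance, written from scratch for illustration with a toy model (this is not the authors' procedure):

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MAE when each feature column is randomly shuffled.

    Shuffling a column breaks its link with the target; a large error
    increase means the model relies heavily on that feature.
    """
    rng = random.Random(seed)

    def mae(rows: Sequence[Sequence[float]]) -> float:
        return sum(abs(predict(list(r)) - t) for r, t in zip(rows, y)) / len(y)

    baseline = mae(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(mae(permuted) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(x: List[float]) -> float:
    """Hypothetical model that uses feature 0 only and ignores feature 1."""
    return 100.0 * x[0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_predict(row) for row in X]
imp = permutation_importance(toy_predict, X, y)
print(imp)
```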
Lines 354-357: This is perhaps my main concern. The OTM dataset was published a long time ago, and you do not provide information on the period its traffic volumes represent. Traffic volumes may have changed substantially since then, especially in urban areas. Moreover, since CAMS must maintain detailed emission inventories over the long term, relying on OTM seems increasingly problematic: the project appears to have expired (https://opentransportmap.info) and its data are no longer accessible (I could not find them elsewhere).
I strongly agree with the need to obtain up-to-date measurement datasets. You could, for instance, use this recently published open dataset of traffic volumes from local measurements in European cities: https://www.nature.com/articles/s41597-025-05698-y. I believe it is also a valuable resource because it catalogues additional sources of traffic volume data. Could you consider comparing your modeled or OTM-based AADT values with recent measurements from this dataset? The traffic measurements were map-matched to the OSM network, so you could use the OSM ID key directly; I do not think this would be too difficult. If the activity data (which are at the core of the bottom-up inventory and the spatial disaggregation) agreed with recent data published by local authorities, this would really strengthen the results of your study.
In the long term, you could also consider projects such as AVATAR (https://avatar.cerema.fr/cartographie) by CEREMA in France, which gathers near-real-time and historical traffic volumes at many locations (not only in cities).
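Assuming both the modeled AADT and the measurements can be keyed on OSM way IDs, the comparison could be as simple as the sketch below (the dictionaries and values are hypothetical). One caveat: OSM ways are split and merged over time, so matching across different OSM snapshots may lose some links.

```python
from typing import Dict, List, Tuple


def compare_on_osm_id(
    modeled: Dict[int, float], measured: Dict[int, float]
) -> Tuple[int, float, float]:
    """Match modeled and measured AADT on a shared OSM way ID.

    Returns (number of matched links, mean bias, mean absolute error),
    where bias = modeled - measured.
    """
    common = sorted(modeled.keys() & measured.keys())
    if not common:
        raise ValueError("no overlapping OSM IDs")
    diffs: List[float] = [modeled[i] - measured[i] for i in common]
    bias = sum(diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return len(common), bias, mae


# Hypothetical AADT keyed by OSM way ID.
modeled = {101: 12000.0, 102: 3000.0, 103: 800.0}
measured = {101: 10000.0, 103: 1000.0, 104: 5000.0}
n, bias, mae = compare_on_osm_id(modeled, measured)
print(f"matched={n}, bias={bias:.0f}, MAE={mae:.0f}")
```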
Comments regarding the Emission Factors
Lines 132-133: Figure 1 states that the emission factors are based on the COPERT model; you could mention both sources (in the figure and in the text) for consistency.
Lines 132-141: This subsection is central to the estimation of NOx emissions, but it lacks the detail needed to make it fully understandable and reusable. Could you provide more detail, and possibly additional data, to present these methodological points clearly? Some examples follow.
Do you have a reference for the VERSIT+ data? If I understand correctly, the emission factors (in grams/vkm) for each speed (consider stating this, as it is mentioned on line 147) and vehicle category come from VERSIT+. It would be very useful to provide a dataset of the emission factors used as well (and perhaps additional figures, even in the supplement).
Then, how are the different emission factors (which depend on vehicle type) applied to the total traffic volume (which does not differentiate between vehicle types)? Is the distribution of vehicle types on the roads taken from COPERT data?
If so, what is the resolution of these data (beyond road types)? Do you have a typical vehicle fleet per country? Per year? Do you apply different temperature effects to the emission factors for different countries?
Why did you not use the COPERT emission factor values directly? They seem closely related, since your data are presented by vehicle types similar to those in COPERT (Buses / HDT / L-cat / LCV / Passenger Cars). This could also be explained in the manuscript, since the dataset is organized by these categories.
Line 160-161: “The maximum speed of each link was used to decide whether to use urban, rural or highway…” contradicts lines 188 and 308.
Line 180: Describing the use of "an" emission factor (a single value) is confusing, as multiple values would be needed to account for the different vehicle classes present on a road segment. Again, I am unclear about the implementation: how are the different shares of vehicles, such as heavy-duty trucks versus light passenger cars, represented for a given road segment?
Line 181: I suggest merging subsections 2.2 and 2.3.3, as both concern the calculation and use of emission factors.
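To illustrate what I would expect the implementation to look like, a minimal sketch that turns a total AADT into link emissions through a vehicle-category split, E = AADT × length × 365 × Σₖ shareₖ × EFₖ. The shares and emission factors are hypothetical placeholders, not VERSIT+ or COPERT values, and this is not necessarily the authors' formulation:

```python
from typing import Dict


def link_emissions_g_per_year(
    aadt: float,
    length_km: float,
    fleet_share: Dict[str, float],
    ef_g_per_vkm: Dict[str, float],
) -> float:
    """Annual emissions of one road link from total AADT and a vehicle-category split.

    The fleet-weighted emission factor is sum_k(share_k * EF_k); whether the
    shares and EFs vary by country, road environment or year is an open question.
    """
    if abs(sum(fleet_share.values()) - 1.0) > 1e-6:
        raise ValueError("fleet shares must sum to 1")
    weighted_ef = sum(share * ef_g_per_vkm[cat] for cat, share in fleet_share.items())
    vkm_per_year = aadt * length_km * 365
    return vkm_per_year * weighted_ef


# Hypothetical inputs: 1 km link, 10 000 vehicles/day.
shares = {"PC": 0.8, "LCV": 0.1, "HDT": 0.1}
efs = {"PC": 0.3, "LCV": 0.6, "HDT": 3.0}  # gNOx/vkm, illustrative only
e_link = link_emissions_g_per_year(10000.0, 1.0, shares, efs)
print(f"{e_link / 1e6:.2f} t NOx/year")
```

Documenting the per-link (or per-country) `fleet_share` used would answer most of the questions above.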
I am confused by the distinction you draw here between "road class" and "environment (urban, rural or highway)". In section 2.2, you state that "Emission factors… are road-type specific, distinguishing between urban, rural, and highway conditions." Is "road class" distinct from the "environment" types mentioned in Section 2.2, or am I missing something?
Line 197: “emission factors are applied”: Please elaborate on the application of the emission factors from the previous version. Specifically, were the methodology and the underlying data the same as in previous versions? Providing this detail would help clarify whether the changes observed with v4.2 are solely due to differences in activity data methodology or if an adjustment to the emission factors also contributed to the overall effect.
Comments regarding the Figures
Figure 2: The caption says Model 2 was used in EU countries, but the caption of Table 3 and line 230 say otherwise. Please make it consistent.
Figure 3: Because country emissions span different absolute ranges, it might be easier to show the relative change (in %, with +/- signs) from v4.2 to v8.1. If space permits, you could also add each country's new total (in kg/year) alongside it. I think this would help the reader see more clearly how national totals changed.
For countries that are truncated (such as Russia), is the total only a subtotal? How did you scale the initial inventory in these cases?
Figure 3 uses kg as the unit, whereas Figure 5 uses kg/year. I believe kg/year is preferable.
Figure 4: While panels a) and c) share a common colour scale, panel b) uses a different scale for each version, which prevents an unbiased comparison. You could consider using a single colour scale for all maps here (e.g. 0–0.2). You could also use the same colormap as in Figures 7 and 8, since they show the same kind of unit (% of the map total). Conversely, Figure 9 uses a different unit but the same colormap.
Figure 5: The two panels have the same y-axis label but different ranges, which may confuse the reader; please standardize the range. The different ranges appear to be necessary because of large emissions, particularly in London and Paris. In the case of Paris, the high values are mainly due to the decision (line 216) to include the entire Île-de-France region, which encompasses numerous rural and low-density areas. Note that the AirParif inventory is also available for the Paris municipality alone. Additionally, the current link to the inventory has expired; please consider using this one:
https://www.airparif.fr/surveiller-la-pollution/les-emissions# (3187 tNOx for the year 2018).
London covers an extensive area (unlike Paris), so this is less surprising (although Birmingham seems comparable).
Moreover, in the “cities_polygons.gpkg” file, the Barcelona geometry appears overly simplistic: is it comparable to the extent of the local inventory? Other geometries also show simple square-like shapes, and I wonder how comparable they are to the others (Figure 5). For instance, part of the Munich or Krakow ring road is not included in the geometry. Using a standardized urban boundary dataset would be a more straightforward and academically sound approach.
Figures 7 and 8: To facilitate comparison between the maps, please use a consistent color scale, ideally combined into a single colorbar.
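For Figures 4, 7 and 8, a shared scale mainly means passing one normalisation object to every panel and drawing a single colorbar. A minimal matplotlib sketch with hypothetical 2 × 2 grids standing in for the maps (the 0–0.2 range follows the suggestion for Figure 4):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Hypothetical share-of-map-total fields for two versions of the same map.
map_a = [[0.00, 0.05], [0.10, 0.02]]
map_b = [[0.01, 0.20], [0.03, 0.00]]

# One normalisation for every panel: identical colours mean identical values.
norm = Normalize(vmin=0.0, vmax=0.2)

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
images = []
for ax, data, title in zip(axes, (map_a, map_b), ("v4.2", "v8.1")):
    images.append(ax.imshow(data, cmap="viridis", norm=norm))
    ax.set_title(title)

# A single colourbar shared by both panels.
fig.colorbar(images[0], ax=list(axes), label="Share of map total")
```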
Comments regarding the Dataset
The readme.txt file should contain more information on the naming of the gridded files to simplify their use, as there seem to be four dimensions:
- road type: all / highway / urban / rural
- variable: emission / activity (vkm)
- vehicle type (COPERT types) + “sum”
- model: corine or speed-based
In the filenames: “commerical” → “commercial”
The two datasets (gridded and vector) are in different CRS which can lead to confusion.
For the gridded datasets, the sum of emissions (6237 tNOx) suggests that the unit of the provided emission files is inconsistent with the previously cited EEA numbers (2.5–3 MtNOx) and with the CAMS-REG v8.1 dataset from ECCAD (2.84 MtNOx). A discrepancy factor of approximately 455 might suggest that the gridded emissions are expressed in kg NOx per day, which would align with the daily temporal resolution of the 'vkm' variable in the vector dataset.
Similarly, the unit of the "vkm" files needs clarification. Given the total obtained when summing all values (7.7 million), the result is unlikely to be in daily vkm: for comparison, Paris alone has a yearly traffic of approximately 5 Gvkm (Airparif). Could it be thousands of vkm?
The large size of the vector dataset (over 40 GB in a single file, with 48 columns) may discourage potential users. Could the segmentation approach used for the gridded files (96 files) be applied to the vector dataset to produce smaller, more manageable files? Additionally, it would be beneficial to include the retained maximum speed of each road segment in the data (see below).
Applying the city boundary masks to the dataset, I found significant differences between Figure 5 and your dataset in total yearly emissions for some cities, notably Moscow (2000 tNOx), St Petersburg (1349 tNOx) and Istanbul (8873 tNOx). Please verify that the numbers are correct; I have attached code showing how I obtained these numbers.
In an attempt to examine the emission factor values you used, I divided the (unscaled) emissions by the vkm columns (converted to a yearly basis). The resulting emission factors (in grams/vkm) appear rather high, notably for passenger cars, where they exceed 1 gNOx/vkm, although most cars in the EU now comply with Euro standards and should have lower associated emissions. Including the retained speed would also help users understand these values. Please comment and add more detail on this, also for the other vehicle categories.
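The unit checks above reduce to a few lines of arithmetic. A sketch using only the figures quoted in these two comments; the interpretations being tested (daily values, thousands of vkm) are hypotheses, not confirmed units:

```python
DAYS_PER_YEAR = 365

# Figures quoted in the comments above.
grid_sum_t = 6237.0       # summed gridded NOx, if read as tonnes
eccad_total_t = 2.84e6    # CAMS-REG v8.1 NOx total from ECCAD, tonnes
vkm_sum = 7.7e6           # summed "vkm" values
paris_vkm_per_year = 5e9  # approximate yearly traffic in Paris (Airparif)

# Emissions: discrepancy factor, and the annual total if the values were daily.
ratio = eccad_total_t / grid_sum_t
annualised_t = grid_sum_t * DAYS_PER_YEAR

# Activity: annual vkm if the sum is daily vkm, or thousands of daily vkm.
vkm_year_if_daily = vkm_sum * DAYS_PER_YEAR
vkm_year_if_thousands = vkm_sum * 1e3 * DAYS_PER_YEAR

print(f"ECCAD / grid sum: {ratio:.0f}")
print(f"grid sum x 365: {annualised_t / 1e6:.2f} Mt")
print(f"vkm read as daily: {vkm_year_if_daily / 1e9:.1f} Gvkm/yr "
      f"(Paris alone: ~{paris_vkm_per_year / 1e9:.0f} Gvkm/yr)")
print(f"vkm read as thousands/day: {vkm_year_if_thousands / 1e9:.0f} Gvkm/yr")
```

Read as daily values, the annualised emission total (about 2.28 Mt) is of the same order as, but below, the ECCAD total, while the daily-vkm reading yields less traffic for all of Europe than for Paris alone.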
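The attached code is not reproduced here; as a dependency-free stand-in, a minimal sketch of a city-mask sum that assigns each link to the polygon containing its midpoint (ray-casting test). Midpoint assignment is a simplification compared with clipping links by length, and links and polygons must share the same CRS, which matters given that the gridded and vector files use different ones. All coordinates and values are hypothetical:

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray from pt towards +x crosses an edge."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def city_total(links: Iterable[Tuple[Point, Point, float]], polygon: Sequence[Point]) -> float:
    """Sum emissions of links whose midpoint falls inside the city polygon.

    Links crossing the boundary are counted entirely in or out; clipping
    would instead split their emissions by length.
    """
    total = 0.0
    for (ax, ay), (bx, by), emission in links:
        mid = ((ax + bx) / 2, (ay + by) / 2)
        if point_in_polygon(mid, polygon):
            total += emission
    return total


# Hypothetical square "city" and three links (start, end, t NOx/year).
city = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
links = [((1.0, 1.0), (3.0, 1.0), 5.0),
         ((11.0, 1.0), (13.0, 1.0), 7.0),
         ((4.0, 5.0), (6.0, 5.0), 3.0)]
print(city_total(links, city))
```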
This combined analysis (vkm and emission factor) really helps to identify the key differences between modeling approaches. For instance, Figure 5 suggests close agreement with AirParif on total emissions in Île-de-France in 2019, but a closer look reveals significant differences in the underlying parameters (pp. 51–54): https://www.airparif.fr/sites/default/files/document_publication/Bilan_IDF_2022_0.pdf

| | NOx emissions (tons) | Activity (Gvkm) | Implied EF (gNOx/vkm) |
|---|---|---|---|
| This study | 32594 | 25 | 1.30 |
| AirParif | 31720 | ~52 | 0.61 |
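The implied emission factors in the table follow from dividing emissions by activity after unit conversion (tonnes to grams, Gvkm to vkm). A short sketch reproducing them from the tabulated values (the AirParif activity is approximate, so its implied factor is too):

```python
def implied_ef_g_per_vkm(nox_tonnes: float, activity_gvkm: float) -> float:
    """Implied emission factor = emissions / activity, in g/vkm."""
    grams = nox_tonnes * 1e6    # 1 t = 1e6 g
    vkm = activity_gvkm * 1e9   # 1 Gvkm = 1e9 vkm
    return grams / vkm


rows = {"This study": (32594, 25), "AirParif": (31720, 52)}
for source, (nox_t, gvkm) in rows.items():
    print(f"{source}: {implied_ef_g_per_vkm(nox_t, gvkm):.2f} gNOx/vkm")
# This study: 1.30 gNOx/vkm
# AirParif: 0.61 gNOx/vkm
```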
Data sets
Spatial distribution road transport emissions for CAMS-REG v8.1. Tilman Leo Hohenberger, Marya el Malki, Antoon Visschedijk, Marc Guevara, Martin Otto Paul Ramacher, Alessandro Marongiu, Guido Guiseppe Lanzani, Guiseppe Fossati, Anu Kousa, and Jeroen Kuenen. https://zenodo.org/records/15688723
The authors present a significant update to the CAMS-REG inventory, improving the modeling of traffic, especially in areas where data are missing. The resulting dataset aligns better with regional inventories and will be of wide interest to the traffic engineering and air quality communities. I recommend publication after the following comments are addressed: