Multispecies and high spatiotemporal resolution database of vehicular emissions in Brazil

In this article, we present the BRAzilian Vehicular Emissions inventory Software (BRAVES) database, a multi-species and high spatiotemporal resolution database of vehicular emissions in Brazil. We provide this database using spatial disaggregation based on road density, temporal disaggregation using vehicular flow profiles, and chemical speciation based on local studies and the SPECIATE database from the United States Environmental Protection Agency. Our BRAVES database provides hourly and annual emissions of 41 gaseous and particle pollutants, where users can define the spatial resolution, which ranges from a coarse to a very refined scale. Spatial correlation analysis reveals that the BRAVES database reaches better performance than the vehicular emissions inventory from Emissions Database for Global Atmospheric Research (EDGAR). A comparison with Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) surface concentration confirms the consistency and reliability of the BRAVES database in representing the spatial pattern of vehicular emissions. Compared to EDGAR, the BRAVES database brings more spatial, temporal, and chemical details. These additional features are crucial to understanding important atmospheric chemistry processes in Brazil. All codes and inputs are freely available, and the outputs are compatible with the input requirements of sophisticated chemical transport models. We envision


Introduction
Vehicular emissions threaten urban air quality (Brito et al., 2018; Sawyer, 2010) and cause environmental damage from local to global scales. These emissions harm human health (Anenberg et al., 2017; Krzyzanowski et al., 2005) and contribute to the increase in the concentration of greenhouse gases in the atmosphere (Shindell et al., 2011; Unger et al., 2010).
It is challenging to control vehicular emissions in developing countries where city growth is disorganized and vehicle population increases dramatically (Lyu et al., 2020;Sun et al., 2020). Furthermore, vehicular emissions inventories, an essential tool to control air pollution, are scarce and limited to wealthy cities (Huneeus et al., 2020) and developed countries. When available, an emissions inventory often does not provide the required data to design air quality management systems.
Brazil has experienced a rapid rise in its vehicular fleet and transport volume. Even though the program to control vehicular emissions has reduced the emissions from the transport sector in Brazil (Andrade et al., 2017), vehicles are still potentially the dominant source of air pollution in several municipalities. The impact of vehicular emissions in many Brazilian municipalities is still unknown (Ribeiro et al., 2021). Current inventories provide only annual emissions from national to municipality scales. They reach neither the spatial and temporal resolution necessary for air quality modeling (Álamos et al., 2022) nor the emissions of the chemical species that participate in atmospheric reactions. For this reason, most regional air quality assessments in Brazil rely on global emissions inventories, which have proved to be biased relative to local inventories. Also, global inventories do not offer enough spatial and temporal resolution for regional and local studies (Ibarra-Espinosa et al., 2018). Even in the megacity of São Paulo, where the air quality network is well developed and multiple inventories have been developed, there is still room for improvement in emissions inventories (Andrade et al., 2017), especially regarding chemical species involved in photochemical processes in the atmosphere.
In this article, we present the first comprehensive multispecies high-spatiotemporal-resolution database of vehicular emissions for the entire Brazilian territory. The BRAzilian Vehicular Emissions inventory Software (BRAVES) database has spatial disaggregation based on road density, temporal disaggregation using vehicular flow profiles, and chemical speciation from local studies and the United States Environmental Protection Agency (US EPA) SPECIATE 5.1 database. The BRAVES database provides hourly and annual emissions of 41 gaseous and particle pollutants. Users can define the spatial resolution from coarse to very refined scales. The emissions are derived from the BRAVES model (Vasques and Hoinaski, 2021), which uses a probabilistic approach that accounts for the fleet characteristics, fuel consumption, vehicle deterioration, and intensity of use, to calculate the vehicular emissions from the exhaust, tires, roads, brake wear, soil resuspension, refueling, and evaporative emissions. Here, we present methods and a comparison between the BRAVES database and independent databases. We also make all codes and inputs freely available.

Vehicular emissions data
Our database uses output data from BRAVES (Vasques and Hoinaski, 2021), which employs a probabilistic bottom-up method to estimate vehicular emissions aggregated by municipality (Fig. 1). BRAVES estimates the vehicular emissions from the exhaust, tires, roads, brake wear, soil (road dust) resuspension, refueling, and evaporative emissions. The software provides annual emissions of carbon monoxide (CO), carbon dioxide (CO2), methane (CH4), hydrocarbons (HC), aldehydes (RCHO), non-methane volatile organic compounds (NMVOC), nitrogen oxides (NOx), particulate material (PM), nitrous oxide (N2O), and sulfur dioxide (SO2) by fleet category (i.e., commercial light vehicles, motorcycles, light-duty, and heavy-duty vehicles). Throughout this article, we call the current database the BRAVES database. Codes and outputs from BRAVES are available by registering at https://hoinaski.prof.ufsc.br/BRAVES/ (last access: 7 June 2022) and https://github.com/leohoinaski/BRAVES (last access: 7 June 2022), where users can access instructions to run the database and download the input files. The outputs are generated in the NetCDF format with annual or hourly resolution.

Spatial disaggregation
Since vehicular emissions from BRAVES are aggregated by municipalities, we use a road density approach to distribute the emissions of each municipality in pixels with user-defined resolution. Previous work by Tuia et al. (2007) and Gómez et al. (2018) shows that road density is one of the most reliable approaches to disaggregate vehicular emissions. In this article, the road density factor (RD_{p,m}) is calculated by the sum of road lengths (L_p) on each pixel (p) divided by the total road length (L_m) inside a municipality (m) (Eq. 1). Road shapefile data derived from OpenStreetMap (https://www.openstreetmap.org/#map=7/51.330/10.453, last access: 7 June 2022) can be downloaded at https://download.geofabrik.de/south-america/brazil.html (last access: 7 June 2022) for Brazilian territory. Figure 2 shows the spatial distribution of RD_{p,m} in Brazil. Multiplying RD_{p,m} by the vehicular emissions in each municipality derived from BRAVES provides the spatialized emission (E_{p,m,c}) of compound (c) in pixel (p) within a municipality (m) (Eq. 2). We provide a parallelized method to estimate the road density in Brazil at https://github.com/leohoinaski/BRAVES (last access: 7 June 2022). Figure 3 shows the annual emissions of acetaldehyde, CO, and nitrogen dioxide from 2019 in Brazil by fleet category using spatial distribution based on the road density approach. Hotspots of vehicular emissions concentrate in urbanized areas of Brazil (Fig. 3). Among fleet categories, heavy-duty ve-
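The road density disaggregation described in this section can be sketched in a few lines of Python. This is a minimal illustration and not the parallelized code from the BRAVES repository: the function names, the pixel labels, and the numbers are hypothetical, and the symbol E_{m,c} for the municipal emission of compound c is our own shorthand. Under these assumptions, Eq. (1) reads RD_{p,m} = L_p / L_m and Eq. (2) reads E_{p,m,c} = RD_{p,m} × E_{m,c}.

```python
def road_density_factors(road_length_by_pixel):
    """Eq. 1 as described in the text: RD_{p,m} = L_p / L_m.

    road_length_by_pixel maps a pixel id to the summed road length (L_p)
    inside that pixel; L_m is the total over the municipality.
    """
    total_length = sum(road_length_by_pixel.values())
    if total_length <= 0:
        raise ValueError("municipality has no mapped road length")
    return {p: length / total_length
            for p, length in road_length_by_pixel.items()}


def spatialize(municipal_emission, rd_factors):
    """Eq. 2 as described in the text: E_{p,m,c} = RD_{p,m} * E_{m,c}."""
    return {p: rd * municipal_emission for p, rd in rd_factors.items()}


# Hypothetical municipality covered by three pixels (road lengths in km).
lengths_km = {"p1": 12.0, "p2": 6.0, "p3": 2.0}
rd = road_density_factors(lengths_km)

# Illustrative municipal CO emission of 1000 t/yr, split across the pixels.
co_by_pixel = spatialize(1000.0, rd)
```

Because the factors of a municipality sum to one, the pixel emissions add up to the municipal total, so the disaggregation conserves mass.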

Temporal disaggregation
Temporal disaggregation based on traffic flow observations from the Environment and Water Resources Institute from Espírito Santo (IEMA ES, 2019) splits original annual emissions from BRAVES into hourly emissions. In this article, the temporal disaggregation factor (Fig. 4) is composed of hourly, weekly, and monthly traffic factors. Hourly emissions (E_{c,p,m,h}) of each air pollutant are obtained by the multiplication of E_{p,m,c} and the temporal disaggregation factor (T_f) (Eq. 3).
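A minimal Python sketch of this step is given below. The text states only that T_f is composed of hourly, weekly, and monthly traffic factors and that E_{c,p,m,h} = E_{p,m,c} × T_f. The way the three factors are combined here (a product, normalized so that the factors of one year sum to one) is an assumption made for illustration, and the profile values are hypothetical rather than the IEMA observations.

```python
from datetime import datetime, timedelta


def temporal_factors(year, hourly, weekly, monthly):
    """Return one factor T_f per hour of `year`, normalized to sum to 1.

    hourly  : 24 values, index 0 = 00:00
    weekly  : 7 values, index 0 = Monday
    monthly : 12 values, index 0 = January
    Assumption: the three profiles combine multiplicatively.
    """
    t = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    raw = []
    while t < end:
        raw.append(hourly[t.hour] * weekly[t.weekday()] * monthly[t.month - 1])
        t += timedelta(hours=1)
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical profiles: flat, except for rush-hour peaks and quieter weekends.
hourly = [1.0] * 24
hourly[8] = hourly[18] = 2.0
weekly = [1.0] * 5 + [0.6, 0.5]
monthly = [1.0] * 12

tf = temporal_factors(2019, hourly, weekly, monthly)

# Eq. 3 as described in the text: E_{c,p,m,h} = E_{p,m,c} * T_f.
annual_emission = 1000.0  # illustrative annual emission in one pixel
hourly_emission = [annual_emission * f for f in tf]
```

Normalizing the factors keeps the sum of the hourly emissions equal to the annual total, whatever the shape of the profiles.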

Chemical speciation
We use data from SPECIATE 5.1 (US EPA, 2020; Eyth et al., 2020) from US EPA (https://www.epa.gov/air-emissions-modeling/speciate, last access: 11 March 2022) to speciate emissions of chemical constituents of volatile organic compounds (VOCs) and PM, which have not previously been estimated by BRAVES. The chemical speciation method also converts NOx to NO and NO2. For the speciation procedures in this article, we group light-duty vehicles, commercial light vehicles, and motorcycles as light vehicles. We select profiles to speciate PM emissions from the exhaust of heavy and light vehicles, soil resuspension (road dust), tire wear, and brake wear. The VOC emissions from exhaust and evaporative processes are also speciated. Table 6 in the Supplement summarizes the profiles from SPECIATE 5.1 used in the chemical speciation. We have targeted the species required for use in complex three-dimensional atmospheric models (Yarwood et al., 2010a, b). Table 1 presents the VOC and PM compounds considered in the chemical speciation in this database. (Nogueira et al., 2015). ALDX has been considered as 10 % of RCHO emissions, while acetone (ACET) accounts for 8 % of these emissions.
We have also kept the original estimates using the local emissions factor for ethanol (ETOH), which has been the best way of representing the particularities of biofuels in Brazil. Since 2008, CETESB has provided the ETOH emission factors for light-duty vehicles running with ethanol and gasoline (CETESB, 2022). Since CETESB's RCHO and ETOH emissions factors are available only for light-duty and commercial light vehicles, we have used percentage factors from SPECIATE 5.1 to estimate RCHO and ETOH emissions from non-methane hydrocarbons (NMHC) for motorcycles and heavy-duty vehicles.
The Brazilian gasoline C, which has fueled light-duty vehicles, is a mixture of pure gasoline and 20 % to 25 % of anhydrous ethanol. Since 2008, heavy-duty vehicles have run on a blend of diesel and up to 15 % of biodiesel. This unique chemical signature of the biofuels in Brazil is reflected significantly in the vehicular emissions, especially those of carbonyls and ethanol (Nogueira et al., 2015; CNPE, 2018). These compounds deserve attention since they are major precursors of tropospheric ozone (Atkinson, 2000; Jacob, 2000). The chemical speciation factors employed to split VOC and PM emissions are calculated as the average of the weight percentages of the corresponding species from SPECIATE 5.1. We consider exhaust, evaporative, and particulate emissions of light and heavy vehicles. Figures 5 and 6 show the speciation factors used to generate the database. Multiplication factors of 0.495 and 0.505 derived from SPECIATE 5.1 convert NOx emissions to NO and NO2, respectively. The table at https://github.com/leohoinaski/BRAVES/tree/main/ChemicalSpec (last access: 7 June 2022) summarizes the speciation factors used to build this database.
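The speciation arithmetic can be illustrated with a short Python sketch. The NOx split (0.495 and 0.505) and the RCHO shares for ALDX and ACET (10 % and 8 %) are the values quoted in the text. The helper names, the two VOC profiles, and the emission totals are hypothetical, and treating a species that is absent from a profile as 0 % is an assumption of this sketch.

```python
# Factors quoted in the text.
NOX_SPLIT = {"NO": 0.495, "NO2": 0.505}
RCHO_SPLIT = {"ALDX": 0.10, "ACET": 0.08}


def average_profile(profiles):
    """Average the weight percentages of several profiles, per species.

    Each profile maps species -> weight percent. The result maps
    species -> fraction (0 to 1). Assumption: a species missing from a
    profile counts as 0 % in that profile.
    """
    species = sorted({s for profile in profiles for s in profile})
    n = len(profiles)
    return {s: sum(p.get(s, 0.0) for p in profiles) / n / 100.0
            for s in species}


def speciate(total_emission, factors):
    """Split a lumped emission into species using fractional factors."""
    return {s: total_emission * f for s, f in factors.items()}


# Illustrative totals (same mass unit in and out).
nox = speciate(200.0, NOX_SPLIT)    # NO and NO2 from NOx
rcho = speciate(50.0, RCHO_SPLIT)   # ALDX and ACET from RCHO

# Two hypothetical VOC profiles (weight percent), not real SPECIATE entries.
voc_factors = average_profile([
    {"BENZ": 4.0, "TOL": 10.0},
    {"BENZ": 6.0, "TOL": 8.0, "FORM": 2.0},
])
voc = speciate(1000.0, voc_factors)
```

The NO and NO2 factors sum to one, so the split conserves the NOx mass, whereas the VOC factors cover only the listed species and need not sum to one.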

The database and codes
The database contains hourly emissions of 41 chemical species: ACET, ACROLEIN, ALD2, BENZ, BUTADIENE13, CH4, CO, CO2, ETH, ETHA, ETHY, ETOH, FORM, ISO, N2O, NAPH, NO, NO2, PAL, PCA, PCL, PEC, PFE, PK, coarse mode primary PM (PMC), PMG, PMN, unspeciated PM2.5 (PMOTHR), PNA, PNH4, PNO3, POC, PRPA, PSI, PSO4, PTI, SO2, TERP, TOL, VOC, and XYLMN. We provide a code to generate hourly resolved files with a user-defined grid for a single species or a whole group of species (https://github.com/leohoinaski/BRAVES, last access: 7 June 2022). These files are compatible with the input requirements of sophisticated chemical transport models, such as the Community Multiscale Air Quality Model (CMAQ), the Weather Research and Forecasting (WRF) model coupled with Chemistry (WRF-Chem), the Comprehensive Air Quality Model with Extensions (CAMx), and others. Flags have been included in the NetCDF files to provide the area and time zone of each pixel, so users can choose the option to generate ready-to-use hourly input files for CMAQ (in mass or mol per second) or WRF-Chem (in mass or mol flux per area).
Smaller domains and finer resolutions can be easily created by modifying the Python code. Figure 7 shows the vehicular emissions of benzene in Brazil on 1 January 2019 using the BRAVES database.
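The unit options mentioned above (mass or mol per second, and mass or mol flux per area) amount to simple conversions, sketched here in Python. The sketch assumes an hourly emission expressed in grams per hour and a pixel area in square meters. The function names are hypothetical and do not come from the BRAVES repository, and the molar masses are rounded standard values.

```python
# Rounded molar masses in g/mol for a few gaseous species.
MOLAR_MASS_G_PER_MOL = {"CO": 28.01, "NO": 30.01, "NO2": 46.01}


def to_rate_per_second(grams_per_hour, species=None):
    """Convert g/h to g/s, or to mol/s when a gaseous species is given."""
    grams_per_second = grams_per_hour / 3600.0
    if species is None:
        return grams_per_second
    return grams_per_second / MOLAR_MASS_G_PER_MOL[species]


def to_flux(rate_per_second, pixel_area_m2):
    """Convert a rate (g/s or mol/s) into a flux per square meter."""
    return rate_per_second / pixel_area_m2


# Illustrative pixel: 100836 g of CO emitted in one hour over 1 km2.
co_g_per_h = 100836.0
co_g_per_s = to_rate_per_second(co_g_per_h)
co_mol_per_s = to_rate_per_second(co_g_per_h, "CO")
co_mol_flux = to_flux(co_mol_per_s, 1.0e6)
```

The per-pixel area matters for the flux form because pixels of equal size in degrees do not have equal areas at different latitudes.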

Comparison with independent databases
We analyze the spatial correlation and bias between the BRAVES database and the annual grid maps of the Emissions Database for Global Atmospheric Research (EDGAR version 5.0; https://edgar.jrc.ec.europa.eu/dataset_ap50, last access: 18 February 2022) (Crippa et al., 2018, 2019). We performed the comparison using the "Road Transportation" emissions from EDGAR for the Brazilian territory, including soil resuspension emission rates of PM10 from EDGAR. The BRAVES database emission rates in tons per year from 2015 were regridded to the same spatial resolution as EDGAR. The Spearman coefficient estimates the spatial correlation, while the difference in absolute emissions gives the bias between the datasets. We compare the disaggregated emissions of CO, PM10, NOx, and VOC from BRAVES and EDGAR. Table 7 in the Supplement also shows a comparison of the total vehicular emissions aggregated over the Brazilian territory, considering BRAVES, EDGAR, and other available national inventories. Emissions from BRAVES and EDGAR present an overall spatial correlation (p<0.05) of ρ = 0.35 for CO, ρ = 0.33 for PM10, ρ = 0.33 for NOx, and ρ = 0.35 for VOC (Fig. 9). Emissions from EDGAR are higher overall (Fig. 9) than emissions from BRAVES, as also reported by Vasques and Hoinaski (2021). The largest differences are observed in CO emissions, followed by VOC, NOx, and PM10. Madrazo et al. (2018) explain that most of the road transport emissions factors are overestimated in EDGAR, while Huneeus et al. (2020) found discrepancies between EDGAR and local/national city emissions. Álamos et al. (2022) also reported an overestimation of EDGAR emissions. Compared with the present database, EDGAR gives higher estimates in pixels with low road density and in less-populated areas, and lower estimates in highly populated areas. A similar pattern was also observed by Ibarra-Espinosa et al. (2018) when comparing EDGAR with the Vehicular Emissions Inventory (VEIN).
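The two metrics used in this comparison can be sketched in plain Python for a pair of grids that have already been regridded to the same cells and flattened to lists. The grid values are hypothetical, the sign convention of the bias (EDGAR minus BRAVES) is an assumption, and the sketch omits the significance test. In practice a library routine such as scipy.stats.spearmanr would normally be used instead.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical emissions (t/yr) in four co-located cells.
braves = [1.0, 4.0, 2.0, 8.0]
edgar = [3.0, 5.0, 4.0, 20.0]

rho = spearman(braves, edgar)
bias = [e - b for e, b in zip(edgar, braves)]  # assumed sign: EDGAR - BRAVES
```

Because the Spearman coefficient depends only on ranks, two inventories can correlate well spatially even when their absolute emissions differ, which is why the bias is reported separately.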
We analyze the spatial correlation between CO vehicular emissions estimated by BRAVES and EDGAR, and CO surface concentration estimated by the Modern-Era Retrospective Analysis for Research and Applications (MERRA-2). The Global Modeling and Assimilation Office (GMAO), managed by NASA, provides MERRA-2 reanalysis products at a spatial resolution of 0.5° × 0.625°, covering from 1980 to the present (Gelaro et al., 2017; Randles et al., 2017). We calculate the annual average concentration in 2015 from monthly data in NetCDF files available at the GES-DISC platform (https://disc.gsfc.nasa.gov/datasets/M2TMNXCHM_5.12.4/summary, last access: 7 June 2022). The MERRA-2 hourly dataset used in Fig. 10 can be downloaded at https://disc.gsfc.nasa.gov/datasets/M2T1NXCHM_5.12.4/summary (last access: 7 June 2022). All grids are realigned to match the MERRA-2 spatial resolution. We analyze the spatial correlation by Brazilian states since the vehicular emissions have more influence in urbanized ones. We assume that those cells which have vehicular emissions as the major source of air pollutants also have higher surface concentrations. However, this assumption has several limitations and should be carefully evaluated since it does not account for the dispersion process and other source types (e.g., industrial, biomass burning, and biogenic sources). Figure 10 shows the (a) CO concentrations from MERRA-2 in the São Paulo (SP) state, (b) vehicular emission of CO from EDGAR, and (c) vehicular emission of CO from the BRAVES database. In 2021, ∼31 million vehicles were registered in the SP state, which is considered the state with the highest vehicular emissions in Brazil (SENATRAN, 2021; Vasques and Hoinaski, 2021). The BRAVES database and EDGAR reach a similar spatial correlation with MERRA-2 when using annual averages (Fig. 8 in the Supplement). The zoom-in quadrant in the SP metropolitan region in Fig. 10 reveals a greater level of detail in the BRAVES database compared to EDGAR.
In addition, BRAVES has higher temporal resolution and chemically speciated emissions and has presented a better correlation with MERRA-2 when comparing hourly averages (Fig. 10). In Fig. 10, we have compared MERRA-2 and emissions on 1 January 2015 at 08:00 UTC, when the boundary layer is low and the concentrations are representative of the emissions, as shown by Gallardo et al. (2012). It is worth emphasizing that a direct comparison between emissions and concentrations from monitors or reanalysis data must be made carefully and under specific conditions.
In other Brazilian states, such as Minas Gerais (MG) and Rio Grande do Sul (RS), there is also a positive correlation between vehicle emissions and surface concentrations of CO (Table 9 in the Supplement). This shows that both databases consistently capture the spatial variability of vehicular emissions, and that the BRAVES database brings additional features for air quality studies in Brazil.

Data availability
The BRAVES database is freely available at https://doi.org/10.5281/zenodo.6588692. We provide annual speciated emissions at a resolution of 0.05° × 0.05° covering the entire Brazilian territory.

Conclusions
Here, we introduce the BRAVES database, the first high-resolution and chemically speciated database of vehicular emissions covering the entire Brazilian territory. The BRAVES database contains emissions of 41 air pollutants, with annual to hourly temporal resolution and user-defined spatial resolution. The attributes of this emission database are fully compatible with sophisticated air quality models. Moreover, the emissions of multiple chemical species presented here provide essential information to understand important atmospheric chemistry processes in Brazil. We also provide Python scripts for users who want to create their custom gridded inventory.
Even though detailed emission inventories are required to control air pollution, vehicular emissions inventories are scarce in most developing countries. So far, Brazil has lacked a comprehensive and easily accessible database of vehicular emissions, and gridded inventories are urgently needed in South America. This work contributes to closing this gap.
The spatial correlation analysis reveals that the BRAVES database agrees with the vehicular emissions from EDGAR, even though EDGAR emissions are consistently higher than those of BRAVES. We conclude that this database can be a better alternative to represent the spatial variability of vehicular emissions in Brazil. The BRAVES database has a similar performance representing the spatial pattern of vehicular emissions, with more spatial, temporal, and chemical details when compared with EDGAR. Moreover, the BRAVES database is in closer agreement with local and very detailed emissions inventories. A comparison with the MERRA-2 surface concentration confirms the consistency of the BRAVES database.
Even though the present database is a step forward for air pollution research in Brazil, there are several opportunities for expanding and improving this work. Most heavy-duty emissions occur on roads with high flow and high speed limits, such as expressways. Future versions could improve the spatial disaggregation in pixels containing roads with high traffic flow and high-speed traffic through the optimization of the disaggregation factors. Different criteria for light and heavy vehicles would also be needed. Moreover, the chemical speciation could include profiles that reflect Brazilian conditions, such as biofuels, fleet motorization, and regionalized soil resuspension properties. Temporal variability would also be improved by regionalizing the profiles to account for the traffic flow in each location or by including monthly fuel consumption data. An evaluation using the BRAVES database as input in air quality models would bring important information about the model's errors and representativeness. As reported by Nogueira et al. (2021), the emission factors from CETESB used in this work would require future corrections to better represent field measurements.
Author contributions. LH and TVV designed the methodology and developed the software. LH, TVV, CBR, and BM performed the data curation and formal analysis and created the figures. LH, CBR, and BM prepared the original draft and revised the manuscript. LH is the project administrator and laboratory supervisor.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.