PAPILA dataset: a regional emission inventory of reactive gases for South America based on the combination of local and global information

The multidisciplinary project Prediction of Air Pollution in Latin America and the Caribbean (PAPILA) is dedicated to the development and implementation of an air quality analysis and forecasting system to assess pollution impacts on human health and economy. In this context, a comprehensive emission inventory for South America was developed on the basis of the existing data on the global dataset CAMS–GLOB–ANT v4.1 (developed by joining CEDS trends and EDGAR v4.3.2 historical data), enriching it with data derived from locally available emission inventories for Argentina, Chile and Colombia. This work presents the results of the first joint effort of South American researchers and European colleagues to generate regional maps of emissions, together with a methodological approach to continue incorporating information into future versions of the dataset. This version of the PAPILA dataset includes CO, NOx, NMVOCs, NH3 and SO2 annual emissions from anthropogenic sources for the period 2014–2016, with a spatial resolution of 0.1º × 0.1º over a domain that covers 32º–120º W and 34º N–58º S. The PAPILA dataset is presented as netCDF4 files and is available in an open access data repository under a CC-BY 4 license: http://dx.doi.org/10.17632/btf2mz4fhf.3. A comparative assessment of PAPILA–CAMS datasets was carried out for (i) the South American region, (ii) the countries with local data (Argentina, Colombia and Chile), and (iii) downscaled emission maps for urban domains with different environmental and anthropogenic factors. Relevant differences were found both at country and urban levels for all the compounds analyzed. Among them, we found that when comparing PAPILA total emissions versus CAMS datasets at the national level, higher levels of NOx and considerably lower levels of the other species were obtained for Argentina, higher levels of SO2 and lower levels of CO and NOx for Colombia, and considerably higher levels

These regional particularities have direct consequences not only on the level and chemical profiles of the pollutants discharged to the atmosphere, but also on the specific locations where these emissions occur and on the population exposed to their environmental and health effects. Assessing the impact of atmospheric emissions, as well as designing mitigation strategies, requires reliable atmospheric emissions inventories (AEIs), which include spatially disaggregated emissions covering the entire region of interest in a transparent and consistent way in terms of emission sources and estimation methodologies (Kuenen et al., 2014). There is a wide range of global AEIs covering SA for different species and periods that meet the mentioned requirements. Some of the AEIs worth mentioning include: the Emissions Database for Global Atmospheric Research (EDGAR) (Janssens-Maenhout et al., 2019; EDGAR, 2021), the Evaluating the Climate and Air Quality Impacts of Short-Lived Pollutants (ECLIPSE) (Stohl et al., 2015), the Community Emissions Data System (CEDS) (Hoesly et al., 2018), the integrated assessment model Greenhouse gas - Air pollution Interactions and Synergies (GAINS) (Klimont et al., 2017), or the Copernicus Atmosphere Monitoring Service datasets (CAMS) (Granier et al., 2019). Across the region, government efforts on AEIs are mainly focused on GHGs in line with the international commitments under the United Nations Framework Convention on Climate Change (UNFCCC). The regional community of GHG inventory compilers has grown remarkably in the last two decades and in many cases has helped to improve the collection of activity data including specific areas of national statistics systems. In parallel, research groups in SA have built inventories of ozone precursors and particles to be used as input data to air quality models.
Links between several of these groups have recently been strengthened by the creation of a regional initiative focused on the construction of inventories of species not covered by governments in their reports to the UNFCCC (Huneeus et al., 2017, 2020a). Completeness in terms of species, represented sectors and time series is a strength of global AEIs while locally developed inventories seldom fully cover all three aspects. On the other hand, although the most current versions of global AEIs accurately reflect the emissions from sectors for which regional information is well documented in global statistics, they may miss some specificity and accuracy associated with local practices and technologies that is often better represented in local AEIs (Huneeus et al., 2020a). From this, it is plausible to assume that better emission estimates would be obtained by enriching the comprehensive global AEIs with locally generated information. This mosaic approach is an idea that has been successfully applied in the framework of the Task Force on Hemispheric Transport of Air Pollution (HTAP), an international cooperative effort to improve the understanding of the intercontinental transport of air pollution across the Northern Hemisphere. In this context, the HTAP_v2.2 air pollutant grid maps were developed combining available regional information within a complete global dataset (Janssens-Maenhout et al., 2015), and have been widely used even outside of HTAP.
This work presents what to our knowledge constitutes the first AEI from anthropogenic sources covering the continental SA region that combines locally available information with a global database in a rigorous way. For this purpose, the dataset CAMS-GLOB-ANT v4.1 (Granier et al., 2019), developed by joining CEDS trends and EDGAR v4.3.2 historical data, was used as a basis (hereinafter CAMS dataset), enriching it with locally developed inventories available in the literature until 2019, and selecting those with national coverage and with availability of data for the period and species of interest. The dataset presented in this work, hereinafter called PAPILA, focuses on the group of species known as reactive gases, given their relevance in atmospheric chemistry as precursors of O3 and PM2.5: carbon monoxide (CO), nitrogen oxides (NOx), non-methane volatile organic compounds (NMVOCs), ammonia (NH3) and sulfur dioxide (SO2) (Sharma et al., 2017). Due to the availability of data in the local AEIs and the completeness of the sectors represented, the 2014-2016 period was selected for this first version of the PAPILA dataset, including local information from the continental areas of Argentina (Puliafito et al., 2017; Castesana et al., 2018), Chile (Mazzeo et al., 2018; Gallardo et al., 2018) and Colombia (IDEAM, 2017). In addition, a comparison of the performance of both AEIs (PAPILA and CAMS) is presented using near-surface CO and NOx mixing ratios simulated by the Weather Research and Forecasting-Chemistry model (WRF-Chem) (Grell et al., 2005) at a high spatial resolution (3 km) against in situ observations made in Buenos Aires during February-March and August-September 2015.
This work was carried out within the framework of the Prediction of Air Pollution in Latin America and the Caribbean (PAPILA, 2020) and Emission Inventories in South America (EMISA, 2020) projects. PAPILA combines for the first time an ensemble of state-of-the-art models, high-resolution emission inventories, space observations and surface measurements to provide real time forecasts and analysis of regional air pollution in the Latin America and the Caribbean region. Thus, an important aspect of the project is the development of appropriate and consistent surface emission inventories as input data for air quality models. The EMISA initiative was created to lay the foundations for constructing robust and transparent inventories of the same set of species that have been consistently estimated across South American countries using the same methodological approach. Local information on emissions was gathered from the countries that participate in the EMISA project: Argentina, Brazil, Chile, Colombia and Peru. Relevant research groups in Brazil and Peru have developed emission inventories for different cities (Policarpo et al., 2018; Dos Santos Lucon and Moutinho Dos Santos, 2005; Vivanco and Andrade, 2006; Romero et al., 2020; Dawidowski et al., 2014); however, as far as we know, they have not developed inventories covering the entire countries for the species included in this study. Since national territories are the common administrative entities that can be exchanged with the global inventory, the local information on emissions from these countries was not included in this first version of the combined dataset. However, this work is expected to be the starting point for the preparation of comprehensive emission inventories in South America enriched with local information. For this purpose, we include a flow chart with the general methodology that we have applied in combining local information with a global dataset.
The paper is organized as follows. Section 2 describes, for each country, the approach and sources of information used to develop the PAPILA dataset and also discusses the application of this inventory in an air quality model. Section 3 provides the main differences of PAPILA and CAMS datasets for SA and other smaller domains and the results of the air quality simulations.

Section 4 provides a description of the data availability, and finally Section 5 presents the main conclusions of this work.

PAPILA dataset overview
The PAPILA dataset is a collection of CO, NOx, NMVOCs, NH3 and SO2 inventories of annual emissions from anthropogenic sources in South America for the period 2014-2016. The inventories are presented as netCDF4 files, one for each species, gridded with a spatial resolution of 0.1º x 0.1º covering the domain 32º W-120º W and 34º N-58º S. Each file contains 12 variables corresponding to the emissions in Tg y⁻¹ from the following categories, which are organized and denominated using the nomenclature given by CAMS: thermal power plants (ENE); residential and commercial combustion (RES); road transportation (TRO); non-road transportation (TNR); fugitive emissions (FEF); industries, including fuel consumption in manufacturing industries and construction, refineries, industrial processes and solvent and other products use (IND); agricultural soils (AGS); agricultural livestock (AGL); domestic and international navigation (SHP); waste (including solid waste, wastewater and incineration) (SWD); and the sum of all categories (SUM). This grouping of categories was carried out following the CAMS sectoral disaggregation, except for the use of solvents, reported under IND in the PAPILA dataset. To be consistent with the base inventory (CAMS-GLOB-ANT v4.1) used for our mosaic inventory, aviation emissions were not included in this first version of the PAPILA dataset. Agricultural fires were removed to allow the use of the inventories together with fire products such as GFEDv3 (van der Werf et al., 2010), FINN v1 (Wiedinmyer et al., 2011) and GFAS v1 (Kaiser et al., 2012), avoiding double counting of these fires. It is worth mentioning that by "sum of all categories" we refer to all those included in PAPILA, both for the presentation of our results and for comparative purposes with CAMS.
A broader description of the activities covered by each category is presented in Table A1 in Annex A, together with the equivalences with the IPCC 1996 reporting code.
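As an illustration of the grid geometry described above, the following minimal Python sketch maps a longitude/latitude pair to a cell of a 0.1º grid over the stated domain. The function name `cell_index`, the assumption that the domain limits are cell edges, and the south-to-north, west-to-east index ordering are ours and are not taken from the PAPILA files, which may be laid out differently.

```python
import math

# Domain limits and resolution as stated in the dataset description.
LON_MIN, LON_MAX = -120.0, -32.0   # 120 W to 32 W
LAT_MIN, LAT_MAX = -58.0, 34.0     # 58 S to 34 N
RES = 0.1                          # grid spacing in degrees

# Number of columns and rows implied by the domain and resolution,
# assuming the limits above are cell edges (an assumption).
N_LON = round((LON_MAX - LON_MIN) / RES)
N_LAT = round((LAT_MAX - LAT_MIN) / RES)


def cell_index(lon, lat):
    """Return (row, col) of the grid cell containing the point.

    Hypothetical convention: row 0 is the southernmost row and
    col 0 the westernmost column.
    """
    if not (LON_MIN <= lon < LON_MAX and LAT_MIN <= lat < LAT_MAX):
        raise ValueError("point outside the domain")
    # The small epsilon guards against floating-point round-off
    # for points lying exactly on a cell edge.
    col = int(math.floor((lon - LON_MIN) / RES + 1e-9))
    row = int(math.floor((lat - LAT_MIN) / RES + 1e-9))
    return row, col
```

Under these assumptions the grid has 880 columns and 920 rows, and a point in Buenos Aires such as (58.44º W, 34.61º S) falls in a single, uniquely indexed cell.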

The PAPILA dataset (Castesana et al., 2021) combines surface emissions from the comprehensive CAMS dataset with local information from those countries that, at the time of development of this emission inventory, had emission estimates of the mentioned species covering the entire national territory: Argentina, Chile and Colombia. That information was collected and assessed in terms of species, emission categories and spatial coverage, selecting the most appropriate and representative data for each country, as described in the following subsections and summarized in Figure 1.

Local inventories for Argentina, Chile and Colombia were assessed in terms of species availability. In addition, and as described in the next subsections, the transparency of the methodology applied in emission estimates, and the representativeness and completeness of emission categories, were reviewed in line with the CAMS emission reporting system. For those species and/or categories lacking local data, the CAMS inventory was used to fill the gaps.
One of the challenges of combining different local inventories into a common regional database is bringing them to a single, uniform and homogeneous grid. For this purpose, it was necessary to resolve all conflicts arising from cells shared by more than one country or coastal cells. To be consistent with the base inventory used in this work, this problem was solved using the country and continent masks applied by CAMS (CIESIN and CIAT, 2005), which are created at 0.1º resolution assigning a unique country value for each cell.

Argentina
Spatially disaggregated emission inventories for all the species included in this work are available for Argentina. They cover all categories except SWD. Emissions from the following categories ENE, RES, TRO, TNR, FEF, IND and SHP were taken from the GEAA inventory (Puliafito et al., 2017) which consists of a high-resolution (0.025º × 0.025º) inventory of 2014 composition, fuel consumption by refueling stations, geographically distributed considering road maps by type and distance to the refueling stations; (iv) for off-road transportation, emissions from railways, with fuel consumption data, geographically distributed with rail maps; (v) for fugitive emissions, including those from refining, storage, venting and flaring, and those from distribution of oil products and natural gas, annual data from national statistics, spatially distributed with the exact location of the facilities; (vi) for inland navigation (namely, domestic plus international navigation on the continental area of Argentina), fuel consumption spatially distributed with the geographical identification of the berths, routes and ports boundaries. The GEAA inventory has been updated for this work including emissions from IND, which were not covered in the published version (Puliafito et al., 2017). These emissions include (i) those from fuel consumption and from the production process itself for the main industries, disaggregated by fuel and spatially distributed with the precise location of each facility, and (ii) those from fuel consumption of small industries, whose consumption is known by activity and by district, and whose spatial disaggregation of emissions was carried out using the population density of each district as a proxy.
We noted that a different allocation of fugitive emissions from the distribution of oil products and natural gas (mainly consisting of NMVOCs) exists between CAMS and the Argentinean inventory: CAMS includes these emissions under the IND category (see Table A1, Annex A) while they are reported under FEF in the Argentinean inventory. This does not imply omission or double counting of emissions.
To construct the PAPILA inventory, this new version of GEAA was updated to 2015 and 2016 by applying CEDS trends by emission categories. Final emissions were adapted to a homogeneous grid of 0.1º x 0.1º, and combined with agricultural local inventories described below and with the CAMS information on emissions from SWD and from SHP outwards from the continental area. Emissions from agricultural activities were estimated as:

$$E_i = AD \times EF_i \qquad (1)$$

where E_i is the emission amount of the species i, AD is the activity data, and EF_i represents the emission factor of the species i related to that activity. Both the activity data and their spatial distribution were based on the previous work by Castesana et al. (2018, 2020), while the emission factors were those suggested by the EMEP according to the level of detail described for each activity in Table 1. These emissions have been estimated ad hoc to be included as part of the PAPILA dataset. Since they have not been published thus far, interested readers may find a more complete description of the results in Annex A, including resulting emissions from fertilizers, crop production and animal excreta (dairy and beef cattle, poultry, swine, sheep, goats, horses and other livestock). Consistent with the studies cited above, emissions from managed excreta are reported as AGL, and those deposited in pasture during animal grazing are reported under AGS. Resulting inventories of annual emissions from agricultural activities spatially disaggregated at district level were converted to grids with a 0.1º x 0.1º resolution.
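The relation between activity data and emission factors used for the agricultural estimates can be sketched in a few lines of Python. This is only a minimal illustration: the activity names, the activity levels and the emission factors below are hypothetical placeholders chosen for readability, not the EMEP factors or the national statistics used for the dataset.

```python
def emission(activity_data, emission_factor):
    """Emission of one species from one activity: E_i = AD * EF_i."""
    return activity_data * emission_factor


def total_emissions(activities, emission_factors):
    """Sum emissions per species over all activities.

    activities:        {activity: activity data (AD)}
    emission_factors:  {activity: {species: emission factor (EF)}}
    """
    totals = {}
    for activity, ad in activities.items():
        for species, ef in emission_factors[activity].items():
            totals[species] = totals.get(species, 0.0) + emission(ad, ef)
    return totals


# Hypothetical placeholder values (NOT the factors used in PAPILA).
activities = {
    "dairy_cattle": 1000.0,      # heads
    "urea_fertilizer": 500.0,    # tonnes of N applied
}
emission_factors = {
    "dairy_cattle": {"NH3": 20.0},      # kg NH3 per head per year
    "urea_fertilizer": {"NH3": 150.0},  # kg NH3 per tonne of N
}

totals = total_emissions(activities, emission_factors)
```

With these placeholder inputs, `totals["NH3"]` is the sum of the two products, in kg of NH3 per year. In practice the same calculation would be repeated per district before gridding.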

Chile
Chilean annual emissions were taken from the CR2-MMA dataset (CR2-MMA, 2018), based on the works of Gallardo et al. (2018) and Mazzeo et al. (2018). This dataset is presented with a spatial resolution of 0.01º x 0.01º, and includes 2014 emissions of reactive gases, GHG and particles, reported under the following aggregation: industries (which includes emissions from energy), urban and non-urban road transportation (which only includes CO and NOx for reactive gases), residential consumption, and agricultural and forest fires. Emissions from industrial sources correspond to the compilation of self-declared estimates by each facility to the Chilean Repository of Emissions and Pollutants Transport. Neither the methodology nor the emission factors used to estimate these emissions could be traced. For the particular case of SO2, the local methodology for the emission estimates is based on sulfur content in fuels and on mass-flow balances in copper production processes, which constitute the main SO2 emitter activity in Chile (González-Rojas et al., 2021). For this reason, and assuming that the information on sulfur content handled locally is reliable, we have included the spatially distributed emissions as estimated in Chile in our dataset. For the rest of the species, we decided to exclusively adopt the spatial distribution and the share of the locally reported emissions and distributed the CAMS estimates by weighting them on the CR2-MMA spatial distribution as follows:

$$E_{\mathrm{cell}}(i,j,k) = \frac{E_{\mathrm{local}}(i,j,k)}{\sum_{k=1}^{N} E_{\mathrm{local}}(i,j,k)} \, \sum_{k=1}^{N} E_{\mathrm{CAMS}}(i,j,k) \qquad (2)$$

where E_cell(i, j, k) are the emissions of species i and category j assigned to the grid cell k, N is the total number of grid cells covering the country, E_local represents the emissions locally estimated and E_CAMS the corresponding estimate from the global database.
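The weighting just described, in which the magnitude comes from CAMS and the spatial pattern from the local inventory, can be sketched as follows. This is a minimal reading of the procedure for one species and one category; the function name `redistribute` and the dictionary-based grid representation are our assumptions, not part of the published processing chain.

```python
def redistribute(cams_total, local_cells):
    """Spread a national CAMS total over grid cells in proportion to
    the locally estimated emissions of each cell.

    cams_total:   national emission from the global dataset
    local_cells:  {cell_id: locally estimated emission in that cell}
    """
    local_total = sum(local_cells.values())
    if local_total <= 0:
        raise ValueError("local inventory has no emissions to weight by")
    return {
        cell: cams_total * value / local_total
        for cell, value in local_cells.items()
    }


# Illustrative numbers only: two cells holding 25% and 75% of the
# local emissions receive 25% and 75% of the CAMS national total.
local = {(0, 0): 1.0, (0, 1): 3.0}
gridded = redistribute(8.0, local)
```

By construction the redistributed field sums to the CAMS national total, so only the location of the emissions changes, not their magnitude.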

Emission estimates from residential sources in the CR2-MMA dataset cover only firewood combustion for all species considered in our work (Mazzeo et al., 2018). According to local experts, one of the most relevant aspects of air quality in the coldest regions in southern Chile is the presence of CO, NMVOCs and particles from burning of firewood in households.
Although this practice is included under the residential category of global inventories, the corresponding estimated emissions do not seem to be consistent with the magnitude of the air pollution situation observed at the local level (Huneeus et al., 2020b). Huneeus et al. (2020a) found that residential emissions were strongly overestimated in global databases and attributed this inaccuracy to the use of population density as a proxy for the spatial distribution. Residential emissions from firewood burning are less relevant in the more temperate northern areas where air pollution is mostly linked to emissions of SO2 and particles from the metal industry (Huneeus et al., 2020b). From this and assuming that (i) residential firewood burning is a predominant source in the southern region, and (ii) in central and northern regions this work improves the representation of the diversity of sources and local practices for the other fuel combustion categories, we have decided to replace the residential emissions of CAMS with those of the CR2-MMA, at the risk of underestimating residential emissions in the central and northern regions by omitting those from fuels other than firewood (see Section 3.1).
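Local estimates of CO and NOx emissions from urban and non-urban road transportation were aggregated and reported in and disaggregated using the spatial distribution of sources of the CAMS inventory as follows:

$$E_{\mathrm{cell}}(i,j,k) = \frac{E_{\mathrm{CAMS}}(i,j,k)}{\sum_{k=1}^{N} E_{\mathrm{CAMS}}(i,j,k)} \, \sum_{k=1}^{N} E_{\mathrm{local}}(i,j,k) \qquad (3)$$

where variables and indexes are those described in Eq. 2.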
Although in this context the country reports CO, NOx and SO2 emissions from SWD, CAMS reports them as zero. The latter precluded the spatial assignment of the locally estimated emissions, and for this reason it was decided to take the SWD category from CAMS.

Comparison of local and global datasets
A spatial analysis was performed following a similar approach to Trombetti et al. (2018) in their work on spatial intercomparison of top-down emission inventories in European urban areas, in which the analysis was made in terms of normalized emission values by group of categories and for different urban domains in order to become independent of emission levels and to show the relative contribution of a certain group of emission activities in different areas. Since in our work we are interested in comparing only two inventories without losing sight of the differences in terms of magnitude, we have adapted this approach by comparing normalized emissions by category and urban domain normalizing them with respect to those from the CAMS dataset, as shown in Eq. 4. In this way, we were able to compare both datasets in relative terms and without losing information on the shares of each group of categories and the differences in the emission levels of each dataset.
$$E^{*}_{i,J}(d,\mathrm{area}) = \frac{\sum_{k=1}^{N} E_{i,J}(d,k)}{\sum_{J} \sum_{k=1}^{N} E_{i,J}(\mathrm{CAMS},k)} \qquad (4)$$

where E*_{i,J}(d, area) and E_{i,J}(d, k) are the normalized emissions and the emission levels, respectively, of the species i and group of categories J corresponding to the dataset d and the area (region, country or urban domain) covered by the total number N of grid cells k.
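The normalization with respect to CAMS can be illustrated with a short Python sketch. It assumes, as our reading of the text, that the emissions of each group of categories are summed over the cells of the area and divided by the CAMS total over all groups; the function name, the nested-dictionary layout and the numbers are hypothetical.

```python
def normalized_emissions(emissions, reference="CAMS"):
    """Normalize area totals per group of categories by the reference
    dataset's total over all groups.

    emissions: {dataset: {group: [emission in each grid cell of the area]}}
    returns:   {dataset: {group: normalized emission}}
    """
    ref_total = sum(sum(cells) for cells in emissions[reference].values())
    if ref_total <= 0:
        raise ValueError("reference dataset has no emissions in this area")
    return {
        dataset: {group: sum(cells) / ref_total
                  for group, cells in groups.items()}
        for dataset, groups in emissions.items()
    }


# Illustrative per-cell emissions for one species in one urban domain.
example = {
    "CAMS": {"ENE+IND": [1.0, 1.0], "RES": [2.0],
             "TRO": [3.0, 1.0], "Others": [0.0]},
    "PAPILA": {"ENE+IND": [4.0], "RES": [1.0],
               "TRO": [1.0], "Others": [1.0]},
}
shares = normalized_emissions(example)
```

With this convention the CAMS values add up to 1.00, while the PAPILA values add up to the ratio of the PAPILA total to the CAMS total (0.875 in this made-up example), so both the sectoral shares and the difference in magnitude remain visible.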
For this analysis, we grouped categories as ENE + IND, RES, TRO, and "Others", and applied the analysis to (i) SA  Table A3 and Figure A1 of the Annex A.

WRF-Chem simulations: case study in Buenos Aires
The performance of the PAPILA dataset in comparison with CAMS can be assessed by using both inventories as input data for a regional model implemented over the whole domain where local data have been integrated into the global dataset. This vast region, which includes the tropical Andes in Colombia, the dry Andes in Southern Chile and the Argentinean plateau towards the Atlantic coast, is characterized by diverse topographic features and vegetation patterns. In order to capture the differences in boundary layer processes and surface energy budget over the whole area, a high-resolution model is needed, set up in each area where the main PAPILA/CAMS dataset changes have been made. As a first step of this verification exercise, we present here a study focused on Buenos Aires, using the Weather Research and Forecasting Chemistry regional model version 4.1.2 (WRF-Chem v4.1.2). This megacity is strongly influenced not only by mobile and residential sources, but also by the presence of four big thermal power plants, an important industrial park and an international port. Simulations were conducted using the model over three nested domains with a highest horizontal resolution of 3 km centered in Buenos Aires. Two time periods were selected to cover summer (from 7 February to 5 March 2015) and winter (from 26 August to 17 September 2015) to assess the role of the emission estimates from the two inventories in the simulated air pollutant concentrations in the different seasons.

WRF-Chem is a fully-coupled online chemistry transport model that simultaneously predicts weather and atmospheric composition (Grell et al., 2005). The simulations were done over 3 nested domains. The lowest resolution domain (d01) has a grid size of 18 km x 18 km (51º W-78º W, 15º S-57º S), and the highest resolution domain (d03) has a 3 km x 3 km grid covering the metropolitan region of Buenos Aires and the surroundings. The coverage of the domains can be seen in Figure A2.
All the simulations conducted in this study were performed using a spin-up time of two weeks.

Lambert-conformal projections were used. The physical parameterizations adopted for the three domains were: a) the Thompson scheme (Thompson et al., 2008) for microphysics, b) the Grell 3D scheme for cumulus parameterization, c) the Yonsei University scheme for boundary layer processes, d) the MM5 similarity scheme for surface processes, and e) the RRTMG scheme to compute long and shortwave radiation. The chemistry in these simulations was modelled using the GOCART bulk aerosol scheme for aerosol phase chemistry along with RADM2 for gas-phase chemistry. The initial and boundary conditions were taken from the NCEP Final Operational Global Analysis data (FNL), available at a resolution of 1 degree every 6 hours (NOAA, 2000).
The FINN Fire Database was used for fire emissions (Wiedinmyer et al., 2011), the MEGAN Biogenic database for biogenic emissions (Guenther et al., 2006) and sea salt emissions from GOCART were also included.
Reported annual emissions in 2015 from the two inventories were processed to produce hourly-resolved emissions at the resolution of each of the domains. Since the PAPILA dataset only includes reactive gases, aerosol emissions were taken from the datasets used in the CAMS simulation: the EDGAR v4.3.2 global emission database for PM2.5 and PM10, and CAMS v4.1 for OC, BC and VOCs.
However, the results presented in this article will only cover the species included in the PAPILA inventory.
For ENE, TRO and RES, monthly emission patterns were defined to break down total annual emissions into monthly fluxes (see Fig. A2). Emissions from other categories were evenly distributed throughout the year. The RES monthly cycle was established from the natural gas consumption reported in national statistics (ENARGAS, 2021) for residential and commercial activities in Buenos Aires. This profile shows a maximum during winter linked to the increase in residential heating. Similarly, the TRO monthly cycle was defined from the total fossil fuel consumption by road transport reported in the statistics for the entire country (Secretaría de Energía, 2021). For ENE, the same source of data was used to obtain monthly fossil fuel sales for thermal power plants in the Buenos Aires Province. Weekly cycles taken from PREP-CHEM (Freitas et al., 2011) were applied to the resulting total monthly emission fluxes. The diurnal cycles were adapted from those reported by Wang et al. (2010), focusing on reproducing Buenos Aires traffic patterns observed in the two monitoring stations: Parque Centenario and Córdoba. With this approach, the best configuration obtained with the simulations includes three hourly emission patterns: one related to diesel vehicle emissions, defined using PM observations, and the other two associated with gasoline car emissions in winter and in summer, defined with the measured concentrations of CO and NOx.
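The chain from annual totals to hourly fluxes described above (monthly profile, then weekly cycle, then diurnal cycle) can be sketched as follows. All profile values in this example are hypothetical placeholders, not the ENARGAS, Secretaría de Energía, PREP-CHEM or Wang et al. (2010) profiles, and the function `hourly_flux` is our own illustration rather than the preprocessing code used for the simulations.

```python
# Hypothetical temporal profiles (placeholders only).
MONTHLY = [1.0 / 12.0] * 12                         # fractions of the annual total, sum to 1
WEEKLY = [1.05, 1.05, 1.05, 1.05, 1.05, 0.90, 0.85]  # Mon..Sun factors, mean of 1
DIURNAL = [0.02] * 6 + [0.05] * 16 + [0.04] * 2      # hourly fractions of a day, sum to 1


def hourly_flux(annual, month, weekday, hour, days_in_month):
    """Emission for one hour, in the mass unit of `annual` per hour.

    annual:        annual emission of one category in one grid cell
    month:         0-11, weekday: 0-6 (Monday = 0), hour: 0-23
    days_in_month: number of days of the selected month
    """
    monthly_total = annual * MONTHLY[month]      # annual -> monthly
    daily_mean = monthly_total / days_in_month   # monthly -> mean day
    # Weekly factor scales the day; diurnal fraction splits it into hours.
    return daily_mean * WEEKLY[weekday] * DIURNAL[hour]


# Total emitted during one full week of a 30-day month.
week_total = sum(
    hourly_flux(8640.0, 0, weekday, hour, 30)
    for weekday in range(7)
    for hour in range(24)
)
```

Because the weekly factors average to one and the diurnal fractions sum to one, the profiles only redistribute emissions in time: a full week emits seven times the mean daily amount of that month.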

Model evaluation
The highest-resolution model outputs using these two emission inventories were evaluated against CO and NOx ground-based observations from the available monitoring stations in Buenos Aires (see locations of the sites in Figure A2). Air pollutant data from the Environmental Protection Agency of Buenos Aires (APRA) include hourly measurements of NOx and CO at two sites, Córdoba (34.60º S, 58.39º W) and Parque Centenario (34.61º S, 58.44º W). The Córdoba site is located in a commercial area with high vehicular flow and very low incidence of stationary sources, while Parque Centenario is located in a residential area next to a wooded space with medium vehicular flow and also very low incidence of stationary sources. As the air quality database is at hourly resolution, the model was also sampled at every hour.
The model evaluation was mainly focused on the effects of enriching the CAMS inventory with local inventories on the simulated air pollutant concentrations. For this purpose, median and percentiles for the entire period were evaluated. Also, mean daily concentrations were calculated to inspect whether model performance of both inventories was consistent and satisfactory. Well-accepted statistical measures such as normalized mean bias (NMB), normalized mean gross error (NMGE) and the fraction of predictions within a factor of two (FAC2) were used (Wang et al., 2021). These statistical metrics were calculated using the following expressions:

$$\mathrm{NMB} = \frac{\sum_{k=1}^{N} (S_k - O_k)}{\sum_{k=1}^{N} O_k} \qquad (5)$$

$$\mathrm{NMGE} = \frac{\sum_{k=1}^{N} |S_k - O_k|}{\sum_{k=1}^{N} O_k} \qquad (6)$$

$$\mathrm{FAC2}: \text{fraction of pairs satisfying } 0.5 \le \frac{S_k}{O_k} \le 2 \qquad (7)$$

where S_k and O_k are the simulated and observed hourly average concentrations respectively, and N is the total number of observations. The model was sampled at each measuring location using grid interpolation and compared with the ground-based observations for the calculation of statistical performance metrics.

in CAMS correspond to the sum of the share of each group of categories, adding up to a total of 1.00. On the other hand, the shares of each category in PAPILA can be compared with those in CAMS and add up to a total greater or less than 1.00 according to the differences in the sum of all the categories indicated in Table 2.
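For reference, the three evaluation metrics named in the model evaluation (NMB, NMGE and FAC2) can be computed with the minimal Python sketch below. It follows the commonly used definitions of these statistics; the function names and the sample values are ours, and the handling of zero observations in FAC2 (they are skipped) is an assumption rather than a documented choice of this study.

```python
def nmb(sim, obs):
    """Normalized mean bias: sum(S - O) / sum(O)."""
    return sum(s - o for s, o in zip(sim, obs)) / sum(obs)


def nmge(sim, obs):
    """Normalized mean gross error: sum(|S - O|) / sum(O)."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / sum(obs)


def fac2(sim, obs):
    """Fraction of pairs with 0.5 <= S/O <= 2.

    Pairs with a non-positive observation are skipped (assumption).
    """
    pairs = [(s, o) for s, o in zip(sim, obs) if o > 0]
    within = sum(1 for s, o in pairs if 0.5 <= s / o <= 2.0)
    return within / len(pairs)


# Made-up hourly concentrations for illustration only.
simulated = [1.0, 2.0, 6.0, 1.0]
observed = [2.0, 2.0, 2.0, 2.0]

bias = nmb(simulated, observed)
error = nmge(simulated, observed)
fraction = fac2(simulated, observed)
```

In this made-up case the positive and negative deviations partly cancel in the bias but not in the gross error, which is why the two metrics are usually reported together.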

The spatial distribution of 2015 annual emissions of CO, NOx, NMVOCs, NH3 and SO2 is shown in Figure 4 for the sum of all categories. In addition, this figure includes maps with the differences between PAPILA and CAMS datasets for each species, depicting the differences in terms of intensity and location of emission sources. For comparative purposes, emissions from agricultural fires have been subtracted from the sum of all categories in the CAMS database.
In what follows, results are first presented by species, highlighting the most relevant aspects of the 2015 emissions (section 3.1). Then, surface concentrations of CO and NOx obtained using the PAPILA dataset as input to a chemical transport model in Buenos Aires are compared with those obtained using CAMS (section 3.2). The section ends with an analysis of the local aspects that may have generated the differences between both emission databases (section 3.3).

Local-global comparison by species
Carbon monoxide

Local estimates of CO for Argentina and Colombia presented lower CO annual emissions than those in CAMS; the largest differences occurred under the TRO category. Although the local estimates for TRO in Chile also showed significantly smaller levels, this difference was masked in the total CO national estimates by the larger emissions from the residential category, even after having omitted CO emissions from fuel combustion other than firewood. Lower CO PAPILA emissions in Argentina (-39%) and Colombia (-54%) were compensated by larger PAPILA emissions in Chile (58%). According to the CAMS dataset, these three countries are only responsible for 25% of the total SA emissions, and hence the impact of the changes introduced in this work on total SA is very limited. This same situation occurs with the other species, for which it is found that these three countries are only responsible for 20-25% in the case of NOx, NMVOCs and NH3 and 31% for SO2 (Table 2). However, when analyzing the impact on these three countries together, and even when these countries compensate for each other, a difference with CO CAMS emissions of -19% is observed.

At the urban level, in the Buenos Aires domain PAPILA emission estimates were 12% higher than those from CAMS; this difference was mainly associated with higher local emissions from TRO and ENE + IND, even with lower emissions from RES. Mendoza and B. Blanca, in turn, exhibited lower CO total levels, mainly associated with differences in TRO and to a lesser extent in RES. In B. Blanca, this difference masked the larger emissions by a factor of five in the local estimates of ENE + IND with respect to the global dataset. By downscaling the B. Blanca urban domain we identified the absence of emissions from shipping activities in the global inventory. While emissions from SHP within the continental area were estimated locally, offshore emissions were taken from CAMS, which reports zero emissions for this region. In this domain, emissions from navigation activities are a concern since its port activity is almost as relevant as that of the international port of the city of Buenos Aires (Ports, 2021). Although the absence of this source was not reflected in this comparative analysis, it is relevant to point out that it could lead to underestimation of surface concentrations when modelling air quality in the region.

In Santiago, local estimates of total emissions were almost 70% lower than the global ones. This difference is attributable to the two locally estimated categories included in this work (TRO, RES), but also to ENE + IND, categories for which we combined the CAMS emission estimates with national information on location and emission shares. In Antofagasta, although total emission levels from both datasets were similar, there were substantial differences in the contributions by category: emissions from ENE + IND are almost seven times larger in PAPILA than in CAMS, and while RES and TRO emissions are negligible in the local estimates, according to the global estimates they contribute almost 90% of the domain's emissions. In Osorno, on the contrary, local estimates for the sum of all categories were seven times larger than those in CAMS, with emissions coming almost entirely from residential firewood burning.

Nitrogen oxides
Local estimates of national NOx emissions for Chile and Colombia were lower than those in CAMS by 13% and 7%, respectively. In both countries, these differences were mainly due to the TRO category and, to a lesser extent, to lower emissions from RES, which in the case of Chile were due in part to the omission of fossil fuel burning in this category.
In Colombia the difference was partially offset by considerably higher emissions from TNR. Local estimates for Argentina resulted in higher total NOx emissions (37%), with very different sector contributions to this difference. The contributing categories (from highest to lowest) were TRO, ENE, AGS and RES, partially offset by lower emissions from IND. All in all, NOx emissions estimated with local data for the three countries together were 12% higher than those reported by CAMS.
As seen in Figure 3, all urban domains showed higher local emissions, except Antofagasta, where the difference was barely noticeable. Differences in ENE + IND in Chilean urban domains are exclusively associated with the local information on the spatial distribution and shares of NOx emissions, and not with a local estimate of the magnitudes.

Non-methane volatile organic compounds
Local estimates of NMVOC emissions for Argentina were 48% lower than those in CAMS; this difference is mainly attributable to IND (which in this work includes solvent production and use). CAMS did not report NMVOC emissions from agricultural activities (neither livestock nor soils) for any country in South America, while the local estimates for Argentina showed that 11% of NMVOC emissions came from these activities. PAPILA estimates of RES emissions in Chile (the only category estimated locally) exceeded those of CAMS by more than an order of magnitude, which was reflected in a total emission level three times larger for the country. Although Argentina partially compensates for the difference introduced by Chile's local data, when both countries are considered jointly the changes made resulted in emissions 56% larger.

Local estimates showed important differences in total emissions for the three Argentinean domains (around 60-70% lower than CAMS), with IND being the main contributor. Smaller emissions from FEF were observed in B. Blanca and Mendoza, while TRO contributed to these differences in the first domain and counteracted them in the second. In Buenos Aires, emissions from FEF and TRO were considerably larger than those in CAMS. Even though RES estimates in PAPILA were around 80% lower than those of CAMS, they had less impact on the differences between the two datasets and on the total emissions in each domain.
Local estimates for Santiago and Antofagasta were significantly smaller (around 80%) than the global ones; the difference is mainly attributable to the local information adopted on locations and emission shares for ENE + IND. On the contrary, local estimates for Osorno showed emissions more than 24 times larger than those of CAMS, almost exclusively attributable to the incorporation of local information on firewood consumption in the RES category in cold areas of the country.

Ammonia
As with NMVOCs, the only two countries with local data on NH3 are Chile and Argentina. At the national level, the inclusion of local information is reflected in differences in NH3 emissions of -7% in Chile (attributable only to RES) and -40% in Argentina, where smaller emissions from AGS were mainly responsible for the difference, partially offset by larger emissions from AGL. These two categories are the main sources of NH3 emissions in the country, with contributions of 72% from soils and 24% from livestock according to the local estimates. Smaller emissions in the local estimates for Argentina and Chile resulted in a difference of -30% with respect to CAMS for the two countries together.
Although the impact of emissions from urban domains on the national totals was negligible (around 1%), large differences were found at the category level between the two datasets.

Sulfur dioxide
Local estimates of SO2 emissions in Argentina were 60% lower than those of CAMS for the sum of all categories, with IND, ENE and RES being the main contributors to that difference (and SHP to a lesser extent), while TRO emissions were considerably higher (around eight times) in the local estimates. For this country, the larger CAMS emissions were associated with the sulfur content adopted, mainly for solid fuels, since national mineral coal has a lower sulfur content (370 kg SO2 TJ⁻¹) than imported coal (1100 kg SO2 TJ⁻¹), and because the national/imported ratio presented high variability (TCN, 2015). Colombia showed larger emissions from ENE, IND and TRO, partially offset by lower emissions from RES in the local estimates; although negligible at the national level, emissions from FEF were significantly higher than in CAMS. As in Argentina, the sulfur content of the coal used was highly variable, owing to the different sulfur levels of the country's coal fields. Like Colombia, Chile showed larger emissions for the sum of all categories as a consequence of the inclusion of local data in ENE + IND; these differences are mainly related to sulfur emissions from the copper mining activities that take place throughout the country, which were not fully covered by CAMS. The lower TRO emissions reported by CAMS for Argentina and Colombia seem to be related in part to the methodology used for projections, which assumes a sustained reduction in sulfur content from 2012 to 2015. Nevertheless, this reduction did not occur in either country: while Colombia had introduced strong restrictions on fuel quality prior to 2012, in Argentina such restrictions on the fuels used by heavy-duty trucks (the main emitters) were not implemented. Although the differences introduced by the local data for Chile and Colombia are partially offset by Argentina, the three countries together show emissions 12% larger than in CAMS.
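The sensitivity of SO2 estimates to the national/imported coal ratio can be illustrated with a minimal sketch. It assumes a simple energy-weighted blend of the two sulfur-related emission factors quoted above; the function name and the example shares are hypothetical, and the blending rule is an illustration rather than the method used to build the inventory.

```python
NATIONAL_EF = 370.0   # kg SO2 per TJ, national mineral coal (value quoted in the text)
IMPORTED_EF = 1100.0  # kg SO2 per TJ, imported coal (value quoted in the text)


def blended_so2_ef(national_share):
    """Energy-weighted SO2 emission factor (kg SO2 per TJ) for a coal blend.

    national_share is the fraction (0 to 1) of the coal energy supplied by
    national coal; the remainder is assumed to be imported coal.
    """
    if not 0.0 <= national_share <= 1.0:
        raise ValueError("national_share must be between 0 and 1")
    return national_share * NATIONAL_EF + (1.0 - national_share) * IMPORTED_EF


# Hypothetical shares, used only to show how strongly the blend factor varies.
for share in (0.0, 0.5, 1.0):
    print(share, blended_so2_ef(share))
```

Because the two factors differ by about a factor of three, any assumption about the national/imported ratio propagates almost directly into the SO2 estimate for solid fuels.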

Local estimates in Buenos Aires showed emissions 67% smaller, mainly associated with lower emissions from IND and RES (80-90%), the latter with less impact on the totals. This may be related to the fact that the proportion of sulfur-emitting industries in Buenos Aires is lower than in the rest of the country. Also with little impact, and partly offsetting these differences, increases were observed in the estimates for thermal power plants, inland navigation and transportation (TRO and TNR). Both B. Blanca and Mendoza showed emissions 77% smaller, mainly attributable to ENE in the first case and to IND and RES in the second, where an increase in emissions from ENE was also observed.
Although the contribution of TRO to total emissions was minor in the urban domains of the country, the larger emissions estimated locally with respect to CAMS are particularly noticeable.
Santiago showed a difference in local estimates of around -62%, mainly attributable to ENE + IND, while the inclusion of local estimates of emissions from these categories in Antofagasta was reflected in total SO2 levels almost three times larger than in CAMS. In these two domains, local estimates attributed more than 99% of total emissions to ENE + IND. In Osorno, although local emission estimates for RES were lower than those of CAMS, total emissions in the domain were 16% larger than in the global estimates as a consequence of the difference in emissions from ENE + IND.

Case study: model evaluation and results
Table 3 summarizes the overall model performance of the PAPILA- and CAMS-based results for hourly CO and NOx concentrations in the Buenos Aires case study. For the winter period, PAPILA-based results had a lower normalized mean error than CAMS-based results; the negative bias was larger for the CAMS emission run, exceeding 12% in all cases for both CO and NOx, except for NOx at Parque Centenario. FAC2 was also better in the PAPILA simulation. Differences in the concentrations resulting from the two runs were consistent with those exhibited between the inventories. In terms of CO emissions, PAPILA emissions were 12% higher than those of CAMS, with road transportation being the main contributor, followed by industry. Results for the summer period were not as conclusive as for the winter simulations. NMB was still negative with CAMS emissions, consistent with the winter period results, and FAC2 was better for the PAPILA run. This highlights the importance of having accurate inventories, especially for winter, when the highest emissions and worst dispersion conditions occur. We also highlight that the Cordoba monitoring station is located at a corner with high traffic flow and that sampling is performed at a height of 2 m.
Therefore, the measurements may overrepresent the real average activity in the area.
Scatter plots of daily mean concentrations using the PAPILA inventory (Figure 6) showed good agreement between observations and model results, better in winter than in summer for CO and the other way around for NOx. Concentrations estimated with the CAMS inventory tend to be underestimated in all cases.
All in all, these results show that simulations using PAPILA agree better with observations than those using CAMS when representing surface concentrations of CO and NOx in Buenos Aires. However, these emission improvements do not fully explain the model's underestimation of measured data, especially for CO concentrations.
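The performance statistics referred to in this case study (NMB, normalized mean error and FAC2) can be sketched as follows. The definitions below are the ones commonly used in air quality model evaluation and are assumed here, since the exact formulation applied in the study is not given in this section; the function names and the sample values are hypothetical.

```python
def nmb(model, obs):
    """Normalized mean bias: sum(M - O) / sum(O). Negative means underestimation."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)


def nme(model, obs):
    """Normalized mean error: sum(|M - O|) / sum(O)."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / sum(obs)


def fac2(model, obs):
    """Fraction of pairs with 0.5 <= M/O <= 2, using only positive observations."""
    pairs = [(m, o) for m, o in zip(model, obs) if o > 0]
    if not pairs:
        raise ValueError("no positive observations")
    within = sum(1 for m, o in pairs if 0.5 <= m / o <= 2.0)
    return within / len(pairs)


# Hypothetical concentrations (arbitrary units), NOT measurements from the study.
observed = [1.0, 2.0, 4.0]
modelled = [1.0, 1.0, 1.0]
print(nmb(modelled, observed), nme(modelled, observed), fac2(modelled, observed))
```

With these definitions, a run that systematically underestimates concentrations yields a negative NMB, which is the behaviour reported above for the CAMS-based simulations.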

PAPILA-CAMS main differences
Our results show relevant differences between the PAPILA and CAMS datasets, both in terms of emission levels and their spatial distribution. They also served to exemplify the quality of the PAPILA-based simulated surface mixing ratios in the city where concentrations were analyzed.
The reasons behind the observed differences are diverse and are mainly linked to the activity data and to the methodologies applied to estimate and spatially distribute the emissions. Given the limited availability of local data, the emission factors for [...] and, for the Chilean inventory, the same methodology as that used by CAMS was applied, but based on local estimates for 2014. In this way, local inventories are not only based on region-specific information; extrapolating over a shorter time period also reduces the uncertainties associated with activity data that are strongly linked to short-term variations and technological changes.
For activities related to fuel consumption, local inventories used the information reported in the national energy balance and other national energy statistics, while global inventories are based on the information reported by the countries to the International Energy Agency (IEA), which is consistent with, but not exactly the same as, that reported in the national statistics, since the IEA processes the information received (IEA, 2020). Moreover, although these statistics adequately represent the national energy balances, it is worth pointing out their lack of specificity in terms of spatial disaggregation. A relevant aspect of the fuel consumption patterns for power generation in the three countries analyzed is their inter-annual and inter-regional variation, which in turn is strongly correlated with the water availability for hydropower generation and is not captured by the extrapolations. In addition, short-term technological changes are often used so that electricity supply matches demand. One example is the incorporation of diesel-fuelled generators located in different urban centers in Argentina, such as Buenos Aires, in 2014 (CAMESA, 2021). Although the diesel consumed by these generators was reflected in international statistics, these did not distinguish between the gas oil used for this purpose and that used by combined-cycle gas plants, and could not reflect the generators' location and operating regimes.
Another relevant aspect of national, and therefore international, statistics is the lack of reliable information on firewood consumption, which is widely used in rural areas of SA and even in some urban areas, such as the cold regions of Chile. This also affects the correct representation of the replacement of firewood by LPG or natural gas that took place in Argentina in the last decade, due to higher production of non-conventional shale gas in the Vaca Muerta basin (El Pais, 2015) and the resulting reduction in fossil fuel prices.
Additional differences between local and global datasets are related to the different resolutions of the population distribution maps used as proxies for the spatial distribution of emissions from some categories. Local inventories use population density information based on higher resolution maps than those used by the global ones. This is clearly noticeable not only in the local-global differences found by downscaling urban domains, but also in the spatial coverage of RES emissions in global inventories, where emissions are assigned even to large non-populated areas, such as the Amazon rainforest or some desert areas of the region (Figure 4). It is also worth mentioning that, unlike local inventories, CAMS treats countries uniformly without correcting for climatic zones, which vary widely within many of the SA countries.
Broader discussions on emissions from RES are given by Puliafito et al. (2021) and Álamos et al. (this issue) for Argentina and Chile, respectively.
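The use of population maps as a spatial proxy, discussed above for RES, can be illustrated with a minimal sketch. It assumes a purely proportional allocation of a national total to grid cells; the function name and the small grid are hypothetical, and actual inventories may combine several proxies and climatic corrections that are not represented here.

```python
def allocate_by_population(total_emission, population_grid):
    """Distribute a total emission over grid cells in proportion to population.

    population_grid is a list of rows, each a list of non-negative cell
    populations. Cells with zero population receive zero emissions.
    """
    total_population = sum(sum(row) for row in population_grid)
    if total_population <= 0:
        raise ValueError("population grid must contain at least one populated cell")
    return [
        [total_emission * cell / total_population for cell in row]
        for row in population_grid
    ]


# Hypothetical 2 x 2 population grid and total emission (arbitrary units).
population = [[0, 10], [30, 60]]
emissions = allocate_by_population(200.0, population)
for row in emissions:
    print(row)
```

In such a scheme the quality of the population map determines where emissions end up: a coarse map that spreads population over uninhabited cells would also place emissions there, which is the kind of artefact noted above for RES in the global inventory.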
For non-combustion sources, such as many industrial processes, population estimates are used as drivers for the CAMS projections (based on CEDS trends) (Hoesly et al., 2018). This approach may not be the most appropriate for many countries in the region, where changes in economic policies and even economic crises are frequent, affecting not only consumption patterns but also the relocation of activities dependent on regional economies. In addition, substantial differences have been observed in the location of IND sources in the two inventories, probably attributable to the drivers used by the global databases for this purpose, with a very low presence of sources throughout the Argentine territory in CAMS and a striking abundance of sources distributed over the central and northern regions of Chile, which, however, do not reflect the heterogeneous share of emissions shown by the local distribution.
Although the representativeness of agricultural emissions in global databases continues to improve over time, there are many aspects that global methodologies have so far failed to replicate. In this sense, the incorporation of local information has given the inventories a greater capacity to reflect the specific agricultural practices of the region, such as the predominant use of grazing for cattle farming and the use of large proportions of urea for crop fertilization, as well as the economic, natural and technological changes that have occurred during the last decades (Castesana et al., 2018, 2020). Lastly, and with the aim of contributing some aspects that may improve both global inventories and the dataset presented [...]
This work highlights the strengths and weaknesses not only of global inventories but also of local ones. Although the latter improve the representativeness of the estimates, the groups that generate information on emissions in the region do not necessarily have the same objectives: some are mainly oriented to the generation of input information for models, others towards mitigation measures that respond to air pollution concerns in their region. In this sense, it is worth mentioning that, although the resources available in the region often limit their growth, the capacities of the groups are growing, which is partly reflected in the development of local databases published in this same special issue (Álamos et al., this issue; Osses et al., this issue).
In addition to individual advances, we want to emphasize the role of the PAPILA project and the EMISA initiative, which promote collaboration between groups in the region, enhancing efforts aimed at the development of appropriate and consistent surface emission inventories. In this context, we trust that this work will be a starting point for the development of comprehensive emission inventories for South America enriched with local information. To this end, the first step will be to bring other countries into this endeavor, encouraging those with inventory capabilities to broaden their focus beyond cities by building national emission maps.

Competing interests. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements. This work was conducted within the framework of the Prediction of Air Pollution in Latin America and the Caribbean