Articles | Volume 15, issue 1
Earth Syst. Sci. Data, 15, 189–209, 2023

Special issue: Benchmark datasets and machine learning algorithms for Earth...

Data description paper
10 Jan 2023

A machine learning approach to address air quality changes during the COVID-19 lockdown in Buenos Aires, Argentina

Melisa Diaz Resquin1,2,3, Pablo Lichtig1,4, Diego Alessandrello1, Marcelo De Oto1, Darío Gómez1,2, Cristina Rössler1,5, Paula Castesana1,4,6, and Laura Dawidowski1,5
  • 1Comisión Nacional de Energía Atómica, Gerencia Química, Buenos Aires, Argentina
  • 2Facultad de Ingeniería, Universidad de Buenos Aires, Buenos Aires, Argentina
  • 3Modeling and Observing Systems, Center for Climate and Resilience Research (CR), Santiago, Chile
  • 4Comisión de Ambiente, Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
  • 5Instituto de Investigación e Ingeniería Ambiental, Universidad Nacional de San Martín, Buenos Aires, Argentina
  • 6Misión Ambiente, YPF Tecnología S. A. (Y-TEC), Buenos Aires, Argentina
  • These authors contributed equally to this work.

Correspondence: Melisa Diaz Resquin and Laura Dawidowski


Having a prediction model for air quality at a low computational cost can be useful for research, forecasting, regulatory, and monitoring applications. This is of particular importance for Latin America, where rapid urbanization has imposed increasing stress on the air quality of almost all cities. In recent years, machine learning techniques have been increasingly accepted as a useful tool for air quality forecasting. Out of these, random forest has proven to be an approach that is both well-performing and computationally efficient while still providing key components reflecting the nonlinear relationships among emissions, chemical reactions, and meteorological effects. In this work, we employed the random forest methodology to build and test a forecasting model for the city of Buenos Aires. We used this model to study the deep decline in most pollutants during the lockdown imposed by the COVID-19 (COronaVIrus Disease 2019) pandemic by analyzing the effects of the change in emissions, while taking into account the changes in the meteorology, using two different approaches. First, we built random forest models trained with the data from before the beginning of the lockdown periods. We used the data to make predictions of the business-as-usual scenario during the lockdown periods and estimated the changes in concentrations by comparing the model results with the observations. This allowed us to assess the combined effects of the particular weather conditions and the reduction in emissions during the period when restrictions were in place. Second, we used random forest with meteorological normalization to compare the observational data from the lockdown periods with the data from the same dates in 2019, thus decoupling the effects of the meteorology from short-term emission changes. 
This allowed us to analyze the general effect that restrictions similar to those imposed during the pandemic could have on pollutant concentrations, and this information could be useful to design mitigation strategies.

The results during testing showed that the model captured the observed hourly variations and the diurnal cycles of these pollutants with a normalized mean bias of less than 6 % and Pearson correlation coefficients of the diurnal variations between 0.64 and 0.91 for all the pollutants considered. Based on the random forest results, we estimated that the lockdown implied relative changes in concentration of up to −45 % for CO, −75 % for NO, −46 % for NO2, −12 % for SO2, and −33 % for PM10 during the strictest mobility restrictions. O3 had a positive relative change in concentration (up to 80 %) that is consistent with the response in a volatile-organic-compound-limited chemical regime to the decline in NOx emissions. The relative changes estimated using the meteorological normalization technique were mostly smaller than those obtained by the random forest predictive model. The relative changes were up to −26 % for CO, up to −47 % for NO, −36 % for NO2, −20 % for PM10, and up to 27 % for O3. SO2 is the only species that had a larger relative change when the meteorology was normalized (up to 20 %). This points out the need to account not only for differences in emissions but also for differences in meteorological variables in order to evaluate the lockdown effects on air quality. The findings of this study may be valuable for formulating emission control strategies that do not disregard their implications for secondary pollutants. We believe that the model itself can also be a valuable contribution to a forecasting system in the city and that the general methodology could be easily applied to other Latin American cities. We also provide the first O3 and SO2 observational dataset in more than a decade for a residential area in Buenos Aires, and it is openly available (Diaz Resquin et al., 2021).

1 Introduction

In recent times, machine learning has proven to be an efficient approach for air quality prediction by relying on historical data to estimate the temporal variability in different pollutants for a specific site at a low computational cost. This kind of model can also unravel underlying patterns in data and deal with complex interactions among predictive variables (Stafoggia et al., 2020).

During the last decade, random forest (RF) arose as a new method for the prediction of mean values of atmospheric pollutants (Yu et al., 2016; Feng et al., 2019; Jiang and Riley, 2015). This is a supervised machine learning method, consisting of applying multiple decision trees created at random, using bagging (i.e., selecting samples stochastically to create new datasets from which every tree is created). RF requires a short training time and can provide reliable information on air quality, with a strong anti-overfitting ability (Liu et al., 2021). Many data science programming languages have libraries in which random forest is already efficiently implemented (e.g., scikit-learn in Python or randomForest in R). Random forest is faster and computationally cheaper than other available models, such as regional chemical transport models (CTMs); it needs fewer input variables and is a useful method when information on air pollutant concentrations at a particular site is needed. According to Masih (2019), machine learning techniques may even provide better forecasting than CTMs, and, out of the different existing algorithms, random forest seems to stand out due to its simplicity and the quality of its results, which can account for nonlinear relationships between emissions, chemical reactions, and meteorological effects. With respect to complex reactive species, the random forest method has also been successfully used to assess O3 levels. For example, Zhan et al. (2018) satisfactorily applied the random forest method to predict spatiotemporal variability in daily O3 concentrations across China using information on meteorology, elevation, and emission inventories. One of the most recent applications of machine learning methods has been aimed at elucidating the interconnection among the COVID-19 pandemic lockdown measures, human mobility, and air quality (Rahman et al., 2021; Velders et al., 2021; Yang et al., 2021).

The outbreak of the COVID-19 pandemic at the end of 2019, with its devastating consequences in terms of loss of life and the economic impact, has caused many governments around the world to impose different degrees of lockdown. For atmospheric scientists, it has also provided a unique opportunity to examine changes in air pollution under decreased emission levels, in what Gaubert et al. (2021) called an unintentional worldwide experiment. Many studies have, in general, identified significant decreases in most pollutants, except for O3, under the stay-at-home orders imposed in an attempt to curb the spread of COVID-19 (Muhammad et al., 2020; Faridi et al., 2021; Srivastava, 2021; Grange et al., 2021; Yang et al., 2021). These drastic changes in anthropogenic emissions are of major interest to enhance our understanding of the chemistry related to air quality, particularly when the behavior of secondary pollutants, like ozone (O3) or components of particulate matter (PM), is explored (Gaubert et al., 2021). O3, in particular, has a complex behavior depending on multiple factors. Nitrogen monoxide (NO) and nitrogen dioxide (NO2) together constitute NOx, which, along with volatile organic compounds (VOCs), plays a vital role in the O3 formation process; O3 production can be either VOC limited or NOx limited (Shi and Brasseur, 2020; Liu et al., 2021; Li et al., 2019).

An early approach to analyzing the changes in air quality due to the implementation of specific control measures was to comparatively assess the concentrations during the lockdown with concentrations from the same period during the previous year or the mean value of a period of 5 years using exclusively ground-based or satellite observations. However, the degree to which the COVID-19 lockdown influenced air quality is not only a function of emissions but also of both meteorology and physical and chemical atmospheric transformations (Kroll et al., 2020; Le et al., 2020). Consequently, pure statistical tests or observational comparisons might be inadequate in providing a complete understanding of what influences pollutant concentrations, since weather conditions, particle persistence, transport, radiation, and seasonality affect concentrations by linear and nonlinear processes (Šimić et al., 2020). In this work, this challenge has been addressed using two different but complementary approaches. The first one consists of using a model to simulate a hypothetical scenario in which the restrictions were not implemented, which we did using the random forest (RF) algorithm, as previously done by Velders et al. (2021). The second one consists of a random-forest-based normalization of the meteorological variables, which makes it possible to decouple the emission changes (Shi et al., 2021; Grange and Carslaw, 2019; Vu et al., 2019).

The goals of this study were (i) to provide novel air quality data for the metropolitan area of Buenos Aires (MABA), Argentina, including the first O3 and SO2 observational datasets in a residential area in more than a decade, (ii) to explore the performance of the random forest method in predicting the air quality situation at two monitoring sites in the MABA, (iii) to apply this methodology to estimate the changes in air pollutant concentrations under the COVID-19 control measures, and (iv) to assess the effect of the reduction in emissions by normalizing the meteorological variables. We implemented the RF algorithm to estimate the concentrations of CO, NO2, NO, sulfur dioxide (SO2), O3, and particles with an aerodynamic diameter less than or equal to 10 µm (PM10) using meteorological and air quality observations in addition to the local diurnal variation in emissions as explanatory variables. Because it was trained with data acquired in 2019 and 2020 before the start of the pandemic, using the variables available for this city, the RF model can only predict concentrations under a business-as-usual (BAU) scenario. We then compared these BAU estimations with the observations during two distinct lockdown phases. We also used a random forest normalization (RFN) technique, based on Shi et al. (2021), to decouple the effects of the meteorology on the pollutant concentrations by normalizing the meteorological variables. We compared the normalized concentrations for the lockdown periods with the normalized observations for the same period in the previous year, allowing us to assess the effect of reducing the emissions independently of the particular meteorological situation that occurred during the specific periods analyzed. In addition, we studied the responses of O3 to the reduction in emissions of its precursors (NOx and VOCs) because of its relevance regarding emission control and health effects.

The remainder of this paper is structured as follows. Section 2 provides a description of the studied area, the different lockdown phases, the air quality and meteorological data, and the structure of the random forest models used to estimate the relative changes (RCs) during the lockdown. The analysis of the model performance and the analysis of the impact due to the emission reductions are given in Sect. 3. Section 4 provides a description of the data and code availability. Finally, Sect. 5 presents a summary and the main conclusions of this work.

2 Material and methods

2.1 Description of the studied area

The MABA comprises the autonomous city of Buenos Aires (ACBA) and 40 surrounding districts of greater Buenos Aires (GBA). Located along the western coast of the Río de la Plata estuary, on a flat plain, the MABA is the third-biggest megalopolis in Latin America and the Caribbean. It has a population of approximately 13 million, with a heterogeneous population density in the range of 14 000–20 000 inhabitants per square kilometer. Its active fleet reached 5.4 million vehicles by 2019 (Anapolsky, 2020).

In terms of anthropogenic air pollutant emissions, road transportation is clearly the largest contributor of CO, VOCs, and PM in the area. The MABA is also affected by the emissions from residential, commercial, and institutional buildings, mainly based on natural gas consumption, and from three power plants located near the shoreline of the La Plata River, which mainly burn natural gas and, to a lesser extent, gas oil and fuel oil. Under these circumstances, NOx is emitted by stationary and mobile sources in similar amounts (Castesana et al., 2022). Since most of Buenos Aires' vehicle fleet uses low-sulfur fuel, the majority of the SO2 emissions are due to heavy-duty diesel engines used by ships, trucks, and, occasionally, small electricity generators.

2.2 Description of the lockdown for the MABA

Argentina's national government established different lockdown phases for the duration of the pandemic (Decree 297/2020, 2020). Since 80 % of Argentina's COVID-19 cases were concentrated in the MABA, some policies that were applied to the MABA region differed from those applied to the rest of the country. Starting on 20 March 2020, strict measures were imposed to avoid a sharp increase in COVID-19 cases, emphasizing that the population should stay at home and avoid any social contact. All non-essential stores, including toy, furniture, and clothing stores, were closed until 11 May. Table 1 provides a summary of the restrictions set for the MABA during each phase. Under severely restricted mobility, public transport and passenger car circulation decreased drastically. Local mobility dropped by 80 % during the intense lockdown phase and 65 % during the flexible lockdown phase until the end of May (Aktay et al., 2020). It is worth noting that, before the COVID-19 pandemic, 1 million vehicles entered the city of Buenos Aires from the suburbs per day.

Considering the different degrees of the restrictions imposed, we evaluated the impact of the lockdown on air quality according to two distinct periods. The first period, from 20 March to 12 April 2020, corresponded to the most restrictive lockdown (LD). The second period, from 13 April to 25 May, was denoted a partial lockdown (PLD) because some restrictions were lifted. The period of 1–15 March 2020, before the start of the first lockdown, was defined as BLD (before LD) and was used to evaluate the model. As of 16 March, flexible restrictions started but were optional; therefore, the period 16–19 March was not considered in our research.

Because combustion is the main air pollution source in the area, the significant decrease in traffic flow imposed by the lockdown necessarily led to a decrease in the emissions of traffic-related pollutants (D'Angiola et al., 2010; Puliafito et al., 2017; Diaz Resquin et al., 2018; Castesana et al., 2022).

Table 1. Description of the lockdown phases in MABA. NU is for not used (i.e., not included in the model).


2.3 Meteorological description

The atmospheric general circulation in the MABA is controlled by the influence of the semi-permanent South Atlantic high-pressure system. This system influences the climate of the MABA throughout the year by bringing in moist winds from the northeast, which produce most of the precipitation in the area in the form of frontal systems, or storms produced by cyclogenesis, in autumn and winter (Barros et al.2006). In terms of the climate conditions of the MABA, temperatures at the beginning of autumn range from warm to hot in the afternoon, but they are mild during the nights and in the mornings. Later on in the season, conditions are cooler, featuring mild afternoons and cold nights and mornings.

To identify similarities and differences between the meteorological conditions during the lockdown phases and the testing period (BLD, LD, and PLD) and those of the autumn of 2019 (March, April, and May – MAM2019), we carried out a meteorological analysis for all the periods. We used hourly and daily data from the Buenos Aires Central Observatory (OBS; lat 34°35′ S, long 58°29′ W). This site of the Argentine National Weather Service is located in a residential area and is representative of the meteorological conditions at the air quality sites under study.

Average temperatures in the BLD (24.4 °C) and in the LD (21.1 °C) were higher than those in MAM2019 (18 °C), while the average temperature in the PLD (16.8 °C) was lower than that in MAM2019 but close to the corresponding value in May 2019 (16 °C). Precipitation in March and April 2020 exceeded the accumulated values of the same months in 2019 (+60 % and +90 %, respectively). On the contrary, precipitation in May 2020 exhibited significantly lower values than those of 2019 (−75 %).

During MAM2019, the average calm value was 6.7 %, while during the BLD, the LD, and the PLD, the corresponding calm values were 3.6 %, 4.7 %, and 8.6 %. Average wind velocity, within the 7.5–8.6 km h−1 range, was similar in all periods. In autumn 2020, the prevailing wind was from the NW–N sector, with an average contribution of 34 % compared to 26.5 % in 2019. The LD and the PLD periods had a similar direction of prevailing winds as in autumn 2019. In contrast, 45 % of winds during the BLD were from the NE–E sector.

Our analysis showed that there were meteorological differences in terms of temperature and precipitation between autumn 2019 and the periods analyzed in 2020 (BLD, LD, and PLD). This is indicative of the need to take into account the influence of meteorological conditions for comparative purposes of air quality conditions that occurred in the different periods.

2.4 Air quality data

We employed air quality data from two monitoring sites, namely the Comisión Nacional de Energía Atómica (CNEA), operated by our research group, and Parque Centenario (PC), managed by the autonomous city of Buenos Aires (described below). Both sites are mostly influenced by the emissions from mobile and residential sources, and, to a lesser extent, by thermal power plants located at least 6 km away from them (Diaz Resquin et al., 2018; Pineda Rojas et al., 2020).

Figure 1. Location of the MABA in Argentina (a). A magnified view of the MABA (b). In yellow, the location of OBS, the Argentine National Weather Service monitoring site referred to in this study, and, in red, the air quality monitoring sites (shapefiles from IGN, 2021) are shown.

2.4.1 Comisión Nacional de Energía Atómica

From 23 February 2019 to 26 May 2020, a monitoring campaign was carried out in an open area (34.57° S, 58.51° W) situated 14 km away from the Buenos Aires city center (Fig. 1) to assess the levels of different gases (CO, NO, NO2, SO2, and O3) and their temporal variability in a residential area of the MABA.

The main goal of this monitoring campaign was to assess the temporal variability of SO2 and O3 in the area for an entire year. Although it may seem surprising, especially for a megacity like the MABA, there is scarce and fragmentary information on the concentrations of SO2 and O3. Presently, O3 is routinely monitored at one site of the MABA, which is located in an industrial area. Past data for the region are only available from a few short campaigns carried out in the early 2000s (Reich et al., 2006). Similarly, there is a lack of monitored SO2 concentrations because historical measurements carried out in the 1990s reported very low values, and therefore, the decision-makers decided not to measure this pollutant on a regular basis. However, it has now become a pollutant of concern for local authorities, who have recently decided to start monitoring SO2 in two of the four air quality stations of the ACBA in the near future.

Air pollutant concentrations were continuously acquired (Table 2). Monitors were placed at an approximate height of 10 m, 100 m east of a main traffic artery with a high density of buses, light-duty trucks, and passenger cars. Another main artery is located 500 m north, with vehicles, including trucks and buses, circulating in a low-speed, stop-and-go pattern. The Jorge Newbery City Airport, two thermal power plants, the La Plata River, and the port are located within a 19 km radius of the monitoring station.

Data were registered as 1 min averages. Unfortunately, from 26 May onwards, restrictions on entering our institute, where the monitoring station was located, forced us to suspend the monitoring campaign.

Table 2. Description of the equipment used at the CNEA site.

* The calibration of the ambient air gases detectors was performed by following U.S. Environmental Protection Agency (EPA) regulations and HORIBA standard procedures (see U.S. EPA CFR 40 Part 50, Appendix A1, C, D and F, and the corresponding user manual for the HORIBA AP devices). The APMA-370, APSA-370, and APNA-370 devices were calibrated using EPA certified calibration gases and diluted with an Environics 6103, a National Institute of Standards and Technology (NIST) traceable mass flow controller dilutor, when needed. The APOA-370 device was calibrated with ozone generated with the Environics 6103 NIST traceable internal UV-based ozone generator. Note: ppmv is parts per million by volume.


2.4.2 Parque Centenario

To include aerosol variations in this analysis and complement the information of the CNEA site, we used PM10, CO, NO, and NO2 data from the PC station (34.61° S, 58.44° W), which is one of the surface air quality sites of the Environmental Protection Agency of Buenos Aires city (APRA). This site is located in a residential–commercial area, with medium vehicular flow and a relatively low incidence of stationary sources. A monthly technical report of the hourly averaged concentrations registered in PC is available at the APRA website (APRA, 2021). Although the city has three other monitoring stations, at least one of the essential periods needed for this study was missing in each of their datasets. Therefore, they were not taken into account for this study.

2.4.3 Summary of the datasets

Relatively low concentration values were registered at both sites for all the analyzed periods, with no exceedances of the short-term air quality standards for any of the pollutants measured (Decree 1074/18, 2018; Act 1356, 2004). Air pollutants, except for SO2, exhibited well-defined diurnal cycles (see Fig. S2 in the Supplement).

CO and NOx patterns were governed by traffic emissions (Figs. S1 and S2), with the maximum values occurring in winter. Annual mean values of NOx were ∼37 ppb (parts per billion) for both CNEA and PC. Relevant differences in CO were identified, with annual mean levels in PC doubling those measured in CNEA (0.51 ppm versus 0.26 ppm; ppm is parts per million).

PM10, which was only measured in PC, had a mean value of 21 µg m−3, with the maximum values at noon.

With respect to the pollutants that were only measured in CNEA, SO2 maximum concentrations were registered during autumn (April), with monthly averages in the 2–2.9 ppb range. In terms of O3 concentrations, maximum daylight levels were registered during summer. The diurnal cycle presented higher levels during the afternoon and was the opposite to those of NO and NO2.

2.5 Modeling approach

We used the machine learning random forest method to (i) estimate the relative changes during the LD and the PLD phases and (ii) develop a model for the air quality forecast for the MABA at a low computational cost. To this end, two different approaches have been implemented using a random forest algorithm (Fig. 2). The first one estimates the hypothetical pollutant concentrations that would have occurred in the MABA under regular emission conditions (BAU scenario) with the particular meteorological conditions that occurred during the period analyzed. This model, named the random forest predictive model or simply RF, has been applied to the LD and the PLD phases to estimate the concentrations that would have occurred if no lockdown measures had been imposed and to compare them with the observations during the lockdown phases. This tool could also be used to forecast the air quality situation in the city. The second approach, referred to as RF normalized or RFN, has been designed to decouple the effects of the meteorology by normalizing the meteorological variables, allowing a generalized assessment of the effect of the changes in the emission patterns. This technique has been applied to compare the concentrations of the different lockdown periods to those of the same time frames in 2019 in order to infer the effects of the sudden reduction in emissions during the COVID-19 mobility restriction period. A summarized schematic of the modeling approach can be seen in Fig. 2.
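As an illustration of how a relative change could be derived from the first approach, the following minimal sketch compares the mean observed concentration with the mean BAU estimate for the same hours. The exact definition used here (difference of period means divided by the BAU mean) is an assumption made for this example, and the numbers are hypothetical.

```python
from statistics import mean


def relative_change(observed, bau_predicted):
    """Percent change of the observed mean relative to the BAU mean.

    Assumed definition for illustration: 100 * (obs - BAU) / BAU,
    computed on the means over the period of interest.
    """
    obs_mean = mean(observed)
    bau_mean = mean(bau_predicted)
    return 100.0 * (obs_mean - bau_mean) / bau_mean


# Hypothetical hourly NO2 values (ppb) for a short lockdown window.
observed = [10.0, 12.0, 8.0, 10.0]
bau = [20.0, 22.0, 18.0, 20.0]
print(relative_change(observed, bau))  # -50.0
```

A negative value indicates that observed concentrations were lower than the BAU estimate.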

Figure 2. Schematic description of the model building and evaluation.


Observations from February 2019 to May 2020 were divided into different groups, following the methodology by Grange et al. (2021), using 8710 total data points for CNEA and 9198 for PC. The training of the models was conducted using a random sample of 80 % of the input data from February 2019 to February 2020. The remaining 20 % was used as the testing dataset (t) to choose the model configuration with the best statistical metrics. The BLD period (360 data points; see Table 1) was established as a separate evaluation period in order to check the performance of the model during the 2 weeks before the lockdown periods. Data collected from 20 March to 25 May and the RF estimates were used to quantify and interpret the changes during the LD and the PLD.
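The random 80 %/20 % partition described above can be sketched as follows. This is a generic illustration, not the code used in the study; the function name, the fixed seed, and the stand-in rows are assumptions.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Randomly split rows into training and testing subsets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_train = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test


# Stand-in for hourly records from the pre-lockdown period.
rows = list(range(1000))
train, test = split_train_test(rows)
print(len(train), len(test))  # 800 200
```

Records from the BLD, LD, and PLD periods would be kept out of this split so that they remain unseen by the model.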

The target variables were the measured air pollutant concentrations in each monitoring site, namely CO, NO, NO2, O3, and SO2 (CNEA) and CO, NO, NO2, and PM10 (PC).

As predictive variables, we considered (i) data taken from the Argentine Meteorological Weather Service, namely wind speed, wind direction, surface temperature, sea level pressure, and relative humidity, (ii) boundary layer height and total cloud cover taken from ERA5 (Hersbach et al., 2018, 2020), (iii) pollutant concentrations measured in each of the sites (APRA, 2021; Diaz Resquin et al., 2021), (iv) time variables such as month, hour, and weekday, and (v) diurnal and weekly emission cycles for pollutants associated with gasoline and diesel emissions (Castesana et al., 2022; Freitas et al., 2011). For the predictive model, all of these variables were tested as explanatory variables for each pollutant, and those performing the best for the testing dataset were selected. Table 3 presents the final set of predictive variables used in the RF model in addition to the hyperparameters that were employed.
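The time variables in item (iv) can be derived directly from the timestamp of each hourly record. The sketch below shows one possible encoding; the dictionary keys and the Monday = 0 weekday convention are assumptions of this example.

```python
from datetime import datetime


def time_features(timestamp):
    """Return month, hour, and weekday (Monday = 0) for one hourly record."""
    return {
        "month": timestamp.month,
        "hour": timestamp.hour,
        "weekday": timestamp.weekday(),
    }


# First day of the strict lockdown, 14:00 local time (a Friday).
print(time_features(datetime(2020, 3, 20, 14)))
# {'month': 3, 'hour': 14, 'weekday': 4}
```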

For the RF normalized, all variables were used, and only the meteorological variables were normalized, following the approach described in Shi et al. (2021), which consists of resampling only the weather data over the whole study period and is considered adequate for studying emission changes. We employed the randomForest package of the R programming language (Liaw and Wiener, 2002) and used the rmweather package for the normalization process (Grange et al., 2018; Grange and Carslaw, 2019).
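The idea behind the normalization can be sketched in a few lines. The example below is not the rmweather implementation; it is a simplified Python illustration in which `predict` stands for any fitted model, `MET_VARS` is an assumed subset of the weather predictors, and the number of resamples is arbitrary. For every record, the weather values are repeatedly replaced by values drawn from random records of the whole period, while the time and emission-cycle predictors are kept, and the predictions are averaged.

```python
import random
from statistics import mean

MET_VARS = ("ws", "wd", "t2")  # assumed subset of the weather predictors


def normalize_meteorology(predict, rows, n_samples=100, seed=0):
    """Average predictions over weather resampled from the whole period.

    Only the variables in MET_VARS are replaced; all other predictors
    of each row are left unchanged.
    """
    rng = random.Random(seed)
    normalized = []
    for row in rows:
        predictions = []
        for _ in range(n_samples):
            donor = rng.choice(rows)  # random record supplies the weather
            resampled = dict(row)
            for name in MET_VARS:
                resampled[name] = donor[name]
            predictions.append(predict(resampled))
        normalized.append(mean(predictions))
    return normalized


# Toy data and a toy model (hypothetical) that ignores the weather.
rows = [{"hour": h, "ws": float(h + 1), "wd": 90.0, "t2": 20.0} for h in range(4)]
print(normalize_meteorology(lambda r: r["hour"], rows))  # [0, 1, 2, 3]
```

When the model does depend on the weather, the averaged prediction reflects typical rather than concurrent meteorological conditions, which is what allows emission-driven changes to be compared between periods.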

Table 3. Random forest model with the target variables, predictors, and hyperparameters for RF.

rh2 is the 2 m relative humidity, slp is the sea level pressure, t2 is the 2 m air temperature, U is the 10 m U component of wind, V is the 10 m V component of wind, wd is the 10 m wind direction, ws is the 10 m wind speed, gas_emcycle is the gasoline-related emission cycle, and aer_emcycle is the diesel-related emission cycle. The hyperparameters are ntree (number of trees to grow) of 300 and mtry (number of variables randomly sampled as candidates at each split), which is the rounded-down square root of the number of variables.
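The mtry rule stated in the note above can be written out explicitly. The predictor counts in this small sketch are hypothetical and serve only to show the rounding.

```python
import math

N_TREES = 300  # ntree value given in the table note


def default_mtry(n_predictors):
    """Rounded-down square root of the number of predictors (at least 1)."""
    return max(1, math.isqrt(n_predictors))


for n_predictors in (5, 9, 12):  # hypothetical predictor counts
    print(n_predictors, default_mtry(n_predictors))
# 5 2
# 9 3
# 12 3
```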


2.6 Random forest model evaluation and assessment tools

The RF model was tested for adequate performance, focusing on the reproduction of (i) the hourly concentrations, (ii) the mean diurnal cycles, and (iii) the mean value. For each pollutant, the normalized mean bias (NMB) and the Pearson correlation coefficient (r) for the hourly concentrations were calculated. The diurnal cycles were comparatively assessed by a graphical inspection of the temporal series of the mean values and spreads of the modeled and observed concentrations of each pollutant.


The NMB is useful for comparing pollutants that cover different concentration scales, and it is defined as the difference between modeled and observed mean concentrations, normalized by dividing by the mean observed concentration for that period. The r coefficient is useful to measure the linear relationship between two variables.
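Both metrics can be computed directly from paired modeled and observed values. The sketch below follows the definitions given above, with NMB expressed in percent; the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean


def normalized_mean_bias(modeled, observed):
    """NMB in percent: (mean modeled - mean observed) / mean observed."""
    return 100.0 * (mean(modeled) - mean(observed)) / mean(observed)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly values: the model overestimates by 10 % throughout.
observed = [1.0, 2.0, 3.0, 4.0]
modeled = [1.1, 2.2, 3.3, 4.4]
print(round(normalized_mean_bias(modeled, observed), 6))  # 10.0
print(round(pearson_r(modeled, observed), 6))  # 1.0
```

The example also shows why the two metrics are complementary: a model can be perfectly correlated with the observations and still be biased.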

To detect, locate, and characterize different pollution sources (Carslaw and Beevers, 2013; Grange et al., 2016), bivariate polar plots were built considering observations and RF results, using the openair library of the R programming language (Carslaw and Ropkins, 2012; R Core Team, 2019). These plots provided graphical support to analyze the air pollutant concentrations together with wind speed and wind direction with and without COVID-19 restrictions. We also calculated them for March, April, and May 2019 (MAM2019), so as to have a baseline to identify the sources of the different pollutants.

Partial dependency plots were also built using the rmweather library of R (Grange et al., 2018; Grange and Carslaw, 2019) to highlight the relationships between pollutant concentrations and all explanatory variables presented in Table 3; they can be seen in the Supplement (Figs. S9 to S11). By obtaining the prediction from the random forest model for each unique value of a specific explanatory variable, these plots allow us to analyze how this dependency varies for different values of the explanatory variable and therefore help us to detect nonlinear relationships, which are highly relevant in air quality.
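The computation behind a partial dependency curve can be sketched generically. The example below is not the rmweather code; `predict` stands for any fitted model, and the toy model and records are hypothetical. For each unique value of the chosen variable, that value is imposed on every record and the predictions are averaged.

```python
from statistics import mean


def partial_dependence(predict, rows, variable):
    """Mean prediction for each unique value of `variable`.

    All other predictors keep the values they have in each record.
    """
    curve = {}
    for value in sorted({row[variable] for row in rows}):
        modified = [{**row, variable: value} for row in rows]
        curve[value] = mean([predict(r) for r in modified])
    return curve


# Toy model (hypothetical): quadratic in temperature, linear in wind speed.
rows = [{"t2": 1, "ws": 1}, {"t2": 2, "ws": 3}]
print(partial_dependence(lambda r: r["t2"] ** 2 + r["ws"], rows, "t2"))
# {1: 3, 2: 6}
```

Plotting the resulting values against the variable reveals whether the modeled response is linear or not.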

3 Results and discussion

3.1 Analysis of the results of the random forest models

In general, modeled CO, NO, NO2, and PM10 concentrations at both sites were in good agreement with the corresponding observations (see Table 4), with NMB <6 % for both sites for the testing dataset. The Pearson correlation coefficient during testing (rt) was above 0.7 for all pollutants, except PM10. That was probably due to both a complex chemistry, with primary and secondary processes being highly relevant, and the effect of a few regional events during the period having a large impact on particulate matter. In addition, calculations of diurnal cycles utilizing RF outcomes adequately reproduced the clear bimodal behavior of CO, NO, and NO2 (Fig. 3). Nevertheless, biases during the BLD period are moderately larger than during the testing period. This is to be expected, given that the model was optimized to reproduce the testing period.

Table 4. Summary of the evaluation statistics used in the random forest predictive model for the testing dataset (t) and evaluation period (BLD).


Figure 3. Mean diurnal cycles for the testing dataset and for the evaluation period (BLD). The line represents the average diurnal cycle, and the shaded area represents the standard deviation.


Figure 4. Average daily concentrations for CNEA and PC sites. The line represents the 24 h average concentration, and the shaded area represents the daily levels between the 25th and 75th percentile.


The results for O3 were also satisfactory, particularly considering its secondary nature and complex dynamics, which depend on multiple factors such as radiation energies, VOC and NOx concentrations, and their ratios (Seinfeld and Pandis, 1998). Model performance indicators were NMBt=2.2 % and rt=0.85. Other quantities related to O3 chemistry (such as the O3/VOC and O3/NOx ratios) in the MABA were analyzed as a further test of the RF model performance. The O3–CO ratio was used as a proxy for VOCs because direct VOC observations were unavailable in the MABA, and traffic-borne VOCs are intimately linked to CO (Bon et al., 2011; Cazorla et al., 2020). Overall, more than 75 % of the O3–CO and O3–NOx hourly ratios from RF were within a factor of 2 of those resulting from the observations (Fig. S4). The Pearson correlation coefficients between observed and estimated O3–CO and O3–NOx hourly ratios were 0.85 and 0.9, respectively. In this context, the model was suitable for reproducing not only the levels of primary pollutants at the two analyzed sites but also the formation of O3 at the CNEA site. The diurnal cycle of SO2 (Fig. 3) during the BLD period had a sharp peak between 18:00 and 20:00 local time (LT is given here and elsewhere in the paper, unless indicated otherwise) that could not be entirely captured by the model, but it was linked to a single day of particularly high concentrations during that time period. Concentrations from 12:00 to 17:00 were also overestimated during the BLD.
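The factor-of-2 check mentioned above can be expressed compactly. The sketch below is illustrative only: the function name and the sample ratio values are hypothetical, and pairs with a zero observed value are skipped as a simplifying assumption.

```python
def fraction_within_factor_of_two(model, obs):
    """Share of pairs with 0.5 <= model / obs <= 2.0.

    Pairs whose observed value is zero are skipped (assumption of this sketch).
    """
    pairs = [(m, o) for m, o in zip(model, obs) if o != 0]
    hits = sum(1 for m, o in pairs if 0.5 <= m / o <= 2.0)
    return hits / len(pairs)


# Hypothetical hourly O3-CO ratios from observations and from the RF model
obs_ratio = [10.0, 20.0, 40.0, 80.0]
rf_ratio = [12.0, 45.0, 30.0, 60.0]

fac2 = fraction_within_factor_of_two(rf_ratio, obs_ratio)
print(f"{100 * fac2:.0f} % of modeled ratios are within a factor of 2")
```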

Figure 3 shows that, during the BLD, the diurnal cycles of O3 and SO2 estimated using RFN are noticeably different from those calculated using RF and the observations. This is further evidence that the atmospheric conditions can affect the concentrations of pollutants in a relevant way under certain weather conditions.
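Mean diurnal cycles with a standard-deviation band, such as those in Fig. 3, can be obtained by grouping hourly values by hour of day. The following is a minimal sketch with hypothetical data; it uses the population standard deviation, which is an assumption, since the text does not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def diurnal_cycle(samples):
    """Group (hour_of_day, concentration) pairs by hour.

    Returns {hour: (mean, population standard deviation)}.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in sorted(by_hour.items())}


# Hypothetical hourly CO values (ppm) from two days
samples = [(8, 1.0), (8, 3.0), (14, 0.5), (14, 0.5)]
cycle = diurnal_cycle(samples)
for hour, (avg, std) in cycle.items():
    print(f"{hour:02d}:00  mean = {avg:.2f}  std = {std:.2f}")
```

Applying the same function to the observations, the RF output, and the RFN output gives curves that can be compared hour by hour.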

One of the advantages of building a random forest model is that it can provide the key components that reflect the nonlinear relationships among the emissions, the chemistry, and the meteorology through the analysis of diagnostics such as the permutation difference (variable importance; Figs. 5 and S6 to S8) and the partial dependencies. The variable importance plots show that the boundary layer height and the wind speed were important variables for predicting CO concentrations at both sites, for both the normalized and the non-normalized models. This result is consistent with the fact that, at the temporal scale studied here, CO can be considered to be a passive tracer (Saide et al., 2011). For NO2 and NO, the most important variables were the other pollutants included in the models and the surface temperature (Table 3), which was also expected because temperature has an influence on NOx chemistry. In the case of O3, the model was dominated by the concentrations of NOx and CO, with NO2 being the most relevant, which is consistent with O3 chemistry (see Sect. 3.2.2 and 3.2.3). The variable importance plot for the PM10 model shows that CO and NO2 are the most important variables for predicting PM10 concentrations. This was also expected because in Buenos Aires around 65 % of the PM10 is PM2.5, and the latter is highly correlated with CO (Arkouli et al., 2010).
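The permutation-based variable importance can be understood as the loss in predictive skill when one explanatory variable is shuffled. The sketch below is a simplified illustration, not the out-of-bag calculation performed inside random forest libraries: `toy_predict` is a hypothetical stand-in for a trained model, the importance is measured as the increase in mean squared error on a fixed dataset, and all values are invented.

```python
import random


def mse(predict, X, y):
    """Mean squared error of `predict` over rows X against targets y."""
    return sum((predict(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_features, seed=0):
    """Increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mse(predict, X_perm, y) - baseline)
    return importances


def toy_predict(row):
    # Hypothetical model: CO depends on boundary layer height (column 0) only
    return 100.0 / row[0]


X = [[200.0, 1.0], [400.0, 2.0], [800.0, 3.0],
     [1000.0, 4.0], [500.0, 5.0], [250.0, 6.0]]
y = [toy_predict(row) for row in X]

imp = permutation_importance(toy_predict, X, y, n_features=2)
print("importance of column 0 (boundary layer height):", imp[0])
print("importance of column 1 (unused variable):", imp[1])
```

A variable the model never uses gets an importance of exactly zero, whereas shuffling a variable the model relies on degrades the predictions.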

Figure 5. Variable importance plot (permutation difference) for CO variables (ppm) in CNEA (a) and PC (b) for the RF model.


Partial dependency plots (Figs. S9 to S11) shed light on the relationships between pollutant concentrations and temperature. As an example, in CNEA, while CO, NO, and NO2 concentrations were inversely related to temperature, SO2 presented the opposite behavior. As described by Grange and Carslaw (2019), this relationship of SO2 with temperature could be associated with shipping emissions. This is consistent with the high partial dependence on wind directions from 0 to 100° (Figs. S3 and S11), which is the range of wind directions that brings air masses from the La Plata River.

3.2 Quantifying and analyzing the changes in concentrations during the lockdown periods

We discuss here the relative changes in (1) the measured concentrations during the LD and the PLD periods in comparison to the RF outputs for the same periods and (2) the normalized concentrations during the LD and the PLD in comparison to the normalized concentrations during the same periods in 2019 (20 March to 12 April and 13 April to 25 May). The corresponding percent relative changes (RCRF and RCRFN) were estimated using the expressions presented in Eqs. (3) and (4). We use RCRF to quantify the magnitude of the changes with respect to a BAU scenario under the particular meteorological conditions that occurred during the two lockdown periods, whereas RCRFN quantifies the effects of the changes in the emissions from the sources of these pollutants rather than the effects of particular meteorological or atmospheric conditions.

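$$\mathrm{RC_{RF}}\,[\%] = \frac{\mathrm{Obs}_{\mathrm{LD,PLD}} - \mathrm{RF}_{\mathrm{LD,PLD}}}{\mathrm{RF}_{\mathrm{LD,PLD}}} \times 100 \tag{3}$$

$$\mathrm{RC_{RFN}}\,[\%] = \frac{\mathrm{RFN}_{\mathrm{LD,PLD}} - \mathrm{RFN}_{\text{same periods 2019}}}{\mathrm{RFN}_{\text{same periods 2019}}} \times 100, \tag{4}$$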

where ObsLD,PLD corresponds to the hourly mean concentrations observed during the LD or the PLD, and RFLD,PLD is the corresponding predictive RF for the same period. RFNLD,PLD refers to the data for the LD and the PLD with the normalization of the meteorological variables, which was compared with the meteorologically normalized data of the same period in 2019.
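As a worked illustration of Eqs. (3) and (4), the sketch below computes both relative changes from period means, which is how the relative changes in Table 5 are described as being calculated. The function name and all concentration values are hypothetical.

```python
from statistics import mean


def relative_change(values, reference):
    """Percent change of mean(values) with respect to mean(reference)."""
    ref = mean(reference)
    return 100.0 * (mean(values) - ref) / ref


# Eq. (3): observations vs. business-as-usual RF prediction for the same period
obs_ld = [0.30, 0.35, 0.25]   # hypothetical observed CO (ppm) during the LD
rf_ld = [0.40, 0.45, 0.35]    # hypothetical RF prediction for the same hours
rc_rf = relative_change(obs_ld, rf_ld)

# Eq. (4): normalized 2020 values vs. normalized values for the same dates in 2019
rfn_ld = [0.36, 0.36]         # hypothetical normalized values, LD 2020
rfn_2019 = [0.45, 0.45]       # hypothetical normalized values, same period 2019
rc_rfn = relative_change(rfn_ld, rfn_2019)

print(f"RC_RF = {rc_rf:.0f} %, RC_RFN = {rc_rfn:.0f} %")
```

In this invented example the magnitude of RC_RF exceeds that of RC_RFN, which corresponds to the situation described in the text in which the meteorology amplified the change.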

Since both monitoring sites had been highly influenced by vehicular emissions, the traffic reduction of ∼80 % that was registered during the LD period led to a significant air quality improvement in primary pollutants (Fig. 6). In almost all cases, except for CO in PC and for SO2, the meteorological conditions amplified the change, as shown by the fact that RCRFN is smaller than RCRF. This is consistent with the results obtained by Shi et al. (2021).

On the other hand, observed O3 levels were 80 % and 57 % higher in comparison with the RF estimations for the LD and the PLD, respectively. However, the fact that this increment was considerably smaller when the meteorology was normalized indicates that this change was strongly enhanced by the meteorological conditions that occurred during that period.
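The principle of meteorological normalization can be sketched as follows: for each time step, the meteorological inputs are repeatedly replaced by values drawn at random from the whole record, and the resulting predictions are averaged. This is a simplified illustration of the general resampling idea described by Grange et al. (2018), not the rmweather code; `toy_predict` is a hypothetical stand-in for a trained random forest, and the variable names `emis` and `ws` and all values are invented.

```python
import random


def normalize_meteorology(predict, rows, met_vars, n_samples=200, seed=0):
    """Average predictions over meteorology resampled from the whole record.

    predict: callable taking a dict of explanatory variables.
    rows: list of dicts, one per time step.
    met_vars: names of the variables to resample (the meteorological ones).
    """
    rng = random.Random(seed)
    normalized = []
    for row in rows:
        total = 0.0
        for _ in range(n_samples):
            donor = rng.choice(rows)      # random time step supplying the weather
            sampled = dict(row)           # non-meteorological inputs stay fixed
            for var in met_vars:
                sampled[var] = donor[var]
            total += predict(sampled)
        normalized.append(total / n_samples)
    return normalized


def toy_predict(row):
    # Hypothetical model: concentration scales with an emission proxy and
    # is diluted by wind speed
    return row["emis"] / row["ws"]


rows = [
    {"emis": 10.0, "ws": 1.0}, {"emis": 10.0, "ws": 2.0}, {"emis": 10.0, "ws": 5.0},
    {"emis": 5.0, "ws": 1.0}, {"emis": 5.0, "ws": 2.0}, {"emis": 5.0, "ws": 5.0},
]
normalized = normalize_meteorology(toy_predict, rows, ["ws"])
for row, value in zip(rows, normalized):
    print(f"emis = {row['emis']:4.1f}, ws = {row['ws']:3.1f} -> normalized = {value:.2f}")
```

After normalization, time steps with the same emission proxy have similar values regardless of the wind speed that actually occurred, so the remaining differences reflect emissions rather than weather.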

Table 5. Summary of the average concentrations for the BLD, the LD, and the PLD and the relative changes for the LD and the PLD for PC and CNEA sites estimated by RF and RFN. In every case, the RCs were calculated by considering the mean value for each period. In every case, Obs refers to observed concentrations.


Figure 6. Mean diurnal cycle for the different pollutants for the LD (from 20 March to 13 April 2020) and the PLD (13 April to 25 May 2020) for both sites. The line represents the average diurnal cycle, and the shaded area represents the standard deviation.


Figure 7. Mean diurnal cycle for different pollutants for the LD, PLD, and MAM2019 periods, with meteorological normalization for both sites.


Figure 8. Bivariate polar plot for CNEA of hourly means for observations during MAM2019 and the lockdown periods versus the BAU scenario estimated with the RF model. The radial axis represents wind speed, the angular axis represents wind direction, and the color scale represents pollutant concentrations.


Figure 4 displays the differences in daily concentrations between observations and RF estimates for the three considered periods (BLD, LD, and PLD). For CNEA, CO and NOx observations and predictions for the BLD period showed NMB<10 %. Noticeably, most of the changes were observed right from the day after lockdown. Pollutant levels were almost fully recovered by the last week of the PLD period.

In what follows, the results are presented by species, highlighting the most relevant relative changes in concentrations. The results of the meteorological normalization are used to evaluate the effects of the changes in emissions of particular pollutants as a consequence of the restrictions previously discussed. Bivariate polar plots were used to distinguish potential sources that impact the monitoring sites (Figs. 8 and 9).

Figure 9. Bivariate polar plot for PC of hourly means for observations during MAM2019 and the lockdown periods versus the BAU scenario estimated with the RF model. The radial axis represents wind speed, the angular axis represents wind direction, and the color scale represents pollutant concentrations.


3.2.1 Carbon monoxide

As shown in Table 5 and discussed below, there was a reduction in CO levels when the highest restrictions were in place (LD). However, the behavior of this pollutant when the restrictions were partially lifted (PLD) differed, depending on the measuring site.

In PC, the recovery of traffic during the PLD (RCRFPLD = −19 %) did not result in a smaller relative change with respect to a scenario with higher restrictions (RCRFLD = −20 %). Nevertheless, as shown by RCRFN, when decoupling the effects of the meteorology, the relative change was −20 % in the LD but only −7 % in the PLD with respect to the normalized values for the same periods in 2019. These results show the influence that the particular meteorological conditions had on CO concentrations in PC. On the other hand, in CNEA, the partial lift of restrictions during the PLD resulted in a smaller relative change in CO concentrations, which is clear both for the particular meteorological conditions of the two periods (−45 % for RCRFLD vs. −26 % for RCRFPLD) and for the normalized model (−26 % for RCRFNLD vs. −11 % for RCRFNPLD).

The observed CO had lower concentration values and flatter diurnal patterns than our simulations of a BAU scenario (Fig. 6). This reduction far surpasses any bias detected in RF simulations, particularly during rush hours, where RF showed close to no bias (Fig. 3). This is particularly true in CNEA, where the general reduction in CO was larger. For this pollutant, there are no big differences between the changes in the normalized diurnal cycle and those obtained by comparing the RF predictive model with the observations (Figs. 6 and 7).

As shown in Fig. 8, for the CNEA site during MAM2019, concentrations were similar for all wind directions and speeds (up to 8 m s−1). The largest relative changes between the 2020 observations and the RF simulations were when winds were coming from the east and southeast (both for the LD and the PLD). These were probably due to a reduction in traffic on the highway (see Sect. 2.4.1), which, according to Diaz Resquin et al. (2018), is one of the principal sources of fuel combustion emissions.

An equivalent analysis for PC (Fig. 9) yielded similar results during MAM2019, although concentrations seemed to be largest when winds were from the west. However, relative changes during the LD and the PLD did not seem to have a clear dominant wind direction. During the PLD, sources from the west reappeared.

3.2.2 Nitrogen oxide levels

The drastic reduction in vehicular emissions had a positive impact on the NO and NO2 levels. As shown in Table 5, during the LD period, NO levels were one-third and one-fourth of the estimated value for a BAU scenario in PC and CNEA, respectively. The relative change for NO2 was −45 %. During the PLD, the relative change was smaller, with −37 % for both sites for NO and −20 % and −30 % for NO2 in PC and CNEA, respectively.

At both sites, the relative changes in nitrogen oxide levels were larger than those of CO. Arguably, this indicates that the power plants did not contribute in any major way to the observed differences. The larger changes are probably due to a reduced circulation of diesel vehicles, which are the major nitrogen oxide emitters (D'Angiola et al., 2010; Ghaffarpasand et al., 2020).

RCRFN shows that these changes were consistently enhanced by the meteorological conditions during that period, so that the changes with a meteorological normalization are between two-thirds and half as large as those without.

We also see a flattening of the diurnal cycles of NO during the LD, both in the RF predictive model and in the analysis with normalized meteorology (Figs. 6 and 7). The bimodal curve is partially recovered during the PLD. This indicates, once again, the strong role of traffic emissions in NO concentrations. NO2, however, preserves most of its bimodal nature, albeit somewhat diminished. Although a clear explanation of this fact is hard to find, it should be noted that, while NO is predominantly a primary pollutant, NO2 is partially secondary in origin and is largely influenced by NO, O3, and HOx concentrations, as well as radiation and other meteorological parameters (Han et al., 2011; Brasseur and Jacob, 2017). NO is converted to NO2 by reacting with O3 during the morning, and NO2 is converted back to NO by photolysis during the daytime, generating an O radical that regenerates the O3. At night, O3 and NO2 react with each other in a chain of reactions that ends up generating HNO3 in the aqueous phase of aerosols. The diurnal cycle of these photochemical processes should be largely regulated by the solar radiation and therefore unaffected by the restrictions. This remains true, even if NO emissions are flattened and the total concentrations of NO2 are also clearly lower, particularly during daytime.
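For reference, the cycle described above corresponds to the following standard textbook reactions, written here as a sketch rather than as a scheme taken from the paper. The first three reactions form the daytime NO–NO2–O3 cycle, and the last three summarize the nighttime pathway to HNO3 (M denotes a third body, typically N2 or O2):

```latex
\begin{align*}
\mathrm{NO} + \mathrm{O_3} &\rightarrow \mathrm{NO_2} + \mathrm{O_2} \\
\mathrm{NO_2} + h\nu &\rightarrow \mathrm{NO} + \mathrm{O(^3P)} \\
\mathrm{O(^3P)} + \mathrm{O_2} + \mathrm{M} &\rightarrow \mathrm{O_3} + \mathrm{M} \\[4pt]
\mathrm{NO_2} + \mathrm{O_3} &\rightarrow \mathrm{NO_3} + \mathrm{O_2} \\
\mathrm{NO_3} + \mathrm{NO_2} + \mathrm{M} &\rightleftharpoons \mathrm{N_2O_5} + \mathrm{M} \\
\mathrm{N_2O_5} + \mathrm{H_2O\,(aq)} &\rightarrow 2\,\mathrm{HNO_3}
\end{align*}
```

Because the first reaction consumes O3, a drop in freshly emitted NO weakens this local sink, which is relevant for the O3 increases discussed in Sect. 3.2.3.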

Figure 8 shows the bivariate polar plots of the NOx concentrations at the CNEA site. The plot for MAM2019 provides evidence for two main contributing sources. One source was associated with air masses from E–SE directions at low wind speeds, and the second source was associated with higher wind speeds from the N–NW direction. The source to the E–SE could be dominated by ground-level road traffic emissions that are closer to the site because high concentrations under low wind speeds are indicative of surface emissions released with little or no buoyancy (Uria-Tellaetxe and Carslaw, 2014). Also, the wind direction in which this source was dominant corresponds to the highway previously described in Sect. 2.4.1. The source to the N–NW was associated with high concentrations at high wind speeds, which is indicative of emissions at a greater distance. It is plausible to attribute these NOx levels to the main access avenue that connects the city with the suburbs and is located in this direction, due to the presence of heavy-duty diesel vehicles and buses and the number of flowing traffic stops. During the LD and the PLD, the largest RCRF values occurred when winds were coming from the highway. This serves as further evidence that the observed effects were mainly due to changes in traffic and not to the changes in residential emission patterns due to lifestyle changes during the lockdown.

In the case of PC, as shown in Fig. 9, during MAM2019, the main sources seemed to be located to the west and southwest of the station. These two directions entailed the largest changes due to restrictions during the LD period. During the PLD period, in a similar manner to CO, the sources to the west were partially restored (although concentrations from the southwest remained low).

3.2.3 Ozone

In contrast to the other pollutants considered, the O3 was higher when compared to a no-restrictions scenario. Its relative changes estimated using the RF predictive model were 80 % and 57 % during the LD and the PLD periods, respectively.

Recent studies of the lockdown effects on atmospheric composition have also reported large O3 increases at urban sites and indicated the need to analyze changes in precursor emissions and meteorological parameters in light of their role in the nonlinear response of the O3 concentrations (Ordóñez et al., 2020; Tobías et al., 2020; Nakada and Urban, 2020; Shi and Brasseur, 2020). Hence, consideration of the joint effects of the changes in precursors and meteorology is of great value for understanding the differences between the relative changes estimated using RF concentrations. Based on Figs. 4 and 6, we provide plausible explanations for these discrepancies.

It is well known that decreasing nitrogen oxide levels in a VOC-limited regime tends to increase O3. It is most likely that the lower concentrations of freshly emitted NO registered during the LD and the PLD in CNEA provoked a decline in the local scavenging of O3, leading to higher O3 concentrations, particularly in the morning (Tobías et al., 2020; Nakada and Urban, 2020). Even though NO is the pollutant that had the highest relative decrease during the LD and the PLD, its reduction is not enough to explain the overall relative increase in O3, and therefore, NO2 might have played a role as well. Lower NO2 levels could also have resulted in more OH to initiate O3 production because the inhibition of a termination reaction favors a faster O3 accumulation (Seguel et al., 2012).

With respect to the role of aerosols in the O3 formation, it is worth noting that a significant decrease in PM10 was registered in PC. This likely implied a consequent reduction not only in the mass concentrations of PM2.5 and PM1 but also especially in the number concentrations of fine and ultrafine particles (Arkouli et al., 2010; Gelman Constantin et al., 2021). A similar situation most likely occurred in CNEA. This could have led to greater photolysis due to the decrease in the emissions of fine particles as a consequence of the vehicular restrictions imposed during the lockdown periods, which in turn could have led to higher O3 concentrations (Wang et al., 2019).

In this case, meteorological factors were clearly highly relevant, as can be seen by the fact that the relative change estimated with the RFN model is far smaller (27 % for RCRFNLD and only 5 % for RCRFNPLD). The effects of meteorology can be rather complex, since the O3 precursor concentrations and reaction rates are affected in multiple ways (Wang et al., 2017). Although meteorological variables such as the temperature and relative humidity are highly relevant for ozone production and chemistry, they were tested as explanatory variables and, in this case, led to model degradation. However, we submit that their effects are indirectly taken into account by the chemical species that were employed (CO, NO, NO2, and SO2). Solar radiation, which is highly relevant for O3 chemistry, is also linked to the variable daynight. In this particular case, during the LD, elevated O3 concentrations occurred on days with high temperatures and low winds, which favor the photochemical production of O3 and the accumulation of ozone and its precursors.

When the meteorology is normalized, the valleys at 07:00 and 20:00 are clearly less marked during 2020 than during 2019 and almost disappeared during the LD when compared with the normalized values for the same period in the previous year (Fig. 7). This is probably due to the lower concentrations of nitrogen oxides, which are therefore less efficient at titrating O3 (Brasseur and Jacob, 2017).

As expected, the bivariate polar plots (Fig. 8) show that O3 behaved in an opposite manner to that of NOx and had the largest increases when winds came from the east and southeast during the LD and also when they came from the east and northwest during the PLD.

From these results, we can also infer that the area in which the CNEA site is located behaves as a region with a VOC-limited chemical regime because the reduction in NOx emissions caused an increase in ozone concentrations (Blanchard and Fairley, 2001; Heuss et al., 2003; Yarwood et al., 2003; Blanchard and Tanenbaum, 2006). We identified similar behavior, with increasing O3 concentrations under decreasing NOx levels, when analyzing the 2019 data for weekends (Fig. S5). This is related to the so-called weekend effect in a VOC-limited regime (Koo et al., 2012).

3.2.4 Sulfur dioxide

During the LD, the SO2 concentrations were slightly lower than those of the simulated BAU scenario (RCRF of −12 %). Although this change is not as large as in the other species for the particular meteorological conditions that occurred during the period, if we consider a normalized meteorology, then we observe a relative change of −20 %, which is about as large as the change observed in, for example, CO. There was a smaller relative change during the PLD, which was similar for RF and RFN.

While all other species in this study are mostly controlled, directly or indirectly, by on-road traffic emissions, according to our findings, SO2 concentrations are largely influenced by shipping emissions (see Sect. 3.1). This might be the reason why SO2 is the species with a larger change after normalizing the meteorology.

Another possible reason for having a smaller relative change in SO2 concentrations is that the vehicle emissions of heavy-duty diesel trucks are another relevant source in Buenos Aires. These are mainly associated with essential activities and might not have been affected as much by the restrictions. However, the partial flattening of the normalized diurnal cycle (Fig. 7) is still probably related to changes in this particular sort of traffic.

3.2.5 Particulate matter 10 µm

During the LD, PM10 had a relative change of −33 % compared to what would be expected for that specific period under previous emissions. This effect was once again enhanced by the meteorological factors, considering that RCRFN was only −20 %. During the PLD, similar to what happened with other pollutants, the concentrations had a relative change that was only about half as large (−14 % for the RF predictive model and −7 % for the RFN).

When winds are taken into account (Fig. 9), we observe a general reduction from all directions during the LD. Two sources account for this, namely (i) the anthropogenic PM10 emissions close to the monitoring site that were mostly from vehicle diesel combustion and soot resuspension and (ii) natural sources, such as dust emissions, from the nearest large open area. In a similar fashion to CO and NOx, sources from the west were re-established during the PLD.

3.3 Vehicle emission reduction strategies and air pollution in the MABA

Although, as expected, most pollutants were noticeably reduced during the LD due to the restrictions imposed, O3 was an exception. Strategies for controlling pollution from vehicular emissions in the MABA must take into account the relative reductions in NOx and VOCs to avoid an unintended increase in O3 concentrations. The atmosphere in the MABA is usually cleaned up during the night, due to a flat topography and the city's wind dynamics. Therefore, criteria pollutants rarely surpass air quality norms. Even though no specific policies to reduce them have been implemented, recently announced greenhouse gas emission mitigation policies affecting on-road mobile emissions may have a major impact. These include (i) technological advances in diesel buses that should reduce NOx and PM10, without a major impact on VOCs, and (ii) an increase in the fraction of electric cars, which should reduce NOx and VOC concentrations. Thus, if NOx emissions decrease like they did during the COVID-19 lockdown, then this will likely result in an increase in tropospheric O3 in the MABA if no additional measures regarding VOCs emissions are included, which could be of particular importance for some weather conditions. In fact, under the VOC-limited regime identified for the MABA, control of VOC emissions would be more efficient to reduce local peaks in O3.

This highlights the importance of having comprehensive air quality policies rather than focusing on reducing individual pollutants.

4 Code and data availability

Hourly concentrations of CO, NO, NO2, SO2, and O3 in CNEA are available in *.csv format (Diaz Resquin et al., 2021). We also provide an introductory R notebook with some baseline simulations for the predictive model. For PC, regulatory averages are publicly available (APrA, 2021). Hourly data are not regularly reported but can be requested from the Environmental Protection Agency of Buenos Aires. To enable a machine learning quick start to reproduce the baseline experiments, we also added the meteorological data used to run the simulations to the dataset. These data are publicly available on the website of the Argentine National Weather Service (Servicio Meteorológico Nacional, 2023).

5 Summary and conclusions

In this study, we present novel air quality data for a residential site located in the metropolitan area of Buenos Aires that includes concentrations of CO, NO, and NO2 and, of particular importance for the city, SO2 and O3. One year of these data, together with data from a public monitoring station, were used to train random forest models. The performance of the models was tested on the basis of observations registered both with a separate testing set during the training period and with data before the outbreak of the COVID-19 pandemic. Observations in the two first phases of the lockdown measures imposed were compared with the business-as-usual RF concentrations to assess the change with respect to the air pollutant concentrations that would have occurred without the lockdown. Simultaneously, a meteorological normalization using random forest was performed (RFN), and the normalized concentrations during these lockdown phases were compared with the normalized concentrations for the same periods during 2019. The main conclusions are listed below.

  • i.

    The resulting set of explanatory variables for the different pollutants at each site provides evidence of the need for careful variable identification during the training period. Although ideally the best explanatory variables could be identified by trial and error by inexperienced users of random forest models with the support of variable importance plots, expert judgment is advisable for a meaningful and relatively fast selection.

  • ii.

    The RF model was able to reproduce air quality observations at two monitoring stations in the MABA when evaluated for a 15 d period prior to the outbreak of the COVID-19 pandemic. This approach allowed predicting the pollutant hourly mean values with a mean bias of less than 10 % by using the data of air quality, emissions, and meteorology and analyzing the effect of wind direction and speed in pollutant concentration, which is useful when characterizing pollution sources.

  • iii.

    During the lockdown, all primary pollutants had lower concentrations than what the RF framework would predict for a business-as-usual scenario. The relative change ranged from −12 % (SO2) to −75 % (NO in the monitoring site of CNEA). In the case of all pollutants except SO2, the relative changes were enhanced by the meteorology, as shown by the fact that, in absolute terms, RCRF was generally larger than RCRFN. This difference was particularly large for O3, probably due to its secondary nature and its complex chemical and photochemical production and destruction mechanisms. The exception observed in the case of SO2 is likely due to the importance of the wind direction, due to the relevance of the shipping emissions. The relative changes in pollutant concentrations are closely linked to both the traffic and the particular meteorological conditions. The use of bivariate polar plots is also helpful for identifying potential sources, while remaining relatively easy to implement.

  • iv.

    RF estimations can be implemented at a low computational cost and can be used to assess the changes that occurred in a specific period if an anomalous situation happened. It can also be used to forecast air quality conditions in the short term at a lower cost than CTMs, which could be of use for local authorities, considering that the MABA has, thus far, only six long-term air quality monitoring stations. When, as in this case, detailed temporal information on different emission sources is lacking (for example, traffic information from on-road sensors), it is essential to use a set of data in which the emissions are similar to those that are expected to be simulated. The model also allows the analysis of the relations between different pollutants, which is of particular interest for those that have very complex chemistry, such as O3. The observational input data needed for future RF simulations can be readily updated. The modeling framework developed in this study is user-friendly, rather straightforward to implement, and does not require a large computational capacity. The methodology is capable of being adapted to different time periods and sites and implemented by the technical staff of regulatory agencies. Expert advice may be needed during the selection of the predictive variables and model optimization.

  • v.

    To assess the effectiveness of a particular measure in air quality (AQ) independently of particular meteorological conditions of specific periods, a meteorological normalization technique based on random forest can be used. This approach is relatively simple to implement with already existing R packages.

  • vi.

    Although previous studies employed both techniques with similar aims, we postulate that the use of the RF predictive model and the meteorological normalization serve different purposes and should be used accordingly. The predictive model can be used to analyze the changes for particular weather conditions or, combined with a meteorological forecast, to forecast pollutant concentrations. On the other hand, the meteorological normalization makes it possible to evaluate the general impact on concentrations due to changes in emissions, decoupling the effects of particular meteorological conditions from the short-term emission changes from the AQ datasets.

  • vii.

    In this work we provide the first year-long in situ observational dataset on tropospheric O3 and SO2 outside of an industrial area in the MABA in the last decade. We also provide concentrations of CO, NO, and NO2 determined by colocated instruments.

  • viii.

    According to our measurements, the MABA seems to be in a VOC-limited regime. If VOC emissions are not carefully regulated, a NOx reduction would imply an increase in the tropospheric O3. Knowing how the concentrations of O3 in the troposphere respond to reducing the emissions of their precursors is relevant when planning appropriate strategies to reduce CO, non-methane volatile organic compounds (NMVOCs), and NOx emissions. Even though this classification is limited due to the fact that we only have single-point measurements, this could be a useful starting point for a more thorough characterization of the ozone regime in this urban area.


The supplement related to this article is available online at:

Author contributions

MDR and LD conceptualized the study. DA and MDO acquired the data for CNEA, and MDR retrieved the data from the Argentine National Weather Service and the environmental authority of the city of Buenos Aires. MDR validated the data. MDR and DA curated the data. MDR and PL analyzed the data, and CR contributed with the meteorological analysis. MDR, PL, DG, and LD performed the formal analysis. DG and LD supervised the project and acquired the funding. MDR, PL, DA, MDO, CR, DG, and LD wrote the original draft. MDR, PL, MDO, PC, DG, and LD reviewed the draft. MDR, PL, MDO, DG, and LD were part of the editing process.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Special issue statement

This article is part of the special issue “Benchmark datasets and machine learning algorithms for Earth system science data (ESSD/GMD inter-journal SI)”. It is not associated with a conference.


Acknowledgements

We want to acknowledge the participation of the entire Atmospheric Chemistry group of the National Atomic Energy Commission of Argentina (CNEA), who continued with the campaign even during lockdown. The authors wish to thank the Environmental Protection Agency of Buenos Aires (APRA) and the Argentine National Weather Service (SMN) for sharing the air quality and meteorological data for this study. We also thank the editor and reviewers for their comments and recommendations, which helped to improve the paper.

Financial support

This research has been supported by CNEA (grant no. CNEA-GQ-20192), the Agencia Nacional de Promoción Científica y Tecnológica, Argentina (grant nos. PICT-O 2016-4802 and PICT 2016-3590), and the EU Horizon 2020 Marie Skłodowska-Curie project PAPILA (grant no. 777544; MSCA action for research and innovation staff exchange).

Review statement

This paper was edited by Nellie Elguindi and reviewed by two anonymous referees.


Act 1356: Preservación del recurso aire y prevención y control de la contaminación atmosférica, (last access: 7 September 2021), 2004. a

Agencia de Protección Ambiental (APrA), Secretaría de Ambiente, Jefatura de Gobierno: Calidad de Aire, Buenos Aires Data [data set], (last access: 4 January 2023), 2021. a, b

Aktay, A., Bavadekar, S., Cossoul, G., Davis, J., Desfontaines, D., Fabrikant, A., Gabrilovich, E., Gadepalli, K., Gipson, B., Guevara, M., Kamath, C., Kansal, M., Lange, A., Mandayam, C., Oplinger, A., Pluntke, C., Roessler, T., Schlosberg, A., Shekel, T., Vispute, S., Vu, M., Wellenius, G., Williams, B., and Wilson, R. J.: Google COVID-19 Community Mobility Reports: Anonymization Process Description (version 1.1), arXiv [preprint],, 2020. a

Anapolsky, S.: ¿cómo nos movemos en el AMBA? Conclusiones de la evidencia empírica y alternativas post-covid, Universidad de San Martín. ISSN: 2469-1631 Serie: Documentos de Trabajo del IT, 18/ Comonos/ movemos/ en/ el/ AMBA/ -/ Anapolsky.pdfl (last access: 7 September 2021), 2020. a

Arkouli, M., Ulke, A. G., Endlicher, W., Baumbach, G., Schultz, E., Vogt, U., Müller, M., Dawidowski, L., Faggi, A., Wolf-Benning, U., and Scheffknecht, G.: Distribution and temporal behavior of particulate matter over the urban area of Buenos Aires, Atmos. Pollut. Res., 1, 1–8,, 2010. a, b

Barros, V., Clarke, R., and Dias, P. S.: Climate change in the La Plata basin, Publication of the Inter-American Institute for Global Change Research (IAI), São José dos Campos, Brazil, ISBN 950-692-066-4, ISBN-13 978-950-692-066-1, 2006. a

Blanchard, C. and Tanenbaum, S.: Weekday/Weekend differences in ambient air pollutant concentrations in atlanta and the southeastern United States, J. Air Waste Manage., 56, 271–284,, 2006. a

Blanchard, C. L. and Fairley, D.: Spatial mapping of VOC and NOx-limitation of ozone formation in central California, Atmos. Environ., 35, 3861–3873,, 2001. a

Bon, D. M., Ulbrich, I. M., de Gouw, J. A., Warneke, C., Kuster, W. C., Alexander, M. L., Baker, A., Beyersdorf, A. J., Blake, D., Fall, R., Jimenez, J. L., Herndon, S. C., Huey, L. G., Knighton, W. B., Ortega, J., Springston, S., and Vargas, O.: Measurements of volatile organic compounds at a suburban ground site (T1) in Mexico City during the MILAGRO 2006 campaign: measurement comparison, emission ratios, and source attribution, Atmos. Chem. Phys., 11, 2399–2421,, 2011. a

Brasseur, G. P. and Jacob, D. J.: Modeling of Atmospheric Chemistry, Cambridge University Press, 1 edn.,, 2017. a, b

Carslaw, D. C. and Beevers, S. D.: Characterising and understanding emission sources using bivariate polar plots and k-means clustering, Environ. Modell. Softw., 40, 325–329,, 2013. a

Carslaw, D. C. and Ropkins, K.: openair – An R package for air quality data analysis, Environ. Model. Softw., 27–28, 52–61,, 2012. a

Castesana, P., Diaz Resquin, M., Huneeus, N., Puliafito, E., Darras, S., Gómez, D., Granier, C., Osses Alvarado, M., Rojas, N., and Dawidowski, L.: PAPILA dataset: a regional emission inventory of reactive gases for South America based on the combination of local and global information, Earth Syst. Sci. Data, 14, 271–293,, 2022. a, b, c

Cazorla, M., Herrera, E., Palomeque, E., and Saud, N.: What the COVID-19 lockdown revealed about photochemistry and ozone production in Quito, Ecuador, Atmos. Pollut. Res., 12, 124–133,, 2020. a

D'Angiola, A., Dawidowski, L. E., Gómez, D. R., and Osses, M.: On-road traffic emissions in a megacity, Atmos. Environ., 44, 483–493,, 2010. a, b

Decree 1074/18: Decreto 1074/2018, (last access: 3 January 2023), 2018. a

Decree 297/2020: AISLAMIENTO SOCIAL PREVENTIVO Y OBLIGATORIO, Decreto 297/2020, (last access: 7 September 2021), 2020. a

Diaz Resquin, M., Santágata, D., Gallardo, L., Gómez, D., Rössler, C., and Dawidowski, L.: Local and remote black carbon sources in the Metropolitan Area of Buenos Aires, Atmos. Environ., 182, 105–114,, 2018. a, b, c

Diaz Resquin, M. C., Alessandrello, D., De Oto, M., Lichtig, P., Bajano, H., Ponso, A., Bajano, F., Dawidowski, L., and Gómez, D.: AQ-CNEA-CAC Air quality dataset (2019–2020): “A machine learning approach to address air quality changes during the COVID-19 lockdown in Buenos Aires, Argentina”, v1, Mendeley Data [data set, code],, 2021. a, b, c

Faridi, S., Yousefian, F., Janjani, H., Niazi, S., Azimi, F., Naddafi, K., and Hassanvand, M. S.: The effect of COVID-19 pandemic on human mobility and ambient air quality around the world: A systematic review, Urban Clim., 38, 100888,, 2021. a

Feng, R., jun Zheng, H., Gao, H., Ran Zhang, A., Huang, C., Xi Zhang, J., Luo, K., and Ren Fan, J.: Recurrent Neural Network and random forest for analysis and accurate forecast of atmospheric pollutants: A case study in Hangzhou, China, J. Clean. Prod., 231, 1005–1015,, 2019. a

Freitas, S. R., Longo, K. M., Alonso, M. F., Pirre, M., Marecal, V., Grell, G., Stockler, R., Mello, R. F., and Sánchez Gácita, M.: PREP-CHEM-SRC – 1.0: a preprocessor of trace gas and aerosol emission fields for regional and global atmospheric chemistry models, Geosci. Model Dev., 4, 419–433,, 2011. a

Gaubert, B., Bouarar, I., Doumbia, T., Liu, Y., Stavrakou, T., Deroubaix, A., Darras, S., Elguindi, N., Granier, C., Lacey, F., Müller, J. F., Shi, X., Tilmes, S., Wang, T., and Brasseur, G. P.: Global Changes in Secondary Atmospheric Pollutants During the 2020 COVID-19 Pandemic, J. Geophys. Res.-Atmos., 126, e2020JD034213,, 2021. a, b

Gelman Constantin, J., Londonio, A., Bajano, H., Smichowski, P., and Gómez, D.: Plasma-based technique applied to the determination of 21 elements in ten size fractions of atmospheric aerosols, Microchem. J., 160, 105736,, 2021. a

Ghaffarpasand, O., Beddows, D. C., Ropkins, K., and Pope, F. D.: Real-world assessment of vehicle air pollutant emissions subset by vehicle type, fuel and EURO class: New findings from the recent UK EDAR field campaigns, and implications for emissions restricted zones, Sci. Total Environ., 734, 139416,, 2020. a

Grange, S. K. and Carslaw, D. C.: Using meteorological normalisation to detect interventions in air quality time series, Sci. Total Environ., 653, 578–588,, 2019. a, b, c, d

Grange, S. K., Lewis, A. C., and Carslaw, D. C.: Source apportionment advances using polar plots of bivariate correlation and regression statistics, Atmos. Environ., 145, 128–134,, 2016. a

Grange, S. K., Carslaw, D. C., Lewis, A. C., Boleti, E., and Hueglin, C.: Random forest meteorological normalisation models for Swiss PM10 trend analysis, Atmos. Chem. Phys., 18, 6223–6239,, 2018. a, b

Grange, S. K., Lee, J. D., Drysdale, W. S., Lewis, A. C., Hueglin, C., Emmenegger, L., and Carslaw, D. C.: COVID-19 lockdowns highlight a risk of increasing ozone pollution in European urban areas, Atmos. Chem. Phys., 21, 4169–4185,, 2021. a, b

Han, S., Bian, H., Feng, Y., Liu, A., Li, X., Zeng, F., and Zhang, X.: Analysis of the Relationship between O3, NO and NO2 in Tianjin, China, Aerosol Air Qual. Res., 11, 128–139,, 2011. a

Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., and Thépaut, J.-N.: ERA5 hourly data on single levels from 1959 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [data set],, 2018. a

Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz‐Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S.,Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.: The ERA5 global reanalysis, Q. J. Roy. Meteor. Soc., 146, 1999–2049, 2020. a

Heuss, J. M., Kahlbaum, D. F., and Wolff, G. T.: Weekday/Weekend Ozone Differences: What Can We Learn from Them?, J. Air Waste Manage., 53, 772–788,, 2003. a

IGN: Mapas base de Argentina Bicontinental y Argentina Parte Continental Americana, Capas SIG [data set],, last access: 7 September 2021. a

Jiang, N. and Riley, M. L.: Exploring the utility of the random forest method for forecasting ozone pollution in SYDNEY, J. Environ. Protect. Sustainable Develop, 1, 245–254, 2015. a

Koo, B., Jung, J., Pollack, A. K., Lindhjem, C., Jimenez, M., and Yarwood, G.: Impact of meteorology and anthropogenic emissions on the local and regional ozone weekend effect in Midwestern US, Atmos. Environ., 57, 13–21,, 2012. a

Kroll, J. H., Heald, C. L., Cappa, C. D., Farmer, D. K., Fry, J. L., Murphy, J. G., and Steiner, A. L.: The complex chemical effects of COVID-19 shutdowns on air quality, Nat. Chem., 12, 777–779,, 2020. a

Le, T., Wang, Y., Liu, L., Yang, J., Yung, Y. L., Li, G., and Seinfeld, J. H.: Unexpected air pollution with marked emission reductions during the COVID-19 outbreak in China, Science, 369, 702–706,, 2020. a

Li, K., Jacob, D. J., Liao, H., Zhu, J., Shah, V., Shen, L., Bates, K. H., Zhang, Q., and Zhai, S.: A two-pollutant strategy for improving ozone and particulate air quality in China, Nat. Geosci., 12, 906–910,, 2019. a

Liaw, A. and Wiener, M.: Classification and Regression by randomForest, R News, 2, 18–22, (last access: 3 January 2023), 2002. a

Liu, Y., Wang, T., Stavrakou, T., Elguindi, N., Doumbia, T., Granier, C., Bouarar, I., Gaubert, B., and Brasseur, G. P.: Diverse response of surface ozone to COVID-19 lockdown in China, Sci. Total Environ., 789, 147739,, 2021. a, b

Masih, A.: Machine learning algorithms in air quality modeling, Glob. J. Environ. Sci. Manag., 5, 515–534,, 2019. a

Muhammad, S., Long, X., and Salman, M.: COVID-19 pandemic and environmental pollution: A blessing in disguise?, Sci. Total Environ., 728, 138820,, 2020. a

Nakada, L. Y. K. and Urban, R. C.: COVID-19 pandemic: Impacts on the air quality during the partial lockdown in São Paulo state, Brazil, Sci. Total Environ., 730, 139087,, 2020. a

Nakada Kondo, L. Y. and Urban, R. C.: COVID-19 pandemic: Impacts on the air quality during the partial lockdown in São Paulo state, Brazil, Sci. Total Environ., 730, 139087,, 2020. a

Ordóñez, C., Garrido-Perez, J. M., and García-Herrera, R.: Early spring near-surface ozone in Europe during the COVID-19 shutdown: Meteorological effects outweigh emission changes, Sci. Total Environ., 747, 141322,, 2020. a

Pineda Rojas, A. L., Borge, R., Mazzeo, N. A., Saurral, R. I., Matarazzo, B. N., Cordero, J. M., and Kropff, E.: High PM10 concentrations in the city of Buenos Aires and their relationship with meteorological conditions, Atmos. Environ., 241, 117773,, 2020. a

Puliafito, S. E., Allende, D. G., Castesana, P. S., and Ruggeri, M. F.: High-resolution atmospheric emission inventory of the argentine energy sector. Comparison with edgar global emission database, Heliyon, 3, e00489,, 2017. a

R Core Team: R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, (last access: 16 December 2022), 2019. a

Rahman, M. M., Paul, K. C., Hossain, M. A., Ali, G. G. M. N., Rahman, M. S., and Thill, J.-C.: Machine Learning on the COVID-19 Pandemic, Human Mobility and Air Quality: A Review, IEEE Access, 9, 72420–72450,, 2021. a

Reich, S., Magallanes, J., Dawidowski, L., Gómez, D., Grošelj, N., and Zupan, J.: An Analysis of Secondary Pollutants in Buenos Aires City, Environ. Monit. Assess., 119, 441–457,, 2006. a

Saide, P. E., Carmichael, G. R., Spak, S. N., Gallardo, L., Osses, A. E., Mena-Carrasco, M. A., and Pagowski, M.: Forecasting urban PM10 and PM2.5 pollution episodes in very stable nocturnal conditions and complex terrain using WRF–Chem CO tracer model, Atmos. Environ., 45, 2769–2780,, 2011. a

Seguel, R. J., Morales S., R. G., and Leiva, G. M. A.: Ozone weekend effect in Santiago, Chile, Environ. Pollut., 162, 72–79,, 2012. a

Seinfeld, J. and Pandis, S.: Atmospheric Chemistry & Physics: From Air Pollution to Climate Change, Wiley, ISBN 0-471-17815-2, 1998. a

Servicio Meteorológico Nacional: Descarga del Catálogo de Datos Abiertos del SMN [data set],, last access: 3 January 2023. a

Shi, X. and Brasseur, G. P.: The Response in Air Quality to the Reduction of Chinese Economic Activities During the COVID-19 Outbreak, Geophys. Res. Lett., 47, 1–8,, 2020. a, b

Shi, Z., Song, C., Liu, B., Lu, G., Xu, J., Vu, T. V., Elliott, R. J. R., Li, W., Bloss, W. J., and Harrison, R. M.: Abrupt but smaller than expected changes in surface air quality attributable to COVID-19 lockdowns, Sci. Adv., 7, eabd6696,, 2021. a, b, c, d

Šimić, I., Lovrić, M., Godec, R., Kröll, M., and Bešlić, I.: Applying machine learning methods to better understand, model and estimate mass concentrations of traffic-related pollutants at a typical street canyon, Environ. Pollut., 263, 114587,, 2020. a

Srivastava, A.: COVID-19 and air pollution and meteorology-an intricate relationship: A review, Chemosphere, 263, 128297,, 2021. a

Stafoggia, M., Johansson, C., Glantz, P., Renzi, M., Shtein, A., de Hoogh, K., Kloog, I., Davoli, M., Michelozzi, P., and Bellander, T.: A random forest approach to estimate daily particulate matter, nitrogen dioxide, and ozone at fine spatial resolution in Sweden, Atmosphere, 11, 239,, 2020. a

Tobías, A., Carnerero, C., Reche, C., Massagué, J., Via, M., Minguillón, M. C., Alastuey, A., and Querol, X.: Changes in air quality during the lockdown in Barcelona (Spain) one month into the SARS-CoV-2 epidemic, Sci. Total Environ., 726, 138540,, 2020. a, b

Uria-Tellaetxe, I. and Carslaw, D. C.: Conditional bivariate probability function for source identification, Environ. Model. Softw., 59, 1–9,, 2014. a

Velders, G. J., Willers, S. M., Wesseling, J., van den Elshout, S., van der Swaluw, E., Mooibroek, D., and van Ratingen, S.: Improvements in air quality in the Netherlands during the corona lockdown based on observations and model simulations, Atmos. Environ., 247, 118158,, 2021. a, b

Vu, T. V., Shi, Z., Cheng, J., Zhang, Q., He, K., Wang, S., and Harrison, R. M.: Assessing the impact of clean air action on air quality trends in Beijing using a machine learning technique, Atmos. Chem. Phys., 19, 11303–11314,, 2019. a

Wang, T., Xue, L., Brimblecombe, P., Lam, Y. F., Li, L., and Zhang, L.: Ozone pollution in China: A review of concentrations, meteorological influences, chemical precursors, and effects, Sci. Total Environ., 575, 1582–1596,, 2017. a

Wang, W., Li, X., Shao, M., Hu, M., Zeng, L., Wu, Y., and Tan, T.: The impact of aerosols on photolysis frequencies and ozone production in Beijing during the 4-year period 2012–2015, Atmos. Chem. Phys., 19, 9413–9429,, 2019. a

Yang, J., Wen, Y., Wang, Y., Zhang, S., Pinto, J. P., Pennington, E. A., Wang, Z., Wu, Y., Sander, S. P., Jiang, J. H., Hao, J., Yung, Y. L., and Seinfeld, J. H.: From COVID-19 to future electrification: Assessing traffic impacts on air quality by a machine-learning model, P. Natl. Acad. Sci. USA, 118, e2102705118,, 2021. a, b

Yarwood, G., Stoeckenius, T. E., Heiken, J. G., and Dunker, A. M.: Modeling Weekday/Weekend Ozone Differences in the Los Angeles Region for 1997, J. Air Waste Manage., 53, 864–875,, 2003.  a

Yu, R., Yang, Y., Yang, L., Han, G., and Move, O. A.: RAQ–A random forest approach for predicting air quality in urban sensing systems, Sensors, 16, 86,, 2016. a

Zhan, Y., Luo, Y., Deng, X., Grieneisen, M. L., Zhang, M., and Di, B.: Spatiotemporal prediction of daily ambient ozone levels across China using random forest for human exposure assessment, Environ. Pollut., 233, 464–473,, 2018. a

Short summary
We explored the performance of the random forest algorithm in predicting CO, NOx, PM10, SO2, and O3 concentrations and compared the monitored and modeled concentrations during the COVID-19 lockdown phases. We provide the first long-term O3 and SO2 observational dataset for an urban–residential area of Buenos Aires in more than a decade and study the response of O3 to reduced emissions of its precursors, given its relevance for emission control.