Articles | Volume 16, issue 9
https://doi.org/10.5194/essd-16-4051-2024
Data description paper | 12 Sep 2024

PM2.5 concentrations based on near-surface visibility in the Northern Hemisphere from 1959 to 2022

Hongfei Hao, Kaicun Wang, Guocan Wu, Jianbao Liu, and Jing Li
Abstract

Long-term PM2.5 data are essential for studies of the atmospheric environment, human health, and climate change. However, PM2.5 measurements are sparsely distributed and of short duration. In this study, daily PM2.5 concentrations in the Northern Hemisphere are estimated for the period from 1959 to 2022 using a machine learning method based on near-surface atmospheric visibility, which is extracted from the Integrated Surface Database (ISD). Daily continuously monitored PM2.5 concentration is set as the target, and near-surface atmospheric visibility and other related variables are used as the inputs. For each site, 80 % of the samples form the training set, and 20 % form the testing set. The training result shows that the slope of the linear regression, with its 95 % confidence interval (CI), between the estimated and monitored PM2.5 concentrations is 0.955 [0.955, 0.955], the coefficient of determination (R2) is 0.95, the root mean square error (RMSE) is 7.2 µg m−3, and the mean absolute error (MAE) is 3.2 µg m−3. The test result shows that the slope, with its 95 % CI, between the predicted and monitored PM2.5 concentrations is 0.864 [0.863, 0.865], the R2 is 0.79, the RMSE is 14.8 µg m−3, and the MAE is 7.6 µg m−3. Compared with a global PM2.5 concentration dataset derived from a satellite aerosol optical depth product with 1 km resolution, the slopes of linear regression on the daily (monthly) scale are 0.817 (0.854) from 2000 to 2021, 0.758 (0.821) from 2000 to 2010, and 0.867 (0.879) from 2011 to 2022, indicating the accuracy of the model and the temporal consistency of the estimated PM2.5 concentration. The interannual trends and spatial patterns of PM2.5 concentration on the regional scale from 1959 to 2022 are analyzed using a generalized additive mixed model (GAMM), which is suitable for situations with an uneven spatial distribution of monitoring sites. The trend is the slope of the Theil–Sen estimator.
In Canada, the trend is −0.10 µg m−3 per decade, and the PM2.5 concentration exhibits an east–high to west–low pattern. In the United States, the trend is −0.40 µg m−3 per decade, and PM2.5 concentration decreases significantly after 1992, with a trend of −1.39 µg m−3 per decade. The areas of high PM2.5 concentration are in the east and west, and the areas of low PM2.5 concentration are in the central and northern regions. In Europe, the trend is −1.55 µg m−3 per decade. High-concentration areas are distributed in eastern Europe, and the low-concentration areas are in northern and western Europe. In China, the trend is 2.09 µg m−3 per decade. High-concentration areas are distributed in northern China, and the low-concentration areas are distributed in southern China. The trend is 2.65 µg m−3 per decade up to 2011 and −22.23 µg m−3 per decade since 2012. In India, the trend is 0.92 µg m−3 per decade. The concentration exhibits a north–high to south–low pattern, with high-concentration areas distributed in northern India, such as the Ganges Plain and Thar Desert, and the low-concentration area in the Deccan Plateau. The trend is 1.41 µg m−3 per decade up to 2013 and −23.36 µg m−3 per decade from 2014. The variation in regional PM2.5 concentrations is closely related to the implementation of air quality laws and regulations. The daily site-scale PM2.5 concentration dataset from 1959 to 2022 in the Northern Hemisphere is available at the National Tibetan Plateau/Third Pole Environment Data Center (https://doi.org/10.11888/Atmos.tpdc.301127) (Hao et al., 2024).

1 Introduction

Fine particulate matter (PM2.5) refers to particulate matter suspended in air with an aerodynamic diameter of less than 2.5 µm. PM2.5 has various shapes and is composed of complex components, such as inorganic salts (e.g., sulfate, nitrate, and ammonium), as well as organic carbon and elemental carbon, metallic elements, and organic compounds (Chen et al., 2020; Fan et al., 2021). PM2.5 can be emitted directly into the atmosphere (Viana et al., 2008; Zhang et al., 2019) and generated through photochemical reactions and transformations (Guo et al., 2014). PM2.5 exhibits high concentrations near emission sources, and these concentrations gradually decrease with distance. Because of its smaller size and longer atmospheric lifetime compared with coarse particulate matter, PM2.5 can be transported over long distances by atmospheric movements, leading to wide-ranging impacts. Studies indicate that regional transport contributes significantly to local PM2.5 concentration (Wang et al., 2014; Chen et al., 2020).

PM2.5 reduces atmospheric visibility and facilitates the formation of fog and haze conditions (Fan et al., 2021). Direct and indirect effects of PM2.5 on solar radiation in the atmosphere (Albrecht, 1989; Ramanathan et al., 2001; Bergstrom et al., 2007; Chen et al., 2022) alter the energy balance and the number of condensation nuclei, thereby influencing atmospheric circulation and the water cycle (Wang et al., 2012; Liao et al., 2015; Samset et al., 2019; Li et al., 2022).

PM2.5 is also known as respirable particulate matter. Due to its complex composition, PM2.5 may carry toxic substances that can significantly impair human health. The World Health Organization states explicitly that PM2.5 is more harmful than coarse particles, and long-term exposure to high PM2.5 concentrations increases the risk of respiratory diseases, cardiovascular diseases, and lung cancer (Lelieveld et al., 2015), regardless of a country's development status. A Global Burden of Diseases study reveals that exposure to environmental PM2.5 causes thousands of deaths and millions of lung diseases annually (Chafe et al., 2014; Kim et al., 2015; Cohen et al., 2017).

PM2.5 is an important parameter for assessing particulate matter pollution and air quality (Wang et al., 2012). PM2.5 can lead to soil acidification, water pollution, disruption of plant respiration, and ecological degradation (Wu and Zhang, 2018; Liu et al., 2019). Due to globalization and economic integration, preventing and controlling particulate matter pollution is a challenge at city, country, and global scales.

Therefore, long-term PM2.5 concentration data are needed for studies on the environment, human health, and climate change. At present, ground-based measurements, chemical models, and estimates based on alternative variables are the primary sources of PM2.5 concentration data.

Ground-based measurements are the most effective means of measuring PM2.5 concentration. PM2.5 monitoring has been ongoing since the 1990s in North America and Europe (Van Donkelaar et al., 2010), and large-scale PM2.5 monitoring has been implemented in other regions since 2000, including China in 2013 (Liu et al., 2017). As a result, the records for PM2.5 concentration are short, with only a few years of data available in many countries. The scarcity of PM2.5 measurements makes it challenging to provide long-term historical data for research.

Many studies have employed statistical methods and machine learning and deep learning methods to estimate PM2.5 concentrations based on aerosol optical depth. Van Donkelaar et al. (2021) utilized satellite aerosol optical depth data, aerosol vertical structure of chemical transport models, and ground-level measurements to estimate monthly PM2.5 concentrations and their uncertainties over global land from 1998 to 2019, and there are several related studies (Van Donkelaar et al., 2010; Boys et al., 2014; Van Donkelaar et al., 2015, 2016; Hammer et al., 2020). Many studies have been conducted at the regional scale, such as in the United States (Beckerman et al., 2013), China (Wei et al., 2019b; Xue et al., 2019; Wei et al., 2020; He et al., 2021; Wei et al., 2021), and India (Mandal et al., 2020). Although the PM2.5 concentrations derived from satellite retrievals have high spatial coverage, there are some limitations that need to be considered. Aerosol optical depth describes the column properties of aerosol, while PM2.5 concentration describes the near-surface properties of aerosol. Therefore, aerosol vertical structure is crucial in establishing the relationship between the two. Daily representativeness is also a concern, as PM2.5 concentration is continuously monitored, while the daily frequency of satellite observations is low (one to two times). Surface types, cloud conditions (Wei et al., 2019a), and resolution (Nagaraja Rao et al., 1989; Hsu et al., 2017) affect the accuracy of satellite products, thereby increasing the uncertainty in the estimated PM2.5 concentration.

Reanalysis datasets provide estimates of long-term particulate matter concentrations. The Modern-Era Retrospective Analysis for Research and Applications version 2 (MERRA-2) is an excellent reanalysis dataset from NASA that uses the Goddard Earth Observing System version 5 (GEOS-5). It has been providing global PM2.5 data since 1980 (Buchard et al., 2015, 2016, 2017; Gelaro et al., 2017; Sun et al., 2019). There are some emission inventories in the aerosol model, including volcanic material; monthly biomass burning from 1980 to 1996; monthly SO2, SO4, particulate organic matter (POM), and black carbon (BC) from 1997 to 2009; annual anthropogenic SO2 between 100 and 500 m above the surface from 1980 to 2008; and annual anthropogenic SO4, BC, and POM concentrations from 1980 to 2006. In assimilation systems, satellite aerosol products from MISR and MODIS Aqua/Terra are assimilated after 2000. Another reanalysis dataset is the Copernicus Atmosphere Monitoring Service (CAMS) global reanalysis, which is a global reanalysis dataset of the atmospheric composition produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). It has provided PM2.5 data since 2003 (Che et al., 2014; Inness et al., 2019). Although reanalysis provides long-term PM2.5 data, the uncertainty in emission inventories increases the uncertainty in PM2.5 concentration (Granier et al., 2011). The validation of the reanalysis based on emission inventories shows that PM2.5 concentration is still overestimated or underestimated in some regions (Buchard et al., 2017; Ali et al., 2022; Jin et al., 2022). The assimilation of aerosol optical depth products improves the aerosol column properties (Buchard et al., 2017), thereby improving the estimation of surface PM2.5 concentration, as it to some extent constrains the vertical structure of aerosols. 
However, the lack of high-spatiotemporal-resolution emission inventories and long-term assimilation data greatly limits the accuracy of surface PM2.5 concentrations.

Another alternative for estimating PM2.5 concentrations is the near-surface atmospheric horizontal visibility, which is the maximum distance at which observers with normal visual acuity can discern target contours under current weather conditions. In addition to manual observations, automated visibility measurement was introduced early, typically relying on the aerosol scattering principle (Wang et al., 2009; Zhang et al., 2020). Both visibility and PM2.5 concentration are measurements of near-surface aerosols. They describe atmospheric horizontal transparency and are used to describe atmospheric pollution. Long-term visibility records have been used to quantify long-term aerosol properties (Molnár et al., 2008; Wang et al., 2009; Zhang et al., 2017, 2020). Visibility observation stations are densely distributed across the world. Compared to satellite retrievals, visibility observations have longer historical records dating back to the early 20th century (Boers et al., 2015), are not affected by cloud interference, and provide continuous measurements.

Visibility has been used as a proxy for PM2.5 concentration (Huang et al., 2009) and for the estimation of PM2.5 concentration (Liu et al., 2017; Li et al., 2020; Singh et al., 2020). Singh et al. (2020) analyzed air quality in east Africa from 1974 to 2018 using visibility data. Liu et al. (2017) developed a statistical model and utilized ground-level visibility data to estimate long-term PM2.5 concentrations in China from 1957 to 1964 and 1973 to 2014. Gui et al. (2020) proposed a method to establish a virtual ground observation network for PM2.5 concentration in China using extreme gradient boosting modeling in 2018. Zeng et al. (2021) used LightGBM to establish a virtual network for hourly PM2.5 concentrations in China in 2017. Zhong et al. (2021, 2022) used LightGBM to predict 6 h PM2.5 concentrations based on visibility, temperature, and relative humidity in China from 1960 to 2020. Meng et al. (2018) utilized a random forest model to estimate the daily PM2.5 components in the United States from 2005 to 2015. These studies have provided various methods for estimating PM2.5 using visibility data. However, some have only focused on methodological innovations without providing long-term trends in PM2.5 concentration. Other studies offer long-term trends, but the primary focus is at an urban or national scale. There are few studies on long-term and high-temporal-resolution PM2.5 concentration at the global scale or across different countries.

This study uses a convenient, accurate, and easily understandable machine learning approach to estimate daily PM2.5 concentrations based on visibility at 5023 land-based sites from 1959 to 2022. First, we build a machine learning model and then analyze the importance of the variables. Second, we evaluate the model's performance and predictive ability. Third, we discuss the errors and limitations of the dataset. Fourth, we compare the estimated PM2.5 concentration with another dataset. Finally, we analyze the long-term trends and spatial patterns of PM2.5 concentration in different regions. We hope the PM2.5 dataset will support studies of the atmospheric environment, human health, and climate change.

2 Data and methods

2.1 Study area

The study area is the Northern Hemisphere. Figure 1 shows the distributions of visibility stations (a) and the PM2.5 monitoring sites (b). Table 1 lists station information, such as the number of stations and the time span in each region. The number of visibility stations and PM2.5 monitoring sites is 5023. Because the establishment of PM2.5 monitoring networks depends on national or regional development, the record length and distribution of PM2.5 observations are uneven. In this study, each PM2.5 site has at least 3 years of observations. These sites are densely distributed in North America, east and south Asia, and Europe and are very sparse in regions such as Africa, South America, and west Asia.

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f01

Figure 1. Study area and distributions of visibility stations (a) and PM2.5 monitoring sites (b). The color of each marker (circle) represents the number of years of visibility observations and PM2.5 concentration observations.

Table 1. Data summary.

* Last access: 30 August 2024.


2.2 PM2.5 data

2.2.1 PM2.5 data in the United States

The hourly PM2.5 concentration data for the United States from 1998 to 2022 are sourced from the Air Quality System (AQS) and are available at https://www.epa.gov/aqs (last access: 30 August 2024). The AQS provides PM2.5 mass monitoring and routine chemical speciation data and contains other ambient air pollution data collected by the U.S. Environmental Protection Agency (EPA) and state, local, and tribal air pollution control agencies from thousands of monitors, comprising the federal reference method (FRM) and the federal equivalent method (FEM). The primary purpose of both methods is to assess compliance with the PM2.5 National Ambient Air Quality Standards (NAAQS). FRMs include in-stack particulate filtration, and FEMs include beta-attenuation monitoring, very sharply cut cyclones, and tapered element oscillating microbalances (TEOMs). The measurement precision is ± (1–2) µg m−3 (hour) (Hall and Gilliam, 2016). The TEOM and beta attenuation are automatic and near-real-time monitoring methods. The TEOM, which is based on gravity, measures the mass of particles collected on filters by monitoring the frequency changes in tapered elements. The beta-attenuation method uses beta-ray attenuation and particle mass to measure the PM2.5 concentration. In this study, we use two PM2.5 measurement methods, FRM/FEM (88101) and non-FRM/FEM (88502). The 88502 monitors are “FRM-like” but are not used for regulatory purposes. Both the 88101 and 88502 monitors are used for reporting daily air quality index values.

2.2.2 PM2.5 data in Canada

The hourly PM2.5 concentration data for Canada from 1995 to 2022 are sourced from the National Air Pollution Surveillance (NAPS) program and are available at https://www.canada.ca (last access: 30 August 2024). The NAPS program is a collaborative effort between Environment and Climate Change Canada and provincial, territorial, and regional governments and is the primary source of environmental air quality data. Since 1984, PM2.5 concentrations have been measured in Canada using a dichotomous sampler. Continuous or real-time particle monitoring began in the NAPS network in 1995 using TEOMs and beta-attenuation monitoring (Demerjian, 2000). The samples are supplemented by U.S. EPA (FRM) samples obtained after 2009 (Dabek-Zlotorzynska et al., 2011).

2.2.3 PM2.5 data in Europe

The hourly PM2.5 concentration data for Europe from 1998 to 2012 are obtained from the AirBase database, which is available at https://european-union.europa.eu (last access: 30 August 2024). The hourly PM2.5 concentration data (E1a) from 2013 to 2022 are obtained from the AirQuality database, which is available at https://www.eea.europa.eu (last access: 30 August 2024). AirBase is maintained by the European Environment Agency (EEA) through its European Topic Center on Air Pollution and Climate Change Mitigation. AirBase contains air quality monitoring data and information submitted by participating countries throughout Europe. After the Air Quality Directive 2008/50/EC was enforced, the PM2.5 concentration data began to be stored in the AirQuality database. The main monitoring methods for PM2.5 concentration include TEOMs and beta attenuation (Green and Fuller, 2006; Chow et al., 2008). The sites are distributed across rural, rural–near city, rural–regional, rural–remote, suburban, and urban areas.

2.2.4 PM2.5 data in China

The hourly PM2.5 concentration data for China from 2014 to 2022 are obtained from the China National Environmental Monitoring Center and are available at https://www.cnemc.cn (last access: 30 August 2024). The continuous monitoring of PM2.5 nationwide began in 2013, and PM2.5 concentration data are available to the public (Su et al., 2022). There were about 2000 air quality observation sites in 2022. PM2.5 concentrations are measured using the TEOM and beta-attenuation method (S. Zhao et al., 2016; Miao and Liu, 2019). According to the China Environmental Protection Standards, instrument maintenance, data transmission, data assurance, and quality control ensure the reliability of PM2.5 concentration measurements. The uncertainty in the PM2.5 concentration is < 5 µg m−3 (Pui et al., 2014).

2.2.5 PM2.5 data in India

The hourly PM2.5 concentration data for India from 2010 to 2022 are obtained from the Central Pollution Control Board (CPCB) and are available at https://www.kaggle.com/datasets/abhisheksjha/time-series-air-quality-data-of-india-2010-2023 (last access: 30 August 2024). The Air (Prevention and Control of Pollution) Act of 1981 is implemented by the CPCB of the Ministry of Environment, Forest and Climate Change (MoEFCC). The National Air Quality Monitoring Programme (NAMP) is a key air quality monitoring program employed by the government of India that is managed by the CPCB in coordination with the State Pollution Control Boards (SPCBs) and UT Pollution Control Committees (PCCs). A 24 h PM2.5 standard of 60 µg m−3 was added in 2009. The methods used by the Indian National Ambient Air Quality Standards (NAAQS) for PM2.5 concentration and related component measurements include the FRM and FEM (Pant et al., 2019). The measurement precision is ± (1–2) µg m−3 (hour).

2.2.6 PM2.5 data in other regions

The hourly PM2.5 concentration data of other regions from 2016 to 2022 are from OpenAQ (https://openaq.org, last access: 30 August 2024), which is a nonprofit organization providing air quality data. These air quality data are collected from environmental protection departments and other agencies around the world and are provided without further processing, so they retain the accuracy of the original measurements. The PM2.5 concentrations are usually measured by the TEOM and beta-attenuation method and have been used for scientific research (Jin et al., 2022; Tan et al., 2022).

2.3 Visibility and meteorological data

The hourly visibility and meteorological data are from the Integrated Surface Database (ISD) (Smith et al., 2011), which is a global database that consists of hourly and synoptic surface observations and is archived at the NOAA's National Centers for Environmental Information (NCEI), available at https://www.ncei.noaa.gov/products/land-based-station/integrated-surface-database (last access: 30 August 2024). The ISD integrates data from more than 100 original data sources, incorporates data from over 35 000 stations around the world, and includes observation data dating back to 1901. Strict quality control algorithms are used to ensure data quality by checking data format, extreme values and limits, consistency between parameters, and continuity between observations. Detailed information about the quality control is available at http://www.ncei.noaa.gov/pub/data/inventories/ish-qc.pdf (last access: 30 August 2024). The best spatial coverage of stations is evident in North America, Europe, Australia, and parts of Asia, and the coverage in the Northern Hemisphere is better than in the Southern Hemisphere.

Visibility and meteorological records are filtered by the geophysical report type code, and the codes FM-12 and FM-15 are selected. FM-12 denotes a Surface Synoptic Observations (SYNOP) report, a coding system developed by the World Meteorological Organization (WMO) for reporting observation data from ground meteorological stations. FM-15 denotes a Meteorological Terminal Aviation Routine Weather Report (METAR), which provides weather information for an airport and its surrounding area. The format and content of the METAR are consistent globally and comply with WMO's international meteorological observation and reporting standards. The frequency of the SYNOP report is generally every 3 or 6 h, and the frequency of the METAR is usually once per hour.

In this study, visibility is the essential predictor of PM2.5 concentration. The reciprocal of visibility is directly proportional to the aerosol extinction coefficient, which is closely related to the PM2.5 concentration (Wang et al., 2009, 2012). Considering that temperature, wind speed, humidity, and precipitation are factors that impact particle dispersion, particle growth, and secondary generation (Zhang et al., 2020), temperature, dew point temperature, wind speed, and precipitation are also selected.
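As an illustration of this proportionality, the relation is commonly written in the Koschmieder form. The constant used here, which corresponds to a 2 % contrast threshold, is an assumption for illustration and is not stated in this section; other thresholds give other constants.

```latex
b_{\mathrm{ext}} = \frac{K}{\mathrm{VIS}}, \qquad K = -\ln(0.02) \approx 3.912
```

Here $b_{\mathrm{ext}}$ is the extinction coefficient (in km$^{-1}$ when VIS is in km), so halving the visibility doubles the inferred extinction.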

2.4 Data preprocessing

When processing the visibility and meteorological variables, we use some screening conditions from previous studies (Husar et al., 2000; Wang et al., 2009; Li et al., 2016; Zhong et al., 2021). We remove records with missing visibility, temperature, dew point temperature, or wind speed, as well as records with hourly precipitation greater than 0.1 mm. Relative humidity is calculated using the Goff–Gratch formula (Goff, 1957). When relative humidity is greater than 90 %, the record is removed to reduce the influence of fog and precipitation. In high-latitude regions, the low-visibility records caused by ice fog and snow are removed when the temperature is less than −29 °C and the wind speed is greater than 16 km h−1. Since PM2.5 exhibits hygroscopic growth, dry visibility is calculated when relative humidity is between 30 % and 90 % (Yang et al., 2021).

$$ \mathrm{VISD} = \frac{\mathrm{VIS}}{0.26 + 0.4285 \times \log(100 - \mathrm{RH})}, \tag{1} $$

where VIS is the visibility, RH is the relative humidity, and VISD is the dry visibility.
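The humidity screening and Eq. (1) can be sketched as a small function. This is a minimal illustration rather than the authors' code: the function name, the use of the base-10 logarithm, and the choice to leave visibility unchanged below 30 % relative humidity are all assumptions.

```python
import math
from typing import Optional


def dry_visibility(vis_km: float, rh_percent: float) -> Optional[float]:
    """Return humidity-corrected ("dry") visibility, or None if the record is removed."""
    if rh_percent > 90.0:
        # Records with RH above 90 % are removed (fog and precipitation influence).
        return None
    if rh_percent < 30.0:
        # Assumption: no hygroscopic correction is applied below 30 % RH.
        return vis_km
    # Eq. (1); the base-10 logarithm is an assumption of this sketch.
    return vis_km / (0.26 + 0.4285 * math.log10(100.0 - rh_percent))
```

Under these assumptions, the correction factor in the denominator falls as humidity rises, so the dry visibility of a humid record is larger than the observed visibility.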

For a single visibility site, there should be at least five non-repetitive visibility values and at least three valid records per day. The upper limit of visibility is set to the 99th percentile of visibility (Li et al., 2016). The harmonic mean is used to calculate the daily VIS and VISD because it can better capture rapid weather changes and enhance daily representativeness. The arithmetic mean is used for other variables.

The maximum hourly PM2.5 concentration is set to 1000 µg m−3. A daily PM2.5 concentration requires at least three hourly records. We select the PM2.5 monitoring sites with at least 3 years of continuous monitoring. The distribution of PM2.5 sites is shown in Fig. 1, and the details are shown in Table 1.

The spatial matching between the PM2.5 site and visibility station adopts the nearest principle, and the upper limit of distance is set to 100 km. Experiments show that the upper limit of distance has little effect on model training and prediction, but when the upper limit is small, the number of site pairs significantly decreases, especially in Asia. Matched visibility stations are not used again. To match more PM2.5 monitoring sites, we construct a “virtual” visibility station, whose variables are established by the average of variables of the two nearest visibility stations.
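The nearest-station matching with a distance cap can be sketched as follows. This is an illustrative greedy version only: the great-circle (haversine) distance, the Earth radius, the processing order of PM2.5 sites, and all names are assumptions, and the "virtual" station step is not included.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_sites(pm_sites: Dict[str, Tuple[float, float]],
                vis_stations: Dict[str, Tuple[float, float]],
                max_km: float = 100.0) -> Dict[str, str]:
    """Pair each PM2.5 site with its nearest unused visibility station within max_km."""
    pairs: Dict[str, str] = {}
    used = set()
    # Assumption: sites are processed in dictionary order.
    for pm_id, (plat, plon) in pm_sites.items():
        best_id, best_d = None, max_km
        for vis_id, (vlat, vlon) in vis_stations.items():
            if vis_id in used:
                continue  # matched visibility stations are not used again
            d = haversine_km(plat, plon, vlat, vlon)
            if d <= best_d:
                best_id, best_d = vis_id, d
        if best_id is not None:
            pairs[pm_id] = best_id
            used.add(best_id)
    return pairs
```

A greedy pass like this depends on the order in which sites are visited; a globally optimal assignment would need a different algorithm, and the text does not say which was used.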

We merge daily PM2.5 concentration with visibility and other meteorological variables. We tested two matching methods: (1) merge at the hourly scale first and then calculate the daily mean, and (2) calculate the daily mean first and then match them. The choice of method has no impact on the training of the model, but there are differences in the predicted results. Since SYNOP visibility is not continuously observed hourly, we select the second method to merge PM2.5 concentration and visibility data on the daily scale to improve the daily representativeness of estimated PM2.5 concentration.

2.5 PM2.5 data for comparison

The Long-term Gap-free High-resolution air Pollutants (LGHAP) dataset provides daily PM2.5 concentrations from 2000 to 2021 over global land, with a 1 km grid resolution, and is available at https://zenodo.org/communities/ecnu_lghap (last access: 30 August 2024). The PM2.5 concentration is estimated using aerosol optical depth and other factors such as geographic location, land cover type, climate zone, and population density, based on a deep learning approach, termed the scene-aware ensemble learning graph attention network. The correlation coefficient with ground-based measurements is 0.95, and the RMSE is 5.7 µg m−3 (Bai et al., 2024). This dataset provides global PM2.5 concentration with a high spatiotemporal resolution.

For most regions in the Northern Hemisphere, except for North America and Europe, the duration of continuous monitoring PM2.5 concentration data is relatively short, making it difficult to evaluate historical PM2.5 concentration. For example, the PM2.5 monitoring network in China was implemented from the end of 2012, resulting in the inability to verify the PM2.5 concentrations before 2012. Therefore, we compare our data with the LGHAP PM2.5 concentration to evaluate the predictive ability of the model and the consistency of our data on the temporal scale.

2.6 Decision tree regression

We employ decision tree regression (Teixeira, 2004) to estimate daily PM2.5 concentrations. The key to decision tree regression is to find the optimal split variable and optimal split point. The optimal split point of the predictor is determined by the minimum mean square error, which determines the optimal tree structure. Decision tree regression is a commonly used nonlinear machine learning method that partitions the feature space based on the mapping between feature attributes and response values, with each leaf node representing a specific output for each feature space region. Its ability to handle complex relationships with relatively few model parameters is advantageous, helping to limit the risk of overfitting and allowing both continuous and categorical predictor variables to be used.
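The split search at a single tree node can be illustrated with a short sketch. It covers one predictor and one split only, it is not the implementation used in this study, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, mse) of the binary split on x that minimizes the pooled MSE of y."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_thr, best_cost = float("nan"), float("inf")
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates identical predictor values
        left = [p[1] for p in pairs[:k]]
        right = [p[1] for p in pairs[k:]]
        # Each side predicts its own mean; pooled squared error divided by n.
        cost = (_sse(left) + _sse(right)) / n
        if cost < best_cost:
            best_thr = (pairs[k - 1][0] + pairs[k][0]) / 2.0
            best_cost = cost
    return best_thr, best_cost
```

A full regression tree repeats this search over all predictors at every node and recurses on the two resulting subsets until a stopping rule is met.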

The sample data include the predictor and response. The predictor is composed of nine variables: the reciprocal of dry visibility (Vis_Dry_In), the reciprocal of visibility (Vis_In), temperature (Temp), dew point temperature (Td), temperature–dew point difference (Temp-Td), relative humidity (RH), wind speed (WS), numerical time (DateTime), and daily record number (DailyObsNum). Both visibility and meteorological variables are daily means. The response variable is the daily monitored PM2.5 concentration.

For each site, we sort the sample data by time, with the first 80 % being the training set and the last 20 % being the test set. Due to the inconsistent sample length among different sites, this approach is appropriate for sites with small sample sizes (such as only 3-year observations). We use a 10-fold cross-validation method (Browne, 2000) to train the model. The test set is used to evaluate the predictive ability of the model.
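The per-site chronological split can be sketched as below. The function name, the `key` argument, and the rounding of the 80 % boundary (truncation toward zero) are assumptions; the 10-fold cross-validation step is not shown.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T],
                        train_fraction: float = 0.8,
                        key: Optional[Callable[[T], object]] = None
                        ) -> Tuple[List[T], List[T]]:
    """Sort one site's samples by time and split them into (training set, test set)."""
    ordered = sorted(samples, key=key)
    # Assumption: the boundary index is truncated toward zero.
    n_train = int(len(ordered) * train_fraction)
    return ordered[:n_train], ordered[n_train:]
```

With this ordering, every test sample is later in time than every training sample from the same site, so the test score reflects prediction beyond the training period rather than interpolation within it.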

2.7 Evaluation metrics

2.7.1 Statistical metrics

We use the root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (ρ) as evaluation metrics to evaluate the model's performance and predictive ability. The formulas are given as follows:

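$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \tag{2} $$

$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \tag{3} $$

$$ \rho = \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 \sum_{i=1}^{n}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}}, \tag{4} $$

where $y_i$ and $\bar{y}$ are the predicted value and the average of the predicted values; $\hat{y}_i$ and $\bar{\hat{y}}$ are the target and the average of the target; and $i = 1, 2, \ldots, n$, where n is the length of the sample.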
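The three metrics can be computed directly from paired values, as in this minimal sketch. The function and argument names are illustrative; all three metrics are symmetric in their two arguments.

```python
import math
from typing import Sequence


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean square error between paired values."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between paired values."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired values."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)
```

RMSE penalizes large errors more strongly than MAE, which is why the RMSE values reported for this kind of data are typically larger than the corresponding MAE values.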

2.7.2 Partial dependence

The importance of predictor variables is assessed via partial dependence. Partial dependence represents the relationship between an individual predictor variable and the predicted response (Friedman, 2001). By marginalizing over the other variables, the expected response as a function of the selected predictor is calculated. The partial dependence of the predicted response is calculated in this way for each predictor variable. The calculation process of the partial dependence method is described next.

The dataset of the predictor is X, $X = [X_1, X_2, \ldots, X_n]$, where n represents the number of predictive factors. $X^s$ is a single variable in X, and its complement $X^c$ comprises all other variables in X. The predicted response f(x) depends on all variables in X, and it is expressed as follows:

$$ f(x) = f(X^s, X^c). \tag{5} $$

The partial dependence of the predicted response on $X^s$ is expressed as follows:

$$ f^s(X^s) = \int f(X^s, X^c)\, p_C(X^c)\, \mathrm{d}X^c, \tag{6} $$

where $p_C(X^c)$ is the marginal probability of $X^c$; that is, $p_C(X^c) \approx \int p(X^s, X^c)\, \mathrm{d}X^s$, with $p(X^s, X^c)$ denoting the joint probability density. Assuming that each observation is equally likely and that the dependence between $X^s$ and $X^c$ and their interactions in the response are not strong, the partial dependence is estimated as

$$ f^s(X^s) \approx \frac{1}{N}\sum_{i=1}^{N} f(X^s, X_i^c), \tag{7} $$

where N is the number of observations, and $X_i^c$ denotes the values of $X^c$ in the ith observation.
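The sample-average estimate of partial dependence can be sketched in a few lines. Here `model` stands for any fitted predictor that maps one row of predictor values to a response; the function name, the row layout, and the grid of evaluation points are assumptions for illustration.

```python
from typing import Callable, List, Sequence


def partial_dependence(model: Callable[[Sequence[float]], float],
                       rows: Sequence[Sequence[float]],
                       feature_index: int,
                       grid: Sequence[float]) -> List[float]:
    """Average model response with one predictor fixed at each grid value."""
    result = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)             # keep the observed complement values
            modified[feature_index] = value  # fix the selected predictor
            total += model(modified)
        result.append(total / len(rows))
    return result
```

For an additive model such as f = 2·x0 + x1, the curve for x0 is a straight line with slope 2 shifted by the sample mean of x1, which gives a quick way to check an implementation.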

2.7.3 Generalized additive mixed model

The generalized additive mixed model (GAMM) originates from two independent yet complementary statistical methods: the generalized additive model (GAM) and mixed-effects models. GAM was introduced by Trevor Hastie and Robert Tibshirani in the 1980s (Hastie and Tibshirani, 1987). GAM employs smooth functions (such as splines) to replace linear terms in traditional regression, capturing nonlinear relationships between response and explanatory variables. The primary aim of GAM is to enhance model flexibility, allowing the data to determine the form of the nonlinear relationships rather than pre-specifying them. The mixed-effects model includes both fixed and random effects, enabling the analysis of hierarchical and correlated data (Verbeke and Lesaffre, 1996). Fixed effects apply to the entire sample, whereas random effects account for variations within individuals or groups, explaining data correlation and variability. GAMM represents the evolution of statistical models from linear to nonlinear, from simple to complex, and from single effects to mixed effects. GAMM has been widely applied in various fields, such as ecology, climate, and air pollution, becoming an essential tool for studying complex nonlinear relationships and hierarchical data (Park et al., 2013; Polansky and Robbins, 2013; Chang et al., 2017; Ravindra et al., 2019).

The relationship between PM2.5 concentrations and time (e.g., months, seasons) is typically nonlinear and exhibits seasonal variation. GAMM uses smooth functions (such as splines) to capture the nonlinear variations and model the periodic features with cyclical smooth functions. Interannual variations in PM2.5 concentrations can also be captured using smooth functions. Due to the inherent autocorrelation in time series, GAMM effectively handles the autocorrelation by incorporating time-related smooth functions or random effects, thereby enhancing the model accuracy. PM2.5 concentrations from neighboring locations often exhibit spatial correlation. GAMM can address this spatial correlation by introducing spatially correlated smooth functions or random effects. Therefore, it is also suitable for spatial variations, especially when the spatial distribution of site observations is uneven.

Based on the GAMM, the PM2.5 concentration y(i,t) at site i and time t can be expressed as

(8) $y(i,t) = x\beta + f(\cdot) + b(i,t) + \varepsilon(i,t)$.

The following is an explanation of the expression and parameter settings.

Linear terms. xβ includes the terms of site elevation and the overall mean PM2.5 concentration, where x is the vector of explanatory variables, and β is a coefficient vector.

Smooth terms. f(⋅) can be decomposed into three individual smooth terms, i.e., the seasonal smooth term, interannual smooth term, and spatial smooth term, as shown in Eq. (9).

(9) $f(\cdot) = f(\mathrm{month}) + f(\mathrm{year}) + f(\mathrm{spatial})$

Each smooth term is a linear combination of spline basis functions. The seasonal smooth term is a function of the month; it uses a penalized cyclic cubic regression spline, which reflects the assumed periodicity (Wood et al., 2016), with 12 knots. The interannual smooth term is a function of the year; it uses a penalized cubic regression spline (Wood et al., 2016) with 64 knots. The spatial smooth term is a function of longitude and latitude; it uses a Gaussian-process-penalized regression spline (Kammann and Wand, 2003) with 80 knots. In this study, the three terms describe the regional long-term annual cycle, interannual trend, and spatial distribution of PM2.5 concentration, respectively.

The term of station-specific effects, b(i,t), is a random effect term to describe the differences between observation sites, based on the assumption that observations are independent.

The residual noise term ε(i,t) is a first-order autoregressive term.
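For reference, a first-order autoregressive residual is commonly written in the form below. The symbols φ (autocorrelation coefficient), η (white noise), and σ² (its variance) are notation assumed here for illustration, as is the application of the recursion within each site; none of them appears in the formulation above.

$$\varepsilon(i,t) = \phi\,\varepsilon(i,t-1) + \eta(i,t), \qquad \eta(i,t) \sim \mathcal{N}(0,\sigma^{2}), \quad |\phi| < 1.$$

Under this form, the correlation between two residuals at the same site separated by k time steps decays as φ^k.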

More explanations about GAMM are detailed in the R package mgcv. Some studies also provide an introduction and selection of parameters (Polansky and Robbins, 2013; Chang et al., 2017; Ravindra et al., 2019).

3 Results and discussion

3.1 Evaluation of variable importance

We evaluate the contribution of each variable to the response by partial dependence. The variable with the highest partial dependence value is the most important variable in the model. Figure 2a shows the proportion of the most important variables for all sites, and Fig. 2b shows the ranking of the importance of all variables. The reciprocal of dry visibility is the most important variable at 65.8 % of sites, and the reciprocal of visibility is the second-most-important variable at 14.9 % of sites. The contribution of meteorological variables ranges from 2.1 % to 6.6 %. The time variable contributes 1.7 %. The lowest contribution, only 0.9 %, comes from the daily number of visibility records, because this variable only describes the daily representativeness of visibility. This low contribution also indicates that daily visibility is highly representative of the day (under the conditions of at least 3-hourly records).

The PM2.5 concentration level varies spatially, which is related to regional geographical environment, climate, and air quality laws and regulations. Therefore, we analyze the importance of variables in different regions, as shown in Fig. 2c–h. The two most important variables are still the reciprocal of dry visibility and the reciprocal of visibility, with a proportion of 73.1 % in the United States, 77.5 % in Canada, 80.8 % in Europe, 98.8 % in China, and 60.2 % in India. This indicates that PM2.5 concentration is most strongly correlated with visibility in China. The contribution of meteorological variables is significantly higher in the United States and India than in other regions. This indicates that meteorological conditions contribute significantly to PM2.5 concentration in these regions, which may be related to the formation mechanism and transport of particulate matter.

The above results indicate a strong correlation between the PM2.5 concentration and visibility, as visibility can be considered an indicator of air quality in the absence of fog or precipitation. Meteorological factors play secondary roles by influencing the formation, dispersion, and deposition of PM2.5 (Gui et al., 2020; Zhong et al., 2022). Although the number of daily records and the time have the smallest impacts on the PM2.5 concentration in the model, they have significant impacts on the cyclical changes and daily representativeness of PM2.5 concentration (Wang et al., 2012; Zhang et al., 2020).

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f02

Figure 2. The most important variable (a) and the ranking (b) at all sites. The most important variable in each region (c–h). The stacked bar shows the importance rankings of the variables (“rank = 1” represents the most important variable). The bar shows the proportion of the most important variable. The variables are the reciprocal of dry visibility (Vis_Dry_In), reciprocal of visibility (Vis_In), temperature (Temp), dew point temperature (Td), temperature–dew point difference (Temp-Td), relative humidity (RH), wind speed (WS), numerical time (DateTime), and daily number of visibility records (DailyObsNum).

3.2 Evaluation of model performance

We analyze the linear regression relationship between all estimated and corresponding response values to evaluate the model's performance. Figure 3 is the density scatter plot of the monitored PM2.5 concentration (response values) and the estimated PM2.5 concentration (estimated values). There is a total of 8 031 473 data pairs for all the sites. The linear regression slope (95 % confidence interval) is 0.955 [0.955, 0.955], the R2 is 0.95, the RMSE is 7.2 µg m−3, and the MAE is 3.2 µg m−3.

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f03

Figure 3. Density scatter plot (a) between estimated PM2.5 concentration and monitored PM2.5 concentration. The dashed black line is the linear regression line. N is the number of data pairs, and slope is the linear regression coefficient within a 95 % confidence interval (CI). R2 is the coefficient of determination, RMSE is the root mean square error, and MAE is the mean absolute error.

Figure 4a–c show the spatial distributions and frequency distributions of the training RMSE, MAE, and ρ. Table 2 lists the model's performance metrics in the United States, Canada, Europe, China, and India. For all sites, the average RMSE is 6.92 µg m−3, with a median of 4.76 µg m−3. The RMSE of 80 % of the sites is less than 10.01 µg m−3. The RRMSE (the ratio of RMSE to the mean of PM2.5 concentration) is 28.7 %. The average MAE is 3.77 µg m−3, with a median of 2.72 µg m−3. The MAE is less than 5.66 µg m−3 for 80 % of the sites. The RMAE (the ratio of MAE to the mean of the PM2.5 concentration) is 15.4 %. The average ρ is 0.91, and the median is 0.92. The ρ of 80 % of the sites is greater than 0.87. Previous studies have shown that for PM2.5 concentration retrieved from daily visibility or satellite aerosol optical depth, the R2 range of the model is from 0.42 to 0.89, and the RMSE range is from 9.59 to 32.09 µg m−3 (Shen et al., 2016; Liu et al., 2017; Wei et al., 2019b; Gui et al., 2020; Li et al., 2021; Zhong et al., 2021). This finding indicates that our model performs well at the daily scale.

On the regional scale, the RMSE values for the United States, Canada, Europe, China, and India are 3.10, 2.78, 4.92, 9.65, and 17.46 µg m−3, respectively, and the RRMSE values are 34.9 %, 40.4 %, 29.8 %, 23.1 %, and 28.8 %, respectively. The MAEs for the United States, Canada, Europe, China, and India are 1.61, 1.35, 2.54, 5.47, and 9.13 µg m−3, respectively. The RMAEs are 17.9 %, 19.5 %, 16.3 %, 13.1 %, and 14.4 %, respectively. The ρ values for the United States, Canada, Europe, China, and India are 0.87, 0.88, 0.91, 0.94, and 0.92, respectively. The correlation coefficients are higher in China and India and lower in the United States and Canada.

The largest RMSE and MAE are in India, and the smallest are in Canada. The RRMSE and RMAE are larger in the United States, Canada, and Europe than in China and India and other regions.
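The metrics reported above can be computed as in the following Python sketch. It is an illustration, not the evaluation code of this study. The function name `evaluate` and the four sample pairs are hypothetical. The sketch also assumes that the regression is of estimated on monitored values and that R2 is taken as the squared Pearson correlation, which equals the coefficient of determination of a simple linear regression.

```python
import math
from typing import Dict, Sequence


def evaluate(monitored: Sequence[float], estimated: Sequence[float]) -> Dict[str, float]:
    """Return slope, rho, R2, RMSE, MAE, RRMSE (%) and RMAE (%)."""
    n = len(monitored)
    mean_obs = sum(monitored) / n
    mean_est = sum(estimated) / n
    errors = [e - o for o, e in zip(monitored, estimated)]

    rmse = math.sqrt(sum(err * err for err in errors) / n)
    mae = sum(abs(err) for err in errors) / n

    # Sums of squares and cross-products for the regression and correlation
    sxx = sum((o - mean_obs) ** 2 for o in monitored)
    syy = sum((e - mean_est) ** 2 for e in estimated)
    sxy = sum((o - mean_obs) * (e - mean_est) for o, e in zip(monitored, estimated))

    slope = sxy / sxx                 # assumed: estimated regressed on monitored
    rho = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient

    return {
        "slope": slope,
        "rho": rho,
        "R2": rho ** 2,
        "RMSE": rmse,
        "MAE": mae,
        "RRMSE": 100.0 * rmse / mean_obs,  # RMSE relative to the monitored mean
        "RMAE": 100.0 * mae / mean_obs,    # MAE relative to the monitored mean
    }


# Hypothetical sample values in ug m-3, for illustration only
metrics = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(metrics)
```

Because RRMSE and RMAE divide by the mean monitored concentration, a region with low concentrations can show small absolute errors but large relative errors.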

Table 2. The metrics for all sites and sites in the United States, Canada, Europe, China, and India. RRMSE is the ratio of the RMSE to the mean of PM2.5 concentration (in %). RMAE is the ratio of the MAE to the mean of PM2.5 concentration (in %).

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f04

Figure 4. Distribution of the statistical metrics for the training (a, b, c) and test (d, e, f) data. The bars show the frequency of sites. RMSE is the root mean square error, MAE is the mean absolute error, and ρ is the correlation coefficient.

3.3 Evaluation of model's predictive ability

A total of 1 911 183 pairs of test data are employed to evaluate the model's predictive ability. Figure 5 is the density scatter plot between the predicted PM2.5 concentration and the test PM2.5 concentration. The linear regression slope (95 % CI) is 0.864 [0.863, 0.865], R2 is 0.79, RMSE is 14.8 µg m−3, and MAE is 7.6 µg m−3. Previous studies have shown that the R2 range of the model's predictive results at the daily scale is 0.31–0.84, and the RMSE range is 13.8–29.0 µg m−3 (Gui et al., 2020; Zhong et al., 2021). The test R2 and RMSE lie toward the better end of these ranges, indicating good predictive capability.

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f05

Figure 5. Density scatter plot (a) between the predicted PM2.5 concentration and monitored PM2.5 concentration of the test results. The dashed black line is the linear regression line. N is the number of data pairs, and slope is the linear regression coefficient within a 95 % confidence interval (CI). R2 is the coefficient of determination, RMSE is the root mean square error, and MAE is the mean absolute error.

We analyze the test results for Canada, the United States, Europe, China, and India to assess the predictive ability of the model in different regions. Figure 4d–f show the spatial distributions of the test RMSE, MAE, and ρ and their frequency. Table 3 lists the test results of the metrics. For all sites, the average RMSE is 11.50 µg m−3. The RRMSE is 46.0 %. The average MAE is 7.72 µg m−3. The RMAE is 30.7 %. The ρ is 0.81. For the United States, the RMSE, MAE, and ρ are 5.06, 3.25 µg m−3, and 0.72, respectively. For Canada, the RMSE, MAE, and ρ are 4.73, 2.88 µg m−3, and 0.77, respectively. The results in the United States and Canada are better in the west than in the east. The RMSE, MAE, and ρ for Europe are 7.79, 5.10 µg m−3, and 0.80, respectively. For China, the RMSE, MAE, and ρ are 16.83, 11.50 µg m−3, and 0.85, respectively. For India, the RMSE, MAE, and ρ are 27.05, 17.89 µg m−3, and 0.85, respectively. The results show that ρ is higher in developing regions (China and India) than in developed regions (the United States, Canada, and Europe), which suggests that, in terms of correlation, the model predicts better in severely polluted regions, although the absolute errors (RMSE and MAE) are also larger there.

Table 3. The test results of the model's performance metrics for all sites and sites in the United States, Canada, Europe, China, and India. RRMSE is the ratio of the RMSE to the mean of PM2.5 concentration (in %). RMAE is the ratio of the MAE to the mean of PM2.5 concentration (in %).

3.4 Uncertainties and limitations

3.4.1 Uncertainty in the pollution level

Figure 6 shows the uncertainty in the predicted PM2.5 concentration with respect to the pollution level of the monitored PM2.5 concentration. For all sites, the uncertainty in the bias increases as the pollution level increases. The mean and median of the bias shift from positive to negative with increasing pollution levels. The mean bias is positive (<0.8 µg m−3) for PM2.5 concentrations below 45 µg m−3, which account for 83.6 % of the data, and negative for concentrations above 45 µg m−3 (16.4 % of the data). The median of the bias is positive (<0.4 µg m−3) for concentrations below 10 µg m−3 (36.8 % of the data) and negative for concentrations above 10 µg m−3 (63.2 % of the data). This indicates that the model overestimates at low pollution levels and underestimates at high pollution levels.

The bias for each region also increases with pollution level. For the United States, the mean bias is positive and less than 0.8 µg m−3 for 69.4 % of the data, for which the PM2.5 concentration is less than 10 µg m−3; when the PM2.5 concentration is greater than 10 µg m−3, the mean bias is negative. For Canada, the mean bias is positive and less than 0.7 µg m−3 for 74.1 % of the data and becomes negative when the PM2.5 concentration is greater than 8 µg m−3. For Europe, the mean bias is positive and less than 0.9 µg m−3 for 67.1 % of the data and becomes negative when the PM2.5 concentration is greater than 15 µg m−3. For China, the bias is positive and less than 2.7 µg m−3 for 67.7 % of the data, and the mean bias becomes negative when the PM2.5 concentration is greater than 45 µg m−3. For India, the bias is positive and less than 4.2 µg m−3 for 80.1 % of the data, and the mean bias becomes negative when the PM2.5 concentration is greater than 100 µg m−3; the median of the bias is negative when the PM2.5 concentration is greater than 60 µg m−3, which applies to 40.3 % of the data. The pattern is similar across regions: the uncertainty increases as the pollution level increases.
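The grouping behind this kind of analysis can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `bias_by_level`, the bin edges, and the four sample pairs are hypothetical and are not the bins or data used in Fig. 6. The bias is defined, as in the text, as predicted minus monitored concentration.

```python
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple


def bias_by_level(
    monitored: Sequence[float],
    predicted: Sequence[float],
    edges: Sequence[float],
) -> Dict[Tuple[float, float], Tuple[float, float]]:
    """Return (mean bias, median bias) for each monitored-concentration bin."""
    groups: Dict[Tuple[float, float], List[float]] = {}
    for obs, pred in zip(monitored, predicted):
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= obs < hi:  # bins are half-open: [lo, hi)
                groups.setdefault((lo, hi), []).append(pred - obs)
                break
    # Bins without any data are simply absent from the result
    return {bin_: (mean(b), median(b)) for bin_, b in groups.items()}


# Hypothetical values in ug m-3, for illustration only
monitored = [5.0, 8.0, 50.0, 80.0]
predicted = [6.0, 9.0, 45.0, 70.0]
result = bias_by_level(monitored, predicted, edges=[0.0, 10.0, 45.0, 1000.0])
print(result)
```

In this toy input the low-concentration bin has a positive bias and the high-concentration bin a negative one, mirroring the overestimation at low levels and underestimation at high levels described above.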

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f06

Figure 6. Box plots of the pollution level and bias (predicted PM2.5 concentration minus monitored PM2.5 concentration) for all sites (a) and sites in the United States (b), Canada (c), Europe (d), China (e), and India (f). The box's upper and lower limits represent ±1 standard deviation, the whiskers represent 2 times the standard deviation, the red circle represents the median, and the short line represents the mean bias. The frequency (%) on the right-hand y axis represents the percentage of data with different pollution levels (dashed line).

3.4.2 Uncertainty in the station elevation

Given the spatial variability in PM2.5 concentration, we analyze the mean bias at different visibility station elevations. Figure 7 shows the relationships between the elevations of the visibility stations and the bias. The bias varies with elevation for all stations. The mean bias of all sites ranges from −0.04 to 0.02 µg m−3. A total of 90.1 % of the stations have positive biases. The median of the bias is positive for 99.5 % of the stations; the exception is the elevation of 4 km. The elevations of 86.5 % of the stations are less than 1 km, with a positive median of the bias. High uncertainties in bias occur at elevations of 0.05, 0.2, and 0.3 km. Negative biases are observed at elevations of 0.4, 0.9–1, and 4 km. This finding indicates a nonsignificant overestimation of the predicted PM2.5 concentration across the range of elevations.

The bias patterns vary across regions. For the United States, a total of 88.8 % of the stations have negative biases. The median of the bias is negative with a percentage of 63.4 %. High uncertainties in bias occur at elevations of 0.05, 2, and 0.3 km. For Canada, 52.3 % of the stations have positive biases. The median of the bias is negative with a percentage of 33.8 %. High uncertainties in bias occur at elevations of 0.05 and 1 km. For Europe, 58.9 % of the stations have positive biases. The median of the bias is negative with a percentage of 40.2 %. High uncertainties in bias occur at elevations of 0.05 and 0.9 km. For China, 76.7 % of the stations have negative biases. The median of the bias is negative with a percentage of 54.1 %. High uncertainties in bias occur at elevations of 0.05, 0.5, and 3 km. For India, 68.1 % of the stations have positive biases. The median of the bias is negative with a percentage of 63.8 %. The elevation of most stations with a high uncertainty is at 0.05 km. High uncertainties in bias occur at elevations of 0.1 and 3 km. More stations with negative bias are in the United States and China. More stations with positive bias are in Canada, Europe, and India.

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f07

Figure 7. Box plots of the bias (predicted PM2.5 concentration minus monitored PM2.5 concentration) and the elevation of the visibility station for all sites (a) and sites in the United States (b), Canada (c), Europe (d), China (e), and India (f). The box's upper and lower limits represent ±1 standard deviation, the whiskers represent 2 times the standard deviation, the red circle represents the median, and the short line represents the mean bias. The station number (%) on the right-hand y axis represents the percentage of the station number at different elevations (dashed line).

3.4.3 Uncertainty in the station distance

As the visibility stations and PM2.5 sites are not collocated, we analyze the mean bias of PM2.5 concentration at different distances, as shown in Fig. 8. For all sites, 86.1 % of the stations have negative biases. The median of the bias is negative with a percentage of 70.8 %. More stations have a negative bias caused by the distance. The uncertainty shows no significant relationship with distance. Uncertainties are low at distances of 1 and 20–40 km and high at distances of 5 and 60 km.

For the United States, 63.1 % of the stations have negative biases. The median of the bias is negative with a percentage of 69.2 %. The distance with the lowest uncertainty is at 1 km. The distances with high uncertainties are at 5 and 60 km. For Canada, 60.0 % of the stations have positive biases. The median of the bias is positive with a percentage of 80.0 %. The uncertainty increases with distance. For Europe, 72.7 % of the stations have negative biases. The median of the bias is positive with a percentage of 67.1 %. When the distance is less than 10 km, the uncertainty increases with the distance. The distances with low uncertainties are at 1 and 30–40 km. The distances with high uncertainties are at 10 and 75 km. For China, 64.3 % of the stations have negative biases. The median of the bias is negative with a percentage of 72.7 %. The distance with a low uncertainty is at 30 km. The distance with a high uncertainty is at 60 km. For India, 62.3 % of the stations have negative biases. The median of the bias is positive with a percentage of 59.1 %. The distance with the lowest uncertainty is at 30 km. The distance with the highest uncertainty is at 20 km.

More visibility stations have negative biases, except for the stations in Canada. For the stations in the United States, Canada, and Europe, the lowest uncertainty is at 1 km. For the stations in China and India, the uncertainty has no significant relationship with distance, though the distance has caused a negative bias.

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f08

Figure 8. Box plots of the mean bias (predicted PM2.5 concentration minus monitored PM2.5 concentration) and the distance between the visibility station and the PM2.5 site for all sites (a) and sites in the United States (b), Canada (c), Europe (d), China (e), and India (f). The box's upper and lower limits represent ±1 standard deviation, the whiskers represent 2 times the standard deviation, the red circle represents the median, and the short line represents the mean bias. The station number (%) on the right-hand y axis represents the percentage of the station number at different distances (dashed line).

3.4.4 Discussion on the uncertainties and limitations

There are some uncertainties and limitations in this study. The upper limit of visibility and PM2.5 concentration can cause some uncertainties in model training. The visibility stations and PM2.5 monitoring sites are up to 100 km apart; because aerosols vary spatially, this separation may increase the uncertainty in the estimated PM2.5 concentration. Because of the nonuniform vertical distribution of aerosols, the different elevations of the visibility stations and the PM2.5 monitoring sites further increase the uncertainty in estimating PM2.5 concentration. In addition, the spatial coverage of visibility stations, especially in China and India, is still limited, which may increase the uncertainty in the representativeness of regional PM2.5 concentration and pollution levels. With growing public concern about air pollution and the implementation of air pollution control measures, the types of major atmospheric pollutants may have changed on the regional scale, the composition of particulate matter has evolved, and its scattering and absorption characteristics may have changed, so the relationship between visibility and PM2.5 concentration may also have changed. These changes may lead to uncertainties in estimating historical PM2.5 concentration. It is challenging to validate the data against ground observations and satellite-based estimates prior to 2000. Despite these limitations and challenges, we establish a long-term PM2.5 concentration dataset based on visibility from 1959 to 2022, which has been carefully validated and evaluated, providing insights into the long-term spatiotemporal characteristics of PM2.5 concentration in the Northern Hemisphere.

4 Comparisons with other PM2.5 concentration datasets

We compare the daily and monthly estimated PM2.5 concentration with the LGHAP PM2.5 concentration from 2000 to 2021 to further demonstrate the reliability of the estimated PM2.5 concentration. When comparing on the regional scale, we split the time range into 2000–2010 and 2011–2021 to further validate the accuracy and consistency of the estimated PM2.5 concentrations, because in some regions, such as India and China, there are almost no continuous PM2.5 monitoring data before 2010.

4.1 Comparisons on the daily scale

We spatiotemporally match the LGHAP PM2.5 concentration with the estimated PM2.5 concentration. Figure 9 shows the density scatter plot between the estimated PM2.5 concentration and LGHAP PM2.5 concentration. There is a total of 96 188 682 pairs from 2000 to 2021, 46 846 389 pairs from 2000 to 2010, and 49 342 302 pairs from 2011 to 2021, with slopes of 0.817, 0.758, and 0.867, respectively. The intercepts are 6.928, 8.933, and 5.377 µg m−3, respectively. The slope is lower before 2010, which may be related to the upper limit of the LGHAP PM2.5 concentration, in which the number of values above 300 µg m−3 is markedly smaller.

We further compare the PM2.5 concentrations of the annual calendar cycles on the regional scale in Fig. 10. The PM2.5 concentration of each day is the mean of the PM2.5 concentrations at all sites in the region. The correlation coefficients of the PM2.5 concentrations are greater than 0.89 from 2011 to 2021 and greater than 0.92 from 2000 to 2010. The correlation is greater in Europe, China, and India than in the United States and Canada. There is no significant difference in the variation of annual calendar cycles between two periods on the regional scale. In the United States, PM2.5 concentration between 2000 and 2010 is more similar than the concentration between 2011 and 2021, and the bias decreases. In Canada, the correlation coefficient increases, although the bias increases. In Europe, the correlation coefficient and bias increase. There are similar changes in China and India. The bias increases on days 1 to 60 and 300 to 366, but the correlation remains significant. The difference of PM2.5 concentration during the two periods is mainly reflected in the increasing bias in Canada and Europe, which is a non-seasonal bias, and the increasing bias in winter in China and India, which is a seasonal bias. Overall, PM2.5 concentrations show a good consistency before and after 2010 on the daily scale.

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f09

Figure 9. Density scatter plot between the estimated PM2.5 concentration (this study) and LGHAP PM2.5 concentration on the daily scale from 2000 to 2021 (a), from 2000 to 2010 (b), and from 2011 to 2021 (c). The dashed black line is the linear regression line. N is the number of data pairs, and slope is the linear regression coefficient. Intercept represents the y intercept.

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f10

Figure 10. Comparison of the annual calendar cycle of PM2.5 concentration on the regional scale from 2011 to 2021 (left) and from 2000 to 2010 (right) between the estimated PM2.5 concentration (this study) and LGHAP PM2.5 concentration on the daily scale. ρ is the correlation coefficient.

4.2 Comparisons on the monthly scale

Figure 11 shows the density scatter plot between the estimated PM2.5 concentration and LGHAP PM2.5 concentration on the monthly scale. The monthly PM2.5 concentration is calculated from the matched daily concentrations. There is a total of 3 296 739 pairs from 2000 to 2021, 1 582 161 pairs from 2000 to 2010, and 1 714 578 pairs from 2011 to 2021, with slopes of 0.857, 0.821, and 0.879, respectively. The intercepts are 6.774, 8.716, and 5.272 µg m−3, respectively. Compared with the daily scale, the monthly slope increases markedly for 2000–2010 (from 0.758 to 0.821) and slightly for 2011–2021 (from 0.867 to 0.879).

We also compare the PM2.5 concentrations of the annual cycles on the regional scale in Fig. 12. The PM2.5 concentration of each month is the mean of the PM2.5 concentrations at all sites in the region. The correlation coefficients of the PM2.5 concentrations are greater than 0.92 from 2011 to 2021 and greater than 0.87 from 2000 to 2010. In the United States, the PM2.5 concentrations before 2010 are closer compared to those after 2010, except in April and August, and the biases in other months have significantly decreased. In Europe and Canada, the biases have increased. In China, the result is similar to the result on the daily scale. In India, the two datasets are almost consistent, with correlation coefficients of 0.99 and 0.96. The two datasets have a very high similarity in annual cycles, indicating that the estimated PM2.5 concentration in this study is accurate and consistent before and after 2010.

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f11

Figure 11. Density scatter plot between the estimated PM2.5 concentration (this study) and LGHAP PM2.5 concentration on the monthly scale from 2000 to 2021 (a), from 2000 to 2010 (b), and from 2011 to 2021 (c). The dashed black line is the linear regression line. N is the number of data pairs, and slope is the linear regression coefficient. Intercept represents the y intercept.

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f12

Figure 12. Comparison of the annual cycle of monthly PM2.5 concentration on the regional scale from 2011 to 2021 (left) and from 2000 to 2010 (right) between the estimated PM2.5 concentration (this study) and LGHAP PM2.5 concentration on the daily scale. ρ is the correlation coefficient.

4.3 Discussion on the differences of PM2.5 concentration estimated using visibility and aerosol optical depth

Both visibility and aerosol optical depth are excellent alternatives for estimating PM2.5 concentration, each with its own advantages. However, they differ in principle, which may explain the differences between the two datasets in the comparison.

Fine particulate matter near the ground surface affects atmospheric visibility through scattering. Studies have shown that visibility has a negative correlation with PM2.5 concentration and that the reciprocal of visibility has a positive correlation with the extinction coefficient and therefore with the particulate matter concentration (Wang et al., 2012; Zhang et al., 2017, 2020). Therefore, visibility is often used as a proxy for particulate matter pollution (Huang et al., 2009; Singh et al., 2020), and it is the basis for estimating PM2.5 concentration. In addition, studies have shown that meteorological observations such as temperature and humidity also play an important role in estimating PM2.5 concentration using visibility (Shen et al., 2016; Xue et al., 2019; Zhong et al., 2021). Therefore, when estimating PM2.5 concentration based on visibility data, only conventional meteorological variables need to be added, which are convenient and accurate observational data. In addition, long-term, complete ground-based observations with high temporal resolution are an advantage for the historical estimation of PM2.5 concentration. The daily mean from continuous or equidistant hourly observations greatly increases the daily representativeness.

The aerosol optical depth is a physical quantity that describes aerosol column properties. It is the integration of the extinction coefficient in the vertical direction. When establishing a connection between aerosol optical depth and near-ground PM2.5 concentration, it is essential to consider the vertical structure of aerosols. Studies have shown that the aerosol vertical profiles usually are provided by observations, assumptions, or chemical transport models to obtain the aerosol properties near the surface (Van Donkelaar et al., 2010; Wei et al., 2019b; Van Donkelaar et al., 2021). Van Donkelaar et al. (2006, 2010) have demonstrated that aerosol vertical profile errors in chemical transport models and aerosol optical depth retrieval and sampling result in an approximately 25 % uncertainty of 1 standard deviation. Sensitivity testing shows that a 1 % estimation error in the aerosol optical depth can lead to a 0.27 % estimation error in the PM2.5 concentration (Wei et al., 2021). In addition, the retrieval of aerosol optical depth is affected by clouds or surface types and a finite number of daily observations (usually one to two times), though it has the advantage of high spatial coverage (Liu et al., 2017; Singh et al., 2020; Zhong et al., 2021).

Another difference is the upper limit of PM2.5 concentration. In this study, the upper limit of the estimated daily PM2.5 concentration is set to 1000 µg m−3 (the same as for the input data). Concentrations greater than 500 µg m−3 during heavy pollution are therefore retained, which may contribute to a higher frequency of high pollution levels than in the LGHAP dataset, especially before 2010. We do not remove visibility records during dust weather when preprocessing the data, which may lead to an overestimation of PM2.5 concentration in dusty areas, such as northern China and northwestern India. In Sect. 3.4, the uncertainty analysis has provided an explanation for the overestimation. Overall, our PM2.5 concentration dataset has a good consistency with PM2.5 concentration based on aerosol optical depth.

5 Regional trends and spatial patterns

We use the estimated PM2.5 concentrations (at least 10 d of records at a site) to calculate monthly PM2.5 concentrations and analyze the annual cycles, interannual trends, and spatial patterns of PM2.5 concentrations in different regions based on the GAMM. The annual variation comes from the monthly smooth term of the GAMM, the interannual variation comes from the annual smooth term, and the spatial pattern comes from the spatial smooth term. The regions include Canada, the United States, Europe, China, and India. The results are shown in Fig. 13. The trend from 1959 to 2022 in each region is the slope of the Theil–Sen estimator (ST slope) (Sen, 1968; Theil, 1992), and the Mann–Kendall test (Mann, 1945; Kendall, 1948) is used to assess the significance of the trend. The test results show that the p values are less than 0.01 in all regions.
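The two trend statistics can be illustrated with a short Python sketch. It is not the implementation used in this study. The function names and the synthetic 10-year series are hypothetical, and the Mann–Kendall variance below assumes no tied values and uses the normal approximation with a two-sided p value.

```python
import math
from itertools import combinations
from statistics import median
from typing import Sequence, Tuple


def theil_sen_slope(t: Sequence[float], y: Sequence[float]) -> float:
    """Median of the slopes between all pairs of points."""
    slopes = [
        (y[j] - y[i]) / (t[j] - t[i])
        for i, j in combinations(range(len(t)), 2)
        if t[j] != t[i]
    ]
    return median(slopes)


def mann_kendall(y: Sequence[float]) -> Tuple[int, float, float]:
    """Return the S statistic, the Z score, and the two-sided p value."""
    n = len(y)
    # S counts increasing pairs minus decreasing pairs
    s = sum((y[j] > y[i]) - (y[j] < y[i]) for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # assumes no ties in y
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)        # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p value
    return s, z, p


# Synthetic annual series (an assumption): a steady decline of 0.4 per year
years = list(range(2000, 2010))
pm = [20.0 - 0.4 * k for k in range(10)]

slope_per_year = theil_sen_slope(years, pm)
slope_per_decade = 10.0 * slope_per_year      # trends here are quoted per decade
s_stat, z_score, p_value = mann_kendall(pm)
print(slope_per_decade, s_stat, p_value)
```

The Theil–Sen slope is the median of pairwise slopes, so it is insensitive to a few outlying years, and the Mann–Kendall test uses only the signs of pairwise differences, so it does not require normally distributed residuals.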

https://essd.copernicus.org/articles/16/4051/2024/essd-16-4051-2024-f13

Figure 13. Annual cycles, interannual trends, and spatial patterns of PM2.5 concentrations in the United States (a1–a3), Canada (b1–b3), Europe (c1–c3), China (d1–d3), and India (e1–e3). The left column “f(month)” is the annual cycle, the middle column “f(year)” is the interannual trend, and the right column “f(spatial)” is the spatial distribution from the generalized additive mixed model (GAMM). The dashed blue lines represent ±1 standard error of the monthly and annual mean PM2.5 concentrations. The dashed red or black lines represent the trends of the Theil–Sen estimators (ST slope). The Mann–Kendall test of trends shows that the p values are less than 0.01 in all regions. The scatter points in the right column are the locations of PM2.5 monitoring sites.
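For readers who want to reproduce the two trend statistics named in the caption, a minimal sketch is given below. It is not the authors' implementation: it uses the textbook Theil–Sen estimator (median of pairwise slopes) and the Mann–Kendall test with the normal approximation and no tie correction, and the function names and the synthetic series are assumptions made for illustration.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(years, values):
    """Theil-Sen estimator: median of the slopes over all pairs of points."""
    slopes = [
        (values[j] - values[i]) / (years[j] - years[i])
        for i, j in combinations(range(len(years)), 2)
        if years[j] != years[i]
    ]
    return median(slopes)


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation, ties not corrected).

    Returns (S, Z, p), where S is the number of increasing pairs minus
    the number of decreasing pairs in time order.
    """
    n = len(values)
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i, j in combinations(range(n), 2)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p


# Synthetic annual series (not from the dataset): a steady decline of 0.04 per year.
years = list(range(1959, 2023))
series = [30.0 - 0.04 * (year - 1959) for year in years]

slope_per_decade = 10 * theil_sen_slope(years, series)  # convert per-year to per-decade
s_stat, z_stat, p_value = mann_kendall(series)
print(slope_per_decade, p_value)
```

Multiplying the per-year slope by 10 gives the "per decade" unit used in the text. Real annual series with repeated values would call for a tie-corrected variance, which this sketch omits.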

In the United States, the annual cycle curve of PM2.5 concentration has a “double peak and double valley” shape. The peaks occur in July and December, with the highest PM2.5 concentration of the year in July. The valleys are in April and October, with equivalent PM2.5 concentration levels. The trend is −0.40 µg m−3 per decade, and PM2.5 concentration decreases significantly after 1992, with a trend of −1.39 µg m−3 per decade. The areas of high PM2.5 concentration are in the east and west. The areas with low PM2.5 concentrations are mainly located in the central and northern regions. The high concentration in the eastern and western regions is related to extensive industrial activities and densely populated cities. The low concentration in the central and northern regions is related to high vegetation coverage, low industrial activity, and low population density.

In Canada, the annual cycle curve of PM2.5 concentration also has a “double peak and double valley” shape. The peaks occur in August and February, with the highest PM2.5 concentration in August. The valleys are in April and October. The trend is −0.10 µg m−3 per decade, and PM2.5 concentration increases after 2010. The PM2.5 concentration exhibits an east–high to west–low pattern. The eastern regions, such as Ontario and Quebec, are characterized by high population density and significant industrial and transportation activities.

In Europe, the annual cycle shows that the PM2.5 concentration is the highest in February and is low from May to September. The valley values are in April and October. The trend is −1.55 µg m−3 per decade. High-concentration areas are distributed in eastern Europe, while low-concentration areas are in northern and western Europe. Eastern Europe exhibits more industrialization, particularly with a prevalence of traditional heavy industries and the use of coal and other high-pollution energy sources. In contrast, the energy structure in western Europe tends to favor cleaner energy sources.

In China, the annual cycle curve of PM2.5 concentration presents a V-like shape, indicating that high concentrations occur in winter and low concentrations in summer. The trend over the whole period is 2.09 µg m−3 per decade; it is 2.65 µg m−3 per decade from 1959 to 2011 and −22.23 µg m−3 per decade from 2012 to 2022. High-concentration areas are distributed in northern China, such as the North China Plain, northeast China, the Sichuan Basin, Taklimakan Desert, and Badain Jaran Desert. Low-concentration areas are in southern China and the northern Tian Shan. Besides dust, industrial activities and coal combustion for heating during winter are significant contributors to the PM2.5 concentration in northern regions.

In India, the annual cycle curve of PM2.5 concentration also presents a V-like shape. High concentrations are in winter, and low concentrations are in summer. The trend over the whole period is 0.92 µg m−3 per decade; it is 1.41 µg m−3 per decade from 1959 to 2013 and −23.36 µg m−3 per decade from 2014 to 2022. Some studies have shown that the PM2.5 concentration in India has decreased since 2014, especially in northern cities. Singh et al. (2021) found that five major cities in India show a downward trend from 2014 to 2019, with the largest decline of approximately −4.2 µg m−3 yr−1 in New Delhi. Ravindra et al. (2024) also found that the trend in New Delhi is about −5 µg m−3 yr−1 from 2014 to 2020. These studies have shown a faster downward trend than our study, as these PM2.5 monitoring sites are mainly concentrated in urban areas. The PM2.5 concentration exhibits a north–high to south–low pattern. High-concentration areas are distributed in northern India, such as the Ganges Plain and Thar Desert, because there are more industrial and densely populated areas, and the terrain leads to the retention of air pollutants. Low-concentration areas are in the Deccan Plateau.

Overall, the PM2.5 concentrations in developed countries and regions are significantly lower than those in developing countries in the Northern Hemisphere. Regional trends are similar to those of previous studies in different periods (Van Donkelaar et al., 2010; Wang et al., 2012; Boys et al., 2014; Ma et al., 2016; Li et al., 2017; Hammer et al., 2020). The trends in PM2.5 concentration in different regions are closely associated with the implementation of relevant policies. The earlier pollution control measures are taken, the earlier the decreasing trend in the PM2.5 concentration occurs and the lower the threat of particulate matter pollution to humans. In 1997, the U.S. EPA classified PM2.5 as a hazardous substance in the National Ambient Air Quality Standard, and subsequent regulations in 2006 further strengthened the source control and management of fine particulate matter (Hall and Gilliam, 2016). In 1988, the Canadian federal government enacted the Canadian Environmental Protection Act, which enhanced the regulation of PM2.5 (Davies, 1988). The European Union introduced the Air Quality Directive in 1996, followed by multiple revisions and updates to regulate and restrict air pollutants, including PM2.5 (Kuklinska et al., 2015). However, Europe stands out due to its early adoption of clean production practices in heavy industries since the 1970s. Since 2012, China has implemented numerous regulations and standards for PM2.5. For instance, the Monitoring Method for Atmospheric Particulate Matter (PM2.5) was issued in 2012, and the Chinese Ministry of Environmental Protection released the Ambient Air Quality Standards in 2013, including emission standards for PM2.5 (B. Zhao et al., 2016). In 2009, the Indian Ministry of Environment and Forests issued the National Ambient Air Quality Standards, which include control standards for PM2.5. In 2019, the Indian government launched the National Clean Air Programme (NCAP) to improve air quality by implementing a series of measures to reduce the emissions of PM2.5 and other pollutants (Ganguly et al., 2020). These environmental regulations have contributed significantly to the decline of PM2.5 concentrations. Some studies have shown that the variation of PM2.5 concentrations is also related to several factors, such as the energy structure, urbanization process, population distribution, and vegetation coverage (Shi et al., 2018; Wu et al., 2018; Li et al., 2019; Wang et al., 2019; Lim et al., 2020; Qi et al., 2023).

6 Data availability

Daily PM2.5 concentration data in the Northern Hemisphere from 1959 to 2022 are available at https://doi.org/10.11888/Atmos.tpdc.301127 (Hao et al., 2024).

All site-scale PM2.5 data files are in “PM25-Daily_1959_2022.zip”. Each file name includes a region name and a site number. For example, the file name “China_1001.txt” means that the site is in China and the site number is 1001. Each file describes the daily PM2.5 concentration at a single site, is comma-separated, and can be opened directly with a text program (such as Notepad). The data include four columns: “Date”, “PM25(µg/m3)”, “Longitude(degree_east)”, and “Latitude(degree_north)”. “Date” is in UTC, “PM25(µg/m3)” is the daily PM2.5 concentration (unit: µg m−3), “Longitude(degree_east)” is the longitude, ranging from −180 to 180° E, and “Latitude(degree_north)” is the latitude, ranging from 0 to 90° N.
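As an illustration of the file layout described above, the following sketch parses a site file name and reads the comma-separated records. It assumes that the first row holds the four column names and that the columns appear in the order listed; the date layout is not specified here, so dates are kept as text. The function names and the sample rows are hypothetical and do not come from the dataset.

```python
import csv
import io


def parse_site_filename(filename):
    """Split a name such as 'China_1001.txt' into (region, site_number)."""
    stem = filename.rsplit(".", 1)[0]      # drop the extension
    region, site = stem.rsplit("_", 1)     # the site number follows the last underscore
    return region, int(site)


def read_site_records(handle):
    """Read daily records from an open, comma-separated site file.

    Assumes one header row followed by rows of
    date, PM2.5 (ug m-3), longitude (degrees east), latitude (degrees north).
    """
    reader = csv.reader(handle)
    header = next(reader)
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        date_text, pm25, lon, lat = (field.strip() for field in row)
        records.append(
            {
                "date": date_text,  # kept as text: the date layout is an open assumption
                "pm25": float(pm25),
                "lon": float(lon),
                "lat": float(lat),
            }
        )
    return header, records


# Hypothetical sample content; values and date layout are invented for illustration.
sample = io.StringIO(
    "Date,PM25(µg/m3),Longitude(degree_east),Latitude(degree_north)\n"
    "1959-01-01,85.2,116.28,39.93\n"
    "1959-01-02,102.7,116.28,39.93\n"
)
header, records = read_site_records(sample)
region, site_number = parse_site_filename("China_1001.txt")
print(region, site_number, len(records))
```

With a real file, `sample` would be replaced by `open(path, encoding="utf-8")`; the encoding is also an assumption and may need adjusting.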

7 Conclusions

In this study, we use a machine learning method to estimate daily PM2.5 concentration for 5023 terrestrial sites in the Northern Hemisphere from 1959 to 2022 based on daily visibility and related meteorological variables. The first 80 % of PM2.5 concentration data in each site are used to train the model, and the last 20 % are used to test. The model's performance and predictive ability are evaluated, and a dataset of daily PM2.5 concentration based on aerosol optical depth is used to compare and evaluate the estimated PM2.5 concentration. We analyze the uncertainty and discuss the limitations of our dataset. Finally, the PM2.5 concentration variation (annual calendar cycle, interannual cycle, and spatial distribution) in five regions over the past 64 years is analyzed based on GAMM. We hope our dataset will be useful for studying the atmospheric environment, human health, and climate change and provide auxiliary support for assimilation. Several key results of this study are described as follows:

  • The most important variable. Visibility is the most important variable at 80.7 % of the PM2.5 sites, as visibility can be considered an indicator of PM2.5 concentration without fog or precipitation. Other meteorological variables play a secondary role in the model, especially temperature and dew point temperature.

  • Model performance and predictive ability. The training results show that the slope between the estimated PM2.5 concentration and the monitored PM2.5 concentration within the 95 % confidence interval is 0.955, the R2 is 0.95, the RMSE is 7.2 µg m−3, and the MAE is 3.2 µg m−3. The test results show that the slope between the predicted PM2.5 concentration and the monitored PM2.5 concentration is 0.864 ± 0.0010 within a 95 % confidence interval, the R2 is 0.79, the RMSE is 14.8 µg m−3, and the MAE is 7.6 µg m−3. The model shows good stability and predictive ability. Compared with a global PM2.5 concentration dataset based on satellite retrieval, the slopes of linear regression on the daily (monthly) scale are 0.817 (0.854) from 2000 to 2021, 0.758 (0.821) from 2000 to 2010, and 0.867 (0.879) from 2011 to 2022. The result indicates the accuracy of the model and the consistency of the estimated PM2.5 concentration on the temporal scale.

  • Regional trends and spatial patterns. The interannual trends and spatial patterns of PM2.5 concentration on the regional scale from 1959 to 2022 are analyzed based on GAMM. In Canada, the trend is −0.10 µg m−3 per decade, and the PM2.5 concentration exhibits an east–high to west–low pattern. In the United States, the trend is −0.40 µg m−3 per decade, and PM2.5 concentration decreases significantly after 1992, with a trend of −1.39 µg m−3 per decade. The areas with high PM2.5 concentration are in the east and west, and the areas with low PM2.5 concentration are in the central and northern regions. In Europe, the trend is −1.55 µg m−3 per decade. High-concentration areas are distributed in eastern Europe, while low-concentration areas are in northern and western Europe. In China, the trend is 2.09 µg m−3 per decade. High-concentration areas are distributed in northern China, and low-concentration areas are distributed in southern China and the northern Tian Shan. The trend is 2.65 µg m−3 per decade from 1959 to 2011 and −22.23 µg m−3 per decade from 2012 to 2022. In India, the trend is 0.92 µg m−3 per decade. The concentration exhibits a north–high to south–low pattern, with high-concentration areas distributed in northern India, such as the Ganges Plain and Thar Desert, and the low-concentration area in the Deccan Plateau. The trend is 1.41 µg m−3 per decade from 1959 to 2013 and −23.36 µg m−3 per decade from 2014 to 2022. The variation of PM2.5 concentration is inseparable from the implementation of pollution control laws and regulations, the energy structure, industrialization, population, and vegetation coverage.
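The chronological 80 %/20 % split and the evaluation statistics quoted in the list above (slope, R2, RMSE, and MAE) can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the authors' code: the slope is taken from an ordinary least-squares regression of predicted on monitored values, R2 is computed as the squared Pearson correlation, and the function names and numbers are hypothetical.

```python
import math


def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples: the first part for training, the rest for testing."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def evaluate(observed, predicted):
    """Return slope, R2, RMSE, and MAE for paired observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    errors = [p - o for o, p in zip(observed, predicted)]
    return {
        "slope": sxy / sxx,              # assumed: predicted regressed on observed
        "r2": sxy ** 2 / (sxx * syy),    # assumed: squared Pearson correlation
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical monitored and predicted daily PM2.5 values (ug m-3) at one site.
monitored = [12.0, 35.0, 20.0, 80.0, 55.0, 41.0, 18.0, 66.0, 29.0, 47.0]
predicted = [15.0, 30.0, 22.0, 70.0, 50.0, 44.0, 20.0, 60.0, 31.0, 45.0]

train, test = chronological_split(list(zip(monitored, predicted)))
test_obs = [o for o, _ in test]
test_pred = [p for _, p in test]
print(len(train), len(test), evaluate(monitored, predicted))
```

A slope below 1 together with a positive intercept, as reported for the test set, would indicate that high concentrations tend to be underestimated and low concentrations overestimated.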

Author contributions

HH and KW designed and organized the research. HH produced the dataset. HH wrote the original draft, and KW, GW, JiaL, and JinL provided scientific advice and guidance. All of the authors were involved in the review and editing process.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

The authors would like to express gratitude to the relevant organizations and data archiving services that provided the essential data used in this study.

Financial support

This research has been supported by the National Key Research and Development Program of China (grant no. 2022YFF0801302) and the National Natural Science Foundation of China (grant no. 41930970).

Review statement

This paper was edited by Yuqiang Zhang and reviewed by three anonymous referees.

References

Albrecht, B. A.: Aerosols, cloud microphysics, and fractional cloudiness, Science, 245, 1227–1230, https://doi.org/10.1126/science.245.4923.1227, 1989. 

Ali, M. A., Bilal, M., Wang, Y., Nichol, J. E., Mhawish, A., Qiu, Z., de Leeuw, G., Zhang, Y., Zhan, Y., Liao, K., Almazroui, M., Dambul, R., Shahid, S., and Islam, M. N.: Accuracy assessment of CAMS and MERRA-2 reanalysis PM2.5 and PM10 concentrations over China, Atmos. Environ., 288, 119297, https://doi.org/10.1016/j.atmosenv.2022.119297, 2022. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2000), Zenodo [data set], https://doi.org/10.5281/zenodo.8307595, 2023a. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2001), Zenodo [data set], https://doi.org/10.5281/zenodo.8307597, 2023b. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2002), Zenodo [data set], https://doi.org/10.5281/zenodo.8307599, 2023c. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2003), Zenodo [data set], https://doi.org/10.5281/zenodo.8307601, 2023d. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2004), Zenodo [data set], https://doi.org/10.5281/zenodo.8307605, 2023e. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2005), Zenodo [data set], https://doi.org/10.5281/zenodo.8307607, 2023f. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2006), Zenodo [data set], https://doi.org/10.5281/zenodo.8308225, 2023g. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2007), Zenodo [data set], https://doi.org/10.5281/zenodo.8308227, 2023h. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2008), Zenodo [data set], https://doi.org/10.5281/zenodo.8308231, 2023i. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2009), Zenodo [data set], https://doi.org/10.5281/zenodo.8308233, 2023j. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2010), Zenodo [data set], https://doi.org/10.5281/zenodo.8308237, 2023k. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2011), Zenodo [data set], https://doi.org/10.5281/zenodo.8310586, 2023l. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2012), Zenodo [data set], https://doi.org/10.5281/zenodo.8310590, 2023m. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2013), Zenodo [data set], https://doi.org/10.5281/zenodo.8310702, 2023n. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2014), Zenodo [data set], https://doi.org/10.5281/zenodo.8310704, 2023o. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2015), Zenodo [data set], https://doi.org/10.5281/zenodo.8310706, 2023p. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2016), Zenodo [data set], https://doi.org/10.5281/zenodo.8310708, 2023q. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2017), Zenodo [data set], https://doi.org/10.5281/zenodo.8310711, 2023r. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2018), Zenodo [data set], https://doi.org/10.5281/zenodo.8313603, 2023s. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2019), Zenodo [data set], https://doi.org/10.5281/zenodo.8313611, 2023t. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2020), Zenodo [data set], https://doi.org/10.5281/zenodo.8313613, 2023u. 

Bai, K. and Li, K.: LGHAP v2: Global daily 1-km gap-free PM2.5 grids (2021), Zenodo [data set], https://doi.org/10.5281/zenodo.8313615, 2023v. 

Bai, K., Li, K., Shao, L., Li, X., Liu, C., Li, Z., Ma, M., Han, D., Sun, Y., Zheng, Z., Li, R., Chang, N.-B., and Guo, J.: LGHAP v2: a global gap-free aerosol optical depth and PM2.5 concentration dataset since 2000 derived via big Earth data analytics, Earth Syst. Sci. Data, 16, 2425–2448, https://doi.org/10.5194/essd-16-2425-2024, 2024. 

Beckerman, B. S., Jerrett, M., Serre, M., Martin, R. V., Lee, S.-J., Van Donkelaar, A., Ross, Z., Su, J., and Burnett, R. T.: A hybrid approach to estimating national scale spatiotemporal variability of PM2.5 in the contiguous United States, Environ. Sci. Technol., 47, 7233–7241, https://doi.org/10.1021/es400039u, 2013. 

Bergstrom, R. W., Pilewskie, P., Russell, P. B., Redemann, J., Bond, T. C., Quinn, P. K., and Sierau, B.: Spectral absorption properties of atmospheric aerosols, Atmos. Chem. Phys., 7, 5937–5943, https://doi.org/10.5194/acp-7-5937-2007, 2007. 

Boers, R., van Weele, M., van Meijgaard, E., Savenije, M., Siebesma, A. P., Bosveld, F., and Stammes, P.: Observations and projections of visibility and aerosol optical thickness (1956–2100) in the Netherlands: impacts of time-varying aerosol composition and hygroscopicity, Environ. Res. Lett., 10, 015003, https://doi.org/10.1088/1748-9326/10/1/015003, 2015. 

Boys, B., Martin, R., Van Donkelaar, A., MacDonell, R., Hsu, N., Cooper, M., Yantosca, R., Lu, Z., Streets, D., and Zhang, Q.: Fifteen-year global time series of satellite-derived fine particulate matter, Environ. Sci. Technol., 48, 11109–11118, https://doi.org/10.1021/es502113p, 2014. 

Browne, M. W.: Cross-validation methods, J. Math. Psychol., 44, 108–132, https://doi.org/10.1006/jmps.1999.1279, 2000. 

Buchard, V., da Silva, A. M., Colarco, P. R., Darmenov, A., Randles, C. A., Govindaraju, R., Torres, O., Campbell, J., and Spurr, R.: Using the OMI aerosol index and absorption aerosol optical depth to evaluate the NASA MERRA Aerosol Reanalysis, Atmos. Chem. Phys., 15, 5743–5760, https://doi.org/10.5194/acp-15-5743-2015, 2015. 

Buchard, V., da Silva, A. M., Randles, C. A., Colarco, P., Ferrare, R., Hair, J., Hostetler, C., Tackett, J., and Winker, D.: Evaluation of the surface PM2.5 in Version 1 of the NASA MERRA Aerosol Reanalysis over the United States, Atmos. Environ., 125, 100–111, https://doi.org/10.1016/j.atmosenv.2015.11.004, 2016. 

Buchard, V., Randles, C. A., da Silva, A. M., Darmenov, A., Colarco, P. R., Govindaraju, R., Ferrare, R., Hair, J., Beyersdorf, A. J., Ziemba, L. D., and Yu, H.: The MERRA-2 Aerosol Reanalysis, 1980 Onward. Part II: Evaluation and Case Studies, J. Climate, 30, 6851–6872, https://doi.org/10.1175/JCLI-D-16-0613.1, 2017. 

Chafe, Z. A., Brauer, M., Klimont, Z., Van Dingenen, R., Mehta, S., Rao, S., Riahi, K., Dentener, F., and Smith, K. R.: Household Cooking with Solid Fuels Contributes to Ambient PM2.5 Air Pollution and the Burden of Disease, Environ. Health Persp., 122, 1314–1320, https://doi.org/10.1289/ehp.1206340, 2014. 

Chang, K.-L., Petropavlovskikh, I., Cooper, O. R., Schultz, M. G., and Wang, T.: Regional trend analysis of surface ozone observations from monitoring networks in eastern North America, Europe and East Asia, Elementa: Science of the Anthropocene, 5, 50, https://doi.org/10.1525/elementa.243, 2017. 

Che, H., Xia, X., Zhu, J., Hong, W., and Shi, G.: Aerosol optical properties under the condition of heavy haze over an urban site of Beijing, China, Environ. Sci. Pollut. R., 22, 1043–1053, https://doi.org/10.1007/s11356-014-3415-5, 2014. 

Chen, A., Zhao, C., and Fan, T.: Spatio-temporal distribution of aerosol direct radiative forcing over mid-latitude regions in north hemisphere estimated from satellite observations, Atmos. Res., 266, 105938, https://doi.org/10.1016/j.atmosres.2021.105938, 2022. 

Chen, Z., Chen, D., Zhao, C., Kwan, M.-p., Cai, J., Zhuang, Y., Zhao, B., Wang, X., Chen, B., Yang, J., Li, R., He, B., Gao, B., Wang, K., and Xu, B.: Influence of meteorological conditions on PM2.5 concentrations across China: A review of methodology and mechanism, Environ. Int., 139, 105558, https://doi.org/10.1016/j.envint.2020.105558, 2020. 

Chow, J. C., Doraiswamy, P., Watson, J. G., Chen, L. W. A., Ho, S. S. H., and Sodeman, D. A.: Advances in Integrated and Continuous Measurements for Particle Mass and Chemical Composition, Japca J. Air Waste Ma., 58, 141–163, https://doi.org/10.3155/1047-3289.58.2.141, 2008. 

Cohen, A. J., Brauer, M., Burnett, R., Anderson, H. R., Frostad, J., Estep, K., Balakrishnan, K., Brunekreef, B., Dandona, L., Dandona, R., Feigin, V., Freedman, G., Hubbell, B., Jobling, A., Kan, H., Knibbs, L., Liu, Y., Martin, R., Morawska, L., Pope, C. A., III, Shin, H., Straif, K., Shaddick, G., Thomas, M., van Dingenen, R., van Donkelaar, A., Vos, T., Murray, C. J. L., and Forouzanfar, M. H.: Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution: an analysis of data from the Global Burden of Diseases Study 2015, Lancet, 389, 1907–1918, https://doi.org/10.1016/s0140-6736(17)30505-6, 2017. 

Dabek-Zlotorzynska, E., Dann, T. F., Martinelango, P. K., Celo, V., Brook, J. R., Mathieu, D., Ding, L., and Austin, C. C.: Canadian National Air Pollution Surveillance (NAPS) PM2.5 speciation program: Methodology and PM2.5 chemical composition for the years 2003–2008, Atmos. Environ., 45, 673–686, https://doi.org/10.1016/j.atmosenv.2010.10.024, 2011. 

Davies, J.: CEPA – The Canadian Environmental Protection Act, JAPCA, 38, 1111–1113, https://doi.org/10.1080/08940630.1988.10466452, 1988. 

Demerjian, K. L.: A review of national monitoring networks in North America, Atmos. Environ., 34, 1861–1884, https://doi.org/10.1016/S1352-2310(99)00452-5, 2000. 

Fan, H., Zhao, C., Yang, Y., and Yang, X.: Spatio-Temporal Variations of the PM2.5/PM10 Ratios and Its Application to Air Pollution Type Classification in China, Front. Environ. Sci., 9, 692440, https://doi.org/10.3389/fenvs.2021.692440, 2021. 

Friedman, J. H.: Greedy function approximation: A gradient boosting machine, Ann. Stat., 29, 1189–1232, https://doi.org/10.1214/aos/1013203451, 2001. 

Ganguly, T., Selvaraj, K. L., and Guttikunda, S. K.: National Clean Air Programme (NCAP) for Indian cities: Review and outlook of clean air action plans, Atmospheric Environment X, 8, 100096, https://doi.org/10.1016/j.aeaoa.2020.100096, 2020. 

Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L., Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., Wargan, K., Coy, L., Cullather, R., Draper, C., Akella, S., Buchard, V., Conaty, A., da Silva, A. M., Gu, W., Kim, G.-K., Koster, R., Lucchesi, R., Merkova, D., Nielsen, J. E., Partyka, G., Pawson, S., Putman, W., Rienecker, M., Schubert, S. D., Sienkiewicz, M., and Zhao, B.: The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2), J. Climate, 30, 5419–5454, https://doi.org/10.1175/JCLI-D-16-0758.1, 2017. 

Goff, J. A.: Saturation pressure of water on the new Kelvin temperature scale, Transactions of the American Society of Heating and Ventilating Engineers, 63, 347–354, 1957. 

Granier, C., Bessagnet, B., Bond, T., D'Angiola, A., Denier van der Gon, H., Frost, G. J., Heil, A., Kaiser, J. W., Kinne, S., and Klimont, Z.: Evolution of anthropogenic and biomass burning emissions of air pollutants at global and regional scales during the 1980–2010 period, Climatic Change, 109, 163–190, https://doi.org/10.1007/s10584-011-0154-1, 2011. 

Green, D. and Fuller, G. W.: The implications of tapered element oscillating microbalance (TEOM) software configuration on particulate matter measurements in the UK and Europe, Atmos. Environ., 40, 5608–5616, https://doi.org/10.1016/j.atmosenv.2006.04.052, 2006. 

Gui, K., Che, H., Zeng, Z., Wang, Y., Zhai, S., Wang, Z., Luo, M., Zhang, L., Liao, T., and Zhao, H.: Construction of a virtual PM2.5 observation network in China based on high-density surface meteorological observations using the Extreme Gradient Boosting model, Environ. Int., 141, 105801, https://doi.org/10.1016/j.envint.2020.105801, 2020. 

Guo, S., Hu, M., Zamora, M. L., Peng, J., Shang, D., Zheng, J., Du, Z., Wu, Z., Shao, M., Zeng, L., Molina, M. J., and Zhang, R.: Elucidating severe urban haze formation in China, P. Natl. Acad. Sci. USA, 111, 17373–17378, https://doi.org/10.1073/pnas.1419604111, 2014. 

Hall, E. and Gilliam, J.: Reference and Equivalent Methods Used to Measure National Ambient Air Quality Standards (NAAQS) Criteria Air Pollutants – Volume I, https://doi.org/10.13140/RG.2.1.3471.8329, 2016. 

Hammer, M. S., van Donkelaar, A., Li, C., Lyapustin, A., Sayer, A. M., Hsu, N. C., Levy, R. C., Garay, M. J., Kalashnikova, O. V., and Kahn, R. A.: Global estimates and long-term trends of fine particulate matter concentrations (1998–2018), Environ. Sci. Technol., 54, 7879–7890, https://doi.org/10.1021/acs.est.0c01764, 2020. 

Hao, H., Wang, K., Wu, G., Liu, J., and Li, J.: PM2.5 concentrations based on near-surface visibility at 4011 sites in the Northern Hemisphere from 1959 to 2022, National Tibetan Plateau Data Center [data set], https://doi.org/10.11888/Atmos.tpdc.301127, 2024. 

Hastie, T. and Tibshirani, R.: Generalized Additive Models: Some Applications, J. Am. Stat. Assoc., 82, 371–386, https://doi.org/10.1080/01621459.1987.10478440, 1987. 

He, Q., Gao, K., Zhang, L., Song, Y., and Zhang, M.: Satellite-derived 1-km estimates and long-term trends of PM2.5 concentrations in China from 2000 to 2018, Environ. Int., 156, 106726, https://doi.org/10.1016/j.envint.2021.106726, 2021. 

Hsu, N., Lee, J., Sayer, A., Carletta, N., Chen, S. H., Tucker, C., Holben, B., and Tsay, S. C.: Retrieving near-global aerosol loading over land and ocean from AVHRR, J. Geophys. Res-Atmos., 122, 9968–9989, https://doi.org/10.1002/2017JD026932, 2017. 

Huang, W., Tan, J., Kan, H., Zhao, N., Song, W., Song, G., Chen, G., Jiang, L., Jiang, C., and Chen, R.: Visibility, air quality and daily mortality in Shanghai, China, Sci. Total Environ., 407, 3295–3300, https://doi.org/10.1016/j.scitotenv.2009.02.019, 2009. 

Husar, R. B., Husar, J. D., and Martin, L.: Distribution of continental surface aerosol extinction based on visual range data, Atmos. Environ., 34, 5067–5078, https://doi.org/10.1016/s1352-2310(00)00324-1, 2000. 

Inness, A., Ades, M., Agustí-Panareda, A., Barré, J., Benedictow, A., Blechschmidt, A.-M., Dominguez, J. J., Engelen, R., Eskes, H., Flemming, J., Huijnen, V., Jones, L., Kipling, Z., Massart, S., Parrington, M., Peuch, V.-H., Razinger, M., Remy, S., Schulz, M., and Suttie, M.: The CAMS reanalysis of atmospheric composition, Atmos. Chem. Phys., 19, 3515–3556, https://doi.org/10.5194/acp-19-3515-2019, 2019. 

Jin, C., Wang, Y., Li, T., and Yuan, Q.: Global validation and hybrid calibration of CAMS and MERRA-2 PM2.5 reanalysis products based on OpenAQ platform, Atmos. Environ., 274, 118972, https://doi.org/10.1016/j.atmosenv.2022.118972, 2022. 

Kammann, E. E. and Wand, M. P.: Geoadditive Models, J. R. Stat. Soc. C-Appl., 52, 1–18, https://doi.org/10.1111/1467-9876.00385, 2003. 

Kendall, M. G.: Rank correlation methods, Griffin, https://psycnet.apa.org/record/1948-15040-000 (last access: 30 August 2024), 1948. 

Kim, K.-H., Kabir, E., and Kabir, S.: A review on the human health impact of airborne particulate matter, Environ. Int., 74, 136–143, https://doi.org/10.1016/j.envint.2014.10.005, 2015. 

Kuklinska, K., Wolska, L., and Namiesnik, J.: Air quality policy in the US and the EU – a review, Atmos. Pollut. Res., 6, 129–137, https://doi.org/10.5094/APR.2015.015, 2015. 

Lelieveld, J., Evans, J. S., Fnais, M., Giannadaki, D., and Pozzer, A.: The contribution of outdoor air pollution sources to premature mortality on a global scale, Nature, 525, 367–371, https://doi.org/10.1038/nature15371, 2015. 

Li, C., Martin, R. V., Boys, B. L., van Donkelaar, A., and Ruzzante, S.: Evaluation and application of multi-decadal visibility data for trend analysis of atmospheric haze, Atmos. Chem. Phys., 16, 2435–2457, https://doi.org/10.5194/acp-16-2435-2016, 2016. 

Li, C., Martin, R. V., van Donkelaar, A., Boys, B. L., Hammer, M. S., Xu, J.-W., Marais, E. A., Reff, A., Strum, M., and Ridley, D. A.: Trends in chemical composition of global and regional population-weighted fine particulate matter estimated for 25 years, Environ. Sci. Technol., 51, 11185–11195, https://doi.org/10.1021/acs.est.7b02530, 2017. 

Li, J., Han, X., Jin, M., Zhang, X., and Wang, S.: Globally analysing spatiotemporal trends of anthropogenic PM2.5 concentration and population's PM2.5 exposure from 1998 to 2016, Environ. Int., 128, 46–62, https://doi.org/10.1016/j.envint.2019.04.026, 2019. 

Li, J., Garshick, E., Hart, J. E., Li, L., Shi, L., Al-Hemoud, A., Huang, S., and Koutrakis, P.: Estimation of ambient PM2.5 in Iraq and Kuwait from 2001 to 2018 using machine learning and remote sensing, Environ. Int., 151, 106445, https://doi.org/10.1016/j.envint.2021.106445, 2021. 

Li, J., Carlson, B. E., Yung, Y. L., Lv, D., Hansen, J., Penner, J. E., Liao, H., Ramaswamy, V., Kahn, R. A., Zhang, P., Dubovik, O., Ding, A., Lacis, A. A., Zhang, L., and Dong, Y.: Scattering and absorbing aerosols in the climate system, Nat. Rev. Earth. Environ., 3, 363–379, https://doi.org/10.1038/s43017-022-00296-7, 2022. 

Li, S., Chen, L., Huang, G., Lin, J., Yan, Y., Ni, R., Huo, Y., Wang, J., Liu, M., and Weng, H.: Retrieval of surface PM2.5 mass concentrations over North China using visibility measurements and GEOS-Chem simulations, Atmos. Environ., 222, 117121, https://doi.org/10.1016/j.atmosenv.2019.117121, 2020. 

Liao, H., Chang, W., and Yang, Y.: Climatic Effects of Air Pollutants over China: A Review, Adv. Atmos. Sci., 32, 115–139, https://doi.org/10.1007/s00376-014-0013-x, 2015. 

Lim, C.-H., Ryu, J., Choi, Y., Jeon, S. W., and Lee, W.-K.: Understanding global PM2.5 concentrations and their drivers in recent decades (1998–2016), Environ. Int., 144, 106011, https://doi.org/10.1016/j.envint.2020.106011, 2020. 

Liu, M., Bi, J., and Ma, Z.: Visibility-based PM2.5 concentrations in China: 1957–1964 and 1973–2014, Environ. Sci. Technol., 51, 13161–13169, https://doi.org/10.1021/acs.est.7b03468, 2017. 

Liu, M., Huang, X., Song, Y., Tang, J., Cao, J., Zhang, X., Zhang, Q., Wang, S., Xu, T., Kang, L., Cai, X., Zhang, H., Yang, F., Wang, H., Yu, J. Z., Lau, A. K. H., He, L., Huang, X., Duan, L., Ding, A., Xue, L., Gao, J., Liu, B., and Zhu, T.: Ammonia emission control in China would mitigate haze pollution and nitrogen deposition, but worsen acid rain, P. Natl. Acad. Sci. USA, 116, 7760–7765, https://doi.org/10.1073/pnas.1814880116, 2019. 

Ma, Z., Hu, X., Sayer, A. M., Levy, R., Zhang, Q., Xue, Y., Tong, S., Bi, J., Huang, L., and Liu, Y.: Satellite-based spatiotemporal trends in PM2.5 concentrations: China, 2004–2013, Environ. Health Persp., 124, 184–192, https://doi.org/10.1289/ehp.1409481, 2016. 

Mandal, S., Madhipatla, K. K., Guttikunda, S., Kloog, I., Prabhakaran, D., Schwartz, J. D., and Team, G. H. I.: Ensemble averaging based assessment of spatiotemporal variations in ambient PM2.5 concentrations over Delhi, India, during 2010–2016, Atmos. Environ., 224, 117309, https://doi.org/10.1016/j.atmosenv.2020.117309, 2020. 

Mann, H. B.: Nonparametric Tests Against Trend, Econometrica, 13, 245–259, https://doi.org/10.2307/1907187, 1945. 

Meng, X., Hand, J. L., Schichtel, B. A., and Liu, Y.: Space-time trends of PM2.5 constituents in the conterminous United States estimated by a machine learning approach, 2005–2015, Environ. Int., 121, 1137–1147, https://doi.org/10.1016/j.envint.2018.10.029, 2018. 

Miao, Y. and Liu, S.: Linkages between aerosol pollution and planetary boundary layer structure in China, Sci. Total Environ., 650, 288–296, https://doi.org/10.1016/j.scitotenv.2018.09.032, 2019. 

Molnár, A., Mészáros, E., Imre, K., and Rüll, A.: Trends in visibility over Hungary between 1996 and 2002, Atmos. Environ., 42, 2621–2629, https://doi.org/10.1016/j.atmosenv.2007.05.012, 2008. 

Nagaraja Rao, C., Stowe, L., and McClain, E.: Remote sensing of aerosols over the oceans using AVHRR data: Theory, practice and applications, Int. J. Remote Sens., 10, 743–749, https://doi.org/10.1080/01431168908903915, 1989.

Pant, P., Lal, R. M., Guttikunda, S. K., Russell, A. G., Nagpure, A. S., Ramaswami, A., and Peltier, R. E.: Monitoring particulate matter in India: recent trends and future outlook, Air Qual. Atmos. Hlth., 12, 45–58, https://doi.org/10.1007/s11869-018-0629-6, 2019. 

Park, A., Guillas, S., and Petropavlovskikh, I.: Trends in stratospheric ozone profiles using functional mixed models, Atmos. Chem. Phys., 13, 11473–11501, https://doi.org/10.5194/acp-13-11473-2013, 2013. 

Polansky, L. and Robbins, M. M.: Generalized additive mixed models for disentangling long-term trends, local anomalies, and seasonality in fruit tree phenology, Ecol. Evol., 3, 3141–3151, https://doi.org/10.1002/ece3.707, 2013. 

Pui, D. Y. H., Chen, S.-C., and Zuo, Z.: PM2.5 in China: Measurements, sources, visibility and health effects, and mitigation, Particuology, 13, 1–26, https://doi.org/10.1016/j.partic.2013.11.001, 2014. 

Qi, G., Wei, W., Wang, Z., Wang, Z., and Wei, L.: The spatial-temporal evolution mechanism of PM2.5 concentration based on China's climate zoning, J. Environ. Manage., 325, 116671, https://doi.org/10.1016/j.jenvman.2022.116671, 2023. 

Ramanathan, V., Crutzen, P. J., Kiehl, J., and Rosenfeld, D.: Aerosols, climate, and the hydrological cycle, Science, 294, 2119–2124, https://doi.org/10.1126/science.1064034, 2001. 

Ravindra, K., Rattan, P., Mor, S., and Aggarwal, A. N.: Generalized additive models: Building evidence of air pollution, climate change and human health, Environ. Int., 132, 104987, https://doi.org/10.1016/j.envint.2019.104987, 2019. 

Ravindra, K., Vakacherla, S., Singh, T., Upadhya, A. R., Rattan, P., and Mor, S.: Long-term trend of PM2.5 over five Indian megacities using a new statistical approach, Stoch. Env. Res. Risk A., 38, 715–725, https://doi.org/10.1007/s00477-023-02595-x, 2024. 

Samset, B. H., Lund, M. T., Bollasina, M., Myhre, G., and Wilcox, L.: Emerging Asian aerosol patterns, Nat. Geosci., 12, 582–584, https://doi.org/10.1038/s41561-019-0424-5, 2019. 

Sen, P. K.: Estimates of the Regression Coefficient Based on Kendall's Tau, J. Am. Stat. Assoc., 63, 1379–1389, https://doi.org/10.1080/01621459.1968.10480934, 1968. 

Shen, Z., Cao, J., Zhang, L., Zhang, Q., Huang, R.-J., Liu, S., Zhao, Z., Zhu, C., Lei, Y., and Xu, H.: Retrieving historical ambient PM2.5 concentrations using existing visibility measurements in Xi'an, Northwest China, Atmos. Environ., 126, 15–20, https://doi.org/10.1016/j.atmosenv.2015.11.040, 2016. 

Shi, Y., Matsunaga, T., Yamaguchi, Y., Li, Z., Gu, X., and Chen, X.: Long-term trends and spatial patterns of satellite-retrieved PM2.5 concentrations in South and Southeast Asia from 1999 to 2014, Sci. Total Environ., 615, 177–186, https://doi.org/10.1016/j.scitotenv.2017.09.241, 2018. 

Singh, A., Avis, W. R., and Pope, F. D.: Visibility as a proxy for air quality in East Africa, Environ. Res. Lett., 15, 084002, https://doi.org/10.1088/1748-9326/ab8b12, 2020. 

Singh, V., Singh, S., and Biswal, A.: Exceedances and trends of particulate matter (PM2.5) in five Indian megacities, Sci. Total Environ., 750, 141461, https://doi.org/10.1016/j.scitotenv.2020.141461, 2021. 

Smith, A., Lott, N., and Vose, R.: The Integrated Surface Database: Recent Developments and Partnerships, B. Am. Meteorol. Soc., 92, 704–708, https://doi.org/10.1175/2011BAMS3015.1, 2011. 

Su, L., Gao, C., Ren, X., Zhang, F., Cao, S., Zhang, S., Chen, T., Liu, M., Ni, B., and Liu, M.: Understanding the spatial representativeness of air quality monitoring network and its application to PM2.5 in the mainland China, Geosci. Front., 13, 101370, https://doi.org/10.1016/j.gsf.2022.101370, 2022. 

Sun, E., Xu, X., Che, H., Tang, Z., Gui, K., An, L., Lu, C., and Shi, G.: Variation in MERRA-2 aerosol optical depth and absorption aerosol optical depth over China from 1980 to 2017, J. Atmos. Sol.-Terr. Phy., 186, 8–19, https://doi.org/10.1016/j.jastp.2019.01.019, 2019. 

Tan, S., Wang, Y., Yuan, Q., Zheng, L., Li, T., Shen, H., and Zhang, L.: Reconstructing global PM2.5 monitoring dataset from OpenAQ using a two-step spatio-temporal model based on SES-IDW and LSTM, Environ. Res. Lett., 17, 034014, https://doi.org/10.1088/1748-9326/ac52c9, 2022. 

Teixeira, A.: Analyse discriminante par arbre de décision binaire (CART: Classification And Regression Tree), Rev. Mal. Respir., 21, 1174–1176, https://doi.org/10.1016/S0761-8425(04)71596-X, 2004.

Theil, H.: A Rank-Invariant Method of Linear and Polynomial Regression Analysis, in: Henri Theil's Contributions to Economics and Econometrics: Econometric Theory and Methodology, edited by: Raj, B. and Koerts, J., Springer Netherlands, Dordrecht, 345–381, https://doi.org/10.1007/978-94-011-2546-8_20, 1992. 

Van Donkelaar, A., Martin, R. V., and Park, R. J.: Estimating ground-level PM2.5 using aerosol optical depth determined from satellite remote sensing, J. Geophys. Res., 111, D21201, https://doi.org/10.1029/2005JD006996, 2006. 

Van Donkelaar, A., Martin, R. V., Brauer, M., Kahn, R., Levy, R., Verduzco, C., and Villeneuve, P. J.: Global estimates of ambient fine particulate matter concentrations from satellite-based aerosol optical depth: development and application, Environ. Health Persp., 118, 847–855, https://doi.org/10.1289/ehp.0901623, 2010. 

Van Donkelaar, A., Martin, R. V., Brauer, M., and Boys, B. L.: Use of satellite observations for long-term exposure assessment of global concentrations of fine particulate matter, Environ. Health Persp., 123, 135–143, https://doi.org/10.1289/ehp.1408646, 2015. 

Van Donkelaar, A., Martin, R. V., Brauer, M., Hsu, N. C., Kahn, R. A., Levy, R. C., Lyapustin, A., Sayer, A. M., and Winker, D. M.: Global estimates of fine particulate matter using a combined geophysical-statistical method with information from satellites, models, and monitors, Environ. Sci. Technol., 50, 3762–3772, https://doi.org/10.1021/acs.est.5b05833, 2016. 

Van Donkelaar, A., Hammer, M. S., Bindle, L., Brauer, M., Brook, J. R., Garay, M. J., Hsu, N. C., Kalashnikova, O. V., Kahn, R. A., Lee, C., Levy, R. C., Lyapustin, A., Sayer, A. M., and Martin, R. V.: Monthly Global Estimates of Fine Particulate Matter and Their Uncertainty, Environ. Sci. Technol., 55, 15287–15300, https://doi.org/10.1021/acs.est.1c05309, 2021.

Verbeke, G. and Lesaffre, E.: A Linear Mixed-Effects Model with Heterogeneity in the Random-Effects Population, J. Am. Stat. Assoc., 91, 217–221, https://doi.org/10.1080/01621459.1996.10476679, 1996. 

Viana, M., Kuhlbusch, T. A. J., Querol, X., Alastuey, A., Harrison, R. M., Hopke, P. K., Winiwarter, W., Vallius, A., Szidat, S., Prevot, A. S. H., Hueglin, C., Bloemen, H., Wahlin, P., Vecchi, R., Miranda, A. I., Kasper-Giebl, A., Maenhaut, W., and Hitzenberger, R.: Source apportionment of particulate matter in Europe: A review of methods and results, J. Aerosol Sci., 39, 827–849, https://doi.org/10.1016/j.jaerosci.2008.05.007, 2008. 

Wang, K., Dickinson, R. E., and Liang, S.: Clear Sky Visibility Has Decreased over Land Globally from 1973 to 2007, Science, 323, 1468–1470, https://doi.org/10.1126/science.1167549, 2009. 

Wang, K. C., Dickinson, R. E., Su, L., and Trenberth, K. E.: Contrasting trends of mass and optical properties of aerosols over the Northern Hemisphere from 1992 to 2011, Atmos. Chem. Phys., 12, 9387–9398, https://doi.org/10.5194/acp-12-9387-2012, 2012. 

Wang, Q., Kwan, M.-P., Zhou, K., Fan, J., Wang, Y., and Zhan, D.: The impacts of urbanization on fine particulate matter (PM2.5) concentrations: Empirical evidence from 135 countries worldwide, Environ. Pollut., 247, 989–998, https://doi.org/10.1016/j.envpol.2019.01.086, 2019. 

Wang, Z., Li, J., Wang, Z., Yang, W., Tang, X., Ge, B., Yan, P., Zhu, L., Chen, X., Chen, H., Wand, W., Li, J., Liu, B., Wang, X., Wand, W., Zhao, Y., Lu, N., and Su, D.: Modeling study of regional severe hazes over mid-eastern China in January 2013 and its implications on pollution prevention and control, Sci. China Earth Sci., 57, 3–13, https://doi.org/10.1007/s11430-013-4793-0, 2014. 

Wei, J., Li, Z., Peng, Y., and Sun, L.: MODIS Collection 6.1 aerosol optical depth products over land and ocean: validation and comparison, Atmos. Environ., 201, 428–440, https://doi.org/10.1016/j.atmosenv.2018.12.004, 2019a. 

Wei, J., Huang, W., Li, Z., Xue, W., Peng, Y., Sun, L., and Cribb, M.: Estimating 1-km-resolution PM2.5 concentrations across China using the space-time random forest approach, Remote Sens. Environ., 231, 111221, https://doi.org/10.1016/j.rse.2019.111221, 2019b. 

Wei, J., Li, Z., Cribb, M., Huang, W., Xue, W., Sun, L., Guo, J., Peng, Y., Li, J., Lyapustin, A., Liu, L., Wu, H., and Song, Y.: Improved 1 km resolution PM2.5 estimates across China using enhanced space–time extremely randomized trees, Atmos. Chem. Phys., 20, 3273–3289, https://doi.org/10.5194/acp-20-3273-2020, 2020. 

Wei, J., Li, Z., Lyapustin, A., Sun, L., Peng, Y., Xue, W., Su, T., and Cribb, M.: Reconstructing 1-km-resolution high-quality PM2.5 data records from 2000 to 2018 in China: spatiotemporal variations and policy implications, Remote Sens. Environ., 252, 112136, https://doi.org/10.1016/j.rse.2020.112136, 2021. 

Wood, S. N., Pya, N., and Säfken, B.: Smoothing Parameter and Model Selection for General Smooth Models, J. Am. Stat. Assoc., 111, 1548–1563, https://doi.org/10.1080/01621459.2016.1180986, 2016. 

Wu, J., Zheng, H., Zhe, F., Xie, W., and Song, J.: Study on the relationship between urbanization and fine particulate matter (PM2.5) concentration and its implication in China, J. Clean. Prod., 182, 872–882, https://doi.org/10.1016/j.jclepro.2018.02.060, 2018. 

Wu, W. and Zhang, Y.: Effects of particulate matter (PM2.5) and associated acidity on ecosystem functioning: response of leaf litter breakdown, Environ. Sci. Pollut. R., 25, 30720–30727, https://doi.org/10.1007/s11356-018-2922-1, 2018. 

Xue, T., Zheng, Y., Tong, D., Zheng, B., Li, X., Zhu, T., and Zhang, Q.: Spatiotemporal continuous estimates of PM2.5 concentrations in China, 2000–2016: A machine learning method with inputs from satellites, chemical transport model, and ground observations, Environ. Int., 123, 345–357, https://doi.org/10.1016/j.envint.2018.11.075, 2019. 

Yang, X., Zhao, C., Yang, Y., Yan, X., and Fan, H.: Statistical aerosol properties associated with fire events from 2002 to 2019 and a case analysis in 2019 over Australia, Atmos. Chem. Phys., 21, 3833–3853, https://doi.org/10.5194/acp-21-3833-2021, 2021. 

Zeng, Z., Gui, K., Wang, Z., Luo, M., Geng, H., Ge, E., An, J., Song, X., Ning, G., and Zhai, S.: Estimating hourly surface PM2.5 concentrations across China from high-density meteorological observations by machine learning, Atmos. Res., 254, 105516, https://doi.org/10.1016/j.atmosres.2021.105516, 2021. 

Zhang, Q., Zheng, Y., Tong, D., Shao, M., Wang, S., Zhang, Y., Xu, X., Wang, J., He, H., Liu, W., Ding, Y., Lei, Y., Li, J., Wang, Z., Zhang, X., Wang, Y., Cheng, J., Liu, Y., Shi, Q., Yan, L., Geng, G., Hong, C., Li, M., Liu, F., Zheng, B., Cao, J., Ding, A., Gao, J., Fu, Q., Huo, J., Liu, B., Liu, Z., Yang, F., He, K., and Hao, J.: Drivers of improved PM2.5 air quality in China from 2013 to 2017, P. Natl. Acad. Sci. USA, 116, 24463–24469, https://doi.org/10.1073/pnas.1907956116, 2019.

Zhang, S., Wu, J., Fan, W., Yang, Q., and Zhao, D.: Review of aerosol optical depth retrieval using visibility data, Earth-Sci. Rev., 200, 102986, https://doi.org/10.1016/j.earscirev.2019.102986, 2020. 

Zhang, Z., Wu, W., Wei, J., Song, Y., Yan, X., Zhu, L., and Wang, Q.: Aerosol optical depth retrieval from visibility in China during 1973–2014, Atmos. Environ., 171, 38–48, https://doi.org/10.1016/j.atmosenv.2017.09.004, 2017. 

Zhao, B., Su, Y., He, S., Zhong, M., and Cui, G.: Evolution and comparative assessment of ambient air quality standards in China, J. Integr. Environ. Sci., 13, 85–102, https://doi.org/10.1080/1943815X.2016.1150301, 2016. 

Zhao, S., Yu, Y., Yin, D., He, J., Liu, N., Qu, J., and Xiao, J.: Annual and diurnal variations of gaseous and particulate pollutants in 31 provincial capital cities based on in situ air quality monitoring data from China National Environmental Monitoring Center, Environ. Int., 86, 92–106, https://doi.org/10.1016/j.envint.2015.11.003, 2016.  

Zhong, J., Zhang, X., Gui, K., Wang, Y., Che, H., Shen, X., Zhang, L., Zhang, Y., Sun, J., and Zhang, W.: Robust prediction of hourly PM2.5 from meteorological data using LightGBM, Natl. Sci. Rev., 8, nwaa307, https://doi.org/10.1093/nsr/nwaa307, 2021. 

Zhong, J., Zhang, X., Gui, K., Liao, J., Fei, Y., Jiang, L., Guo, L., Liu, L., Che, H., Wang, Y., Wang, D., and Zhou, Z.: Reconstructing 6-hourly PM2.5 datasets from 1960 to 2020 in China, Earth Syst. Sci. Data, 14, 3197–3211, https://doi.org/10.5194/essd-14-3197-2022, 2022. 

Short summary
In this study, daily PM2.5 concentrations are estimated from 1959 to 2022 using a machine learning method at more than 5000 terrestrial sites in the Northern Hemisphere based on hourly atmospheric visibility data, which are extracted from the Meteorological Terminal Aviation Routine Weather Report (METAR).