A homogenized daily in situ PM2.5 concentration dataset from the national air quality monitoring network in China

In situ PM2.5 concentration observations have long been used as critical data sources in haze-related studies. Due to frequently occurring haze pollution events, China started to regularly monitor PM2.5 concentration nationwide through its newly established air quality monitoring network in 2013. Nevertheless, acquiring these invaluable air quality samples is challenging given the absence of a publicly available data download interface. In this study, we provide a homogenized in situ PM2.5 concentration dataset created from hourly PM2.5 data retrieved from the China National Environmental Monitoring Center (CNEMC) via a web crawler between 2015 and 2019. Methods involving missing value imputation, change point detection, and bias adjustment were applied sequentially to deal with data gaps and inhomogeneities in raw PM2.5 observations. After excluding records with limited samples, a homogenized PM2.5 concentration dataset comprising 1309 5-year-long PM2.5 data series at a daily resolution was compiled. This is the first attempt to homogenize in situ PM2.5 observations in China. The trend estimations derived from the homogenized dataset indicate a spatially homogeneous decreasing tendency of PM2.5 across China at a mean rate of about −7.6 % per year from 2015 to 2019. In contrast to raw PM2.5 observations, the homogenized data record is not only temporally complete but also more consistent over space and time. This homogenized daily in situ PM2.5 concentration dataset is publicly accessible at https://doi.org/10.1594/PANGAEA.917557 (Bai et al., 2020a) and can be applied as a promising dataset for PM2.5-related studies such as satellite-based PM2.5 mapping, human exposure risk assessment, and air quality management.


Introduction
A consistent PM2.5 concentration dataset is vital to the analysis of variations in PM2.5 loadings over space and time, and it supports risk analysis for air quality management, meteorological forecasting, and health-related exposure assessment (Lelieveld et al., 2015; Yin et al., 2020). Ground-based monitoring networks are commonly built to measure concentrations of air pollutants across the globe.
Having suffered from extensive and severe haze pollution events in the past few years (Guo et al., 2014; Ding et al., 2016; Wang et al., 2016; Cai et al., 2017; Huang et al., 2018; Luan et al., 2018; Ning et al., 2018), China launched operational ambient air quality sampling in late 2012 on the basis of a sparsely distributed aerosol observation network. To date, this in situ network has been enlarged to about 1500 monitoring stations covering almost all major cities in China. Concentrations of six key air pollutants (PM2.5, PM10, NO2, SO2, CO, and O3) are routinely measured on an hourly basis, and the sampled data have been released publicly online by the China National Environmental Monitoring Center (CNEMC) since 2013.
Although in situ PM2.5 concentration data have played critical roles in improving our understanding of regional air quality variations and relevant influential factors (D. Q. Yang et al., 2019; Zheng et al., 2017), little concern has been raised about the quality of the dataset itself (Bai et al., 2019a, c; He and Huang, 2018; Zhang et al., 2018, 2019; Zou et al., 2016). Meanwhile, few studies have provided a detailed description of the accuracy or bias level (uncertainty) of the observed PM2.5 data in recent years (Xin et al., 2015; You et al., 2016; Guo et al., 2017; Shen et al., 2018). The primary reason is that neither quality assurance flags nor metadata documenting the uncertainty were provided alongside the data values, making such quality assessment infeasible.
Data quality, in particular data homogeneity, is of critical importance when exploiting a dataset, especially for trend analysis (Bai et al., 2019c; Liu et al., 2018; Ma et al., 2015) and data integration (Bai et al., 2019a, b; T. Li et al., 2017; Zhang et al., 2019), for which a homogeneous dataset is essential. Since two distinct kinds of instruments are used in the current air quality monitoring network to measure near-surface PM2.5 concentration in China (Bai et al., 2020b), imperfect instrumental calibration and intermittent replacement of instruments may introduce discontinuities in PM2.5 observations. Such inhomogeneity may result in large uncertainty and even biased results in subsequent analysis, especially in context-based and data-driven PM2.5 concentration mapping (Bai et al., 2019a, b; He and Huang, 2018; Wei et al., 2020), in which in situ PM2.5 concentration observations are used as the ground truth to characterize complex statistical relationships with other possible contributing factors.
Given the absence of an open-access and quality-assured in situ PM2.5 concentration dataset in China, in this study we attempted to generate a long-term coherent in situ PM2.5 concentration dataset for the scientific community to use in future applications. A set of methods involving missing value imputation, change point detection, and bias adjustment were combined in a big data analytic manner to improve data integrity and remove possible discontinuities in raw PM2.5 observations. Such an analytical process is also referred to as data homogenization in data science or big data analytics (Cao and Yan, 2012; Wang et al., 2007). To our knowledge, this is the first attempt to homogenize a large-scale dataset of in situ PM2.5 concentration observations in China. In the following sections, we introduce the data source as well as the detailed big data analytics methods used to create the homogenized PM2.5 concentration dataset.

In situ PM2.5 concentration observations
In this study, the hourly PM2.5 concentration data sampled from more than 1600 state-controlled air quality monitoring stations across China between 1 January 2015 and 31 December 2019 were utilized. These PM2.5 concentration data were measured on an hourly basis using either beta-attenuation monitors or a tapered element oscillating microbalance (TEOM) analyzer. The ordinary instrumental calibration and quality control were performed according to the national ambient air quality standards GB3095-2012 and HJ 618-2011 (Guo et al., 2009). Generally, TEOM can measure PM2.5 concentration within the range of 0-5000 µg m−3 at a resolution of 0.1 µg m−3, with a precision of ±0.5 µg m−3 for the 24 h average and ±1.5 µg m−3 for the hourly average (Xin et al., 2012, 2015). The PM2.5 measurements were publicly released online by the China National Environmental Monitoring Center (CNEMC) via the National Urban Air Quality Real-time Publishing Platform (http://106.37.208.233:20035/, last access: 10 November 2020) within 1 h after the direct sampling.
Although the sampled data were publicly released, acquiring these valuable samples is challenging because the CNEMC website provides no data download interface to the public. Users therefore cannot retrieve historical observations from the website. Rather, the science community has to rely on other measures, such as an automatic web crawler, to retrieve the data samples as they are updated on the data publishing platform. Nevertheless, the data records retrieved through such an approach suffered from significant data losses due to unexpected events such as power outages and internet interruptions. Consequently, data integrity becomes problematic, and further treatments such as gap filling are required to account for such defects.
Moreover, hourly PM2.5 concentration observations that were sampled at five US embassy and consulate sites in China from January 2015 to June 2017 were used as an independent dataset to evaluate the fidelity of the homogenized PM2.5 concentration dataset. Geographic locations of these five sites are shown in Table S1 in the Supplement. These PM2.5 data were measured independently under the US Department of State Air Quality Monitoring Program and can be acquired from http://www.stateair.net/ (last access: 10 November 2020). To be in line with the homogenized dataset, the hourly PM2.5 concentration data were aggregated to the daily level by averaging the 24 h observations sampled on each date, and daily averages were calculated only for days with more than 12 valid samples out of a possible 24.
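The daily aggregation rule described above (a daily mean is kept only when more than 12 of the 24 hourly samples are valid) can be sketched as follows. This is a minimal illustration using pandas, not the code used to build the dataset; the function name `daily_mean` and the synthetic two-day series are hypothetical.

```python
import numpy as np
import pandas as pd

def daily_mean(hourly: pd.Series, min_valid: int = 13) -> pd.Series:
    """Average hourly values to daily means.

    Days with fewer than `min_valid` valid (non-NaN) hours are set to NaN,
    i.e. a daily mean needs more than 12 valid samples by default.
    """
    grouped = hourly.resample("D")
    means = grouped.mean()    # NaN values are skipped in the average
    counts = grouped.count()  # number of non-NaN hours per day
    return means.where(counts >= min_valid)

# Hypothetical example: day 1 is complete, day 2 has only 12 valid hours.
idx = pd.date_range("2015-01-01", periods=48, freq=pd.Timedelta(hours=1))
values = np.full(48, 50.0)
values[36:] = np.nan
daily = daily_mean(pd.Series(values, index=idx))
```

In this example the first day keeps its mean, while the second day is rejected because exactly 12 valid hours do not satisfy the "more than 12" condition.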

Homogenization of in situ PM2.5 concentration data
Creating a long-term coherent in situ PM2.5 concentration dataset requires an analytical framework that chains several methods for missing value imputation, change point detection, and discontinuity adjustment, given the presence of data gaps and possible discontinuities in raw PM2.5 observations. Figure 1 shows a schematic illustration of the general workflow toward generating a homogenized PM2.5 concentration dataset, and the whole process can be outlined as follows.
1. Essential quality control and gap filling were performed on raw PM2.5 observations so that the bias arising from large outliers and resampling errors due to incomplete observations could be reduced.
2. Short-term time series due to site relocation were temporally merged to attain a long-term record. Then, PM2.5 concentration time series with a temporal coverage of less than 4 years during the study period were excluded. Subsequently, the quality-controlled observations of hourly in situ PM2.5 concentrations were resampled to daily and monthly scales to initiate the homogeneity test.
3. Reference time series were constructed for each long-term PM2.5 concentration record on the basis of data measured at adjacent monitoring sites. For PM2.5 concentration records failing to produce a reliable reference series, no homogeneity test was performed due to the absence of essential reference data series.
4. The discontinuities identified in each daily long-term PM2.5 concentration time series were corrected using the quantile-matching (QM) adjustment method according to the change points detected in each monthly data record with the support of reference series.
5. Post-processing measures such as nonpositive value correction and another round of gap filling were further performed on the homogenized records to attain a quality-assured in situ PM2.5 concentration dataset.
More details of each analytic method are described in the following subsections.

Quality control
Given the possible presence of abnormal samples, outliers detected in raw PM2.5 observations need to be removed to reduce the false alarm rate in change point detection during the subsequent homogeneity test. Specifically, hourly PM2.5 concentration values meeting either of the following criteria were excluded: (1) values outside the range of 1 to 1000 µg m−3 and (2) values more than 3 standard deviations from the median of observations within a 15 h time window. Both criteria aimed to remove large outliers which could result in biased daily averages. Overall, 3.46 % of PM2.5 samples were identified as outliers and were excluded accordingly (treated as missing values).
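A minimal sketch of these two screening criteria is given below. The text does not specify whether the 15 h window is centered or how the standard deviation is computed, so the centered rolling window, the rolling sample standard deviation, and the `min_periods` setting are assumptions made here for illustration; the function name `quality_control` is likewise hypothetical.

```python
import numpy as np
import pandas as pd

def quality_control(hourly: pd.Series, lo: float = 1.0, hi: float = 1000.0,
                    window: int = 15, n_sigma: float = 3.0) -> pd.Series:
    """Set outliers to NaN using a range check and a rolling-median check."""
    # Criterion 1: values outside [lo, hi] become missing.
    s = hourly.where((hourly >= lo) & (hourly <= hi))
    # Criterion 2 (assumed form): centered rolling median and standard deviation.
    roll = s.rolling(window, center=True, min_periods=window // 2)
    med = roll.median()
    std = roll.std()
    outlier = (s - med).abs() > n_sigma * std
    return s.mask(outlier)

# Hypothetical hourly series with one zero, one spike, and one out-of-range value.
values = np.where(np.arange(48) % 2 == 0, 40.0, 42.0)
values[5] = 0.0      # below the valid range
values[20] = 500.0   # within range but far from its neighbors
values[40] = 1500.0  # above the valid range
cleaned = quality_control(pd.Series(values))
```

The three injected values are flagged as missing, while the regular 40/42 µg m−3 samples around them are retained.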

Gap filling and resampling
As indicated in our recent study (Bai et al., 2020b), missing-value-related data gaps are a major obstacle in the exploitation of raw PM2.5 observations retrieved from the CNEMC website, as PM2.5 observations on 40 % of sampling days suffered from data losses due to unexpected reasons. To reduce the impact of missing-value-related sampling bias (from hourly to daily) on the subsequent homogeneity test, we filled the data gaps found in each 24 h PM2.5 observation using our recently developed diurnal cycle constrained empirical orthogonal function (DCCEOF) method (Bai et al., 2020b). This gap filling increased the percentage of days without missing data during the study period from 58.8 % to 97.3 %.
In spite of the improvement of data integrity after gap filling, the resultant PM2.5 time series remain temporally discontinuous due to several long-lasting (e.g., more than 24 consecutive hours) missing-data episodes. Also, the hourly time series are still too noisy to be handled by the current homogeneity test software due to the significant variation in PM2.5 concentration over space and time. In such a context, the hourly PM2.5 concentration records were aggregated to daily and monthly scales to initiate the homogeneity test. The monthly series was primarily used to detect possible change points, while the daily series was adjusted against the corresponding reference series according to the change points detected from the monthly series. To avoid a large resampling bias, monthly averages were calculated only for months with at least 20 valid daily means at each site. The frequency of missing values in each month was also calculated as possible metadata information to support the examination of the detected change points.
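The monthly resampling rule (at least 20 valid daily means per month) and the per-month missing-value frequency can be sketched as below. This is an illustrative pandas snippet under our own naming (`monthly_mean`, `missing_frac`), not the processing code behind the dataset.

```python
import numpy as np
import pandas as pd

def monthly_mean(daily: pd.Series, min_days: int = 20) -> pd.DataFrame:
    """Monthly means from daily means, with the fraction of missing days.

    Months with fewer than `min_days` valid daily values get a NaN mean.
    """
    by_month = daily.groupby(daily.index.to_period("M"))
    n_valid = by_month.count()  # non-NaN days in each month
    n_total = by_month.size()   # all calendar days present in the index
    return pd.DataFrame({
        "mean": by_month.mean().where(n_valid >= min_days),
        "missing_frac": 1.0 - n_valid / n_total,
    })

# Hypothetical example: January is complete, February lacks its first 10 days.
idx = pd.date_range("2015-01-01", "2015-02-28", freq="D")
values = np.full(idx.size, 60.0)
values[31:41] = np.nan
monthly = monthly_mean(pd.Series(values, index=idx))
```

February has only 18 valid days here, so its monthly mean is withheld while its missing-value frequency (10 of 28 days) is still recorded.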

Homogeneity test
A commonly used homogeneity test software, the RHtestsV4 package, was applied to detect possible discontinuities in the raw PM2.5 data series retrieved from the CNEMC website. As suggested in Wang and Feng (2013), RHtestsV4 is capable of detecting and adjusting change points in a data series with first-order autoregressive errors. Given their low false alarm rate in change point detection and their capability to adjust discontinuities, the RHtests software packages have been widely used to homogenize climate data records such as temperature (Xu et al., 2013; Zhao et al., 2014), precipitation (Wang et al., 2010a; Nie et al., 2019), and other data like boundary layer height. Two typical methods, namely PMTred and PMFred, are embedded in a recursive testing algorithm in RHtestsV4, with the former relying on the penalized maximal t test (PMT) and the latter on the penalized maximal F test (PMF) (Wang et al., 2007; Wang, 2008a). With the incorporation of these empirical penalty functions (Wang, 2008a, b), the problem of the uneven distribution of the false alarm rate is largely alleviated in RHtestsV4. In contrast to the PMF, which works without a reference series, the PMT uses a reference series to detect change points, and the results are thus far more reliable (Wang, 2008a, b). The way to generate reference series is described in the next subsection. RHtestsV4 is also capable of making essential adjustments to the detected discontinuities by means of the QM adjustment method.
Here the PMT method rather than the PMF was used to detect change points, given the higher confidence of the former due to the involvement of reference series. To ensure the reliability of detected discontinuities, change points were defined and confirmed at a nominal 99 % confidence level, and a data record was declared homogeneous if no change point was identified. Subsequently, the QM adjustment method was applied, with the support of reference series, to correct PM2.5 observations with evident drifts, namely, to homogenize the PM2.5 concentration data series. To avoid large sampling uncertainty in the estimate of QM adjustments, Mq (i.e., the number of categories on which the empirical cumulative distribution function is estimated) was automatically determined by the software to ensure adequate samples for the estimation of the mean difference and probability density function. Meanwhile, the number determining the base segment (i.e., Iadj) was set to zero so that data in all other segments were adjusted to the segment with the longest temporal coverage.
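To illustrate the idea behind a maximal t-type test on a base-minus-reference series, the sketch below scans all candidate break positions and returns the one with the largest two-sample t statistic. It is deliberately simplified and is not RHtestsV4: it omits the empirical penalty function, the treatment of first-order autocorrelation, the significance test at the 99 % level, and the recursive search for multiple change points. The function name `max_t_changepoint` and the synthetic series are hypothetical.

```python
import numpy as np

def max_t_changepoint(diff, min_seg: int = 5):
    """Return (k, t_max) for the most likely single mean shift in `diff`.

    `diff` is a base-minus-reference series; the shift is located between
    diff[k-1] and diff[k]. Unpenalized, single change point only.
    """
    x = np.asarray(diff, dtype=float)
    n = x.size
    best_k, best_t = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        a, b = x[:k], x[k:]
        # Pooled standard deviation of the two segments.
        ss = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
        sp = np.sqrt(ss / (n - 2))
        if sp == 0.0:
            continue
        t = abs(a.mean() - b.mean()) / (sp * np.sqrt(1.0 / k + 1.0 / (n - k)))
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t

# Hypothetical 60-month difference series with a +5 shift starting at month 30.
noise = np.where(np.arange(60) % 2 == 0, 0.1, -0.1)
series = noise + np.where(np.arange(60) >= 30, 5.0, 0.0)
k, t_max = max_t_changepoint(series)
```

Working on the difference between the base and reference series removes the shared regional signal (e.g., seasonality), which is why a reference-based test can isolate station-specific shifts more reliably than a test on the base series alone.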

Construction of reference series
A good reference series is vital to the relative homogeneity test because it helps pinpoint possible discontinuities in each base series (the data series to be tested) and determines the performance of the subsequent data adjustment. In general, reference series can be organized by using one specific record either measured from one adjacent station or aggregated from multiple observations (Cao and Yan, 2012;Peterson and Easterling, 1994;Xu et al., 2013;Wang et al., 2016). The most straightforward way is to use the neighboring data series either measured at the nearest station or series that are highly correlated with the base series (Peterson and Easterling, 1994;Cao and Yan, 2012;Wang and Feng, 2013). Such methods, however, fail to take the representativeness of the neighboring series into account since the neighboring series may also suffer from discontinuities.
To avoid the misuse of inhomogeneous PM2.5 concentration records as reference series, a complex yet robust data integration scheme was developed to screen, organize, and construct reference series for each in situ PM2.5 concentration data series. For each daily PM2.5 concentration data series, all neighboring series within a lag distance of up to 50 km were first identified. If no neighboring series was available within the given radius, no reference series was constructed, and in turn the homogeneity of the given record was not examined. Otherwise, both the correlation coefficient (R) and the coefficient of variation (CV) were calculated between the given base series and each selected neighboring series to assess their representativeness (Shi et al., 2018; Rodriguez et al., 2019). Then, neighboring series with R greater than 0.8 and CV smaller than 0.2 were selected as candidates to construct the reference series for a given base series.
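The candidate screening step can be sketched as follows. The text does not give the exact formula for the CV between two series, so the version below, the root mean square difference divided by the mean of the base series, is an assumption made purely for illustration; the function name `select_candidates` and the synthetic neighbors are hypothetical.

```python
import numpy as np

def select_candidates(base, neighbors: dict, r_min: float = 0.8,
                      cv_max: float = 0.2) -> list:
    """Return names of neighboring series with R > r_min and CV < cv_max."""
    base = np.asarray(base, dtype=float)
    selected = []
    for name, series in neighbors.items():
        s = np.asarray(series, dtype=float)
        ok = ~np.isnan(base) & ~np.isnan(s)  # use overlapping valid days only
        if ok.sum() < 3:
            continue
        r = np.corrcoef(base[ok], s[ok])[0, 1]
        # Assumed CV definition: RMS difference normalized by the base mean.
        cv = np.sqrt(np.mean((s[ok] - base[ok]) ** 2)) / base[ok].mean()
        if r > r_min and cv < cv_max:
            selected.append(name)
    return selected

# Hypothetical base series and three neighbors.
t = np.linspace(0.0, 4.0 * np.pi, 200)
base = 50.0 + 20.0 * np.sin(t)
neighbors = {
    "A": base * 1.05,               # well correlated, similar level
    "B": 50.0 + 20.0 * np.cos(t),   # similar level, poorly correlated
    "C": base * 2.0,                # well correlated, very different level
}
candidates = select_candidates(base, neighbors)
```

Only neighbor A satisfies both thresholds; B fails the correlation test and C fails the CV test, which shows why the two criteria are applied jointly.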
If there was only one candidate series, the reference series was constructed by averaging the base and the candidate series at each observation time. If there was more than one candidate series, the empirical orthogonal function (EOF) method was applied to these candidates, and the original fields were then reconstructed with the leading principal components whose accumulated explained variance exceeded 80 %. This was expected to reduce the possible impacts of abnormal observations and short-term discontinuities in the neighboring candidates on the resultant reference series. Subsequently, the reference series was constructed through a spatial weighting scheme in which each reconstructed record was assigned a spatially resolved weight according to its distance from the base series. Here we applied a Gaussian kernel function to estimate the weight of each neighboring observation that can influence the base series in space, and such a scheme has been proven to be effective in assessing the spatial autocorrelation of PM2.5 concentration (Bai et al., 2019b). Mathematically, the reference series PM ref is constructed by weighting the N candidate series PM cand, where w is the spatially resolved weight assigned to each candidate series and d is the spatial lag distance between the base and the corresponding candidate series. h is a spatial correlation length that is used to modulate the relative influence of a distant observation on the data measured at the base site. In this study, an empirical value of 50 km was used according to the estimated semi-variogram results (Bai et al., 2019b).
For any record having neighboring series within 50 km but poorly correlated with all of them (R < 0.8 or CV > 0.2, meaning the base series differs from its neighbors), the reference series was created by following the same procedures as those detailed above, taking the nearest neighbor as the base series. For the situation with only one candidate series available, it is logical to compare both the base and the candidate series against other data to check which one should be corrected. In this study, the PM2.5 time series estimated from the MERRA-2 aerosol reanalysis in the same way as described in He et al. (2019) was used. The series with the higher correlation to this external PM2.5 time series was then used as the reference (deemed homogeneous), while the other was considered the base series (that needs to be adjusted). Such an inclusive scheme enabled us to screen and construct reference series for 1262 long-term PM2.5 concentration records. In contrast, no reference series were constructed for 47 isolated records.
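A minimal sketch of a Gaussian-kernel spatial weighting is given below. The exact functional form is an assumption here: weights are taken as w_i = exp(−d_i² / (2h²)) and the reference series as the weight-normalized sum of the N candidate series; the published equations may differ in normalization. The EOF reconstruction step is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def gaussian_weights(dist_km, h_km: float = 50.0) -> np.ndarray:
    """Assumed Gaussian kernel: w = exp(-d^2 / (2 h^2))."""
    d = np.asarray(dist_km, dtype=float)
    return np.exp(-d ** 2 / (2.0 * h_km ** 2))

def reference_series(candidates, dist_km, h_km: float = 50.0) -> np.ndarray:
    """Weighted mean of candidate series (shape N x T), ignoring NaN values."""
    c = np.asarray(candidates, dtype=float)
    w = gaussian_weights(dist_km, h_km)[:, None]
    valid = ~np.isnan(c)
    num = np.where(valid, c * w, 0.0).sum(axis=0)
    den = (valid * w).sum(axis=0)  # sum of weights of the valid candidates
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)

# Hypothetical example: two candidates at 0 km and 50 km from the base site;
# the second candidate is missing at the second time step.
cands = [[40.0, 40.0],
         [60.0, np.nan]]
ref = reference_series(cands, dist_km=[0.0, 50.0])
```

With h = 50 km, a candidate located 50 km away receives a weight of exp(−0.5), about 0.61 relative to a collocated candidate, so distant stations still contribute but with reduced influence.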

Post-processing measures
Several post-processing measures were applied to the adjusted data records to further improve the quality of this dataset. Since nonpositive values may appear in the QM-adjusted data series if the original values are close to zero (Wang et al., 2010b), nonpositive values were replaced with the smallest valid PM2.5 concentration measured at each monitoring site during the study period. Subsequently, the data gaps in the adjusted data due to long-term missing values were filled by first calibrating the corresponding data values in the reference series measured on the same date (if available) to the homogenized datum level. The modified quantile-quantile adjustment (MQQA) method proposed in Bai et al. (2016) was used given its adaptive data adjustment principle. For the predicted values, such an MQQA scheme rendered higher accuracy than interpolation from data values measured on adjacent dates because PM2.5 concentration is more strongly correlated in space than in time (Bai et al., 2019b). The remaining data gaps were reconstructed in a procedure similar to the DCCEOF method (Bai et al., 2020b). Note that the matrix used for EOF analysis in the context of DCCEOF was constructed using the neighboring data series measured within a radius of 100 km with a temporal lag of 30 d at most. Finally, all data values were rounded to integers to be in line with the original PM2.5 concentration observations.
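The nonpositive value correction and the final rounding can be sketched as below; the MQQA and DCCEOF gap-filling steps are not reproduced. The function name `postprocess` and the argument `site_min_valid` (the smallest valid raw concentration at the site) are hypothetical.

```python
import numpy as np

def postprocess(adjusted, site_min_valid: float) -> np.ndarray:
    """Replace nonpositive adjusted values and round to integers."""
    out = np.asarray(adjusted, dtype=float).copy()
    # NaN compares as False, so remaining gaps stay NaN for later filling.
    out[out <= 0] = site_min_valid
    # np.round rounds exact halves to the nearest even integer.
    return np.round(out)

# Hypothetical adjusted values for one site whose smallest valid raw value is 2.
result = postprocess([-1.2, 0.0, 3.6, np.nan], site_min_valid=2.0)
```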

Descriptive statistics
Prior to data homogenization, short-term and less reliable records first need to be excluded. Figure 2 shows the temporal variations in the number of air quality monitoring stations deployed in China during 2015-2019 as well as the spatial patterns of the frequency of missing values for each long-term PM2.5 concentration record. A total of about 1630 air quality monitoring stations had been deployed in China before 2020. Nevertheless, only about 1500 sites routinely providing PM2.5 observations have remained in operation since 2015 (Fig. 2a). In terms of the data continuity of PM2.5 observations, 100 monitoring stations had been withdrawn before 2020, as no PM2.5 observations were provided for more than three consecutive months after the release of their last valid data (Fig. 2b). Meanwhile, 42 pairs of stations were found to be relocated, since new stations nearby started to provide PM2.5 observations soon after the suspension of the original site. This is also corroborated by the temporal lags of PM2.5 observations between original and newly deployed stations, as many of them were found to have a time lag of less than 15 d. Also, 94 sites were found to have limited data records due to short temporal coverage (newly deployed). Finally, 1353 long-term PM2.5 concentration records with their first valid data released earlier than 2015 were identified. In regard to the frequency of missing values, data gaps were evident in these long-term PM2.5 concentration records, with about 6 % of hourly data values missing on ∼ 47 % of sampling days on average. This also motivated us to first fill such data gaps to improve the data integrity.

Homogenization of in situ PM2.5 data
A total of 1395 long-term (with 5-year observations) PM2.5 concentration records were acquired with the inclusion of 42 temporally merged data series at the relocated stations. After removing those suffering from more than three consecutive months of data losses, 1309 long-term yet consecutive PM2.5 concentration records were obtained. The homogeneity test was finally performed on the 1262 records for which reference series were available. Figure 3 shows the spatial patterns of the total number of change points detected in the 1262 monthly PM2.5 concentration records. The ubiquitous change points imply that there is an obvious inhomogeneity in this in situ PM2.5 concentration dataset. About 57 % (719 out of 1262) of the records failed to pass the homogeneity test due to the presence of change points. Given the overall good agreement between the base and reference series (refer to Fig. S1 for the correlation coefficient and root mean square error between them), this indicates that these PM2.5 concentration records did suffer from evident discontinuities. Meanwhile, the vast majority (∼ 80 %) of the inhomogeneous PM2.5 records suffered from no more than two change points (Fig. 3), suggesting that a mean shift could be the primary reason for the detected discontinuities. Moreover, 20 records were found to have at least five significant change points, indicating pronounced discontinuities in these records. Figure 4 shows the temporal variability of the number of change points detected in monthly PM2.5 concentration records. As indicated, change points were detected in every specific month of the year from May 2015 to July 2019, especially in late spring (e.g., May), in which change points were more likely to be detected (Fig. 4b). This is attributable to the seasonality of PM2.5 loading in China, as high PM2.5 concentrations are always observed in winter whereas low values are observed in summer. Consequently, change points were more likely to be detected during the seasonal transition periods (e.g., spring to summer). In addition, it is noteworthy that a large number of change points were detected in early 2015, indicating the existence of pronounced discontinuities during this period (Fig. 4a). An inspection of the temporal variations in PM2.5 concentration shows that PM2.5 observations deviated substantially from each other during this period. This could be linked to imperfect instrument calibration or irregular operation in the early stages.
Due to the lack of essential metadata information, it is challenging to verify each detected change point through manual inspection. Rather, the variations in the base and reference series were explored to identify the possible reasons for the detected discontinuities. Figure 5 presents three typical inhomogeneous PM2.5 time series with different numbers of change points. The inter-comparisons between the base and reference series indicate an overall good agreement in terms of the long-term variation tendency. However, drifts were still pronounced in their residual series and were even more evident in their mean-shift series. For example, both the residual and mean-shift series shown in Fig. 5d clearly illustrate a typical discontinuity, as there was an obvious departure of the mean PM2.5 concentration level during the period of January to October 2016. In contrast, Fig. 5b and e present another typical inhomogeneity, as a statistically significant decreasing trend was found in the residual series, with monthly PM2.5 concentration deviations decreasing stepwise from nearly 5 to −4 µg m−3. Such inhomogeneity would undoubtedly result in a large bias in the trend estimations over that region. Figure 5c and f show the change points detected in the merged PM2.5 time series at a pair of relocated sites. It is noteworthy that the detected discontinuity should be largely ascribed to the inconsistency that emerged in the first data series rather than to the site relocation. Figure 6 shows the estimated linear trends for PM2.5 residual series that failed to pass the homogeneity test. Approximately 89 % of the residual series were found to exhibit statistically significant linear trends, suggesting the vital importance of homogenizing such PM2.5 concentration records, as the trend estimations at these stations could be prone to large bias without essential adjustments.
Further comparisons of the percentage of data gaps between homogeneous and inhomogeneous records (Fig. S2) as well as the spatial distance between the base and the reference series (Fig. S3) indicate that neither the frequency of data gaps nor the lag distance in space has an obvious impact on the change point detection. In other words, the detected change points have no linkage with either the missing value frequency or the spatial distance between the base and neighboring series, suggesting a high confidence level of the identified discontinuities in these PM2.5 concentration records.
Given the emergence of obvious discontinuities in more than half of the selected long-term PM2.5 concentration records, the QM adjustment method was applied to correct the discontinuities detected in each PM2.5 concentration record. Figure 7 shows an example of homogenization of a PM2.5 concentration data series that suffered from evident drifts from its reference (large drifts shown in Fig. 5d). The inter-comparisons of PM2.5 concentration data between the base and reference series indicate that the PM2.5 concentration level was obviously underestimated by the raw observations compared with the reference, especially during the middle of 2016 (Fig. 7a). Such evident drifts were remarkably diminished after the homogenization (Fig. 7b), which shows a good agreement of the mean PM2.5 concentration level between the homogenized data and the reference series.

Validation with independent dataset
In this study, PM2.5 observations that were collected independently at five US embassy and consulate sites distributed throughout five major Chinese cities between 2015 and 2017 were used to evaluate the consistency of the derived PM2.5 concentration records. Figure 8 shows site-specific comparisons of daily PM2.5 concentration between homogenized and observed data in Beijing, Shanghai, Chengdu, Shenyang, and Guangzhou. The homogenized daily PM2.5 concentration data are in good agreement with the PM2.5 observations sampled at these US sites, with a correlation coefficient of > 0.95 and a root mean square error of < 15 µg m−3. Given the independent measurement of PM2.5 concentration data at these US sites, we argue that the homogenized PM2.5 records are accurate enough to characterize the variability of PM2.5 loadings in China. It is also noteworthy that the homogenized PM2.5 records are temporally complete, whereas missing values are found in the PM2.5 observations sampled at the US sites.

PM2.5 trends estimated from the homogenized dataset
A homogenized data record is essential to trend analysis. Figure 9 presents the annual mean concentration of PM2.5 across China between 2015 and 2019. As shown, there was a pronounced reduction of PM2.5 concentration in China in the past 5 years, especially over the North China Plain (the region outlined by a red rectangle shown in Fig. 9f), where the annual mean PM2.5 concentration decreased from more than 100 µg m−3 in 2015 to about 60 µg m−3 in 2019. Such an evident decrease in PM2.5 concentration clearly demonstrates the effectiveness of the clean air actions that were implemented in recent years.
To evaluate the benefits of data homogenization on PM2.5 trend estimations, PM2.5 trends estimated from both the raw observations and the homogenized dataset were compared. Prior to trend analysis, each PM2.5 concentration record was standardized in reference to its mean annual cycle (i.e., PM2.5 concentration on the same date of the year between 2015 and 2019 was averaged) to reduce the impacts of seasonality and spatial variations. Figure 10 shows a site-specific comparison of PM2.5 trend estimations derived from raw observations and homogenized datasets during 2015-2019. In general, trend estimations from both datasets showed an evident decreasing tendency of PM2.5 concentration across China during the study period. Nevertheless, it is noteworthy that trend estimations derived from raw PM2.5 observations suffered from obvious inhomogeneity over space, as evidenced by antiphase (positive versus negative) trend estimations even at adjacent stations, especially for those that had positive trends while all adjacent neighbors exhibited negative trends. These antiphase trend estimations over a small region also corroborate the existence of obvious inhomogeneity in the raw in situ PM2.5 concentration dataset.

Figure 5. (a, d) Significant deviations during a short time period, (b, e) long-term chronic drifts with a statistically significant trend detected in the residual series, and (c, f) discontinuity due to site relocation. The left panels compare the base series with the reference and the neighboring series used to compose the reference, while the right panels show the residual series between the base and reference series as well as their mean-shift series.
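One plausible reading of the standardization and trend estimation is sketched below: each daily value is expressed as a percentage anomaly relative to the multi-year mean for the same calendar date, and a linear trend (in % per year) is fitted by ordinary least squares. The exact normalization and the trend estimator are not specified in the text, so both are assumptions here, as are the function name `relative_trend` and the synthetic series.

```python
import numpy as np
import pandas as pd

def relative_trend(daily: pd.Series) -> float:
    """Linear trend (% per year) of anomalies relative to the mean annual cycle."""
    # Mean annual cycle: average over all years for each (month, day) pair.
    keys = [daily.index.month.to_numpy(), daily.index.day.to_numpy()]
    clim = daily.groupby(keys).transform("mean")
    # Assumed standardization: percentage departure from the annual cycle.
    anom = (100.0 * (daily - clim) / clim).to_numpy()
    years = np.asarray((daily.index - daily.index[0]).days, dtype=float) / 365.25
    ok = ~np.isnan(anom)
    slope, _intercept = np.polyfit(years[ok], anom[ok], 1)
    return float(slope)

# Hypothetical 5-year daily series: seasonal cycle times a declining level.
idx = pd.date_range("2015-01-01", "2019-12-31", freq="D")
t = np.arange(idx.size) / 365.25
season = 1.0 + 0.3 * np.cos(2.0 * np.pi * idx.dayofyear.to_numpy() / 365.25)
series = pd.Series(100.0 * (1.0 - 0.07 * t) * season, index=idx)
trend = relative_trend(series)
```

Dividing by the annual cycle removes most of the seasonal signal and puts stations with very different concentration levels on a common relative scale, so trends can be compared across sites; the synthetic series yields a negative trend of several percent per year.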
The scattered antiphase trend estimations were substantially diminished after data homogenization, resulting in a spatially much more homogeneous decreasing tendency of PM 2.5 concentration across China (Fig. 10b). After data homogenization, the magnitude of the national mean PM 2.5 trend increased from −7.01 % a −1 to −7.25 % a −1 while the uncertainty was reduced from 0.25 % a −1 to 0.22 % a −1. Also, the number of PM 2.5 records with statistically significant trends increased from 1208 to 1248. These results collectively demonstrate the effectiveness of the QM adjustment method in mitigating data inhomogeneity in PM 2.5 observations and highlight the critical importance of data homogenization in accounting for discontinuities in this in situ PM 2.5 concentration dataset. Overall, our results indicate an obvious decreasing trend of PM 2.5 concentration in China over the 5-year period at a mean rate of −7.25 ± 0.22 % a −1. Table 1 further compares the regional mean PM 2.5 trend between 2015 and 2019. Compared with other regions of interest (ROIs) such as the Pearl River Delta (PRD; refer to Fig. S4 for the loca-
To further assess the improvement of the data quality after homogenization, the daily in situ PM 2.5 concentration records were grouped into 1° × 1° grid cells across China. In each grid cell, the regional mean correlation coefficient among PM 2.5 concentration time series and the standard deviation of PM 2.5 trends were estimated from the raw and the homogenized daily PM 2.5 concentration time series, respectively. Their relative differences were then calculated to show the improvement of data homogeneity within each grid cell. As shown in Fig. 11, the correlation among PM 2.5 concentration data was enhanced ubiquitously after homogenization, especially in the southwest of China (e.g., Yunnan) where obvious inhomogeneity was observed in the raw PM 2.5 observations (Fig. 10a).
Meanwhile, the standard deviation of PM 2.5 trends within each grid cell was also substantially reduced, in some cases by more than a factor of two (Fig. 11b). These results also highlight the critical need to homogenize the observed PM 2.5 concentration data from a large-scale monitoring network to reduce temporal inconsistency and spatial inhomogeneity that had previously gone unnoticed.
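The grid-cell comparison described above can be illustrated with a short sketch. It is written under stated assumptions rather than as the implementation used in this study: stations are assigned to 1° × 1° cells by flooring their coordinates, the regional mean correlation is taken as the mean of all pairwise Pearson coefficients within a cell, and the relative difference is expressed in percent of the raw value. The helper names (`grid_cell_stats`, `relative_difference`, `make_stations`), the station coordinates, and all numbers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import floor, sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def grid_cell_stats(stations):
    """Return {cell: (mean pairwise correlation, std of trends)}.

    Cells holding fewer than two stations are skipped because no pairwise
    correlation can be formed.
    """
    cells = defaultdict(list)
    for s in stations:
        cells[(floor(s["lon"]), floor(s["lat"]))].append(s)
    stats = {}
    for cell, members in cells.items():
        if len(members) < 2:
            continue
        r = mean(pearson(a["series"], b["series"])
                 for a, b in combinations(members, 2))
        sd = pstdev(m["trend"] for m in members)
        stats[cell] = (r, sd)
    return stats


def relative_difference(after, before):
    """Percent change of `after` relative to `before`."""
    return 100.0 * (after - before) / abs(before)


def make_stations(third_series, third_trend):
    """Three stations in one cell plus one isolated station (made-up values)."""
    base = [10, 12, 15, 11, 9, 14, 13, 10]
    return [
        {"lon": 116.3, "lat": 39.9, "series": base, "trend": -7.0},
        {"lon": 116.4, "lat": 39.5, "series": [v + 1 for v in base], "trend": -7.2},
        {"lon": 116.8, "lat": 39.1, "series": third_series, "trend": third_trend},
        {"lon": 121.5, "lat": 31.2, "series": base, "trend": -6.5},
    ]


# Raw case: the third station carries an artificial mean shift and a positive
# (antiphase) trend. Homogenized case: the shift and the trend sign are corrected.
raw = make_stations([10, 12, 15, 11, 29, 34, 33, 30], 3.0)
hom = make_stations([12, 14, 17, 13, 11, 16, 15, 12], -6.8)

raw_stats = grid_cell_stats(raw)
hom_stats = grid_cell_stats(hom)
cell = (116, 39)
raw_r, raw_sd = raw_stats[cell]
hom_r, hom_sd = hom_stats[cell]
print(f"correlation change: {relative_difference(hom_r, raw_r):.1f} %")
print(f"trend std change:   {relative_difference(hom_sd, raw_sd):.1f} %")
```

In this toy case the mean shift at one station lowers the within-cell correlation and inflates the spread of trends, and correcting it reverses both effects, which is the qualitative behavior reported for Fig. 11.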

Data availability
The raw observations of in situ PM 2.5 concentration data in China used in this study were retrieved via a web crawler from the National Urban Air Quality Real-time Publishing Platform (http://106.37.208.233:20035, China National Environmental Monitoring Center, 2020) between 2014 and 2019. Given the deployment of many new monitoring sites in 2014, we decided to generate a coherent PM 2.5 concentration dataset starting from 2015 to include as many PM 2.5 data records as possible. The homogenized daily in situ PM 2.5 concentration dataset developed in this study is publicly accessible at https://doi.org/10.1594/PANGAEA.917557 (Bai et al., 2020a). To provide a long-term coherent PM 2.5 concentration dataset to the scientific community, the homogenized PM 2.5 concentration dataset will be updated every half year by including the new PM 2.5 observations retrieved during the preceding 6 months.

Conclusions
In this study, a homogenized and temporally complete daily in situ PM 2.5 concentration dataset was generated from the discrete hourly PM 2.5 concentration records that were retrieved from the China National Urban Air Quality Real-time Publishing Platform using a web crawler during the period of 2015-2019. To create such a long-term coherent dataset, a set of analytic methods was integrated and applied sequentially to the retrieved raw PM 2.5 concentration records, involving quality control, gap filling, data merging, change point detection, and bias correction. This new dataset could help the scientific community better elucidate the temporal and spatial variability of haze pollution in China in recent years, which is expected to improve the understanding of its underlying causes.
The raw PM 2.5 concentration records were found to suffer from pronounced inhomogeneity caused by data inconsistency and temporal discontinuity as well as the relocation and removal of a number of monitoring stations. More than half of the long-term PM 2.5 concentration records failed the homogeneity test due to the presence of significant change points. Further investigation confirms that large yet short-term mean shifts and chronic drifts are the two primary reasons for the detected discontinuities in raw PM 2.5 concentration records. Based on the homogenized dataset, the long-term trends of PM 2.5 concentration in China were estimated. In contrast to the inhomogeneous trend estimations derived from raw PM 2.5 concentration records, the homogenized dataset yielded a spatially much more homogeneous decreasing tendency of PM 2.5 concentration across China at a mean rate of about −7.3 % per year. Such an improvement in homogeneity was also evidenced by the enhanced correlation and reduced standard deviation of trend estimations among neighboring homogenized PM 2.5 concentration time series. These results clearly demonstrate the benefits of data homogenization for the quality of this PM 2.5 concentration dataset, as evident discontinuities were removed after homogenization. Overall, our results clearly indicate the presence of discontinuities in the raw in situ PM 2.5 concentration observations measured in China, and homogenization is essential to the acquisition of a long-term coherent PM 2.5 concentration dataset that can be used to advance PM 2.5 pollution-related policy making and public health risk assessment.
Figure 11. Spatial distributions of (a) the improvements of mean correlation coefficient among PM 2.5 concentration records before and after homogenization at a 1° × 1° grid cell resolution in the study area, and (b) their corresponding standard deviations of PM 2.5 trends.
Author contributions. The study was completed with cooperation between all authors. JG and KB conceived the idea of generating a homogeneous PM 2.5 dataset across China. KB and KL conducted the data analyses and KB wrote the paper. All authors discussed the experimental results and helped review the paper.
Competing interests. The authors declare that they have no conflict of interest.
Review statement. This paper was edited by David Carlson and reviewed by two anonymous referees.