LGHAP v2: A global gap-free aerosol optical depth and PM2.5 concentration dataset since 2000 derived via big earth data analytics
Abstract. The Long-term Gap-free High-resolution Air Pollutants concentration dataset (LGHAP) provides spatially contiguous daily aerosol optical depth (AOD) and particulate matter (PM) concentrations at 1-km grid resolution over China since 2000. This advance has enabled unprecedented assessments of aerosol variations and their impacts on the environment, health, and climate in the past few years. However, such a MODIS-like gap-free high-resolution AOD and PM2.5 concentration dataset still needs to be improved with new, robust features. In this study, we present version 2 of the LGHAP dataset (LGHAP v2), a global-scale dataset generated using an improved big earth data analytics approach that seamlessly integrates distinct data science, pattern recognition, and deep learning methods. To better reconstruct the global AOD distribution from daily MODIS AOD imagery, multimodal AODs and air quality measurements acquired from relevant satellites, ground monitoring stations, and numerical models across the globe over the past two decades were first harmonized using random forest-based data-driven models. Then, an improved tensor-flow-based AOD reconstruction algorithm was developed to weave the harmonized multi-source AOD products together for gap-filling. Ablation experiments demonstrated that the improved tensor-flow-based gap-filling method performs better in terms of both convergence speed and data accuracy. Ground-based validation indicated good accuracy of the global gap-filled AOD dataset, with R of 0.85 and RMSE of 0.14 against worldwide AOD observations from AERONET, which is better than the purely reconstructed AODs (R=0.83, RMSE=0.15) and slightly worse than raw MAIAC AOD retrievals from Terra (R=0.88, RMSE=0.11).
A novel deep learning model, the scene-aware ensemble learning graph attention network (SCAGAT), was developed to better predict PM2.5 concentrations across the globe. By improving the spatial representativeness of data-driven models across regions, the SCAGAT algorithm performed better during spatial extrapolation, largely reducing modeling biases even over regions where in situ PM2.5 concentration measurements are limited or absent. Site-specific validation indicated that the gap-free PM2.5 concentration estimates exhibit high prediction accuracy, with R of 0.95 and RMSE of 5.7 μg m−3 against PM2.5 concentration measurements from previously held-out sites worldwide. Overall, by leveraging state-of-the-art methods in data science and artificial intelligence, a quality-enhanced LGHAP v2 dataset was generated through big earth data analytics that cohesively weaves together multimodal AODs and air quality measurements from different sources. Its gap-free, high-resolution, and global coverage makes the LGHAP v2 dataset an invaluable database for advancing aerosol- and haze-related studies and for enabling multidisciplinary applications in environmental management, health risk assessment, and climate change analysis. All gap-free AOD and PM2.5 grids in the LGHAP v2 dataset are publicly shared online (Bai et al., 2023a), with a data user guide and relevant visualization codes available at https://doi.org/10.5281/zenodo.10216396.
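For readers who want to reproduce the kind of validation statistics quoted in the abstract (R and RMSE between gridded estimates and station observations), the following is a minimal sketch of how the two metrics are conventionally computed. It assumes R denotes the Pearson correlation coefficient, which the abstract does not state explicitly. The function names and the sample values are hypothetical illustrations and are not taken from the LGHAP v2 dataset or its code.

```python
import math


def pearson_r(estimates, observations):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_o = sum(observations) / n
    cov = sum((e - mean_e) * (o - mean_o)
              for e, o in zip(estimates, observations))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_o = sum((o - mean_o) ** 2 for o in observations)
    return cov / math.sqrt(var_e * var_o)


def rmse(estimates, observations):
    """Root-mean-square error between two equal-length sequences."""
    n = len(estimates)
    squared_error = sum((e - o) ** 2 for e, o in zip(estimates, observations))
    return math.sqrt(squared_error / n)


# Hypothetical collocated pairs: gridded AOD estimates vs. station AOD.
# These numbers are made up for illustration only.
gridded_aod = [0.12, 0.30, 0.45, 0.80, 0.22]
station_aod = [0.10, 0.35, 0.40, 0.90, 0.20]

print("R    =", round(pearson_r(gridded_aod, station_aod), 3))
print("RMSE =", round(rmse(gridded_aod, station_aod), 3))
```

In practice the pairs would come from collocating the gridded product with station records in space and time; that collocation step is not shown here because the abstract does not describe it.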
Status: final response (author comments only)
LGHAP: Long-term Gap-free High-resolution Air Pollutants concentration dataset https://zenodo.org/communities/ecnu_lghap