Roadside noise barriers (RNBs) are important urban infrastructure for keeping cities liveable. However, the absence of accurate, large-scale geospatial data on RNBs has impeded progress toward rational urban planning, sustainable cities, and healthy environments. To address this problem, this study creates a vectorized RNB dataset for China using street view imagery and a geospatial artificial intelligence framework. First, intensive sampling is performed on the road network of each city based on OpenStreetMap, which is used as the georeference for downloading
In recent years, several studies have documented the substantial impact of traffic noise in cities (Apparicio et al., 2016; Begou et al., 2020). Roadside noise barriers (RNBs) are vital urban infrastructure that contributes significantly to mitigating undesirable traffic noise in communities (Abdulkareem et al., 2021; Ning et al., 2010). RNBs also support the development of sustainable cities in other ways. For example, with the growing emphasis on new energy, RNBs are being used as mounting surfaces for solar photovoltaic panels, thereby expanding the use of new energy sources (Gu et al., 2012; Zhong et al., 2021). Appropriately placed RNBs can also modify airflow in urban street canyons, thereby improving roadside air quality (Huang et al., 2021; Zhao et al., 2021). Because of their importance in building sustainable cities, the demand for RNBs has increased alongside traffic growth in recent decades (Den Boer and Schroten, 2007; Oltean-Dumbrava and Miah, 2016). Establishing an accurate, standardized, large-scale RNB dataset with detailed geospatial information, including mileage, location, and distribution, would bring bottom-up benefits (Liu et al., 2020; Wang and Wang, 2021). Specifically, precise RNB locations enable traffic departments to manage and maintain this infrastructure effectively (Sainju and Jiang, 2020), urban researchers can simulate dynamic cities based on accurate RNB geospatial information (Wang and Wang, 2021; Zhao et al., 2017), and governments can use RNB maps to examine urban layouts and plan green, sustainable cities (Song et al., 2021; Song and Wu, 2021).
Over the past few years, extensive geospatial databases have been established to store data on many aspects of urban infrastructure (Griffiths and Boehm, 2019; Perkins and Xiang, 2006). However, the sharing and exchange of RNB data in these databases are restricted, and the data cover only a limited geographic area (Wang et al., 2019; K. Zhang et al., 2022). These data acquisition challenges arise because databases have to adhere to various geographic data standards (e.g., file format and coordinate reference system; Lafia et al., 2018). In addition, RNB data are often created and updated manually through road inspections and surveys, which are costly and time consuming, especially at a large scale (Potvin et al., 2019; Ranasinghe et al., 2019). Alternative, efficient methods are therefore needed to generate an RNB geospatial dataset and keep it up to date.
Street view imagery is georeferenced data that densely covers urban road networks. As a new geospatial data source, it depicts real-world surroundings, including natural landscapes and the built environment, and enables users to recognize physical objects, urban dynamics, and geographic scenes on a large scale (Zhang et al., 2018). In addition, as part of the data-sharing movement, an increasing number of community-based organizations and corporations, such as Baidu Maps, Tencent Maps, and Google Maps, regularly generate and update open-access street view imagery (Qin et al., 2020; Zhang et al., 2019). Such big data offer great prospects for acquiring urban infrastructure information (e.g., on RNBs), with benefits such as broad coverage, rapid updates, and low acquisition costs (Kang et al., 2020). However, manual interpretation is tedious, and conventional computer vision algorithms struggle with large data volumes and complex image features (Zhang et al., 2018).
With advances in computing hardware and frameworks, deep learning methods have gained an increased capacity for extracting semantic features from large amounts of data (Lecun et al., 2015; Liu et al., 2022). These approaches are increasingly used to interpret physical objects and detect internal patterns in Earth observation data (Z. Zhang et al., 2022; Qian et al., 2022). Image classification based on deep learning has also been used to identify RNBs in street view imagery (Zhong et al., 2021). However, such work frequently overlooks prior geographic knowledge that is essential for identifying RNBs, such as the fact that RNBs are frequently located between roads and densely populated regions (e.g., residential, educational, and medical areas; Arenas, 2008; Wang et al., 2018; K. Zhang et al., 2022). In recent years, a new data-driven research framework based on geospatial artificial intelligence (GeoAI) and machine learning has led to multiple notable improvements in the discovery of geographic scene knowledge (Goodchild and Li, 2021; Li, 2020). When empirical and prior spatial information is incorporated into deep learning approaches, it can help develop a more holistic understanding of a research subject and mitigate the effects of data scarcity or representational bias (Janowicz et al., 2019; Qian et al., 2020). It is therefore possible to make deep learning methods more effective at identifying RNBs by incorporating prior geographic knowledge contained in street view imagery. Additionally, Wolpert and Macready (1997) introduced the “no free lunch” theorem, which shows that no single model performs best across all problems: gains on one class of problems are offset by losses on others. It is therefore challenging to construct a single model that performs well in all scenarios, particularly when dealing with vast volumes of data and large areas (Wang and Li, 2021).
The purpose of this study is to build an accurate, nationwide vectorized
RNB dataset using Baidu Street View (BSV) imagery. To improve RNB
detection performance, this work proposes a GeoAI framework.
Concretely, an ensemble of convolutional neural networks incorporating image
context information (IC-CNNs) is developed, which considers the prior
geographic knowledge contained in street view images. Subsequently, a
post-processing method is applied to generate the vectorized RNB dataset
from the identified RNB locations. Finally, the quality of the RNB dataset is
quantitatively evaluated from two perspectives, i.e., the detection accuracy
and the completeness and positional accuracy. The main contributions
of this study can be summarized as follows:
(1) This study provides the first reliable, nationwide vectorized RNB dataset for China, together with labeled BSV images that can be used as a benchmark dataset.
(2) A GeoAI framework is presented for processing numerous BSV images to generate the RNB map and for comprehensively evaluating the generated results.
(3) Multiple IC-CNNs based on prior geographic knowledge are combined with an ensemble learning strategy to achieve high-performance object identification from street view imagery.
The remainder of this paper is organized as follows. Section 2 briefly
describes the data and methods used to generate and evaluate the RNB
dataset. Section 3 presents the results of the RNB mapping and an evaluation and analysis of the RNB dataset. Section 4 discusses the capability
of the proposed methods, as well as the challenges and limitations of this work.
The last section provides the conclusions of this study.
The workflow of the GeoAI framework is divided into three stages, data preparation, modeling, and evaluation, as shown in Fig. 1. In the data preparation stage, BSV images are gathered using OpenStreetMap (OSM) road data and the BSV application programming interface (API), and are then used to generate samples for modeling and evaluation. In the modeling stage, deep learning approaches are used to detect RNBs in the BSV imagery, and a vectorization post-processing method converts the identified, scattered RNB locations into a vectorized dataset. In the evaluation stage, the quality of the created dataset is quantitatively assessed in two aspects, i.e., the detection accuracy and the completeness and positional accuracy.
The flowchart of the GeoAI framework to generate the vectorized RNB dataset.
Three types of data are acquired for this study, i.e., road networks, administrative boundaries, and street view imagery. Training, validation, and test samples are then collected from these data. Data for Taiwan Province are scarce.
The road networks were downloaded from OSM (
There are three data sources used in this study. OSM road network data
The city boundary was acquired from
With their high resolution and detailed depiction of Chinese streets, BSV
images are of comparable quality to Google Street View images, which are not
available in China (H. Zhou et al., 2019). Numerous sample points are generated along OSM roads, and the BSV API is used to obtain street view images at those locations. Following K. Zhang et al. (2022), a sampling interval of around 25 m is adopted as a tradeoff between data granularity and the cost of downloading imagery. This yields a total of 24 871 839 sample points. Fig. 3 illustrates BSV images, with photographs taken in different directions, and shows that a BSV image with a 90
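The point sampling described above can be sketched as follows. This is a minimal illustration, assuming each OSM road is available as an ordered list of (lon, lat) vertices in WGS84; the function names are hypothetical, and positions inside a segment are interpolated linearly in degrees, which is a reasonable approximation for short road segments but not geodesically exact. The actual sampling code used for the dataset may differ, and the BSV download step is not shown.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (lon, lat) in degrees, WGS84

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(a: Coord, b: Coord) -> float:
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def sample_polyline(vertices: List[Coord], interval_m: float = 25.0) -> List[Coord]:
    """Hypothetical helper: points spaced ~interval_m apart along a road polyline.

    The first vertex is always returned; later points are placed by walking the
    polyline and carrying leftover distance across vertices.
    """
    if not vertices:
        return []
    points = [vertices[0]]
    carry = 0.0  # distance walked since the last emitted point
    for start, end in zip(vertices, vertices[1:]):
        seg_len = haversine_m(start, end)
        if seg_len == 0:
            continue
        pos = interval_m - carry  # offset of the next sample inside this segment
        while pos <= seg_len:
            t = pos / seg_len  # linear interpolation in lon/lat (approximation)
            points.append((start[0] + t * (end[0] - start[0]),
                           start[1] + t * (end[1] - start[1])))
            pos += interval_m
        carry = seg_len - (pos - interval_m)
    return points
```

Carrying the leftover distance across vertices keeps the spacing close to 25 m even when OSM splits a road into many short segments, instead of restarting the count at every vertex.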
Illustration of BSV images, with photographs showing different directions (BSV images are from © Baidu Maps, 2022).
Zonal statistics of the number of BSV images in China.
Illustration of BSV samples, including four typical types of RNBs based on physical shapes
An effective sampling technique is developed to generate training, validation, and test image samples for detecting RNBs in the large volume of collected BSV images. According to their physical shapes, the RNBs identified in this study fall into four distinct types: upright noise barriers, top-curved noise barriers, noise barriers with folded corners at the top, and large curved noise barriers, as depicted in Fig. 5. Figure 1 illustrates the steps of the data preparation stage. The BSV images are first classified into four tiers based on their location within the city administrative hierarchy. The full image collection is then divided into training, validation, and test sampling sets containing 60 %, 20 %, and 20 % of the images, respectively. Each type of sample is collected only from its corresponding sampling set, which prevents samples from being mixed across sets.
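A minimal sketch of a reproducible 60 %/20 %/20 % split follows. It assumes, as a hypothetical design choice, that the split is stratified by city tier so that each tier contributes to all three sets in the same proportions; the text does not state whether the split was stratified, and the function and parameter names are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_by_tier(images: Iterable[Tuple[str, int]],
                  ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),
                  seed: int = 0) -> Dict[str, List[str]]:
    """Hypothetical helper: split (image_id, tier) pairs into disjoint sets.

    Each tier is shuffled and cut into train/val/test parts; the test part
    takes whatever remains, so ratios[2] is implied by the first two values.
    """
    by_tier: Dict[int, List[str]] = defaultdict(list)
    for image_id, tier in images:
        by_tier[tier].append(image_id)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for tier in sorted(by_tier):
        ids = sorted(by_tier[tier])  # sort before shuffling for determinism
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]
    return splits
```

Because the split happens before any labeling, an image can only ever become a training, validation, or test sample, never more than one.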
Previous investigations revealed that BSV images with RNBs are rare,
accounting for less than 5 % of the sampled images. To alleviate the
impact of the class imbalance problem on model training, 50 000 images are
randomly selected from each city tier based on the training sampling set.
These samples are labeled as positive type (i.e., image with RNB) or
negative type (i.e., image without RNB) by manual visual interpretation, the
details of which are shown in Fig. 6. An equal number of positive and
negative samples is then retained. Certain objects, such as
tunnel inner walls, billboards, and guardrails, resemble RNBs in images,
which makes the classification task more difficult, as shown in Fig. 5.
Therefore, 500 images of each of these objects are added to the training
samples as confusing negative samples. The final training sample size
is 14 484, including 6492 positive and 7992 negative samples. To generate
the validation and test samples, 500 and 2500 image samples from each city
tier are chosen. There are 79 positive samples and 1921 negative samples in
the validation samples, while there are 350 positive samples and 9650
negative samples in the test samples. The details of the sample collection
results are shown in Table 1. The labeled BSV images are available at
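The balancing step and the addition of confusing negatives can be sketched as below, under the assumption that each manually labeled image is represented as an (image_id, label) pair with 1 for RNB and 0 for no RNB. The function name and the random downsampling of negatives are illustrative assumptions rather than the authors' code. With the 6492 positives and 3 × 500 confusing negatives reported above, the result reproduces the stated totals of 14 484 samples and 7992 negatives, regardless of how many ordinary negatives were labeled.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, int]  # (image_id, label) with 1 = RNB, 0 = no RNB


def build_training_set(labeled: Sequence[Sample],
                       confusing_negatives: Sequence[str],
                       seed: int = 0) -> List[Sample]:
    """Hypothetical helper: balance labels, then append look-alike negatives."""
    rng = random.Random(seed)
    positives = [s for s in labeled if s[1] == 1]
    negatives = [s for s in labeled if s[1] == 0]
    # Keep every positive and draw the same number of ordinary negatives.
    kept = rng.sample(negatives, k=min(len(positives), len(negatives)))
    # Tunnel walls, billboards, guardrails, etc. are added as extra negatives.
    hard = [(image_id, 0) for image_id in confusing_negatives]
    return positives + kept + hard
```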
The flowchart of BSV image labeling.
The construction of the convolutional neural network incorporating image background information (BSV images are from © Baidu Maps, 2022).
Details of sample collection results.
RNBs are widely placed on the roadside in densely populated regions, such as
residential areas and educational and government institutions, as previously
described in other studies (Arenas, 2008; Wang et al., 2018; K. Zhang et al., 2022). Based on this prior geographic knowledge, an IC-CNN that
leverages the context information contained in BSV images is developed
to enhance RNB detection accuracy. Figure 7 illustrates the
construction of the IC-CNN, which adopts the ResNet architecture (He et al.,
2016). In this workflow, prior geographic knowledge is incorporated into the
neural network by means of transfer learning. First, 500 samples are
randomly selected from the positive and negative training samples in each tier.
Each of these BSV images is assigned one of three context labels according to its background, i.e., building dominated, non-building dominated, or uncertain (the background cannot be judged because it is obscured by objects), as shown in Fig. 6. The context labels are derived using semantic segmentation models released by the MIT Computer Vision team (B. Zhou et al., 2019). Excluding sky and ground objects, an image is labeled building dominated if buildings account for the largest share of the remaining objects; otherwise, it is labeled non-building dominated. The uncertain type is assigned by visually judging whether the background of the image is obscured. These labeled images are available at
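The rule used to assign context labels can be expressed as a small function. This sketch assumes the segmentation output has already been reduced to per-class pixel fractions; the class names treated as sky or ground (`EXCLUDED`) are hypothetical placeholders, since the exact vocabulary depends on the segmentation model, and the uncertain label is passed in as a flag because the text assigns it by visual interpretation.

```python
from typing import Dict

# Hypothetical class names treated as "sky and ground"; the real label set
# depends on the segmentation model that produced the pixel fractions.
EXCLUDED = {"sky", "road", "sidewalk", "ground"}


def context_label(pixel_ratio: Dict[str, float], obscured: bool) -> str:
    """Hypothetical helper: map per-class pixel fractions to a context label."""
    if obscured:  # decided by visual interpretation, not by the model
        return "uncertain"
    remaining = {c: r for c, r in pixel_ratio.items()
                 if c not in EXCLUDED and r > 0}
    if not remaining:  # nothing but sky/ground: treated as non-building
        return "non-building dominated"
    dominant = max(remaining, key=remaining.get)
    return "building dominated" if dominant == "building" else "non-building dominated"
```

For instance, an image that is 40 % sky, 30 % road, 20 % building, and 10 % tree is labeled building dominated, because buildings are the largest class once sky and ground are set aside.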
Zonal statistics of RNB mileage in China. The blank areas indicate no RNBs or a lack of BSV images.
Distribution of RNBs in several representative cities (base maps are from Esri).
Owing to the high cost of labeling and the limited number of training samples, this study adopts an ensemble learning strategy, motivated by the “no free lunch” theorem (Wolpert and Macready, 1997), to enhance RNB detection accuracy. In ensemble learning, an effective way to boost performance is to combine multiple high-variance models (Cao et al., 2020). This study therefore integrates four IC-CNNs whose convolutional layers are drawn from the ResNet family (He et al., 2016; Zagoruyko and Komodakis, 2016), namely ResNet101, ResNet152, Wide ResNet50, and Wide ResNet101. Integrating four IC-CNNs with different feature extraction capacities contributes substantially to achieving high detection accuracy.
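The voting mechanism used to combine the four IC-CNNs at inference time can be sketched as a hard majority vote over per-model RNB probabilities. The 0.5 threshold and the tie-breaking rule (four models can split 2 to 2, in which case this sketch falls back to the mean probability) are assumptions; the text does not specify how ties are resolved.

```python
from typing import Sequence


def ensemble_vote(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Hypothetical helper: hard majority vote over per-model RNB probabilities.

    Returns 1 (RNB) or 0 (no RNB). With an even number of models a tie is
    resolved by comparing the mean probability with the threshold.
    """
    if not probs:
        raise ValueError("need at least one model output")
    votes = sum(p >= threshold for p in probs)
    if 2 * votes > len(probs):
        return 1
    if 2 * votes < len(probs):
        return 0
    return int(sum(probs) / len(probs) >= threshold)
```

For example, `ensemble_vote([0.9, 0.8, 0.2, 0.7])` returns 1 because three of the four models exceed the threshold.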
After detection by the ensemble of IC-CNNs, the identified, scattered RNB locations are connected into a vectorized RNB dataset by a post-processing technique based on the spatial neighbor relationships between samples. Specifically, if adjacent sample images on the same road contain RNB objects, their locations are connected. Furthermore, Sainju and Jiang (2020) demonstrated that the principle that “near things are more related than distant things” (Tobler, 1970, 2004) holds when street view imagery is used to detect objects at the urban scale. Therefore, to reduce the impact of misidentification, a sample image that is flanked on the same road by images containing RNBs is treated as positive.
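The neighbor-based post-processing can be sketched for a single road as follows. This assumes that sample points on a road are already ordered along the road and paired with 0/1 ensemble predictions; the function names are hypothetical. Gap filling reads the original predictions, so a single filled point cannot trigger further fills, and this sketch drops isolated positive points that have no positive neighbor, which is an assumption because the text does not say how they are handled.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # (lon, lat) of a sample point


def fill_gaps(preds: Sequence[int]) -> List[int]:
    """Hypothetical helper: flip a negative to positive when both neighbours are positive."""
    out = list(preds)
    for i in range(1, len(preds) - 1):
        if preds[i] == 0 and preds[i - 1] == 1 and preds[i + 1] == 1:
            out[i] = 1
    return out


def vectorize_road(points: Sequence[Coord], preds: Sequence[int]) -> List[List[Coord]]:
    """Hypothetical helper: connect runs of consecutive positive points into polylines."""
    filled = fill_gaps(preds)
    lines: List[List[Coord]] = []
    current: List[Coord] = []
    for pt, label in zip(points, filled):
        if label == 1:
            current.append(pt)
        else:
            if len(current) >= 2:  # a polyline needs at least two vertices
                lines.append(current)
            current = []
    if len(current) >= 2:
        lines.append(current)
    return lines
```

With predictions `[1, 0, 1, 1, 0, 0, 1, 0]`, the second point is filled in, the first four points form one polyline, and the lone positive at index 6 is dropped.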
To evaluate the accuracy of RNB detection, four quantitative metrics in the
deep learning classification task, including overall accuracy (OA), recall,
precision, and
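Overall accuracy, precision, and recall can be computed from a binary confusion matrix as sketched below (the fourth metric named in the text is cut off and is not reconstructed here). The function name is illustrative. With the test set's class imbalance (350 positives and 9650 negatives), a classifier that never predicts an RNB would still reach an OA of 96.5 % with zero recall, which is why precision and recall are reported alongside OA.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: OA, precision, and recall for binary labels (1 = RNB)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "OA": (tp + tn) / len(pairs),
        # Reported as 0 when the denominator is empty (no predicted/actual positives).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```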
To quantitatively evaluate the completeness and positional accuracy of the generated RNBs, two metrics are adopted: the root mean squared error (RMSE) and the intersection over union (IoU) (Rezatofighi et al., 2019). To calculate these metrics, numerous roads are selected from various cities and surveyed manually on BSV
imagery as ground truth. Based on the mileage deviation and overlap between the
generated and surveyed RNBs, RMSE and IoU are calculated following Eqs. (7)
and (8), respectively:
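Eqs. (7) and (8) are not reproduced in this text. The sketch below assumes the standard forms: RMSE as the square root of the mean squared difference between generated and surveyed RNB mileage over the surveyed roads, and IoU as the length of the overlap divided by the length of the union of generated and surveyed RNB extents. Representing extents as (start, end) intervals in kilometres measured along a road is a hypothetical simplification, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_km, end_km) along one road (hypothetical)


def rmse(generated_km: Sequence[float], surveyed_km: Sequence[float]) -> float:
    """Root mean squared deviation between generated and surveyed RNB mileage."""
    if len(generated_km) != len(surveyed_km) or not generated_km:
        raise ValueError("need two equally long, non-empty sequences")
    sq = [(g - s) ** 2 for g, s in zip(generated_km, surveyed_km)]
    return math.sqrt(sum(sq) / len(sq))


def _merge(intervals: Sequence[Interval]) -> List[Interval]:
    """Union of possibly overlapping intervals as a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def interval_iou(generated: Sequence[Interval], surveyed: Sequence[Interval]) -> float:
    """Overlap length divided by union length of two sets of RNB extents."""
    g, s = _merge(generated), _merge(surveyed)
    inter = sum(max(0.0, min(ge, se) - max(gs, ss))
                for gs, ge in g for ss, se in s)
    union = sum(e - b for b, e in g) + sum(e - b for b, e in s) - inter
    return inter / union if union > 0 else 0.0
```

Merging each side first guarantees that the pairwise overlaps are not double counted when a generated RNB is split into several pieces.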
Several techniques are employed to enhance model performance during the training and inference stages. Data augmentation techniques such as random resized cropping and random horizontal flipping are used to increase the effective data volume and reduce overfitting. The model parameters are optimized using the AdamW optimizer (Loshchilov and Hutter, 2017) with a cosine annealing learning rate scheduler (Bhattacharyya et al., 2021). Long training and inference resized tuning (Touvron et al., 2019) are employed to further improve performance. Finally, the ensemble of models identifies RNBs through a voting mechanism.
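The cosine annealing schedule mentioned above follows the closed form η_t = η_min + ½(η_max − η_min)(1 + cos(πt/T)) when no warm restarts are used. The sketch below evaluates that formula in plain Python; the learning rates and step counts in the usage are illustrative, since the text does not report them. In a PyTorch setup, `torch.optim.AdamW` combined with `torch.optim.lr_scheduler.CosineAnnealingLR` provides a comparable schedule, but whether the authors used these exact classes is an assumption.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int,
                        lr_max: float, lr_min: float = 0.0) -> float:
    """Closed-form cosine annealing without restarts (illustrative helper)."""
    step = min(max(step, 0), total_steps)  # clamp so the rate stays at lr_min afterwards
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))
```

The rate starts at `lr_max`, falls most steeply around the midpoint (where it equals the average of the two bounds), and flattens out near `lr_min`, which lets the optimizer settle gently at the end of training.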
RNB mapping result in the city scale (BSV images are from © Baidu Maps, 2022).
Heat maps of IC-CNNs on BSV images with RNB. The hotspots indicate the area where the attention of IC-CNN is focused (BSV images are from © Baidu Maps, 2022).
The final RNBs dataset is available at
Evaluation results of RNB identification in different city tiers. The evaluation results of every city tier are calculated using the test samples of the corresponding city tier, while the overall evaluation results are calculated using the entire test samples.
Ablation study design. The ablation study combines the four strategies used in this study to illustrate their effectiveness.
Quantitative results of ablation. The ablation results show that the proposed methods have the highest RNB detection accuracy. The bold values indicate the highest value in each metric.
Following the national-scale analysis of the generated RNB dataset, the three cities with the highest RNB mileage in each tier are selected to analyze the citywide mapping results, as shown in Fig. 9. RNBs are generally clustered in the central areas of these cities. For example, the RNBs in Shanghai are mainly clustered along the third ring road, whereas those in Beijing are mainly clustered along the sixth ring road. When compared with planned layouts, the generated RNB dataset can therefore partly reflect the rationality of urban infrastructure planning and layout.
Table 2 summarizes the evaluation results of the RNB identification at different city tiers based on test samples. The OA and the
To evaluate the completeness and positional accuracy of the RNB dataset,
approximately 254.45 km of roads are selected from different city tiers and
manually surveyed using the BSV imagery. Appendix C summarizes the detailed
quantitative differences between generated and surveyed RNBs in terms of
mileage deviation and level of overlap. The overall RMSE of the mileage
deviation is 0.08 km, and the IoU of the overlap is 88.08 %.
Moreover, the visual comparison between surveyed and generated RNBs on various roads in Fig. 10 shows that the two are broadly consistent. However, at several validated points the proposed deep learning approach misclassified small RNB objects in the images, such as validated points IV, II, and III on Beijing's Jingmen Highway, Zhengzhou's Longhai Expressway, and Wenzhou's Ouhai Boulevard, respectively. Additionally, several objects that resemble RNBs, such as multi-windowed buildings, were misclassified as positive, for example, point IV on Wenzhou's Ouhai Boulevard and points II and III on Nantong's Binjiang Bridge. Despite these misclassifications, most validated points show accurate RNB predictions, indicating the high performance of the proposed framework and the reliability of the generated RNB dataset.
An ablation study (Table 3) is conducted to demonstrate the quality of the generated dataset and validate the effectiveness of the developed methods. As shown in Table 4, the combination of the proposed strategies achieves the highest performance. The ablation results demonstrate the effectiveness of the proposed strategies, namely integrating image context information into the CNN, adding confusing negative samples, and using an ensemble learning strategy. Additionally, Fig. 11 depicts the areas of the IC-CNNs' attention, revealing that IC-CNNs not only focus on RNB objects in BSV images but also attend to their surroundings. These results support the reliability of the generated dataset and partly open the “black box” of deep learning to explain the high performance of the developed methods. Notably, this study incorporates some prior geographic knowledge into the deep learning method. RNB detection accuracy could be increased further by incorporating more comprehensive knowledge of geographic scenes from BSV images into deep learning networks, such as various geographic elements and processes and the associated construction theory (Lü et al., 2018).
Confidence assessment in the mapping accuracy for cities with low-mileage RNBs.
This study has several limitations in the process of dataset generation which can be grouped into three categories, namely data source, ground scenario, and modeling.
Owing to economic status, topographical conditions, or government policies, BSV imagery does not cover all Chinese cities; data are unavailable for 17 cities (Deng et al., 2021; Du et al., 2020). In addition, overexposure or obstruction of the sensors by vehicles can prevent a complete street scene from being captured. The inherent characteristics of the data source can therefore affect the accuracy of the RNB dataset.
The road and traffic environment is often complex. For example, BSV sensors can capture RNBs on distant highways or other lanes, which may lead to errors in RNB detection and mapping. However, a sampling investigation indicates that such cases are uncommon (about 4 % of RNB samples).
This study implicitly presupposes that BSV images are independent and identically distributed. As shown in Fig. 9, the developed GeoAI framework achieves high performance in continuous RNB mapping. However, spatial autocorrelation in BSV images is overlooked, even though BSV images taken along the same road frequently resemble their neighbors (Sainju and Jiang, 2020).
Moreover, RNBs in cities with low total RNB mileage carry some uncertainty, as they may result from misidentification. A manual survey is performed to verify the confidence level for these cities. The quantitative results in Table 5 indicate that the shorter the RNB mileage, the lower the confidence level. The confidence level is lowest for cities with less than 0.2 km of RNBs, so further validation is needed before these data are used in specific applications.
In the future, more data sources, such as Google Maps and Tencent Maps, will be used to address the data shortage. Additionally, photogrammetry and image scene understanding techniques will be developed to handle complex ground scenarios. Finally, end-to-end deep learning algorithms will be further enhanced with more powerful units and structures that account for spatial autocorrelation in street view imagery.
The road networks come from OSM (
The codes of deep learning approaches in this study are available at
This study presents the first nationwide vectorized RNB dataset and a
benchmark dataset of labeled BSV images for China, built using BSV imagery and a
GeoAI framework. RNBs are identified in BSV imagery using deep learning
approaches that incorporate prior geographic knowledge, and
the vectorized RNB dataset is subsequently constructed using the vectorization post-processing procedure. The created RNB dataset is evaluated from two perspectives, i.e., the detection accuracy and the completeness and positional accuracy. The four quantitative metrics, OA, recall, precision, and
The two datasets have diverse intended applications. For the vectorized RNB dataset, urban studies can benefit from accurate information on RNB mileage, location, and distribution. For example, the regional energy potential of solar photovoltaic panels on RNBs can be estimated, finer 3D urban models can be developed, and the sustainability of urban layouts can be evaluated. The benchmark dataset of labeled BSV images may support other research and applications related to RNB identification, such as developing advanced deep learning algorithms, fine-tuning existing computer vision models to detect RNBs more accurately, and further exploring the relationship between RNB locations and the surrounding environment.
Details of the BSV image identification results.
Confusion matrix of RNB identification based on test samples.
The total RNB mileage in China is 2667.02 km. The RNB mileage values in different city tiers are 614.34, 995.45, 710.25, and 346.32 km, respectively. The average RNB mileage values in different city tiers are 102.39 km (
Details of RNB mileage by city in China. An RNB mileage of 0 km indicates that a city lacks RNBs or BSV images, or that its BSV images are out of date. Specifically, 17 cities lack BSV images, namely Baisha, Baoting, Changjiang, Dingan, Ledong, Língāo, Sansha, Wenchang, Jiyuan, Daxing'anling, Shuangyashan, Guoluo, Huangnan, Bazhong, Nujiang, Zhoushan, and Xinji.
Quantitative comparison between generated and surveyed RNBs on roads in different city tiers. Road sections of 4–7.5 km containing RNBs are selected as survey objects, with a total road length of around 254.45 km.
ZQ developed the framework, performed experiments, and wrote the original draft. MC conceptualized and supervised the project and contributed to the design of the work and the critical revision of the article, together with TZ, FZ, ZZ, and RZ. YYa and KZ collected and processed the data sources and published the dataset. ZS aided with the data preparation. PM aided in data collection and visualization. GL and YYe contributed to the technical review.
The contact author has declared that none of the authors has any competing interests.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We appreciate the detailed suggestions and comments from the anonymous reviewers. We express heartfelt thanks to the other members of the Smart City Sensing and Simulation lab and OpenGMS lab, who undertook the data collection and annotation work. The data of this work are licensed and hosted by the National Tibetan Plateau Data Center.
This research has been supported by the National Natural Science Foundation of China and National Natural Science Foundation of China-Guangdong Joint Fund (grant no. U1811464) and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (grant no. KYCX22_1567).
This paper was edited by Alexander Gruber and reviewed by two anonymous referees.