BOHR International Journal of Computational Intelligence and Communication Network
2022, Vol. 1, No. 1, pp. 1–9
https://doi.org/10.54646/bijcicn.001
www.bohrpub.com
Recognizing Traffic Signs with Synthetic Data and Deep Learning
Avaz Naghipour∗ and Rahim Pasbani
Department of Computer Engineering, University College of Nabi Akram, Tabriz, Iran
∗Corresponding author: naghipour@ucna.ac.ir
Abstract. Recently, deep learning has surpassed other machine learning (ML) algorithms in computer vision and object classification tasks. Like similar ML algorithms, it requires a dataset for training.
In most real cases, developing an appropriate dataset is expensive and time-consuming. Also, in some situations,
providing the dataset is unsafe or even impossible. In this paper, we propose a novel framework for traffic sign
recognition using synthetic data and deep learning. The main feature of the proposed method is its independence
from any real-life dataset, while still achieving high accuracy on the real test dataset. Creating synthetic data one by one is more labor-intensive and costly than collecting real data. To tackle this issue, the proposed framework uses a
procedural method, which gives the possibility to develop countless high-quality data that are close enough to the
real data. Due to its procedural nature, this framework can be easily edited and tuned.
Keywords: Deep Learning, Convolutional Neural Networks, Computer Graphics, Synthetic Data, Traffic Sign
Recognition.
INTRODUCTION
Nowadays, many kinds of research and innovations are
conducted to enhance autonomous vehicle technologies.
Giving a semantic perception to artificial intelligence (AI)-
based drivers to recognize environmental objects is one of
the main goals in research [1–3]. Indisputably, traffic sign
recognition ability plays a significant role in AI-based vehi-
cles. These signs guide drivers and make them aware of upcoming situations. Thus, traffic sign detection and recognition is a significant issue that computer vision applications are trying to address. While having a
good dataset for ML applications (especially in supervised
ML) is mandatory, only a few premade datasets are available on demand [4]. Accessible free datasets are usually used for benchmark competitions or to evaluate state-of-the-art
applications. For preparing production-level applications,
first, it is crucial to provide a proper dataset with adequate
quantity and quality. In the case of an image classifier, the
datasets are normally images captured with cameras from
real-life instances. At first glance, providing images seems to be a handy and cost-efficient procedure; however, when the number of images required to train classifiers is taken into consideration, the difficulty of preparing such datasets becomes apparent. ML algorithms in general
and deep learning in particular use thousands or even
millions of images to give a reliable and practical result.
Obviously, providing this amount of image in many cases
is wearisome and costly, if not impossible.
Other than the cost and expenses, a key issue with pro-
viding a traffic sign dataset is time. For more clarity, consider the procedure for obtaining a traffic sign dataset and how time-consuming it would be to capture, crop, edit, and label each image manually. A more significant problem is
the time needed to capture photos in different seasons and
conditions of a year. For instance, if images are captured
only during a hot summer, the dataset would not include
the images of signs covered with snow during winter, and
encountering such images causes trouble for the classifier.
Thus, generalizing the model comprehensively requires
at least 1 year of waiting, to include all seasonal visual
appearances.
Another solution is using CAD datasets. Synthetic
images rendered from a 3D virtual scene have been used
vastly in computer vision tasks [5]. Recently, they have been used in object detection and classification applications. Flying Chairs [6], FlyingThings3D [7], SYNTHIA [8], and SceneNet [9] are examples of synthetic image-based datasets that are used to train or evaluate relevant ML algorithms.
Fortunately, as the computer graphics (CG) industry progresses, the number of online CAD datasets and premade 3D objects is growing. However, except for very
few datasets, most CAD models are not freely accessible. Considering the effort of making CAD data, it is apparent that capturing real images may be cheaper than developing synthetic ones. Modeling even a very simple 3D object is a time-consuming process that requires expertise to accomplish.
Another problem with most synthetic images is their
dissimilarity to real objects. These images are far from real-world references in terms of appearance. By
browsing some accessible CAD datasets, it can be explicitly
seen that the objects do not have the proper lighting as
we have in the real world. Another significant issue that
makes CAD models look rough is the texture and material
of models. Instead of resembling real materials such as
wood, fibers, and metals, those models seem to be made
of solid clay. If the ML application is trained by non-real-
like images, the result will not give adequate accuracy
in ground-truth cases. This is why sometimes researchers
choose to mix them with some real images to improve the
functionality of the models [10].
To overcome these challenges, an efficient approach is
proposed in the present work to develop synthetic traffic
signs. To this end, a procedural way is used to provide
the desired dataset without any quantity limitation. In the
proposed method, every small detail is taken into account
to make the images quite real-looking, so that they can hardly be distinguished from real images. To validate the approach, various ML classifiers were trained on the developed dataset. The outline of the paper is as follows: Section
2 discusses related works. Section 3 presents synthetic
image generation. The overview of image processing filters
and image augmentation are studied in Sections 4 and 5,
respectively. Section 6 describes setting up deep convolu-
tional neural network (DCNN) architecture. In Section 7,
experiments and results are reported. Section 8 concludes
this paper.
RELATED WORKS
There are various algorithms for classifying traffic signs.
The most regarded algorithms in this field are based on ML
methods; however, there are some research projects that
employ the color and shape of the signs. These characteris-
tics cannot be directly used to classify the traffic signs, but
they can remarkably help the actual classifier.
The support vector machine (SVM) for classification has
always been a selection at hand. In ref. [11], Maldonado
et al. used SVM for automatic multiclass traffic sign detec-
tion and classification using a one-vs-all approach with a
Gaussian kernel. In another attempt [12], considering the limitations in the shape and color of signs, the authors used a color segmentation and shape matching approach, and then the dataset was classified using SVM. The obtained
results are promising. In the method suggested in ref. [13],
after the detection of a sign via the MSER procedure,
the HSV-HOG-LBP features are extracted, and then, a
random forest is used to finalize the recognition process.
Ref. [14] has tried to prove the effectiveness of the random forest algorithm for traffic sign recognition in terms of both accuracy and speed. Nowadays, modern computer
vision classifiers mostly deploy CNN for recognition tasks.
In ref. [15], the densely connected CNN is used for traffic
sign detection. In ref. [16], the authors have shown the
results of different architectures of CNN to solve the same
recognition problem.
In the area of synthetic data deployment, some remark-
able works have been presented so far. The authors in
ref. [17] have suggested a 2D synthetic text generator
engine, which places texts onto random backgrounds and
employs the obtained data to train a CNN-based classifier
to recognize texts in graphics. Furthermore, a 3D synthetic
dataset was used in ref. [18] to predict hand gestures
in an image. The accuracy has risen after adding some
real images to the training dataset. The CAD data have
been used in some research for classifying and object
detection tasks [19, 20]. Another important challenge in
computer vision is viewpoint prediction. In ref. [21], a CNN was trained on rendered images, and the results were excellent.
Most of the relevant methods have used premade online
CAD libraries, such as Trimble 3D warehouse, TurboSquid,
Yobi3D, and ShapeNet. Using premade datasets is handy for testing new models or for benchmark competitions. But in real applications, each problem needs its own exclusively developed dataset.
SYNTHETIC IMAGE GENERATION
The proposed method in this research contains two main
steps. In the first step, a procedural virtual 3D scene is cre-
ated, and in the second step, a specific DCNN architecture
is designed to be trained by the generated dataset obtained
in the previous step.
First of all, a virtual 3D scene needs to be set up. Then, feasible variations and randomizations are made procedurally in 3D world space, so that every state forms a believable arrangement of the scene’s objects and components. On every run, a specific arrangement is made and rendered. In the next step, to make extra purposeful variations, some image processing-based filters and manipulations are applied to the rendered images. Finally, the image augmentation technique is employed to generalize the proposed classifier and to increase the size of the training dataset.
The aim was to set up a versatile virtual scene that
can automatically develop a feasible arrangement of the
objects. Such a system requires several components: 3D objects, lights, a sky object, and several controllers that control the properties of scene components. The controllers provide mathematical relationships among all objects in the scene. To avoid infeasible cases, some constraints are considered. In fact, the constraints are small scripts written in Python, which
control the randomization process to prevent impractical setups. Figure 1 depicts a schematic view of the mentioned procedural scene.
Figure 1. The 3D scene contains 3D objects, lights, sky, background, and some controllers, which control every aspect of the randomization process.
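As a concrete illustration, the following is a minimal Python sketch of such a constrained randomization controller. All names, value ranges, and the parameter set itself are hypothetical illustrations, not the authors' actual scene code:

```python
import random

def randomize_scene_parameters(seed=None):
    """One run of the procedural controller: draw scene parameters at random,
    then apply constraints so every setup stays feasible."""
    rng = random.Random(seed)
    params = {
        "sun_azimuth": rng.uniform(0.0, 360.0),   # sun moves around the scene
        "sun_elevation": rng.uniform(5.0, 80.0),  # degrees above the horizon
        "sky_turbidity": rng.randint(2, 10),      # haziness, as in the paper
        "sign_yaw": rng.uniform(-25.0, 25.0),     # sign rotation in degrees
        "sign_offset_x": rng.uniform(-0.4, 0.4),  # relative to frame center
        "background_id": rng.randrange(100),      # one of 100 backdrop images
    }
    # Constraint: clamp the offset so the sign never leaves the camera view.
    params["sign_offset_x"] = max(-0.25, min(0.25, params["sign_offset_x"]))
    return params

# Each call yields one feasible arrangement to be rendered.
print(randomize_scene_parameters(seed=42))
```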
For evaluating the proposed approach in the real world,
the German Traffic Sign Recognition Benchmark (GTSRB)
dataset is selected. This dataset was captured in different
lighting situations. Some of them were captured on a sunny
day and some in shadow or bluish-morning-like lighting.
Moreover, there are some images that became overexposed
due to the reflective surface of the sign board. Also, motion blur can be seen in some images, indicating that the pictures were captured while driving.
To achieve a comprehensive model that is capable of
classifying the different types of signs, the image generator
has to be very versatile with the ability to cover all of the
possible variations. Some of the variations applied on the
scene are listed as follows.
• Illumination: One of the main issues in rendering
photo-realistic images is the correct lighting of the scene. In CG, the lighting procedure is generally divided into two main parts, i.e., direct lighting and indirect lighting. Direct light provides the main illumination, which usually casts sharp shadows on objects; indirect light, on the other hand, is an environmental light that results from bouncing rays. Since traffic signs are almost always placed outdoors, they are mostly illuminated by the sun and sky. The sun is considered a direct light, and the sky is responsible for indirect lighting. In the real world, the sun angle and sky color are intertwined [22]. The sky color gradient varies according
to the sun’s position and some other factors such
as the haze and aerosol in the air. Simulation of the
sun as an infinite light source (direct lighting source)
in most CG applications is simple. The technique
that is usually used for indirect lighting simulation
is referred to as image-based lighting (IBL). In this
method, a big sphere or hemisphere surrounds the
scene, and its texture sends light rays into the scene.
In this research, the Preetham sky model was used to
generate a virtual sky. This model needs the sun posi-
tion, the viewing direction, and the turbidity factor to
compute the color of the texture pixels [23]. Turbidity
is defined as the haziness of fluid-type materials. To
achieve a different range of sky models, the sun’s
position is randomized around the scene; moreover,
for every run, a random integer between 2 and 10 is assigned to the turbidity factor. The sky
color is allowed to affect the background image to
match the scene’s overall color.
• Position and Rotation: On every run, the basic spatial properties of the sign object, such as position and rotation, change, but the sign never leaves the camera
view. Both camera and objects have a chance to
relocate or spin. By looking at the GTSRB images as
the reference, the minimum and maximum available
space around the sign object can be estimated. We limit the movement of the sign object so that it stays near the middle of the frame.
• Motion Blur and Out of Focus: Motion blur may
happen when we try to take a picture of fast-moving
objects. This phenomenon directly depends on the
shutter speed of a camera. Another effect is called out
of focus. This effect occurs when a certain object is far
from the camera’s focal distance. Both of the effects
above can be simply simulated by specific image pro-
cessing filters even after rendering. To simulate the
motion blur effect, usually, some filters are applied
to images that stretch the image along the moving
direction. In this research, the direction is selected
randomly but is almost near the horizontal line.
• Signboard Damages and Imperfections: Usually
road signs are exposed to physical damage and
strikes. These damages often cause deformation.
To mimic this effect, some deformers have been deployed. Deforming is usually done using displacement maps. These maps are gray-scale noisy images projected onto the object’s UV coordinates; they push polygons up or down along the polygon normal vectors, corresponding to the brightness of the projected map. On every run, this map is regenerated with a different noisy pattern (a minimal sketch of this deformation appears after this list).
• Dust and Scratches: Rain, storms, snow, dust, and other natural phenomena may dirty the signs and make them unclear. Some controllers are designed to simulate these types of effects by adding random patterns onto the sign textures. For additional detail, diverse noisy images are used to fake dirt on the sign boards. Also, some mask textures specify the areas where this dirt should appear.
Figure 2. Histogram of an object under four different shadow situations. Shadows cast by the environment change the entire distribution of pixel data.
• Backdrop and Environment: Each season has its own visual effects on the objects’ appearance. In rendered images, these effects can be realized by changing
the background image. An important factor to be
taken into account is that neural networks can learn
unwanted patterns such as backgrounds. Thus, we should avoid repetitive backdrops as much as possible. To limit this side effect, the proposed method uses one hundred different images; however, repetition is still possible every 100 runs.
Hence, for every run, the position, scale, and rotation
of the background images are changed randomly.
This can guarantee that final rendered images will
never have the same background. These randomiza-
tions are controlled by controllers in order to prevent
infeasible images.
• Shadows: The sign objects are placed in different positions and may receive shadows cast by other objects. These shadows, when analyzed numerically, have a significant effect on the sign’s appearance and color. To implement these shadows in the virtual world, several objects with different shapes and sizes have been placed in the scene. With each run, some properties of these objects, such as position, rotation, distance, and visibility, are randomized. Shadows play a significant role in the overall look of any image. For more clarity, in Figure 2 the same scene has been rendered four times, changing only the received shadows, and each image’s histogram has been plotted. As seen in these plots, most pixel values changed, while semantically all of these images represent the same sign. In addition to light and shadows, any change in position, rotation, scale, shear, color, and other properties will overturn the pixel data. Thus, designing a classifier that remains invariant to all these variations is a big challenge in both the image processing and computer vision fields. So the goal is to provide a comprehensive dataset that includes the road signs in any condition. This makes the classifier behave invariantly toward unnecessary information.
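The following NumPy sketch illustrates the displacement-map deformation described under Signboard Damages and Imperfections. The mesh layout (per-vertex positions, normals, and UV coordinates as arrays) and the noise resolution are assumptions for illustration, not the authors' actual tooling:

```python
import numpy as np

def deform_sign_mesh(vertices, normals, uvs, strength=0.01, res=64, seed=None):
    """Displace each vertex along its normal by a gray-scale noise map
    sampled at the vertex's UV coordinate (displacement mapping).

    vertices, normals: (n, 3) arrays; uvs: (n, 2) array in [0, 1].
    """
    rng = np.random.default_rng(seed)
    noise = rng.random((res, res))                 # fresh noisy map per run
    u = np.clip((uvs[:, 0] * (res - 1)).astype(int), 0, res - 1)
    v = np.clip((uvs[:, 1] * (res - 1)).astype(int), 0, res - 1)
    height = noise[v, u] - 0.5                     # centered: push up or down
    return vertices + strength * height[:, None] * normals
```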
IMAGE PROCESSING FILTERS
Pictures captured by ordinary cameras often contain some
noise. This noise can mostly be seen in low-light situations. Also, due to indirect illumination, all rendered images are inherently noisy. Nevertheless, for emphasis, some subtle noise is randomly added to the rendered images.
By browsing the GTSRB data more closely, it can be seen that some images are very dark and some are very bright. In addition to the different lighting situations considered in the 3D scene rendering step, extra darkening and brightening filters are applied to some rendered images.
As mentioned before, the motion blur and defocus can
be faked by 2D filters. In this step, these effects are also
applied to randomly selected rendered images.
After much trial and error, it was found that the mentioned filters improve the final result and accuracy.
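As a sketch of these post-render filters, the snippet below applies subtle noise, a random exposure shift, a near-horizontal motion blur, and an occasional defocus blur with OpenCV. The kernel sizes, probabilities, and value ranges are illustrative assumptions:

```python
import cv2
import numpy as np

def apply_random_filters(img, rng=None):
    """Post-render 2D filters: noise, darkening/brightening, motion blur, defocus."""
    rng = rng or np.random.default_rng()
    out = img.astype(np.float32)
    out += rng.normal(0.0, 4.0, out.shape)        # subtle sensor-like noise
    out *= rng.uniform(0.6, 1.4)                  # random darkening/brightening
    out = np.clip(out, 0, 255).astype(np.uint8)
    if rng.random() < 0.3:                        # near-horizontal motion blur
        k = int(rng.integers(5, 11))
        kernel = np.zeros((k, k), np.float32)
        kernel[k // 2, :] = 1.0                   # horizontal streak
        angle = float(rng.uniform(-10.0, 10.0))   # almost horizontal direction
        M = cv2.getRotationMatrix2D((k / 2, k / 2), angle, 1.0)
        kernel = cv2.warpAffine(kernel, M, (k, k))
        kernel /= kernel.sum()
        out = cv2.filter2D(out, -1, kernel)
    if rng.random() < 0.2:                        # out-of-focus blur
        out = cv2.GaussianBlur(out, (5, 5), 0)
    return out
```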
IMAGE AUGMENTATION
Figure 3. Heavy augmentation is applied to rendered images. Most of the image properties are changed during this operation, such as position, scale, crop, distortion, and color.
Figure 4. Selected sign types for generating synthetic images.
At the final step of dataset preparation, all the rendered images are candidates for augmentation. In this step, all variations are applied offline to the rendered images; there is no longer any access to the 3D scene options. Intense variations on images are
applied because the GTSRB dataset includes images cap-
tured in very diverse situations, some of which are hard to recognize even with the human eye. In general, image augmentation leads to robust training and a reduction in overfitting. The augmentation used in this work
changes almost every property of rendered images, such
as position, rotation, scale, shear, crop, contrast, distortion,
random masking shapes, and some color perturbation. In
Figure 3, some augmented images are illustrated.
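A minimal sketch of such an offline augmentation stage is shown below, here using Keras's ImageDataGenerator. The parameter values and directory name are illustrative, and the paper's distortion and random-masking augmentations would need additional custom steps:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,            # small rotations
    width_shift_range=0.15,       # position jitter
    height_shift_range=0.15,
    shear_range=10.0,             # shear angle in degrees
    zoom_range=0.25,              # scale / zoom-crop effect
    channel_shift_range=30.0,     # color perturbation
    brightness_range=(0.6, 1.3),  # exposure variation
    fill_mode="nearest",
)

# Stream augmented 80x80 synthetic images during training.
train_flow = augmenter.flow_from_directory(
    "synthetic_signs/train", target_size=(80, 80), batch_size=64)
```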
Eventually, the proposed synthetic image generator pro-
duced 2500 images for each class. Since this generator
is entirely procedural, it is possible to create an infinite
number of images without much effort. Moreover, this
method avoids repetitive images in the generated dataset.
Of these 2500 images per class, 2000 were allocated for training and 500 for validation.
To assess the proposed method, 12 classes of the GTSRB
dataset are selected. Intentionally, some challenging classes are chosen that are similar to each other in terms of shape, figure, or color. In Figure 4, these
selected classes are illustrated. These 12 classes in the
GTSRB dataset aggregately contain about 10000 images
that will be considered as a test set to evaluate the classifier
efficiency.
SETTING UP DCNN ARCHITECTURE
CNN is one of the major types of feed-forward neural
networks that can track the spatial position of elements in
the images as detection features [24]. These features carry
meaningful data, which play the main role in detection and
recognition tasks. This advantage makes the CNNs more
efficient than the Multi-Layer Perceptron (MLP) in image
classification tasks. Some other types of layers are embed-
ded between the convolution layers to reduce dimension
(pooling layer) or add non-linearity (activation function)
to the layer’s output [25].
Since this work aims to demonstrate that synthetic data can be used to train CNN models, the utilized model is not precisely optimized.
The proposed DCNN architecture contains four blocks
before connecting to the two fully connected layers. There
are two convolution layers with 32 filters in the first
block. Then, batch normalization is added to speed up and
improve the accuracy of the training process [26]. Then, a max pooling layer with a pool size of (2, 2) and stride 2 shrinks the output of the first block from 80×80 pixels to 40×40 pixels.
After the first dense layer, a dropout layer is added as the
regularization method to improve the generalization errors
of the network. Additionally, dropout has a tremendous
role in avoiding the overfitting problem [27]. For the first
two blocks, two convolution layers are successively used
without pooling between them. One reason is that applying pooling after each convolution layer immediately shrinks the tensors, so some significant data may be lost. Besides, the consecutive convolution
layers result in more spatial data in the feature map [28].
The numbers of the filters used for the next convolution
layers are 64, 64, 128, and 256, respectively.
In this work, the “max pooling” method is used for
the pooling layer. All the utilized activation functions are
Rectified Linear Units (ReLUs). These blocks are finally followed by two fully connected layers, which are usually used to collect and optimize scores for each class. The first fully connected layer contains 128 neurons, followed by batch normalization and dropout. The last layer is the second fully connected layer with only 12 neurons and softmax as the activation function; this layer decides which class the input image belongs to. The mentioned architecture is plotted schematically in Figure 5.
Figure 5. Schematic diagram of the proposed model layers. The architecture comprises convolution, pooling, batch normalization, dropout, and fully connected layers. Input images are 80 pixels in both height and width.
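The description above can be written down as the following Keras sketch. The assignment of filters to blocks, the dropout rate, and the three-channel input are plausible readings of the text rather than the authors' exact configuration:

```python
from tensorflow.keras import layers, models

def build_dcnn(num_classes=12):
    """Four convolution blocks followed by two fully connected layers."""
    return models.Sequential([
        layers.Input(shape=(80, 80, 3)),
        # Block 1: two consecutive 32-filter convolutions, BN, then pooling
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # 80x80 -> 40x40
        # Block 2: two consecutive 64-filter convolutions
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # -> 20x20
        # Blocks 3 and 4: single convolutions with 128 and 256 filters
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # -> 10x10
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # -> 5x5
        # Two fully connected layers with dropout for regularization
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),                  # rate assumed, not stated
        layers.Dense(num_classes, activation="softmax"),
    ])
```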
The proposed model is ready to receive the provided
synthetic images as input to begin the training process. But
for achieving optimum weights and biases, a proper loss
function must be established. Let $\mathbf{x}$ be an instance image vector and $s_k(\mathbf{x})$ the score of class $k$. The score is a linear function of $\mathbf{x}$, and the softmax function later converts these scores into probabilities [29]:

$$s_k(\mathbf{x}) = \mathbf{x}^T \boldsymbol{\theta}^{(k)} \tag{1}$$
In Equation (1), $\boldsymbol{\theta}^{(k)}$ represents the parameter vector for class $k$. We need the probability of belonging to class $k$, which the softmax function at the end of the model chain calculates as $\hat{p}_k$ [29]:
$$\hat{p}_k = \frac{e^{s_k(\mathbf{x})}}{\sum_{j=1}^{K} e^{s_j(\mathbf{x})}} \tag{2}$$
where K is the number of classes.
Since softmax predicts only one class at a time, it is suitable for our case, as every sign belongs to exactly one class. Cross entropy is a proven way to define a loss function for classification problems [29]:
$$J(\boldsymbol{\Theta}) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right) \tag{3}$$
The cost function $J(\boldsymbol{\Theta})$ is obtained by forming Eq. (3). In this equation, $y_k^{(i)}$ is the true label of instance $i$ with respect to class $k$: its value is 1 if the $i$th instance belongs to class $k$ and 0 otherwise. To obtain the gradient vector of class $k$, we calculate the gradient of the cost function with respect to the $k$th parameter vector $\boldsymbol{\theta}^{(k)}$ [29]:
$$\nabla_{\boldsymbol{\theta}^{(k)}} J(\boldsymbol{\Theta}) = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{p}_k^{(i)} - y_k^{(i)}\right) \mathbf{x}^{(i)} \tag{4}$$
Now, using one of the optimizers from the gradient descent family,
the model finds the parameters Θ that minimize the cost
function. In fact, these parameters are the filters and other
types of learnable variables [29].
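Equations (1) through (4) translate directly into a few lines of NumPy, shown below for a batch of m instances with one-hot labels. This is a sketch of the softmax-regression math only; in the actual DCNN, the parameters are learned filters and the framework's autodiff computes the gradients:

```python
import numpy as np

def scores(X, Theta):
    """Eq. (1): s_k(x) = x^T theta^(k) for all classes at once.
    X is (m, n), Theta is (n, K), result is (m, K)."""
    return X @ Theta

def softmax(S):
    """Eq. (2): class probabilities from scores (numerically stabilized)."""
    e = np.exp(S - S.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(P, Y):
    """Eq. (3): average cross-entropy; Y holds one-hot true labels."""
    m = Y.shape[0]
    return -np.sum(Y * np.log(P + 1e-12)) / m

def gradients(X, P, Y):
    """Eq. (4): gradient of the cost w.r.t. every theta^(k), shape (n, K)."""
    m = Y.shape[0]
    return X.T @ (P - Y) / m
```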
EXPERIMENTS AND RESULTS
The designed synthetic data generator is capable of gener-
ating any number of images at any required dimension. A size of 80 pixels for both height and width is chosen. In total, 24000 images are used to train the proposed DCNN model. Figure 6 shows the training and validation loss/accuracy over 200 epochs.
As seen in Figure 6, around epoch 200 the model almost converges, and the validation loss and accuracy are acceptable in terms of overfitting. For testing, the corresponding classes from the GTSRB dataset are used.
In the machine learning field and especially in supervised
machine learning, the confusion matrix is considered one
of the significant visualization methods for statistical clas-
sification tasks [30].
The confusion matrix for our proposed model on the test
dataset is depicted in Figure 7. Classes that look more similar to each other are more often confused.
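A normalized confusion matrix like the one in Figure 7 can be produced with scikit-learn; the model and test-array names below are assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def normalized_confusion(model, test_images, test_labels):
    """Row-normalized confusion matrix of the classifier on the real test set."""
    y_pred = np.argmax(model.predict(test_images), axis=1)
    return confusion_matrix(test_labels, y_pred, normalize="true")
```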
Figure 6. Training and validation loss and accuracy; the model almost converges.
Figure 7. Normalized confusion matrix for the German Traffic
Sign Recognition Benchmark (GTSRB) dataset.
By referring to the confusion matrix plotted in Figure 7, it is clear that predicting classes 3 and 4 leads to higher errors than the others. The first reason refers to the appearance of the two mentioned classes: they are very close to each other.
Table 1. Comparison of the proposed method with other methods on the German Traffic Sign Recognition Benchmark (GTSRB) [31].

#     Team            Method                             Accuracy %
...   ...             ...                                ...
72    Italian-crash   Multi Dataset Algorithm            83.08
12    TDC             CVOG + CCV + NN (Team 2)           82.67
11    TDC             CVOG + CCV + NN (Team 1)           82.37
#     Our Method      Synthetic data + DCNN              91.91
74    TDC             CVOG + ANN (Team 3)                81.80
97    RMULG           Subwindows + ETGRAY + LIBLINEAR    79.71
134   olbustosa       HOG_SVM                            76.35
...   ...             ...                                ...

The second reason refers to the GTSRB
image size and aspect ratio. Some of the images in this dataset are very small and non-uniform in height-to-width ratio, while the generated training dataset is entirely square (80 × 80 pixels).
On the GTSRB website, recent benchmark competition
results can be observed. Some of these results, close to this work's result, are listed in Table 1. The main difference between the proposed method and the other methods listed on the GTSRB website is the training dataset type. Most of them
used GTSRB’s own training dataset; however, in this work,
the synthetic dataset is generated and used to train the
model. Some real-life datasets, such as GTSRB, are biased
in terms of distribution among classes; nevertheless, our
dataset was evenly distributed (2000 images per class).
This may affect the decision-making results. Of course,
sometimes this could be intentional because the distribu-
tion over classes is not even in real-life situations. For
example, the number of priority signs in a city is normally much higher than the number of roundabout signs.
The obtained results show that the DCNN outperforms the other image classification methods. Notably, the DCNN architecture achieves 91.91% accuracy on the GTSRB dataset without having seen any real traffic sign image.
CONCLUSION
Deploying machine learning in industry-level production requires an exclusive dataset that meets the requirements. Providing labeled ground-truth datasets for computer vision tasks is usually expensive, time-consuming, and labor-intensive. Furthermore, in some cases creating a real dataset is unsafe or practically impossible. Using CAD models is another option, but creating the desired models one by one in most cases becomes more expensive than providing real datasets.
To address this challenge and develop a synthetic dataset for the traffic sign recognition task, this paper used a procedural method. Using computer graphics tools,
the proposed method facilitates generating numerous
images that are precisely analogous to real-life instances.
Moreover, a well-structured DCNN architecture was set
up that decently fulfilled the classification task. Without
seeing any real data, this classifier could categorize the
real-world GTSRB dataset with 91.91% accuracy.
The provided dataset has more detail than the GTSRB dataset requires. We took many details into account that might not have been necessary, but they made the classifier more reliable in complicated situations. Rendered images and real pictures captured by a camera intrinsically contain many dissimilarities. Using synthetic images to train machine learning models requires narrowing this gap. Augmentation and other image processing filters are helpful in enhancing accuracy. Additionally, without augmentation and dropout, overfitting and generalization issues would be pronounced. In future research, we plan to utilize this procedure for more complicated tasks, such as road and street object detection. Clearly, such a procedure requires greater effort to set up a system that can provide credible rendered images.
CONFLICT OF INTEREST
The authors declare that the research was conducted in the
absence of any commercial or financial relationships that
could be construed as a potential conflict of interest.
AUTHOR CONTRIBUTIONS
RP conducted an initial literature review and data collec-
tion, performed the experiments, prepared the results, and
drafted the manuscript. AN helped in writing-editing and
conceptualization, analyzed the result, and contributed to
supervision. Both authors read and approved the final
manuscript.
REFERENCES
[1] Li, L., Huang, W., Liu, Y., Zheng, N., Wang, F. (2016). Intelligence
Testing for Autonomous Vehicles: A New Approach, IEEE Transac-
tions on Intelligent Vehicles, 1(2), 158–166.
[2] Gidado, U. M., Chiroma, H., Aljojo, N., Abubakar, S., Popoola,
S. I., Al-Garadi, M. A. (2020). A Survey on Deep Learning for
Steering Angle Prediction in Autonomous Vehicles, IEEE Access, 8,
163797–163817.
[3] Arnold, E., Al-Jarrah, O. Y., Dianati, M., Fallah, S., Oxtoby, D.
Mouzakitis, A. (2019). A Survey on 3D Object Detection Methods for
Autonomous Driving Applications, IEEE Transactions on Intelligent
Transportation Systems, 20(10), 3782–3795.
[4] Gjoreski, H., Ciliberto, M., Wang, L., Morales, F. J. O., Mekki, S.,
Valentin, S., Roggen D. (2018). The University of Sussex-Huawei
Locomotion and Transportation Dataset for Multimodal Analytics
with Mobile Devices, IEEE Access, 6, 42592–42604.
[5] Wang, T., Wu, D. J., Coates A., Ng, A. Y. (2012). End-to-End
Text Recognition with Convolutional Neural Networks, Proceedings
of the 21st International Conference on Pattern Recognition, Tsukuba,
3304–3308.
[6] Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazırbas, C.,
Golkov, V. (2015). FlowNet: Learning Optical Flow with Con-
volutional Networks, IEEE International Conference on Computer Vision, 2758–2766.
[7] Mayer, N., Ilg, E., Hausser, P., Fischer, P. (2016). A Large Dataset to
Train Convolutional Networks for Disparity, Optical Flow, and Scene
Flow Estimation, IEEE Conference on Computer Vision and Pattern
Recognition, 4040–4048.
[8] Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A. M. (2016).
The SYNTHIA Dataset: A Large Collection of Synthetic Images
for Semantic Segmentation of Urban Scenes, IEEE Conference on
Computer Vision and Pattern Recognition, 3234–3243.
[9] Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S. Cipolla,
R. (2016). Understanding Real World Indoor Scenes with Synthetic
Data, IEEE Conference on Computer Vision and Pattern Recognition,
4077–4085.
[10] Tsai, C., Tsai, S. Hsu, Y., Wu, Y. (2017). Synthetic Training of
Deep CNN for 3D Hand Gesture Identification, International Con-
ference on Control, Artificial Intelligence, Robotics & Optimization,
165–170.
[11] Maldonado-Bascon, S., Lafuente-Arroyo, S., Gil-Jimenez, P., Gomez-
Moreno, H., Lopez-Ferreras, F. (2007). Road-Sign Detection and
Recognition Based on Support Vector Machines, IEEE Transactions on
Intelligent Transportation Systems, 8(2), 264–278.
[12] Wali, S. B., Hannan, M. A., Hussain, A., Samad, S. A. (2015). An
Automatic Traffic Sign Detection and Recognition System Based
on Colour Segmentation, Shape Matching, and SVM, Mathematical
Problems in Engineering, 1–11.
[13] Kuang, X., Fu, W., Yang, L. (2018). Real-Time Detection and Recog-
nition of Road Traffic Signs using MSER and Random Forests,
International Journal of Online Engineering, 14(3), 34–51.
[14] Ellahyani, A., Ansari, M. E., Jafari, I. E. (2016). Traffic Sign Detection
and Recognition Based on Random Forests, Applied Soft Computing,
46, 805–815.
[15] Liang, Z., Shao, J., Zhang, D., Gao, L. (2019). Traffic Sign Detection
and Recognition Based on Pyramidal Convolutional Networks, Neu-
ral Computing and Applications, 32(11), 6533–6543.
[16] Shustanov, A., Yakimov, P. (2017). CNN Design for Real-Time Traffic
Sign Recognition, Procedia Engineering, 201, 718–725.
[17] Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A. (2014).
Synthetic Data and Artificial Neural Networks for Natural Scene Text
Recognition, arXiv:1406.2227.
[18] Tsai, C., Tsai, Y., Hsu, S., Wu, Y. (2017). Synthetic Training of Deep
CNN for 3D Hand Gesture Identification, International Conference on
Control, Artificial Intelligence, Robotics & Optimization, 165–170.
[19] Peng, X., Sun, B., Ali, K., Saenko K. (2015). Learning Deep Object
Detectors from 3D Models, IEEE International Conference on Computer
Vision, 1278–1286.
[20] Sun, B., Saenko, K. (2014). From Virtual to Reality: Fast Adaptation
of Virtual Object Detectors to Real Domains, Proceedings of the British
Machine Vision Conference.
[21] Su, H., Qi, C. R., Li, Y., Guibas, L. J. (2015). Render for CNN:
Viewpoint Estimation in Images using CNNs Trained with Rendered
3D Model Views, IEEE International Conference on Computer Vision,
2686–2694.
[22] Satilmis, P., Bashford-Rogers, T., Chalmers, A., Debattista, K. (2017).
A Machine-Learning-Driven Sky Model, IEEE Computer Graphics and
Applications, 37(1), 80–91.
[23] Jung, J., Lee, J. Y., Kweon, I. S. (2019). One-Day Outdoor Photometric
Stereo using Skylight Estimation, International Journal of Computer
Vision, 127(8), 1126–1142.
Recognizing Traffic Signs with Synthetic Data and Deep Learning 9
[24] Bilal, A., Jourabloo A., Ye, M., Liu, X., Ren, L. (2018). Do Convolu-
tional Neural Networks Learn Class Hierarchy?, IEEE Transactions
on Visualization and Computer Graphics, 24(1), 152–162.
[25] LeCun, Y., Bottou, L., Bengio Y., Haffner, P. (1998). Gradient-Based
Learning Applied to Document Recognition, Proceedings of the IEEE,
86(11), 2278–2324.
[26] Bjorck, J., Gomes, C., Selman, B., Weinberger, K. Q. (2018). Under-
standing Batch Normalization, Advances in Neural Information Pro-
cessing Systems, 7694–7705.
[27] Krizhevsky, A., Sutskever, I., Hinton, G. E. (2017). ImageNet Classi-
fication with Deep Convolutional Neural Networks, Communications
of the ACM, 60(6), 84–90.
[28] Zhang, Z., Wang, H., Liu S., Xiao, B. (2018). Consecutive Convolu-
tional Activations for Scene Character Recognition, IEEE Access, 6,
35734–35742.
[29] Géron, A. (2019). Hands-on Machine Learning with Scikit-Learn,
Keras, and TensorFlow: Concepts, Tools, and Techniques to Build
Intelligent Systems, O’Reilly Media, 2nd ed.
[30] Stehman, S. V. (1997). Selecting and Interpreting Measures of The-
matic Classification Accuracy, Remote Sensing of Environment, 62(1),
77–89.
[31] German Traffic Sign Benchmarks, https://benchmark.ini.rub.de/gtsrb_results_ijcnn.html.

Mais conteúdo relacionado

Semelhante a Recognizing Traffic Signs with Synthetic Data and Deep Learning

An Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image ProcessingAn Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image Processingvivatechijri
 
IRJET- A Study of Generative Adversarial Networks in 3D Modelling
IRJET- A Study of Generative Adversarial Networks in 3D ModellingIRJET- A Study of Generative Adversarial Networks in 3D Modelling
IRJET- A Study of Generative Adversarial Networks in 3D ModellingIRJET Journal
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdfmokamojah
 
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial IntelligenceLIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial IntelligenceJason Creadore 🌐
 
Investigating the Effect of BD-CRAFT to Text Detection Algorithms
Investigating the Effect of BD-CRAFT to Text Detection AlgorithmsInvestigating the Effect of BD-CRAFT to Text Detection Algorithms
Investigating the Effect of BD-CRAFT to Text Detection Algorithmsgerogepatton
 
INVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMS
INVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMSINVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMS
INVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMSijaia
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
 
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...
IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...IRJET Journal
 
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea...
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea...Tracking Chessboard Corners Using Projective Transformation for Augmented Rea...
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea...CSCJournals
 
IRJET- 3D Object Recognition of Car Image Detection
IRJET-  	  3D Object Recognition of Car Image DetectionIRJET-  	  3D Object Recognition of Car Image Detection
IRJET- 3D Object Recognition of Car Image DetectionIRJET Journal
 
Paper id 25201491
Paper id 25201491Paper id 25201491
Paper id 25201491IJRAT
 
A Traffic Sign Classifier Model using Sage Maker
A Traffic Sign Classifier Model using Sage MakerA Traffic Sign Classifier Model using Sage Maker
A Traffic Sign Classifier Model using Sage Makerijtsrd
 
Efficient resampling features and convolution neural network model for image ...
Efficient resampling features and convolution neural network model for image ...Efficient resampling features and convolution neural network model for image ...
Efficient resampling features and convolution neural network model for image ...IJEECSIAES
 
Efficient resampling features and convolution neural network model for image ...
Efficient resampling features and convolution neural network model for image ...Efficient resampling features and convolution neural network model for image ...
Efficient resampling features and convolution neural network model for image ...nooriasukmaningtyas
 
IRJET- Traffic Sign Classification and Detection using Deep Learning
IRJET- Traffic Sign Classification and Detection using Deep LearningIRJET- Traffic Sign Classification and Detection using Deep Learning
IRJET- Traffic Sign Classification and Detection using Deep LearningIRJET Journal
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryCSCJournals
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryCSCJournals
 

Semelhante a Recognizing Traffic Signs with Synthetic Data and Deep Learning (20)

An Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image ProcessingAn Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image Processing
 
IRJET- A Study of Generative Adversarial Networks in 3D Modelling
IRJET- A Study of Generative Adversarial Networks in 3D ModellingIRJET- A Study of Generative Adversarial Networks in 3D Modelling
IRJET- A Study of Generative Adversarial Networks in 3D Modelling
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
Rajshree1.pdf
Rajshree1.pdfRajshree1.pdf
Rajshree1.pdf
 
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial IntelligenceLIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
LIDAR Magizine 2015: The Birth of 3D Mapping Artificial Intelligence
 
Investigating the Effect of BD-CRAFT to Text Detection Algorithms
Investigating the Effect of BD-CRAFT to Text Detection AlgorithmsInvestigating the Effect of BD-CRAFT to Text Detection Algorithms
Investigating the Effect of BD-CRAFT to Text Detection Algorithms
 
INVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMS
INVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMSINVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMS
INVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMS
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
 
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...
IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...
 
paper
paperpaper
paper
 
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea...
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea...Tracking Chessboard Corners Using Projective Transformation for Augmented Rea...
Tracking Chessboard Corners Using Projective Transformation for Augmented Rea...
 
IRJET- 3D Object Recognition of Car Image Detection
IRJET-  	  3D Object Recognition of Car Image DetectionIRJET-  	  3D Object Recognition of Car Image Detection
IRJET- 3D Object Recognition of Car Image Detection
 
Paper id 25201491
Paper id 25201491Paper id 25201491
Paper id 25201491
 
A Traffic Sign Classifier Model using Sage Maker
A Traffic Sign Classifier Model using Sage MakerA Traffic Sign Classifier Model using Sage Maker
A Traffic Sign Classifier Model using Sage Maker
 
Efficient resampling features and convolution neural network model for image ...
Efficient resampling features and convolution neural network model for image ...Efficient resampling features and convolution neural network model for image ...
Efficient resampling features and convolution neural network model for image ...
 
Efficient resampling features and convolution neural network model for image ...
Efficient resampling features and convolution neural network model for image ...Efficient resampling features and convolution neural network model for image ...
Efficient resampling features and convolution neural network model for image ...
 
IRJET- Traffic Sign Classification and Detection using Deep Learning
IRJET- Traffic Sign Classification and Detection using Deep LearningIRJET- Traffic Sign Classification and Detection using Deep Learning
IRJET- Traffic Sign Classification and Detection using Deep Learning
 
Comparison of Rendering Processes on 3D Model
Comparison of Rendering Processes on 3D ModelComparison of Rendering Processes on 3D Model
Comparison of Rendering Processes on 3D Model
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud Library
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud Library
 

Último

VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptNANDHAKUMARA10
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Último (20)

VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 

Recognizing Traffic Signs with Synthetic Data and Deep Learning

  • 1. BOHR International Journal of Computational Intelligence and Communication Network 2022, Vol. 1, No. 1, pp. 1–9 https://doi.org/10.54646/bijcicn.001 www.bohrpub.com Recognizing Traffic Signs with Synthetic Data and Deep Learning Avaz Naghipour∗ and Rahim Pasbani Department of Computer Engineering, University College of Nabi Akram, Tabriz, Iran ∗Corresponding author: naghipour@ucna.ac.ir Abstract. Recently, in-depth learning about computer vision and object classification tasks has surpassed other machine learning (ML) algorithms. This algorithm, alike similar ML algorithms, requires a dataset for training. In most real cases, developing an appropriate dataset is expensive and time-consuming. Also, in some situations, providing the dataset is unsafe or even impossible. In this paper, we proposed a novel framework for traffic sign recognition using synthetic data and deep learning. The main feature of the proposed method is its independence from the real-life dataset, which leads to high accuracy in the real test dataset. Creating one-by-one synthetic data is more labor-intensive and costlier than providing real data. To tackle the issue, the proposed framework uses a procedural method, which gives the possibility to develop countless high-quality data that are close enough to the real data. Due to its procedural nature, this framework can be easily edited and tuned. Keywords: Deep Learning, Convolutional Neural Networks, Computer Graphics, Synthetic Data, Traffic Sign Recognition. INTRODUCTION Nowadays, many kinds of research and innovations are conducted to enhance autonomous vehicle technologies. Giving a semantic perception to artificial intelligence (AI)- based drivers to recognize environmental objects is one of the main goals in research [1–3]. Indisputably, traffic sign recognition ability plays a significant role in AI-based vehi- cles. These signs are guidance that makes drivers aware of upcoming situations. Thus, traffic sign detection and recognition that computer vision applications are trying to address are considered a significant issue. While having a good dataset for ML applications (especially in supervised ML) is mandatory, there are a few premade datasets avail- able on demand [4]. Accessible free datasets are usually used to benchmark competition or evaluate state-of-the-art applications. For preparing production-level applications, first, it is crucial to provide a proper dataset with adequate quantity and quality. In the case of an image classifier, the datasets are normally images captured with cameras from real-life instances. At the first glance, providing images seems to be a handy and cost-efficient procedure; however, when the required number of images for the train classi- fiers is taken into consideration, the difficulty of preparing such datasets becomes bold. ML algorithms in general and deep learning in particular use thousands or even millions of images to give a reliable and practical result. Obviously, providing this amount of image in many cases is wearisome and costly, if not impossible. Other than the cost and expenses, a key issue with pro- viding a traffic sign dataset is time. For more clarity, let us conjure up the traffic sign dataset obtaining procedure, and how time-consuming it would be to capture, crop, edit, and label singly and manually. A more significant problem is the time needed to capture photos in different seasons and conditions of a year. 
only during a hot summer, the dataset would not include images of signs covered with snow in winter, and encountering such images would cause trouble for the classifier. Thus, generalizing the model comprehensively requires at least one year of waiting, to include all seasonal visual appearances. Another solution is using CAD datasets. Synthetic images rendered from a 3D virtual scene have been used widely in computer vision tasks [5]. Recently, they have been used in object detection and classification applications. Flying Chairs [6], FlyingThings3D [7], SYNTHIA [8], and SceneNet [9] are examples of synthetic image-based datasets that are used to train or evaluate relevant ML algorithms. Fortunately, with progress in the computer graphics (CG) industry, the number of online CAD datasets and premade 3D objects is growing.
However, except for very few datasets, there is still no open access to CAD models. Considering the effort of making CAD data, it is apparent that capturing real images may be cheaper than developing synthetic ones. Modeling even a very simple 3D object is a time-consuming process that requires expertise. Another problem with most synthetic images is their dissimilarity to real objects: they are far from real-world references in terms of appearance. Browsing some accessible CAD datasets shows explicitly that the objects do not have the proper lighting we see in the real world. Another significant issue that makes CAD models look rough is their textures and materials. Instead of resembling real materials such as wood, fibers, and metals, the models seem to be made of solid clay. If an ML application is trained on such non-realistic images, it will not give adequate accuracy in ground-truth cases. This is why researchers sometimes mix them with real images to improve model performance [10].

To overcome these challenges, an efficient approach is proposed in the present work to develop synthetic traffic signs. To this end, a procedural method is used to provide the desired dataset without any quantity limitation. In the proposed method, every small detail is taken into account to make the images quite real-looking, so that they can hardly be distinguished from real images. Various ML classifiers were then trained with the developed dataset.

The outline of the paper is as follows: Section 2 discusses related works. Section 3 presents synthetic image generation. Image processing filters and image augmentation are covered in Sections 4 and 5, respectively. Section 6 describes setting up the deep convolutional neural network (DCNN) architecture. In Section 7, experiments and results are reported. Section 8 concludes the paper.

RELATED WORKS

There are various algorithms for classifying traffic signs. The most regarded algorithms in this field are based on ML methods; however, some research projects employ the color and shape of the signs. These characteristics cannot be directly used to classify traffic signs, but they can remarkably help the actual classifier. The support vector machine (SVM) has always been a ready choice for classification. In ref. [11], Maldonado et al. used SVM for automatic multiclass traffic sign detection and classification using a one-vs-all approach with a Gaussian kernel. In another attempt [12], considering the limitations in the shape and color of signs, the authors used a color segmentation and shape matching approach, and the dataset was then classified using SVM; the obtained results are promising. In the method suggested in ref. [13], after detection of a sign via the MSER procedure, HSV-HOG-LBP features are extracted, and a random forest is used to finalize the recognition process. Ref. [14] demonstrates the effectiveness of the random forest algorithm for traffic sign recognition in terms of both accuracy and speed. Nowadays, modern computer vision classifiers mostly deploy CNNs for recognition tasks. In ref. [15], a densely connected CNN is used for traffic sign detection. In ref. [16], the authors show the results of different CNN architectures on the same recognition problem.
In the area of synthetic data deployment, some remarkable works have been presented so far. The authors in ref. [17] suggested a 2D synthetic text generator engine, which places texts onto random backgrounds and employs the obtained data to train a CNN-based classifier to recognize text in graphics. Furthermore, a 3D synthetic dataset was used in ref. [18] to predict hand gestures in an image; accuracy rose after adding some real images to the training dataset. CAD data have been used in several works for classification and object detection tasks [19, 20]. Another important challenge in computer vision is viewpoint prediction. In ref. [21], rendered images trained a CNN, and the result was excellent. Most of the relevant methods have used premade online CAD libraries, such as Trimble 3D Warehouse, TurboSquid, Yobi3D, and ShapeNet. Using premade datasets is handy for testing new models or for benchmark competitions, but in real applications, each problem needs its own exclusively developed dataset.

SYNTHETIC IMAGE GENERATION

The proposed method in this research contains two main steps. In the first step, a procedural virtual 3D scene is created, and in the second step, a specific DCNN architecture is designed to be trained with the dataset generated in the previous step.

First of all, a virtual 3D scene needs a setup. Then, feasible variations and randomizations are made procedurally in 3D world space, so that every state forms a believable arrangement of the scene's objects and components. On every run, a specific arrangement is made and rendered. In the next step, to add extra purposeful variations, some image-processing-based filters and manipulations are applied to the rendered images. Finally, the image augmentation technique is employed to generalize the proposed classifier; augmentation also increases the size of the training dataset.

The aim was to set up a versatile virtual scene that can automatically develop a feasible arrangement of the objects. Such a system requires some 3D objects, lights, a sky object, and several controllers that control the properties of scene components. The controllers provide mathematical relationships among all objects in the scene. To avoid infeasible cases, some constraints are considered. In fact, the constraints are small pieces of Python code that control the randomization process to prevent impractical setups; a minimal sketch of such a constraint is shown below.
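The paper does not list the constraint code itself, so the following is only an illustrative sketch. The function name, the controller fields, and the value ranges (other than the turbidity range of 2–10 stated later in this section) are assumptions, not the authors' implementation.

```python
import random

def randomize_scene(rng: random.Random) -> dict:
    """Draw one hypothetical per-run scene state; field names and ranges are illustrative."""
    state = {
        "sun_elevation_deg": rng.uniform(5.0, 85.0),   # keep the sun above the horizon
        "sun_azimuth_deg": rng.uniform(0.0, 360.0),
        "sky_turbidity": rng.randint(2, 10),           # turbidity factor fed to the sky model
        "sign_yaw_deg": rng.uniform(-30.0, 30.0),
        "sign_offset_x": rng.uniform(-0.2, 0.2),       # fraction of frame width
        "sign_offset_y": rng.uniform(-0.2, 0.2),       # fraction of frame height
    }
    # Constraint: reject arrangements where the sign would drift out of the camera frame.
    if abs(state["sign_offset_x"]) + abs(state["sign_offset_y"]) > 0.35:
        return randomize_scene(rng)  # resample until a feasible arrangement is found
    return state

scene = randomize_scene(random.Random(42))
```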
Figure 1 depicts a schematic view of the mentioned procedural scene.

Figure 1. The 3D scene contains 3D objects, lights, sky, background, and several controllers that govern every aspect of the randomization process.

For evaluating the proposed approach in the real world, the German Traffic Sign Recognition Benchmark (GTSRB) dataset is selected. This dataset was captured in different lighting situations: some images were taken on a sunny day and some in shadow or bluish, morning-like lighting. Moreover, some images are overexposed due to the reflective surface of the sign board. Motion blur can also be seen in some images, indicating that the pictures were captured while driving. To achieve a comprehensive model capable of classifying the different types of signs, the image generator has to be very versatile, with the ability to cover all possible variations. Some of the variations applied to the scene are listed as follows.

• Illumination: One of the main issues in rendering photo-realistic images is the correct lighting of the scene. In CG, the lighting procedure is usually divided into two main parts: direct lighting and indirect lighting. Direct light takes charge of the main illumination and usually casts sharp shadows on objects; indirect light is environmental light resulting from bouncing rays. Since traffic signs are almost always placed outdoors, they are mostly illuminated by the sun and sky. The sun is considered a direct light, and the sky is responsible for indirect lighting. In the real world, the sun angle and sky color are intertwined [22]. The sky color gradient varies according to the sun's position and other factors such as haze and aerosol in the air. Simulating the sun as an infinite (direct) light source is simple in most CG applications. The technique usually used to simulate indirect lighting is image-based lighting (IBL): a big sphere or hemisphere surrounds the scene, and its texture sends light rays into the scene. In this research, the Preetham sky model was used to generate a virtual sky. This model needs the sun position, the viewing direction, and the turbidity factor to compute the color of the texture pixels [23]. Turbidity is defined as the haziness of fluid-type materials. To achieve a wide range of sky models, the sun's position is randomized around the scene; moreover, for every run, a random integer between 2 and 10 is assigned to the turbidity factor. The sky color is allowed to affect the background image to match the scene's overall color.

• Position and Rotation: On every run, the basic spatial properties of the sign object, such as position and rotation, change, but the sign never leaves the camera view. Both the camera and the objects have a chance to relocate or spin. By taking the GTSRB images as the reference, the minimum and maximum available space around the sign object can be estimated; the movement of the sign object is limited so that it stays near the middle of the frame.

• Motion Blur and Out of Focus: Motion blur may happen when taking a picture of fast-moving objects; this phenomenon directly depends on the camera's shutter speed. Another effect is called out of focus, which occurs when an object is far from the camera's focal distance. Both effects can simply be simulated by specific image processing filters, even after rendering. To simulate motion blur, filters are usually applied that stretch the image along the moving direction; in this research, the direction is selected randomly but is kept almost horizontal.
• Signboard Damages and Imperfections: Road signs are usually exposed to physical damage and strikes, and these damages often cause deformation. To mimic this effect, some deformers have been deployed (a sketch of this idea appears after this list). Deforming is usually done using displacement maps: gray-scale noisy images projected onto the object's UV coordinates that push polygons up or down along their normal vectors, corresponding to the brightness of the projected map. On every run, this map is regenerated with a different noisy pattern.

• Dust and Scratches: Rain, storms, snow, dust, and other natural phenomena may dirty the signs and make them unclear. Some controllers are designed to simulate these effects by adding random patterns onto the sign textures. For extra detail, diverse noisy images are used to fake dirt on the sign boards, and mask textures specify the areas where this dirtiness should appear.

• Backdrop and Environment: Each season has its own visual effect on the objects' appearance. In rendered images, these effects can be realized by changing the background image. An important factor to take into account is that neural networks can learn unwanted patterns such as backgrounds; thus, repetitive backdrops should be avoided as much as possible. To prevent this side effect, the proposed method uses one hundred different images; however, the risk recurs every 100 runs. Hence, for every run, the position, scale, and rotation of the background image are changed randomly, which guarantees that the final rendered images never have the same background. These randomizations are controlled by controllers in order to prevent infeasible images.

• Shadows: The sign objects are placed in different positions and may receive shadows cast by other objects. Analyzed numerically, these shadows have a significant effect on the signs' appearance and color. To implement such shadows in the virtual world, several objects with different shapes and sizes have been placed in the scene; with each run, some of their properties, such as position, rotation, distance, and visibility, are randomized. Shadows play a significant role in the overall look of any image. For more clarity, the same scene has been rendered four times in Figure 2, changing only the received shadows, and each image's histogram has been plotted. As seen in these plots, most pixel values changed, while semantically all of the images represent the same sign.

Figure 2. Histogram of an object with four different shadow situations. Shadow cast by the environment changes the entire distribution of pixel data.
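The deformer itself is not listed in the paper; the following is a minimal sketch of the displacement-map idea under stated assumptions — the function name, array layouts, and the strength parameter are hypothetical.

```python
import numpy as np

def deform_signboard(vertices, normals, uvs, noise_map, strength=0.01):
    """Push vertices along their normals by a noise map sampled at their UV coordinates."""
    h, w = noise_map.shape
    us = np.clip((uvs[:, 0] * (w - 1)).astype(int), 0, w - 1)
    vs = np.clip((uvs[:, 1] * (h - 1)).astype(int), 0, h - 1)
    height = noise_map[vs, us] - 0.5              # centre so polygons move up or down
    return vertices + normals * (height * strength)[:, None]

# Toy usage with a flat board facing +z; the noise map is regenerated every run.
rng = np.random.default_rng(0)
verts = rng.random((100, 3))
norms = np.tile([0.0, 0.0, 1.0], (100, 1))
uvs = rng.random((100, 2))
noise = rng.random((256, 256))
deformed = deform_signboard(verts, norms, uvs, noise)
```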
In addition to light and shadows, any change in position, rotation, scale, shear, color, and other properties overturns the pixel data. Thus, designing a classifier that remains invariant to all these variations is a big challenge in both the image processing and computer vision fields. The goal is therefore to provide a comprehensive dataset that includes the road signs in any condition, which makes the classifier invariant to unnecessary information.

IMAGE PROCESSING FILTERS

Pictures captured by ordinary cameras often contain some noise, which is mostly visible in low-light situations. In rendering, too, all images are inherently noisy due to indirect illumination; nevertheless, for emphasis, some subtle noise is randomly added to the rendered images. Browsing the GTSRB data more closely shows that some images are very dark and some are very bright. Despite the different lighting situations applied in the 3D scene rendering step, extra darkening and brightening filters are applied to some rendered images. As mentioned before, motion blur and defocus can be faked with 2D filters; in this step, these effects are also applied to randomly selected rendered images. After many trials and errors, it was found that the mentioned filters improve the final result and accuracy. A minimal sketch of such filters is given below.
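The exact filters are not specified in the paper; below is a sketch of plausible implementations using OpenCV — the kernel size, noise variance, and gain ranges are assumptions.

```python
import cv2
import numpy as np

rng = np.random.default_rng()

def add_noise(img, sigma=8.0):
    """Additive Gaussian noise, mimicking subtle sensor grain."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def random_brightness(img, low=0.5, high=1.5):
    """Random darkening or brightening via a multiplicative gain."""
    return np.clip(img.astype(np.float32) * rng.uniform(low, high), 0, 255).astype(np.uint8)

def motion_blur(img, ksize=9, max_angle_deg=10.0):
    """Stretch the image along a random, almost horizontal direction."""
    angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    kernel = np.zeros((ksize, ksize), np.float32)
    c = ksize // 2
    for t in np.linspace(-c, c, 4 * ksize):      # rasterize a line through the kernel centre
        x = int(round(c + t * np.cos(angle)))
        y = int(round(c + t * np.sin(angle)))
        if 0 <= x < ksize and 0 <= y < ksize:
            kernel[y, x] = 1.0
    return cv2.filter2D(img, -1, kernel / kernel.sum())

img = (rng.random((80, 80, 3)) * 255).astype(np.uint8)  # stand-in for a rendered frame
out = motion_blur(random_brightness(add_noise(img)))
```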
IMAGE AUGMENTATION

At the final step of dataset preparation, all the rendered images are candidates for augmentation. At this point, all the variations applied to the rendered images are offline, and there is no access to the 3D scene options anymore. Intense variations are applied because the GTSRB dataset includes images captured in very diverse situations, some of which are hard to recognize even with the human eye. In general, image augmentation leads to more robust training and a reduction in overfitting. The augmentation used in this work changes almost every property of the rendered images, such as position, rotation, scale, shear, crop, contrast, distortion, random masking shapes, and some color perturbation; a sketch of such a pipeline is given at the end of this section. Some augmented images are illustrated in Figure 3.

Figure 3. Heavy augmentation applied to rendered images. Most image properties, such as position, scale, crop, distortion, and color, have been changed during this operation.

Eventually, the proposed synthetic image generator produced 2500 images for each class. Since the generator is entirely procedural, it is possible to create an infinite number of images without much effort; moreover, this method avoids repetitive images in the generated dataset. Of these 2500 images per class, 2000 were allocated for training and 500 for validation. To assess the proposed method, 12 classes of the GTSRB dataset are selected. Intentionally, some challenging and difficult classes are chosen that are similar to each other in terms of shape, figure, or color. These selected classes are illustrated in Figure 4. Aggregately, these 12 classes of the GTSRB dataset contain about 10000 images, which are considered as a test set to evaluate the classifier's efficiency.

Figure 4. Selected sign types for generating synthetic images.
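The paper does not name its augmentation library; the following is a minimal sketch using TensorFlow/Keras preprocessing layers as one plausible stand-in. The chosen layers and factor values are assumptions, and shear, random masking, and distortion would need additional custom layers.

```python
import tensorflow as tf

# A plausible augmentation stand-in; factors are illustrative only.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomTranslation(0.15, 0.15),  # position
    tf.keras.layers.RandomRotation(0.05),           # rotation (fraction of a full turn)
    tf.keras.layers.RandomZoom(0.2),                # scale / crop-like effect
    tf.keras.layers.RandomContrast(0.3),            # contrast perturbation
])

images = tf.random.uniform((8, 80, 80, 3))          # stand-in batch of rendered images
augmented = augment(images, training=True)          # training=True enables the randomness
```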
SETTING UP DCNN ARCHITECTURE

CNN is one of the major types of feed-forward neural networks that can track the spatial position of elements in images as detection features [24]. These features carry meaningful data, which play the main role in detection and recognition tasks. This advantage makes CNNs more efficient than the multi-layer perceptron (MLP) in image classification tasks. Some other types of layers are embedded between the convolution layers to reduce dimension (pooling layers) or to add non-linearity (activation functions) to a layer's output [25]. Since this work aims to show that synthetic data can be used to train CNN models, the utilized model is not precisely optimized.

The proposed DCNN architecture contains four blocks before connecting to two fully connected layers. There are two convolution layers with 32 filters in the first block. Batch normalization is then added to speed up and improve the accuracy of the training process [26]. Next, max pooling with a pool size of (2, 2) and stride 2 shrinks the first block's output from 80×80 to 40×40 pixels. After the first dense layer, a dropout layer is added as a regularization method to improve the generalization error of the network; dropout also has a tremendous role in avoiding overfitting [27]. For the first two blocks, two convolution layers are used successively without pooling between them. One reason is that pooling after each convolution layer immediately shrinks the tensors, so some significant data may be lost; besides, consecutive convolution layers capture more spatial data in the feature map [28]. The numbers of filters used for the subsequent convolution layers are 64, 64, 128, and 256, respectively. In this work, the max pooling method is used for the pooling layers, and all the utilized activation functions are Rectified Linear Units (ReLUs). The blocks finally end with two fully connected layers, which are usually used to collect and optimize the scores for each class. The first fully connected layer contains 128 neurons and is followed by batch normalization and dropout. The last layer is the second fully connected layer with only 12 neurons and softmax as its activation function; this layer decides which class the input image belongs to. The architecture is plotted schematically in Figure 5, and a code sketch of it follows.

Figure 5. Schematic diagram of the proposed model's layers. The architecture comprises convolution, pooling, batch normalization, dropout, and fully connected layers. Input images have 80 pixels for both height and width.
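As a concrete reading of that description, here is a minimal Keras sketch. The paper does not give kernel sizes, the exact placement of every batch-normalization and dropout layer, or the dropout rates, so those details are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(filters, n_convs):
    """n_convs 3x3 conv layers (kernel size assumed), batch norm, then 2x2 max pooling."""
    block = [layers.Conv2D(filters, (3, 3), padding="same", activation="relu")
             for _ in range(n_convs)]
    block += [layers.BatchNormalization(), layers.MaxPooling2D((2, 2), strides=2)]
    return block

model = tf.keras.Sequential(
    [tf.keras.Input(shape=(80, 80, 3))]
    + conv_block(32, n_convs=2)    # block 1: 80x80 -> 40x40
    + conv_block(64, n_convs=2)    # block 2: 40x40 -> 20x20
    + conv_block(128, n_convs=1)   # block 3: 20x20 -> 10x10
    + conv_block(256, n_convs=1)   # block 4: 10x10 -> 5x5
    + [
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),       # rate assumed
        layers.Dense(12, activation="softmax"),
    ]
)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```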
The proposed model is ready to receive the provided synthetic images as input to begin the training process, but to find the optimum weights and biases, a proper loss function must be established. Let $\mathbf{x}$ be an instance image vector and $s_k(\mathbf{x})$ the score of class $k$ that softmax receives; the score is a linear function of $\mathbf{x}$ [29]:

$s_k(\mathbf{x}) = \mathbf{x}^{\top} \theta^{(k)}$ (1)

In Equation (1), $\theta^{(k)}$ represents the parameter vector of class $k$. We need the probability that the instance belongs to class $k$, so the softmax function at the end of the model chain computes this probability $\hat{p}_k$ [29]:

$\hat{p}_k = \dfrac{e^{s_k(\mathbf{x})}}{\sum_{j=1}^{K} e^{s_j(\mathbf{x})}}$ (2)

where $K$ is the number of classes. Since softmax predicts only one class at a time, it suits our case, as every sign belongs to exactly one class. Cross entropy is a proven way to define a loss function for classification problems [29]:

$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log \hat{p}_k^{(i)}$ (3)

Equation (3) gives the cost function $J(\Theta)$, where $y_k^{(i)}$ is the true label of instance $i$ with respect to class $k$: it is 1 if the $i$-th instance belongs to class $k$ and 0 otherwise. The gradient vector for class $k$ is obtained by differentiating the cost function with respect to the $k$-th parameter vector $\theta^{(k)}$ [29]:

$\nabla_{\theta^{(k)}} J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{p}_k^{(i)} - y_k^{(i)} \right) \mathbf{x}^{(i)}$ (4)

Using one of the gradient-descent family of optimizers, the model then finds the parameters $\Theta$ that minimize the cost function; these parameters are the filters and other types of learnable variables [29]. A small numeric sketch of Equations (2)–(4) is given below.
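To make Equations (2)–(4) concrete, here is a small NumPy sketch (not from the paper) that evaluates the softmax probabilities, the cross-entropy cost, and its gradient on a toy batch.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, K = 4, 6, 3                      # toy batch size, feature dimension, class count
X = rng.normal(size=(m, n))            # instance vectors x^(i)
Theta = rng.normal(size=(n, K))        # one parameter vector theta^(k) per class
Y = np.eye(K)[rng.integers(0, K, m)]   # one-hot true labels y^(i)

scores = X @ Theta                                               # Eq. (1)
scores -= scores.max(axis=1, keepdims=True)                      # numerical stability
P = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # Eq. (2)
cost = -np.mean(np.sum(Y * np.log(P), axis=1))                   # Eq. (3)
grad = X.T @ (P - Y) / m               # Eq. (4): column k is the gradient for theta^(k)
print(f"cost={cost:.4f}, grad shape={grad.shape}")
```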
EXPERIMENTS AND RESULTS

The designed synthetic data generator is capable of generating any number of images at any required dimension; 80 pixels for both height and width are chosen here. In total, 24000 images are used to train the proposed DCNN model. Figure 6 shows the training and validation loss/accuracy over 200 epochs. As seen in Figure 6, around epoch 200 the model almost converges, and the validation loss and accuracy are in an acceptable situation in terms of overfitting.

Figure 6. Training and validation loss and accuracy; the model has almost converged by epoch 200.

To test the model, the corresponding classes from the GTSRB are used. In machine learning, and especially in supervised machine learning, the confusion matrix is considered one of the significant visualization methods for statistical classification tasks [30]. The confusion matrix of the proposed model on the test dataset is depicted in Figure 7; classes that look alike are confused with similar classes.

Figure 7. Normalized confusion matrix for the German Traffic Sign Recognition Benchmark (GTSRB) dataset.

Referring to the confusion matrix plotted in Figure 7, it is clear that predicting classes 3 and 4 leads to higher errors than the others. The first reason is the appearance of the two mentioned classes: they are very close to each other. The second reason is the GTSRB image size and aspect ratio: some images of this dataset are very small and non-uniform in height-to-width ratio, while the generated training dataset is entirely square (80 × 80 pixels).

Table 1. Comparison of the proposed method with other methods according to the German Traffic Sign Recognition Benchmark (GTSRB) [31].

  #    Team           Method                         Accuracy %
  ...  ...            ...                            ...
  72   Italian-crash  Multi Dataset Algorithm        83.08
  12   TDC            CVOG + CCV + NN (Team 2)       82.67
  11   TDC            CVOG + CCV + NN (Team 1)       82.37
  #    Our Method     Synthetic data + DCNN          91.91
  74   TDC            CVOG + ANN (Team 3)            81.80
  97   RMULG          Subwindows+ETGRAY+LIBLINEAR    79.71
  134  olbustosa      HOG_SVM                        76.35
  ...  ...            ...                            ...

Recent benchmark competition results can be observed on the GTSRB website; some of these results, close to this work's result, are listed in Table 1. The main difference between the proposed method and the other methods listed on the GTSRB website is the type of training dataset: most of them used GTSRB's own training dataset, whereas in this work a synthetic dataset is generated and used to train the model. Some real-life datasets, such as GTSRB, are biased in terms of distribution among classes; our dataset, in contrast, is evenly distributed (2000 training images per class). This may affect the decision-making results. Of course, such bias is sometimes intentional, because the distribution over classes is not even in real-life situations either; for example, the number of priority signs in a city is normally much higher than the number of roundabout signs. The obtained results show that the DCNN gives the best result among the compared image classification methods. Notably, the DCNN architecture achieves 91.91% accuracy on the GTSRB dataset without having seen any real traffic sign image. A minimal sketch of this evaluation step follows.
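The evaluation code is not given in the paper; the sketch below shows one way to compute the normalized confusion matrix and accuracy of a trained Keras model over a test set. The arrays `x_test`/`y_test` are hypothetical stand-ins for GTSRB images resized to 80×80, and `model` refers to the earlier architecture sketch.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical test arrays: x_test (N, 80, 80, 3) in [0, 1], y_test (N,) class ids.
x_test = np.random.rand(100, 80, 80, 3).astype("float32")
y_test = np.random.randint(0, 12, 100)

y_pred = model.predict(x_test).argmax(axis=1)            # model from the earlier sketch
cm = confusion_matrix(y_test, y_pred, labels=np.arange(12))
cm_normalized = cm / cm.sum(axis=1, keepdims=True).clip(min=1)  # row-normalize safely
accuracy = (y_pred == y_test).mean()
print(f"accuracy={accuracy:.4f}")
```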
CONCLUSION

Deploying machine learning in industry-level production requires providing an exclusive dataset that meets the requirements. Providing labeled ground-truth datasets for computer vision tasks is usually expensive, time-consuming, and labor-intensive. Furthermore, there are cases where creating a real dataset is unsafe or practically impossible. Using CAD models is another option, but creating the desired models one by one in most cases becomes more expensive than providing real datasets. To cover this challenge and develop a synthetic dataset for the traffic sign recognition task, a procedural method was used in this paper. Using computer graphics tools, the proposed method facilitates generating numerous images that are precisely analogous to real-life instances. Moreover, a well-structured DCNN architecture was set up that decently fulfilled the classification task: without seeing any real data, the classifier could categorize the real-world GTSRB dataset with 91.91% accuracy.

The provided dataset has more details than the GTSRB dataset strictly requires. Many details were taken into account that might not be necessary, but they make the classifier more reliable in complicated situations. Rendered images and real pictures captured by a camera intrinsically contain many dissimilarities, and using synthetic images to train machine learning models requires narrowing this similarity gap. Augmentation and other image processing filters are helpful in enhancing accuracy; additionally, without augmentation and dropout, overfitting and generalization issues would be pronounced. For future research, we plan to utilize this procedure for more complicated tasks, such as road and street object detection. Clearly, such a procedure requires greater effort to set up a system that can provide credible rendered images.

CONFLICT OF INTEREST

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AUTHOR CONTRIBUTIONS

RP conducted the initial literature review and data collection, performed the experiments, prepared the results, and drafted the manuscript. AN helped with writing, editing, and conceptualization, analyzed the results, and contributed to supervision. Both authors read and approved the final manuscript.

REFERENCES

[1] Li, L., Huang, W., Liu, Y., Zheng, N., Wang, F. (2016). Intelligence Testing for Autonomous Vehicles: A New Approach, IEEE Transactions on Intelligent Vehicles, 1(2), 158–166.
[2] Gidado, U. M., Chiroma, H., Aljojo, N., Abubakar, S., Popoola, S. I., Al-Garadi, M. A. (2020). A Survey on Deep Learning for Steering Angle Prediction in Autonomous Vehicles, IEEE Access, 8, 163797–163817.
[3] Arnold, E., Al-Jarrah, O. Y., Dianati, M., Fallah, S., Oxtoby, D., Mouzakitis, A. (2019). A Survey on 3D Object Detection Methods for Autonomous Driving Applications, IEEE Transactions on Intelligent Transportation Systems, 20(10), 3782–3795.
[4] Gjoreski, H., Ciliberto, M., Wang, L., Morales, F. J. O., Mekki, S., Valentin, S., Roggen, D. (2018). The University of Sussex-Huawei Locomotion and Transportation Dataset for Multimodal Analytics with Mobile Devices, IEEE Access, 6, 42592–42604.
[5] Wang, T., Wu, D. J., Coates, A., Ng, A. Y. (2012). End-to-End Text Recognition with Convolutional Neural Networks, Proceedings of the 21st International Conference on Pattern Recognition, Tsukuba, 3304–3308.
[6] Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazırbas, C., Golkov, V. (2015). FlowNet: Learning Optical Flow with Convolutional Networks, IEEE International Conference on Computer Vision, 2758–2766.
[7] Mayer, N., Ilg, E., Hausser, P., Fischer, P. (2016). A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation, IEEE Conference on Computer Vision and Pattern Recognition, 4040–4048.
[8] Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A. M. (2016). The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes, IEEE Conference on Computer Vision and Pattern Recognition, 3234–3243.
[9] Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S., Cipolla, R. (2016). Understanding Real World Indoor Scenes with Synthetic Data, IEEE Conference on Computer Vision and Pattern Recognition, 4077–4085.
[10] Tsai, C., Tsai, S., Hsu, Y., Wu, Y. (2017). Synthetic Training of Deep CNN for 3D Hand Gesture Identification, International Conference on Control, Artificial Intelligence, Robotics & Optimization, 165–170.
[11] Maldonado-Bascon, S., Lafuente-Arroyo, S., Gil-Jimenez, P., Gomez-Moreno, H., Lopez-Ferreras, F. (2007). Road-Sign Detection and Recognition Based on Support Vector Machines, IEEE Transactions on Intelligent Transportation Systems, 8(2), 264–278.
[12] Wali, S. B., Hannan, M. A., Hussain, A., Samad, S. A. (2015). An Automatic Traffic Sign Detection and Recognition System Based on Colour Segmentation, Shape Matching, and SVM, Mathematical Problems in Engineering, 1–11.
[13] Kuang, X., Fu, W., Yang, L. (2018). Real-Time Detection and Recognition of Road Traffic Signs using MSER and Random Forests, International Journal of Online Engineering, 14(3), 34–51.
[14] Ellahyani, A., Ansari, M. E., Jafari, I. E. (2016). Traffic Sign Detection and Recognition Based on Random Forests, Applied Soft Computing, 46, 805–815.
[15] Liang, Z., Shao, J., Zhang, D., Gao, L. (2019). Traffic Sign Detection and Recognition Based on Pyramidal Convolutional Networks, Neural Computing and Applications, 32(11), 6533–6543.
[16] Shustanov, A., Yakimov, P. (2017). CNN Design for Real-Time Traffic Sign Recognition, Procedia Engineering, 201, 718–725.
[17] Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A. (2014). Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition, arXiv:1406.2227.
[18] Tsai, C., Tsai, Y., Hsu, S., Wu, Y. (2017). Synthetic Training of Deep CNN for 3D Hand Gesture Identification, International Conference on Control, Artificial Intelligence, Robotics & Optimization, 165–170.
[19] Peng, X., Sun, B., Ali, K., Saenko, K. (2015). Learning Deep Object Detectors from 3D Models, IEEE International Conference on Computer Vision, 1278–1286.
[20] Sun, B., Saenko, K. (2014). From Virtual to Reality: Fast Adaptation of Virtual Object Detectors to Real Domains, Proceedings of the British Machine Vision Conference.
[21] Su, H., Qi, C. R., Li, Y., Guibas, L. J. (2015). Render for CNN: Viewpoint Estimation in Images using CNNs Trained with Rendered 3D Model Views, IEEE International Conference on Computer Vision, 2686–2694.
[22] Satilmis, P., Bashford-Rogers, T., Chalmers, A., Debattista, K. (2017). A Machine-Learning-Driven Sky Model, IEEE Computer Graphics and Applications, 37(1), 80–91.
[23] Jung, J., Lee, J. Y., Kweon, I. S. (2019). One-Day Outdoor Photometric Stereo using Skylight Estimation, International Journal of Computer Vision, 127(8), 1126–1142.
[24] Bilal, A., Jourabloo, A., Ye, M., Liu, X., Ren, L. (2018). Do Convolutional Neural Networks Learn Class Hierarchy?, IEEE Transactions on Visualization and Computer Graphics, 24(1), 152–162.
[25] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. (1998). Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11), 2278–2324.
[26] Bjorck, J., Gomes, C., Selman, B., Weinberger, K. Q. (2018). Understanding Batch Normalization, Advances in Neural Information Processing Systems, 7694–7705.
[27] Krizhevsky, A., Sutskever, I., Hinton, G. E. (2017). ImageNet Classification with Deep Convolutional Neural Networks, Communications of the ACM, 60(6), 84–90.
[28] Zhang, Z., Wang, H., Liu, S., Xiao, B. (2018). Consecutive Convolutional Activations for Scene Character Recognition, IEEE Access, 6, 35734–35742.
[29] Géron, A. (2019). Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media, 2nd ed.
[30] Stehman, S. V. (1997). Selecting and Interpreting Measures of Thematic Classification Accuracy, Remote Sensing of Environment, 62(1), 77–89.
[31] German Traffic Sign Benchmarks, https://benchmark.ini.rub.de/gtsrb_results_ijcnn.html.