2. Introduction
➢ Convolutional neural networks (CNNs) have been tremendously successful in computer vision, e.g. image recognition and object detection.
➢ However, convolution itself is a linear operation; a CNN introduces non-linearity only through activation functions, and even then the non-linearity is merely pointwise. Hence, the paper uses kervolution, which applies the kernel trick to make the patch-filter operation itself non-linear.
3. Recent Approaches to the Problem
A minimal character-based CNN architecture:
https://arxiv.org/ftp/arxiv/papers/1901/1901.06032.pdf
https://www.analyticsvidhya.com/blog/2020/10/what-is-the-convolutional-neural-network-architecture/
4. Our Implementation to the Problem
● We used kervolutional layers to deploy our model using PyTorch.
● When the kernel type is linear, the layer reduces to an ordinary CNN; in our implementation we varied the kernel type across polynomial and Gaussian kernels to introduce non-linearity, which in turn gave better performance.
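The idea above can be sketched as a single PyTorch module. This is a minimal illustration, not the paper's reference code: the class name `Kerv2d`, the keyword arguments (`kernel_type`, `cp`, `dp`, `gamma`), and the unfold-based implementation are our choices, and the bias term is dropped for simplicity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Kerv2d(nn.Conv2d):
    """Sketch of a kervolutional layer: a Conv2d whose patch-filter
    inner product is replaced by a kernel function."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 kernel_type="linear", cp=1.0, dp=3, gamma=1.0, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size,
                         bias=False, **kwargs)
        self.kernel_type, self.cp, self.dp, self.gamma = kernel_type, cp, dp, gamma

    def forward(self, x):
        # Unfold the input into flattened sliding patches: (N, C*kh*kw, L)
        patches = F.unfold(x, self.kernel_size, self.dilation,
                           self.padding, self.stride)
        w = self.weight.view(self.out_channels, -1)   # (O, C*kh*kw)
        inner = w @ patches                           # (N, O, L): plain convolution
        if self.kernel_type == "linear":
            out = inner
        elif self.kernel_type == "polynomial":
            out = (inner + self.cp) ** self.dp        # (<x, w> + cp)^dp
        elif self.kernel_type == "gaussian":
            # ||x - w||^2 = ||x||^2 - 2<x, w> + ||w||^2, computed batch-wise
            x2 = (patches ** 2).sum(1, keepdim=True)  # (N, 1, L)
            w2 = (w ** 2).sum(1).view(1, -1, 1)       # (1, O, 1)
            out = torch.exp(-self.gamma * (x2 - 2 * inner + w2))
        else:
            raise ValueError(f"unknown kernel type: {self.kernel_type}")
        # Fold the L patch responses back into a spatial map
        h_out = (x.shape[2] + 2 * self.padding[0]
                 - self.dilation[0] * (self.kernel_size[0] - 1) - 1) // self.stride[0] + 1
        w_out = (x.shape[3] + 2 * self.padding[1]
                 - self.dilation[1] * (self.kernel_size[1] - 1) - 1) // self.stride[1] + 1
        return out.view(x.shape[0], self.out_channels, h_out, w_out)
```

With `kernel_type="linear"` the layer reproduces `F.conv2d` exactly, which is what makes kervolution a drop-in generalization of convolution.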
6. Baseline Model: Kervolution
● The i-th element of the convolution output f(x) is a simple inner product between the patch vector x(i) and the filter vector w:
Convolution: f(x)_i = ⟨x(i), w⟩
● Kervolution is instead computed via the kernel trick, which implicitly maps both vectors into a non-linear space and takes the inner product there:
Kervolution: f(x)_i = κ(x(i), w) = ⟨φ(x(i)), φ(w)⟩
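The two definitions can be contrasted on a single toy patch. The numbers and kernel parameters below are arbitrary illustrations (polynomial kernel with cp = 1, dp = 2); the point is that kervolution evaluates κ on the same vectors convolution already uses, never computing φ explicitly.

```python
import numpy as np

x_i = np.array([1.0, 2.0, -1.0])   # one flattened image patch (toy values)
w   = np.array([0.5, -1.0, 2.0])   # filter weights (toy values)

conv = x_i @ w                     # convolution: plain inner product -> -3.5
# Polynomial kervolution via the kernel trick, with cp = 1, dp = 2:
kerv = (x_i @ w + 1.0) ** 2        # = <phi(x_i), phi(w)> -> 6.25

print(conv, kerv)
```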
7. Model Capacity and Features
● The kernel function takes kervolution into a non-linear space, so model capacity is increased without introducing extra parameters.
● Kervolution measures similarity via match kernels, which are equivalent to extracting specific features.
● One advantage of kervolution is that its non-linear properties can be customized without explicit computation of the feature map.
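The "no extra parameters" claim can be checked directly: a kervolutional layer applies its kernel to the same weight tensor a convolution would use, so the parameter count is unchanged. A quick sanity check in PyTorch, using the first LeNet-5 layer's shape as an example:

```python
import torch.nn as nn

# First LeNet-5 layer: 1 input channel, 6 output channels, 5x5 filters
conv = nn.Conv2d(1, 6, 5)
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 6*1*5*5 weights + 6 biases = 156

# A kervolutional layer with the same shape reuses exactly this weight
# tensor inside the kernel function, so its count is also 156: the
# added non-linearity costs no parameters.
```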
8. Polynomial Kervolution
● To show the behavior of polynomial kervolution, the learned filters of LeNet-5 trained on MNIST are visualized, covering all six channels of the first kervolutional layer with a polynomial kernel (dp = 3, cp = 1).
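The visualized setting (dp = 3, cp = 1) can be made concrete with a small numeric check (toy values): expanding (⟨x, w⟩ + 1)³ binomially shows exactly the "combination of linear and higher-order terms" discussed on the next slide.

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # toy patch
w = np.array([1.0, 0.5, -0.5])   # toy filter
s = x @ w                        # the linear term <x, w>

# Polynomial kervolution with cp = 1, dp = 3 (the visualized setting):
kerv = (s + 1.0) ** 3
# Binomial expansion: cubic, quadratic, linear, and constant terms mixed
expansion = s**3 + 3 * s**2 + 3 * s + 1.0

print(kerv, expansion)
```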
9. Continued...
● For comparison, the learned filters from a CNN are also presented. Interestingly, some of the learned filters of the KNN (kervolutional neural network) and the CNN are quite similar. This verifies our understanding of the polynomial kernel as a combination of linear and higher-order terms.
● This also indicates that polynomial kervolution introduces higher-order feature interaction in a more flexible and direct way than existing methods.
10. Gaussian Kervolution
The Gaussian RBF kernel extends kervolution to infinite dimensions:
κ_g(x(i), w) = exp(−γ_g ‖x(i) − w‖²),
where γ_g (γ_g ∈ ℝ⁺) is a hyperparameter that controls the smoothness of the decision boundary.
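The kernel above is a one-liner in code. The helper name and the toy inputs below are our own; note how a smaller γ_g pushes the output toward 1 for all patches, i.e. a smoother decision boundary.

```python
import numpy as np

def gaussian_kerv(x, w, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - w||^2)."""
    return np.exp(-gamma * np.sum((x - w) ** 2))

x = np.array([1.0, 0.0])   # toy patch
w = np.array([0.0, 1.0])   # toy filter

val = gaussian_kerv(x, w, gamma=0.5)   # ||x - w||^2 = 2, so exp(-1)
print(val)
```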
15. Conclusions & Future Work
● Kervolution generalizes convolution to a non-linear space.
● It extends convolutional neural networks to kervolutional neural networks.
● It not only retains the advantages of convolution (weight sharing and equivariance to translation) but also enhances model capacity and captures higher-order feature interactions, via patch-wise kernel functions, without introducing additional parameters.
16. Future Work: Continued...
● With a carefully chosen kernel, the performance of a CNN can be significantly improved on the MNIST, CIFAR, and ImageNet datasets by replacing convolutional layers with kervolutional layers.
● Due to the large number of possible kervolution configurations, a brute-force search over all possibilities is infeasible.
● We expect that introducing kervolutional layers into more architectures, together with extensive hyperparameter searches, can further improve performance.
17. Individual Contribution & Code
Sahasra Ranjan (190050102): Worked on the kervolutional neural networks and implemented the training procedure on GPU using PyTorch.
Paarth Jain (190050076): Worked on the training procedure and generated results for various hyperparameters and network settings.
Atul Verma (19B090004): Prepared the presentation and the project report.
Tirthankar Adhikari (190070003): Debugged the implemented code and prepared the presentation.
Shrey Gupta (190100112)
18. Github Repository Link for Final code, Readme Files and Results:
GitHub Repo: https://github.com/Lhisoka/GNR-638-Project
Project PPT:
https://docs.google.com/presentation/d/1-VgwYgyPi4UW1CoTHDgVi7EISm5AbeZPVu62bCwqDsg/edit?usp=sharing
Note: All of our code is based on the following paper:
https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Kervolutional_Neural_Networks_CVPR_2019_paper.pdf
19. Given the recent rapid development in this field, much remains to be explored.