Under the guidance of
Mrs.G.Velvizhi M.Tech.,(DCS)
Assistant Professor
Department of CSE
RAAKCET.
DETECTION OF ANDROID MALWARES USING RECURRENT
NEURAL NETWORKS
TEAM MEMBERS REGISTER NUMBER
OBJECTIVE OF THE PROJECT
Title: Detection of Android Malwares using Recurrent
Neural Networks
The main goal of this project is to develop an efficient
deep learning model that distinguishes Android malware from
genuine (benign) applications.
DOMAIN OF THE PROJECT
Domain: Deep Learning
Explanation: Deep learning is a class of machine learning
algorithms that uses multiple layers to progressively
extract higher-level features from the raw input.
Deep learning is an important element of data science,
which includes statistics and predictive modelling.
It is extremely beneficial to data scientists who are tasked
with collecting, analysing and interpreting large amounts
of data.
Deep learning makes this process faster and easier.
Deep learning can be applied to complex problems such as
image classification, natural language processing and
speech recognition.
LITERATURE SURVEY
[1]
Paper: Nan Zhang et al., “Deep learning feature exploration for
Android malware detection”, Applied Soft Computing Journal,
102 (2021) 107069.
Summary: The authors proposed TC-Droid, an automatic framework for
Android malware detection based on text classification. It feeds the
text sequences of app analysis reports generated by AndroPyTool into a
convolutional neural network (CNN) that operates on the original
report text.
Datasets: Four datasets (D1, D2, D3 and D4) derived from real-world
apps.
[2]
Paper: Stuart Millar et al., “Multi-view deep learning for zero-day
Android malware detection”, Journal of Information Security and
Applications, 58 (2021) 102718.
Summary: The authors experimented in four stages. The first covers
hyperparameter tuning for the opcodes CNN and the APIs CNN. The second
is an analysis of the features learned by the permissions network. The
third evaluates the single-view and multi-view settings for malware
detection to show that their model is effective in a conventional
detection setting. The fourth is a series of zero-day experiments that
recreate a challenging scenario in which a malware detector is tested
against a new family of malware it has never encountered before.
Datasets: Malgenome, Intel Security, Drebin, AMD.
[3]
Paper: Santosh K. Smmarwar et al., “An optimized and efficient android
malware detection framework for future sustainable computing”,
Sustainable Energy Technologies and Assessments, 54 (2022) 102852.
Summary: The authors proposed an optimized and efficient
ensemble-learning-based Android malware detection framework called
“OEL-AMD”. It uses statistical feature engineering to eliminate
non-informative features and encode statistical characteristics, and
Binary Grey Wolf Optimization (BGWO) based meta-heuristic feature
selection to prepare optimal feature sets for the static and dynamic
layers. Finally, different base learners are trained with
hyper-parameter tuning to boost the inductive reasoning capability of
the ensemble model for classification, and an aggregated performance
is computed.
Datasets: CICInvesAndMal2019.
[4]
Paper: Vikas Sihag et al., “De-LADY: Deep learning based Android
malware detection using Dynamic features”, Journal of Internet
Services and Information Security (JISIS), volume: 11, number: 2
(May 2021), pp. 34-45.
Summary: The authors proposed De-LADY (Deep Learning based Android
malware detection using DYnamic features), an obfuscation-resilient
approach. It uses behavioral characteristics obtained from dynamic
analysis of an application executed in an emulated environment.
Datasets: Four datasets derived from different sources.
EXISTING SYSTEM
The existing approach is FAMD (Fast Android Malware Detector), a fast
Android malware detection framework based on a combination of multiple
features.
Initially, permissions and Dalvik opcode sequences are extracted from the
samples to construct the original feature set.
Then, the Dalvik opcodes are preprocessed with the N-Gram technique, and
the FCBF (Fast Correlation-Based Filter) algorithm, which is based on
symmetrical uncertainty, is employed to reduce feature dimensionality.
Finally, the dimensionality-reduced features are input into the CatBoost
classifier for malware detection.
The dataset DS-1, which was collected from various sources, and the
benchmark dataset Drebin were used in this experiment.
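The two preprocessing ideas named above, opcode N-grams and symmetrical uncertainty, can be illustrated with a minimal sketch. It is not FAMD's actual code: the function names, the opcode sequence and the toy feature columns are hypothetical, and only the relevance score that FCBF relies on is shown, not the full FCBF redundancy filter or the CatBoost classifier.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Count contiguous n-grams in a sequence of Dalvik opcode names."""
    grams = zip(*(opcodes[i:] for i in range(n)))
    return Counter(grams)


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy       # mutual information I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical opcode sequence from one app (illustrative only).
sample = ["const", "invoke-virtual", "move-result",
          "invoke-virtual", "move-result"]
bigrams = opcode_ngrams(sample, n=2)

# Toy binary feature columns and labels (1 = malware, 0 = benign).
labels = [1, 1, 0, 0]
feature_a = [1, 1, 0, 0]   # perfectly aligned with the labels
feature_b = [1, 0, 1, 0]   # statistically independent of the labels
su_a = symmetrical_uncertainty(feature_a, labels)
su_b = symmetrical_uncertainty(feature_b, labels)
```

A feature whose symmetrical uncertainty with the label is close to 1 is highly informative, while a value close to 0 marks it as a candidate for removal.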
DRAWBACK OF EXISTING SYSTEM
This approach is inadequate for detecting newly emerging
malicious applications.
PROPOSED SYSTEM
A recurrent neural network (RNN) is a class of artificial neural
networks where connections between nodes can create a cycle,
allowing output from some nodes to affect subsequent input to the
same nodes.
Recurrent neural networks make it possible to model time-dependent and
sequential data problems, such as stock market prediction, machine
translation and text generation.
Two widely used recurrent neural network models are:
Long Short Term Memory (LSTM)
Gated Recurrent Unit (GRU)
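The cycle in a recurrent network, where the previous hidden state feeds back into the same units, can be sketched in NumPy as follows. This is a minimal illustration of a simple (Elman-style) RNN, not the project's model: the function name, the layer sizes, the random weights and the tanh activation are all assumptions chosen for the example.

```python
import numpy as np


def rnn_forward(inputs, U, W, b):
    """Run a simple RNN over a sequence and return every hidden state.

    inputs: array of shape (T, input_dim)
    U: input-to-hidden weights, shape (input_dim, hidden_dim)
    W: hidden-to-hidden weights, shape (hidden_dim, hidden_dim)
    b: bias, shape (hidden_dim,)
    """
    hidden_dim = W.shape[0]
    s = np.zeros(hidden_dim)          # initial hidden state
    states = []
    for x_t in inputs:
        # The previous state s feeds back into the same units: the cycle.
        s = np.tanh(x_t @ U + s @ W + b)
        states.append(s)
    return np.stack(states)


rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4    # illustrative sizes
inputs = rng.normal(size=(T, input_dim))
U = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

states = rnn_forward(inputs, U, W, b)  # shape (T, hidden_dim)
```

Because each state depends on the one before it, the final row of `states` summarizes the whole sequence, which is what a sequence classifier would typically consume.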
LONG SHORT TERM MEMORY (LSTM)
LSTM is a recurrent neural network (RNN) architecture used in the
field of deep learning.
Unlike standard feed-forward neural networks, LSTM has feedback
connections.
An LSTM unit consists of a cell, an input gate, an output gate and a
forget gate. The cell remembers values over arbitrary time intervals,
and the three gates control the flow of data into and out of the cell.
Input gate (i): The input gate determines how much of the new input is
allowed to pass into the cell. It is calculated by:
$i = \sigma(x_t U_i + s_{t-1} W_i)$
Forget gate (f): The forget gate helps the network choose what, and
how much, information from the previous step to carry over to the
next step. It is given by:
$f = \sigma(x_t U_f + s_{t-1} W_f)$
Output gate (o): The output gate determines the output passed on at each
step of the network. It is given by:
$o = \sigma(x_t U_o + s_{t-1} W_o)$
Here $x_t$ is the input at step $t$, $s_{t-1}$ is the previous hidden
state, $U$ and $W$ are the weight matrices of each gate, and $\sigma$ is
the sigmoid function.
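The three gate equations above can be checked with a small NumPy sketch of one LSTM time step. The gates follow the slide's formulas, without bias terms. The candidate values `g`, the cell-state update and the hidden-state update are the standard LSTM formulation and are an assumption here, since the slides do not state them. All names, sizes and random weights are illustrative.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step(x_t, s_prev, c_prev, params):
    """One LSTM time step using the convention x_t U + s_{t-1} W."""
    U, W = params["U"], params["W"]
    i = sigmoid(x_t @ U["i"] + s_prev @ W["i"])   # input gate
    f = sigmoid(x_t @ U["f"] + s_prev @ W["f"])   # forget gate
    o = sigmoid(x_t @ U["o"] + s_prev @ W["o"])   # output gate
    # Assumption (standard LSTM, not stated in the slides):
    g = np.tanh(x_t @ U["g"] + s_prev @ W["g"])   # candidate cell values
    c_t = f * c_prev + i * g                      # updated cell state
    s_t = o * np.tanh(c_t)                        # updated hidden state
    return s_t, c_t, (i, f, o)


def init_params(input_dim, hidden_dim, rng):
    """Small random weights for the three gates and the candidate."""
    names = ("i", "f", "o", "g")
    U = {k: rng.normal(scale=0.1, size=(input_dim, hidden_dim)) for k in names}
    W = {k: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)) for k in names}
    return {"U": U, "W": W}


rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4                      # illustrative sizes
params = init_params(input_dim, hidden_dim, rng)
x_t = rng.normal(size=input_dim)
s_prev = np.zeros(hidden_dim)
c_prev = np.zeros(hidden_dim)
s_t, c_t, (i, f, o) = lstm_step(x_t, s_prev, c_prev, params)
```

Each gate is a vector of values between 0 and 1, so multiplying by a gate scales every component of the cell state or candidate independently.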
GATED RECURRENT UNIT (GRU)
GRU is a type of recurrent neural network derived from the LSTM. It
reduces the complexity of the LSTM by using only two gates, an update
gate and a reset gate.
The update gate regulates how much of the hidden state is
forwarded to the next state.
The reset gate determines how much of the previous hidden state
information is used.
GATED RECURRENT UNIT (Contd.)
Update Gate (z): It determines how much of the past information
needs to be passed along into the future. It is loosely similar to the
Output Gate in an LSTM recurrent unit.
$z = \sigma(x_t U_z + s_{t-1} W_z)$
Reset Gate (r): It defines how much of the past information to forget.
It is loosely similar to the combination of the Input Gate and the
Forget Gate in an LSTM recurrent unit.
$r = \sigma(x_t U_r + s_{t-1} W_r)$
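A minimal NumPy sketch of one GRU time step shows how the update and reset gates interact. The two gate formulas follow the slide. The candidate state `h` and the blending rule are the commonly used GRU formulation and are an assumption here, since the slides do not give them, and conventions differ on whether `z` or `1 - z` weights the previous state. The names, sizes and weights are illustrative.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, s_prev, U, W):
    """One GRU time step using the convention x_t U + s_{t-1} W."""
    z = sigmoid(x_t @ U["z"] + s_prev @ W["z"])   # update gate
    r = sigmoid(x_t @ U["r"] + s_prev @ W["r"])   # reset gate
    # Assumption (common GRU formulation, not stated in the slides):
    # the reset gate scales the previous state before it enters the candidate.
    h = np.tanh(x_t @ U["h"] + (r * s_prev) @ W["h"])
    # The update gate z decides how much of the past state is kept.
    s_t = z * s_prev + (1.0 - z) * h
    return s_t, z, r


rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4                      # illustrative sizes
names = ("z", "r", "h")
U = {k: rng.normal(scale=0.1, size=(input_dim, hidden_dim)) for k in names}
W = {k: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)) for k in names}

x_t = rng.normal(size=input_dim)
s_prev = rng.normal(size=hidden_dim)
s_t, z, r = gru_step(x_t, s_prev, U, W)
```

Compared with the LSTM, the GRU keeps no separate cell state and needs three weight pairs instead of four, which is where its lower complexity comes from.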
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS
Processor: Intel Core i3 7th Generation
RAM: 8 GB
Memory: 64 GB
SOFTWARE REQUIREMENTS
Language: Python 3.5
Libraries: TensorFlow 2.1.0, Keras 2.2.4
IDE: Google Colaboratory