2. Malwise—An Effective and Efficient
Classification System for Packed and
Polymorphic Malware
GUIDED BY,
Mrs. ASHITHA.S.S
Asst. Professor
IT Department
LMCST
PRESENTED BY,
FEBIN JOY KAVIYIL
S7 CS
LMCST
febinjoykaviyil@gmail.com
3. • Malware is a significant threat
• It has become increasingly prominent in the last few years
• Malware detection is a field with challenging research opportunities
• Anti-malware systems
o Present right from the beginning
o Advancing rapidly
Introduction
4. • Initial techniques involved the use of controlled environments
• The next, and current, phase involves the use of malware databases
5. • The predominant technique for detecting a malware instance is matching it against malware signatures
• The database consists of the signatures of identified malware
• Efficient, but not effective against malware variants
• Malwise proposes a new technique for signature generation
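To illustrate why byte-level signatures are efficient but brittle, here is a minimal sketch of a naive byte-pattern scanner. The signature names and byte patterns are hypothetical, and real scanners use much faster multi-pattern matching; the point is only that flipping one byte inside the matched region is enough to evade detection.

```python
# Hypothetical signature database: malware name -> invariant byte pattern.
SIGNATURES = {
    "Example.A": bytes.fromhex("e8000000005d81ed"),
    "Example.B": bytes.fromhex("6a0068deadbeef"),
}


def scan(sample: bytes) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in sample."""
    return [name for name, pattern in SIGNATURES.items() if pattern in sample]


if __name__ == "__main__":
    original = b"\x90" * 16 + bytes.fromhex("e8000000005d81ed") + b"\xc3"
    # A "variant": identical except for one byte inside the signature region.
    variant = bytearray(original)
    variant[20] ^= 0xFF
    print(scan(original))        # ['Example.A']
    print(scan(bytes(variant)))  # []
```

A plain substring test keeps the lookup simple; the same weakness applies to any exact byte-level signature.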
6. • Database creation
o Challenging
o Needs access to a set of known malware
o Needs constant updating
• Packing
o Additional code packing is used to hinder analysis
o 86% of malware is packed
• Signature generation
• Classification
o By comparing signatures
7. • Database creation
o Flowgraph-based signatures are stored
• Unpacking
o Using entropy analysis
• Signature generation
o Based on control flow graphs
• Classification
o Using string edit distances
9. • Using entropy analysis
• Entropy is the amount of information
contained in a block
• The entropy of a block is given by $H(x) = -\sum_{i=1}^{N} p(i) \log_2 p(i)$
• Compressed and encrypted data have
high entropy
• Earlier systems used controlled emulators to find the OEP (Original Entry Point)
• This was efficient but ineffective
UNPACKING
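The entropy formula above can be computed directly over the byte values of a block. This is a minimal sketch that treats each byte as one symbol, so the result ranges from 0 to 8 bits per byte; the sample data (a 16-letter random text, its zlib-compressed form, and uniformly random bytes standing in for encrypted data) is purely illustrative.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Return H(x) = -sum_i p(i) * log2 p(i), where p(i) is the relative
    frequency of byte value i in the block (0.0 to 8.0 bits per byte)."""
    n = len(block)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


if __name__ == "__main__":
    rng = random.Random(1)
    # Plain data over a 16-symbol alphabet: at most log2(16) = 4 bits per byte.
    plain = "".join(rng.choices("abcdefghijklmnop", k=8192)).encode()
    compressed = zlib.compress(plain, 9)
    # Uniformly random bytes stand in for encrypted data.
    encrypted_like = bytes(rng.randrange(256) for _ in range(8192))
    for label, data in [("plain", plain), ("compressed", compressed),
                        ("encrypted-like", encrypted_like)]:
        print(f"{label:15s} {shannon_entropy(data):.2f} bits/byte")
```

The compressed and encrypted-like blocks land close to the 8-bit maximum, while the plain block stays near 4 bits, which is the contrast the unpacking check relies on.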
10. • Malwise extends this concept by checking the entropy periodically during unpacking
• If the entropy of the analyzed data is low, we can assume that no more encrypted or compressed data is present, and unpacking stops
Flowchart: SAMPLE → ENTROPY HIGH? → YES: UNPACK and check again; NO: FINISH UNPACKING
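The entropy-driven stopping rule can be sketched as a loop that keeps unpacking while the data still looks compressed or encrypted. This is a toy model under stated assumptions: `unpack_one_layer` is a hypothetical stand-in where every packing layer is plain zlib compression (a real unpacker would emulate the packer's own stub), and the 6.0 bits/byte threshold is illustrative, since the slides give no value.

```python
import math
import random
import zlib
from collections import Counter

# Illustrative threshold in bits per byte (hypothetical; not from the slides).
ENTROPY_THRESHOLD = 6.0


def shannon_entropy(block: bytes) -> float:
    """Byte-level Shannon entropy, 0.0 to 8.0 bits per byte."""
    n = len(block)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def unpack_one_layer(data: bytes) -> bytes:
    """Hypothetical stand-in for one round of emulated unpacking.

    In this toy model every packing layer is zlib compression."""
    return zlib.decompress(data)


def unpack(sample: bytes, max_layers: int = 16) -> tuple[bytes, int]:
    """Unpack while entropy stays high; return (data, layers_removed)."""
    data = sample
    layers = 0
    while layers < max_layers and shannon_entropy(data) >= ENTROPY_THRESHOLD:
        data = unpack_one_layer(data)  # still looks compressed or encrypted
        layers += 1
    return data, layers  # low entropy (or layer cap): stop unpacking


if __name__ == "__main__":
    rng = random.Random(7)
    payload = "".join(rng.choices("abcdefghijklmnop", k=8192)).encode()
    packed = zlib.compress(zlib.compress(payload))  # two packing layers
    data, layers = unpack(packed)
    print(layers, data == payload)
```

The `max_layers` cap guards against data that never drops below the threshold; in this toy model, genuinely encrypted input would instead raise `zlib.error`.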
12. • Using speculative disassembly
• Procedures are identified
• Incorrectly identified procedures are eliminated
• An intermediate representation is formed
• A weight is assigned to each signature
Disassembly → Intermediate representation → Control flow graph → Signature
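The step from an intermediate representation to a control flow graph can be sketched with the standard leader-based split into basic blocks. The toy IR below is hypothetical (each instruction is an opcode plus an optional jump-target index), and the sketch does not perform speculative disassembly or procedure identification; it only shows how blocks and edges are derived.

```python
from __future__ import annotations

# Hypothetical toy IR: (opcode, jump_target_index or None).
# Opcodes: "op" (ordinary), "jmp" (unconditional), "jcc" (conditional), "ret".


def build_cfg(code: list[tuple[str, int | None]]) -> dict[int, list[int]]:
    """Split code into basic blocks; return {block_leader: [successor leaders]}."""
    # 1. Leaders: the entry, every jump target, and every instruction
    #    that follows a control transfer.
    leaders = {0}
    for i, (op, target) in enumerate(code):
        if op in ("jmp", "jcc"):
            leaders.add(target)
        if op in ("jmp", "jcc", "ret") and i + 1 < len(code):
            leaders.add(i + 1)
    starts = sorted(leaders)

    # 2. Each block runs from one leader up to the next one; its last
    #    instruction decides the outgoing edges.
    cfg: dict[int, list[int]] = {}
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else len(code)
        op, target = code[end - 1]
        succs = []
        if op in ("jmp", "jcc"):
            succs.append(target)          # taken branch
        if op not in ("jmp", "ret") and end < len(code):
            succs.append(end)             # fall through to the next block
        cfg[start] = sorted(set(succs))
    return cfg


if __name__ == "__main__":
    program = [("op", None), ("jcc", 3), ("op", None), ("op", None), ("ret", None)]
    print(build_cfg(program))  # {0: [2, 3], 2: [3], 3: []}
```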
13. Exact flowgraph matching
• Only exact replicas or isomorphisms are identified
• Signatures are created by ordering the nodes of the control flow graph in depth-first order
• The signature consists of the list of graph edges between the ordered nodes
• Efficient
• Matching is done using a dictionary lookup
• Weight is found by
Signatures can now be generated for the two available flowgraph matching methods.
Bi: number of basic blocks in the binary
Depth-first ordered flowgraph and its signature
Signature generation
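The exact-matching signature described above can be sketched as: order the nodes depth-first from the entry, then list every edge using the nodes' depth-first positions, so two flowgraphs with the same structure at different addresses get the same signature and can be found with a dictionary lookup. This is a sketch of the idea, not Malwise's exact encoding; it assumes each node's successor list is already in a canonical order (for example, fall-through before taken branch), and the malware name is hypothetical.

```python
from __future__ import annotations

Edge = tuple[int, int]


def dfs_order(cfg: dict[int, list[int]], entry: int) -> list[int]:
    """Nodes reachable from entry, in depth-first (preorder) order."""
    order: list[int] = []
    seen: set[int] = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push successors reversed so the first listed successor is visited first.
        stack.extend(reversed(cfg.get(node, [])))
    return order


def exact_signature(cfg: dict[int, list[int]], entry: int) -> tuple[Edge, ...]:
    """Edge list over depth-first positions, so block addresses do not matter."""
    order = dfs_order(cfg, entry)
    position = {node: i for i, node in enumerate(order)}
    return tuple((position[n], position[s]) for n in order for s in cfg.get(n, []))


if __name__ == "__main__":
    # Hypothetical known malware flowgraph (block address -> successors).
    known = {0x1000: [0x1010, 0x1020], 0x1010: [0x1030], 0x1020: [0x1030], 0x1030: []}
    database = {exact_signature(known, 0x1000): "Example.A"}  # hypothetical name

    # The same structure relocated to different addresses.
    sample = {0x4000: [0x4008, 0x4010], 0x4008: [0x4018], 0x4010: [0x4018], 0x4018: []}
    print(database.get(exact_signature(sample, 0x4000)))  # Example.A
```

Because the signature is a hashable tuple, lookup is a single dictionary access, which is what makes exact matching efficient.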
14. Approximate flowgraph matching
• Approximate matches of control flow graphs are considered
• Enables detection of variants
• Structuring is used to generate signatures
• The output is a string of character tokens representing high-level structured constructs
• Weight is found by
Control flowgraph → high-level structured graph → signature
Si: signature of S in the binary
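The token-string signature and its comparison by string edit distance can be sketched as follows. The structured tree is assumed to be already produced by a structuring algorithm, and the token alphabet (`B`, `R`, `W{...}`, `I{...}E{...}`) is hypothetical, not Malwise's actual grammar. The similarity normalisation, 1 minus the distance divided by the longer string's length, is also an assumption.

```python
# Hypothetical token alphabet (not Malwise's actual grammar):
#   B = basic statement block, R = return, W{...} = loop, I{...}E{...} = if/else


def serialize(node: tuple) -> str:
    """Flatten an already-structured control-flow tree into a token string."""
    kind = node[0]
    if kind == "seq":
        return "".join(serialize(child) for child in node[1])
    if kind == "block":
        return "B"
    if kind == "return":
        return "R"
    if kind == "while":
        return "W{" + serialize(node[1]) + "}"
    if kind == "if":
        out = "I{" + serialize(node[1]) + "}"
        if node[2] is not None:
            out += "E{" + serialize(node[2]) + "}"
        return out
    raise ValueError(f"unknown construct: {kind!r}")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    original = ("seq", [("block",),
                        ("while", ("seq", [("block",), ("if", ("block",), None)])),
                        ("return",)])
    # A variant with one extra statement inside the if-branch.
    variant = ("seq", [("block",),
                       ("while", ("seq", [("block",),
                                          ("if", ("seq", [("block",), ("block",)]), None)])),
                       ("return",)])
    s1, s2 = serialize(original), serialize(variant)
    print(s1, s2, round(similarity(s1, s2), 3))  # BW{BI{B}}R BW{BI{BB}}R 0.909
```

A small source-level change shifts the token string by only one edit, whereas a byte-level signature of the same change could differ completely.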
15. To obtain the final signature, the resulting string is converted to binary
16. • Classification is done using set similarity
• The database consists of the signatures of known malware
• The input is a binary
• A similarity is computed between the binary's set of flowgraph strings and each set of flowgraphs associated with a malware sample in the database
• The mechanism is complex: it also considers the weights associated with the signatures
New sample → non-malicious or malicious
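The weighted set similarity can be sketched as the weighted fraction of the input binary's flowgraphs that find a close match among one malware sample's flowgraphs. Several choices here are assumptions rather than the slides' method: Python's `difflib.SequenceMatcher.ratio()` (a Ratcliff/Obershelp similarity, not an edit distance) stands in for the string comparison, a flowgraph counts as matched when its best similarity reaches a hypothetical 0.9, each query flowgraph takes its best match greedily instead of a one-to-one assignment, and the signature strings, weights, and malware names are invented for illustration.

```python
from difflib import SequenceMatcher

# Assumed per-flowgraph match threshold (illustrative, not from the slides).
MATCH_THRESHOLD = 0.9


def string_similarity(a: str, b: str) -> float:
    # Stand-in: difflib's Ratcliff/Obershelp ratio, not an edit distance.
    return SequenceMatcher(None, a, b).ratio()


def set_similarity(query: dict[str, float], malware: set[str]) -> float:
    """Weighted fraction of the query's flowgraphs that match some flowgraph
    of one malware sample.

    query maps each flowgraph signature string of the input binary to its weight."""
    total = sum(query.values())
    if total == 0:
        return 0.0
    matched = 0.0
    for flowgraph, weight in query.items():
        best = max((string_similarity(flowgraph, m) for m in malware), default=0.0)
        if best >= MATCH_THRESHOLD:
            matched += weight
    return matched / total


if __name__ == "__main__":
    # Hypothetical signature strings and weights.
    query = {"BW{BI{BB}}R": 3.0, "BI{B}R": 1.0}
    database = {"Example.A": {"BW{BI{B}}R", "BBR"},
                "Example.B": {"W{B}", "I{B}E{B}R"}}
    for name, flowgraphs in database.items():
        print(name, set_similarity(query, flowgraphs))  # Example.A 0.75, Example.B 0.0
```

Weighting by the query's own flowgraphs means a match on a large procedure counts more than a match on a trivial one.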
17. Basic principle for classification
• The process yields a similarity value for each set of malware signatures in the database
• The value ranges between 0 and 1
• Value > 0.95 => Isomorphic
• Value < 0.6 => No similarity
• 0.6 ≤ Value ≤ 0.95 => Variant
• The threshold values were
fixed after a thorough pilot
study
Classification
Flowchart: SAMPLE + DATABASE → SIMILARITY CHECK → > 0.95: ISOMORPHIC; > 0.6: VARIANT; otherwise: NON MALICIOUS
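The threshold rule above maps the best similarity value to a verdict. A minimal sketch, assuming the input is a mapping from (hypothetical) malware names to similarity values; how to treat exactly 0.6 is not fixed by the slides, so counting it as a variant is an assumption.

```python
from __future__ import annotations

ISOMORPHIC_THRESHOLD = 0.95  # from the slides
VARIANT_THRESHOLD = 0.6      # from the slides; 0.6 itself counted as variant (assumption)


def classify(similarities: dict[str, float]) -> tuple[str, str | None]:
    """Return (verdict, closest malware name or None) from per-malware similarities."""
    if not similarities:
        return "non-malicious", None
    name, best = max(similarities.items(), key=lambda kv: kv[1])
    if best > ISOMORPHIC_THRESHOLD:
        return "isomorphic", name
    if best >= VARIANT_THRESHOLD:
        return "variant", name
    return "non-malicious", None


if __name__ == "__main__":
    print(classify({"Example.A": 0.97, "Example.B": 0.50}))  # ('isomorphic', 'Example.A')
    print(classify({"Example.A": 0.72}))                     # ('variant', 'Example.A')
    print(classify({"Example.A": 0.30}))                     # ('non-malicious', None)
```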
19. OEP
• More efficient and effective than any previously incorporated technique
• The table shows Malwise's performance with some common software
20. Classification
• The detection rate was about 57.8%
• Earlier approaches achieved at most 39.6%
• Resilience to false positives
• Less than 0.61% of the samples were incorrectly identified as malware
• At least 10 procedures should be present in the flowgraph to perform approximate flowgraph matching
• For exact flowgraph matching, at least 15 procedures should be present
Evaluation
21. Issue | Earlier approach | Malwise
Unpacking | Using controlled environments | Using entropy analysis
Signature generation | Based on byte-level representation | Based on control flow graph
Database | Source code dependent signatures | Control flow dependent signatures
Classification | Exact matching only | Exact matching and approximate matching
22. • Malware and malware variants can be identified using similarity in control flow graphs
• Unpacking using entropy analysis proved more efficient
• Malwise proves to be a more efficient and effective substitute for existing anti-malware systems, whether at internet gateways or as antivirus software on our desktops
• Not yet implemented as an anti-malware system
• However, SIMSEER (http://www.simseer.com) and BUGWISE (http://www.bugwise.com) use the same technique
New malware detection method
Could be a substitute for existing systems
Studies show that twice as much malware was detected in 2007-13 as in the past 20 years
Hackers are more enthusiastic and keep coming up with new ways to harm our systems
Signature: invariant characters or patterns that uniquely identify a program. It is formed from the byte-level representation.
Variant: malware created by slightly editing the source code. The source code and behavior may be similar, but the byte-level representation can be entirely different.
where p(i) is the probability of the ith unit of information in event x’s sequence of N symbols
Structuring is the process of recovering high-level structured control flow from a control flow graph
The failure on PESpin was due to unused encrypted data left in the process image