Targeted email attacks often use document files with embedded executable files. Research on detecting such malicious document files has focused on malcode detection. However, because attackers can write arbitrary code, defenders who rely on traditional malcode detection methods will always lag behind the attacker in finding unknown malcode.
In this talk I will introduce an analytical approach to detecting targeted email attacks that differs from traditional malcode detection: it focuses on structural analysis of file formats. I will explain how malware can be detected from file size alone, and introduce o-checker, which implements a general detection method that does not rely on the content of the malicious code.
Yuuhei Ootsubo
Became interested in programming around 1987.
2005 Employed by the National Police Agency.
2007 National Police Agency Public Safety Information Technology Counter Crime Division.
2001 National Police Agency Information Communication Division Information Technology Analysis Division.
2012 Assigned to The National Information Security Center.
o-checker : Malicious document file detection tool - Malicious feature can be detected based on file size by Yuuhei Ootsubo
1. CODE BLUE Feb.17 (Mon) - 18 (Tue), 2014 Tokyo, Japan
o-checker :
Malicious document file detection tool
- File sizes tell whether the document file is malicious or not -
Yuhei Otsubo
Agenda
1. Background
2. Structure of malicious document files
3. Overview of o-checker
4. Detection mechanism
5. Demo
6. Application
7. Conclusion
1. BACKGROUND
Increase in targeted email attacks (1/3)
http://www.symantec.com/threatreport/topic.jsp?aid=industrial_espionage&id=malicious_code_trends
Increase in targeted email attacks (2/3)
Number of security advisories on targeted email attacks on government institutions
※GSOC: Government Security Operation Coordination team
Increase in targeted email attacks (3/3)
Rate of companies that had experienced targeted attacks: 5.4% (2007) → 33% (2011)
Source: research commissioned by METI (Ministry of Economy, Trade and Industry), 2007 and 2011
Example of targeted attacks
Attacker → Victim (network of companies and private individuals)
① Send emails with malware
② Open an attachment
③ Infected with malware
④ Data exfiltration (secret data)
File types of targeted email attacks
Executable files: 59%
Document files: 41%
Source: Trend of the extension of the attachment of targeted email attacks, Trend Micro Japan (2013)
http://is702.jp/special/1431/
2. STRUCTURE OF MALICIOUS DOCUMENT FILES
Structure of malicious document files
Document file
• Exploit: abuses a vulnerability in the viewer (browsing) software.
• Shellcode: creates a malware executable file and a decoy file, then executes/opens them.
• Malware executable file: encoded in various ways; unrelated to the document contents.
• Decoy file (for display): encoded in various ways; unrelated to the document contents.
Example of malicious document files
[Figure: bitmap view and hex view of a malicious document file]
Exploit(1/2)
• Object 29: a JavaScript action; its script is stored in object 31.
• Object 31: a JavaScript script (the exploit), compressed with the Flate method.
Exploit(2/2)
After decoding the Flate-compressed data:
↓ Shellcode encoded by the escape() function
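The two decoding steps on this slide can be sketched in Python. This is only an illustration under assumptions: the function names are hypothetical, the sample script is made up, and emitting each %uXXXX unit as two little-endian bytes is the common layout for x86 shellcode rather than something the slide states.

```python
import re
import zlib


def inflate_stream(raw: bytes) -> bytes:
    """Decode the body of a stream that uses the Flate (zlib/deflate) method."""
    return zlib.decompress(raw)


def js_unescape_to_bytes(text: str) -> bytes:
    """Turn %uXXXX and %XX escapes, as produced by escape(), into raw bytes.

    Assumption: each %uXXXX unit becomes two little-endian bytes.
    Characters that are not escaped are ignored in this sketch.
    """
    out = bytearray()
    for m in re.finditer(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})", text):
        if m.group(1):
            out += int(m.group(1), 16).to_bytes(2, "little")
        else:
            out.append(int(m.group(2), 16))
    return bytes(out)


# Hypothetical stand-in for object 31: a Flate-compressed script with escaped bytes.
script = 'var sc = unescape("%u9090%u9090%uCCEB");'
stream = zlib.compress(script.encode("ascii"))

decoded = inflate_stream(stream).decode("ascii")
payload = js_unescape_to_bytes(decoded)
print(payload.hex())  # 90909090ebcc
```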
Shellcode
Decoder: 40 bytes
The shellcode is encoded with printable characters.
Executable file(1/2)
Encoded executable file
Executable file(2/2)
After decoding
Decoy file
3. OVERVIEW OF O-CHECKER
Traditional methods of malicious document detection
Document file: Exploit, Shellcode, Malware executable file, Decoy file (for display)
Traditional methods
• Utilize particular patterns of the malicious code, found through static/dynamic analysis.
Problems
• Particular patterns can be changed by encoding.
• Some exploits only work in specific environments, which makes dynamic analysis difficult.
Vicious circle
If a detection method focuses on code that attackers can write arbitrarily:
• Defender: creates a signature based on malicious code.
• Attacker: changes the malicious code so that it can avoid detection.
Breaking the vicious circle
If a detection method focuses on file formats, which attackers cannot write arbitrarily:
• Defender: creates a signature based on structural analysis of file formats.
• Attacker: changing the malicious code no longer avoids detection; the attacker would have to change the file format syntax instead.
Characteristic of malicious documents, based on file format
A document file is an aggregate of pictures, text, and auxiliary data. Each piece of data has its purpose, so a normal document contains no data that the viewer (browsing) software does not process.
Whether the viewer software processes the data, and the reason:
• Exploit: processed, in order to abuse a vulnerability in the viewer software.
• Shellcode: processed, because the exploit code includes the shellcode.
• Executable file: not processed. If the viewer software parsed it, the contents would be displayed garbled or the viewer software would malfunction.
• Decoy file: not processed, for the same reason as the executable file.
Detection mechanism (simplified)
Detection based on structural analysis of document formats
• Normal document files: all of the structures match directly to contents.
• Malicious document files: a part of the structures is mismatched to contents.
Performance of o-checker
• High speed and high detection rates
  Detection rate: 98.9%   Average execution time: 0.3 s
• Almost maintenance-free
                       Updating frequency   Remarks
  Anti-virus software  Every day            200,000 new types of malware per day (2012)※
  o-checker            Almost none          Needs an update only if a new document file format comes out.
Input: document files embedded with executable files → msanalysis.py / pdfanalysis.py → Alert
※:http://www.kaspersky.com/about/news/virus/2012/2012_by_the_numbers_Kaspersky_Lab_now_detects_200000_new_malicious_programs_every_day
4. DETECTION MECHANISM
Inspection items
Rich Text
(A) Attached data after EOF
CFB
(B) Anomaly file size
(C) Data not referred from FAT
(D) Free sector in the last sector
(E) Unaccounted-for sector
PDF
(F) Unaccounted-for section
(G) Unreferenced object
(H) Camouflaged stream
Structure of Rich Text files
RTF files are usually 7-bit ASCII plain text. RTF consists of control words, control symbols, and groups.※
※: Wikipedia
An example of RTF code:
{\rtf
Hello!\par
This is some {\b bold} text.\par
}
• {\rtf is the signature that indicates an RTF file.
• The } corresponding to the first { is located at the end of the file (EOF).
(A) Attached data after EOF
An RTF file embedded with an executable file:
{\rtf
Hello!\par
This is some {\b bold} text.\par
}
[binary data of an executable file: MZ header, "This program cannot be run in DOS mode.", Rich header, PE header]
An executable file is inserted at the end of the file so that it does not affect the display.
Data exists after EOF.
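A minimal Python sketch of check (A), not o-checker's actual code. The function name rtf_trailing_data is hypothetical, and two simplifications are assumptions of this sketch: a backslash always escapes exactly the next byte, and trailing whitespace after the closing brace is tolerated.

```python
def rtf_trailing_data(data: bytes) -> bytes:
    """Return the bytes found after the '}' that closes the first '{'.

    An empty result means nothing (other than whitespace) follows the end of
    the RTF document. Binary \\bin payloads are not handled in this sketch.
    """
    if not data.startswith(b"{\\rtf"):
        raise ValueError("missing {\\rtf signature")
    depth = 0
    i = 0
    while i < len(data):
        c = data[i:i + 1]
        if c == b"\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if c == b"{":
            depth += 1
        elif c == b"}":
            depth -= 1
            if depth == 0:
                # Everything after this brace lies outside the document.
                return data[i + 1:].lstrip(b"\r\n\t ")
        i += 1
    return b""  # braces never balanced; nothing can be said about trailing data


sample = b"{\\rtf\r\nHello!\\par\r\nThis is some {\\b bold} text.\\par\r\n}"
print(rtf_trailing_data(sample))                       # b''
print(rtf_trailing_data(sample + b"\r\nMZ\x90\x00"))   # b'MZ\x90\x00'
```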
CFB (doc,xls,ppt,jtd/jtdc)
CFB: Compound File Binary
• An archive format developed by Microsoft Corp.
• A layered structure can be stored in one file.
• It is used by Microsoft Word etc.: doc, ppt, xls, jtd/jtdc※
In file system terms: Stream → File, Storage → Folder
[Figure: example hierarchy with a Root Storage, Storages 1 to 3, and Streams A to C]
[MS-CFB] – v20130118 Compound File Binary Format (http://msdn.microsoft.com/en-us/library/dd942138.aspx)
※: jtd and jtdc are used by Ichitaro (一太郎), a Japanese word processor developed by JustSystems Corp.
Structure of CFB files
Physical structure (sector index, contents, FAT entry):
   Header
0  FAT              -2
1  Directory Entry  -2
2  Stream A          3
3  Stream A         -2
4  Free Sector      -1
5  Stream B         -2
Directory Entry:
Storage Name:root   Size:-    Index:-
Stream Name:a.txt   Size:696  Index:2
Stream Name:b.txt   Size:318  Index:5
FAT (File Allocation Table): each entry gives the index of the next sector of the same chain (-2: end of chain, -1: free sector).
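How a stream is read through the FAT can be sketched with the example values from this slide. The helper name sector_chain is hypothetical and the sketch works on the simplified toy FAT, not on a real file.

```python
ENDOFCHAIN = -2  # 0xFFFFFFFE read as a signed 32-bit value
FREESECT = -1    # 0xFFFFFFFF read as a signed 32-bit value

# FAT from the slide: entry i holds the sector that follows sector i.
fat = [-2, -2, 3, -2, -1, -2]


def sector_chain(fat, start):
    """Follow FAT entries from `start` until the end-of-chain marker."""
    chain = []
    seen = set()
    sector = start
    while sector != ENDOFCHAIN:
        if not 0 <= sector < len(fat) or sector in seen:
            raise ValueError(f"broken chain at sector {sector}")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain


print(sector_chain(fat, 2))  # a.txt (Size:696, Index:2) -> [2, 3]
print(sector_chain(fat, 5))  # b.txt (Size:318, Index:5) -> [5]
```

a.txt needs two 512-byte sectors for its 696 bytes, which matches the two-sector chain.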
Physical structure of CFB
Header: 512 Byte
Sectors (FAT, Directory Entry, Stream A, Stream A, Free Sector, Stream B, ...): (512 or 4096) x N Byte
FileSize = 512 + (512 or 4096) x N
         = 512 x M
The file size of a regular CFB file is always a multiple of 512.
(B) Anomaly file size
Physical structure (sector index, contents, FAT entry):
   Header
0  FAT              -2
1  Directory Entry  -2
2  Stream A          3
3  Stream A         -2
4  Free Sector      -1
5  Stream B         -2
6  malware
7  malware
Directory Entry: root (storage); a.txt (Size:696, Index:2); b.txt (Size:318, Index:5)
The size of the file, excluding the header, is not a multiple of the sector size.
Dividing the file size by 512 leaves a nonzero remainder.
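Check (B) reduces to one modulo operation. A minimal sketch, assuming the whole file is already in memory; the function name is hypothetical, and the 8-byte signature is the standard CFB magic number.

```python
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def has_anomalous_size(data: bytes) -> bool:
    """True if a CFB file's size is not a multiple of 512 bytes."""
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not a CFB file")
    return len(data) % 512 != 0


# Toy files: a 512-byte header followed by six 512-byte sectors.
header = CFB_SIGNATURE + bytes(512 - len(CFB_SIGNATURE))
regular = header + bytes(6 * 512)
tampered = regular + b"MZ" + bytes(698)  # 700 appended bytes

print(has_anomalous_size(regular))   # False
print(has_anomalous_size(tampered))  # True
```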
(C) Data not referred from FAT
Physical structure (sector index, contents, FAT entry):
   Header
0  FAT              -2
1  Directory Entry  -2
2  Stream A          3
3  Stream A         -2
4  Free Sector      -1
5  Stream B         -2
6  malware          -1
7  malware           ?
Directory Entry: root (storage); a.txt (Size:696, Index:2); b.txt (Size:318, Index:5)
The file exceeds the area which can be referred to by FAT:
(The number of sectors of FAT)×128×512 (Byte)
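A sketch of check (C) under stated assumptions: 512-byte sectors (so one FAT sector holds 512 / 4 = 128 entries), the number of FAT sectors read from header offset 44 as laid out in [MS-CFB], and the 512-byte header excluded before comparing. The function names are hypothetical.

```python
import struct


def fat_addressable_bytes(header: bytes) -> int:
    """(number of FAT sectors) x 128 x 512, as on the slide."""
    (num_fat_sectors,) = struct.unpack_from("<I", header, 44)
    return num_fat_sectors * 128 * 512


def exceeds_fat_area(data: bytes) -> bool:
    """True if sector data extends past what the FAT can refer to."""
    return len(data) - 512 > fat_addressable_bytes(data[:512])


# Toy header that declares a single FAT sector.
header = bytearray(512)
struct.pack_into("<I", header, 44, 1)
header = bytes(header)

print(fat_addressable_bytes(header))                      # 65536
print(exceeds_fat_area(header + bytes(128 * 512)))        # False
print(exceeds_fat_area(header + bytes(128 * 512 + 1)))    # True
```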
(D) Free sector in the last sector
Physical structure (sector index, contents, FAT entry):
   Header
0  FAT              -2
1  Directory Entry  -2
2  Stream A          3
3  Stream A         -2
4  Free Sector      -1
5  Stream B         -2
6  malware          -1   ← n
Directory Entry: root (storage); a.txt (Size:696, Index:2); b.txt (Size:318, Index:5)
The sector corresponding to the end of the file (n-th sector) is a free sector.
When the size of sector is 512,
n = (file size-512)/512
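A toy sketch of check (D). The slide gives n = (file size-512)/512; this sketch rounds up and subtracts one so that n is the 0-based index of the last sector even when that sector is only partly filled. That indexing choice and the function name are assumptions of the sketch.

```python
import math

FREESECT = -1  # FAT marker for a free sector


def last_sector_is_free(fat, file_size, sector_size=512):
    """True if the FAT marks the sector at the end of the file as free."""
    n = math.ceil((file_size - 512) / sector_size) - 1
    return 0 <= n < len(fat) and fat[n] == FREESECT


# Regular file: header plus sectors 0..5, the last one holding Stream B.
clean_fat = [-2, -2, 3, -2, -1, -2]
print(last_sector_is_free(clean_fat, 512 + 6 * 512))  # False

# Data written into sector 6 while the FAT still marks that sector as free.
bad_fat = [-2, -2, 3, -2, -1, -2, -1]
print(last_sector_is_free(bad_fat, 512 + 6 * 512 + 300))  # True
```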
(E) Unaccounted-for sector
Physical structure (sector index, contents, FAT entry):
   Header
0  FAT              -2
1  Directory Entry  -2
2  Stream A          3
3  Stream A         -2
4  Free Sector      -1
5  Stream B         -2
6  malware          -2
Directory Entry: root (storage); a.txt (Size:696, Index:2); b.txt (Size:318, Index:5)
There is a sector which cannot be classified as FAT (including DI-FAT and mini-FAT), DE (Directory Entry), stream, or free sector.
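Check (E) can be sketched on the toy layout: mark every sector reachable from a known chain start, add the free sectors, and report what is left. The function name and the way chain starts are passed in are assumptions; a real parser would take them from the header and the directory entries.

```python
ENDOFCHAIN, FREESECT = -2, -1


def unaccounted_sectors(fat, chain_starts):
    """Sectors that belong to no known chain and are not marked free.

    chain_starts: first sector of each known chain (FAT, directory, streams).
    """
    accounted = set()
    for start in chain_starts:
        sector = start
        while sector != ENDOFCHAIN and sector not in accounted:
            if not 0 <= sector < len(fat):
                break  # chain leaves the FAT; stop following it
            accounted.add(sector)
            sector = fat[sector]
    accounted.update(i for i, entry in enumerate(fat) if entry == FREESECT)
    return [i for i in range(len(fat)) if i not in accounted]


# Chain starts of the example: FAT (0), Directory Entry (1), a.txt (2), b.txt (5).
starts = [0, 1, 2, 5]
print(unaccounted_sectors([-2, -2, 3, -2, -1, -2], starts))      # []
print(unaccounted_sectors([-2, -2, 3, -2, -1, -2, -2], starts))  # [6]
```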
Structure of PDF:Physical structure
A PDF file is an aggregate of many objects (numeric, string, a sequence of bytes, etc.).
A PDF document consists of 4 elements:
• Comment (Header)
• Body: a sequence of indirect objects (fonts, pages and sampled images), e.g. 1 0 obj, 2 0 obj, ..., n 0 obj
• Cross-reference table
• Trailer, followed by Comment (EOF), the end-of-file marker
Example of an object: x 0 obj <</R2 /P-64 /V 2 /O (dfhjaklgk… …>>
Structure of PDF:Types of objects
Basic Objects
(A) Numeric
(B) String
(C) Name
(D) Boolean
(E) Null
Composite Objects
(F) Array
(G) Dictionary
Others
(H) Stream(a sequence of bytes)
(I) Indirect(referring to other objects)
(PDF32000-1:2008 7.3)
Structure of PDF:Stream filter
Stream filters indicate how to decode stream data. The standard filters are summarized in the following table. (PDF32000-1:2008 7.4)
Filter Name: Description
/ASCIIHexDecode: Decodes data encoded in an ASCII hexadecimal representation, reproducing the original binary data.
/ASCII85Decode: Decodes data encoded in an ASCII base-85 representation, reproducing the original binary data.
/LZWDecode: Decompresses data encoded using the LZW adaptive compression method, reproducing the original text or binary data.
/FlateDecode: Decompresses data encoded using the zlib/deflate compression method, reproducing the original text or binary data.
/RunLengthDecode: Decompresses data encoded using a byte-oriented run-length encoding algorithm, reproducing the original text or binary data.
/CCITTFaxDecode: Decompresses data encoded using the CCITT facsimile standard, reproducing the original data.
/JBIG2Decode: Decompresses data encoded using the JBIG2 standard, reproducing the original monochrome image data.
/DCTDecode: Decompresses data encoded using a DCT technique based on the JPEG standard, reproducing image sample data that approximates the original data.
/JPXDecode: Decompresses data encoded using the wavelet-based JPEG2000 standard, reproducing the original image data.
Structure of PDF:Document structure
[Figure: structure of a PDF document. Objects: Trailer, Document information, Document catalog, Outline hierarchy, Page tree, Page, Content stream, Annotations, Thumbnail image; the objects are connected by links.]
By following the links from the trailer, all objects can be reached.
Structure of PDF:Encryption
Encryption applies to almost all strings and streams in the PDF file. Leaving the other object types unencrypted allows random access to the objects within a document (except for objects stored in ObjStm).
[Figure: structure of an encrypted PDF document, showing the same objects and links as the document structure: Trailer, Document information, Document catalog, Outline hierarchy, Page tree, Page, Content stream, Annotations, Thumbnail image.]
Structure of PDF:ObjStm (Object Streams)
ObjStm (Object Streams) were introduced in PDF 1.5. The purpose of ObjStm is to allow indirect objects other than streams to be stored more compactly by using the facilities provided by stream compression filters. (PDF32000-1:2008 7.5.7)
Packing, Compressing, Encryption
(F) Unaccounted-for section
PDF document layout: Comment (Header), Body, Cross-reference table, Trailer, Comment (EOF), Executable file
When the data is classified into the four elements from the head of the file, some data cannot be classified.
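The classification idea can be sketched as a scanner that walks the file from its head and labels each byte range. This is a rough approximation under assumptions, not pdfanalysis.py itself: the regular expressions cover only simple, uncompressed PDF syntax, the names are hypothetical, and a '%' byte inside binary data would be labelled as a comment.

```python
import re

# Rough patterns for the elements of the physical structure.
PATTERNS = [
    ("obj", re.compile(rb"\d+\s+\d+\s+obj\b.*?\bendobj", re.S)),
    ("xref", re.compile(rb"\bxref\b.*?(?=trailer)", re.S)),
    ("trailer", re.compile(rb"\btrailer\b.*?(?=startxref)", re.S)),
    ("startxref", re.compile(rb"\bstartxref\s+\d+")),
    ("comment", re.compile(rb"%[^\r\n]*")),
]


def first_match(data, pos):
    """Return (label, end) of the first pattern matching at pos, else None."""
    for label, pattern in PATTERNS:
        m = pattern.match(data, pos)
        if m:
            return label, m.end()
    return None


def classify(data: bytes):
    """Return (start, end, label) spans; bytes no pattern claims are 'unknown'."""
    spans, pos, unknown_start = [], 0, None
    while pos < len(data):
        is_space = data[pos:pos + 1].isspace()
        hit = None if is_space else first_match(data, pos)
        if hit is None:
            if unknown_start is None and not is_space:
                unknown_start = pos
            pos += 1
            continue
        if unknown_start is not None:
            spans.append((unknown_start, pos, "unknown"))
            unknown_start = None
        label, end = hit
        spans.append((pos, end, label))
        pos = end
    if unknown_start is not None:
        spans.append((unknown_start, len(data), "unknown"))
    return spans


sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n45\n"
    b"%%EOF\n"
    b"MZ\x90\x00 appended payload"
)
for start, end, label in classify(sample):
    print(f"{start:08X}-{end - 1:08X}:{label}")
```

An "unknown" span after the final comment corresponds to the unaccounted-for section of this check.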
(G) Unreferenced object
A PDF file embedded with an executable file
[Figure: the PDF document structure (Trailer, Document information, Document catalog, Outline hierarchy, Page tree, Page, Content stream, Annotations, Thumbnail image) plus an object embedded with an executable file that no link reaches.]
When an executable file is inserted as an object in disregard of the document structure, it is often unreferenced.
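Check (G) amounts to a reachability search that starts at the trailer. A minimal sketch, assuming an unencrypted PDF without object streams; the regular expressions, the function name, and the decision to ignore generation numbers are simplifications of this sketch.

```python
import re

OBJ = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)
REF = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
TRAILER = re.compile(rb"\btrailer\b(.*?)\bstartxref\b", re.S)


def unreferenced_objects(data: bytes):
    """Object numbers that cannot be reached by following links from the trailer."""
    bodies = {int(m.group(1)): m.group(3) for m in OBJ.finditer(data)}
    pending = [
        int(ref.group(1))
        for trailer in TRAILER.finditer(data)
        for ref in REF.finditer(trailer.group(1))
    ]
    reached = set()
    while pending:
        num = pending.pop()
        if num in reached or num not in bodies:
            continue
        reached.add(num)
        pending.extend(int(ref.group(1)) for ref in REF.finditer(bodies[num]))
    return sorted(set(bodies) - reached)


sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
    b"9 0 obj\n<< /Length 4 >>\nstream\nMZ\x90\x00\nendstream\nendobj\n"
    b"trailer\n<< /Size 10 /Root 1 0 R >>\nstartxref\n0\n%%EOF\n"
)
print(unreferenced_objects(sample))  # [9]
```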
(H) Camouflaged Stream
Applies when the filter is FlateDecode, DCTDecode or JBIG2Decode.
Putting data at the end of a stream
Regular stream | EOD (end-of-data marker) | Executable file
• The data up to the EOD is used for decoding. (Decoding is successful.)
• The executable file after the EOD is data which is not used for decoding.
Camouflaged filter
• Entropy: plain text: small; FlateDecode: big; executable file: big
• An attacker disguises the object as one that uses a filter whose entropy is similar to the entropy of executable files. (Decoding goes wrong.)
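Both cases can be illustrated for FlateDecode with Python's zlib module, together with the entropy measure mentioned on the slide. A hedged sketch: the function names are hypothetical, and only FlateDecode is covered, not DCTDecode or JBIG2Decode.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


def flate_leftover(stream: bytes):
    """Bytes left after the end of the deflate data, or None if decoding fails."""
    decoder = zlib.decompressobj()
    try:
        decoder.decompress(stream)
    except zlib.error:
        return None  # camouflaged filter: the data is not Flate at all
    return decoder.unused_data


regular = zlib.compress(b"Hello " * 100)
print(flate_leftover(regular))                    # b''
print(flate_leftover(regular + b"MZ\x90\x00"))    # b'MZ\x90\x00'
print(flate_leftover(b"MZ\x90\x00" + bytes(60)))  # None
print(shannon_entropy(bytes(range(256))))         # 8.0
```

Non-empty leftover bytes correspond to data placed after the EOD; a None result corresponds to the "decoding goes wrong" case.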
Experiment
Malicious documents: document files used for targeted email attacks from 2009 to 2012※1
Clean documents: clean document files classified according to contagio, a malware dump site※2
                        Malicious documents          Clean documents
File type  Extension  Quantity  Average size(KB)  Quantity  Average size(KB)
Rich Text  rtf              98             266.5       199             516.2
CFB        doc              36             252.2     1,195             106.1
CFB        xls              49             180.4       298             191.7
CFB        jtd/jtdc         17             268.5         -                 -
PDF        pdf             164             351.2     9,109             101.7
Total                      364             291.8    10,801             322.7
※1: Rich Text files whose extension was disguised as doc are counted as rtf.
※2: http://contagiodump.blogspot.jp/2013/03/16800-clean-and-11960-malicious-files.html
Detection rate of o-checker
Rich Text: 99.0%
(A) Attached data after EOF: 99.0%
CFB: 98.0%
(B) Anomaly file size: 77.5%
(C) Data not referred from FAT: 90.2%
(D) Free sector in the last sector: 97.1%
(E) Unaccounted-for sector: 96.1%
PDF: 99.4%
(F) Unaccounted-for section: 49.4%
(G) Unreferenced object: 43.9%
(H) Camouflaged stream: 63.4%
Output of o-checker
Each output line shows an offset address range followed by the classification result.
C:\tmp>pdfanalysis.py a.pdf
00000000-00000008:comment,
00000009-0000000F:comment,
00000010-00000110:obj 25 0 old(not used)
00000111-00000197:obj 26 0 old(not used)
:
00003622-000036B0:trailer
000036B1-000036C2:startxref 00003617
000036C3-000036C9:comment,
000036CA-0000E9E2:unknown
0000E9E3-0000E9E9:comment,
0000E9EA-0000EAEA:obj 25 0 ObjStm [7, 8, 13]
:
0001209D-000120A3:comment,
000120A4-000120A7:unknown
FFFFFFFF-FFFFFFFF:obj 7 0 xref from None
FFFFFFFF-FFFFFFFF:obj 8 0 xref from None
:
[Figure annotations: Decoy Document, ObjStm]
Judgment option
C:\tmp>pdfanalysis.py a.pdf -j
00000000-00000008:comment,
00000009-0000000F:comment,
00000010-00000110:obj 25 0 old(not used)
00000111-00000197:obj 26 0 old(not used)
:
:
0001209D-000120A3:comment,
FFFFFFFF-FFFFFFFF:obj 7 0 xref from None
FFFFFFFF-FFFFFFFF:obj 8 0 xref from None
:
Malicious!
"-j" is a judgment option. One of three judgments, "Malicious!", "Suspicious!" or "None!", is shown at the end of the output.
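When o-checker is called from another program, the judgment can be read from the last line of its output. A small sketch: parse_judgment is a hypothetical helper, and the only facts taken from the slide are the "-j" option and the three judgment strings.

```python
JUDGMENTS = ("Malicious!", "Suspicious!", "None!")


def parse_judgment(output: str):
    """Return the judgment on the last non-empty output line, or None."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if lines and lines[-1] in JUDGMENTS:
        return lines[-1]
    return None


# Output copied from the slide (shortened).
captured = """00000000-00000008:comment,
0001209D-000120A3:comment,
FFFFFFFF-FFFFFFFF:obj 8 0 xref from None
Malicious!
"""
print(parse_judgment(captured))  # Malicious!

# Hypothetical invocation, assuming pdfanalysis.py sits in the working directory:
# import subprocess, sys
# result = subprocess.run([sys.executable, "pdfanalysis.py", "a.pdf", "-j"],
#                         capture_output=True, text=True)
# print(parse_judgment(result.stdout))
```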
6. APPLICATION
Application to NIDS
Network cable → NIDS (packet capture → reconstruct e-mails → o-checker) → Alert
o-checker can be introduced into an existing system without updating.
Problems of the application
Network cable → NIDS (packet capture → reconstruct e-mails → o-checker) → Alert
• Failure to recover e-mails: ~2%
• Broken document files: ~2%
• False positives: up ~2%
False positives increase because of the performance of the e-mail recovery software.
Enhanced o-checker
Network cable → NIDS (packet capture → reconstruct e-mails → new o-checker) → Alert
Deselection of broken document files based on structural analysis of file formats
• Failure to recover e-mails: ~2%
• Broken document files: ~2%
• False positives: up ~0%
Application to android (1)
Mail server, o-checker: manual check, manual delete
Application to android (2)
Mail server, E-mailer with o-checker: auto check, auto delete
7. CONCLUSION
Conclusion
• Traditional detection methods are reaching their limit.
• Structural analysis of file formats is effective for detecting malicious document files that have embedded executable files.
• Various applications of o-checker are possible, because it can detect malicious documents with high probability and at high speed.
Thank you!